# BLAZAR BE3-BURST Accelerator Engine Intelligent In Memory Computing

### **BANDWIDTH ENGINE (BE) INTRODUCTION**

The **BLAZAR Family of Accelerator Engines** supports high bandwidth, fast random memory access rates, and *embedded* <u>In Memory Functions (IMF)</u> that solve critical memory access challenges for memory-bottlenecked applications such as network search, statistics, buffering, security, firewall, 8K video, anomaly detection, genomics, ML random forests, graph/tree/list walking, and traffic monitoring.

The **Bandwidth Engine 3 BURST (BE3-BURST)** combines high speed serial memory with in-memory **Bandwidth functions** called BURST. These are sequential read and write functions for Data Movement that nearly double the memory access bandwidth.

Application benefits:

- FPGA Acceleration for Xilinx and Intel
- Replaces up to 8 QDR/RLDRAM memory devices
- Memory architecture allows up to 32 simultaneous accesses
- Lowers latency up to 4x and increases available access rates 6x by avoiding memory bottlenecks
- Accelerates FPGA applications by providing fast, efficient, single function calls for burst, multi-read, or multi-write operations
- Bandwidth IMFs (BURST)
  - Sequential read and write functions for Data Movement nearly double bandwidth.
  - BE3 includes all of the BE2 Burst IMF and additional functions.
- The devices support application acceleration for aggregate throughput rates ranging from 70Gb/s to over 810Gb/s per device
- Facilitates a "Software Defined, Hardware Accelerated" system architecture

# PRODUCT BRIEF

## BLAZAR Programmable HyperSpeed

### **KEY FEATURES / PRODUCT OPTIONS**

- High Bandwidth, low pin count serial interface
  - Highly efficient reliable transport command and data protocol optimized for 90% efficiency
- Eases board layout and signal integrity, no trace length matching required, operates over connectors
- 1Gb SRAM (16M x 72b)
- High access rate SRAM class memory
  - Up to 6.5 Billion transactions/sec
- High cycle rate memory
  - 2.7ns tRC
- Low latency: 40ns external (pin to pin)
- In Memory Bandwidth Functions
  - BURST sequential read and write functions for Data Movement nearly double bandwidth
  - Burst length: 1, 2, 4, 8 x 72b
  - Reduction of I/O up to 7X
- Highest Single Chip Bandwidth up to 717 Gbps throughput

#### **APPLICATIONS FOCUS**

- High bandwidth data access applications where low latency and Movement of Data are critical requirements.
- Applications needing large SRAMs.
- FPGA Acceleration for Xilinx and Intel

## **MoSys ACCELERATOR ENGINE Elements**

MoSys Engines have a Unique Memory Architecture that can replace SRAM/RLDRAM memories and that embeds <u>In Memory Functions (IMF)</u>, which execute many times faster. A single embedded function can replace several traditional memory accesses.



## MoSys Bandwidth Engine BURST (BE3) Architecture

*[Figure: BE3 architecture elements: Superior Random Access Memory; Fixed BURST Functions]*

## Fixed In Memory BANDWIDTH Functions - BURST

The BURST Functions are focused on DATA MOVEMENT. They move data in and out of the memory faster and more efficiently by reducing the number of commands. There are 12 flexible Burst functions.

The BURST Multi-Read/Multi-Write In-Memory Functions can nearly double the amount of data that can be moved with the same bandwidth. A single Burst function can result in up to 8 sequential reads or writes.

The Accelerator Engine can also execute several BURST Functions simultaneously, further increasing system performance.
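The command savings from bursting can be illustrated with a small calculation. This is a minimal sketch, assuming each command moves exactly one burst of sequential 72-bit words and that the burst length is one of the values listed in this brief (1, 2, 4, 8). The function name `commands_needed` is hypothetical and does not correspond to any MoSys API, and the sketch does not model the command frame format.

```python
import math

# Burst lengths listed in this brief (in 72-bit words).
VALID_BURST_LENGTHS = (1, 2, 4, 8)

def commands_needed(words: int, burst_length: int) -> int:
    """Commands required to move `words` sequential words.

    Assumption: one command transfers exactly `burst_length` words,
    and a final partial burst still costs one command.
    """
    if burst_length not in VALID_BURST_LENGTHS:
        raise ValueError("burst length must be 1, 2, 4, or 8")
    if words < 0:
        raise ValueError("words must be non-negative")
    return math.ceil(words / burst_length)

if __name__ == "__main__":
    total_words = 64  # illustrative transfer size
    for bl in VALID_BURST_LENGTHS:
        print(f"burst length {bl}: {commands_needed(total_words, bl)} commands")
```

Under these assumptions, a 64-word transfer needs 64 commands at burst length 1 and 8 commands at burst length 8, which is where the reduction in command traffic comes from.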

#### High speed serial I/O

- GCI serial I/O versions of 10, 12.5, 15 and 25 Gbps for high bandwidth (up to 717 Gbps)
- Device can operate with a minimum of 4 lanes.
- Has two, full duplex 8 lane ports that operate independently
- Reduces the number of signal pins compared with traditional memories and improves signal integrity, allowing longer board traces and easier board signal routing
- **Operates across connectors**
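For link budgeting, the lane counts and lane rates above can be turned into a rough per-direction estimate. The sketch below is an illustration only: it multiplies lanes by lane rate to get the raw line rate, then applies the 90% protocol efficiency quoted under Key Features as an assumed payload factor. The helper names are hypothetical, and the result is not the derivation of the 717 Gbps figure, which this brief does not explain.

```python
# Lane rates (Gbps) listed for the GCI serial I/O.
LANE_RATES_GBPS = (10, 12.5, 15, 25)

# Assumption: the "90% efficiency" protocol figure applies uniformly to payload.
PROTOCOL_EFFICIENCY = 0.90

def raw_line_rate_gbps(lanes: int, lane_rate_gbps: float) -> float:
    """Raw line rate in one direction, before any protocol overhead."""
    return lanes * lane_rate_gbps

def payload_estimate_gbps(lanes: int, lane_rate_gbps: float,
                          efficiency: float = PROTOCOL_EFFICIENCY) -> float:
    """Rough payload estimate in one direction under the assumed efficiency."""
    return raw_line_rate_gbps(lanes, lane_rate_gbps) * efficiency

if __name__ == "__main__":
    # 4 lanes is the stated minimum; 8 is one port; 16 is both ports.
    for lanes in (4, 8, 16):
        for rate in LANE_RATES_GBPS:
            raw = raw_line_rate_gbps(lanes, rate)
            est = payload_estimate_gbps(lanes, rate)
            print(f"{lanes:2d} lanes @ {rate:>4} Gbps: raw {raw:6.1f}, est. payload {est:6.1f}")
```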

#### Main Memory 1Gb

- 4 partitions/128 banks
- 16 READ & 16 WRITE ports
- 2.7 ns tRC
- Allows parallel partition & bank execution
- Up to 6B rd/s + 6B wr/s simultaneously
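The access rate figure can be sanity-checked against the cycle time and port count. This is a back-of-the-envelope sketch, assuming each port can start one access every tRC and that all ports are kept busy with no bank conflicts. It is a consistency check on the numbers in this brief, not the vendor's stated derivation, and the function name is hypothetical.

```python
T_RC_NS = 2.7      # cycle time from this brief
READ_PORTS = 16    # read ports from this brief
WRITE_PORTS = 16   # write ports from this brief

def accesses_per_second(ports: int, t_rc_ns: float) -> float:
    """Peak accesses/s, assuming one access per port per tRC (an assumption)."""
    return ports / (t_rc_ns * 1e-9)

if __name__ == "__main__":
    reads = accesses_per_second(READ_PORTS, T_RC_NS)
    writes = accesses_per_second(WRITE_PORTS, T_RC_NS)
    print(f"reads:  {reads / 1e9:.2f} billion/s")
    print(f"writes: {writes / 1e9:.2f} billion/s")
```

Under these assumptions each direction works out to roughly 5.9 billion accesses per second, in line with the "up to 6B rd/s + 6B wr/s" figure.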

*[Figure: BE3-BURST block diagram: GCI-A and GCI-B Rx SerDes, Memory Controller (BURST) and Scheduler, Results Reordering, GCI-A and GCI-B Tx SerDes]*

#### Memory/Function Controller

- Resolves localized bank conflicts
- Directs read/write functions to the selected bank of memory
- Manages the sequence of operations to execute a read-modify-write (RMW)
  - 4-8x reduction in RMW accesses
  - Ensures no stale data (mutex)
- Controls parallel function execution
- Four Domain levels for function execution priority setting
- Multiple scheduling domains minimize blocking of short latency operations by long latency operations

#### **Result Reordering**

Reorder buffers ensure that results are returned on the output of the port on which the request was submitted, tagged with the priority Domain if one is used.

## Software Defined - Hardware Accelerated

LEARN MORE: https://mosys.com/blazar-family-of-accelerator-engines/

Software and System Architects can improve application performance by accelerating the memory access and utilizing the In Memory Compute Functions.

The different Accelerator Engine devices allow application tuning to achieve increasing levels of performance, up to our most powerful engine, the Programmable HyperSpeed Engine with 32 Processor Cores.

| Family  | Part Number | Description | Pkg Size (mm) | Lanes (Tx/Rx) | 10-12.5G per Lane | 15G per Lane | 25-28G per Lane | Interface BW (Gbps) | tRC (ns) | Memory Size (Gb) | Access Rate (Billion Transactions/s) | R/W | RMW / ALU | Custom 32 RISC Cores |
|---------|-------------|-------------|---------------|---------------|-------------------|--------------|-----------------|---------------------|----------|------------------|--------------------------------------|-----|-----------|----------------------|
| BURST   | MSR620      | Bandwidth Engine 2 Burst<br>Serial 0.5Gb High Access Memory | FCBGA<br>19x19 | 16 | ✓ |   |   | 320 | 3.2 | 0.5 | 3.3 | ✓ |   |   |
| BURST   | MSR630      | Bandwidth Engine 3 Burst<br>Serial 1Gb High Access Memory | FCBGA<br>27x27 | 16 | ✓ | ✓ | ✓ | 717 | 2.7 | 1 | 6.5 | ✓ |   |   |
| RMW     | MSR820      | Bandwidth Engine 2 RMW<br>Serial 0.5Gb High Access Memory with ALU for RMW functions | FCBGA<br>19x19 | 16 | ✓ |   |   | 320 | 3.2 | 0.5 | 3.3 | ✓ | ✓ |   |
| RMW     | MSR830      | Bandwidth Engine 3 RMW<br>Serial 1Gb High Access Memory with ALU for RMW functions | FCBGA<br>27x27 | 16 | ✓ | ✓ | ✓ | 717 | 2.7 | 1 | 6.5 | ✓ | ✓ |   |
| Program | MSPS30      | Programmable Accelerator Engine<br>Serial Interface, 1Gb Memory, 32 RISC Processor cores for custom algorithms, compute, functions | FCBGA<br>27x27 | 16 | ✓ | ✓ | ✓ | 717 | 2.7 | 1 | 24<br>Internal | ✓ | ✓ | ✓ |

MoSys is a registered trademark of MoSys, Inc. in the US and/or other countries. Blazar, Bandwidth Engine, HyperSpeed Engine, IC Spotlight, LineSpeed and the MoSys logo are trademarks of MoSys, Inc. All other marks are the property of their respective owners.



2309 Bering Drive, San Jose, CA 95131 Tel: 408-418-7500 Fax: 408-418-7501 www.mosys.com