Memory bandwidth


Memory bandwidth is the rate at which data can be read from or stored into a semiconductor memory by a processor. Memory bandwidth is usually expressed in units of bytes/second, though this can vary for systems with natural data sizes that are not a multiple of the commonly used 8-bit bytes.
The memory bandwidth advertised for a given memory or system is usually the maximum theoretical bandwidth. In practice, the observed memory bandwidth will be lower than the advertised figure. Various computer benchmarks measure sustained memory bandwidth using a range of access patterns. These are intended to provide insight into the memory bandwidth that a system should sustain on various classes of real applications.

Measurement conventions

There are three different conventions for defining the quantity of data transferred in the numerator of "bytes/second":
  1. The bcopy convention: counts the amount of data copied from one location in memory to another location per unit time. For example, copying 1 million bytes from one location in memory to another location in memory in one second would be counted as 1 million bytes per second. The bcopy convention is self-consistent, but is not easily extended to cover cases with more complex access patterns, for example three reads and one write.
  2. The STREAM convention: sums the amount of data that the application code explicitly reads plus the amount of data that the application code explicitly writes. Using the previous 1 million byte copy example, the STREAM bandwidth would be counted as 1 million bytes read plus 1 million bytes written in one second, for a total of 2 million bytes per second. The STREAM convention is most directly tied to the user code, but may not count all the data traffic that the hardware is actually required to perform.
  3. The hardware convention: counts the actual amount of data read or written by the hardware, whether the data motion was explicitly requested by the user code or not. Using the same 1 million byte copy example, the hardware bandwidth on computer systems with a write allocate cache policy would include an additional 1 million bytes of traffic because the hardware reads the target array from memory into cache before performing the stores. This gives a total of 3 million bytes per second actually transferred by the hardware. The hardware convention is most directly tied to the hardware, but may not represent the minimum amount of data traffic required to implement the user's code.
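
The three conventions can be compared on the 1 million byte copy example used above. The following is a minimal sketch in Python; the function name copy_bandwidth and its parameters are illustrative and do not come from any benchmark, and the hardware figure assumes that the only extra traffic is the write-allocate read of the target array.

```python
def copy_bandwidth(nbytes, seconds, write_allocate=True):
    """Return bytes/second for a memory copy under each counting convention."""
    # bcopy convention: only the amount of data copied is counted.
    bcopy = nbytes / seconds
    # STREAM convention: explicit reads plus explicit writes.
    stream = (nbytes + nbytes) / seconds
    # Hardware convention: source read + target write, plus a read of the
    # target into cache when the cache policy is write allocate (assumed
    # to be the only implicit traffic in this sketch).
    hw_bytes = 2 * nbytes + (nbytes if write_allocate else 0)
    hardware = hw_bytes / seconds
    return {"bcopy": bcopy, "stream": stream, "hardware": hardware}


rates = copy_bandwidth(1_000_000, 1.0)
for name, rate in rates.items():
    print(f"{name}: {rate:,.0f} bytes/second")
```

The same copy is thus reported as 1, 2, or 3 million bytes per second depending on the convention, so a bandwidth figure is only meaningful when its convention is stated.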

Bandwidth computation and nomenclature

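The nomenclature differs across memory technologies, but for commodity DDR SDRAM, DDR2 SDRAM, and DDR3 SDRAM memory, the total bandwidth is the product of:
  1. The base DRAM clock frequency.
  2. The number of data transfers per clock: two, in the case of double data rate memory.
  3. The memory bus (interface) width: each DDR, DDR2, or DDR3 memory interface is 64 bits wide.
  4. The number of interfaces: two in a dual-channel configuration, for an effective 128-bit bus width.
For example, a computer with dual-channel memory and one DDR2-800 module per channel running at 400 MHz would have a theoretical maximum memory bandwidth of:
  400,000,000 clocks per second × 2 transfers per clock × 64 bits per transfer × 2 interfaces = 102,400,000,000 bits per second, or 12,800 MB/s (12.8 GB/s).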
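
The arithmetic for this example can be sketched in a few lines of Python. The sketch assumes the usual factors for DDR-type memory: the base clock frequency, two transfers per clock, a 64-bit interface width, and the number of interfaces. The function name and parameter names are illustrative only.

```python
def peak_bandwidth_bytes_per_s(clock_hz, transfers_per_clock,
                               bus_width_bits, interfaces):
    """Theoretical peak bandwidth as the product of the four factors."""
    bits_per_s = clock_hz * transfers_per_clock * bus_width_bits * interfaces
    return bits_per_s // 8  # 8 bits per byte


# Dual-channel DDR2-800: 400 MHz clock, 2 transfers per clock,
# 64-bit interfaces, 2 interfaces.
peak = peak_bandwidth_bytes_per_s(400_000_000, 2, 64, 2)
print(peak)          # 12800000000 bytes/second
print(peak / 1e9)    # 12.8 GB/s
```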
This theoretical maximum memory bandwidth is referred to as the "burst rate," which may not be sustainable.
The naming convention for DDR, DDR2 and DDR3 modules specifies either a maximum speed (for example, DDR2-800) or a maximum bandwidth (for example, PC2-6400). The speed rating (800) is not the maximum clock speed but twice that, because data is transferred twice per clock. The specified bandwidth (6400) is the maximum number of megabytes transferred per second using a 64-bit width. In a dual-channel mode configuration, this is effectively a 128-bit width. Thus, the memory configuration in the example can be summarized as two DDR2-800 modules running in dual-channel mode.
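
The relationship between the two ratings can be illustrated with a short Python sketch. It assumes a 400 MHz clock and the 64-bit width described above; the helper name module_ratings is hypothetical.

```python
def module_ratings(clock_mhz, bus_width_bits=64):
    """Return (speed rating, bandwidth rating in MB/s) for a DDR-type module."""
    # Speed rating: twice the clock speed (two transfers per clock).
    speed = clock_mhz * 2
    # Bandwidth rating: megabytes per second across the given width.
    bandwidth_mb_s = speed * bus_width_bits // 8
    return speed, bandwidth_mb_s


speed, mb_s = module_ratings(400)                 # one 64-bit module
dual_speed, dual_mb_s = module_ratings(400, 128)  # dual-channel, 128 bits
print(speed, mb_s)            # 800 6400
print(dual_speed, dual_mb_s)  # 800 12800
```

A module clocked at 400 MHz therefore carries a speed rating of 800 and a bandwidth rating of 6400, and the dual-channel figure of 12,800 MB/s matches the theoretical maximum for the example configuration.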
Two memory interfaces (dual-channel mode) is a common configuration for PC system memory, but single-channel configurations are common in older, low-end, or low-power devices. Some personal computers and most modern graphics cards use more than two memory interfaces. High-performance graphics cards running many interfaces in parallel can attain very high total memory bus width.

ECC bits

In systems with error-correcting memory, the additional width of the interfaces is not counted in bandwidth specifications because the extra bits are unavailable to store user data. ECC bits are better thought of as part of the memory hardware rather than as information stored in that hardware.
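
As a rough illustration, the following Python sketch contrasts the rated bandwidth with the raw bit traffic on an ECC interface. It assumes a 72-bit physical interface (64 data bits plus 8 check bits), which is a common arrangement for ECC DDR modules but is not stated in the text above; the constant and function names are illustrative.

```python
DATA_BITS = 64  # bits per transfer that carry user data
ECC_BITS = 8    # assumed check bits per transfer (72-bit ECC interface)


def rated_bandwidth_mb_s(mega_transfers_per_s, data_bits=DATA_BITS):
    """Bandwidth as specified: only the data bits are counted."""
    return mega_transfers_per_s * data_bits // 8


rated = rated_bandwidth_mb_s(800)                       # data bits only
raw = rated_bandwidth_mb_s(800, DATA_BITS + ECC_BITS)   # all 72 bits
print(rated)  # 6400
print(raw)    # 7200
```

Under this assumption the interface moves the equivalent of 7200 MB/s of bits, but the module is still specified at 6400 MB/s because the check bits cannot hold user data.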