The most influential implementations of computational RAM came from the Berkeley IRAM Project. Vector IRAM combines DRAM with a vector processor integrated on the same chip. Reconfigurable Architecture DRAM (RADram) is DRAM with reconfigurable computing FPGA logic elements integrated on the same chip. SimpleScalar simulations show that RADram can deliver orders of magnitude better performance than traditional DRAM on some problems.
Some embarrassingly parallel computational problems are already limited by the von Neumann bottleneck between the CPU and the DRAM. Some researchers expect that, for the same total cost, a machine built from computational RAM will run orders of magnitude faster than a traditional general-purpose computer on these kinds of problems.
As of 2011, the "DRAM process" and the "CPU process" are distinct enough that there are three approaches to computational RAM:
starting with a CPU-optimized process and a device that uses a large amount of embedded SRAM, add an extra process step that allows the embedded SRAM to be replaced with embedded DRAM, giving roughly 3× area savings in the former SRAM regions.
starting with a system with a separate CPU chip and DRAM chip, add small amounts of "coprocessor" computational ability to the DRAM, working within the limits of the DRAM process and adding only a little area to the DRAM, to do things that would otherwise be slowed down by the narrow bottleneck between CPU and DRAM: zero-fill selected areas of memory, copy large blocks of data from one location to another, find where a given byte occurs in some block of data, etc. The resulting system (the unchanged CPU chip plus the "smart DRAM" chip) is at least as fast as the original system, and potentially slightly lower in cost. The cost of the small amount of extra area is expected to be more than repaid by savings in expensive test time, since a "smart DRAM" now has enough computational capability for a wafer full of DRAM to do most testing internally in parallel, rather than the traditional approach of fully testing one DRAM chip at a time with expensive external automatic test equipment.
starting with a DRAM-optimized process, tweak the process to make it slightly more like the "CPU process", and build a general-purpose CPU within the limits of that process.
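To make the bus-bottleneck argument behind the second approach concrete, here is a minimal Python sketch that models a "smart DRAM" and simply counts bytes crossing the CPU-DRAM bus. The command interface (`SmartDRAM`, `zero_fill`, `copy`, `find_byte`, `cpu_copy`) and the packet sizes `COMMAND_BYTES` and `RESULT_BYTES` are hypothetical assumptions for illustration, not a real chip's interface.

```python
"""Toy model of a 'smart DRAM' with in-memory zero-fill, copy, and byte search.

All names here (SmartDRAM, COMMAND_BYTES, RESULT_BYTES, cpu_copy) are
hypothetical; the model only counts bytes crossing the CPU-DRAM bus.
"""

COMMAND_BYTES = 16  # assumed size of one command packet sent to the DRAM
RESULT_BYTES = 8    # assumed size of a result (e.g. an address) sent back


class SmartDRAM:
    def __init__(self, size: int) -> None:
        self.cells = bytearray(size)
        self.bus_bytes = 0  # running total of bytes moved over the external bus

    def _check(self, addr: int, n: int) -> None:
        # Reject accesses outside the modeled array (slices would silently resize it).
        if addr < 0 or n < 0 or addr + n > len(self.cells):
            raise IndexError("access outside the modeled DRAM array")

    # Plain DRAM access: every data byte crosses the bus.
    def read(self, addr: int, n: int) -> bytes:
        self._check(addr, n)
        self.bus_bytes += n
        return bytes(self.cells[addr:addr + n])

    def write(self, addr: int, data: bytes) -> None:
        self._check(addr, len(data))
        self.bus_bytes += len(data)
        self.cells[addr:addr + len(data)] = data

    # "Coprocessor" commands executed inside the DRAM: only the command
    # (and, for searches, a small result) crosses the bus.
    def zero_fill(self, addr: int, n: int) -> None:
        self._check(addr, n)
        self.bus_bytes += COMMAND_BYTES
        self.cells[addr:addr + n] = bytes(n)

    def copy(self, dst: int, src: int, n: int) -> None:
        self._check(dst, n)
        self._check(src, n)
        self.bus_bytes += COMMAND_BYTES
        # The right-hand slice is copied first, so overlapping ranges are safe.
        self.cells[dst:dst + n] = self.cells[src:src + n]

    def find_byte(self, addr: int, n: int, value: int) -> int:
        self._check(addr, n)
        self.bus_bytes += COMMAND_BYTES + RESULT_BYTES
        # Absolute index of the first match in [addr, addr + n), or -1.
        return self.cells.find(value, addr, addr + n)


def cpu_copy(mem: SmartDRAM, dst: int, src: int, n: int) -> None:
    """Baseline: the CPU reads the block over the bus, then writes it back."""
    mem.write(dst, mem.read(src, n))


if __name__ == "__main__":
    n = 1 << 20  # copy a 1 MiB block
    plain = SmartDRAM(4 * n)
    cpu_copy(plain, 2 * n, 0, n)
    smart = SmartDRAM(4 * n)
    smart.copy(2 * n, 0, n)
    print(f"CPU copy: {plain.bus_bytes} bus bytes; in-DRAM copy: {smart.bus_bytes}")
```

Run as a script, this prints `CPU copy: 2097152 bus bytes; in-DRAM copy: 16`. The conventional copy moves the whole block across the bus twice, while the in-DRAM command sends only a fixed-size request regardless of block size. That bus traffic is what the "coprocessor" additions avoid.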
Some CPUs designed to be built on a DRAM process technology include the Berkeley IRAM Project, TOMI Technology and the AT&T DSP1. Because a memory bus to off-chip memory has many times the capacitance of an on-chip memory bus, a system with separate DRAM and CPU chips can consume several times the energy of an IRAM system with the same computing performance. Because computational DRAM is expected to run hotter than traditional DRAM, and higher chip temperatures cause faster charge leakage from the DRAM storage cells, computational DRAM is expected to require more frequent DRAM refresh.
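The capacitance argument can be made quantitative with the standard CMOS dynamic-power relation. This is a textbook formula rather than something stated in the source, and it rests on the simplifying assumption that both buses use the same voltage swing V, activity factor α (fraction of lines toggling per cycle) and clock rate f, with C the load capacitance being switched:

```latex
P_{\text{bus}} = \alpha\, C\, V^{2} f
\qquad\Longrightarrow\qquad
\frac{E_{\text{off-chip}}}{E_{\text{on-chip}}}
  = \frac{\alpha\, C_{\text{off-chip}}\, V^{2} f}{\alpha\, C_{\text{on-chip}}\, V^{2} f}
  = \frac{C_{\text{off-chip}}}{C_{\text{on-chip}}}
```

Under these assumptions, an off-chip bus with, for example, ten times the capacitance costs roughly ten times the switching energy to move the same data. Real off-chip interfaces often use different voltages and signaling schemes, so this is only a first-order estimate of the bus contribution, not of total system energy.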
Processor-in-/near-memory
A processor-in-/near-memory refers to a computer processor tightly coupled to memory, generally on the same silicon chip. The chief goal of merging the processing and memory components in this way is to reduce memory latency and increase bandwidth. Reducing the distance that data must travel also reduces the power requirements of a system. Much of the complexity in current processors stems from strategies for avoiding memory stalls.