Cache prefetching

Cache prefetching is a technique used by computer processors to boost execution performance by fetching instructions or data from their original storage in slower memory to a faster local memory before it is actually needed. Most modern computer processors have fast and local cache memory in which prefetched data is held until it is required. The source for the prefetch operation is usually main memory. Because of their design, accessing cache memories is typically much faster than accessing main memory, so prefetching data and then accessing it from caches is usually many orders of magnitude faster than accessing it directly from main memory. Prefetching can be done with non-blocking cache control instructions.

Data vs. instruction cache prefetching

Cache prefetching can either fetch data or instructions into cache.

Data prefetching fetches data before it is needed. Because data access patterns show less regularity than instruction patterns, accurate data prefetching is generally more challenging than instruction prefetching.
Instruction prefetching fetches instructions before they need to be executed. The first mainstream microprocessors to use some form of instruction prefetch were the Intel 8086 and the Motorola 68000. In recent years, all high-performance processors use prefetching techniques.
Hardware vs. software cache prefetching

Cache prefetching can be accomplished either by hardware or by software.

Hardware based prefetching is typically accomplished by having a dedicated hardware mechanism in the processor that watches the stream of instructions or data being requested by the executing program, recognizes the next few elements that the program might need based on this stream and prefetches into the processor's cache.
Software based prefetching is typically accomplished by having the compiler analyze the code and insert additional "prefetch" instructions in the program during compilation itself.
Methods of hardware prefetching

Stream buffers

Stream buffers were developed based on the concept of "one block lookahead scheme" proposed by Alan Jay Smith.
Stream buffers are one of the most common hardware based prefetching techniques in use. This technique was originally proposed by Norman Jouppi in 1990 and many variations of this method have been developed since. The basic idea is that the cache miss address are fetched into a separate buffer of depth. This buffer is called a stream buffer and is separate from cache. The processor then consumes data/instructions from the stream buffer if the address associated with the prefetched blocks match the requested address generated by the program executing on the processor. The figure below illustrates this setup:

Whenever the prefetch mechanism detects a miss on a memory block, say A, it allocates a stream to begin prefetching successive blocks from the missed block onward. If the stream buffer can hold 4 blocks, then we would prefetch A+1, A+2, A+3, A+4 and hold those in the allocated stream buffer. If the processor consumes A+1 next, then it shall be moved "up" from the stream buffer to the processor's cache. The first entry of the stream buffer would now be A+2 and so on. This pattern of prefetching successive blocks is called Sequential Prefetching. It is mainly used when contiguous locations are to be prefetched. For example, it is used when prefetching instructions.
This mechanism can be scaled up by adding multiple such 'stream buffers' - each of which would maintain a separate prefetch stream. For each new miss, there would be a new stream buffer allocated and it would operate in a similar way as described above.
The ideal depth of the stream buffer is something that is subject to experimentation against various benchmarks and depends on the rest of the microarchitecture involved.

Another pattern of prefetching instructions is to prefetch addresses that are addresses ahead in the sequence. It is mainly used when the consecutive blocks that are to be prefetched are addresses apart. This is termed as Strided Prefetching.

Methods of software prefetching

Compiler directed prefetching

Compiler directed prefetching is widely used within loops with a large number of iterations. In this technique, the compiler predicts future cache misses and inserts a prefetch instruction based on the miss penalty and execution time of the instructions.
These prefetches are non-blocking memory operations, i.e. these memory accesses do not interfere with actual memory accesses. They do not change the state of the processor or cause page faults.
One main advantage of software prefetching is that it reduces the number of compulsory cache misses.
The following example shows the how a prefetch instruction will be added into a code to improve cache performance.
Consider a for loop as shown below:
for
At each iteration, the i^th element of the array "array1" is accessed. Therefore, we can prefetch the elements that are going to be accessed in future iterations by inserting a "prefetch" instruction as shown below:
for
Here, the prefetch stride, depends on two factors, the cache miss penalty and the time it takes to execute a single iteration of the for loop. For instance, if one iteration of the loop takes 7 cycles to execute, and the cache miss penalty is 49 cycles then we should have - which means that we prefetch 7 elements ahead. With the first iteration, i will be 0, so we prefetch the 7th element. Now, with this arrangement, the first 7 accesses will still be misses.

Comparison of hardware and software prefetching

While software prefetching requires programmer or compiler intervention, hardware prefetching requires special hardware mechanisms.
Software prefetching works well only with loops where there is regular array access as the programmer has to hand code the prefetch instructions. Whereas, hardware prefetchers work dynamically based on the program's behavior at runtime.
Hardware prefetching also has less CPU overhead when compared to software prefetching.
Metrics of cache prefetching

There are three main metrics to judge cache prefetching

Coverage

Coverage is the fraction of total misses that are eliminated because of prefetching, i.e.
where,

Accuracy

Accuracy is the fraction of total prefetches that were useful - i.e. the ratio of the number of memory addresses prefetched were actually referenced by the program to the total prefetches done.
While it appears that having perfect accuracy might imply that there are no misses, this is not the case. The prefetches themselves might result in new misses if the prefetched blocks are placed directly into the cache. Although these may be a small fraction of the total number of misses we might see without any prefetching, this is a non-zero number of misses.

Timeliness

The qualitative definition of timeliness is how early a block is prefetched versus when it is actually referenced. An example to further explain timeliness is as follows :
Consider a for loop where each iteration takes 3 cycles to execute and the 'prefetch' operation takes 12 cycles. This implies that for the prefetched data to be useful, we must start the prefetch iterations prior to its usage to maintain timeliness.

Popular movies

The Hunger Games (film) - 2012 American dystopian action thriller science fiction-adventure film directed by Gary Ross and based on Suzanne Collins’s 2008 novel of the same name. It is the first insta...
untitled Captain Marvel sequel - part of Marvel Cinematic Universe....
Killers of the Flower Moon (film project) - Killers of the Flower Moon - film project in United States of America. It was presented as drama, detective fiction, thriller. The film project starred Leonardo Dicaprio, Robert De Niro. Director of...
Five Nights at Freddy's (film) - Five Nights at Freddy's - film published in 2017 in United States of America. Scenarist of the film - Scott Cawthon....

Popular books

Book of Revelation - The Book of Revelation is the final book of the New Testament, and consequently is also the final book of the Christian Bible. Its title is derived from the first word of the Koine Greek text: apok...
Book of Genesis - account of the creation of the world, the early history of humanity, Israel's ancestors and the origins...
Gospel of Matthew - The Gospel According to Matthew is the first book of the New Testament and one of the three synoptic gospels. It tells how Israel's Messiah, rejected and executed in Israel, pronounces judgement on ...
Michelin Guide - Michelin Guides are a series of guide books published by the French tyre company Michelin for more than a century. The term normally refers to the annually published Michelin Red Guide , the oldest...
Psalms - The Book of Psalms , commonly referred to simply as Psalms , the Psalter or "the Psalms", is the first book of the Ketuvim , the third section of the Hebrew Bible, and thus a book of th...
Ecclesiastes - Ecclesiastes is one of 24 books of the Tanakh , where it is classified as one of the Ketuvim . Originally written c. 450–200 BCE, it is also among the canonical Wisdom literature of the Old Tes...
The 48 Laws of Power - non-fiction book by American author Robert Greene. The book...

Popular television series

The Crown (TV series) - historical drama web television series about the reign of Queen Elizabeth II, created and principally written by Peter Morgan, and produced by Left Bank Pictures and Sony Pictures Tel...
Friends - American sitcom television series, created by David Crane and Marta Kauffman, which aired on NBC from September 22, 1994, to May 6, 2004, lasting ten seasons. With an ensemble cast sta...
Young Sheldon - spin-off prequel to The Big Bang Theory and begins with the character Sheldon...
Modern Family - American television mockumentary family sitcom created by Christopher Lloyd and Steven Levitan for the American Broadcasting Company. It ran for eleven seasons, from September 23...
Loki (TV series) - upcoming American web television miniseries created for Disney+ by Michael Waldron, based on the Marvel Comics character of the same name. It is set in the Marvel Cinematic Universe, shar...
Game of Thrones - American fantasy drama television series created by David Benioff and D. B. Weiss for HBO. It...
Shameless (American TV series) - American comedy-drama television series developed by John Wells which debuted on Showtime on January 9, 2011. It...