CS274: Computer Architecture - The Memory Hierarchy: Cache Design
Activity Goals
The goals of this activity are:- To design a memory architecture that does not starve the CPU despite the memory hierarchy
- To exploit cache design using the principles of locality
Supplemental Reading
Feel free to visit these resources for supplemental background reading material.- Cache Coherence Across Cores: False Cache Sharing
- The Principle of Locality
- Cache Placement Policies
The Activity
Directions
Consider the activity models and answer the questions provided. First reflect on these questions on your own briefly, before discussing and comparing your thoughts with your group. Appoint one member of your group to discuss your findings with the class, and the rest of the group should help that member prepare their response. Answer each question individually from the activity, and compare with your group to prepare for our whole-class discussion. After class, think about the questions in the reflective prompt and respond to those individually in your notebook. Report out on areas of disagreement or items for which you and your group identified alternative approaches. Write down and report out questions you encountered along the way for group discussion.Model 1: The Principle of Locality
Questions
- There are two principles of locality: Temporal Locality (referring to a variable or instruction repeatedly in a short period of time) and Spatial Locality (referring to nearby memory addresses following a particular access). Identify all the examples of each in the code example above.
- What would happen to our locality if we switched the two for loop lines? This is called switching from a row-major order to a column-major order.
- What would happen to our locality if NandMbecame very large?
Model 2: The Memory Hierarchy
 
        Questions
- What do you think processor registers are made of? Why might computer memory and disk be made of a different material?
- Since cache (and, for that matter, processor registers) is so much smaller than primary memory, how can we make efficient use of the space given the principle of locality? For example, should we randomly store data in cache, or take a more strategic approach?
- What do you think stride refers to?
Model 3: Exploiting Temporal Locality with the Direct Mapped Cache
 
 
        Questions
- Which address bits are used to determine which block a word maps to?
- Given a cache with 1024 entries, how many bits would determine the block number (a.k.a. row, or index)?
- What is the purpose of the tag bits? Why is it necessary to store them?
- Why do you think the last two bits of the address are ignored?
Model 4: Direct Mapped Cache Example
| Index | Valid | Tag | Data | 
|---|---|---|---|
| 000 | |||
| 001 | |||
| 010 | |||
| 011 | |||
| 100 | |||
| 101 | |||
| 110 | |||
| 111 | |||
| Address | Binary | Cache Block | Hit/Miss? | 
| 22 | 10110 | 110 | |
| 26 | 11010 | 010 | |
| 22 | 10110 | 110 | |
| 26 | 11010 | 010 | |
| 16 | 10000 | 000 | |
| 3 | 00011 | 011 | |
| 16 | 10000 | 000 | |
| 18 | 10010 | 010 | |
Questions
- Fill in the table above, assuming an 8-word direct mapped cache.
- How many hits and misses were there?
- Why were these cache replacements particularly unfortunate?
- How could we increase the duration of temporal locality in this cache? That is, what could we do to enable these words to remain in cache longer?
Model 5: Write Policies to Avoid Stale Data
Questions
- What is the benefit and drawback of writing data to both cache and to main memory (and every layer in between) whenever a write occurs? This is known as a "write-through" strategy.
- What would need to change if a write-back strategy was used instead, in which data was only written to the top layer of cache. What would we need to do to ensure this data is not lost, and when would we need to copy it to the next lower layer?
Model 6: Adding Spatial Locality Multiword Blocks
 
        Questions
- What is different about this design?
- How does this approach enable spatial locality?
- To what block and index does address 67 map?
- What other words would be loaded into this block?
Model 7: Mitigating Cache Conflicts with Associative Caches
 
        Questions
- Draw a 1 MB cache that is 4-way set associative, with 8 words per block.
- Map address 37 to this cache, and indicate which other addresses would be loaded.
- Show the tag and index for address 37.
- Does a fully associative cache need a block offset, an index, and/or a tag? Why or why not?
Model 8: Cache Replacement Strategies
Draw a direct mapped, two-way set associative, and fully associative cache with 4 words total.
        Questions
- Map these addresses and count the number of hits and misses: 0, 8, 0, 6, 8.
- How does your hit rate change for 8 or 16 word caches?
- When you must replace a block, how can you decide which set to replace? What is your rationale for doing so?
Model 9: Reducing the Miss Penalty with Multi-Level Caches
Suppose you have a base CPI of 1 (assuming L1 cache hits) and a 5 GHz CPU clock (0.2 ns per CPU cycle), 100 ns main memory access time, a 2% miss rate at L1, a 0.5% miss rate at L2.
        Questions
- What is your CPI with and without your secondary cache?
