Seoul National University Computer Architecture Project 2 Cache

Computer Architecture Project #2: Cache Simulator

Objectives
- To understand cache memory
  - Organization: set associativity
  - Operation: cache read & write, hit & miss, LRU replacement policy
  - Performance: hit/miss ratio, miss penalty
- To develop your own cache simulator

[Figure: memory access pattern, cache organization, and display option feed the cache simulator, which reports hit/miss results and performance]

General Cache Organization (S, E, B)
- $S = 2^s$ sets
- $E = 2^e$ lines per set
  - If E = 1 (e = 0): "Direct Mapped Cache"
  - Else if S = 1 (s = 0): "Fully Associative Cache"
  - Else: "E-Way Set Associative Cache"
- $B = 2^b$ bytes per cache block (the data)
- Each line holds a valid bit v, a tag, and data bytes 0, 1, 2, …, B−1.
- Cache size: $C = S \times E \times B$ data bytes
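A minimal C sketch of how a simulator might split an address into the t tag bits, s set index bits, and b block offset bits described above, and compute $C = S \times E \times B$. The function names are hypothetical, and the sketch assumes 64-bit addresses with s + b < 64.

```c
#include <stdint.h>

/* Hypothetical helpers: split an address for a cache with 2^s sets
   and 2^b-byte blocks. Assumes s + b < 64. */
static uint64_t block_offset(uint64_t addr, int b) {
    return addr & ((UINT64_C(1) << b) - 1);           /* low b bits */
}

static uint64_t set_index(uint64_t addr, int s, int b) {
    return (addr >> b) & ((UINT64_C(1) << s) - 1);    /* next s bits */
}

static uint64_t tag_bits(uint64_t addr, int s, int b) {
    return addr >> (s + b);                           /* remaining t bits */
}

/* Cache size in data bytes: C = S x E x B = 2^s * E * 2^b. */
static uint64_t cache_size(int s, int E, int b) {
    return (UINT64_C(1) << s) * (uint64_t)E * (UINT64_C(1) << b);
}
```

With the 8-byte blocks used on the next slides (b = 3), an address ending in the bits 0…01 100 yields block offset 4 and set 1; the remaining high bits are the tag.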

E-way Set Associative Cache (Here: E = 2)
- E = 2: two lines per set.
- Assume that the cache block size is 8 bytes.
- Address of a short int: [t tag bits | set index 0…01 | block offset 100]
- Find the set: the set index bits 0…01 select the set, whose two lines each hold a valid bit v, a tag, and bytes 0 to 7.

E-way Set Associative Cache (Here: E = 2)
- Compare both lines in the set: valid? + tag match: yes = hit.
- The block offset (100) then locates the data within the 8-byte block.

E-way Set Associative Cache (Here: E = 2)
- On a hit, the short int (2 bytes) is found at block offset 100 (byte 4) of the matching line.
- No match:
  - One line in the set is selected for eviction and replacement.
  - Replacement policies: random, least recently used (LRU), …
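The "compare both" step can be sketched in C as a loop over the E lines of the selected set: a line hits only if it is valid and its tag matches. The `struct line` layout and `find_line` name are hypothetical; data bytes are omitted because the hit/miss decision needs only the valid bit and the tag.

```c
#include <stdbool.h>
#include <stdint.h>

/* One cache line as a simulator might model it (hypothetical layout). */
struct line {
    bool valid;
    uint64_t tag;
};

/* Search all E lines of one set. Returns the index of the line that is
   valid AND has a matching tag (a hit), or -1 on a miss. */
static int find_line(const struct line *set, int E, uint64_t tag) {
    for (int i = 0; i < E; i++) {
        if (set[i].valid && set[i].tag == tag)
            return i;
    }
    return -1;
}
```

A return value of -1 is the "No match" case, where the replacement policy must pick a line to fill.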

LRU Replacement Policy
- Theoretically… [Figure: addresses 1 to 5 accessed in turn, with the contents of the set after each access]
- Practically… [Figure]
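One common practical way to implement LRU, sketched in C under the assumption that each line stores the value of a counter that advances on every cache access (the `last_used` field and `lru_victim` name are hypothetical). On a miss, an empty line is filled first; otherwise the line with the oldest timestamp is the least recently used one.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical line layout with an LRU timestamp. */
struct lru_line {
    bool valid;
    uint64_t tag;
    uint64_t last_used;   /* counter value at the line's most recent use */
};

/* Pick the line to fill on a miss: an invalid (empty) line if one
   exists, otherwise the valid line with the smallest last_used value. */
static int lru_victim(const struct lru_line *set, int E) {
    int victim = 0;
    for (int i = 0; i < E; i++) {
        if (!set[i].valid)
            return i;                       /* free line: no eviction */
        if (set[i].last_used < set[victim].last_used)
            victim = i;
    }
    return victim;
}
```

For this to behave as LRU, the simulator would store the current counter value in `last_used` on every hit and every fill. Keeping each set as an ordered list would select the same victims, with more bookkeeping per access.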

Performance
- Average access time:

  $$\text{Average Access Time} = \text{Hit Time} + \text{Miss Rate} \times \text{Miss Penalty} = \text{Hit Time} + (1 - \text{Hit Rate}) \times \text{Miss Penalty}$$

- Example: suppose the cache hit time is 1 cycle, the miss penalty is 100 cycles, and the hit rate is 97%. Then the average access time is:

  $$1 + (1 - 0.97) \times 100 = 1 + 0.03 \times 100 = 4 \text{ cycles}$$
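The formula translates directly into C. This sketch (hypothetical function name) derives the miss rate from hit and miss counts, as a simulator would at the end of a run.

```c
/* Average access time = hit time + miss rate x miss penalty, in cycles.
   The miss rate is misses / (hits + misses). */
static double average_access_time(double hit_time, double miss_penalty,
                                  unsigned long hits, unsigned long misses) {
    unsigned long total = hits + misses;
    if (total == 0)
        return 0.0;                 /* no accesses: nothing to average */
    double miss_rate = (double)misses / (double)total;
    return hit_time + miss_rate * miss_penalty;
}
```

For the example above, `average_access_time(1, 100, 97, 3)` evaluates to 4 cycles.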

Requirements of the cache simulator (1)
- The cache simulator (hereinafter referred to as CSIM) shall support an arbitrary number of sets, an arbitrary number of lines, and an arbitrary block size.
  - You should implement a way to provide the number of sets, the number of lines, and the block size as inputs to CSIM.
- CSIM shall read a trace file line by line and process it.
  - You should determine whether each memory operation is a cache hit or miss.
  - You should implement the LRU replacement policy.
- CSIM shall report the result of the cache simulation.
  - You should report these three basic results: the numbers of hits, misses, and evictions.
  - You should be able to report the average access time of the cache simulation.
  - You should be able to report whether each memory access in the trace file results in a cache hit or miss.
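A sketch of how the core requirements could fit together in C: a dynamically allocated cache of $2^s$ sets × E lines, hit/miss detection, LRU replacement via per-line timestamps, and hit/miss/eviction counters. All names are hypothetical; this is not the reference CSIM, and trace parsing and output are left out.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

struct cline {
    bool valid;
    uint64_t tag;
    uint64_t last_used;             /* LRU timestamp */
};

struct cache {
    int s, E, b;
    struct cline *lines;            /* S * E lines, stored set by set */
    uint64_t clock;                 /* advances on every access */
    unsigned long hits, misses, evictions;
};

/* Allocate an empty cache with 2^s sets of E lines; NULL on failure. */
static struct cache *cache_create(int s, int E, int b) {
    struct cache *c = calloc(1, sizeof *c);
    if (c == NULL)
        return NULL;
    c->s = s;
    c->E = E;
    c->b = b;
    c->lines = calloc((size_t)E << s, sizeof *c->lines);  /* all invalid */
    if (c->lines == NULL) {
        free(c);
        return NULL;
    }
    return c;
}

static void cache_free(struct cache *c) {
    if (c != NULL) {
        free(c->lines);
        free(c);
    }
}

/* Simulate one data access. Returns 'h' for a hit, 'm' for a miss that
   filled an empty line, 'e' for a miss that evicted the LRU line. */
static char cache_access(struct cache *c, uint64_t addr) {
    uint64_t set = (addr >> c->b) & ((UINT64_C(1) << c->s) - 1);
    uint64_t tag = addr >> (c->s + c->b);
    struct cline *lines = c->lines + set * (uint64_t)c->E;
    c->clock++;

    int victim = 0;
    for (int i = 0; i < c->E; i++) {
        if (lines[i].valid && lines[i].tag == tag) {
            lines[i].last_used = c->clock;      /* hit: refresh recency */
            c->hits++;
            return 'h';
        }
        /* Fill candidate: prefer an invalid line, else the oldest one. */
        if (lines[victim].valid &&
            (!lines[i].valid || lines[i].last_used < lines[victim].last_used))
            victim = i;
    }

    c->misses++;
    char result = 'm';
    if (lines[victim].valid) {
        c->evictions++;
        result = 'e';
    }
    lines[victim].valid = true;
    lines[victim].tag = tag;
    lines[victim].last_used = c->clock;
    return result;
}
```

A driver would call `cache_access` once per L or S record and twice per M record, print the returned code in verbose mode, and read the counters for the final summary.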

Restrictions & Advice
- Implement a method for passing input parameters.
  - Implement it with command-line argument passing (full credit).
  - If you can't, you may use standard input such as scanf() (lower credit).
- Evaluate only data cache performance.
  - Therefore, ignore instruction loads.
  - Assume that memory accesses are properly aligned; therefore, you can ignore the requested size in the trace file.
  - Evaluate your CSIM with at least 3 different traces. One of them may be the trace provided with this project.
- Calculate the average access time under the following assumption:
  - Hit time = 1 cycle, miss penalty = 100 cycles.
- Compile your CSIM without warnings.

How to trace memory accesses
- "valgrind"
  - A GPL-licensed programming tool for memory debugging, memory leak detection, and profiling (from http://en.wikipedia.org/wiki/Valgrind).
  - Usage: `>> valgrind --log-fd=1 --tool=lackey -v --trace-mem=yes ls -l`
    - Valgrind prints the memory accesses of "ls -l" on stdout, so capture them with: `>> valgrind --log-fd=1 --tool=lackey -v --trace-mem=yes ls -l > ls.trace`
  - Output format: `[space]operation address,size`

| Output | Type | Example | Accesses | [space] |
|---|---|---|---|---|
| `I 0400d7d4,8` | Instruction load | All instructions | 1 | X |
| `L 04f6b868,8` | Data load | `movl (%eax),%ebx` | 1 | O |
| `S 7ff0005c8,8` | Data store | `movl %eax,(%ebx)` | 1 | O |
| `M 0421c7f0,4` | Data modify | `incl (%ecx)` | 2 | O |
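Reading one line of this trace format could look like the following C sketch, which assumes the `[space]operation address,size` layout shown above and skips instruction loads as the restrictions require. The function name is hypothetical.

```c
#include <inttypes.h>
#include <stdio.h>

/* Parse one lackey trace line such as " L 04f6b868,8".
   Returns 1 for a data access (L, S or M) and fills op/addr/size;
   returns 0 for instruction loads ("I ...") and unparsable lines. */
static int parse_trace_line(const char *line, char *op,
                            uint64_t *addr, int *size) {
    char c;
    uint64_t a;
    int n;
    /* Leading spaces in the format skip the optional indentation. */
    if (sscanf(line, " %c %" SCNx64 ",%d", &c, &a, &n) != 3)
        return 0;
    if (c != 'L' && c != 'S' && c != 'M')
        return 0;                   /* 'I' = instruction load: ignored */
    *op = c;
    *addr = a;
    *size = n;                      /* may be ignored: accesses are aligned */
    return 1;
}
```

A CSIM main loop might call this on each line returned by `fgets`, then simulate one access for L or S and two for M.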
Reference Cache Simulator
- Usage: `>> ./csim [-v] -s <s> -E <E> -b <b> -t <trace file>`
  - `-v`: optional verbose flag that displays trace info
  - `-s <s>`: number of set index bits ($S = 2^s$ is the number of sets)
  - `-E <E>`: associativity (number of lines per set)
  - `-b <b>`: number of block bits ($B = 2^b$ is the block size)
  - `-t <trace file>`: name of the valgrind trace to replay

[Figure: $S = 2^s$ sets of E lines; each line has a valid bit v, a tag, and data bytes 0 to B−1 with $B = 2^b$; cache size $C = S \times E \times B$ data bytes]
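Argument passing for these flags could be sketched with POSIX `getopt()` from `<unistd.h>` (POSIX, not ISO C, so it may be unavailable on some platforms). The `struct csim_args` type and `parse_args` name are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L     /* expose getopt() in strict modes */
#include <stdbool.h>
#include <stdlib.h>
#include <unistd.h>                 /* getopt(), optarg, optind */

struct csim_args {
    bool verbose;
    int s, E, b;
    const char *trace;
};

/* Parse "[-v] -s <s> -E <E> -b <b> -t <trace file>".
   Returns 0 on success, -1 on an unknown flag or missing parameter. */
static int parse_args(int argc, char *argv[], struct csim_args *a) {
    int opt;
    *a = (struct csim_args){ false, -1, -1, -1, NULL };
    optind = 1;                     /* start scanning at argv[1] */
    while ((opt = getopt(argc, argv, "vs:E:b:t:")) != -1) {
        switch (opt) {
        case 'v': a->verbose = true;   break;
        case 's': a->s = atoi(optarg); break;
        case 'E': a->E = atoi(optarg); break;
        case 'b': a->b = atoi(optarg); break;
        case 't': a->trace = optarg;   break;
        default:  return -1;        /* getopt already printed a message */
        }
    }
    if (a->s < 0 || a->E < 1 || a->b < 0 || a->trace == NULL)
        return -1;
    return 0;
}
```

`atoi` does no error checking; a stricter version could use `strtol` and reject trailing characters.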
Cache Simulation Example (1)
- Usage: `>> ./csim [-v] -s <s> -E <E> -b <b> -t <trace file>`
- Example: `>> ./csim -v -s 4 -E 1 -b 4 -t ./traces/yi.trace`
  - Number of set index bits = 4 (16 sets)
  - Associativity = 1 (direct-mapped cache)
  - Number of block bits = 4 (16 bytes per cache block)
- Output:
  - `L 10,1 miss`
  - `M 20,1 miss hit`
  - …
  - `hits: 4 misses: 5 eviction: 3`

Cache Simulation Example (2)
- Example memory access pattern:

| Oper. | Address | Byte |
|---|---|---|
| Load | 0x10 | 1 |
| Modify | 0x20 | 1 |
| Load | 0x22 | 1 |
| Store | 0x18 | 1 |
| Load | 0x110 | 1 |
| Load | 0x210 | 1 |
| Modify | 0x12 | 1 |

- Initial cache state: all 16 sets (S = 0x0 to 0xF) are invalid (V = I). Each line holds a tag and data bytes 0x0 to 0xF.

Cache Simulation Example (3)
- Load 0x10 (set 1, tag 0x0): miss; set 1 becomes valid with tag 0x0.
- Hit 0, Miss 1, Evict 0

Cache Simulation Example (4)
- Modify 0x20 (set 2, tag 0x0): the load misses and the store hits; set 2 becomes valid with tag 0x0.
- Valid sets: 1 (tag 0x0), 2 (tag 0x0)
- Hit 1, Miss 2, Evict 0

Cache Simulation Example (5)
- Load 0x22 (set 2, tag 0x0): hit.
- Valid sets: 1 (tag 0x0), 2 (tag 0x0)
- Hit 2, Miss 2, Evict 0

Cache Simulation Example (6)
- Store 0x18 (set 1, tag 0x0): hit.
- Valid sets: 1 (tag 0x0), 2 (tag 0x0)
- Hit 3, Miss 2, Evict 0

Cache Simulation Example (7)
- Load 0x110 (set 1, tag 0x1): miss; the block with tag 0x0 is evicted from set 1.
- Valid sets: 1 (tag 0x1), 2 (tag 0x0)
- Hit 3, Miss 3, Evict 1

Cache Simulation Example (8)
- Load 0x210 (set 1, tag 0x2): miss; the block with tag 0x1 is evicted from set 1.
- Valid sets: 1 (tag 0x2), 2 (tag 0x0)
- Hit 3, Miss 4, Evict 2

Cache Simulation Example (9)
- Modify 0x12 (set 1, tag 0x0): the load misses and evicts the block with tag 0x2; the store then hits.
- Valid sets: 1 (tag 0x0), 2 (tag 0x0)
- Hit 4, Miss 5, Evict 3
- Average Access Time = $1 + (5/9) \times 100 \approx 56.56$ cycles (9 accesses in total, since each Modify counts as two)
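Because this example is direct mapped (E = 1), each set holds a single line, so every miss on a valid line is an eviction and no LRU state is needed. The C sketch below replays the walkthrough under that assumption, with the example's s = 4 and b = 4 fixed at compile time. It uses static arrays for brevity, whereas the grading criteria favor dynamic allocation for the real CSIM. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define EX_S 4                      /* 16 sets, as in the example */
#define EX_B 4                      /* 16-byte blocks */

/* Direct-mapped cache: one line (valid bit + tag) per set. */
struct dm_cache {
    bool valid[1 << EX_S];
    uint64_t tag[1 << EX_S];
    unsigned long hits, misses, evictions;
};

static void dm_access(struct dm_cache *c, uint64_t addr) {
    uint64_t set = (addr >> EX_B) & ((1u << EX_S) - 1);
    uint64_t tag = addr >> (EX_S + EX_B);
    if (c->valid[set] && c->tag[set] == tag) {
        c->hits++;
        return;
    }
    c->misses++;
    if (c->valid[set])
        c->evictions++;             /* the set's only line is replaced */
    c->valid[set] = true;
    c->tag[set] = tag;
}

/* L and S are one access each; M (load then store to the same address)
   is two, which is why the example has 9 accesses for 7 operations. */
static void dm_replay(struct dm_cache *c, char op, uint64_t addr) {
    dm_access(c, addr);
    if (op == 'M')
        dm_access(c, addr);
}
```

Replaying the seven operations in order leaves the counters at 4 hits, 5 misses, and 3 evictions, matching the slides.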


Report Writing Guidelines (2)
- Design, Testing: include the following
  - Performance evaluation: measure the performance of each cache organization (direct mapped, E-way set associative, and fully associative) and compare them.
  - Coding

Grading Criteria
- CSIM (70 pts)
  - Submission (10 pts): −0.5 pt per warning, −1 pt per error
  - Parameter input (10 pts): argument passing 10 pts; other methods 5 pts
  - Cache organization (5 pts): using dynamic allocation 5 pts; using arrays 2 pts
  - Cache operation (20 pts):
    - Correct handling of hits and misses: 10 pts
    - Replacement policy (LRU): 5 pts; implementing random replacement: 3 pts
    - Displaying the result (hit/miss) of each memory access: 4 pts; providing an option to choose whether the results are displayed: 1 pt
  - Performance evaluation (5 pts): correct average access time
  - Comments (10 pts): comment as many lines as possible
- Report (30 pts): design/requirements 7, implementation 7, testing 8, performance evaluation 8
- Late submission: −5 pts per day; submissions are accepted up to one week after the deadline.
