Cache Example: FastMATH Processor
B. Ramamurthy
Processor Characteristics
- 12-stage pipeline
- At peak speed, the processor can request both an instruction and a data word on every clock cycle.
- We cannot afford pipeline stalls. Solution: add a cache.
- Cache is 16 KB, with 16-word blocks.
Cache operation
- Send the address to the appropriate cache. The address comes from either the PC (for an instruction) or the ALU (for data).
- If the cache signals a hit, the requested word is available on the data lines.
  - Since there are 16 words in the desired block, we need to select the right word.
  - A block index field is used to select the requested word from the 16 words in the indexed block.
- If the cache signals a miss, we send the address to main memory, get the data from main memory, and fill the cache. The data is then read again from the cache.
- Let's look at the schematic of the organization: Fig. 7.9.
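The read steps above can be sketched in a few lines of Python. This is only an illustrative model, not the processor's actual design: it assumes a direct-mapped cache, 32-bit byte addresses, and 4-byte words, none of which the slide states explicitly. The names `split_address` and `DirectMappedCache`, and the list used as main memory, are hypothetical.

```python
# Illustrative model of the read path. Assumptions (not from the slides):
# direct-mapped cache, byte addresses, 4-byte words.
CACHE_BYTES = 16 * 1024          # 16 KB cache
BLOCK_WORDS = 16                 # 16-word blocks
WORD_BYTES = 4
NUM_BLOCKS = CACHE_BYTES // (BLOCK_WORDS * WORD_BYTES)   # 256 blocks

def split_address(addr):
    """Split a byte address into (tag, index, word-within-block)."""
    word = (addr >> 2) & 0xF      # 4 bits select 1 of 16 words
    index = (addr >> 6) & 0xFF    # 8 bits select 1 of 256 blocks
    tag = addr >> 14              # remaining upper bits
    return tag, index, word

class DirectMappedCache:
    def __init__(self, memory):
        self.memory = memory      # hypothetical main memory: a list of words
        self.valid = [False] * NUM_BLOCKS
        self.tags = [0] * NUM_BLOCKS
        self.data = [[0] * BLOCK_WORDS for _ in range(NUM_BLOCKS)]
        self.hits = 0
        self.misses = 0

    def read(self, addr):
        tag, index, word = split_address(addr)
        if self.valid[index] and self.tags[index] == tag:
            self.hits += 1
        else:
            # Miss: fetch the whole block from main memory and fill the cache.
            self.misses += 1
            first_word = (addr >> 6) << 4   # word address of the block start
            self.data[index] = self.memory[first_word:first_word + BLOCK_WORDS]
            self.tags[index] = tag
            self.valid[index] = True
        # The word field selects the requested word from the indexed block.
        return self.data[index][word]

cache = DirectMappedCache(list(range(8192)))   # 32 KB of dummy memory
cache.read(0x40)    # miss: block is filled from memory
cache.read(0x44)    # hit: same block, next word
```

Under these assumptions, addresses 16 KB apart map to the same block, so reading `0x4040` after `0x40` evicts the earlier block.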
Memory system to support cache
- [Diagram: CPU, Cache, Main Memory]
- What is the bus width?
- How should the main memory be organized?
Cache miss dynamics
Assume that on a cache miss we need:
- 1 memory cycle to send the address to main memory
- 15 memory cycles to read each DRAM memory word (assume the bus width is 32 bits = 4 bytes)
- 1 memory cycle to send each word of data back
Total for a block access (this example uses a 4-word block):
- 1 + 4 × 15 + 4 × 1 = 1 + 60 + 4 = 65 cycles
- Bytes received = 1 cache block = 4 words × 4 bytes = 16 bytes
- Bytes/cycle = 16/65 ≈ 0.25 (too low for our fast processor!)
What is your solution? We need better bandwidth.
- Increase the bus width? Interleave memory? Use a wide memory organization?
- See Fig. 7-11.
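The miss-penalty arithmetic for the one-word-wide memory can be checked with a short calculation. The function and parameter names below are hypothetical. The cycle counts are the ones assumed on this slide, and the 4-word block is implied by the slide's 16 bytes per block.

```python
def miss_penalty_one_word_wide(words_per_block=4, addr_cycles=1,
                               dram_cycles=15, transfer_cycles=1):
    """Cycles to fetch one block when memory and bus are one word wide."""
    # One address send, then each word needs its own DRAM access and transfer.
    return (addr_cycles
            + words_per_block * dram_cycles
            + words_per_block * transfer_cycles)

def bytes_per_cycle(words_per_block, penalty_cycles, word_bytes=4):
    """Bandwidth delivered during a miss, in bytes per memory cycle."""
    return words_per_block * word_bytes / penalty_cycles

penalty = miss_penalty_one_word_wide()
print(penalty)                                  # 65
print(round(bytes_per_cycle(4, penalty), 2))    # 0.25
```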
Bus and Memory Organization
- Increase memory width: double it (2 words wide)
  - 1 + 2 × 15 + 2 × 1 = 1 + 30 + 2 = 33 cycles
  - 16/33 ≈ 0.48 bytes/cycle
- Memory interleaving (4 banks):
  - 1 + 15 + 4 × 1 = 20 cycles
  - 16/20 = 4/5 = 0.8 bytes/cycle
- Miss penalty: 65 cycles → 33 cycles → 20 cycles (not bad at all)
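The two alternatives can be compared with the same kind of calculation. This is a simplified sketch with hypothetical function names: it assumes a 4-word block, the cycle counts used in these slides, and no overlap between DRAM access and bus transfer beyond what the slide formulas already express.

```python
import math

def wide_memory_penalty(words_per_block=4, width_words=2,
                        addr_cycles=1, dram_cycles=15, transfer_cycles=1):
    """Memory and bus are width_words wide: fewer accesses and transfers."""
    accesses = math.ceil(words_per_block / width_words)
    return addr_cycles + accesses * dram_cycles + accesses * transfer_cycles

def interleaved_penalty(words_per_block=4, banks=4,
                        addr_cycles=1, dram_cycles=15, transfer_cycles=1):
    """Banks are read in parallel; words still cross the bus one at a time."""
    rounds = math.ceil(words_per_block / banks)
    return (addr_cycles
            + rounds * dram_cycles
            + words_per_block * transfer_cycles)

BLOCK_BYTES = 4 * 4   # 4 words of 4 bytes each

for name, cycles in [("2-word-wide memory", wide_memory_penalty()),
                     ("4-bank interleaving", interleaved_penalty())]:
    print(name, cycles, round(BLOCK_BYTES / cycles, 2))
# 2-word-wide memory 33 0.48
# 4-bank interleaving 20 0.8
```

Setting `width_words=1` in `wide_memory_penalty` reproduces the 65-cycle baseline, which makes it easy to see where each organization saves cycles.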