CSC 405 Computer Organization Cache Memory Performance Analysis

The Principle of Locality

For the typical program in execution, the principle of locality states that memory references tend to cluster in both position (spatial locality) and time (temporal locality). With high probability, the next reference to a word in memory will be close to the previous reference. Also, words of memory that have been used recently are more likely to be used again.

The figure illustrates the pattern of storage references for a typical program during execution. The horizontal axis is time and the vertical axis is memory address (page #). Notice that in any specific time interval, and for a significant duration, the memory locations being accessed are not random and constitute a relatively small fraction of the complete program. These patterns of memory reference show the working sets of the program. Figure Ref: IBM Systems Journal, 1971.
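The working-set idea can be made concrete with a small sketch. The function name, the window length, and the page trace below are all hypothetical, chosen only to mimic a program that loops over one group of pages and then moves on to another.

```python
from typing import List


def working_set_sizes(trace: List[int], window: int) -> List[int]:
    """Count the distinct pages touched in each consecutive window of references."""
    return [len(set(trace[i:i + window])) for i in range(0, len(trace), window)]


# Hypothetical page-number trace: a loop over pages 0-3, then a loop over pages 10-12.
trace = [0, 1, 2, 3] * 5 + [10, 11, 12] * 5

sizes = working_set_sizes(trace, window=5)
total_pages = len(set(trace))

print("pages touched per window:", sizes)
print("pages touched overall:   ", total_pages)
```

In any one window the program touches fewer pages than it touches overall, which is the property a cache relies on.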

Two-Level Memory

The locality property can be exploited to improve system performance. Specifically, locality makes the effective use of a hierarchical memory system possible. The time required for the CPU to obtain a word from secondary storage (e.g. a hard drive) is around 1000 times longer than when the word is in primary memory (RAM). The CPU can access a word in cache memory around 5 to 10 times faster than a word in primary memory.

These relative speeds, called system access times, are driven primarily by bus clock rates and memory block transfer sizes. System access time is not the access time of the memory itself as quoted by chip manufacturers. The quoted memory access time for a DIMM or SIMM is the time required to move a word in memory into the memory buffer register on the DIMM or SIMM itself.

Diagram: CPU --(500 MHz)-- Cache --(100 MHz)-- Primary Memory --(1 MByte/s)-- Secondary Storage

Performance Analysis of a Two-Level Memory

We will analyze the performance of a two-level memory consisting of cache memory M1 and primary memory M2. To express the average time to access a word, we must consider the speeds (CPU access times) of the two memories as well as the probability that a given reference will be found in level-one memory (M1). The quantities involved are:

  TS = average (system) access time
  T1 = CPU access time of M1
  T2 = CPU access time of M2
  H  = hit ratio (fraction of the time a reference is found in M1)

We must be careful in our interpretation of these terms. For example, since a miss in cache results in a block of primary memory being copied into cache, we must use the time required to transfer this block as T2. We must also consider the possibility that the block being overwritten in cache must first be copied back into primary memory.
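A minimal sketch of how TS can be computed from T1, T2 and H. It assumes one common textbook form, TS = H*T1 + (1 - H)*(T1 + T2), in which a miss costs the cache check plus the access to M2. A simpler variant, H*T1 + (1 - H)*T2, is also in use, so the choice here is an assumption rather than the slide's own equation. The function name is hypothetical.

```python
def average_access_time(t1: float, t2: float, hit_ratio: float) -> float:
    """Average access time of a two-level memory.

    Assumed form: TS = H*T1 + (1 - H)*(T1 + T2), i.e. every reference
    checks M1 first and a miss additionally pays for the M2 access.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must lie between 0 and 1")
    return hit_ratio * t1 + (1.0 - hit_ratio) * (t1 + t2)


# T1 = 2 ns, T2 = 10 ns, H = 0.90 (illustrative values)
ts = average_access_time(2.0, 10.0, 0.90)
print(f"TS = {ts:.2f} ns")
```

With H = 1 the result collapses to T1, and with H = 0 it becomes T1 + T2, which is a quick sanity check on the assumed form.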

Let's work through an example:

Compare the average system access time for a computer with a 450 MHz processor, a 100 MHz system bus and no cache to that of a system with the same bus and processor speeds but with an L1 cache (i.e. a 450 MHz internal bus to the CPU) that results in a hit ratio of 0.90. Assume 2 ns memory for the cache and 8 ns memory for RAM.

For the two-level memory system we have

  T1 = 1/(450 MHz) ≈ 2 ns
  T2 = 1/(100 MHz) = 10 ns
  H  = 0.90

so TS ≈ 3 ns, compared with TS = 10 ns for the system with no cache. This is over 3 times faster! In reality the improvement is not nearly so dramatic.

Homework: Give an explanation for what is wrong with this analysis. (Hint: What operations are performed when there is a miss?)
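The sketch below reruns the example's arithmetic and then varies how many words are moved on a miss. It is only an illustration under stated assumptions: the form TS = H*T1 + (1 - H)*(T1 + T2), one word transferred per bus cycle, no write-back of the replaced block, and block sizes of 1, 4 and 8 words that are hypothetical and not taken from the slides.

```python
def cycle_ns(clock_mhz: float) -> float:
    """Length of one clock cycle in nanoseconds."""
    return 1000.0 / clock_mhz


def average_access_ns(t1: float, t2: float, hit_ratio: float) -> float:
    # Assumed form: a miss costs the cache check plus the M2 access.
    return hit_ratio * t1 + (1.0 - hit_ratio) * (t1 + t2)


T1 = cycle_ns(450.0)   # cache cycle time, about 2.2 ns
BUS = cycle_ns(100.0)  # one system-bus cycle, 10 ns
H = 0.90


def ts_for_block(block_words: int) -> float:
    """TS when a miss transfers block_words words, one per bus cycle (assumption)."""
    return average_access_ns(T1, block_words * BUS, H)


for words in (1, 4, 8):
    print(f"{words} word(s) per miss: TS = {ts_for_block(words):.2f} ns")
```

Under these assumptions TS grows with the number of words moved per miss, and at eight words it is already close to the 10 ns of the uncached system.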

Case Study: The Truth about Cache

You have a Pentium-class computer with the 430TX cache controller chipset and 64 MBytes of SIMM RAM, and you are running the Windows 98 operating system. You find a bargain on memory SIMMs that match your current memory, so you "upgrade" your system to 128 MBytes of RAM. The result is a factor-of-2 slowdown in average performance. Suspecting the bargain memory as the problem, you remove your old memory and run with just the new bargain memory. You find that your old, faster performance returns. You put half of your old memory back in, bringing your system up to 96 MBytes, and find that your computer now runs somewhat slower (about halfway between the previous two speeds).

Please explain this apparent paradox. (Claiming that the instructor has lost his mind, regardless of the validity of the statement, is not relevant in this case.)

Homework

A 450 MHz Pentium with 32 KBytes of L1 cache, 128 MBytes of RAM, and a 133 MHz system bus runs a program with an average working set size of 80 KBytes. While in a working set, the program has a 0.9997 probability that the next memory request will be from this working set and a 0.9 probability that the next memory request will be the next instruction/data value in memory (i.e. 10% of the time a request is to a random memory address in the working set). (Note: when the program changes working sets, it will begin making memory requests from the new working set with 0.9997 probability.)

Your task:

  (1) Determine how much (if any) performance improvement could be achieved by adding a 256 KByte L2 cache (access speed = 450/2 MHz) to the processor.
  (2) Determine what size memory blocks should be moved between cache and RAM.
  (3) Give an outline of a memory caching strategy that makes sense.
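One possible starting point for experimenting with this access model is a synthetic reference trace. The sketch below is a simplification and not a solution: it models a single working set only (the 0.9997 working-set switch is left out), assumes 4-byte words so that 80 KBytes is 20480 words, and wraps around at the end of the working set. The function name, the seed, and these choices are all hypothetical.

```python
import random
from typing import List

WORD_BYTES = 4                                    # assumed word size
WORKING_SET_WORDS = 80 * 1024 // WORD_BYTES       # 80 KBytes -> 20480 words


def generate_trace(n: int, working_set_words: int = WORKING_SET_WORDS,
                   p_sequential: float = 0.9, seed: int = 0) -> List[int]:
    """Word addresses within one working set.

    With probability p_sequential the next reference is the next word in
    memory; otherwise it is a uniformly random word in the working set.
    """
    rng = random.Random(seed)
    addr = rng.randrange(working_set_words)
    trace = [addr]
    for _ in range(n - 1):
        if rng.random() < p_sequential:
            addr = (addr + 1) % working_set_words  # sequential step (wraps around)
        else:
            addr = rng.randrange(working_set_words)  # random jump
        trace.append(addr)
    return trace


trace = generate_trace(10_000)
sequential_steps = sum(
    1 for prev, cur in zip(trace, trace[1:])
    if cur == (prev + 1) % WORKING_SET_WORDS
)
print(f"fraction of sequential references: {sequential_steps / (len(trace) - 1):.3f}")
```

A trace like this could be fed to a simple cache model to compare block sizes, which is left to the reader as part of the tasks above.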