Problem 1 [7 points]: - University of Illinois at Urbana

 CS433: Computer Systems Organization Spring 2015 Homework 4 Assigned: Mar 12 Due in class: Apr 07 Total points: 42 for undergraduate students, 54 for graduate students. Instructions: Please write your name, NetID and an alias on your homework submissions for posting grades (If you don’t want your grades posted, then don’t write an alias). Homework assignments are due in class on the date posted. Problem 1 [7 points]: Consider a 16MB 4­way set­associative write­back cache with 32 byte line size and a 32 bit, byte addressable address. Assume a random replacement policy and a single core system. Part A [2 points]:​ Which ​bits​ of the address are used for the cache index? Part B [2 points]:​ Which ​bits​ of the address are used for the cache tag? Part C [3 points]: ​How many bits of total storage does this cache need besides the 16MB for data? Remember to include any state bits needed. Problem 2 [15 Points] A 4 entry victim cache for a 4KB direct mapped cache removes 80% of the ​conflict misses ​in a program. Without the victim cache, the miss rate is 0.064 and 67% of these misses are conflict misses. What is the ​
speedup ​in the AMAT (average memory access time) due to the victim cache? Assume a hit in the main cache takes 1 cycle. For a miss in the main cache that hits in the victim cache, assume an additional penalty of 1 cycle to access the victim cache. For a miss in the main and victim caches, assume a further penalty of 48 cycles to get the data from memory​. ​Assume a simple, single­issue, 5­stage pipeline, in­order processor that blocks on every read and write until it completes. University of Illinois at Urbana Champaign
Computer Science Department
CS433: Computer Systems Organization Spring 2015 Homework 4 Assigned: Mar 12 Due in class: Apr 07 Problem 3 ­ [15 points] Given two systems as follows, using the 5 stage pipeline. System 1 ● 16KB instructions cache ○ Miss rate: 0.64% ● 16KB data cache ○ Miss rate:6.47% ● Hit time for both: 1 cycle ● Miss penalty for both: 50 cycles. System 2 ● 32KB unified cache ○ Miss rate: 1.99% ● Hit time for both: 1 cycle ● Miss penalty for both: 50 cycles. Assume that in the unified cache, instructions which perform data transfer take an extra cycle. Assume that 75% of the memory access time is for instructions. Part A[3 points] ​­ Explain why we assume that in the unified cache instructions which perform data transfer take an extra cycle. Part B[5 points] ­ ​If you had to choose one of these systems based on the miss rate, which one would you choose? Part C[5 points] ­ ​If you had to choose one of these systems based on the AMAT, which one would you choose? Part D[2 points] ­ ​Which do you consider a better metric for choosing the system, the one in B or the one in C? University of Illinois at Urbana Champaign
Computer Science Department
CS433: Computer Systems Organization Spring 2015 Homework 4 Assigned: Mar 12 Due in class: Apr 07 Problem 4 [5 points] Suppose a processor with the following features: Clock Frequency 2GHz CPI 3 cycles Memory reference per instruction 1.5 Main memory access penalty 125 ns L1 miss rate 7% Assume a unified cache but with ​no extra cycle​ for simultaneous data and instruction access Part A[1 point] :​ Calculate the CPI of the processor Part B[2 points]:​ A L2 cache is added, with access time of 10 cycles and a miss rate of 45%. Recalculate the CPI for this case. Part C[2 points]:​ A L3 cache is added to the machine of Part B, with access time of 30 cycles and miss rate of 40%. Recalculate the CPI for this machine. University of Illinois at Urbana Champaign
Computer Science Department
CS433: Computer Systems Organization Spring 2015 Homework 4 Assigned: Mar 12 Due in class: Apr 07 Only graduate students should solve next problem Problem 5 [12 points] Consider a computer with an in­order CPU, and with a data cache block size of 64 bytes (16 words) and a 32­bit wide bus to the memory. The memory takes 10 cycles to supply the first word and 2 cycles per word to supply the rest of the block. The cache is nonblocking, and it can support any number of outstanding misses. The memory can service multiple requests simultaneously if required. This cache and memory system implement a “requested Word First and Early Restart” policy, and the bus delivers the block data in “cyclic order” starting with the requested word. Cyclic order means that if the requested word is the 5th in a block of size 16 words, then the order in which the words in the block are supplied is 5, 6, 7 ... 16, 1, 2, 3, 4. Part A [5 points] ​Consider the following code fragment, which operates on an integer array A which is block­aligned (that is A[0] is located at the starting of a cache block in memory): 1
2
3
for (i=11; i<100; i+=16) { A[i] *= 2; } Suppose that the cache is big enough so that there are only compulsory misses. Further, statement in line 1 takes 4 cycles to execute, and statement in line 2 takes 4 cycles to execute in addition to any miss latency. Assume no overlap in the execution of these statements. Initially, the array A is not present in the cache, so any initial accesses to A cause misses in the cache.
What is the running time of this loop with the “​Requested Word First and Early restart​” policy? Part B [5 points] ​How many cycles would the above loop take to run in a system with just “Early Restart” (i.e. the block is fetched in normal order, but the program is started early at arrival of requested word). Part C [2 points]​ How many cycles would the above loop take to run in a system with the base policy (i.e. normal fetch and restart)? University of Illinois at Urbana Champaign
Computer Science Department