ECE232 Lab Project 3 Sample Report
MONITORING CACHE BEHAVIORS USING SPIM-CACHE

1. Your assembly code program, with each line of the program commented

Please find the sample assembly code on the last page.

2. The symmetric check result. Please show a screenshot of the first two rows of resultant values in the data memory. Also please show the counter value.

The counter value is 0x64 (100 decimal).

3. Record the data cache miss rates in the following table for 16 data cache configurations.

Config  Cache Size  Block Size  Mapping         Writing Policy          Replacement  Miss Rate
1       256 B       8 B         Direct Mapping  Write through-Allocate  n/a          41.7%
2       256 B       8 B         2-way Set       Write through-Allocate  LRU          24.7%
3       256 B       8 B         4-way Set       Write through-Allocate  LRU          31.5%
4       256 B       8 B         Fully Assoc.    Write through-Allocate  LRU          24.8%
5       512 B       8 B         Direct Mapping  Write through-Allocate  n/a          30.2%
6       512 B       8 B         2-way Set       Write through-Allocate  LRU          24.4%
7       512 B       8 B         4-way Set       Write through-Allocate  LRU          24.4%
8       512 B       8 B         Fully Assoc.    Write through-Allocate  LRU          24.7%
9       128 B       16 B        4-way Set       Write through-Allocate  LRU          28.4%
10      256 B       16 B        4-way Set       Write through-Allocate  LRU          25.2%
11      512 B       16 B        4-way Set       Write through-Allocate  LRU          12.9%
12      1024 B      16 B        4-way Set       Write through-Allocate  LRU          11.9%
13      1024 B      4 B         Direct Mapping  Write through-Allocate  n/a          47.6%
14      1024 B      8 B         Direct Mapping  Write through-Allocate  n/a          24.0%
15      1024 B      16 B        Direct Mapping  Write through-Allocate  n/a          12.6%
16      512 B       16 B        4-way Set       Write back-Allocate     LRU          12.9%

Miss rate comparison of Configuration 1 and 2

Configuration 2 has the lower miss rate because the extra way provides an alternative block for holding data when conflicts happen. For the data access pattern in this lab, this avoids evicting useful data (data that will be used in the near future) from the cache. Although data blocks are more likely to map into the same set in a two-way set-associative cache, the alternative way avoids replacing useful data, so the miss rate is lower.

Miss rate comparison of Configuration 2 and 3

Configuration 2 has the lower miss rate. On the one hand, increasing the number of ways reduces the number of sets, so data blocks are more likely to map into the same set (more conflict misses). On the other hand, with more ways it is less likely that a data block is replaced by another block.
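The sets-versus-ways tradeoff can be made concrete with a short sketch (Python, used here purely for illustration; the function names are ours, and the parameters match the lab's 256 B cache with 8 B blocks):

```python
def num_sets(cache_size, block_size, ways):
    # A set holds `ways` blocks, so doubling the associativity
    # halves the number of sets for a fixed cache and block size.
    return cache_size // (block_size * ways)

def set_index(addr, block_size, sets):
    # The block address modulo the number of sets selects the set.
    return (addr // block_size) % sets

sets_direct = num_sets(256, 8, 1)   # 32 sets of one block each
sets_2way   = num_sets(256, 8, 2)   # 16 sets of two blocks each
sets_4way   = num_sets(256, 8, 4)   # 8 sets of four blocks each

# Two addresses 256 bytes apart land in the same set of the
# direct-mapped cache and evict each other on every conflict,
# while a 2-way set can hold both blocks at once.
assert set_index(0, 8, sets_direct) == set_index(256, 8, sets_direct)
```

With one way the two blocks evict each other on alternating accesses; with two ways both stay resident, which is the effect behind configuration 2's lower miss rate.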
This is a tradeoff between the number of cache sets and the number of cache ways. In our case, although the 4-way set-associative cache has more blocks available per set, we observe more conflict and capacity misses in configuration 3, so configuration 2 gives the better performance in terms of miss rate. For larger data sets, increasing the associativity generally lowers the miss rate; here we only have a 17x17 matrix.

Miss rate comparison of Configuration 6 and 7

These two configurations report almost the same miss rate. By the same argument, there is a tradeoff between the number of sets and the number of ways; with the larger cache, conflict and capacity misses are reduced to such a low level that the two configurations make no measurable difference. The miss rate depends not only on the cache configuration but also on the data access pattern, so extensive benchmark tests are needed to find the optimal cache configuration for a specific application.

Comparison of Configuration 9, 10, 11 and 12

The miss rate decreases as the cache size increases, and configuration 12 has the lowest miss rate of the four. A bigger cache has more blocks (and sets) available to store data, so conflict and capacity misses are reduced and accesses are more likely to hit. With the other parameters fixed, a larger cache normally gives a lower miss rate.

Comparison of Configuration 13, 14 and 15

Configuration 15 has the lowest miss rate. A larger block size better exploits spatial locality in this case, since the matrix elements are accessed contiguously. In a real implementation, however, the block size is limited by the bandwidth of the system bus and the burst access size of the memory, and a larger block increases the miss penalty because fetching the whole block takes longer.
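The block-size effect in configurations 13 through 15 can be sketched with a toy direct-mapped simulator (hypothetical Python illustration, not part of the lab deliverables; it models only a single sequential word-by-word sweep from address 0):

```python
def misses_sequential(cache_size, block_size, n_words):
    # Toy direct-mapped cache: one tag per set, 4-byte words,
    # counting misses over one sequential pass through memory.
    sets = cache_size // block_size
    tags = [None] * sets
    misses = 0
    for i in range(n_words):
        block = (4 * i) // block_size   # block number of this word
        idx = block % sets              # direct-mapped set index
        if tags[idx] != block:          # miss: fetch the whole block
            misses += 1
            tags[idx] = block
    return misses

words = 17 * 17  # one pass over the lab's matrix elements
# Larger blocks bring in neighbouring words, so a 16 B block
# services four consecutive word accesses per miss.
small = misses_sequential(1024, 4, words)    # every access misses
large = misses_sequential(1024, 16, words)   # roughly one miss in four
```

The real program's rates differ because it also touches the mirror element and stores results back, but the trend matches the table: the 16 B blocks of configuration 15 cut the direct-mapped miss rate far below configuration 13's.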
A larger block size can also hurt performance when conflicts happen, because each eviction removes more potentially useful data from the cache.

Sample code:

.data
Matrix: .word 41,45,5,34,8,15,16,23,44,48,12,32,18,47,22,8,22
        …….. the given matrix…..
        .word 22,3,14,7,46,40,4,7,46,3,19,27,16,16,25,33,41

.text
.globl __start                  # main program starts in the next line
__start:
        addi $s0,$zero,0        # outer loop counter
        addi $s1,$zero,0        # inner loop counter
        la   $t0,Matrix         # Matrix base address
        addi $t6,$zero,17       # number of columns
        addi $t7,$zero,17       # number of rows
        addi $s6,$zero,0        # initialize counter to zero
        sll  $t1,$t6,2          # number of bytes in a row
ext_loop:
        addi $s1,$s0,0          # reset the inner loop counter to the row index
inn_loop:
        mult $s0,$t1            # row index times row size in bytes
        mflo $t2                # low word of the product = row byte offset
        sll  $t3,$s1,2          # column byte offset
        add  $t4,$t0,$t2        # add row offset to the base address
        add  $s2,$t4,$t3        # add column offset: address of M[i][j]
        lw   $s3,0($s2)         # load this element
        mult $s1,$t1            # repeat the offset calculation
        mflo $t2                #   for the mirror element
        sll  $t3,$s0,2
        add  $t4,$t0,$t2
        add  $s4,$t4,$t3        # address of M[j][i]
        lw   $s5,0($s4)         # load the mirror element
        beq  $s3,$s5,jump_equal # branch if equal, to store zeros
        addi $s7,$zero,1        # write ones if different
        sw   $s7,0($s2)         # store the result back into the matrix
        sw   $s7,0($s4)         # store the result back into the mirror position
        addi $s6,$s6,1          # increment the counter (addi, not add, with an immediate)
        j    jump_unequal       # skip the if-equal part
jump_equal:
        addi $s7,$zero,0        # write zeros if equal
        sw   $s7,0($s2)         # store the result back
        sw   $s7,0($s4)         # store the result back
jump_unequal:
        addi $s1,$s1,1          # increment the inner loop counter
        blt  $s1,$t7,inn_loop   # next inner loop iteration
        addi $s0,$s0,1          # increment the outer loop counter
        blt  $s0,$t6,ext_loop   # next outer loop iteration
        li   $v0,10             # exit system call so the simulator terminates cleanly
        syscall
.end
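For reference, the assembly's symmetry check can be rendered in Python (a hypothetical sketch of the same logic, not part of the submitted code). It scans each row starting from the diagonal, writes 1 into both mirror positions when the elements differ and 0 when they match, and counts the mismatched pairs, which is where the reported counter of 0x64 (100 decimal) comes from:

```python
def symmetry_check(m):
    # Mirrors the assembly: the inner loop index starts at the
    # outer index, so each (i, j) pair is visited exactly once and
    # the diagonal always compares equal (and is written as 0).
    n = len(m)
    count = 0
    for i in range(n):
        for j in range(i, n):
            if m[i][j] == m[j][i]:
                m[i][j] = m[j][i] = 0
            else:
                m[i][j] = m[j][i] = 1   # asymmetric pair
                count += 1              # one increment per pair
    return count
```

On the lab's 17x17 matrix this in-place check produces the 0/1 result matrix shown in the data-memory screenshot for part 2.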
© Copyright 2024