ECE232 Lab Project 3 Sample Report

MONITORING CACHE BEHAVIORS USING SPIM-CACHE
1. Your assembly code program, with each line of the program commented
Please find the sample assembly code on the last page.
2. The symmetric check result. Please show a screenshot of the first two rows of
resultant values in the data memory. Also, please show the counter value.
The counter value is 0x64 (decimal 100).
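The counter counts the element pairs for which M[i][j] differs from M[j][i]; 0x64 means 100 such pairs. As a cross-check, the same symmetric test the assembly performs can be sketched in Python (the 3x3 matrix below is an illustrative stand-in, not the lab's 17x17 data):

```python
def symmetric_check(m):
    """Replace m[i][j] and m[j][i] with 0 if they match and 1 if they
    differ, and return the number of differing (asymmetric) pairs."""
    n = len(m)
    count = 0
    for i in range(n):
        for j in range(i, n):          # inner loop starts at the diagonal
            if m[i][j] == m[j][i]:
                m[i][j] = m[j][i] = 0
            else:
                m[i][j] = m[j][i] = 1
                count += 1
    return count

m = [[1, 2, 3],
     [2, 1, 4],
     [9, 4, 1]]
print(symmetric_check(m))   # → 1 (only the 3/9 pair is asymmetric)
```

Like the assembly, each off-diagonal pair is visited exactly once, so the counter increments once per asymmetric pair.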
3. Record the data cache miss rates in the following table for 16 data cache
configurations.
Config | Cache Size | Block Size | Mapping        | Writing Policy         | Replacement Algorithm | Miss Rate
-------|------------|------------|----------------|------------------------|-----------------------|----------
1      | 256 B      | 8 B        | Direct Mapping | Write through-Allocate | LRU                   | 41.7%
2      | 256 B      | 8 B        | 2-way Set      | Write through-Allocate | LRU                   | 24.7%
3      | 256 B      | 8 B        | 4-way Set      | Write through-Allocate | LRU                   | 31.5%
4      | 256 B      | 8 B        | Fully Assoc.   | Write through-Allocate | LRU                   | 24.8%
5      | 512 B      | 8 B        | Direct Mapping | Write through-Allocate | LRU                   | 30.2%
6      | 512 B      | 8 B        | 2-way Set      | Write through-Allocate | LRU                   | 24.4%
7      | 512 B      | 8 B        | 4-way Set      | Write through-Allocate | LRU                   | 24.4%
8      | 512 B      | 8 B        | Fully Assoc.   | Write through-Allocate | LRU                   | 24.7%
9      | 128 B      | 16 B       | 4-way Set      | Write through-Allocate | LRU                   | 28.4%
10     | 256 B      | 16 B       | 4-way Set      | Write through-Allocate | LRU                   | 25.2%
11     | 512 B      | 16 B       | 4-way Set      | Write through-Allocate | LRU                   | 12.9%
12     | 1024 B     | 16 B       | 4-way Set      | Write through-Allocate | LRU                   | 11.9%
13     | 1024 B     | 4 B        | Direct Mapping | Write through-Allocate | LRU                   | 47.6%
14     | 1024 B     | 8 B        | Direct Mapping | Write through-Allocate | LRU                   | 24.0%
15     | 1024 B     | 16 B       | Direct Mapping | Write through-Allocate | LRU                   | 12.6%
16     | 512 B      | 16 B       | 4-way Set      | Write Back-Allocate    | LRU                   | 12.9%
Miss rate comparison of Configurations 1 and 2
Configuration 2 has the lower miss rate because the extra way provides an
alternative block for holding data when conflicts happen. For the data access
pattern in this lab, this avoids evicting useful data (data that will be used
in the near future) from the cache. Although data blocks are more likely to
map into the same set in a two-way set-associative cache, the alternative way
avoids replacing useful data, and thus gives the lower miss rate.
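The effect of the extra way can be reproduced with a small LRU cache model. This is a sketch only: the two-address trace below is chosen to force conflicts in a direct-mapped cache and is not the lab's actual access pattern.

```python
from collections import OrderedDict

def miss_rate(trace, cache_size, block_size, ways):
    """Simulate a set-associative cache with LRU replacement and
    return the miss rate for a list of byte addresses."""
    num_sets = cache_size // (block_size * ways)
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in trace:
        block = addr // block_size
        s = sets[block % num_sets]
        if block in s:
            s.move_to_end(block)        # refresh LRU order on a hit
        else:
            misses += 1
            if len(s) == ways:
                s.popitem(last=False)   # evict the least recently used block
            s[block] = True
    return misses / len(trace)

# Two addresses that collide in a 256 B direct-mapped cache with 8 B blocks:
trace = [0, 256] * 10
print(miss_rate(trace, 256, 8, ways=1))   # → 1.0 (every access misses)
print(miss_rate(trace, 256, 8, ways=2))   # → 0.1 (only the two cold misses)
```

With one way, the two blocks keep evicting each other; with two ways they coexist in the same set, which is exactly the mechanism described above.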
Miss rate comparison of Configurations 2 and 3
Configuration 2 has the lower miss rate. On the one hand, increasing the
number of ways results in fewer sets, so it is more likely that data blocks
map into the same set (more conflict misses). On the other hand, a data block
is less likely to be replaced by another block when there are more ways. This
is a tradeoff between the number of cache sets and the number of cache ways.
In our case, although there are more available blocks per set in the 4-way
set-associative cache, we observe more conflict and capacity misses in
configuration 3, so configuration 2 gives better performance in terms of miss
rate. For large data sets, increasing associativity generally results in a
lower miss rate; here we have only a 17x17 matrix.
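The sets-versus-ways tradeoff is visible from the cache geometry alone. For the fixed 256 B cache with 8 B blocks of configurations 1 through 4, every doubling of the associativity halves the number of sets:

```python
cache_size, block_size = 256, 8
blocks = cache_size // block_size           # 32 blocks in total
for ways in (1, 2, 4, blocks):              # direct, 2-way, 4-way, fully assoc.
    print(f"{ways}-way: {blocks // ways} sets")
```

More addresses therefore share each set as associativity grows, which is why a higher way count does not automatically mean a lower miss rate.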
Miss rate comparison of Configurations 6 and 7
These two configurations report almost the same miss rate. By the same
argument, a tradeoff must be considered between the number of sets and the
number of ways. Because of the larger cache size, conflict and capacity
misses are reduced to such a low level that the two configurations show no
difference. The miss rate depends not only on the cache configuration but
also on the data access pattern, so extensive benchmark tests are needed to
find the optimal cache configuration for a specific application.
Comparison of Configurations 9, 10, 11 and 12
The cache miss rate decreases as the cache size increases; configuration 12
has the lowest miss rate of the four. Since more cache blocks (and cache
sets) are available to store data in a bigger cache, conflict and capacity
misses are reduced and accesses are more likely to hit. So, with the rest of
the configuration fixed, a larger cache normally yields a lower miss rate.
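A quick capacity check explains why the trend is monotone here: the 17x17 matrix of 4-byte MIPS words does not fit in any of the four caches, so every additional block captures more of the working set (cache sizes are taken from the table above):

```python
footprint = 17 * 17 * 4                 # 1156 bytes for the whole matrix
for size in (128, 256, 512, 1024):      # configurations 9 through 12
    print(f"{size} B holds {100 * size / footprint:.0f}% of the matrix")
```

Even the 1024 B cache covers under 90% of the matrix, so capacity misses persist in all four configurations and shrink steadily with size.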
Comparison of Configurations 13, 14 and 15
Configuration 15 has the lowest miss rate. A larger block size better
exploits spatial locality in this case, since the matrix elements are
accessed contiguously. In a real implementation, however, the block size is
limited by the bandwidth of the system bus and the burst access size of the
memory: the miss penalty grows with block size because of the longer wait to
fetch the whole block. A larger block size also hurts performance when
conflicts happen, since more useful data is evicted from the cache.
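For a purely sequential scan of 4-byte words, a cold cache misses once per block, so the miss rate is roughly 4/block_size. This is only a back-of-the-envelope model (the lab's pattern also revisits mirror elements, which raises the measured numbers), but it shows the same direction as configurations 13 through 15:

```python
def sequential_miss_rate(n_words, block_size):
    """Cold-cache miss rate when n_words consecutive 4-byte words are
    each read once: one miss per block touched."""
    n_bytes = n_words * 4
    blocks_touched = -(-n_bytes // block_size)   # ceiling division
    return blocks_touched / n_words

for b in (4, 8, 16):                             # block sizes of configs 13-15
    print(f"{b} B blocks: {sequential_miss_rate(17 * 17, b):.1%}")
```

Each doubling of the block size roughly halves the sequential miss rate, matching the 47.6% / 24.0% / 12.6% progression in the table.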
Sample code:

        .data
Matrix: .word 41,45,5,34,8,15,16,23,44,48,12,32,18,47,22,8,22
…….. the given matrix…..
…….. the given matrix. …
        .word 22,3,14,7,46,40,4,7,46,3,19,27,16,16,25,33,41
        .text
        .globl __start
# main program starts in the next line
__start:
        addi $s0,$zero,0        # outer loop counter
        addi $s1,$zero,0        # inner loop counter
        la   $t0,Matrix         # matrix base address
        addi $t6,$zero,17       # number of columns
        addi $t7,$zero,17       # number of rows
        addi $s6,$zero,0        # initialize the asymmetric-pair counter to zero
        sll  $t1,$t6,2          # number of bytes in a row
ext_loop:
        addi $s1,$s0,0          # reset the inner loop counter to the diagonal
inn_loop:
        mult $s0,$t1
        mflo $t2                # row byte offset = row index * row size
        sll  $t3,$s1,2          # column byte offset
        add  $t4,$t0,$t2        # add row offset to the base address
        add  $s2,$t4,$t3        # add column offset: address of M[i][j]
        lw   $s3,0($s2)         # load this element
        mult $s1,$t1            # repeat the offset calculation
        mflo $t2                # for the mirror element M[j][i]
        sll  $t3,$s0,2
        add  $t4,$t0,$t2
        add  $s4,$t4,$t3        # address of M[j][i]
        lw   $s5,0($s4)         # load the mirror element
        beq  $s3,$s5,jump_equal # branch if equal, to store zeros
        addi $s7,$zero,1        # write ones if different
        sw   $s7,0($s2)         # store the result back in the matrix
        sw   $s7,0($s4)         # store the result back in the matrix
        addi $s6,$s6,1          # increment the counter (addi: 1 is an immediate)
        j    jump_unequal       # skip the if-equal part
jump_equal:
        addi $s7,$zero,0        # write zeros if equal
        sw   $s7,0($s2)         # store the result back
        sw   $s7,0($s4)         # store the result back
jump_unequal:
        addi $s1,$s1,1          # increment the inner loop counter
        blt  $s1,$t7,inn_loop   # next inner iteration
        addi $s0,$s0,1          # increment the outer loop counter
        blt  $s0,$t6,ext_loop   # next outer iteration
        .end