ECE486/586 Homework No. 1

Due date: 04/16/2015
Problem No. 1 (10 points)
Read the following paper posted on the course website: S. Borkar, “Design Challenges of Technology
Scaling”, IEEE Micro, pages 23-29, July 1999. Then, answer the following questions:
(a) What are the goals of technology scaling? (2 points)
(b) How much reduction in gate delay is achieved by scaling CMOS technology to the next
technology generation? What is the resultant impact on processor frequency? (4 points)
(c) When this paper was published, processor clock frequencies were increasing by a factor of two
every technology generation. Why was this rate higher than the frequency improvement provided
by technology scaling calculated in part (b)? (4 points)
Problem No. 2 (10 points)
Read the following paper posted on the course website: T. Agerwala and S. Chatterjee, “Computer
Architecture: Challenges and Opportunities for the next decade”, IEEE Micro, pages 58-69, May 2005.
Then, answer the following questions:
(a) What is meant by CAGR? What are the projections for CAGR in the next few years? (2 points)
(b) What is meant by scale-out architectures? Provide two examples of (i) scale-out platforms, (ii)
scale-out workloads. (5 points)
(c) The paper states that: “Integrating a heterogeneous mixture of simple and complex cores on a
chip might provide acceptable performance over a wider variety of workloads”. Under which
scenarios would a heterogeneous multi-core achieve higher average performance than a
homogeneous mixture of cores? Explain with the help of examples. (3 points)
Problem No. 3 (15 points)
Consider two processors which use identical designs but operate at different voltage/frequency points: (i)
Processor-1 operates at 1V and 3.5 GHz, (ii) Processor-2 operates at 0.75V and 2 GHz. Both the
processors are used to run a real-time task in which specific deadlines must be met. Each processor is
turned “OFF” after the task has been fully executed.
Assume that both the processors are able to meet the timing deadline. Also assume that the processors
consume zero power in the “OFF” state:
(a) At the time when both the processors are “ON”, which processor consumes less dynamic power?
Quantify the relative power savings. (8 points)
(b) Which processor is more energy-efficient at executing the task? Is the energy difference between
the two processors identical to the difference in dynamic power consumption computed in part
(a)? If not, why? (7 points)
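A minimal sketch for checking the Problem 3 arithmetic, assuming the usual dynamic-power model P = C * V^2 * f with identical switched capacitance and activity on both processors, and the same cycle count for the task. The model and the function names are assumptions for illustration and are not given in the problem statement.

```python
def dynamic_power_ratio(v_a, f_a, v_b, f_b):
    """Return P_a / P_b, assuming P = C * V^2 * f with the same C on both parts."""
    return (v_a ** 2 * f_a) / (v_b ** 2 * f_b)


def dynamic_energy_ratio(v_a, v_b):
    """Return E_a / E_b for the same task.

    E = P * t and t is proportional to 1/f for a fixed cycle count,
    so the frequency cancels and only V^2 remains.
    """
    return (v_a ** 2) / (v_b ** 2)


# Processor-2 (0.75V, 2 GHz) relative to Processor-1 (1V, 3.5 GHz)
p_ratio = dynamic_power_ratio(0.75, 2.0e9, 1.0, 3.5e9)
e_ratio = dynamic_energy_ratio(0.75, 1.0)
print(f"P2/P1 = {p_ratio:.4f}, E2/E1 = {e_ratio:.4f}")
```

Under these assumptions the two ratios differ because the slower processor stays "ON" longer, which is the effect part (b) asks about.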
Problem No. 4 (30 points)
Intel’s Diamondville die, implemented in a 32nm process, is 3.27mm x 7.94mm. Assume a defect density
of 0.028/cm², a process complexity factor of 12, and a wafer cost of $4,000. Intel desires a 55% (gross)
profit margin.
Intel’s newer Cedarview die is implemented in a 22nm process and the die area is 20mm². Assume the
newer process has a defect density of 0.036/cm², a process complexity factor of 14, and a wafer cost of
$4,200.
Assume 300mm wafers, $1/die for all the packaging and testing costs, 100% wafer yield, and 99% yield
at final test. Show all your work. Round die counts to the nearest integer, yields and profits to the nearest 0.01%,
and part prices to three significant digits.
(a) How many dies per wafer can Intel expect for the Diamondville at 32nm and for the Cedarview
at 22nm? (8 points)
(b) What is the expected die yield (%) for each chip? (6 points)
(c) What is the number of good dies/wafer for each? (4 points)
(d) At what price must Intel sell the Diamondville parts to achieve their target profit margin? (7
points)
(e) If Intel sells Cedarview at the same price, what is their profit margin on Cedarview? (5 points)
(f) (Extra Credit Question) A process engineer proposes an optimization to the 22nm process which
reduces the defect density to 0.03/cm² while increasing the process complexity factor to 14.5.
However, implementing this optimization will cost 1.5 million dollars in equipment and engineer
salaries. Assuming that Intel expects to sell 80 million Cedarview processors after using this
process optimization, should the Intel management invest in the proposed optimization? Be
quantitative and specific. (5 points)
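A hedged sketch of the cost arithmetic behind Problem 4. It assumes the standard textbook models: dies per wafer = pi*(d/2)^2/A - pi*d/sqrt(2*A), die yield = wafer yield / (1 + defect density * die area)^N with N the process complexity factor, and gross margin = (price - cost) / price. The problem does not state which formulas to use, so treat these, and all function names, as assumptions to be checked against the course notes.

```python
import math


def dies_per_wafer(wafer_diameter_mm, die_area_mm2):
    """Gross dies per wafer, with the usual edge-loss correction term."""
    radius = wafer_diameter_mm / 2.0
    return (math.pi * radius ** 2) / die_area_mm2 \
        - (math.pi * wafer_diameter_mm) / math.sqrt(2.0 * die_area_mm2)


def die_yield(defects_per_cm2, die_area_mm2, n, wafer_yield=1.0):
    """Assumed yield model: wafer_yield / (1 + D * A)^n, with A converted to cm^2."""
    die_area_cm2 = die_area_mm2 / 100.0
    return wafer_yield / (1.0 + defects_per_cm2 * die_area_cm2) ** n


def cost_per_part(wafer_cost, good_dies, pkg_test_cost, final_test_yield):
    """(die cost + packaging/test cost) divided by the final test yield."""
    return (wafer_cost / good_dies + pkg_test_cost) / final_test_yield


def price_for_margin(cost, margin):
    """Price such that (price - cost) / price equals the target gross margin."""
    return cost / (1.0 - margin)


# Diamondville inputs taken from the problem statement
area = 3.27 * 7.94                      # die area in mm^2
gross = round(dies_per_wafer(300, area))
y = die_yield(0.028, area, 12)
good = gross * y
cost = cost_per_part(4000, good, 1.0, 0.99)
print(gross, f"{100 * y:.2f}%", f"{good:.1f}", f"{price_for_margin(cost, 0.55):.3g}")
```

The same four functions can be reused for Cedarview and for the extra-credit comparison by changing the inputs.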
Problem No. 5 (15 points)
A newly designed processor “P1” running at 2.5 GHz is being evaluated to run a web server benchmark.
The following tables show P1’s CPI for each instruction type and the instruction frequency statistics for
the benchmark:
Instruction Type     Clock Cycles
Loads/Stores         3
Branches             4
Integer ALU          1
Integer Multiply     7

Instruction Type     Frequency
Loads/Stores         40%
Branches             10%
Integer ALU          45%
Integer Multiply     5%
(a) To compete with other products available in the market, the processor must be able to provide a
throughput (instruction execution rate) of 1,100 instructions per microsecond on the web server
benchmark. Will P1 be able to satisfy the desired throughput requirement? (7 points)
(b) A microarchitect is proposing a change that will cut down the time taken by “Loads/stores” in P1
from 3 cycles to 2 cycles. Calculate the speedup obtained by this change. (4 points)
(c) Using Amdahl’s law, verify the speedup you computed in part (b). (4 points)
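A minimal sketch of the Problem 5 calculations, assuming average CPI is the frequency-weighted sum of the per-type cycle counts and that throughput is clock rate divided by average CPI. The dictionary keys and function names are illustrative, not part of the assignment.

```python
def average_cpi(cpi, mix):
    """Frequency-weighted average cycles per instruction."""
    return sum(cpi[kind] * mix[kind] for kind in cpi)


def throughput_per_us(clock_hz, cpi_avg):
    """Instructions completed per microsecond."""
    return clock_hz / cpi_avg / 1e6


def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Amdahl's law; the fraction is measured in original execution time."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


cpi = {"load_store": 3, "branch": 4, "int_alu": 1, "int_mul": 7}
mix = {"load_store": 0.40, "branch": 0.10, "int_alu": 0.45, "int_mul": 0.05}

base_cpi = average_cpi(cpi, mix)
print("throughput (instr/us):", throughput_per_us(2.5e9, base_cpi))

# Part (b): loads/stores drop from 3 cycles to 2
new_cpi = average_cpi({**cpi, "load_store": 2}, mix)
speedup_from_cpi = base_cpi / new_cpi

# Part (c): fraction of original time spent in loads/stores, sped up by 3/2
ls_time_fraction = cpi["load_store"] * mix["load_store"] / base_cpi
speedup_from_amdahl = amdahl_speedup(ls_time_fraction, 3 / 2)
print(speedup_from_cpi, speedup_from_amdahl)
```

Note that the Amdahl fraction is a share of execution time, not the 40% instruction frequency.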
Problem No. 6 (15 points)
An engineer at a major processor company is asked to compare two different designs for the
upcoming mobile processor. The first design “C1” is expected to operate at 2.5 GHz, whereas the
second design “C2” is expected to operate at 2 GHz. To compare the two designs the engineer uses a
benchmark suite comprising three workloads: web browsing, word processing and e-mail. After
simulating the execution of these workloads on the two designs, the engineer obtains the following
data about the number of processor cycles taken by each workload on each design:
Workload            C1 (Execution time in cycles)    C2 (Execution time in cycles)
Web browsing        50000                            40000
Word processing     100000                           20000
E-mail              30000                            30000
Note that the numbers in the above table represent execution time for each processor in terms of
“processor cycles” (not seconds). Also note that there is a difference in the operating frequencies of
the two designs:
(a) Using C1 as the reference design, compute the normalized execution times for each workload on
C2. (6 points)
(b) Using the geometric mean method of comparing performance, evaluate the speedup of processor
C2 over C1. (9 points)
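A short sketch of the Problem 6 method, assuming execution time in seconds is cycles divided by clock frequency, normalized time is C2 time over C1 time, and the speedup of C2 is the reciprocal of the geometric mean of the normalized times. The helper names are hypothetical.

```python
import math


def seconds(cycles, clock_hz):
    """Convert a cycle count to wall-clock time at the given frequency."""
    return cycles / clock_hz


def normalized_times(times, reference_times):
    """Element-wise ratio of each time to the reference design's time."""
    return [t / r for t, r in zip(times, reference_times)]


def geometric_mean(values):
    """n-th root of the product of n positive values."""
    return math.prod(values) ** (1.0 / len(values))


# Workload order: web browsing, word processing, e-mail
c1 = [seconds(c, 2.5e9) for c in (50000, 100000, 30000)]
c2 = [seconds(c, 2.0e9) for c in (40000, 20000, 30000)]

norm = normalized_times(c2, c1)
gm = geometric_mean(norm)
print(norm, gm, 1.0 / gm)
```

Comparing raw cycle counts would give a different answer, which is why the conversion to seconds comes first.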
Problem No. 7 (15 points)
We are considering enhancing a processor by adding vector hardware to it. When a computation is
run on the vector hardware, it is 6 times faster than the normal mode of execution. We call the
percentage of time that could be spent using vector mode “the percentage of vectorization”.
(a) What is the maximum speedup attainable from using vector mode? (3 points)
(b) How much performance improvement is achieved if the percentage of vectorization is 50%?
(4 points)
(c) What percentage of vectorization is needed to achieve a speedup of 4? (5 points)
(d) What percentage of the computation run time is spent in vector mode if a speedup of 4 is
achieved? (3 points)
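A hedged sketch of Amdahl's law as it applies to Problem 7, assuming "percentage of vectorization" means the fraction f of the original run time that can use vector mode. The function names are illustrative; the inversion only makes sense for target speedups below the vector speedup k.

```python
def amdahl_speedup(f, k):
    """Overall speedup when a fraction f of the original time runs k times faster."""
    return 1.0 / ((1.0 - f) + f / k)


def fraction_for_speedup(target, k):
    """Invert Amdahl's law: the fraction f that yields the target speedup."""
    return (1.0 - 1.0 / target) / (1.0 - 1.0 / k)


def enhanced_time_share(f, k):
    """Share of the new, shorter run time that is spent in the enhanced mode."""
    return (f / k) / ((1.0 - f) + f / k)


K = 6.0  # vector mode is 6 times faster than normal mode
f_needed = fraction_for_speedup(4.0, K)
print("upper bound (f = 1):", amdahl_speedup(1.0, K))
print("f = 0.5:", amdahl_speedup(0.5, K))
print("f for speedup 4:", f_needed)
print("time share in vector mode:", enhanced_time_share(f_needed, K))
```

Parts (c) and (d) differ because (c) measures the fraction against the original run time and (d) against the run time after the enhancement.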
Problem No. 8 (15 points)
A server is built from the following components and subsystems: a multicore CPU motherboard with an
MTTF of 10,000 hours, 4 disk drives (each of which has an MTTF of 100,000 hours), a disk controller
with an MTTF of 50,000 hours, and a power supply with an MTTF of 20,000 hours.
(a) What is the MTTF for the server? (7 points)
(b) You need to double the disk storage capacity. You have a choice of purchasing four additional
disk drives identical to those you’re already using, or replacing the four you have with disk drives
that have twice the capacity. The new disk drives have an MTTF of only 60,000 hours. A disk
controller can handle up to four drives. Which choice yields a more reliable system, and by how
much compared to the other choice? (8 points)
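A minimal sketch for Problem 8, assuming component lifetimes are exponentially distributed and independent, and that the failure of any single component brings the server down, so failure rates add. It also assumes that eight drives require a second disk controller, since one controller handles at most four. These modeling choices and the names below are assumptions, not part of the problem statement.

```python
def system_mttf(components):
    """MTTF of a series system.

    components is a list of (count, mttf_hours) pairs; the system failure
    rate is the sum of the individual failure rates.
    """
    failure_rate = sum(count / mttf for count, mttf in components)
    return 1.0 / failure_rate


# (count, MTTF in hours): motherboard, disks, disk controller(s), power supply
base = [(1, 10_000), (4, 100_000), (1, 50_000), (1, 20_000)]
eight_disks = [(1, 10_000), (8, 100_000), (2, 50_000), (1, 20_000)]
bigger_disks = [(1, 10_000), (4, 60_000), (1, 50_000), (1, 20_000)]

for name, config in (("base", base),
                     ("eight disks", eight_disks),
                     ("bigger disks", bigger_disks)):
    print(f"{name}: {system_mttf(config):.1f} hours")
```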