Slides - University of Southern California

Blade – A Timing Violation Resilient
Asynchronous Template
Dylan Hand∗, Matheus Trevisan Moreira∗†, Hsin-Ho Huang∗,
Danlei Chen∗, Frederico Butzke‡, Zhichao Li∗, Matheus Gibiluka†,
Melvin Breuer∗, Ney Laert Vilar Calazans†, and Peter A. Beerel∗
May 4th, 2015
* University of Southern California, Los Angeles, CA
† Pontifícia Universidade Católica do Rio Grande do Sul, Porto Alegre, Brazil
‡ Universidade de Santa Cruz do Sul, Santa Cruz do Sul, Brazil
Delay Overheads
[Figure: cycle time of clocked logic, divided into logic time (logic gates), flip-flop alignment, clock margin, and PVT margin]
[Dreslinski et al., IEEE Proc. 2010]
Traditional synchronous design suffers from increased margins
• Worse at low and near-threshold regions
MOTIVATION | 2
Data Dependent Delays
[Figure: cycle time of clocked logic is set by the worst-case data through the logic gates. Histogram of data delay in Plasma CPU: number of operations vs. delay (slower to the right), with the average delay and the worst-case delay marked]
Delay variation due to data is rarely exploited in traditional designs
MOTIVATION | 3
Resilient Design
[Figure: pipeline stage at CLK = 1. Logic drives a sequential gate and a shadow sequential element; an error detection block sends an error flag (shown as 1) to the control logic while data proceeds to the next pipeline stage]
Allow and detect timing errors in datapath
• Correct via architectural replay or gating/pausing clock
Effective approaches have been elusive!
RESILIENT DESIGN | 4
Metastability Issue
[Figure: stage with logic, a latch, and a shadow FF clocked by CLK; an X is marked on the outputs going to the next pipeline stage and to the control logic]
[Bubble Razor, Fojtik, 2013]
The Problem [Beer et al, 2014]
• Flop metastability can propagate to ERR signal
• Metastability in control path can cause system failure
RESILIENT DESIGN | 5
Handling Metastability
[Figure: stage with logic, a latch, and a transition detector clocked by CLK; the detector output passes through a two-FF synchronizer and reaches the control logic as 1 or 0, while data proceeds to the next pipeline stage]
[RazorII, Das, 2009]
Transition detector may exhibit metastability
• Error signal must go through synchronizer, increasing delay
• Uses architectural replay to recover from errors
RESILIENT DESIGN | 6
Hold Time Concerns
[Figure: SafeRazor stage. Labels: ring oscillator clock generator with STOP input, Data In, combinational logic, FF (CLK = 1), MS detector, internal data, latch, Out]
[SafeRazor, Cannizzaro, 2014]
Relies on latch for error correction
Hold times are problematic
RESILIENT DESIGN | 7
Resiliency Landscape
Design Template   Sync / Async   MTBF Safe   Avoids Replay Logic   Hold Time Robust   Low Error Penalty
Bubble Razor      Sync           No          Yes                   Yes                Yes
Razor II          Sync           Yes         No                    No                 No
SafeRazor         Async          Yes*        Yes                   No                 Yes
Blade             Async          Yes         Yes                   Yes                Yes
Blade combines the best features of past resiliency schemes
RESILIENT DESIGN | 8
Outline
Our proposed resilient solution – Blade
Case study – 3-stage Plasma CPU
• Automated flow
• Area efficiency features
• Results and comparisons
Conclusions and future work
OUTLINE | 21
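Blade Template
[Figure: one Blade stage. An asynchronous Blade controller handshakes with its neighbors on L.req / L.ack and R.req / R.ack, with LE.req / LE.ack and RE.req / RE.ack routed through reconfigurable delay lines δ and Δ. It drives CLK and Sample to the error detecting latch (EDL) and receives a 2-bit Err from the error detection logic. The single rail datapath runs from L.data through combinational logic and the error detecting latch to R.data]
BLADE TEMPLATE | 10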
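Blade Template Operation
[Figure: two Blade stages, Controller A and Controller B, each driving CLK and Sample to an EDL; combinational logic sits between the EDLs, δ and Δ delay lines sit on the handshake path, and Err returns from the error detection logic. The animation frames show both the Err = 0 and the Err = 1 case]
Send request speculatively before data is guaranteed stable
Timing errors delay handshaking signals
BLADE TEMPLATE | 11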
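The two statements above can be illustrated with a toy latency model. This is a minimal sketch, not the Blade controller itself: it assumes a stage forwards its request after δ, that data settling within the following window Δ raises Err, and that an error adds exactly Δ to the handshake. The function names and the fixed-penalty assumption are illustrative and do not come from the slides.

```python
def stage_latency(logic_delay, delta, window):
    """Return (status, latency) for one operation in the toy model.

    delta  : speculative request delay (the slides' δ)
    window : extra time in which late data is still caught (the slides' Δ)
    Assumption: a detected timing error delays the handshake by `window`.
    """
    if logic_delay <= delta:
        return ("ok", delta)               # data was stable when the request fired
    if logic_delay <= delta + window:
        return ("error", delta + window)   # late data detected, handshake delayed
    raise ValueError("delay exceeds delta + window; outside this model")


def average_latency(delays, delta, window):
    """Mean per-operation latency over a list of data-dependent logic delays."""
    return sum(stage_latency(d, delta, window)[1] for d in delays) / len(delays)


if __name__ == "__main__":
    delays = [0.6, 0.8, 1.1, 0.9]          # hypothetical per-operation delays
    print(average_latency(delays, delta=1.0, window=0.5))
```

With these hypothetical numbers only one operation in four pays the penalty, so the average latency stays below the worst-case value of δ + Δ that a fully margined design would pay on every operation.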
Positive Hold Margins
[Figure: two-stage Blade pipeline with Controller A and Controller B, δ and Δ delay lines, EDLs, combinational logic, and the Err, Sample, and CLK signals]
Handshaking delays create positive hold margin!
BLADE TEMPLATE | 12
Error Detection Logic
[Figure: EDL schematic. Labels: In, D latch, Out (Q), tTD delay, X, +C, CE, Q-Flop, "From other Q-Flops", Err0, Err1, CLK, Sample, Blade Controller]
Signal values across the animation frames:
Frame   CLK   CE   Sample   Err0   Err1
1       0     0    0        0      0
2       1     1    0        0      0
3       1     1    1        0      0
4       1     1    1        0      1
C-element stores error signal, which is sampled by Q-Flop
BLADE TEMPLATE | 13
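The storing behavior of a C-element can be shown with a small behavioral model. This is a sketch of a standard symmetric Muller C-element only: the output rises when all inputs are 1, falls when all inputs are 0, and otherwise holds. The figure labels the element "+C", which may denote a variant, and the exact EDL wiring is not modeled here; the class name is illustrative.

```python
class CElement:
    """Behavioral model of a symmetric Muller C-element (state-holding gate)."""

    def __init__(self, initial=0):
        self.out = initial

    def update(self, *inputs):
        # All inputs high: output goes high.
        if all(inputs):
            self.out = 1
        # All inputs low: output goes low.
        elif not any(inputs):
            self.out = 0
        # Mixed inputs: the previous output is held.
        return self.out


if __name__ == "__main__":
    c = CElement()
    for pair in [(1, 0), (1, 1), (0, 1), (0, 0)]:
        print(pair, c.update(*pair))
```

The third step is the one that matters for error storage: once the output has risen, it stays high after one input falls, so a short-lived pulse is still visible when it is sampled later.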
Metastable-Safe Operation
[Figure: EDL schematic. Labels: In, D latch, Out (Q), tTD delay, X, +C, CE, Q-Flop, "From other Q-Flops", Err0, Err1, CLK, Sample, Blade Controller]
Signal values across the animation frames:
Frame                 CLK   CE   Sample   Err0   Err1
1                     0     0    0        0      0
2                     1     X    0        0      0
3                     1     X    1        0      0
4 (some time later)   1     0    1        1      0
Q-Flop prevents metastability propagation to control path
BLADE TEMPLATE | 14
Overhead Reduction
[Figure: EDL schematic with shared error collection. Labels: In, D latch, Out (Q), tTD delay, tcomp delay, X, +, C, Q-Flop, "From other C-elements", "From other Q-Flops", Err0, Err1, CLK, Sample, Blade Controller]
• 4-input C-element collects output from 3 EDLs
• 4-input OR gate collects output from 4 C-elements
• Each Q-Flop covers 12 EDLs
C-element and OR gates amortize overhead over many EDLs
BLADE TEMPLATE | 15
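The sharing ratios above (3 EDLs per C-element, 4 C-elements per OR gate, hence 12 EDLs per Q-Flop) give a quick way to estimate how many collection cells a stage needs. The sketch below assumes one OR gate per Q-Flop and rounds partially filled groups up; the function name and the example EDL counts are hypothetical.

```python
import math


def error_collection_cells(num_edls, edls_per_c=3, c_per_or=4):
    """Estimate shared error-collection cells for a given number of EDLs.

    Defaults follow the slide: 3 EDLs per C-element, 4 C-elements per OR gate.
    Assumption: each OR gate feeds exactly one Q-Flop.
    """
    c_elements = math.ceil(num_edls / edls_per_c)
    or_gates = math.ceil(c_elements / c_per_or)
    return {
        "c_elements": c_elements,
        "or_gates": or_gates,
        "q_flops": or_gates,
        "edls_per_q_flop": edls_per_c * c_per_or,
    }


if __name__ == "__main__":
    print(error_collection_cells(12))
    print(error_collection_cells(100))
```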
Controller Implementation
[Figure: Blade controller built from three state machines (Left, Right, Bottom) with delay lines δ and Δ. Signals: L.req, L.ack, LE.req, LE.ack, R.req, R.ack, RE.req, RE.ack, Err[0], Err[1], Sample, CLK, goL, goR, goD, edi, edo, delay]
Left machine transitions (input burst / output burst):
• L.req+ / LE.req+
• LE.ack+ / goR+
• goL+ / L.ack+
• L.req- / LE.req-
• LE.ack- / goR-
• goL- / L.ack-
[Figure: Right machine state graph; its transitions involve goR, R.req, R.ack, RE.req, RE.ack, Err[0], Err[1], edi, edo, and goD]
[Figure: Bottom machine state graph; its transitions involve goD, goL, delay, Sample, and CLK]
Three part burst-mode state machine
• Implemented using 3D [Yun, 1992]
BLADE TEMPLATE | 16
Case Study: Plasma
MIPS OpenCore
3-stage pipeline
28nm FDSOI @ 666MHz (w/ ideal clock and Vdd)
Type            Count
Combinational   11,740
Buf/Inv          1,683
Seq. (Non-RF)      531
RF               2,048
Total           14,319
[1] http://opencores.org/project,plasma
CASE STUDY | 17
Automatic Conversion Flow
Convert single-clock sync RTL design to Blade
Re-uses synchronous EDA tools and libraries
Blade Design Flow (input: RTL Spec; output: Async Netlist): Sync Synthesis → FF to Latch Conversion → Latch Retiming → Add EDL + Async Control → Simulation
Seamless integration into existing flows
CASE STUDY | 18
Synthesis
[Figure: flip-flop (FF) pipeline clocked by CLK]
Synthesize sync RTL design using standard EDA tools
CASE STUDY | 19
Latches
[Figure: master (M) and slave (S) latch pairs clocked by CLK1 and CLK2]
Replace flip flops with master-slave latches
Two-phase non-overlapping clocking
CASE STUDY | 20
Latch Retiming
[Figure: master (M) and slave (S) latches after retiming, clocked by CLK1 and CLK2]
Retime latches to spread logic delay across stages
Allow time borrowing to reduce area overhead
CASE STUDY | 21
EDL Insertion
[Figure: pipeline clocked by CLK1 and CLK2 with latches labeled M, EDL, and TB]
Replace non-time-borrowing (non-TB) latches with error detecting latches
Not all latches need be error detecting
CASE STUDY | 22
Async Control
[Figure: pipeline of EDL and TB latch stages, each driven by its own Blade controller]
Remove synchronous clock trees
Add Blade controllers and delay lines
CASE STUDY | 23
Simulation
[Figure: pipeline of EDL and TB latch stages, each driven by its own Blade controller]
Back annotated SDF simulation using final netlist
CASE STUDY | 24
Resynthesis
[Figure: EDLs A, B, and C feed a block of logic that drives EDLs D, E, and F. Animation frames:
 1. The critical path ends at F, where Delay > δ
 2. A set_max_delay constraint from A to F = δ is added
 3. F now shows Delay < δ and is drawn as a plain latch
 4. E is marked as a correlated EDL with Delay > δ
 5. E also shows Delay < δ and is drawn as a plain latch]
Add set_max_delay from A to F = δ
CASE STUDY | 25
Brute Force Resynthesis
[Figure: results of the resynthesis runs with the best run highlighted. Best run reduced error rate by 39% and EDLs by 27% (-1.9% area). 122 correlated EDLs!]
Evaluated hundreds of resynthesis runs
• Each run applies a max delay constraint to a single latch
Chose the result that led to the largest reduction in area and error rate
CASE STUDY | 26
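One way to prepare such a sweep is to generate one constraint per candidate latch and hand each to a separate resynthesis run. The sketch below only builds the constraint text, using the SDC command `set_max_delay` named on the Resynthesis slide. The cell names, the δ value, and the use of `get_cells` to address latches are assumptions for illustration; the actual scripts used by the authors are not shown in the slides.

```python
def max_delay_constraint(delta_ns, to_cell, from_cell=None):
    """Build one SDC set_max_delay line ending at `to_cell`.

    Cell names are hypothetical; a real flow would use the netlist's names.
    """
    parts = ["set_max_delay", f"{delta_ns:g}"]
    if from_cell is not None:
        parts += ["-from", f"[get_cells {from_cell}]"]
    parts += ["-to", f"[get_cells {to_cell}]"]
    return " ".join(parts)


def sweep_constraints(delta_ns, latch_names):
    """One constraint per run: each run constrains a single latch."""
    return {name: max_delay_constraint(delta_ns, name) for name in latch_names}


if __name__ == "__main__":
    # The slide's example: constrain the path from A to F to delta.
    print(max_delay_constraint(0.75, "F", from_cell="A"))
    for latch, line in sweep_constraints(0.75, ["D", "E", "F"]).items():
        print(latch, "->", line)
```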
Area and Performance
[Pie chart. Slice labels: AND/OR Trees, C-elements, Q-Flops, Delay Elements, Controllers, FF to Latch + EDL. Slice values (unpaired): 2%, 24%, 24%, 6%, 12%, 32%]
Plasma running CoreMark
Overall area overhead is 8.4%
Performance increases
• Average: 19%
• Peak: 42%
CASE STUDY | 27
Performance Comparison with Margins
Must add margins for PVT variation, clock skew / jitter, and/or aging
• Synchronous frequency degraded to accommodate
Margin in Blade design is only imposed when an error occurs
• A 30% error rate reduces impact of margin by ~70%
Margin   Synchronous   Blade (30% ER)   % Advantage
0%       666MHz        800MHz           20%
15%      566MHz        764MHz           35%
30%      466MHz        728MHz           56%
CASE STUDY | 28
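The table values are consistent with a simple model in which the synchronous design pays the margin on every cycle, while Blade pays it only on the fraction of cycles that see an error. The slides do not state these formulas; they are inferred here from the numbers, and the function names are illustrative.

```python
def sync_freq(base_mhz, margin):
    """Synchronous frequency when the full margin is paid every cycle."""
    return base_mhz * (1.0 - margin)


def blade_freq(base_mhz, margin, error_rate):
    """Blade frequency when the margin is paid only on erroneous cycles."""
    return base_mhz * (1.0 - error_rate * margin)


def advantage_percent(blade_mhz, sync_mhz):
    """Relative frequency advantage of Blade over synchronous, in percent."""
    return 100.0 * (blade_mhz / sync_mhz - 1.0)


if __name__ == "__main__":
    for margin in (0.0, 0.15, 0.30):
        s = sync_freq(666, margin)
        b = blade_freq(800, margin, error_rate=0.30)
        print(f"{margin:.0%}: {round(s)}MHz vs {round(b)}MHz, "
              f"{round(advantage_percent(b, s))}% advantage")
```

Under this model a 30% error rate exposes only 30% of the margin, which matches the statement that the impact of the margin is reduced by about 70%.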
Other Related Work
Canary Circuits [Sato, 2007]
• Removes some PVT margins
• But cannot take advantage of data dependency
Bundled Data Designs [Sutherland’89, Nowick’97]
• Speculative completion sensing exploits some data dependency
• Margins impact performance on every cycle
• No observability of errors
Soft Mousetrap [Liu, 2013]
• Hold time constraints remain difficult to meet
CONCLUSIONS | 29
Conclusions
Blade Template
• Achieves higher performance by exploiting data dependency
• Benefits from average rather than worst-case metastability (MS) resolution times
• Reduces impact of margins for PVT variations
• Enables voltage scaling for power savings
Plasma Case Study
• Highlights design and CAD techniques for area efficiency
• Achieves 19% increase in performance with 8.4% area overhead
CONCLUSIONS | 30
Questions?