ARIADNE : Agnostic Reconfiguration In A Disconnected Network Environment

1
ARIADNE: Agnostic
Reconfiguration In A Disconnected
Network Environment
Konstantinos Aisopos, Andrew DeOrio,
Li-Shiuan Peh, and Valeria Bertacco
2011 PACT
Presented by: Helen Arefayne Hagos
Yajing Chen
1
Motivation
2
Time
150 M
transistors
800 M
transistors
1B transistors
 Permanent failure
University of Michigan
2
1
Solution
3
Reconfiguration
University of Michigan
3
2
Existing solutions
Targeted fault
4
Reliability Performance
Area
Bounded in number
Limited
N/A
N/A
Pattern constrained
Limited
N/A
N/A
High
Good
Excess
High
Bad
High
Limited
Bad
Low
Off-chip
Not
constrained On-chip Immunet
Vicis
University of Michigan
4
3
Proposed solution
5
 Agnostic
 Reconfiguration algorithm
 In
A
 Disconnected
 Network
 Environment
University of Michigan
5
4
ROUTING ALGORITHM
6
0
1
2
4
5
6
University of Michigan
3
6
5
ROUTING ALGORITHM..Cont’d
1ST Broadcast
De N S E W
1
0 0 1 0
2
0 0 0 0
3
0 0 0 0
4
0 0 0 0
5
0 0 0 0
6
0 0 0 0
7
0 0 0 0
7
RT[1]=E
root
up
RT[1]=W
up
0
1
2
?
up
up
4
down
down
5
up
3
RT[1]=S,W
6
2
RT[1]=N,E RT[1]=N RT[1]=W
down
Conflict Resolution:
up
6
University of Michigan
7
ROUTING ALGORITHM ..Cont’d
8
2nd Broad Cast…
RT[2]=W
RT[2]=E RT[2]=E
up
up
0
1
2
down
3
up
up
up
4
down
5
6
RT[2]=N,ERT[2]=N RT[2]=N
 N cycles for a single broadcast
 O(N2)cycles for the entire
reconfiguration
University of Michigan
8
ROUTING ALGORITHM ..Cont’d
9
5th Broadcast
RT[5]=W,SX
RT[5]=S
RT[5]=W
0
1
2
4
5
6
RT[5]=W
3
RT[5]=W
Illegal path from Node 0 to Node 5
University of Michigan
9
SYNCHRONIZATION
10
Atomic broadcasts
Atomic synchronization
University of Michigan
10
8
Router Architecture
University of Michigan
11
11
11
Experiment Setup
12
 ARIADNE is implemented in the Wisconsin Multifacet GEMS

simulator as part of the GARNET network model
Two state-of-art routing algorithms: Vicis and Immunet
Network Architecture
GARNET
System Configuration
GEMS
Network
Topology
8x8
2D mesh
Processors
Memory
Controllers
4
Coherence
Channel
Width
64 bits
Router
Architecture
5-stage
pipeline
Router Ports,
VCs
5,
2(private)
Router
Buffers/port
5-flit/VC
L1 Caching
University of Michigan
L2 Caching
in-order
SPARC cores
MOESI
protocol
Simulation Input
Uniform
Random(UR)
TRanspose
(TR)
Synthetic
Traffic
private unified
32KB/node
2-way
latency:3cycles
Benchmark
Suite
PARSEC
Packet
Length
5
shared
distributed
1MB/node
16-way
latency:15cycles
Simulation
Time
100K
cycles
Simulation
Warmup
10K
cycles
12
Performance Evaluation: Synthetic Traffic
13
 Average Latency: the delay experienced by a packet from


source to destination
Throughput: the rate of packets delivered per cycle
Zero Load Latency
Saturation Throughput
University of Michigan
13
Performance Evaluation: PARSEC
 Average Latency for 50 faults

Latency over
variable number of faults
University of Michigan
14
14
Reliability Evaluation
15
 The ability to maximize the connectivity of a faulty network
PARSEC
 Synthetic Traffic
 The probability of deadlock
University of Michigan
15
Reconfiguration Evaluation
16
 During reconfiguration, all packets experience an additional

delay of 𝑁 2 cycles (N=64; 4096 cycles)
Average Latency over time

Reconfiguration Duration
University of Michigan
16
Hardware Overhead
 Verilog HDL
 Beginning with 5-stage router design:
The router with Ariadne:


17
2.708𝒎𝒎𝟐
2.761𝒎𝒎𝟐
Slightly more overhead than Vicis
Three times less than Immunet
Wire
Routing Tables
Overhead
𝒎𝒎𝟐
%Overhead
Ariadne
1
Adaptive
0.053
1.97%
Vicis
1
Deterministic
0.040
1.48%
Immunet
4
Adaptive, deterministic
and a small safe table
0.162
5.98%
University of Michigan
17
Summary
18
 Deadlock Avoidance



 Normal Operation: up*/down*
 Reconfiguration
 Eject and re-inject
 Cost: an additional dedicated flitbuffer per NIC
On-Chip
 A fully “distributed” solution
 Require a global clock to support atomicity
Protect ARIADNE
 Triple Modular Redundancy (TMR)
Partitioned Network
 The packet delivery is guaranteed within each partition
 Cannot transfer packets among partitions
University of Michigan
18
Conclusion
19
 High performance and deadlock-free routing by utilizing
up*/down*
 Performance gain: 40% to 140% at 50 faults during normal operation
 High reliability
 If a path between two node exists, at least one deadlock-free path is
guaranteed
 A fully distributed solution with simple hardware and low
complexity
 Nodes coordinates to explore the surviving topology
 1.97% area overhead
University of Michigan
19
Discussion
20
 Are the two benchmarks representative enough?
 Does the packet size affect the result?
 Does the topology influence the result?
 How about the scalability?
 What is the power consumption during reconfiguration?
University of Michigan
20