Supporting Differentiated Services in Computers via Programmable Architecture for Resourcing-on-Demand (PARD)
Jiuyue Ma, Xiufeng Sui, Ninghui Sun, Yupeng Li, Zihao Yu, Bowen Huang, Tianni Xu, Zhicheng Yao, Yun Chen, Haibin Wang, Lixin Zhang, Yungang Bao
ICT, CAS; Huawei
2015.03.16

Data Center Era
• Data center as an infrastructure
  - Internet services
  - Cloud computing
• Sharing in data centers
  - Google: millions of jobs over 12,000 servers in a month

Diverse Workloads
• Latency-critical: search engine, online shopping
• Throughput-oriented: data analytics, indexing
• Others: test-and-debug
C. Reiss et al. Heterogeneity and Dynamicity of Clouds at Scale: Google Trace Analysis. SoCC, 2012.

Sharing -> Interference
• Noisy neighbors
Christine Wang. Intel® Xeon® Processor E5-2600 v3 Product Family Performance & Platform Solutions. 2014.
[Yang et al. ISCA '13] Bubble-Flux: Precise Online QoS Management for Increased Utilization in Warehouse Scale Computers
[Kambadur et al. SC '12] Measuring Interference Between Live Datacenter Applications

Prior Works
• Cover the whole system stack, from hardware to user applications
• However, they are:
  - Scenario-specific
  - Contention-varying
  - Time-consuming
  - Ad-hoc solutions
Contention points addressed by prior work, by layer:
• Datacenter / application: global file system [Dean, Commun. ACM 2013]; background daemons [Dean, Commun. ACM 2013]; maintenance activities [Dean, Commun. ACM 2013][Dean, 2012] or backup jobs [Yu, NSDI 2011]
• Network stack: small packets triggering Nagle's algorithm, limited buffers, delayed ACKs resulting in RTOs [Yu, NSDI 2011]; TCP congestion control [Alizadeh, SIGCOMM 2010][Alizadeh, NSDI 2012]; packet scheduling [Vamanan, SIGCOMM 2012][Wilson, SIGCOMM 2011][Zats, SIGCOMM Comput. Commun. 2012][Hong, SIGCOMM Comput. Commun. 2012]
• OS kernel: kernel sockets, context switches, kernel scheduling overhead, SMT load imbalance, IRQ imbalance [Leverich, EuroSys 2014]; lock contention [Kapoor, SoCC 2012]
• Power: C-states, DVFS [Leverich, EuroSys 2014]
• Hypervisor: virtual machine scheduling [Xu, NSDI 2013][Wang, INFOCOM 2010][Xu, SoCC 2013]; network isolation [Wang, INFOCOM 2010][Shieh, NSDI 2011][Xu, SoCC 2013][Jeyakumar, NSDI 2013]
• Hardware: SMT [Zhang, MICRO 2014]; shared caches [Leverich, EuroSys 2014][Tang, ISCA 2011][Kasture, ASPLOS 2014][Sanchez, ISCA 2011][Sanchez, MICRO 2010][Qureshi, MICRO 2006][Thereska, SOSP 2013]; memory [Tang, ISCA 2011][Yang, ISCA 2013][Muralidhara, MICRO 2011][Delimitrou, IISWC 2013]; NIC [Radhakrishnan, NSDI 2014]; I/O [Mesnier, SIGOPS 2011][Delimitrou, IISWC 2013]

Integrated Solutions
• Online-service data center: over-provisioning
• Batch-workload data center: highly shared
• Both rely on software optimization: cgroup, backup requests, LXC, priority, sync backup tasks
• Google data center utilization (Jan-Mar 2013) [1]: online service 30% vs. batch workload 75%
[1] L. Barroso, J. Clidaras, U. Hölzle. The Datacenter as a Computer (2nd Edition). July 2013.

Data Center Era (2010s) vs. Internet Era (1990s)
• Data center era: search, online shopping, cloud computing, …
  - Applications share infrastructure
  - Different requirements: priority, throughput, latency, …
  - QoS vs. utilization: today the QoS problem is handled by separating online and offline services
• Internet era: HTTP, FTP, VoIP, streaming media, games, …
  - Different requirements: VoIP, games, …: latency-critical; FTP, VoD, …: bandwidth-sensitive; email: best effort
  - QoS: 1994, Integrated Services; 1998, Differentiated Services

Software-Defined Networking (SDN)
• Each packet is tagged with a flow ID
• Control plane: packet filters, tag-based rules
• Programming interface: OpenFlow => access to the control plane; API => business applications

Rationale of QoS Technologies
• IntServ, DiffServ: propagate network applications' QoS requirements to the network hardware
• Also recognized by the architecture community: "New, high-level interfaces are required to convey programmer and compiler knowledge to the hardware." (21st Century Computer Architecture)

Observation
A computer is inherently a network. Can networking QoS technologies be applied to computer architecture? Yes!

PARD: Programmable Architecture for Resourcing-on-Demand, without loss of QoS

Three Challenges
• How to support QoS?
• How to design?
• How to deploy?

Challenge #1
How can computer hardware distinguish different applications?
[Figure: in reality, APP0 … APPn run on a hypervisor and share the cores, the shared last-level cache, the memory controller, the I/O chipset, disks, and the NIC; ideally, each application would see the hardware as if it were the single application running.]

Tagging Each Application
• Tagging grain (fine-grain tagging): VM-level, container-level, process-level, thread-level, object-level
• From local tagging to cross-server tagging: connect to network tagging mechanisms

In Response to This Morning's Keynote
Timing information (e.g., deadlines) can be integrated into tags to convey software's timing requirements to the hardware.

Tagging Source
• Add tag registers: each core (running VM0 … VMn) holds a DS-id, and DS-id tags also appear at the shared last-level cache, memory controller, I/O chipset, disks, and NIC.

Tagging Datapath
• Core -> …: tagged requests carry the issuing core's DS-id through the shared last-level cache and memory controller
• Dev -> …: tagged responses pass through the memory & DMA controller and the I/O chipset

How to Use Tag?
[Figure: each control plane (CP) applies a per-DS-id policy, e.g. cache partitioning at the shared last-level cache, rate limiting, and priority-based scheduling; CPs are also attached to the memory controller, the I/O chipset, disks, and the NIC.]

Challenge #2
How to design control planes (CPs) for a diversity of hardware?

CP Design Choices: Table-based vs. Processor-based
• Table-based: simple to implement and fast, but inflexible
• Processor-based: supports advanced functionality, but complicated and slow
Example processor-based CP loop:
    loop:
        rbld
        rbrd  r1, <req.type offset>
        cmp   r1, REQUEST
        be    .request
        cmp   r1, RESPONSE
        be    .response
    .dispatch:
        rbst
        b     .loop
    .request:
        call  encrypt
        b     .dispatch
    .response:
        call  decrypt
        b     .dispatch

Advanced Functionality: Trigger => Action
• Statistics table: per-DS-id statistics (Stat1, Stat2, …)
• Trigger table: per-DS-id conditions and actions (Cond-1/Action-1, …); a comparator checks a statistic against a threshold (op: >, <, =), e.g. miss_rate > 30%
• When a condition holds, an action signal invokes a firmware action script that updates the parameter table (Param1, Param2, … per DS-id), e.g. to adjust the way mask

Final CP Design: Three Tables + Programming Interface + Interrupt Line
• Three control tables: parameter / statistics / trigger
• A programming interface: read/write access to the control tables
• Interrupt logic: sends an interrupt when a trigger condition is met
[Figure: the parameter table (DS-id -> Param1, Param2, …), statistics table (DS-id -> Stat1, Stat2, …), and trigger table (DS-id, condition, action; one DS-id may own several entries) sit behind the programming interface, with a comparator linking statistics to triggers.]

Integrate into HW Components
• The cache controller and the memory controller share a common control plane structure.

Challenge #3
How to define and program a resourcing-on-demand policy into hardware?

Platform Resource Manager (PRM)
• Augmented IPMI
• Connects all control planes (CPs)
• Runs Linux-based firmware
• Abstracts CPs as files; the PRM programs and monitors the CPs and receives their interrupts
    /sys/cpa
      cpa0
        ident
        type
        ldoms
          ldom0
            parameter
              param1
              param2
            statistics
            trigger
          ldom1
          ldom2
      cpa1
      cpa2

Centralized PRM: Control Plane File Structure
• Each cpaN directory points to one CP
• Each ldomN directory corresponds to one tag
• The parameter, statistics, and trigger entries map to the CP's parameter, statistics, and trigger tables

Access Control Planes
• Query control plane info:
    cat /sys/cpa/cpa0/ident
    cat /sys/cpa/cpa0/type
• Query parameters:
    cat /sys/cpa/cpa0/…/parameter/param1
• Set parameters:
    echo 10 > /sys/cpa/cpa0/…/parameter/param2
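A minimal sketch of the file-based interface, assuming a scratch directory stands in for /sys/cpa so it can run without PARD firmware. The layout follows the /sys/cpa tree on the PRM slides; the file names waymask and miss_rate, the helpers cpa_get, cpa_set and trigger_fires, and all values are hypothetical. trigger_fires imitates in software the stat/op/threshold comparison that the control plane performs in hardware before raising an interrupt.

```shell
#!/bin/sh
# Sketch only: CPA_ROOT is a scratch stand-in for /sys/cpa.
CPA_ROOT=$(mktemp -d)
trap 'rm -rf "$CPA_ROOT"' EXIT

cpa_get() {   # print a control-plane file, like "cat /sys/cpa/cpa0/type"
    cat "$CPA_ROOT/$1"
}

cpa_set() {   # overwrite a control-plane file, like "echo 10 > .../param2"
    printf '%s\n' "$2" > "$CPA_ROOT/$1"
}

# One cache control plane (cpa0) with one logical domain (ldom0).
LDOM=cpa0/ldoms/ldom0
mkdir -p "$CPA_ROOT/$LDOM/parameter" "$CPA_ROOT/$LDOM/statistics" "$CPA_ROOT/$LDOM/trigger"
cpa_set cpa0/type cache
cpa_set "$LDOM/parameter/waymask" 0xFFFF      # hypothetical: all 16 ways
cpa_set "$LDOM/statistics/miss_rate" 35       # hypothetical: integer percent

# Evaluate one hypothetical trigger entry "<stat> <op> <val>", op in gt/lt/eq
# (the same gt,30 style used later with -cond=gt,30).
trigger_fires() {
    value=$(cpa_get "$LDOM/statistics/$1")
    case "$2" in
        gt) [ "$value" -gt "$3" ] ;;
        lt) [ "$value" -lt "$3" ] ;;
        eq) [ "$value" -eq "$3" ] ;;
        *)  return 2 ;;
    esac
}

if trigger_fires miss_rate gt 30; then
    # A real CP would raise an interrupt and the PRM would run the installed
    # action script; this sketch just narrows the mask directly.
    cpa_set "$LDOM/parameter/waymask" 0xFF00
fi
echo "waymask=$(cpa_get "$LDOM/parameter/waymask")"   # prints waymask=0xFF00
```

Keeping every control plane behind plain files is what lets ordinary tools (cat, echo, shell scripts) act as the policy language; the sketch only mimics that view and does not model the interrupt path.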
Program "Trigger->Action"
[Figure: Control Plane Address Space for cpaX: registers IDENT, IDENT_HIGH, type, addr, cmd, and data; address fields: table selection, tag, and offset; field widths shown: 64-bit, 32-bit, 16-bit.]
Example LLC control-plane tables: parameter (tag, waymask); statistics (tag, capacity, miss_rate); trigger (tag, stats, op, val), e.g. miss_rate > 30%.
1. Register a trigger:
    pardtrigger /dev/cpa0 -ldom=0 -action=0 -stats=miss_rate -cond=gt,30
2. Prepare an action script, e.g. /cpa0_ldom0_t0.sh:
    #!/bin/sh
    # Log the event, read the LDom's current state, and write back a new way mask.
    echo "<log message>" >> /log/triggers.log
    cur_mask=$(cat /sys/cpa/.../waymask)
    miss_rate=$(cat /sys/cpa/.../miss_rate)
    capacity=$(cat /sys/cpa/.../capacity)
    # update_mask is the policy helper; its definition is not shown on the slide.
    target=$(update_mask "$cur_mask" "$miss_rate" "$capacity")
    echo "$target" > /sys/cpa/.../waymask
3. Install the trigger action script:
    echo "/cpa0_ldom0_t0.sh" > /sys/cpa/cpa0/ldoms/ldom0/triggers/0
Figure: Control Plane Programming Methodology.
… occur: (1) the control plane uses the DS-id … parameter table to get the corresponding address mapping … row-buffer id. (2) The requested LDom phys…

Implementation and Evaluation
• Full-system cycle-accurate simulator (open sourced*)
• Ongoing work: FPGA prototype on a Xilinx VC709 evaluation board
[Figure: simulated PARD server (based on gem5, 4 cores, 8 GB): PRM firmware plus logical domains LDom #0 to #3, each running unmodified Linux with an unmodified application or an interference micro-benchmark.]
*Available at http://github.com/fsg-ict/PARD-gem5

Full System Simulator of a Server
• Based on gem5, supporting out-of-order (OoO) CPU models
• 4 cores -> 4 LDoms
• Cache: miss rate -> way mask
• Memory: address mapping, priority, bandwidth
• I/O

Fully HW-Supported Virtualization
[Figure: per-LDom memory bandwidth (GB/s) and occupied last-level cache (MB) vs. simulated time (ms). LDom0 boots unmodified Linux, reaches a bash prompt, and runs 437.leslie3d; LDom1 does the same and runs 470.lbm; LDom2 runs CacheFlush, starting at T_CacheFlush.]
Way masks set from the PRM:
    echo 0xFF00 > /sys/cpa/cpa0/ldom0/parameters/waymask
    echo 0x00FF > /sys/cpa/cpa0/ldom1/parameters/waymask
    echo 0x00FF > /sys/cpa/cpa0/ldom2/parameters/waymask
Figure 7. Dynamically Partition a PARD Server into Four LDoms and Launch Three LDoms in Turn.
• Partition a single PARD server into 4 logical domains (LDoms)
• Boot 3 of the 4 LDoms with unmodified linux-2.6.28.4
… these VMs may contend for hardware resources such as LLC capacity, as shown in the figure.
7. Evaluation
This section describes the experiments we conducted on both the simulation and FPGA platforms. The goal is to verify the new functionalities enabled by the PARD architecture and to measure the overhead of the current PARD control plane design.
For experimental methodology, we leveraged gem5's SimpleTiming mode to boot Linux, launch and warm up workloads, and make checkpoints, and then switched to the cycle-accurate out-of-order …
In this experiment, to guarantee reasonable LLC capacity for LDom0, we manually ran the three echo commands shown in the figure to adjust LLC capacity. Since the LLC of the simulated server is 16-way, the way mask 0xFF00 means that the LLC control plane allocates eight ways to LDom0, and the mask 0x00FF means that LDom1 and LDom2 share the other eight ways. Consequently, the percentage of LDom0's LLC capacity in…
• Manually adjust the cache partition after the system is up
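The mask arithmetic in this paragraph can be checked with a short shell sketch, assuming each set bit of a mask enables one way of the 16-way LLC. ways_in_mask and shared_ways are hypothetical helpers for illustration, not part of PARD's firmware.

```shell
#!/bin/sh
# Count the LLC ways a mask grants (number of set bits).
ways_in_mask() {
    m=$(( $1 ))
    n=0
    while [ "$m" -ne 0 ]; do
        n=$(( n + (m & 1) ))
        m=$(( m >> 1 ))
    done
    echo "$n"
}

# Count the ways two LDoms have in common (set bits in the AND of their masks).
shared_ways() {
    ways_in_mask $(( $1 & $2 ))
}

echo "LDom0 ways: $(ways_in_mask 0xFF00)"                  # 8
echo "LDom1/LDom2 overlap: $(shared_ways 0x00FF 0x00FF)"   # 8 (fully shared)
echo "LDom0/LDom1 overlap: $(shared_ways 0xFF00 0x00FF)"   # 0 (isolated)
```

Under way partitioning, a zero overlap means neither LDom can allocate into the other's ways, which is why LDom0 keeps its eight ways while LDom1 and LDom2 compete only with each other.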
Trigger => Action
• Utilization: 25% ==> 100%
[Figure: a PARD server with four LDoms: memcached in LDom#0 and three instances of the CacheFlush micro-benchmark in LDom#1 to LDom#3.]

Trigger => Action: Change of memcached's Cache Miss Rate
[Figure: left, memcached's cache miss rate (%) vs. simulated time (ms) at 20 KRPS, with events T0 to T3 marked; right, memcached response time (ms) vs. load (kilo-requests per second, KRPS) for memcached alone (solo), co-run with interference (shared), and co-run with the LLC trigger (w/ LLC Trigger); with the trigger, utilization rises from 25% to 100%.]
• T0: memcached runs alone (monopolizes the cache)
• T1: three CacheFlush instances start (shared cache, increased miss rate)
• T2: the trigger condition is met (MissRate > 30%), and the way-partition mechanism is applied
• T3: MissRate is restored (~10%)

CP Overhead
• Preliminary FPGA prototype (Xilinx VC709, xc7vx690t)
  - Cache controller: OpenSPARC T1 L2 cache
  - Memory controller: Xilinx MIG 7 Series
• Results
  - The pipelined structure of the LLC and memory controller (MC) hides the latency introduced by the control plane logic
  - The control plane logic adds modest resource overhead: 3.5% for the LLC and 10.1% for the MC

Summary
• Data centers confront a tough trade-off between utilization and applications' QoS
• A computer is inherently a network, so networking technologies can be applied to computer architecture
• We propose PARD, which provides a new interface for software to convey QoS requirements to the hardware

Q&A
Get updates on the PARD simulator at http://github.com/fsg-ict/PARD-gem5

Backup: Pipelined Cache
• Steps of a write request, in the order listed: receive write request; access TagArray; access DataArray; access LRUHistory; look up parameter table; access MSHR; update TagArray; update statistics table; check trigger table; send memory request
• Replacement: enhanced LRU with way partitioning
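The "update statistics table" and "check trigger table" steps, combined with the miss_rate > 30% trigger from the memcached experiment, can be modeled in software for a single DS-id. This is a hypothetical sketch rather than PARD's hardware logic: the integer-percent statistic, the 10-access window, and the update_mask policy (grant one more way while the miss rate stays above the threshold) are assumptions, and update_mask only stands in for the helper that the action script calls but the slides do not define.

```shell
#!/bin/sh
NWAYS=16        # the simulated LLC is 16-way
THRESHOLD=30    # trigger condition: miss_rate > 30 (%)
WINDOW=10       # hypothetical: accesses per statistics window

# Hypothetical update_mask <mask> <miss_rate> [capacity]:
# while the miss rate exceeds THRESHOLD, set the lowest clear bit (one more way).
# $3 (capacity) is accepted to match the slide's call but ignored here.
update_mask() {
    mask=$(( $1 ))
    full=$(( (1 << NWAYS) - 1 ))
    if [ "$2" -gt "$THRESHOLD" ] && [ "$mask" -ne "$full" ]; then
        bit=1
        while [ $(( mask & bit )) -ne 0 ]; do
            bit=$(( bit << 1 ))
        done
        mask=$(( mask | bit ))
    fi
    printf '0x%04X\n' "$mask"
}

# Replay a string of H (hit) / M (miss) events for one DS-id:
# update the statistics per access, check the trigger at each window end,
# and print the resulting way mask.
run_window() {   # run_window <events> <initial-mask>
    events=$1; cur=$2; accesses=0; misses=0
    while [ -n "$events" ]; do
        ev=${events%"${events#?}"}      # first character
        events=${events#?}              # drop it
        accesses=$(( accesses + 1 ))
        if [ "$ev" = M ]; then
            misses=$(( misses + 1 ))
        fi
        if [ "$accesses" -eq "$WINDOW" ]; then
            rate=$(( misses * 100 / accesses ))
            cur=$(update_mask "$cur" "$rate")   # unchanged unless rate > THRESHOLD
            accesses=0; misses=0
        fi
    done
    echo "$cur"
}
```

For example, run_window MMMMHHHHHH 0xFF00 sees a 40% miss rate in its only window and returns 0xFF01. In PARD the comparison happens in the control plane itself, which then raises an interrupt so the PRM can run the installed action script; software only sees the result through the /sys/cpa files.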