Power Efficient Idle Injection

Jacob Pan
Intel Open Source Technology Center
LinuxCon Japan 2015
Agenda
• Introduction to idle injection
• Techniques available in Linux
• Experiment results
• Future work
Why Inject Idle?
• Primary: thermal/power limiting
• Secondary:
  • Performance management
  • Pay per use
  • Idle power efficiency
Understanding Processor Idle States/C-States
Motivation for Idle Injection: Increasingly Lower Idle Power
Deep idle power is negligible!
Idle Power vs Running Power on Broadwell
[Figure: power (watt) by state: TDP C0 = 14 W; 95% PC2 = 1.9 W; 95% PC7 = 0.32 W]
*TDP = Thermal Design Power
When to use idle injection?
Idle injection at LFM (low frequency mode)
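As a rough illustration of the LFM case, the sketch below assumes package power is a linear mix of running power at LFM and deep idle power, weighted by the injected idle ratio d: P(d) = (1 - d) * P_LFM + d * P_idle. Solving for d gives the idle ratio needed to meet a power budget once frequency can go no lower. The linear model, the helper name `idle_ratio_for_budget`, and the example wattages are hypothetical; the 50% cap only mirrors the top of the idle sweep used in the experiments.

```python
def idle_ratio_for_budget(budget_w, p_lfm_w, p_idle_w, max_ratio=0.5):
    """Hypothetical helper: idle ratio d needed at LFM to meet budget_w.

    Assumes (not measured) that average power is linear in d:
        P(d) = (1 - d) * p_lfm_w + d * p_idle_w
    """
    if p_idle_w >= p_lfm_w:
        raise ValueError("deep idle power must be below LFM running power")
    if budget_w >= p_lfm_w:
        # Running at LFM already fits the budget: no idle injection needed.
        return 0.0
    # Solve (1 - d) * p_lfm_w + d * p_idle_w == budget_w for d.
    d = (p_lfm_w - budget_w) / (p_lfm_w - p_idle_w)
    # Never inject more than max_ratio (an assumed cap).
    return min(d, max_ratio)


# Made-up numbers: 6 W at LFM, 0.5 W in deep idle, 4 W budget.
print(round(idle_ratio_for_budget(4.0, 6.0, 0.5), 3))  # prints 0.364
```

The lower deep idle power is relative to LFM power, the closer this relation gets to a one-for-one trade between injected idle and power saved.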
Idle Injection in Linux
• Intel PowerClamp driver
• Scheduler throttling: RT or CFS bandwidth control
Intel PowerClamp V1
(current design in the mainline kernel)
The idea: play idle!
PowerClamp v1 timeline of idle injection
[Figure: timeline showing the RT kthread, scheduler ticks, and throttled/unthrottled periods]
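The alignment idea behind V1 can be sketched as follows: each per-CPU injection thread derives an interval from the idle duration and the target ratio, then rounds the current jiffies value up to the next multiple of that interval, so all CPUs enter idle on the same boundary. This is a simplified model, not the driver's code: the function name is hypothetical, and any compensation logic the driver applies for package C-state entry is omitted.

```python
def v1_idle_window(now_jiffies, duration_jiffies, target_ratio_pct):
    """Hypothetical model of jiffies-aligned idle injection.

    Returns (start, end) in jiffies for the next idle window.
    """
    if not 0 < target_ratio_pct <= 100:
        raise ValueError("target ratio must be in (0, 100]")
    # Idle for duration_jiffies once per interval, so duration/interval == ratio.
    interval = duration_jiffies * 100 // target_ratio_pct
    # Round up to the next interval boundary so every CPU starts together.
    start = (now_jiffies // interval + 1) * interval
    return start, start + duration_jiffies


# Three CPUs that wake at slightly different jiffies still pick one window.
for now in (1001, 1003, 1007):
    print(now, v1_idle_window(now, duration_jiffies=6, target_ratio_pct=10))
```

The model also reflects the dependency listed under the limitations: if a CPU's view of jiffies is stale across a boundary, it computes a different start and the windows no longer line up.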
Limitations of Intel PowerClamp V1
• CPU appears busy while playing idle
• Scheduler ticks not stopped in NOHZ idle
  • Removal of tick_nohz_idle_enter/exit() API
  • RCU grace period
• Relies on timely jiffies updates
Limitations of Intel PowerClamp V1
CPU appears busy while playing idle
Limitations of Intel PowerClamp V1
Scheduler ticks not stopped in NOHZ idle
• Interrupted sleep is less power efficient
• Removal of tick_nohz_idle_enter/exit() API
• RCU grace period
Limitations of Intel PowerClamp V1
Relies on a secondary timing source
• Timely jiffies updates
• Periodic timers
Scheduler-Based Throttling
Normal tasks under the Completely Fair Scheduler (CFS) class
› Bandwidth control via the CPU control group (cgroup)/container
› Runqueue throttling by enqueuing/dequeuing tasks
[Figure: example cgroup hierarchy]
Root CG
  CG1
    CG1.1
    CG1.2
  CG2
    CG2.1
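With CFS bandwidth control, an injected idle ratio maps onto a cgroup's quota: within each period, the group may run for at most quota microseconds of CPU time, summed across CPUs. The sketch below assumes the group's tasks are spread over n_cpus CPUs and computes the quota that leaves idle_ratio of that capacity unused. The quota/period semantics follow the kernel's cgroup v1 `cpu.cfs_quota_us`/`cpu.cfs_period_us` and cgroup v2 `cpu.max` files; the helper names are hypothetical.

```python
def cfs_quota_us(period_us, n_cpus, idle_ratio):
    """Hypothetical helper: CFS quota that limits a cgroup to
    (1 - idle_ratio) of n_cpus CPUs' worth of runtime per period."""
    if not 0.0 <= idle_ratio < 1.0:
        raise ValueError("idle_ratio must be in [0, 1)")
    return round(period_us * n_cpus * (1.0 - idle_ratio))


def cpu_max_value(period_us, n_cpus, idle_ratio):
    """String for cgroup v2 cpu.max, which takes '<quota> <period>'."""
    return f"{cfs_quota_us(period_us, n_cpus, idle_ratio)} {period_us}"


# 100 ms period, 4 CPUs, 25% idle -> 300 ms of runtime per period.
print(cpu_max_value(100_000, 4, 0.25))  # prints "300000 100000"
```

Writing that string to the group's cpu.max (or the quota alone to cpu.cfs_quota_us under cgroup v1) applies the limit. Each cgroup's runtime is refilled by its own period timer, so nothing in this setting aligns throttling across groups.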
Time Chart of CFS Bandwidth Control
(two cgroups, multithreaded workload)
• Pros: no fake idle task; finer per-cgroup control
• Cons: no synchronization, so package C-state opportunities are lost
[Figure: throttle/unthrottle timeline for cgroup1 and cgroup2]
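Why unsynchronized throttling loses package C-state opportunities can be seen with a toy model, assuming the package can only enter a deep C-state while every CPU is idle at once. Each CPU idles for idle_len slots per period, shifted by its own offset, and the function counts the slots in which all CPUs are idle together. The model and its names are illustrative assumptions, not measurements.

```python
def package_idle_fraction(period, idle_len, offsets):
    """Fraction of a period in which every CPU is idle at the same time.

    Toy model: CPU i is idle in slots [offsets[i], offsets[i] + idle_len)
    (mod period); the package is assumed to reach a deep C-state only
    when all CPUs are idle together.
    """
    together = 0
    for t in range(period):
        if all((t - off) % period < idle_len for off in offsets):
            together += 1
    return together / period


# Each of 4 CPUs idles 25% of a 100-slot period.
print(package_idle_fraction(100, 25, [0, 0, 0, 0]))      # aligned: 0.25
print(package_idle_fraction(100, 25, [0, 10, 10, 10]))   # partly shifted: 0.15
print(package_idle_fraction(100, 25, [0, 25, 50, 75]))   # staggered: 0.0
```

Every CPU gives up the same 25% of its time in all three cases; only aligned windows turn that idle time into package-level idle.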
PowerClamp V2 (work in progress)
• Runqueue throttling of the CFS class
• Synchronization around rounded ktime instead of jiffies
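Synchronizing on rounded ktime rather than jiffies can be modeled with a nanosecond clock: every CPU rounds the current monotonic time up to the next multiple of the injection period, so the windows line up without depending on tick-driven jiffies updates. In this sketch `time.monotonic_ns()` stands in for the kernel's `ktime_get()`; the function name and the 10 ms period in the example are hypothetical.

```python
import time


def ktime_idle_window(now_ns, period_ns, idle_ratio):
    """Hypothetical model: next aligned idle window as (start_ns, end_ns)."""
    if period_ns <= 0 or not 0.0 < idle_ratio < 1.0:
        raise ValueError("need period_ns > 0 and 0 < idle_ratio < 1")
    # Round up to the next period boundary; all CPUs compute the same value.
    start = (now_ns // period_ns + 1) * period_ns
    return start, start + int(period_ns * idle_ratio)


# Example: 10 ms period, 25% idle, using the local monotonic clock.
start, end = ktime_idle_window(time.monotonic_ns(), 10_000_000, 0.25)
print(end - start)  # prints 2500000
```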
Time Chart: PowerClamp V1 vs. V2
Experiment Data
• Goals:
  • Power efficiency comparison
  • Scalability
  • CPU HW design trend: old vs. new
• Configurations:
  • CPUs: Ivy Bridge/Haswell/Broadwell clients, Haswell EX server
  • Workload: fspin by Len Brown (CPU-bound, floating point)
  • Test case: inject idle from 0 to 50% in 5% increments
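The idle sweep in the test case could be scripted against the driver's thermal cooling device. The sketch assumes the intel_powerclamp driver is loaded and exposes a cooling device of type `intel_powerclamp` under /sys/class/thermal, whose `cur_state` is the requested idle percentage; the helper names are hypothetical, and writing `cur_state` requires root.

```python
import os

# Type string the intel_powerclamp driver registers for its thermal
# cooling device (assumed); cur_state is the requested idle percentage.
POWERCLAMP_TYPE = "intel_powerclamp"


def sweep_states(start=0, stop=50, step=5):
    """Idle percentages for the test case: 0 to 50% in 5% increments."""
    return list(range(start, stop + 1, step))


def find_powerclamp(root="/sys/class/thermal"):
    """Return the cooling device directory whose type is intel_powerclamp."""
    if not os.path.isdir(root):
        return None
    for name in sorted(os.listdir(root)):
        if not name.startswith("cooling_device"):
            continue
        try:
            with open(os.path.join(root, name, "type")) as f:
                if f.read().strip() == POWERCLAMP_TYPE:
                    return os.path.join(root, name)
        except OSError:
            continue  # unreadable entry: skip it
    return None


def set_idle_pct(dev_dir, pct):
    """Request pct% injected idle (needs root on a real system)."""
    with open(os.path.join(dev_dir, "cur_state"), "w") as f:
        f.write(str(pct))
```

On a machine with the driver loaded, one would call set_idle_pct(find_powerclamp(), pct) for each value from sweep_states(), run the fspin workload, and record power at each step (for example with turbostat), then write 0 to cur_state to stop injecting idle.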
Power and Performance Control V1 vs. V2
Power Efficiency Comparison On A Client Platform
Scalability Tests V1 vs. V2
(144-core, 4-socket Haswell EX)
Power Efficiency Comparison On A Server Platform
Comparing Deep vs. Shallow Package C-States
(PowerClamp V2)
Conclusions
• Idle injection can effectively reduce power beyond the energy-efficient frequency
• With deeper package C-states, idle injection can achieve near-linear performance and power reduction
• Scheduler runqueue throttling results in a cleaner and more efficient solution
• Aligning activities results in significant power savings
Future plan
• Better handling of interrupts
• Integration with the scheduler
• Synchronization with devices according to their latency tolerance
• Working with hardware duty cycling
Backups
Time Chart of Redesigned PowerClamp
Entering Idle Injection Period
Exiting Idle Injection