Slides

LLVM Inliner Enhancement
Jiangning Liu
Kevin Qin
1
Background
• Carefully tuned for a large scope of real
applications.
• Some missing opportunities for typical
computation intensive benchmarks.
2
Performance Gain and Code Size Bloat for SPEC2000
After Using -inline-threshold=1000
108.00%
80.00%
106.00%
70.00%
104.00%
60.00%
sweet spots
102.00%
50.00%
100.00%
40.00%
98.00%
30.00%
96.00%
20.00%
94.00%
92.00%
10.00%
90.00%
0.00%
Performance
Code Size
3
Heuristic Rules
• Performance
(A) 2X threshold for callee inside loop.
(B) 2X threshold for callee with const argument.
(C) 4X threshold for callee inside loop with <= 3BB.
• Code Size
(D) Reduce threshold to 225 for *cold* callees.
4
SPEC2000/2006 Performance
Change on AArch64 (r226173)
SPEC2000/2006 Code Size
Change for AArch64 (r226173)
106.00%
40.00%
105.00%
35.00%
104.00%
30.00%
103.00%
25.00%
102.00%
20.00%
101.00%
15.00%
100.00%
10.00%
99.00%
5.00%
98.00%
0.00%
A
A+B
A+B+C
A
A+B
A+B+C
5
SPEC2000 Performance Gain and Code Size Bloat
after applying heuristic rule A+B+C+D (r226173)
106.00%
100.00%
104.00%
80.00%
102.00%
60.00%
100.00%
98.00%
40.00%
96.00%
20.00%
94.00%
0.00%
92.00%
90.00%
-20.00%
Performance
Performance trend of threshold 1000 (*)
Codesize
Codesize of threshold 1000
* Show trend only! The x axis positions don’t 1:1 match with the names in this chart!
6
Compile Time
• Loop Info Analysis is expensive
• Fix A->B->C issue
– Early exit
– Choose A->B->C rather than B->D
llvm bootstrap time
(r226173)
180
A
B
C
Minutes on single x86 core
175
170
165
160
155
150
D
145
Original Patch Patch
without with
ABC fix ABC fix
7
Trade-off between performance and codesize
for SPEC2000 (r226173)
101.40%
8.00%
7.00%
101.20%
6.00%
101.00%
5.00%
100.80%
4.00%
3.00%
100.60%
2.00%
100.40%
1.00%
100.20%
0.00%
A
A+B
Perf Geonmean
A+B+C
A+B+C+D
Codesize Total
8
Current Status
• Patch is under review in community.
9
Thank you!
10
Challenges
• Trade-off
– Performance gain
– Code size bloat
– Compile-time slowdown
11
SPEC2006 Performance Gain and Code Size Bloat
after applying heuristic rule A+B+C+D (r226173)
106.00%
100.00%
104.00%
80.00%
102.00%
60.00%
100.00%
98.00%
40.00%
96.00%
20.00%
94.00%
0.00%
92.00%
90.00%
-20.00%
Performance
Performance of threshold 1000 (*)
Code Size
Codesize of threshold 1000
12