LLVM Inliner Enhancement
Jiangning Liu, Kevin Qin

Background
• Carefully tuned for a large scope of real applications.
• Some missing opportunities for typical computation intensive benchmarks.

[Figure: Performance Gain and Code Size Bloat for SPEC2000 After Using -inline-threshold=1000. Series: Performance, Code Size. Annotation: sweet spots.]

Heuristic Rules
• Performance
  (A) 2X threshold for callee inside loop.
  (B) 2X threshold for callee with const argument.
  (C) 4X threshold for callee inside loop with <= 3BB.
• Code Size
  (D) Reduce threshold to 225 for *cold* callees.

[Figure: SPEC2000/2006 Performance Change on AArch64 (r226173). x axis: A, A+B, A+B+C.]
[Figure: SPEC2000/2006 Code Size Change for AArch64 (r226173). x axis: A, A+B, A+B+C.]

[Figure: SPEC2000 Performance Gain and Code Size Bloat after applying heuristic rule A+B+C+D (r226173). Series: Performance, Performance trend of threshold 1000 (*), Codesize, Codesize of threshold 1000.]
* Show trend only! The x axis positions don’t 1:1 match with the names in this chart!

Compile Time
• Loop Info Analysis is expensive
• Fix A->B->C issue
  – Early exit
  – Choose A->B->C rather than B->D
[Diagram: nodes A, B, C, D.]
[Figure: llvm bootstrap time (r226173), minutes on single x86 core. Bars: Original, Patch without ABC fix, Patch with ABC fix.]

[Figure: Trade-off between performance and codesize for SPEC2000 (r226173). x axis: A, A+B, A+B+C, A+B+C+D. Series: Perf Geomean, Codesize Total.]

Current Status
• Patch is under review in community.

Thank you!
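The heuristic rules (A) to (D) can be illustrated with a small model of how a per-call-site threshold might be derived. This is only a sketch, not the patch itself: the `CallSite` type, its field names, and `inline_threshold` are hypothetical, and the slides do not say how the rules combine. The sketch assumes a base threshold of 225, that rule (C) supersedes rule (A) instead of stacking with it, that rule (B) multiplies with either, and that rule (D) caps the result at 225.

```python
from dataclasses import dataclass

# Assumption: the base threshold equals the 225 named in rule (D).
BASE_THRESHOLD = 225
# Rule (D): cold callees get a threshold of 225.
COLD_THRESHOLD = 225


@dataclass
class CallSite:
    """Hypothetical summary of one call site; not an LLVM data structure."""
    in_loop: bool = False
    has_const_arg: bool = False
    callee_basic_blocks: int = 1
    callee_is_cold: bool = False


def inline_threshold(cs: CallSite, base: int = BASE_THRESHOLD) -> int:
    """Return the inline threshold for a call site under the assumed rule combination."""
    threshold = base
    if cs.in_loop:
        # Rule (C): 4X for a callee in a loop with <= 3 basic blocks.
        # Rule (A): 2X for any other callee in a loop.
        # Assumption: (C) replaces (A) and the two do not multiply together.
        threshold *= 4 if cs.callee_basic_blocks <= 3 else 2
    if cs.has_const_arg:
        # Rule (B): 2X for a callee with a constant argument.
        # Assumption: (B) stacks with (A) or (C).
        threshold *= 2
    if cs.callee_is_cold:
        # Rule (D): reduce the threshold to 225 for cold callees.
        threshold = min(threshold, COLD_THRESHOLD)
    return threshold
```

Under these assumptions, a call site inside a loop whose callee has two basic blocks and takes a constant argument gets 225 × 4 × 2 = 1800, while the same call site with a cold callee stays at 225.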
Challenges
• Trade-off
  – Performance gain
  – Code size bloat
  – Compile-time slowdown

[Figure: SPEC2006 Performance Gain and Code Size Bloat after applying heuristic rule A+B+C+D (r226173). Series: Performance, Performance of threshold 1000 (*), Code Size, Codesize of threshold 1000.]