Practical Static Analysis of JavaScript Applications in the Presence of Libraries and Frameworks
Magnus Madsen
Benjamin Livshits
Michael Fanning
Outline
Motivation
• auto-complete
• call graph discovery
• capability + API usage
Challenges
• large & complex libraries
• native libraries
• insufficiency of stubs
Technique
• pointer analysis
• use analysis
Evaluation
• improved auto-complete
• improved call graph resolution
• soundness & completeness
Motivation
Windows 8:
– JavaScript is an officially supported language
– .NET library bindings exposed to JavaScript
Q: How can we use static analysis to improve
the development experience?
The Challenge
Modern JavaScript applications are often built
using large and complex libraries:
– Browser API, Win8 API, NodeJS, PhoneGap, ...
– Problems: Reflection? native code? sheer size?
– But: We really only care about the application!
A pragmatic choice: We ignore the libraries (thus
sacrificing soundness) to focus on the applications
themselves
Practical Applications
(which do not require soundness)
• auto-complete
• call graph discovery
• capability usage
• API usage
Example of capability usage:
<Capabilities>
<Capability name="internetClient"/>
<Capability name="picturesLibrary"/>
<DeviceCapability name="location"/>
<DeviceCapability name="microphone"/>
<DeviceCapability name="webcam"/>
</Capabilities>
Example of API usage:
• Windows.Devices.Sensors
• Windows.Devices.Sms
• Windows.Graphics.Display
• Windows.Graphics.Printing
• Windows.Media.Capture
• Windows.Networking.Sockets
• Windows.Storage.Search
Win8 & Web Applications
[Figure: library stacks of a Web App (Builtin, DOM, jQuery, …) and a Windows 8 App (Builtin, DOM, WinJS, Win8), annotated "3000 functions"]
Introducing Use Analysis
• elm flows into reset
• elm flows into playVideo
• elm must have: muted and play
• elm must have: pause
Conclusion: elm is an HTMLVideoElement
Use Analysis: determines what an object is based on how it is used
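The idea can be illustrated with a minimal Python sketch, assuming the uses of an object have already been collected as a set of property names. The type table is a hypothetical, hand-picked subset of DOM property names (not the real API surface), and `infer_types` is an illustrative name, not part of the authors' tool.

```python
# Hypothetical, hand-picked subset of property names per library type
# (NOT the real DOM API surface).
LIBRARY_TYPES: dict[str, set[str]] = {
    "HTMLVideoElement": {"muted", "play", "pause", "poster", "videoWidth"},
    "HTMLCanvasElement": {"getContext", "width", "height", "toDataURL"},
    "HTMLInputElement": {"value", "checked", "focus", "select"},
}


def infer_types(used_props: set[str]) -> list[str]:
    """Return the library types that have every property name the object is used with."""
    return sorted(name for name, props in LIBRARY_TYPES.items()
                  if used_props <= props)


# Property names used on `elm` in the functions it flows into (reset, playVideo).
elm_uses = {"muted", "play", "pause"}
print(infer_types(elm_uses))  # ['HTMLVideoElement']
```

With the full DOM, these three uses alone would also match other media elements such as HTMLAudioElement, so additional uses are needed to narrow the result further.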
Outline
Motivation
• call graph discovery
• API + capability usage
• auto-complete
Challenges
• large & complex libraries
• native libraries
• insufficiency of stubs
Technique
• pointer analysis
• use analysis
Evaluation
• improved call graph resolution
• improved auto-complete
• running time
Heap Partitioning
Application Heap
"Symbolic Heap"
Library Heap
Symbolic Objects and Unification
1. Introduce symbolic objects where flow is
dead (i.e. missing) due to libraries.
2. Collect information about where the
symbolic objects flow and how they are
used.
3. Unify symbolic objects with "compatible"
application or library objects.
Example: Iteration 1
We discover that c is a dead return
Example: Iteration 2
We introduce a symbolic return object
Example: Iteration 3
We unify the symbolic object with
the HTMLCanvasElement
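The three steps and the canvas example can be sketched in Python as a single pass over hypothetical facts for app code that obtains a canvas through a library call and then uses it. The `AbstractObject` model, the `kind` tags, and all names are assumptions for illustration. Compatibility here means every used property name is present (the ∀ strategy), and a full analysis would re-run the pointer analysis after unifying rather than stop after one pass.

```python
from dataclasses import dataclass, field


@dataclass
class AbstractObject:
    """An abstract heap object; `kind` is "app", "library" or "symbolic"."""
    name: str
    kind: str
    props: set[str] = field(default_factory=set)


# Hypothetical facts for app code such as:
#   var c = document.getElementById("canvas");  // library call, not analyzed
#   var ctx = c.getContext("2d");
#   c.width = 640;
points_to: dict[str, set[str]] = {"c": set()}  # no targets: c receives a dead return
dead_returns = ["c"]
uses = [("c", "getContext"), ("c", "width")]   # (variable, property name) pairs

# Library-side objects with simplified, hypothetical property lists.
heap = {
    "HTMLCanvasElement": AbstractObject(
        "HTMLCanvasElement", "library", {"getContext", "width", "height"}),
    "HTMLVideoElement": AbstractObject(
        "HTMLVideoElement", "library", {"play", "pause", "width", "height"}),
}

# Step 1: introduce a symbolic object for every dead return.
for var in dead_returns:
    sym = AbstractObject(f"sym_{var}", "symbolic")
    heap[sym.name] = sym
    points_to[var].add(sym.name)

# Step 2: record which property names each symbolic object is used with.
for var, prop in uses:
    for name in points_to[var]:
        if heap[name].kind == "symbolic":
            heap[name].props.add(prop)

# Step 3: unify each symbolic object with every non-symbolic object that has
# all of its used property names (the ∀ strategy).
for var in dead_returns:
    for name in list(points_to[var]):
        sym = heap[name]
        if sym.kind != "symbolic":
            continue
        for obj in heap.values():
            if obj.kind != "symbolic" and sym.props <= obj.props:
                points_to[var].add(obj.name)

print(sorted(points_to["c"]))  # ['HTMLCanvasElement', 'sym_c']
```

HTMLVideoElement is rejected because, in this table, it lacks getContext even though it shares width.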
Missing Flow
Where can dataflow be missing when the library code is ignored?
– Dead Returns
– Dead Arguments
– Dead Loads
– Dead Prototypes
– Dead Array Accesses
Unification Strategies
Unification strategies based on property names:
– ∃: at least one shared property name
– ∀: all property names shared
– ∀: all property names shared, but prioritize prototype objects
[Figure: a symbolic object and application objects compared by their property names x, y, z]
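The three strategies can be compared with a small Python sketch. The candidate table, the object names A, B, C, and the reading of "prioritize prototype objects" as "prefer prototype objects among the ∀ matches, falling back to all ∀ matches" are assumptions, not the authors' exact definitions.

```python
# Candidates map an object name to (property names, is it a prototype object?).
Candidates = dict[str, tuple[set[str], bool]]


def unify_exists(sym_props: set[str], cands: Candidates) -> list[str]:
    """∃: objects sharing at least one property name with the symbolic object."""
    return [n for n, (props, _) in cands.items() if sym_props & props]


def unify_forall(sym_props: set[str], cands: Candidates) -> list[str]:
    """∀: objects that have every property name used on the symbolic object."""
    return [n for n, (props, _) in cands.items() if sym_props <= props]


def unify_forall_proto(sym_props: set[str], cands: Candidates) -> list[str]:
    """∀ with priority: keep only prototype objects among the ∀ matches, if any."""
    matches = unify_forall(sym_props, cands)
    protos = [n for n in matches if cands[n][1]]
    return protos or matches


# Hypothetical objects using the property names x, y, z from the slide's figure.
cands: Candidates = {
    "A": ({"x"}, False),
    "B": ({"x", "y"}, False),
    "C": ({"x", "y", "z"}, True),
}
sym = {"x", "y"}  # property names used on the symbolic object
print(unify_exists(sym, cands))        # ['A', 'B', 'C']
print(unify_forall(sym, cands))        # ['B', 'C']
print(unify_forall_proto(sym, cands))  # ['C']
```

∃ is the most permissive and unifies with A even though A lacks y; ∀ filters A out; the prototype preference then breaks the remaining tie.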
Benchmarks
25 Windows 8 Apps:
Average ~1,500 lines of code
Approx. 30,000 lines of stubs
Call Graph Resolution
[Chart: call graph resolution, Pointer Analysis vs. Pointer Analysis + Use Analysis]
A call site is resolved if it has a non-empty set of call targets.
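The resolution metric is simple to compute once each call site's target set is known. A minimal Python sketch; `resolution_rate` is a hypothetical helper and the call sites are made up, not data from the evaluation.

```python
def resolution_rate(call_targets: dict[str, set[str]]) -> float:
    """Fraction of call sites whose set of call targets is non-empty."""
    if not call_targets:
        return 0.0
    resolved = sum(1 for targets in call_targets.values() if targets)
    return resolved / len(call_targets)


# Made-up call sites, NOT data from the evaluation.
pointer_only = {"c1": {"init"}, "c2": set(), "c3": set(), "c4": {"draw"}}
with_use = {"c1": {"init"}, "c2": {"getContext"}, "c3": set(), "c4": {"draw"}}
print(resolution_rate(pointer_only))  # 0.5
print(resolution_rate(with_use))      # 0.75
```

Note that this metric counts a call site as resolved regardless of whether its targets are correct; the manual inspection of call sites addresses that separately.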
Auto-complete
• We compared our technique to the
auto-complete in four popular IDEs:
– Eclipse for JavaScript developers
– IntelliJ IDEA 11
– Visual Studio 2010
– Visual Studio 2012
• In all cases where libraries were involved,
our technique was an improvement
Auto-complete: Case study
[Table: auto-complete case study results]
Soundness & Completeness
Use Analysis is inherently unsound:
– library code is not analyzed
– library code could have arbitrary side-effects
An example of unsoundness
...
An example of incompleteness:
results of manual (human)
inspection of 200 call sites
Findings
• Auto-completion is improved compared to
four popular IDEs
• Use analysis improves call graph resolution
• In practice, unsoundness is limited
• Reasonable analysis time: median of 10 s for apps averaging 1,500 lines of code
Summary
Pointer analysis + Use analysis:
– A technique to statically reason about JavaScript
applications which rely on large and complex libraries
without analyzing the libraries themselves
Practical applications:
– auto-complete
– API usage
– capability discovery
– call graph construction
Thank You
Architecture
[Diagram components: JavaScript Application, App Facts, Analysis Rules, Pointer Analysis, Use Analysis; use analysis feeds back by introducing new facts]
Datalog Formulation
We define the following domains:
𝑽 – variables
𝑯 – heap-allocated objects
𝑷 – property names
𝑪 – call sites
𝒁 – integers (e.g. argument offsets)
Based on Gatekeeper (Livshits et al. 2009)
Pointer Analysis
PointsTo(v, h) :- NewObj(v, h, _).
PointsTo(v1, h) :- Assign(v1, v2), PointsTo(v2, h).
PointsTo(v2, h2) :- Load(v2, v1, p), PointsTo(v1, h1),
    HeapPtsTo(h1, p, h2).
HeapPtsTo(h1, p, h2) :- Store(v1, p, v2), PointsTo(v1, h1),
    PointsTo(v2, h2).
HeapPtsTo(h1, p, h3) :- Prototype(h1, h2),
    HeapPtsTo(h2, p, h3).
CallGraph(c, f) :- ActualArg(c, 0, v), PointsTo(v, f).
Assign(v1, v2) :- CallGraph(c, f), FormalArg(f, i, v1),
    ActualArg(c, i, v2), i > 0.
Assign(v2, v1) :- CallGraph(c, f), FormalRet(f, v1),
    ActualRet(c, v2).
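To make the pointer-analysis rules concrete, here is a naive bottom-up evaluation in Python that re-applies every rule until no new facts appear. It assumes facts are stored as tuples in a dict keyed by relation name, that function objects are heap objects, that argument index 0 is the callee, and that a call's result variable receives the callee's return variable. It does no semi-naive evaluation and is only a sketch, not the authors' Datalog engine; `solve` and the example facts are hypothetical.

```python
from collections import defaultdict


def solve(facts: dict[str, set[tuple]]) -> dict[str, set[tuple]]:
    """Naively re-apply every pointer-analysis rule until no new facts appear."""
    pts: set[tuple] = {(v, h) for v, h, _ in facts["NewObj"]}
    heap: set[tuple] = set()
    cg: set[tuple] = set()
    assign: set[tuple] = set(facts["Assign"])

    while True:
        size = (len(pts), len(heap), len(cg), len(assign))
        by_var = defaultdict(set)  # v -> {h | PointsTo(v, h)}
        for v, h in pts:
            by_var[v].add(h)

        # PointsTo(v1, h) :- Assign(v1, v2), PointsTo(v2, h).
        new_pts = {(v1, h) for v1, v2 in assign for h in by_var[v2]}
        # PointsTo(v2, h2) :- Load(v2, v1, p), PointsTo(v1, h1), HeapPtsTo(h1, p, h2).
        new_pts |= {(v2, h2) for v2, v1, p in facts["Load"] for h1 in by_var[v1]
                    for g, q, h2 in heap if g == h1 and q == p}
        # HeapPtsTo(h1, p, h2) :- Store(v1, p, v2), PointsTo(v1, h1), PointsTo(v2, h2).
        new_heap = {(h1, p, h2) for v1, p, v2 in facts["Store"]
                    for h1 in by_var[v1] for h2 in by_var[v2]}
        # HeapPtsTo(h1, p, h3) :- Prototype(h1, h2), HeapPtsTo(h2, p, h3).
        new_heap |= {(h1, p, h3) for h1, h2 in facts["Prototype"]
                     for g, p, h3 in heap if g == h2}
        # CallGraph(c, f) :- ActualArg(c, 0, v), PointsTo(v, f).
        new_cg = {(c, f) for c, i, v in facts["ActualArg"] if i == 0
                  for f in by_var[v]}
        # Arguments: formal v1 := actual v2 for argument index i > 0.
        new_assign = {(v1, v2) for c, f in cg
                      for g, i, v1 in facts["FormalArg"] if g == f and i > 0
                      for d, j, v2 in facts["ActualArg"] if d == c and j == i}
        # Returns: call-site result v2 := callee return variable v1.
        new_assign |= {(v2, v1) for c, f in cg
                       for g, v1 in facts["FormalRet"] if g == f
                       for d, v2 in facts["ActualRet"] if d == c}

        pts |= new_pts
        heap |= new_heap
        cg |= new_cg
        assign |= new_assign
        if (len(pts), len(heap), len(cg), len(assign)) == size:
            return {"PointsTo": pts, "HeapPtsTo": heap,
                    "CallGraph": cg, "Assign": assign}


# Hypothetical program (JavaScript shown in comments):
#   var f = function (x) { return x; };   // function object h_id
#   var o = {};
#   var r = f(o);                         // call site c1
#   r.p = o;  var s = r.p;
facts = {
    "NewObj": {("o", "h_o", "main"), ("f", "h_id", "main")},  # 3rd field unused
    "Assign": set(),
    "Load": {("s", "r", "p")},
    "Store": {("r", "p", "o")},
    "Prototype": set(),
    "ActualArg": {("c1", 0, "f"), ("c1", 1, "o")},
    "FormalArg": {("h_id", 1, "x")},
    "ActualRet": {("c1", "r")},
    "FormalRet": {("h_id", "x")},
}
out = solve(facts)
print(sorted(out["PointsTo"]))
# [('f', 'h_id'), ('o', 'h_o'), ('r', 'h_o'), ('s', 'h_o'), ('x', 'h_o')]
```

Because every relation only grows and the domains are finite, the loop is guaranteed to terminate.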
Example: Dead Returns and Dead Arguments
DeadRet(c, v) :- CallGraph(c, f),
ActualRet(c, v),
!ResolvedVar(v),
!AppAlloc(f).
DeadArg(f, i) :- FormalArg(f, i, v),
!ResolvedVar(v),
AppAlloc(f).
...
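The two rules can be evaluated directly once the pointer analysis has finished, since the negated relations must be fully computed first (stratified negation). A minimal Python sketch; it assumes ResolvedVar(v) means "v has a non-empty points-to set" and AppAlloc(f) means "f is allocated in application code", and all facts and function names are hypothetical.

```python
def dead_returns(call_graph: set[tuple[str, str]],
                 actual_ret: set[tuple[str, str]],
                 resolved_vars: set[str],
                 app_alloc: set[str]) -> set[tuple[str, str]]:
    """DeadRet(c, v) :- CallGraph(c, f), ActualRet(c, v), !ResolvedVar(v), !AppAlloc(f)."""
    return {(c, v) for c, f in call_graph for d, v in actual_ret
            if d == c and v not in resolved_vars and f not in app_alloc}


def dead_args(formal_arg: set[tuple[str, int, str]],
              resolved_vars: set[str],
              app_alloc: set[str]) -> set[tuple[str, int]]:
    """DeadArg(f, i) :- FormalArg(f, i, v), !ResolvedVar(v), AppAlloc(f)."""
    return {(f, i) for f, i, v in formal_arg
            if v not in resolved_vars and f in app_alloc}


# Hypothetical facts: c1 calls a library function, c2 calls an app function.
call_graph = {("c1", "lib_getElementById"), ("c2", "h_helper")}
actual_ret = {("c1", "c"), ("c2", "t")}
formal_arg = {("h_onClick", 1, "evt"), ("h_helper", 1, "y")}
resolved_vars = {"y"}                   # variables with a non-empty points-to set
app_alloc = {"h_onClick", "h_helper"}   # functions allocated in application code

print(dead_returns(call_graph, actual_ret, resolved_vars, app_alloc))  # {('c1', 'c')}
print(dead_args(formal_arg, resolved_vars, app_alloc))               # {('h_onClick', 1)}
```

`t` is unresolved but is not a dead return, because its callee is application code that the pointer analysis already sees; `evt` is a dead argument, as for an event handler invoked only by the library.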