ICSME 2014
Victoria BC Canada
Exploration, Analysis, and Manipulation of
Source Code with the srcML Infrastructure
Michael L. Collard
Department of Computer Science
The University of Akron
Ohio, USA
Jonathan I. Maletic
Department of Computer Science
Kent State University
Ohio, USA
Organization of Briefings
•  Part I – Thursday 13:30-14:00
–  What is srcML – a brief overview
–  The srcML interface
–  Using srcML with XPath + extension functions
•  Part II – Friday 13:30-14:00
–  libsrcml C API
–  srcSAX framework
–  Building custom analysis & manipulation tools
10/16/14
srcML.org
Support
•  Supported in part by a grant from the NSF,
CNS 13-05292/05217
–  3 years of current funding to enhance
infrastructure
•  ABB has also supported srcML
srcML Team
•  Michael Collard
•  Drew Guarnera
•  Brian Kovacs
•  Jonathan Maletic
•  Michael Decker
•  Brian Bartman
•  Christian Newman
•  Tessandra Sage
What is srcML?
•  XML format that explicitly embeds
structural information directly into the
source text
•  Markup is selective at a high AST level
(i.e., stop at expression level)
•  No preprocessing, macro or template
expansion is necessary
•  No loss of text, formatting, white space
•  Code fragments can be represented as well-formed srcML
Source Code
#include "rotate.h"

// rotate three values
void rotate(int& n1, int& n2, int& n3)
{
    // copy original values
    int tn1 = n1, tn2 = n2, tn3 = n3;

    // move
    n1 = tn3;
    n2 = tn1;
    n3 = tn2;
}
srcML
<unit xmlns="http://www.sdml.info/srcML/src" xmlns:cpp="http://www.sdml.info/srcML/cpp" language="C"
filename="rotate.c">
<cpp:include>#<cpp:directive>include</cpp:directive> <cpp:file>"rotate.h"</cpp:file></cpp:include>

<comment type="line">// rotate three values</comment>
<function><type>void</type> <name>rotate</name>
<parameter_list>(<param><type>int&amp;</type> <name>n1</name></param>,
<param><type>int&amp;</type> <name>n2</name></param>,
<param><type>int&amp;</type> <name>n3</name></param>)</parameter_list>
<block>{
<comment type="line">// copy original values</comment>
<decl_stmt><decl><type><name>int</name></type> <name>tn1</name> =<init> <expr><name>n1</name></expr></init>, <name>tn2</name> =<init> <expr><name>n2</name></expr></init>, <name>tn3</name> =<init> <expr><name>n3</name></expr></init></decl>;</decl_stmt>

<comment type="line">// move</comment>
<expr-stmt><expr><name>n1</name> = <name>tn3</name></expr>;</expr-stmt>
<expr-stmt><expr><name>n2</name> = <name>tn1</name></expr>;</expr-stmt>
<expr-stmt><expr><name>n3</name> = <name>tn2</name></expr>;</expr-stmt>
}</block></function>
</unit>
src to srcML Translation
•  One-to-one, onto mapping per file
•  Preprocessor is not run (unless you want it to be)
•  All text is retained (comments, whitespace, code)
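The "all text is retained" property can be illustrated with a minimal sketch: because srcML only wraps the original characters in elements, concatenating the text nodes of a srcML document should give back the source. The fragment below is hand-written for illustration (it is not output copied from the srcml tool), and `srcml_to_text` is a hypothetical helper built on Python's standard library, not part of srcML.

```python
import xml.etree.ElementTree as ET

# Hand-written srcML fragment modeled on the rotate example (illustrative only).
SRCML = (
    '<unit xmlns="http://www.sdml.info/srcML/src" language="C++">'
    '<comment type="line">// move</comment>\n'
    '<expr_stmt><expr><name>n1</name> = <name>tn3</name></expr>;</expr_stmt>\n'
    '</unit>'
)


def srcml_to_text(srcml: str) -> str:
    """Drop all markup and keep only the text nodes, in document order."""
    root = ET.fromstring(srcml)
    return "".join(root.itertext())


if __name__ == "__main__":
    # Comments, whitespace, and code all come back unchanged.
    print(srcml_to_text(SRCML), end="")
```

The XML parser also decodes entities such as `&amp;`, so a type written as `int&amp;` in the markup comes back as `int&` in the recovered text.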
srcML Elements
srcML.org
•  Languages: C/C++, C#, Java
•  Executables
–  Windows, Fedora, Mac OSX, and Ubuntu
•  Source Code - GitHub
•  Bug Reporting
•  Documentation
•  GPL
Implementation
•  Parsing technology in C++ with ANTLR
•  Uses libxml2 and libarchive
•  Current speed: ~35 KLOC/second
•  Size ratio, srcML to text: ~4.5 (~1.4 compressed)
•  Allows for various input sources, e.g.,
directories, source archives (tar.gz, etc.)
Language Support
•  C11, K&R C
•  C++14, Qt extensions
•  Java SE 8
•  C# Standard ECMA-334
srcML Archive
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<unit xmlns="http://www.sdml.info/srcML/src">
<unit xmlns:cpp="http://www.sdml.info/srcML/cpp" language="C#" filename="main.cs" hash="09d0…77f7">
<!-- ... -->
</unit>
<unit xmlns:cpp="http://www.sdml.info/srcML/cpp" language="C" filename="rotate.c" hash="2380…26de">
<!-- ... -->
</unit>
<!-- ... -->
<unit xmlns:cpp="http://www.sdml.info/srcML/cpp" language="C" filename="rotate.h" hash="1e6e…1b35">
<!-- ... -->
</unit>
</unit>
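As a rough sketch of how such an archive might be consumed, the following reads the nested `<unit>` elements and lists each file with its language. It uses only Python's standard library; the archive string is a cut-down, hand-written version of the one above (hash attributes and unit contents omitted), and `list_units` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

SRC_NS = "http://www.sdml.info/srcML/src"

# Cut-down archive modeled on the slide; unit contents and hashes are omitted.
ARCHIVE = """<unit xmlns="http://www.sdml.info/srcML/src">
<unit xmlns:cpp="http://www.sdml.info/srcML/cpp" language="C#" filename="main.cs"></unit>
<unit xmlns:cpp="http://www.sdml.info/srcML/cpp" language="C" filename="rotate.c"></unit>
<unit xmlns:cpp="http://www.sdml.info/srcML/cpp" language="C" filename="rotate.h"></unit>
</unit>"""


def list_units(archive_xml: str):
    """Return (filename, language) for every file-level unit in the archive."""
    root = ET.fromstring(archive_xml)
    # Direct children of the outer <unit> are the per-file units.
    return [
        (unit.get("filename"), unit.get("language"))
        for unit in root.findall(f"{{{SRC_NS}}}unit")
    ]


if __name__ == "__main__":
    for filename, language in list_units(ARCHIVE):
        print(f"{filename}\t{language}")
```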
[Figure: overview diagram of the srcML infrastructure. Labels recovered from the diagram: srcML Format; Parser ANTLR; C; C++; C#; Java; XML Technologies; XPath; XSLT; DOM; SAX2; LINQ; srcSAX; Extension Functions; Infrastructure Tools; Syntactic Querying; Syntactic Differencing; Transformation/Refactoring; Fact Extraction; Metrics; Slicing; Call-Graph Generation; Call Graph; Program Dependency; Bug Tracing; Models; Ad-hoc Models; UML; srcML.org; Corpus; Tutorials; Community; Blog]
Applications of srcML
•  Fact extraction, analysis, computing metrics
•  Refactoring, Transformation
•  Syntactic Differencing
•  Slicing
•  Reverse engineering UML class diagrams, method/class
stereotypes
•  C++ preprocessor analysis
•  Reverse engineering C++ template parameter
constraints
srcml 1.0
•  New client srcml with C API libsrcml,
combines functionality of current tools
src2srcml and srcml2src
•  Almost identical interface and options
•  Freeze and version srcML tags
•  Cross-linked documentation
•  Multithreaded translation for large projects:
% srcml linux-3.16.tar.xz -o linux-3.16.xml.gz
•  MacBook Air: ~7 minutes
•  Mac Pro 6 Core: ~2 minutes
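For scripted use, the translation command above could be wrapped as in the following sketch. It relies only on the positional input and the -o option shown on this slide; `build_srcml_command` and `run_srcml` are hypothetical helper names, and the run step assumes a srcml executable is installed and on the PATH.

```python
import shutil
import subprocess


def build_srcml_command(source: str, output: str) -> list:
    """Build the argv list for: srcml <source> -o <output>."""
    return ["srcml", source, "-o", output]


def run_srcml(source: str, output: str) -> None:
    """Run the translation; assumes the srcml client is installed."""
    if shutil.which("srcml") is None:
        raise FileNotFoundError("srcml executable not found on PATH")
    subprocess.run(build_srcml_command(source, output), check=True)


if __name__ == "__main__":
    # Only prints the command; call run_srcml(...) to actually translate.
    print(" ".join(build_srcml_command("linux-3.16.tar.xz", "linux-3.16.xml.gz")))
```

Passing the arguments as a list avoids shell quoting problems, which matters more once XPath expressions with brackets and quotes are added to the command line.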
srcML XPath
•  Names of all functions that include a direct call to
malloc():
% srcml --xpath="//src:function[.//src:call/src:name='malloc']/src:name" linux-3.16.xml.gz -o function_names.xml
•  Result: srcML Archive with <unit> for
each function name
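To make the logic of the query concrete, here is a sketch that mimics it in plain Python on a small hand-written srcML fragment. It does not use srcml's own XPath engine: Python's ElementTree supports only a subset of XPath, so the predicate is written as ordinary code. The sample markup and the function names in it (`make_buffer`, `no_alloc`) are made up for illustration.

```python
import xml.etree.ElementTree as ET

NS = {"src": "http://www.sdml.info/srcML/src"}

# Hand-written, simplified srcML for two C functions (illustrative only).
SAMPLE = """<unit xmlns="http://www.sdml.info/srcML/src" language="C">
<function><type><name>void</name></type> <name>make_buffer</name><parameter_list>()</parameter_list> <block>{ <expr_stmt><expr><call><name>malloc</name><argument_list>(<argument><expr>16</expr></argument>)</argument_list></call></expr>;</expr_stmt> }</block></function>
<function><type><name>void</name></type> <name>no_alloc</name><parameter_list>()</parameter_list> <block>{ }</block></function>
</unit>"""


def functions_calling(root, callee):
    """Names of functions containing a direct call to `callee`."""
    names = []
    for function in root.iter("{%s}function" % NS["src"]):
        # Collect the name of every call anywhere inside this function.
        called = {
            call.findtext("src:name", namespaces=NS)
            for call in function.iterfind(".//src:call", NS)
        }
        if callee in called:
            # The direct <name> child of <function> is the function's name.
            names.append(function.findtext("src:name", namespaces=NS))
    return names


if __name__ == "__main__":
    print(functions_calling(ET.fromstring(SAMPLE), "malloc"))
```

The same structure (select functions, test their descendant calls, return the function's own name) is what the single XPath expression states declaratively.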
srcPath Extension Functions
•  private methods
//src:function[src:is_static()]
•  pointer variable declarations
//src:decl[src:is_pointer()]
•  statements in a function
//src:function//src:statement()
Next Time…
•  Part II – Friday 13:30-14:00
–  libsrcml C API
–  srcSAX framework
–  Building custom analysis & manipulation tools