ICSME 2014, Victoria, BC, Canada

Exploration, Analysis, and Manipulation of Source Code with the srcML Infrastructure

Michael L. Collard, Department of Computer Science, The University of Akron, Ohio, USA
Jonathan I. Maletic, Department of Computer Science, Kent State University, Ohio, USA

Organization of Briefings
• Part I – Thursday 13:30-14:00
  – What is srcML – a brief overview
  – The srcML interface
  – Using srcML with XPath + extension functions
• Part II – Friday 13:30-14:00
  – libsrcml C API
  – srcSAX framework
  – Building custom analysis & manipulation tools

10/16/14 srcML.org

Support
• Supported in part by a grant from CNS 13-05292/05217
  – 3 years of current funding to enhance infrastructure
• ABB has also supported srcML

srcML Team
• Michael Collard
• Drew Guarnera
• Brian Kovacs
• Jonathan Maletic
• Michael Decker
• Brian Bartman
• Christian Newman
• Tessandra Sage

What is srcML?
• XML format that explicitly embeds structural information directly into the source text
• Markup is selective, at a high AST level (i.e., it stops at the expression level)
• No preprocessing, macro expansion, or template expansion is necessary
• No loss of text, formatting, or white space
• Code fragments can be represented as well-formed srcML

Source Code

    #include "rotate.h"

    // rotate three values
    void rotate(int& n1, int& n2, int& n3)
    {
        // copy original values
        int tn1 = n1, tn2 = n2, tn3 = n3;

        // move
        n1 = tn3;
        n2 = tn1;
        n3 = tn2;
    }

srcML

    <unit xmlns="http://www.sdml.info/srcML/src" xmlns:cpp="http://www.sdml.info/srcML/cpp" language="C" filename="rotate.c">
    <cpp:include>#<cpp:directive>include</cpp:directive> <cpp:file>"rotate.h"</cpp:file></cpp:include>

    <comment type="line">// rotate three values</comment>
    <function><type>void</type> <name>rotate</name><parameter_list>(<param><type>int&amp;</type> <name>n1</name></param>, <param><type>int&amp;</type> <name>n2</name></param>, <param><type>int&amp;</type> <name>n3</name></param>)</parameter_list>
    <block>{
        <comment type="line">// copy original values</comment>
        <decl_stmt><decl><type><name>int</name></type> <name>tn1</name> =<init> <expr><name>n1</name></expr></init>, <name>tn2</name> =<init> <expr><name>n2</name></expr></init>, <name>tn3</name> =<init> <expr><name>n3</name></expr></init></decl>;</decl_stmt>

        <comment type="line">// move</comment>
        <expr-stmt><expr><name>n1</name> = <name>tn3</name></expr>;</expr-stmt>
        <expr-stmt><expr><name>n2</name> = <name>tn1</name></expr>;</expr-stmt>
        <expr-stmt><expr><name>n3</name> = <name>tn2</name></expr>;</expr-stmt>
    }</block></function>
    </unit>

src to srcML Translation
• 1-to-1, onto mapping per file
• Preprocessor is not run (unless you want it to be)
• All text is retained (comments, whitespace, code)

srcML Elements
[Figure: srcML elements]

srcML.org
• Languages: C/C++, C#, Java
• Executables – Windows, Fedora, Mac OS X, and Ubuntu
• Source Code - GitHub
• Bug Reporting
• Documentation
• GPL

Implementation
• Parsing technology in C++ with ANTLR
• Uses libxml2 and libarchive
• Current speed: ~35 KLOC/second
• srcML to text size ratio: ~4.5 (~1.4 compressed)
• Allows for various input sources, e.g., directories, source archives (tar.gz, etc.)

Language Support
• C11, K&R C
• C++14, Qt extensions
• Java SE 8
• C# Standard ECMA-334

srcML Archive

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <unit xmlns="http://www.sdml.info/srcML/src">
    <unit xmlns:cpp="http://www.sdml.info/srcML/cpp" language="C#" filename="main.cs" hash="09d0…77f7">
    <!-- ... -->
    </unit>
    <unit xmlns:cpp="http://www.sdml.info/srcML/cpp" language="C" filename="rotate.c" hash="2380…26de">
    <!-- ... -->
    </unit>
    <!-- ... -->
    <unit xmlns:cpp="http://www.sdml.info/srcML/cpp" language="C" filename="rotate.h" hash="1e6e…1b35">
    <!-- ... -->
    </unit>
    </unit>

[Figure: srcML infrastructure diagram. Labels: Syntactic Differencing, Fact Extraction, Call-Graph Generation, Metrics, Infrastructure, Tools, Syntactic Querying, Transformation/Refactoring, srcSAX, Models, LINQ, XML Technologies, XPath, DOM, SAX2, C, C++, C#, Java, Parser, ANTLR, Corpus, srcML Format, Tutorials, Program Dependency, Extension Functions, Community, XSLT, UML, Slicing, Bug Tracing, Ad-hoc Models, srcML.org, Blog, Call Graph]

Applications of srcML
• Fact extraction, analysis, computing metrics
• Refactoring, Transformation
• Syntactic Differencing
• Slicing
• Reverse engineering UML class diagrams, method/class stereotypes
• C++ preprocessor analysis
• Reverse engineering C++ template parameter constraints

srcml 1.0
• New client srcml with C API libsrcml; combines the functionality of the current tools src2srcml and srcml2src
• Almost identical interface and options
• Freeze and version srcML tags
• Cross-linked documentation
• Multithreaded translation for large projects:

    % srcml linux-3.16.tar.xz -o linux-3.16.xml.gz

  – MacBook Air: ~7 minutes
  – Mac Pro 6 Core: ~2 minutes

srcML XPath
• Names of all functions that include a direct call to malloc():

    % srcml --xpath="//src:function[.//src:call/src:name='malloc']/src:name" linux-3.16.xml.gz -o function_names.xml

• Result: srcML Archive with a <unit> for each function name

srcPath Extension Functions
• private methods

    //src:function[src:is_static()]

• pointer variable declarations

    //src:decl[src:is_pointer()]

• statements in a function

    //src:function//src:statement()

Next Time…
• Part II – Friday 13:30-14:00
  – libsrcml C API
  – srcSAX framework
  – Building custom analysis & manipulation tools
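
To make the query from the srcML XPath slide concrete, here is a minimal sketch in Python using only the standard library. It is not how the srcml client evaluates `--xpath`. ElementTree supports only a subset of XPath, so the predicate `.//src:call/src:name='malloc'` is applied in a loop instead. The `SRCML` fragment is hand-written and simplified for illustration (an assumption, not real srcml output), and the helper names `functions_calling` and `to_source` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URI as shown on the srcML slides.
NS = {"src": "http://www.sdml.info/srcML/src"}

# Hand-written, simplified srcML fragment (assumption for illustration only;
# real srcml output carries more markup than this).
SRCML = """<unit xmlns="http://www.sdml.info/srcML/src" language="C" filename="example.c">
<function><type>void</type> <name>make_buffer</name><parameter_list>()</parameter_list>
<block>{
<expr-stmt><expr><call><name>malloc</name><argument_list>(<argument><expr>16</expr></argument>)</argument_list></call></expr>;</expr-stmt>
}</block></function>
<function><type>void</type> <name>rotate</name><parameter_list>()</parameter_list>
<block>{
<expr-stmt><expr><call><name>swap</name><argument_list>()</argument_list></call></expr>;</expr-stmt>
}</block></function>
</unit>"""


def functions_calling(srcml_text, callee):
    """Return the names of functions that contain a direct call to `callee`."""
    root = ET.fromstring(srcml_text)
    names = []
    for function in root.iterfind(".//src:function", NS):
        # Stand-in for the XPath predicate [.//src:call/src:name='malloc'].
        called = [n.text for n in function.iterfind(".//src:call/src:name", NS)]
        if callee in called:
            # The function's own name is its direct <name> child.
            names.append(function.find("src:name", NS).text)
    return names


def to_source(srcml_text):
    """Drop the markup; the concatenated text nodes are the source text."""
    return "".join(ET.fromstring(srcml_text).itertext())


print(functions_calling(SRCML, "malloc"))  # prints ['make_buffer']
```

`to_source` illustrates the "all text is retained" property: because the markup only wraps the original characters, concatenating the text nodes gives back the code.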