Co-evolution of Infrastructure and Source Code - An Empirical Study

Yujuan Jiang, Bram Adams
MCIS lab, Polytechnique Montreal, Canada
Email: {yujuan.jiang,bram.adams}@polymtl.ca

Abstract—Infrastructure-as-code automates the process of configuring and setting up the environment (e.g., servers, VMs and databases) in which a software system will be tested and/or deployed, through textual specification files in a language like Puppet or Chef. Since the environment is instantiated automatically by the infrastructure languages' tools, no manual intervention is necessary apart from maintaining the infrastructure specification files. The amount of work involved with such maintenance, as well as the size and complexity of infrastructure specification files, have not yet been studied empirically. Through an empirical study of the version control system of 265 OpenStack projects, we find that infrastructure files are large and churn frequently, which could indicate a potential for introducing bugs. Furthermore, we found that the infrastructure code files are coupled tightly with the other files in a project, especially test files, which implies that testers often need to change infrastructure specifications when making changes to the test framework and tests.

I. INTRODUCTION

Infrastructure-as-code (IaC) is a practice to specify and automate the environment in which a software system will be tested and/or deployed [1]. For example, instead of having to manually configure the virtual machine on which a system should be deployed with the right versions of all required libraries, one just needs to specify the requirements for the VM once, after which tools automatically apply this specification to generate the VM image. Apart from automation, the fact that the environment is specified explicitly means that the same environment will be deployed everywhere, ruling out inconsistencies. The suffix "as-code" in IaC refers to the fact that the specification files for this infrastructure are developed in a kind of programming language, like regular source code, and hence can be (and are) versioned in a version control system.

Puppet [2] and Chef [3] are two of the most popular infrastructure languages. They are both designed to manage deployments on servers, cloud environments and/or virtual machines, and can be customized via plug-ins to adapt to one's own working environment. Both feature a domain-specific language syntax that even non-programmers can understand.

The fact that IaC requires a new kind of source code files to be developed and maintained in parallel to source code and test code rings some alarm bells. Indeed, in some respects IaC plays a similar role as the build system, which consists of scripts in a special programming language such as GNU Make or Ant that specify how to compile and package the source code. McIntosh et al. [4] have shown how build system files have a high relative churn (i.e., amount of code change) and a high coupling with source code and test files, which means that developers and testers need to invest a certain effort to maintain the build system files as the code and tests evolve. Based on these findings, we conjecture that IaC could run similar risks and generate similar maintenance overhead as regular build scripts.

In order to validate this conjecture, we perform an empirical case study on 265 OpenStack projects. OpenStack is an ecosystem of projects implementing a cloud platform, which requires substantial IaC to support deployment and tests on virtual machines.
The study replicates the analysis of McIntosh et al. [4], this time to study the co-evolution relationship between the IaC files and the other categories of files in a project, i.e., source code, test code, and build scripts. To get a better idea of the size and change frequency of IaC code, we first address the following three preliminary questions.

PQ1) How many infrastructure files does a project have? Projects with multiple IaC files have more IaC files than build files (median of 11.11% of their files). Furthermore, the size of infrastructure files is in the same ballpark as that of production and test files, and larger than that of build files.

PQ2) How many infrastructure files change per month? 28% of the infrastructure files in the projects changed per month, which is as frequent as for production files, and significantly more than for build and test files.

PQ3) How large are infrastructure system changes? The churn of infrastructure files is comparable to that of build files and significantly different from the other file categories. Furthermore, the infrastructure files have the highest churn per file (MCF) value among the four file categories.

Based on the preliminary analysis results, we then address the following research questions:

RQ1) How tight is the coupling between infrastructure code and other kinds of code? Although fewer commits change infrastructure files than the other file categories, the changes to IaC files are tightly coupled with changes to Test and Production files. Furthermore, the most common reasons for coupling between infrastructure and test are "Integration" of new test cases and "Updates" of test configuration in infrastructure files.

RQ2) Who changes infrastructure code? Developers changing the infrastructure take up the lowest proportion among all developers, and they are not dedicated to IaC alone, since they also work on production and test files.

II. BACKGROUND AND RELATED WORK

A. Infrastructure as Code

IaC (Infrastructure as Code) makes it possible to manage the configuration of the environment on which a system needs to be deployed via specifications similar to source code. A dedicated programming language allows one to specify the environment, such as required libraries or servers, or the amount of RAM memory or CPU speed for a VM. The resulting files can be versioned and changed like any other kind of source code. This practice turns the tedious manual procedure of spinning up a new virtual environment or updating a new version of the environment (from the low-level operating system installed all the way up to the concrete application stack) into the simple click of executing a script. This automation and simplification helps shorten the release and test cycle, and reduces the potential for human error.

Currently, Puppet and Chef are the two most popular infrastructure languages. They allow developers to define a series of environment parameters and to provide deployment configuration. They can define functions and classes, and allow users to customize their own Ruby plug-ins according to their specific requirements.

Fig. 1: Code snippets of Puppet and Chef.

# Puppet snippet
case $platform {
  'ubuntu': {
    package {'httpd-v1':
      ensure => "2.4.12"
    }
  }
  'centOS': {
    package {'httpd-v2':
      ensure => "2.2.29"
    }
  }
}

# Chef snippet
case node[:platform]
when "ubuntu"
  package "httpd-v1" do
    version "2.4.12"
    action :install
  end
when "centOS"
  package "httpd-v2" do
    version "2.2.29"
    action :install
  end
end
Figure 1 shows two code snippets of Puppet and Chef that realize the same functionality, i.e., initializing an httpd server on two different platforms (each platform requires a different version of the server).

B. Related Work

Our work replicates the work of McIntosh et al. [4], who empirically studied the build system in large open source projects and found that the build system is coupled tightly with the source code and test files. In their work, they classify files into three different categories: "Build", "Production", and "Test". They also studied the ownership of build files to look into who spent the most effort maintaining these files. They observed different build ownership patterns in different projects: in Linux and Git a small team of build engineers performs most of the build maintenance, while in Jazz most developers contribute code to the build system. In our work, we added a fourth category of files (IaC) and we focus the analysis on those files.

Hindle et al. [5] studied the release patterns in four open source systems in terms of the evolution in source code, tests, build files and documentation. They found that each individual project has consistent internal release patterns of its own.

Similar to McIntosh et al. [4], other researchers also used "association rules" to detect co-change and co-evolution. Herzig et al. [6] used association rules to predict test failures, and their model shows good performance with a precision of 0.85 to 0.90 on average. Zaidman et al. [7] explored the co-evolution between test and production code. They found different coupling patterns in projects with different development styles. In particular, in test-driven projects there is a strong coupling between production and test code, while other projects have a weaker coupling between testing and development. Gall et al. [8] studied the co-evolution relation among different modules in a project, while our work focuses on the relation among different file categories. Adams et al. [9] studied the evolution of the build system of the Linux kernel at the level of releases, and found that the build system co-evolved with the source code in terms of complexity. McIntosh et al. [10] studied the ANT build system evolution. Zanetti et al. [11] studied the co-evolution of GitHub projects and Eclipse from the socio-technical structure point of view. Our work is the first to study the co-evolution process between infrastructure and source code in a project.

Rahman et al. [12], Weyuker et al. [13], and Meneely et al. [14] studied the impact of the number of code contributors on software quality. Karus et al. [15] proposed a model that combines both code metrics and social organization metrics to predict the yearly cumulative code churn. It turns out that the combined model is better than the model adopting only code metrics. Bird et al. [16] proposed an approach for analyzing the impact of branch structure on software quality. Nagappan et al. [17] conducted a study about the influence of organizational structure on software quality. Nagappan et al. [18] predicted defect density with a set of code churn measures such as the percentage of churned files. They showed that churn measures are good indicators of defect density. Shridhar et al. [19] conducted a qualitative analysis to study build file ownership styles, and found that certain build changes (such as "Corrective" and "New Functionality") can introduce higher churn and are more invasive.
Our work is the first to study the ownership style of infrastructure files. Curtis et al. [20], Robillard [21], and Mockus et al. [22] [23] focus on how domain knowledge impacts software quality. It turns out that the more experienced and familiar developers are with a domain, the fewer bugs they introduce. This suggests that if IaC experts change IaC files, they are less likely to introduce bugs than, say, build developers. We study IaC ownership, yet do not study bug reports to validate this link.

To summarize, our study is the first to analyze the co-evolution of IaC with known file categories, as well as the ownership of IaC file changes.

III. APPROACH

The process of our study is shown as a flow chart in Figure 2.

Fig. 2: Flow chart of the whole approach: data collection; classifying files into five categories; splitting the OpenStack projects in two groups; preliminary analysis (monthly change ratio, average churn, commit co-change, statistical visualization); and case study results (change coupling and ownership across infrastructure, build, production and test files).

A. Data Collection

OpenStack is an open source project launched jointly by Rackspace Hosting and NASA in July 2010. It is in fact governed by a consortium of organizations who have developed multiple interrelated components that together make up a cloud computing software platform that offers "Infrastructure As A Service (IaaS)". Users can deploy their own operating system or applications on OpenStack to virtualize resources like cloud computing, storage and networking. Given that testing and deploying a cloud computing platform like OpenStack requires continuous configuration and deployment of virtual machines, OpenStack makes substantial use of IaC, adopting both "Puppet" and "Chef" to automate infrastructure management.

Apart from its adoption of IaC, OpenStack has many other characteristics that prompted us to study it in our empirical study. In particular, it has 13 large components ("modules") spread across 382 projects with their own git repositories, 20 million lines of code, a rapid release cycle (cycle time of 6 months), and a wide variety of users such as AT&T, Intel, SUSE, PayPal, and eBay.

B. Classifying Files into Five Categories

In order to study the co-evolution relationship between the infrastructure and the other files, first we need to classify them into different categories. We mainly identify "Infrastructure", "Build", "Production (i.e., source code)" and "Test" files. Other files, such as images, text, and data files, were categorized into the "Others" category and were not considered in our study. Note that we classified any file that ever existed in an OpenStack project across all git commits, amounting to 133,215 files in total.

In order to do the classification, we used a similar approach as McIntosh et al. [4]. First, we wrote a script to identify files with known naming patterns. For example, names containing "test" or "unitest" should be test files, while the "Makefile" or "Rakefile" names are build files, and files with a programming language suffix such as ".py" or ".rb" (typically, but not always) belong to the production files. The infrastructure files written in Puppet have the suffix ".pp". After classifying those files that are easily identified, we manually classified the remaining 25,000 unclassified files. For this, we manually looked inside the files to check for known constructs or syntax, and for naming conventions specific to OpenStack. If the content was not related to any of the four categories of interest, we classified it as "Other". We put the resulting file classification online at https://github.com/yujuanjiang/OpenStackClassificationList.
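For illustration, the following Python sketch mimics the kind of naming-pattern classification described above. The concrete patterns and example paths are assumptions made for the sake of the example; they are not the exact rules of our script, and the manual classification step is not covered.

def classify_file(path):
    # Illustrative naming-pattern heuristics (assumed, not the authors' exact rules).
    name = path.lower()
    if name.endswith(".pp") or "/manifests/" in name or "/cookbooks/" in name:
        return "Infrastructure"   # Puppet manifests, Chef cookbooks
    if "test" in name or "unitest" in name:
        return "Test"
    if name.endswith(("makefile", "rakefile", "setup.py", "pom.xml")):
        return "Build"
    if name.endswith((".py", ".rb", ".c", ".java")):
        return "Production"       # typically, but not always, source code
    return "Other"                # images, text, data files, ...

for f in ["manifests/init.pp", "nova/tests/test_api.py", "Makefile", "nova/compute/api.py"]:
    print(f, "->", classify_file(f))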
C. Splitting the OpenStack Projects in Two Groups

We collected all the revisions from the 382 git repositories of the OpenStack ecosystem. After classification, we did a statistical analysis and found 117 projects without infrastructure files, so we removed them from our data set, as we focus on the analysis of IaC. Upon manual analysis of the projects without infrastructure files, we found that they mostly do not serve the main functionality of OpenStack (namely cloud computing), but rather provide supporting services like review management (e.g., "reviewday"), test drivers (e.g., "cloudroast" and "cookbook-openstack-integration-test"), plug-ins for build support (e.g., "zmq-event-publisher"), front-end user interface generation (e.g., "bugdaystats" and "clif"), and external libraries (e.g., "cl-openstack-client" for Lisp).

Of the projects containing infrastructure files, we found that some of them have only a few infrastructure files, while other projects have many infrastructure files. The former repositories are relatively small in terms of the number of files (with a median value of 71.5 and a mean of 174.4). In order to study whether the infrastructure system acts differently in these different contexts, we split the remaining projects into two groups: the "Multi" group with projects that contain more than one infrastructure file and the "Single" group with projects that contain only one infrastructure file. The Multi group contains 155 projects, whereas the Single group contains 110 projects. In our study, we compare both groups to understand whether there is any difference in maintenance effort between the two groups.

D. Association Rules

To analyze the coupling relationship in RQ1 and RQ2, we use association rules. An association rule is a possible coupling between two different phenomena. For example, supermarkets found that diapers and beer are usually associated together, because normally it is the father who comes to buy diapers and along the way may buy some beer for himself. To measure the importance of an association rule, a number of important metrics can be calculated, such as "Support" (Supp) and "Confidence" (Conf). The metric Support(X) indicates the frequency of appearance of X, while the metric Confidence(X=>Y) indicates how often a change of X will happen together with a change of Y. For example, if there are 20 commits in total for project P, and 10 of them change infrastructure files, then Supp(Inf)=0.5 (10/20). If among these 10 commits, 6 of them also change test files, then Conf(Inf=>Test)=0.6 (6/10). Association rules can help us understand the coupling relationship among different projects. Note that we do not mine for new association rules, but analyze the strength of the six rules involving IaC files and the three other file categories (IaC =>Build, IaC =>Production, IaC =>Test and the three inverse rules).
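For illustration, Support and Confidence can be computed from the set of file categories touched by each commit, as in the following Python sketch (the commit data shown is invented, not taken from the OpenStack dataset):

def support_confidence(commits, a, b):
    # commits: list of sets of file categories changed by each commit.
    total = len(commits)
    with_a = [c for c in commits if a in c]
    with_both = [c for c in with_a if b in c]
    supp = len(with_both) / total if total else 0.0         # Supp = P(A and B)
    conf = len(with_both) / len(with_a) if with_a else 0.0  # Conf(A => B) = P(A and B) / P(A)
    return supp, conf

commits = [{"Infrastructure", "Test"}, {"Infrastructure", "Production"},
           {"Production"}, {"Test"}, {"Infrastructure", "Test"}]
print(support_confidence(commits, "Infrastructure", "Test"))  # roughly (0.4, 0.667): 2/5 and 2/3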
Additionally, in the qualitative analysis of the change coupling, we need to select the most tightly coupled commits for manual analysis. Since Confidence is not a symmetrical measure (Conf(X=>Y) is different from Conf(Y=>X)), it yields two numbers for a particular pair of file categories, which makes it hard to identify the projects with the "most tightly coupled" commits. For this reason, we adopt the metric "Lift", which measures the degree to which the coupling between two file categories differs from the situation where they would be independent from each other. For each project, we computed the Lift value, then for the 10 projects with the highest Lift value for a pair of file categories, we selected the top 10 most tightly coupled commits. The formulas for "Lift" and the related Conf and Supp metrics are as follows:

Supp = P(A ∩ B)
Conf = P(A ∩ B) / P(A)
Lift = P(A ∩ B) / (P(A) P(B))

E. Card Sorting

For our qualitative analysis, we adopted "Card Sorting" [24] [25], an approach that allows one to systematically derive structured information from qualitative data. It consists of three steps: 1) First, we selected the 10 projects with the highest Lift value for a certain pair of file categories. Then, for each such project, we wrote a script to retrieve all commits changing files of both categories. 2) Then, we randomly sort all selected commits for a project and pick 10 sample commits. 3) The first author then manually looked into the change log message and code of each commit to understand why this co-change happens. In particular, we were interested in understanding the reason why changes of both categories were necessary in the commit. If this is a new reason, we added it as a new "card" in our "card list". Otherwise, we just increased the count of the existing card. After multiple iterations, we obtained a list of reasons for co-change between two file categories. Finally, we clustered cards that had related reasons into one group, yielding seven groups.

"Card sorting" is an approach commonly used in empirical software engineering when qualitative analysis and taxonomies are needed. Bacchelli et al. [26] used this approach for analyzing code review comments. Hemmati et al. [27] adopted card sorting to analyze survey discussions [19].

F. Statistical Tests and Beanplot Visualization

In our work, we mainly used the Kruskal-Wallis and Mann-Whitney tests for statistical testing, and used the beanplot package in R as the visualization tool for our results. The Kruskal-Wallis test [28] is a non-parametric method that we use to test whether there exists any difference between the distributions of a metric across the four file categories. If the null hypothesis ("there is no significant difference between the mean of the four categories") is rejected, at least one of the categories has a different distribution of the metric under study. To find out which of the categories has a different distribution, we then use Mann-Whitney tests as post-hoc tests. We perform such a test between each pair of file categories, using the Bonferroni correction for the alpha value (which is 0.05 by default in all our tests).

A beanplot [29] is a visualization of a distribution based on boxplots, but adding information about the density of each value in the distribution. Hence, apart from seeing major moments like the median, minimum or maximum, one can also see which values are the most frequent in the sample under study. By plotting two or more beanplots next to each other, one can easily see asymmetry in the distribution of values (see Figure 5).
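For illustration, the following Python sketch (assuming SciPy is available) shows the shape of this testing procedure on placeholder data; the values below are not actual measurements from our study:

from itertools import combinations
from scipy.stats import kruskal, mannwhitneyu

# Placeholder distributions of some metric (e.g., proportion of files) per category.
samples = {
    "Infrastructure": [0.11, 0.09, 0.38, 0.05, 0.20],
    "Build":          [0.06, 0.12, 0.05, 0.17, 0.08],
    "Production":     [0.35, 0.48, 0.31, 0.58, 0.40],
    "Test":           [0.13, 0.24, 0.18, 0.30, 0.21],
}

# Omnibus test: does at least one category have a different distribution?
_, p = kruskal(*samples.values())
print("Kruskal-Wallis p-value:", p)

# Post-hoc pairwise Mann-Whitney tests with a Bonferroni-corrected alpha.
pairs = list(combinations(samples, 2))
alpha = 0.05 / len(pairs)
for a, b in pairs:
    _, p_ab = mannwhitneyu(samples[a], samples[b], alternative="two-sided")
    print(a, "vs", b, ": p =", round(p_ab, 3), "significant =", p_ab < alpha)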
IV. PRELIMINARY ANALYSIS

Before addressing the main research questions, we first study the characteristics of IaC files themselves.

PQ1: How many infrastructure files does a project have?

Motivation: As infrastructure code is a relatively new concept, not much is known about its characteristics. How common are such files, and how large are they in comparison to other known kinds of files?

Approach: First, we compute the number and percentage of files in each file category for each project. Afterwards, we computed the number of lines of each infrastructure file in each project. Furthermore, we manually checked the Multi projects to understand why they contain such a high proportion of infrastructure files, for example whether they are projects dedicated to IaC code. We also did Kruskal-Wallis and post-hoc tests to check the significance of the results, with the null hypothesis that "there is no significant difference among the distributions of the proportion/the LOC of each file category".

Results: Multi projects have a higher proportion of infrastructure files than build files, with a median value of 11.11% across projects. Figure 3 shows the boxplot of the proportion of the four file categories relative to all files of a project, while Table I shows the corresponding numbers. We can see that in both groups, the trends are the same except for infrastructure files. Unsurprisingly, the production files take up the largest proportion of files (with a median of 34.62% in group Multi and 47.80% in group Single). This makes sense, because the source code files are the fundamental component of a project. The test files take up the second largest proportion (with a median value of 12.95% in group Multi and 23.94% in group Single). By definition, for "Single" projects the infrastructure files take up the lowest proportion (with a median of 3.85%), behind build files (with a median of 17.24%), while for Multi projects the order is swapped (with a median of 11.11% for infrastructure and 5.71% for build files). Indeed, Multi projects not only have more than one infrastructure file, they also tend to have a substantial proportion of such files. The differences in proportion between IaC files and the three other categories are all statistically significant.

TABLE I: The proportion of the four file categories in each project (%) in terms of the number of files (1st Qu. / Median / Mean / 3rd Qu.).
Group Multi — Infrastructure: 3.33 / 11.11 / 38.40 / 89.47; Build: 2.41 / 5.71 / 11.72 / 12.37; Production: 0.00 / 34.62 / 31.84 / 57.73; Test: 2.21 / 12.95 / 18.04 / 29.91.
Group Single — Infrastructure: 1.73 / 3.85 / 8.02 / 11.11; Build: 7.60 / 17.24 / 25.41 / 40.00; Production: 11.46 / 47.80 / 42.25 / 66.67; Test: 12.71 / 23.94 / 24.26 / 35.21.

Fig. 3: Boxplot of median proportions of the four file categories for each project across all projects (excluding other files): (a) group Multi, (b) group Single.

The percentage of infrastructure files has a large variance for "Multi" projects. The median value is rather small compared to the other three categories, but within four projects, the percentage of infrastructure files can reach as high as 100%.
Therefore, we ranked all the projects by the percentage of infrastructure files (from high to low) and manually looked into the top ones. We found that those projects clearly are infrastructure-specific. 51 of these projects have names related to the infrastructure system (28 of them named after Puppet, 23 after Chef). For example, the "openstack-chef-repo" repository is an example project for deploying an OpenStack architecture using Chef, and the "puppetlabs-openstack" project is used to deploy the Puppet Labs Reference and Testing Deployment Module for OpenStack, as described in the project profiles on GitHub [30] [31].

Although production and test files are statistically significantly larger, the size of infrastructure files is in the same ballpark. Figure 4 shows for each file category the boxplot across all projects of the median file size (in terms of LOC). We can see that the sizes of infrastructure (median of 2,486 for group Multi and 1,398 for group Single), production (median of 2,991 for group Multi and 2,215 for group Single) and test files (median of 2,768 for group Multi and 1,626 for group Single) have the same order of magnitude. The size of build files is the smallest (with a median of 54 for group Multi and 52 for group Single).

Fig. 4: Boxplot of the median size (in LOC) of the four different file categories across all projects (group "Multi").

Infrastructure files take up a small portion of projects, with a median value of 11.11%, but their size is larger than that of build files and in the same ballpark as code and test files.

PQ2: How many infrastructure files change per month?

Motivation: The results of PQ1 related to size, and to some degree proportion of files, could indicate a large amount of effort needed to develop and/or maintain the infrastructure files, both in Single and Multi projects. To measure this effort, this question focuses on the percentage of files in a category that are being changed per month [4]. The more files are touched, the more effort is needed.

Approach: In order to see how often each file category changes, we computed the number of changed files per month for each project. To enable comparison across time, we normalize this number by dividing by the corresponding number of files in a category during that month, yielding the proportion of changed files. For each project, we then calculate the average proportion of changed files per month, and we study the distribution of this average across all projects. Note that we use the average value in PQ2 and PQ3, whereas we use medians in the rest of the paper, because "Single" projects only have one infrastructure file and hence a median value would not make sense. Furthermore, we again did a Kruskal-Wallis test and post-hoc tests to examine the statistical significance of our results.
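For illustration, the monthly change ratio could be computed along the following lines (a Python sketch with invented sample data; in our study the inputs are derived from the projects' git histories):

from collections import defaultdict

# touched[(month, category)] = set of files of that category touched by commits in that month.
touched = {
    ("2014-01", "Infrastructure"): {"manifests/a.pp", "manifests/b.pp"},
    ("2014-01", "Production"):     {"nova/api.py"},
    ("2014-02", "Infrastructure"): {"manifests/a.pp"},
}
# total[(month, category)] = number of files of that category existing in that month.
total = {("2014-01", "Infrastructure"): 8, ("2014-01", "Production"): 40,
         ("2014-02", "Infrastructure"): 8}

# Proportion of changed files per month; per project these are then averaged over all months.
ratio = {key: len(files) / total[key] for key, files in touched.items()}
monthly = defaultdict(list)
for (month, category), value in ratio.items():
    monthly[category].append(value)
average = {category: sum(v) / len(v) for category, v in monthly.items()}
print(average)   # e.g. {'Infrastructure': 0.1875, 'Production': 0.025}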
Results: The average proportion of infrastructure files changed per month is comparable to that of source code, and much higher than that of build and test files, with a median value across projects of 0.28. Figure 5 shows the distribution across all projects in group "Multi" of the average proportion of changed files. Since group "Single" has the same trend, we omitted its figure.

Fig. 5: Distributions of the average percentage of changed files per project for the four file categories (group "Multi").

We can see that the production files (with a median value of 0.28) change as frequently as the infrastructure files (with a median value of 0.28). The test files change less frequently (with a median value of 0.21) than infrastructure files, but more frequently than build files (with a median value of 0.18).

The percentage of changed files per month for infrastructure and production files is significantly higher than for build and test files. A Kruskal-Wallis test on all file categories yielded a p-value of less than 2.2e-16, hence at least one of the file categories has a different distribution of monthly change percentage at the 0.05 significance level. We then did Mann-Whitney post-hoc tests, which showed that, except for infrastructure and production files (p-value of 0.112 > 0.05), the other file categories have a statistically significantly lower change proportion (p-value < 0.05). Since change of source code is known to be an indicator of bug proneness [32] [33] or risk [34], infrastructure code hence risks having similar issues.

The distribution of the median proportion of monthly change for infrastructure files is similar to that of production files, with a median value of 0.28, being higher than that of the build and test files.

PQ3: How large are infrastructure system changes?

Motivation: Now that we know how many infrastructure files change per month for all the projects, we want to know how much each file changes as well. In addition to the number of changed lines, the types of changes matter as well. For example, two commits could both change one line of an infrastructure file, but one of them could only change a version number while the other one may change a macro definition that could cause a chain reaction of changes to other files. Hence, we also need to do qualitative analysis on infrastructure files.

Approach: Churn, i.e., the number of changed lines of a commit, is a universal indicator for the size of a change. In the textual logs of a git commit, a changed line always begins with a plus "+" or minus "-" sign. The lines with "+" are newly added lines, while the lines with "-" are lines deleted from the code. In order to understand how much each file category changes per month, we define the monthly churn of a project as the total number of changed lines (both added and deleted lines) per month. However, when we first looked into the monthly churn values, we found that many projects are not active all the time, i.e., in certain months the churn value is zero. Therefore, we only considered the "active period" for each project, namely the period from the first non-zero churn month until the last non-zero churn month. To control for projects of different sizes, we also normalize the monthly churn of each project by dividing by the number of files of that project in each month. This yields the monthly churn per file (MCF). We study the distribution of the average value of churn and of the average value of MCF across all projects.
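For illustration, the following Python sketch derives monthly churn and MCF from per-file change records such as those produced by "git log --numstat"; the records and file counts shown are invented:

from collections import defaultdict

# records: (month, category, added, deleted) for each changed file in each commit.
records = [
    ("2014-01", "Infrastructure", 30, 10),
    ("2014-01", "Production", 120, 40),
    ("2014-02", "Infrastructure", 5, 2),
]
# files_per_month[(month, category)] = number of files of that category in that month.
files_per_month = {("2014-01", "Infrastructure"): 12, ("2014-01", "Production"): 90,
                   ("2014-02", "Infrastructure"): 13}

churn = defaultdict(int)
for month, category, added, deleted in records:
    churn[(month, category)] += added + deleted      # churn = added + deleted lines

# Monthly churn per file (MCF); months with zero churn (outside the active period) are absent.
mcf = {key: value / files_per_month[key] for key, value in churn.items()}
print(dict(churn))
print(mcf)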
Results: The churn of infrastructure files is comparable to that of build files and significantly smaller than that of production and test files. Figure 6 shows the beanplot of the monthly churn for both groups. We can see in group "Multi" that the test files have the highest average churn value (with a median value of 9), i.e., the test commits have the largest size, followed by production files (with a median value of 8). Infrastructure file changes (5.25) are larger than build file changes (4.25). In group "Single", the production files have the largest commits (median of 10), followed by test files and infrastructure files (both with a median of 8), and build files (median of 5). Kruskal-Wallis and post-hoc tests show that the distribution of churn of IaC files is significantly different from that of the other file categories, except for build files in the Single group.

Fig. 6: Beanplot of average monthly churn across all projects for the four file categories (group "Multi") (log scaled).

The infrastructure files have the highest MCF value, with a median of 1.5 in group Multi and a median of 5 in group Single. Figure 7 is the beanplot of the MCF for the two groups. We can see for both groups that the infrastructure files have the highest MCF value (median of 1.5 in group Multi and median of 5 in group Single), which means that the average change to a single infrastructure file is larger than for the other file categories. Hence, although the number of infrastructure files is smaller than the number of production files, there is proportionally more change being done to them. The differences in average MCF between IaC files and the three other categories are all statistically significant.

Fig. 7: Beanplot of average MCF across all projects for the four file categories (group "Multi") (log scaled).

The churn of infrastructure files is comparable to that of build files, while the average churn per file of infrastructure files is the highest across all file categories.

V. CASE STUDY RESULTS

Now that we know that projects tend to have a higher proportion of infrastructure files than build files, and that infrastructure files can be large and churn frequently and substantially, we turn to the main research questions regarding the coupling relation among commits and IaC ownership.

RQ1) How tight is the coupling between infrastructure code and other kinds of code?

Motivation: Based on the preliminary questions, we find that infrastructure files are large and see a lot of churn, which means that they might be bug prone. However, those results considered the evolution of each type of file separately from one another. As shown by McIntosh et al. [4], evolution of, for example, code files might require changes to build files to keep the project compilable. Similarly, one might expect that, in order to keep a project deployable or executable, changes to code or tests might require corresponding changes to the infrastructure code. This introduces additional effort for the people responsible for these changes. Kirbas et al. [35] conducted an empirical study about the effect of evolutionary coupling on software defects in a large financial legacy software system. They observed a positive correlation between evolutionary coupling and defect measures in the software evolution and maintenance phase. These results motivate us to analyze the coupling of infrastructure code with source, test and build code.

Approach: In order to identify the coupling relationship between changes of different file categories, we analyze for each pair <A, B> of file categories the percentage of commits changing at least one file of category A that also changed at least one file of category B. This percentage corresponds to the confidence of the association rule A=>B. For example, Conf(Infrastructure, Build) measures the percentage of commits changing infrastructure files that also change build files. Afterwards, we performed chi-square statistical tests to test whether the obtained confidence values are significant, or are not higher than expected due to chance. Finally, we also performed a qualitative analysis of projects with high coupling to understand the rationale for such coupling. We used "card sorting" (see Section III-E) for this analysis. We sampled 100 commits across the top 10 projects with the highest Lift metric (see Section III-D) to look into why IaC changes were coupled so tightly with changes to other file categories.
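For illustration, one possible form of such a chi-square test is a 2x2 contingency test per project between "commit changes IaC files" and "commit changes test files", as in the following sketch (assuming SciPy; the commit counts are invented and the exact test setup we used may differ):

from scipy.stats import chi2_contingency

# 2x2 contingency table of commit counts for one project (invented numbers):
# rows: commit changes IaC files (yes / no); columns: commit changes test files (yes / no).
table = [[60, 70],     # IaC-changing commits: 60 also touch tests, 70 do not
         [240, 1130]]  # other commits:        240 touch tests,     1130 do not

chi2, p, dof, expected = chi2_contingency(table)
print("p-value:", p, "-> coupling significant at 0.05:", p < 0.05)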
Results: Infrastructure files change the least often in both groups. Table II shows the distribution of the Support and Confidence metrics across the projects in both groups, while Figure 8 visualizes the distribution of these metrics for group Multi. We do not show the beanplot of group Single, since it follows the same trends.

Fig. 8: Distribution of confidence values for the coupling relations involving IaC files (group Multi). The left side of a beanplot for A<=>B represents the confidence values for A=>B, while the right side of a beanplot corresponds to B=>A.

TABLE II: Median Support and Confidence values for the coupling relations involving IaC files (Group Multi / Group Single).
Support — Infrastructure: 0.0402 / 0.0412; Build: 0.1276 / 0.1324; Production: 0.3789 / 0.3806; Test: 0.2348 / 0.2344; (Inf, Bld): 0.0044 / 0.0044; (Inf, Prod): 0.0336 / 0.0335; (Inf, Test): 0.0585 / 0.0607.
Confidence — Inf=>Bld: 0.0347 / 0.0343; Inf=>Prod: 0.2637 / 0.2730; Inf=>Test: 0.4583 / 0.4673; Bld=>Inf: 0.1058 / 0.1140; Prod=>Inf: 0.0885 / 0.0911; Test=>Inf: 0.2578 / 0.2638.

With group Multi as example, we can see that in terms of the Support metric (i.e., the proportion of all commits involving a particular file category), source code changes occur the most frequently (with a median value of 0.3789), followed by test files (median of 0.2348), then build files (median of 0.1276). The infrastructure files change the least often, which suggests that the build and infrastructure files tend to be more stable than the other file categories. We observed the same behavior in group Single.

The commits changing infrastructure files also tend to change production and test files. In group Multi, Conf(Inf=>Prod) and Conf(Inf=>Test) have high median values of 0.2637 and 0.4583 respectively. This indicates that most commits changing infrastructure files also change source code and test files. Conf(Inf=>Bld) has the lowest median value (0.0347), which indicates that commits changing infrastructure files do not need to change build files very often. Similar findings were made for group Single.

26% of the commits changing test files require corresponding IaC changes in both groups. In group Multi, the Conf(Test=>Inf) metric has a higher value (median value of 0.2578) than Conf(Production=>Inf) (median value of 0.0885) and Conf(Build=>Inf) (median value of 0.1058). This means that one quarter of the commits changing test files also need to change infrastructure files. Furthermore, infrastructure files also have a relatively high coupling with build files, i.e., around 11% of commits changing build files need to change infrastructure files as well. The observations in group Single follow the same trend.

The observed high confidence values are statistically significant in the majority of projects. Using a chi-square test on the confidence values, we found that in group Multi, among 155 projects, in 97 of them we observe a significant coupling between infrastructure and test files, and in 90 of them we observe a significant coupling between infrastructure and production files. In contrast, in only 2 of them we observe a significant coupling between infrastructure and build files. This means that the latter co-change relation is, statistically speaking, not unexpected, whereas the former ones are much stronger than would be expected by chance.
In group "Single", among 110 projects, we found 33 that had a significant coupling between Infrastructure and Test files, and 35 that had a significant coupling between Infrastructure and Production files, while there was no significant coupling observed between Infrastructure and Build files. Although the co-change relations with test and production files are less strong than for the Multi group, they are still significant for many projects.

The most common reasons for the coupling between Infrastructure and Build files are Refactoring and Update. Table III contains the resulting seven clusters from our card sort analysis. Those clusters group the different rationales that we identified by manually analyzing 100 commits. The table also contains, for each cluster, the percentage of the 100 commits that mapped to that cluster. Since each commit mapped to one cluster, the proportions add up to 100%. The coupling between IaC and build files only had the third highest median Confidence. Coincidentally, the reasons for this coupling turn out to be simple, with three of the seven reasons absent. The most common reasons for this coupling include refactoring and updating files of both file categories because they share the same global parameters.

TABLE III: The reasons for high coupling and examples, based on a sample of 300 (100*3) commits with confidence of 0.05. The seven reason clusters are Initialization, External dependency, Organization of development structure, Integration of new module, Update, Textual Edit, and Refactoring. For IaC & Build, three of the seven reasons are absent and Refactoring (45%) and Update (44%) dominate; examples include adding RPM package specifications in an infrastructure file for use in the packaging stage of the build files, fixing errors in the installation of packages, getting rid of a Ruby gems mirror, changing a Puppet manifest path in an infrastructure file together with a build system bug fix in a makefile, and changing the installation configuration (e.g., installing from repos instead of mirrors, which requires removing the path and related parameters of the iso CD image in the infrastructure file and makefile). For IaC & Production, External dependency (33%) and Initialization dominate; examples include importing new packages called in the source code and defining their globally shared parameters, initializing a new client instance (which needs to state the copyright in both files), integrating global bug fixes or a new external module such as JIRA, updating external library dependencies and global variables such as the installation path, defining a new function in an infrastructure file that is implemented in the source code, separating a macro definition in an infrastructure file and changing the data schema in the source files correspondingly, disabling a global functionality in an infrastructure file, removing old Python schemas, and renaming (such as changing the project name from "Nailgun" to "FuelWeb"). For IaC & Test, Integration of a new module (26%) and Update (24%) dominate; examples include enabling a new test module or integrating new test cases, integrating a new test specification or testing a bug fix in the infrastructure file, enabling a new test guideline (i.e., how tests should be performed) in new repositories, updating the test configuration (e.g., changing the default test repository location or a default parameter value that is initialized in infrastructure files), switching the default test repository, initializing the values of the parameters of global methods in both the infrastructure and the test file, adding a new function for testing external libraries or modules and configuring the imported external libraries or other modules deployed by the infrastructure (e.g., access to the GitHub interface), removing excessive dependencies, changing the value of shared method parameters such as the private variables of a test case, refactoring or cleaning up the test configurations in the infrastructure files, and getting rid of a module everywhere. Other examples across the pairs include re-formatting text and general cleanup (typos, quotes in global variables), ensuring that a variable is changed consistently, an environment management overhaul, changing coding standards and style, cleaning up global variables such as the location directory, removing code duplication, and adding the license header in both file categories.

The most common reasons for the coupling between Infrastructure and Production files are External dependencies and Initialization. The most common reason for this coupling are changes in the IaC files to external dependencies like Ruby packages that require corresponding changes to the production files where these dependencies are used. Another common reason is initialization. If the project initializes a new client instance, it needs to initialize the parameters in the infrastructure file and add new source code for it.

The most common reasons for the coupling between infrastructure and test files are "Integration" and "Update". The coupling between infrastructure and test files has the highest value, and the reasons are spread across all seven clusters (similar to the coupling between Infrastructure and Production). The most frequent reason is integrating new test modules into a project and updating the configuration for the testing process in the IaC files as well as in the test files.

Infrastructure files are changed less than the other file categories. The changes to infrastructure files are tightly coupled with the changes to Test and Production files. The most common reasons for the coupling between Infrastructure and Test are "Integration" and "Update".

RQ2) Who changes infrastructure code?

Motivation: Herzig et al. [36] studied the impact of test ownership and team social structure on testing effectiveness and reliability, and found that they had a strong correlation. This means that in addition to the code-change relations of RQ1, we also need to check the relationship between the developers, i.e., the ownership among different file categories. Even though test or code changes might require IaC changes, it likely makes a difference whether a regular tester or developer makes such a change compared to an IaC expert.

Approach: First, we need to identify the ownership for each category. For this, we check the author of each commit. If the commit changes an infrastructure file, then we identify that commit's author as an infrastructure developer. An author can have multiple identifications, e.g., one can be an infrastructure and production developer at the same time (even for the same commit).
We ignore those developers whose commits only change files of the "Other" category. We then compute the RQ1 metrics, but this time for the change ownership. Supp(Infrastructure) indicates the percentage of developers changing infrastructure files out of the total number of developers. Supp(Infrastructure, Build) is the percentage of developers changing both infrastructure and build files out of the total number of developers. Conf(Infrastructure, Build) is the percentage of IaC developers that also change build files, out of the total number of developers changing at least one infrastructure file.
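For illustration, the following Python sketch derives such ownership metrics from commit authorship; the authors and commits shown are fictitious:

from collections import defaultdict

# commits: (author, set of file categories changed by that commit).
commits = [("alice", {"Infrastructure", "Test"}), ("bob", {"Production"}),
           ("alice", {"Production"}), ("carol", {"Build", "Infrastructure"})]

categories_by_author = defaultdict(set)
for author, categories in commits:
    categories_by_author[author] |= categories   # an author can own several categories

developers = set(categories_by_author)
iac_devs = {a for a in developers if "Infrastructure" in categories_by_author[a]}
supp_inf = len(iac_devs) / len(developers)                    # Supp(Infrastructure)
conf_inf_test = sum("Test" in categories_by_author[a] for a in iac_devs) / len(iac_devs)  # Conf(Inf => Test)
print(supp_inf, conf_inf_test)   # roughly 0.667 and 0.5 for this toy data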
Results: Infrastructure developers have the lowest proportion among all developers, while the production developers are the most common. Table IV and Figure 9 show the support and confidence metrics in terms of change ownership for the two groups of projects. Similar to RQ1, both groups have similar distribution trends, so we only show the beanplots for the Multi group.

Fig. 9: Distribution of confidence values for the coupling relations involving the owners of IaC files (group Multi).

TABLE IV: Median Support and Confidence values for the coupling relations involving IaC developers (Group Multi / Group Single).
Support — Infrastructure: 0.2744 / 0.2733; Build: 0.5392 / 0.5384; Production: 0.7378 / 0.7390; Test: 0.6978 / 0.6977; (Inf, Bld): 0.2094 / 0.3726; (Inf, Prod): 0.4442 / 0.4443; (Inf, Test): 0.4859 / 0.4825.
Confidence — Inf=>Bld: 0.3732 / 0.3726; Inf=>Prod: 0.8034 / 0.8021; Inf=>Test: 0.8967 / 0.8962; Bld=>Inf: 0.7625 / 0.7635; Prod=>Inf: 0.5934 / 0.5940; Test=>Inf: 0.7064 / 0.7156.

We can see that, as expected, the developers of production code take up the highest proportion amongst all developers (with a median value of 0.7378), followed by test developers (median of 0.6978) and build developers (median of 0.5392). The infrastructure developers take up the lowest proportion (median of 0.2744). The high values of the metrics Supp(Inf, Prod) and Supp(Inf, Test) (with median values of 0.4442 and 0.4859 respectively) indicate that almost half of all developers in their career have had to change at least one Infrastructure and Production file, or Infrastructure and Test file.

The majority of the infrastructure developers also develop Production or Test files. Conf(Inf=>Prod) and Conf(Inf=>Test) both have a high value (with medians of 0.8034 and 0.8967 respectively). This shows that most of the infrastructure developers are also production and test developers. In contrast, the metric Supp(Inf, Bld) has the lowest value (median of 0.2094), which indicates that the developers changing infrastructure and build files respectively do not overlap substantially.

All kinds of developers change the IaC files. Conf(Bld=>Inf), Conf(Prod=>Inf) and Conf(Test=>Inf) all have very high values (medians higher than 0.5). This shows that most of the build, production and test developers are infrastructure developers at the same time. In particular, Conf(Inf=>Bld) has a lower value (median of 0.3732) compared to Conf(Bld=>Inf) (median of 0.7625). Build developers can be infrastructure developers, but the majority of infrastructure developers hardly change the build system, since so many other kinds of developers change the IaC files.

There is significant coupling between the Infrastructure and Test, and Infrastructure and Production ownership in most of the projects. In group Multi, for 107 projects we see that the coupling between Infrastructure and Test developers is significant, in 76 projects the coupling between Infrastructure and Production developers, and in 20 projects the coupling between Infrastructure and Build change ownership. In group "Single", in 59 projects we see significant coupling between Infrastructure and Test developers, in 41 projects between Infrastructure and Production, and in 3 projects between Infrastructure and Build.

The infrastructure developers take up the lowest proportion among all developers.
Developers working on infrastructure files are normal developers that also work on production and test files.

VI. THREATS TO VALIDITY

Construct validity threats concern the relation between theory and observation. First of all, we use the confidence of association rules to measure how closely file categories and owners co-change in our research questions, similar to earlier work [4]. Furthermore, we use the monthly churn per file to measure the frequency of change and the number of changed lines of code to measure the amount of change. However, these metrics might not perfectly reflect the actual coupling relationship and churn rate. Other metrics should be used to replicate our study and compare findings.

Threats to internal validity concern alternative explanations of our findings where noise could be introduced. During the classification of different file categories, we adopted the semi-automatic approach of McIntosh et al. [4], consisting of a script to separate certain file categories, followed by manual classification of the remaining files. To mitigate bias, the first author of this paper first did this classification, followed by an independent verification by the second author.

Threats to external validity concern the ability to generalize our results. Since we have only studied one large ecosystem of open source systems, it is difficult to generalize our findings to other open and closed source projects. However, because OpenStack consists of multiple projects, and has also adopted the two most popular infrastructure tools Puppet and Chef, it is a representative case study to analyze. Further studies should consider other open and closed source systems.

VII. CONCLUSION

IaC (Infrastructure as Code) helps automate the process of configuring the environment in which a software product will be deployed. The basic idea is to treat the configuration files as source code files in a dedicated programming language, managed under version control. Ideally, this practice helps simplify the configuration behavior, shorten the release cycle, and reduce the possible inconsistencies introduced by manual work; however, the amount of maintenance required for this new kind of source code file is unclear. We empirically studied this maintenance in the context of the OpenStack project, which is a large-scale open source project providing a cloud platform.

We studied 265 repositories and found that the proportion of infrastructure files in each project varies from 3.85% to 11.11% (median values), and that their size is larger than that of build files and of the same order of magnitude as code and test files (median of 2,486 LOC in group "Multi" and 1,398 in group "Single"). Furthermore, 28% of the infrastructure files are changed monthly, significantly more than build and test files. The average size of changes to infrastructure files is comparable to that of build files, with a median value of 5.25 in group Multi and 8 in group Single (in terms of LOC). In other words, although they are a relatively small group of files, they are quite large and change relatively frequently. Furthermore, we found that the changes to infrastructure files are tightly coupled to changes to the test files, especially because of "Integration" of new test cases and "Update" of test configuration in the infrastructure file. Finally, infrastructure files are usually changed by regular developers instead of infrastructure experts.
Taking all these findings together, we believe that IaC files should be considered as source code files not just because of the use of a programming language, but also because their characteristics and maintenance needs show the same symptoms as build and source code files. Hence, more work is necessary to study the bug-proneness of IaC files as well as to help reduce the maintenance effort. Note that our findings do not diminish the value of IaC files, since they provide an explicit specification of a software system's environment that can automatically and consistently be deployed. Instead, we show that, due to their relation with the actual source code, care is needed when maintaining IaC files.

REFERENCES

[1] J. Humble and D. Farley, Continuous Delivery: Reliable Software Releases Through Build, Test, and Deployment Automation, 1st ed. Addison-Wesley Professional, 2010.
[2] J. Turnbull and J. McCune, Pro Puppet, 1st ed. Paul Manning, 2011.
[3] M. Taylor and S. Vargo, Learning Chef: A Guide to Configuration Management and Automation, 1st ed., 2014.
[4] S. McIntosh, B. Adams, T. H. Nguyen, Y. Kamei, and A. E. Hassan, "An empirical study of build maintenance effort," in Proceedings of the 33rd International Conference on Software Engineering (ICSE '11), 2011, pp. 141–150.
[5] A. Hindle, M. W. Godfrey, and R. C. Holt, "Release pattern discovery: A case study of database systems," in 23rd IEEE International Conference on Software Maintenance (ICSM 2007), Paris, France, October 2007, pp. 285–294.
[6] K. Herzig and N. Nagappan, "Empirically detecting false test alarms using association rules," in Companion Proceedings of the 37th International Conference on Software Engineering, May 2015.
[7] Z. Lubsen, A. Zaidman, and M. Pinzger, "Studying co-evolution of production and test code using association rule mining," in Proceedings of the 6th Working Conference on Mining Software Repositories (MSR 2009), Washington, DC, USA: IEEE Computer Society, 2009, pp. 151–154.
[8] H. Gall, K. Hajek, and M. Jazayeri, "Detection of logical coupling based on product release history," in Proceedings of the International Conference on Software Maintenance (ICSM '98), Washington, DC, USA: IEEE Computer Society, 1998, pp. 190–.
[9] B. Adams, K. De Schutter, H. Tromp, and W. De Meuter, "The evolution of the Linux build system," Electronic Communications of the EASST, vol. 8, February 2008.
[10] S. McIntosh, B. Adams, and A. E. Hassan, "The evolution of Java build systems," Empirical Software Engineering, vol. 17, no. 4-5, pp. 578–608, Aug. 2012.
[11] M. Serrano Zanetti, "The co-evolution of socio-technical structures in sustainable software development: Lessons from the open source software communities," in Proceedings of the 34th International Conference on Software Engineering (ICSE '12), Piscataway, NJ, USA: IEEE Press, 2012, pp. 1587–1590.
[12] F. Rahman and P. Devanbu, "Ownership, experience and defects: A fine-grained study of authorship," in Proceedings of the 33rd International Conference on Software Engineering (ICSE '11), New York, NY, USA: ACM, 2011, pp. 491–500.
[13] E. J. Weyuker, T. J. Ostrand, and R. M. Bell, "Do too many cooks spoil the broth? Using the number of developers to enhance defect prediction models," Empirical Software Engineering, vol. 13, no. 5, pp. 539–559, Oct. 2008.
[14] A. Meneely and L. Williams, "Secure open source collaboration: An empirical study of Linus' law," in Proceedings of the 16th ACM Conference on Computer and Communications Security (CCS '09), New York, NY, USA: ACM, 2009, pp. 453–462.
[15] S. Karus and M. Dumas, "Code churn estimation using organisational and code metrics: An experimental comparison," Information & Software Technology, vol. 54, no. 2, pp. 203–211, 2012.
[16] C. Bird and T. Zimmermann, "Assessing the value of branches with what-if analysis," in Proceedings of the 20th International Symposium on the Foundations of Software Engineering (FSE 2012), Association for Computing Machinery, November 2012.
[17] N. Nagappan, B. Murphy, and V. Basili, "The influence of organizational structure on software quality: An empirical case study," Microsoft Research, Tech. Rep. MSR-TR-2008-11, January 2008.
[18] N. Nagappan and T. Ball, "Use of relative code churn measures to predict system defect density," Association for Computing Machinery, May 2005.
[19] M. Shridhar, B. Adams, and F. Khomh, "A qualitative analysis of software build system changes and build ownership styles," in Proceedings of the 8th International Symposium on Empirical Software Engineering and Measurement (ESEM), Torino, Italy, September 2014.
[20] B. Curtis, H. Krasner, and N. Iscoe, "A field study of the software design process for large systems," Communications of the ACM, vol. 31, no. 11, pp. 1268–1287, Nov. 1988.
[21] P. N. Robillard, "The role of knowledge in software development," Communications of the ACM, vol. 42, no. 1, pp. 87–92, 1999.
[22] A. Mockus and D. M. Weiss, "Predicting risk of software changes," Bell Labs Technical Journal, vol. 5, no. 2, pp. 169–180, 2000.
[23] A. Mockus, "Organizational volatility and its effects on software defects," in Proceedings of the Eighteenth ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE '10), New York, NY, USA: ACM, 2010, pp. 117–126.
[24] M. B. Miles and A. M. Huberman, Qualitative Data Analysis: An Expanded Sourcebook, 2nd ed. Thousand Oaks, Calif.: Sage Publications, 1994.
[25] L. Barker, "Android and the linux kernel community," http://www.steptwo.com.au/papers/kmc_whatisinfoarch/, May 2005.
[26] A. Bacchelli and C. Bird, "Expectations, outcomes, and challenges of modern code review," in Proceedings of the 2013 International Conference on Software Engineering (ICSE '13), Piscataway, NJ, USA: IEEE Press, 2013, pp. 712–721.
[27] H. Hemmati, S. Nadi, O. Baysal, O. Kononenko, W. Wang, R. Holmes, and M. W. Godfrey, "The MSR cookbook: Mining a decade of research," in Proceedings of the 10th Working Conference on Mining Software Repositories (MSR '13), Piscataway, NJ, USA: IEEE Press, 2013, pp. 343–352.
[28] "Kruskal-Wallis test," http://www.r-tutor.com/elementary-statistics/non-parametric-methods/kruskal-wallis-test.
[29] P. Kampstra, "Beanplot: A boxplot alternative for visual comparison of distributions," Journal of Statistical Software, Code Snippets, vol. 28, no. 1, pp. 1–9, October 2008.
[30] "puppetlabs/puppetlabs-openstack," https://github.com/puppetlabs/puppetlabs-openstack.
[31] "stackforge/openstack-chef-repo," https://github.com/stackforge/openstack-chef-repo.
[32] L. Aversano, L. Cerulo, and C. Del Grosso, "Learning from bug-introducing changes to prevent fault prone code," in Ninth International Workshop on Principles of Software Evolution (IWPSE '07), New York, NY, USA: ACM, 2007, pp. 19–26.
[33] S. Kim, T. Zimmermann, K. Pan, and E. J. Whitehead Jr., "Automatic identification of bug-introducing changes," in Proceedings of the International Conference on Automated Software Engineering (ASE), IEEE Computer Society, 2006, pp. 81–90.
[34] E. Shihab, B. Adams, A. E. Hassan, and Z. M. Jiang, "An industrial study on the risk of software changes," in Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering (FSE), Research Triangle Park, NC, USA, November 2012, pp. 62:1–62:11.
[35] S. Kirbas, A. Sen, B. Caglayan, A. Bener, and R. Mahmutogullari, "The effect of evolutionary coupling on software defects: An industrial case study on a legacy system," in Proceedings of the 8th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM '14), New York, NY, USA: ACM, 2014, pp. 6:1–6:7.
[36] K. Herzig and N. Nagappan, "The impact of test ownership and team structure on the reliability and effectiveness of quality test runs," in Proceedings of the 8th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM '14), New York, NY, USA: ACM, 2014, pp. 2:1–2:10.