Profiling User Communities on Stack Overflow
Jeff Avery
Luis Blanco
Cheriton School of Computer Science
University of Waterloo
Waterloo, Canada
[email protected]
Cheriton School of Computer Science
University of Waterloo
Waterloo, Canada
[email protected]
ABSTRACT
Stack Overflow is a popular Q&A site and software development
community, where programmers ask and answer each other’s
questions. Unlike other Q&A sites, its focus and content are
entirely determined by its own user community, who decide what
content is posted, collectively vote to choose the best answer, and
collectively refine one another’s content to improve the quality
and usefulness of the site. There has been a great deal of past
work examining the content and character of posts and site
content, but relatively little attention has been paid to the user
communities that have evolved. This paper defines a user
community on Stack Overflow as a set of users that share
specific technical interests, and work together to author and revise
technical content on the site. We examine user profiles and
posting history of top users in the context of specific communities
– Java, JavaScript and C# – to characterize the nature of these
communities and collaborations.

Categories and Subject Descriptors
H.1.2 [User-Machine Systems] Human factors.

General Terms
Human Factors.

Keywords
programming; community; human-factors; data-mining.

Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that
copies bear this notice and the full citation on the first page. To copy
otherwise, or republish, to post on servers or to redistribute to lists,
requires prior specific permission and/or a fee.
Conference’15, Month 1–2, 2015, City, State, Country.
Copyright 2015 ACM 1-58113-000-0/00/0010 …$15.00.

1. INTRODUCTION
Stack Overflow was introduced in 2008 as an open Q&A site for
software developers – a place where they could ask and answer
technical questions in a community setting. Since then, the site
has grown to nearly 2.7 million active users, addressing over 8
million questions [1-4].

The key to the success of this site is its active user community [5].
Any user can post a question, or add to a list of answers; the
community votes answers "up" or "down" based on their quality.
Incentive mechanisms are baked into the design of the site to
encourage participation; top users compete for badges, ranking
and recognition on the site. Jeff Atwood, one of the founders, has
specifically called out gamification as one of the key approaches
used to encourage developer participation [6].

The founders and the design team actively work to incorporate
user feedback, and to moderate conversations to keep them within
the established guidelines. They elicit continuous user feedback,
bug reports and feature requests, and have iteratively revised the
site based on community feedback [7].

Stack Overflow attracts a diverse group of users from around the
world. It brings people with common technical interests together,
and engages them in a way that many other social sites are unable
to do. We can think of this group of users as an ad hoc online
community of like-minded individuals, with common technical
interests. If Stack Overflow represents developers as a whole, then
contributors within a technical area might represent that
community.

We propose to characterize some specific user communities on
Stack Overflow. We group posts by technical area and
programming language, and use this grouping to select three
active communities. We then examine the profiles of top
contributors in each community, using their post history, profile
and demographic data to characterize that technical community. Is
this group scattered globally, or centralized? What characteristics
do programmers within a group have in common?

For a given community, we are interested in questions like:
• What is their median age? What is the geographic
  distribution for this community?
• How active is this community? Who asks the most
  questions? Who answers the most?
• Are there differences between users of this and other
  programming language communities?

2. RELATED WORK
A great deal of prior research on Stack Overflow data focuses on
analyzing characteristics of questions. For instance, Cheng et al.
[8] attempt to discover the features of a question that elicit
answers. Anderson et al. [9] tried to identify characteristics of
questions that, in their opinion, provided “long-lasting value”,
where “long-lasting value” was defined as the questions that draw
the most views and attention.

Recently, researchers have started to examine the characteristics
of the user base, attempting to determine how closely related user
characteristics are to the quality of the answers received. Posnett
et al. [10] examine the quality of answers in relation to the tenure
of the users who answer them, and suggest that a user’s expertise
in answering questions remains fairly constant. Pal et al. [11]
focused on trying to characterize the users that provide the
answers to questions in Q&A sites, and how their behavior has
evolved over time. Wang et al. [12] analyzed a large (100,000)
sample of questions and extracted general characteristics of the
users of the site, such as the fact that less than 10% of users
answer more than 5 questions, or that less than 2% of users ask
more than 5 questions. Morrison et al. [13] examine the quality of
answers in relation to the age of the developers, trying to
determine if expertise is related to the age of a developer.
These approaches attempt to look at the breadth of the Stack
Overflow community as a whole, treating all contributing
developers as part of a homogeneous group. Our focus is
narrower. We expect that programmers have different motivations
and usage patterns, and that these differences might be reflected in
how they participate in the community (e.g. C# developers may
have different profiles than Java developers). We are
interested in teasing out the differences between these
communities.

3. METHODOLOGY
3.1 How Stack Overflow Works
Stack Overflow is presented as a Q&A website, where users can
post new questions, or suggest answers to existing questions.
Users can "vote" questions and answers up or down; the voting
mechanism determines the score of a given post, and thus the
relative ranking of an answer, where higher-ranked answers
appear first below the question. The site also allows users to
perform keyword searches, and returns questions and answers
ordered by their rank.

Although anyone can view questions and answers, users need to
have an account to post, or to up/down vote other users’ posts.
Users can optionally add their age, location and other personal
details to their profile, which are available publicly. Additionally,
users can earn "badges" for participating, which motivates
programmers, and encourages active participation [7].

3.2 Stack Overflow Data Model
Stack Overflow is unique among Q&A sites, in that it makes its
data publicly accessible. In addition to a public API [2], the
community periodically releases snapshots of all anonymized data
to the Internet Archive [3], and the site maintainers also provide a
web interface, the Stack Exchange Data Explorer [4], that allows
real-time queries against the most recent snapshot of the data; this
snapshot is typically updated every Monday.

Stack Overflow has a relatively simple database schema,
consisting of the Posts table that stores questions and answers, the
PostTags and Tags tables for annotating posts, the Users table
with user profile information, and the Badges table for annotating
users. Table 1 describes the data that we consider relevant to
characterizing Stack Overflow communities.

Table 1: Stack Overflow Data
Area        Type                DB Tables[.Columns]
Posts       Categories          PostTags, Tags, TagSynonyms
            Questions           Posts
            Answers             Posts
            Popularity          Posts.ViewCount
            Votes               Votes, VoteTypes
Accounts    Account age         Users.CreationDate
Users       Age                 Users.Age
            Location            Users.Location
Reputation  Ranking             Users.Reputation
            Badges              Badges.Name, Badges.Date
            Votes               Users.UpVotes/DownVotes
            Profile views       Users.Views
Postings    Questions, answers  Posts

3.3 Mining Data
To mine Stack Overflow, we built queries against this schema, and
executed them using the Stack Exchange Data Explorer interface.
All results were obtained between March 12 and March 29, 2015,
then exported and filtered using Microsoft Excel, Python scripts or
some other appropriate mechanism.

The process we follow to characterize Stack Overflow
communities entails:
1. Using post tags to rank areas of interest, and using this data to
   select active and interesting communities.
2. Mining popular posts for each community, and using this
   data to build a profile of the top-ranked contributors in that
   community.
3. Examining demographics, including age and location, and
   post history for these users, and using this data to characterize
   the larger user community.

When examining large sets of posts, we restrict questions and
answers to the top 10,000 posts, ranked by popularity. This allows
us to focus on high-quality, relevant posts for that community.

In the next section of the paper, we go through the steps specified
above, present the results obtained and characterize the three
selected communities. We finish the paper by comparing and
contrasting the communities, outlining our conclusions, and giving
suggestions for future work.

4. RESULTS
4.1 Inception & Growth
Jeff Atwood and Joel Spolsky created Stack Overflow in 2008 as
a Q&A site for programmers [6]. It is characterized by intensive
community involvement: the users post questions, suggest
answers, and vote to determine the “best” answer. The community
guides the content: users moderate one another, filter content and
actively work to make the site accessible. Although heavily
involved in the initial design of the site, the management and
operations teams now work to maintain the infrastructure [7].

The user community is substantial, having grown to over 4
million users (4,066,740). Members have posted 9 million
questions (9,064,335) and 15 million answers (15,194,996) over
40 thousand topics (40,026 distinct tags).

This is also a very active community, which boasts over 48
million unique hits per month, peaking at 4.5 million hits per day
[14]. Given the volume of users, broad interests and heavy
engagement, we can characterize this user community as large but
diverse, with broad technical interests.
Figure 1: Questions Asked Per Year (2015 YTD)

4.2 Defining Communities
We expect that Stack Overflow is used by a broad range of people
for different purposes: professional programmers use it as an
important resource when debugging problems, students use it to
learn an API or discover the best approach to a programming
problem. One thing that cuts across profession, location and
age is a common technical interest. We propose that these
technical interests may be used to characterize a community on
Stack Overflow.

However, users don’t explicitly join particular communities;
community membership is implied by participation and active
posting. To delineate technical communities, we examine posted
questions and their associated tags (keywords added to the post by
the authors, which may be edited and modified by other posters or
moderators). We can use tags as explicit community identifiers,
and count the tags to infer popularity of various technologies and
programming languages.

Figure 2: Popular Tags by Year

This table indicates that keywords are typically used to refer to
programming languages, or programming technologies. Note that
topics trend over time. Tag counts compare favorably to other
measurements of programming language popularity, such as the
TIOBE Index [15].

Figure 3: TIOBE Community Programming Index [15]

Figure 4: TIOBE Top-10 Programming Languages [15]

However, there are some important differences between how often
a programming language is used, and the amount of discussion
that language elicits.

For instance, JavaScript is only the 7th most popular programming
language of 2015 on the TIOBE index, but it generated the
highest number of questions on Stack Overflow. In contrast, Java
follows a more expected pattern, where it is ranked 2nd in both
usage and questions asked on Stack Overflow.

This suggests that language popularity isn’t necessarily related to
the amount of community engagement around that language.
There may be characteristics of programming languages that make
them more or less likely to elicit questions. For instance,
JavaScript has been trending up in popularity, which aligns with
the growing adoption of web applications and technologies. Its
popularity on Stack Overflow might be related to novice users
learning how to use it, the drastic shifts in related technologies
(e.g. Node.js, jQuery and other web toolkits), or other changes
that led to greater community engagement.

The C language, on the other hand, has a very different profile.
Although ranked 1st in popularity, it doesn’t even appear on a
recent top-10 list on Stack Overflow. This could indicate that the
C community doesn’t tend to use Stack Overflow, or possibly that
the language itself is so well understood by now that it doesn’t
require much discussion.

For our research, we’re primarily interested in active
programming communities. To determine the most active and
popular programming languages being discussed, we extracted
tags from all of the questions, and ranked tags by popularity. The
results, shown in Figure 5, are the top-10 most frequently
discussed programming languages on Stack Overflow.

In the following sections, we focus on the three most popular
languages: Java, JavaScript, and C#. These languages were chosen
because of their overall popularity and active user communities.
Though some of the languages may be used together, or more
specifically be used in the same project, particularly JavaScript
with C# and JavaScript with Java, we believe the languages are
conceptually independent enough for their communities to have
their own characteristics.

We specifically exclude Android and jQuery as separate
categories because they are technologies that are typically used
with other languages on this list (Java and JavaScript specifically),
and are subsumed in those programming languages; also, the
Android tag may be used to make reference to either the operating
system or the programming language used to program Android
devices.
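
The ranking step described in this section (count how often each tag appears across questions, then set aside technology tags such as "android" and "jquery" that are subsumed by a language) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the queries we ran: the sample questions are made up, and the `rank_tags` helper and `SUBSUMED_TAGS` set are hypothetical names introduced only for this example.

```python
from collections import Counter

# Hypothetical set of technology tags that are folded into a language
# community rather than ranked on their own (assumption for illustration).
SUBSUMED_TAGS = {"android", "jquery"}


def rank_tags(questions, top_n=10, excluded=SUBSUMED_TAGS):
    """Return up to top_n (tag, count) pairs, most frequent first."""
    counts = Counter()
    for tags in questions:
        # Count each tag at most once per question, ignoring case.
        counts.update({tag.lower() for tag in tags})
    ranked = [(tag, n) for tag, n in counts.most_common() if tag not in excluded]
    return ranked[:top_n]


# Made-up sample data: each inner list holds the tags of one question.
sample_questions = [
    ["javascript", "jquery"],
    ["java", "android"],
    ["javascript", "html"],
    ["c#", ".net"],
    ["java"],
    ["javascript"],
]

top = rank_tags(sample_questions, top_n=3)
print(top)
```

Counting each tag once per question keeps a post that repeats a tag from inflating the ranking; the order of tags with equal counts is not meaningful in this sketch.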
Figure 5: Top-10 Programming Languages

To ensure that we find all posts relevant to a community, we
generated a ranked list of all of the tags used on Stack Overflow,
then manually extracted and categorized tags related to the
programming language in question (e.g. “typescript” as a known
variant of “javascript”). In subsequent sections, we use this set of
restricted tags when extracting data for a specific programming
language.

4.3 Community 1: Java
Originally developed by Sun in 1995, Java is a multi-platform
programming language and computing platform. Java serves a
broad range of purposes, from desktop and enterprise
applications, to mobile and embedded systems. Oracle acquired
Sun, including the rights to Java, in 2010 [16].

We extract posts by the Java community on Stack Overflow by
searching for posts containing the tags “java”, “java-ee”
(“enterprise edition” for scalable business applications), “javafx”
(rich client user interfaces) and “java-me” (“micro-edition” for
embedded systems).

We consider the Java community to include those users who have
posted, or answered, a question containing these keywords.
Queries use wildcards to ensure that we consistently return all
variations of these tags. The top-10 Java keywords are listed in
Figure 6.

Figure 6: Java - Top-10 Tags

The Java community posted over 260 thousand (260,227)
questions and 383 thousand answers (383,771) related to Java and
Java technologies in 2014. As noted by Wang et al. [12], users
posting more answers than questions is a common trend. The
number of posts has increased linearly, as the community has
grown.

Figure 7: Java - Posts Per Year (2015 YTD)

To characterize the Java community, we examine the posting
history and demographics of users that post these questions and
answers. As previously noted, we’re most interested in “useful”
posts, so we restrict queries to the top 10,000 questions (using the
voting system scores).

2,781 users ask the top-10,000 Java questions, and 2,606 of them
answer these questions. Users can optionally post their age as part
of their user profile; 55% of them (2,621 of 4,729) include this
information, for a mean age of 35 years. These 2,621 users
represent our “core” Java contributors; their contributions are
shown in Figure 9.

Figure 8: Java - Age Distribution of Core Posters

Figure 9: Java - Posts By Core Contributors

Core contributors consistently answer more questions than they
ask, providing 1.8 answers for every question (633,991 questions
and 1,150,891 answers). This is consistent with the profile of
other users, who typically answer 1.8 times the number of
questions they ask (894,627 questions to 1,635,196 answers).
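
The answer-to-question ratios quoted for the Java community in this section follow directly from the reported counts. As a small worked check, assuming only those counts (the `answers_per_question` helper is a hypothetical name used for illustration):

```python
def answers_per_question(questions, answers):
    """Ratio of answers posted to questions asked."""
    if questions <= 0:
        raise ValueError("questions must be positive")
    return answers / questions


# Counts reported in this section for the Java community.
core_ratio = answers_per_question(633_991, 1_150_891)
other_ratio = answers_per_question(894_627, 1_635_196)

print(f"core contributors: {core_ratio:.1f} answers per question")
print(f"other users:       {other_ratio:.1f} answers per question")
```

Both values round to 1.8, which is why the two groups are described as having the same posting profile.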
We can further examine the age of user accounts, and posting
history relative to account creation. On average, the age of the
accounts of core contributors is 1740 days (4.8 years).

Figure 10: Java - Time Between Account Creation and Post Date

Post distribution, shown in Figure 10, indicates that posters tend
to be most busy after they first create their account. This suggests
that, in some cases, users create their accounts for the purpose of
posting questions. In addition, the decay of activity indicates that
some additional incentives might be needed to motivate users to
stay involved.

Finally, we examine user location. Because the user location field
is a free-form text field, there is no consistency, and the data is
highly irregular (e.g. multiple versions of a single location, with
users specifying city, region, or country interchangeably). We
manually categorized the first 30 locations in Figure 11, but were
unable to perform further analysis, other than to notice a very
diverse set of contributors.

Figure 11: Java - Core Users Location

4.4 Community 2: JavaScript
JavaScript is an object-oriented, dynamic programming language
originally developed at Netscape [17]. It’s currently included, in
various forms, in all major web browsers and commonly used to
allow client-side interaction on web pages, and dynamically
modify the browser view independent of a back-end web server.
The rise of web development, and the general shift from desktop
to web environments, has increased the popularity of
JavaScript-based programming languages.

We describe the JavaScript community on Stack Overflow as
including any users who have authored or responded to a post
containing the keywords “javascript”, “ecmascript” or
“typescript”. Both ECMAScript and TypeScript are variants of
JavaScript. Wildcards are used to ensure that we consistently
return all variations of these tags. Figure 12 lists the top-10
JavaScript keywords, demonstrating a breadth of questions.

Figure 12: JavaScript - Top-10 Tags

The JavaScript community posted over 260 thousand (262,784)
JavaScript-related questions in 2014. They also posted over 380
thousand answers (383,069) in the same time period. The
number of posts has increased linearly over time, as shown in
Figure 13.

Figure 13: JavaScript - Posts Per Year (2015 YTD)

To characterize the JavaScript community, we examine the
demographics of the users that post these questions and answers,
as well as their posting histories. As previously noted, we’re most
interested in “useful” posts, so we restrict queries to the top
10,000 questions (using the voting system scores).

7,066 users asked the top-10,000 questions. In their profiles, 45%
of those users (3,197) listed their age, with a mean age of 32
years, slightly younger than the Java community; note that 20
users list their current age as “95”, which seems unlikely and
which we ignored in these queries. We consider these 3,197 users
as our “core” community contributors for JavaScript, shown in
Figure 14.

We then examined the posting patterns of these users. As Figure
15 illustrates, core contributors answer far more questions than
they ask. Figure 13 suggests that typical users post about 1.74
times as many answers as questions (827,505 asked and 1,460,206
answered). In contrast, our core contributors answer about 1.6
times as many questions as they ask (827,505 asked and 1,327,539
answered). There is not a significant difference in ratios between
core and typical users.
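
The core-contributor selection described in this section (take the top-scored questions, collect the users who asked them, keep those who list a plausible age, then average) could be sketched along the following lines. This is an illustrative sketch only: the data is made up, the tuple and dictionary layouts are assumptions rather than the Data Explorer schema, and `core_contributors` and `IMPLAUSIBLE_AGE` are hypothetical names.

```python
from statistics import mean

# The text treats a listed age of 95 as implausible and ignores it.
IMPLAUSIBLE_AGE = 95


def core_contributors(posts, users, top_n=10_000):
    """Select askers of the top_n highest-scored questions with a usable age.

    posts: iterable of (score, owner_user_id) pairs, one per question
    users: dict mapping user_id -> listed age, or None when not listed
    Returns (all askers of the top posts, {user_id: age} for core users).
    """
    top_posts = sorted(posts, key=lambda p: p[0], reverse=True)[:top_n]
    askers = {owner for _, owner in top_posts}
    core = {
        uid: users[uid]
        for uid in askers
        if users.get(uid) is not None and users[uid] != IMPLAUSIBLE_AGE
    }
    return askers, core


# Made-up sample data, not taken from the study.
posts = [(120, 1), (95, 2), (80, 1), (60, 3), (5, 4)]
users = {1: 30, 2: None, 3: 95, 4: 22}

askers, core = core_contributors(posts, users, top_n=4)
share_with_age = len(core) / len(askers)
mean_age = mean(core.values())
print(f"{share_with_age:.0%} of askers are core users, mean age {mean_age}")
```

Filtering the placeholder-looking age before averaging matters because even a handful of "95" entries would pull the mean upward.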
Figure 14: JavaScript - Age Distribution of Core Posters

Figure 15: JavaScript - Posts by Core Contributors

We can also examine the age of user accounts, and posting history
relative to account creation (i.e. did they create more posts when
their account was new?). From Figure 16, we can see that most of
the core contributors joined Stack Overflow in the years
immediately after it launched. The average core user account is
1719 days (or 4.7 years) old.

Figure 16: JavaScript - User Account Creation

Post distribution, shown in Figure 17, indicates that posters tend
to be most busy after they first create their account. This suggests
that, in some cases, users create their accounts for the purpose of
posting questions.

Figure 17: JavaScript - Time Between Account Creation and Post Dates

Finally, we can examine geographic location. Users can optionally
list their location in their user profile, and approximately 97.8%
have included this information (9,698 out of 9,910). This is
significantly higher than the remainder of the Stack Overflow
population, where 11.5% report age (468,927 out of 4,066,740)
and 9.3% report both age and location (377,158 out of
4,066,740).

However, the user location field is an unchecked string field,
which led to extremely irregular and hard-to-categorize data (e.g.
locations such as “Seattle”, “Seatle”, “Seattle WA”, “United
States”, “US” and so on). The top-50 locations were manually
categorized, leading to Figure 18.

Figure 18: JavaScript – Core Users Location

4.5 Community 3: C#
C# is a modern object-oriented programming language created in
2000 and released in 2002 by Microsoft, and ratified as an ECMA
standard in 2006.

It was designed for use with the .NET platform and the Common
Language Runtime environment, running on Windows platforms,
and is closely tied to those technologies. C# is integrated with the
Visual Studio IDE, also a Microsoft product.

Some attempts have been made to make .NET applications in
general, and C# applications in particular, run on other platforms.
The most successful has been the Mono Project [21], which even
offers its own IDE to allow for the development of .NET
applications on other operating systems.

Anders Hejlsberg was the lead architect of C#; he had also been
the lead architect of the team that developed Turbo Pascal and
Delphi at Borland [18-20].
To identify the C# community, we examined the tag-list of related
words. The top-10 relevant tags are listed in Figure 19. All of
these tags contain the phrase “c#”, indicating that we can use this
in a wildcard search to locate all relevant posts.

Figure 19: C# - Top-10 Tags

The C# community posted over 180 thousand C# questions
(182,462) and over 258 thousand related answers (258,612) in
2014. The number of posts increased year-over-year from 2008
through 2013, but actually declined in 2014, as shown in Figure
20. This may indicate a maturity of the platform, or possibly a
migration by developers to other technologies.

Figure 20: C# - Posts Per Year

We can further examine the posting history of C# programmers.
As before, we restrict ourselves to the top-10,000 C# posts, as
determined by the rank of questions (based on voting scores).

6,235 users asked the top-10,000 questions. 54% of these core
users list their age in their user profiles (3,338 of 6,235), with a
mean age of 34 years, slightly older than the JavaScript
community. Again, we see a disproportionate number of users,
16 in this case, who list their age as “95” and are excluded. We
consider these 3,338 users as “core” contributors that ask and
answer the top questions. We characterize these users in Figure 21
and Figure 22.

Figure 21: C# - Age Distribution of Core Posters

Figure 22: C# - Posts By Core Contributors

We examine the posting pattern of these core users. In line with
other communities, core C# contributors answer more questions
than they ask, at a ratio of 1.4 times (910,640 asked to 1,285,298
answered). This is lower than the 1.9 ratio suggested in Figure 20
for typical users (with 797,898 asked and 1,508,532 answered).

We can also examine the age of user accounts. The average core
user account is 1907 days (or 5.2 years) old. C# posters rushed to
join when Stack Overflow first launched, but the creation of
accounts sharply declined after this. It may be that the community
has matured or stabilized, or this might be an indication of the
declining popularity of C#, or some other factor entirely.

Figure 23: C# - User Account Creation

As in our other two case studies, we can see that account activity
peaks shortly after account creation and slowly declines after that.

Finally, we can examine geographic location. Location data is
optional, with 95% (3,178 out of 3,338) adding a location to their
profile. Unfortunately, the data is also extremely inconsistent,
since users can enter free-form text. The top-12 locations,
manually categorized from the top-50 locations identified by our
core C# users, are listed in Figure 25.
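
Because location is free-form text, any tally first has to collapse the variants of a place into one label. The manual categorization described in the results could, in principle, be approximated with a hand-built alias table, as in the minimal sketch below. The alias entries, sample strings and helper names (`LOCATION_ALIASES`, `normalize_location`, `top_locations`) are hypothetical and introduced only for illustration; a usable table would be far larger and would still leave many entries uncategorized.

```python
from collections import Counter

# Hypothetical alias table built by hand (assumption for illustration).
LOCATION_ALIASES = {
    "seattle": "United States",
    "seatle": "United States",
    "seattle wa": "United States",
    "united states": "United States",
    "us": "United States",
    "london": "United Kingdom",
    "uk": "United Kingdom",
}


def normalize_location(raw):
    """Map a free-form location string to a country, or None if unknown."""
    if not raw:
        return None
    # Lower-case, treat commas as spaces, and collapse repeated whitespace.
    key = " ".join(raw.lower().replace(",", " ").split())
    return LOCATION_ALIASES.get(key)


def top_locations(raw_locations, top_n=12):
    """Count normalized locations, skipping entries we cannot categorize."""
    counts = Counter()
    for raw in raw_locations:
        country = normalize_location(raw)
        if country is not None:
            counts[country] += 1
    return counts.most_common(top_n)


sample = ["Seattle", "Seatle", "Seattle, WA", "US", "London", "Atlantis", ""]
print(top_locations(sample))
```

Unknown strings are dropped rather than guessed at, so the resulting counts understate the true totals; that trade-off mirrors the limits of the manual approach.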
Figure 24: C# - Time Between Account Creation and Post Dates

Figure 25: C# - Core Users Location

5. DISCUSSION
Part of the reason that Stack Overflow had such a significant and
immediate impact was that the founders were well-known and
popular developers, particularly in the .NET community.
Essentially, they were developers creating a site for other
developers. Their membership in the community gave the site
immediate credibility and popularity within it.

This also meant that many of the first developers to join Stack
Overflow were .NET developers, either already familiar with C#
or making the transition from Delphi to C#, once Delphi stopped
being widely used. We see this in the 2008 account creation
statistics, where significant portions of the users were from the C#
community. However, after this initial influx of users, the rate of
account creation in the C#/.NET community has drastically
declined; since 2010, it's had the lowest rate of account creation
among these communities.

We can see how Stack Overflow attracted developers with other
interests; the rise of JavaScript in 2009, for example, indicates the
growth of that community. Although it quickly stabilized, it hasn't
seen the rapid decline of the C# community. Clearly, JavaScript
remains popular and continues to attract new users.

Figure 26: Account Creation by Year and Community (series: JavaScript, C#, Java)

There are subtle differences in the demographics of the user
communities, as demonstrated in Table 2. The greater mean age of
C# accounts reflects the initial onboarding of C# developers, as
discussed. The JavaScript community started to pick up 6 months
later. Java and C# developers also tend to be slightly older, which
might reflect trends in technology; JavaScript is a newer language,
more likely to be learned by people at the start of their careers.

Table 2: Community Demographics
Community    Age (mean)   Account age (mean)
JavaScript   32 years     4.7 years
Java         35 years     4.8 years
C#           34 years     5.2 years

We took the approach of extracting "key contributors" for each
community, expecting that the top posters might have a different
posting profile than average users [11, 12]. We found that key
contributors do post more frequently than typical users; however,
they also tended to post more questions relative to the number of
answers they provide.

We also expected user location to be a key demographic, but the
data was too irregular to reliably process. In the future, we would
like to explore other ways to accurately categorize location
information, and determine if deeper trends exist. We expect that
smaller, heavily localized developer communities do exist and
that they may be reflected in Stack Overflow data.

Finally, it's important to note that user profile information, age
and location in particular, is optional and voluntary; location
fields are free-form, and users can enter any data that they wish.
The participation rates are fairly high, but we have no reliable way
of verifying the information that is entered, based on the schema
provided. Some way of location tracking, by IP for instance,
would greatly increase confidence in these results.

6. CONCLUSIONS
Stack Overflow is a popular programmer Q&A site that attracts
millions of hits per day; many developers consider it a key
technical resource [7]. However, it actually represents a number
of different technical communities, each with its own technical
interests. Over time, the population of these communities has
changed to reflect new technologies and programming trends.

Although previous studies have tried to find trends in Stack
Overflow by examining user demographics, ours is the first
attempt that we are aware of to characterize different
Stack Overflow communities. We use data extracted from Stack
Overflow posts to characterize three of these technical
communities: Java, JavaScript and C#. Using demographic and
post history data, we were able to extract unique characteristics of
each community, and show how they reflect the changing
nature of Stack Overflow.
Even though we found important differences between
communities based on posting trends and user age, we couldn’t
find any characterizations based on user location. We believe that
further investigation may discover meaningful trends, such as
clusters of users of particular technologies in specific locations.
7. ACKNOWLEDGMENTS
Our thanks to ACM SIGCHI for allowing us to modify templates
they had developed.

8. REFERENCES
1. Stack Overflow. http://www.stackoverflow.com
2. Stack Exchange API. http://api.stackexchange.com/
3. Internet Archive Stack Exchange Data Dump.
   https://archive.org/details/stackexchange
4. Stack Exchange Data Explorer.
   http://data.stackexchange.com/stackoverflow/queries
5. R. Gazan. 2006. Specialists and Synthesists in a Question
   Answering Community. In Proceedings of the 69th Annual
   Meeting of the American Society for Information Science
   and Technology. Austin, TX: Richard B. Hill.
6. J. Atwood. 2011. "The Gamification". Coding Horror Blog.
   Retrieved from: http://blog.codinghorror.com/the-gamification/
   Accessed on: Mar 12, 2015
7. L. Mamykina, B. Manoim, M. Mittal, G. Hripcsak, and
   B. Hartmann. 2011. Design lessons from the fastest Q&A
   site in the west. In Proc. CHI ‘11. New York, NY: ACM.
8. D. Cheng, M. Schiff, and W. Wu. 2013. Eliciting Answers on
   StackOverflow. Retrieved from:
   http://bid.berkeley.edu/cs294-1-spring13/images/6/6e/ChengSchiffWuFinalPaper.pdf
   Accessed on: Mar 27, 2015
9. A. Anderson, D. Huttenlocher, J. Kleinberg, and J. Leskovec.
   2012. Discovering value from community activity on focused
   question answering sites: a case study of stack overflow. In
   Proceedings of the 18th ACM SIGKDD international
   conference on Knowledge discovery and data mining 2012.
   New York, NY: ACM.
10. D. Posnett, E. Warburg, P. Devanbu, and V. Filkov. 2012.
    Mining Stack Exchange: Expertise Is Evident from Initial
    Contributions. IEEE, 2012.
11. A. Pal, S. Chang, and J. A. Konstan. 2012. Evolution of
    Experts in Question Answering Communities. ICWSM 2012.
12. S. Wang, D. Lo, and L. Jiang. 2013. An empirical study on
    developer interactions in StackOverflow. In Proceedings of
    the 28th Annual ACM Symposium on Applied Computing
    (SAC '13). ACM, New York, NY, USA, pp: 1019-1024.
13. P. Morrison and E. Murphy-Hill. 2013. Is programming
    knowledge related to age? An exploration of stack overflow.
    Mining Software Repositories 2013, pp: 69–72.
14. Quantcast Audience Report for Mar 27, 2015. Retrieved
    from: https://www.quantcast.com/stackoverflow.com
15. TIOBE Index for March 2015. Retrieved from:
    http://www.tiobe.com Accessed on: Mar 26, 2015
16. “What is Java?” on Java.com. Retrieved from:
    https://www.java.com/en/download/faq/whatis_java.xml
    Accessed on: Mar 27, 2015
17. “What is JavaScript?” on Mozilla Developer Network.
    Retrieved from:
    https://developer.mozilla.org/en-US/docs/Web/JavaScript/About_JavaScript
    Accessed on: Mar 26, 2015
18. “C# Language Specification 5.0” on Microsoft.com.
    Retrieved from:
    https://www.microsoft.com/en-us/download/details.aspx?id=7029
    Accessed on: Mar 27, 2015
19. “Visual C#” on Microsoft Developer Network. Retrieved
    from: https://msdn.microsoft.com/en-us/library/kx37x362.aspx
    Accessed on: Mar 27, 2015
20. “Deep Inside C#: An Interview with Microsoft Chief
    Architect Anders Hejlsberg” on O’Reilly Windows Dev
    Center. Retrieved from:
    http://www.windowsdevcenter.com/pub/a/oreilly/windows/news/hejlsberg_0800.html
    Accessed on: Mar 27, 2015
21. The Mono Project. Retrieved from:
    http://www.mono-project.com/ Accessed on: Mar 27, 2015