A Comparative Study on the Social Networks of Fictional Characters

Jessica Calderone, Emma Valentine, Josie Trichka (students), Julie Gallagher (assistant principal)
Maine-Endwell High School, Endwell, NY 13760, USA
Benjamin James Bush, Jin Akaishi, Hiroki Sayama
Collective Dynamics of Complex Systems Research Group, Binghamton University, Binghamton, NY 13902-6000, USA
Results
Older networks had largest connected components (LCCs) with longer
average path lengths and smaller minimum closeness centrality than
the LCCs of newer networks. Newer networks also tended to have
smaller scaling exponents than older networks, and the data begin to
suggest that newer networks may be more clustered. The networks
themselves, their detailed properties, degree distributions, and
k-C(k) plots are shown below.
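The fits reported on the degree distribution panels can be illustrated with a minimal sketch. It assumes the fitted form is a power law, y = a * x^(-k), estimated by ordinary least squares on log-log axes. The poster does not state the functional form or the fitting procedure, so both are assumptions, and `fit_power_law` and `degree_distribution` are hypothetical helper names.

```python
import math
from collections import Counter

def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**(-k) on log-log axes (assumed form)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    slope = sxy / sxx
    a = math.exp(my - slope * mx)
    return a, -slope  # the exponent k is the negated log-log slope

def degree_distribution(degrees):
    """Sorted (degree, count) pairs; degree 0 is skipped because log(0) is undefined."""
    counts = Counter(d for d in degrees if d > 0)
    return sorted(counts.items())

# Synthetic degrees (not study data) whose counts follow 100 * d**(-2) exactly.
degrees = [1] * 100 + [2] * 25 + [5] * 4 + [10] * 1
pairs = degree_distribution(degrees)
a, k = fit_power_law([d for d, _ in pairs], [c for _, c in pairs])
```

On real, noisy degree data a log-log least-squares fit is only a rough estimate of the exponent, which is one reason to treat the scaling exponents as indicative rather than exact.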
[Figures: network properties plotted against release date (1960 to 2010). Panels recoverable from the extraction: size of largest connected component, k, and average clustering coefficient. Correlation coefficients printed on the panels: 0.861389 (p = 0.0127324), 0.818427 (p = 0.0244047), 0.663762 (p = 0.103992); their assignment to panels is not recoverable. The power-law fit parameters (a, k) printed on the degree distribution and k-C(k) panels are not recoverable either.]
# of characters: 717
# of links: 2267
Network density: 0.00883180227983
# of characters: 515
# of links: 815
Network density: 0.00615768199161
star wars.txt
CLEAN LOTR network 1 .txt
harry potter.txt
# of characters: 808
# of links: 1595
Network density: 0.00489221784632
Largest degree: 71 ['Gandalf']
Smallest degree: 1
Median degree: 4
Average degree: 6.32357043236
Largest degree: 59 ['Princess Leia']
Smallest degree: 0
Median degree: 1
Average degree: 3.16504854369
Number of connected components: 5
Size of Largest Connected Component (LCC): 708
Number of connected components: 91
Size of Largest Connected Component (LCC): 404
Number of connected components: 316
Size of Largest Connected Component (LCC): 447
Average closeness centrality in LCC: 0.250745816339
Max closeness centrality in LCC: 0.393214682981 ['Sauron']
Min closeness centrality in LCC: 0.108535462082 ['Earendil of Gondor']
Average closeness centrality in LCC: 0.300284103224
Max closeness centrality in LCC: 0.485542168675 ['Princess Leia']
Min closeness centrality in LCC: 0.186746987952 ['Tsabong Lah']
Average closeness centrality in LCC: 0.31512276333
Max closeness centrality in LCC: 0.555417185554 ['Harry Potter']
Min closeness centrality in LCC: 0.184985483202 ['Caspar Crouch', 'Harfang Longbottom']
Average degree centrality in LCC: 0.00903794979982
Max degree centrality in LCC: 0.100424328147 ['Sauron']
Min degree centrality in LCC: 0.001414427157 ['Earendil of Gondor']
Average degree centrality in LCC: 0.00975358081714
Max degree centrality in LCC: 0.146401985112 ['Princess Leia']
Min degree centrality in LCC: 0.00248138957816 ['Tsabong Lah']
Average betweenness centrality in LCC: 0.00442566789243
Max betweenness centrality in LCC: 0.198725254487 ['Sauron']
Average degree centrality in LCC: 0.0155195072281
Max degree centrality in LCC: 0.340807174888 ['Harry Potter']
Min degree centrality in LCC: 0.00224215246637 ['Caspar Crouch', 'Harfang Longbottom']
Average betweenness centrality in LCC: 0.00595528609563
Max betweenness centrality in LCC: 0.231131798374 ['Princess Leia']
Average shortest path length in LCC: 4.12452153206
Average shortest path length in LCC: 3.39402501044
Average clustering coefficient: 0.36671580193
Average clustering coefficient: 0.132147236125
Average shortest path length in LCC: 3.27881943399
Average clustering coefficient: 0.212460518177
Average betweenness centrality in LCC: 0.0051209425483
Max betweenness centrality in LCC: 0.395473462821 ['Harry Potter']
Largest degree: 152 ['Harry Potter']
Smallest degree: 0
Median degree: 1.0
Average degree: 3.94801980198
# of characters: 165
# of links: 555
Network density: 0.0410199556541
# of characters: 392
# of links: 1605
Network density: 0.0209431598726
[Figure: minimum closeness centrality in LCC plotted against release date. Correlation coefficients printed nearby: 0.84113 (p = 0.0177035), 0.794617 (p = 0.0327589); their assignment to panels is not recoverable.]
Abstract
Network science [1] is becoming an important part of a scientist’s
toolkit. It is used in many different fields, from biology to
comparative literature, and it has been applied to social network
analysis of characters in fictional worlds [2,3]. To our knowledge,
however, no comparative study has been conducted on multiple modern
fictional worlds that high school students are familiar with. To fill
this gap, we analyzed social network characteristics across several
modern fictional worlds: Lord of the Rings, Star Wars, Harry Potter,
Lost, Twilight Saga, Pretty Little Liars, and Glee. These series were
chosen for their popularity among high school students and their
parents. Social networks were reconstructed using Wikipedia and other
online resources. We found that network properties change with
release date: the largest connected components (LCCs) of newer
networks tended to be smaller and simpler than the LCCs of older
networks. The newer networks were also less “scale free” and possibly
more clustered than the older networks.
Method
Our research was based on seven well-known TV/book series. The Harry Potter series was
chosen because it was one of the most popular book series among today’s teenagers. The Lord of
the Rings was popular with both adults and teenagers, and Star Wars was enjoyed by people of all
ages. Pretty Little Liars and The Twilight Saga, on the other hand, had more limited
audiences, mainly teenage girls. After we collected the data for the five book series listed
above, we extended our research to TV series in hopes of finding differences between the
two media. We chose Glee because it is popular among all age groups. Lost caught our
attention because it was much larger than Glee and had a completely different story line.
For each series, except Pretty Little Liars, we used Wikipedia to extract the fictional social
network data. Each TV/book series had a “List Of Characters” entry on Wikipedia. Starting
with this list, we then navigated to the Wikipedia page for each character, if such a page was
available. These character pages typically contained information about the character’s
fictional social connections. If such a page was not available, a summary of the character’s
connections was typically found on the “List Of Characters” page itself. These connections
were then manually recorded using a text editor. For Pretty Little Liars, network data was
gathered directly from the text of the books themselves.
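A minimal sketch of how manually recorded connections could be turned into a network follows. The poster does not describe the file format, so the comma-separated, one-pair-per-line layout and the helper names `parse_edges`, `density`, and `average_degree` are assumptions. The two formulas are the standard ones for an undirected simple graph, and they reproduce the density and average degree reported for the 717-character, 2267-link network.

```python
def parse_edges(text):
    """Parse 'Name A, Name B' lines (hypothetical format) into an adjacency dict."""
    adj = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        a, b = (part.strip() for part in line.split(",", 1))
        adj.setdefault(a, set())
        adj.setdefault(b, set())
        if a != b:  # ignore self-links
            adj[a].add(b)
            adj[b].add(a)
    return adj

def density(n, m):
    """Density of an undirected simple graph with n nodes and m links."""
    return 2 * m / (n * (n - 1))

def average_degree(n, m):
    """Average degree of an undirected graph with n nodes and m links."""
    return 2 * m / n

# Placeholder names, not taken from the study's data files.
sample = """A, B
A, C
B, C
C, D"""
adj = parse_edges(sample)
n = len(adj)
m = sum(len(neighbors) for neighbors in adj.values()) // 2

# Cross-check against the counts reported on the poster.
reported_density = density(717, 2267)
reported_average_degree = average_degree(717, 2267)
```

Storing each link in both directions keeps the network undirected, which matches the way the poster reports a single link count per network.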
The collected data was first visualized using Mathematica (shown below). Using
Mathematica and the NetworkX Python package, we then calculated various network metrics,
which are summarized in the figures shown below. We also used Microsoft Excel and
Mathematica to find possible correlations between the properties of the networks and the
series’ original release dates. We inspected the results by looking for similarities and
differences among the networks that stood out. We recorded any interesting patterns we
discovered for later consideration.
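The LCC metrics named in this section can be sketched in plain Python. This is a stand-in for illustration, not the authors' NetworkX or Mathematica code. It assumes an unweighted, undirected network stored as a dict of neighbor sets, and it defines closeness as (n - 1) divided by the sum of shortest-path distances inside the LCC. The function names are hypothetical.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path distances (in links) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def largest_connected_component(adj):
    """Node set of the largest connected component."""
    seen, best = set(), set()
    for node in adj:
        if node not in seen:
            component = set(bfs_distances(adj, node))
            seen |= component
            if len(component) > len(best):
                best = component
    return best

def lcc_metrics(adj):
    """Return (LCC node set, closeness per node, average shortest path length)."""
    lcc = largest_connected_component(adj)
    sub = {u: adj[u] & lcc for u in lcc}
    n = len(sub)
    closeness, total = {}, 0
    for u in sub:
        s = sum(bfs_distances(sub, u).values())
        closeness[u] = (n - 1) / s if s else 0.0
        total += s
    avg_path = total / (n * (n - 1)) if n > 1 else 0.0
    return lcc, closeness, avg_path

# Toy network (not study data): a three-node path plus a separate pair.
adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}, "D": {"E"}, "E": {"D"}}
lcc, closeness, avg_path = lcc_metrics(adj)
```

Restricting the computation to the LCC avoids undefined distances between disconnected characters, which matters for networks that have many small components.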
[Figure: average shortest path length in LCC plotted against release date.]
Largest degree: 95 ['James Ford']
Smallest degree: 1
Median degree: 3.0
Average degree: 8.1887755102
Number of connected components: 1
Size of Largest Connected Component (LCC): 392
Average degree centrality in LCC: 0.0209431598726
Max degree centrality in LCC: 0.242966751918 ['Kate Austen']
Min degree centrality in LCC: 0.00255754475703 ['Dr. Ian McVay']
Average betweenness centrality in LCC: 0.00452947065114
Max betweenness centrality in LCC: 0.149796210967 ['John Locke']
Average shortest path length in LCC: 2.76649355394
Average clustering coefficient: 0.677815465323
Largest degree: 55 ['Emily']
Smallest degree: 0
Median degree: 3
Average degree: 6.72727272727
Largest degree: 78 ['Bella Swan']
Smallest degree: 1
Median degree: 20.5
Average degree: 25.1276595745
Lost Networkrevised1 10 11.txt
PLL.txt
Average closeness centrality in LCC: 0.56922615
Max closeness centrality in LCC: 0.882352941176 ['Bella Swan']
Min closeness centrality in LCC: 0.348837209302 ['Joshua Uley']
Average degree centrality in LCC: 0.287667887668
Max degree centrality in LCC: 0.866666666667 ['Bella Swan']
Min degree centrality in LCC: 0.0111111111111 ['Joshua Uley']
Average betweenness centrality in LCC: 0.00913144283931
Max betweenness centrality in LCC: 0.173959604211 ['Bella Swan']
Number of connected components: 1
Size of Largest Connected Component (LCC): 67
Average closeness centrality in LCC: 0.376882505691
Max closeness centrality in LCC: 0.5770609319 ['Emily']
Min closeness centrality in LCC: 0.228045325779 ['Jade Smythe']
Average closeness centrality in LCC: 0.49681395974
Max closeness centrality in LCC: 0.709677419355 ['Sue Sylvester']
Min closeness centrality in LCC: 0.295964125561 ['Wes', 'David']
Average degree centrality in LCC: 0.0424814048002
Max degree centrality in LCC: 0.341614906832 ['Emily']
Min degree centrality in LCC: 0.00621118012422 ['Jade Smythe']
Average degree centrality in LCC: 0.175938489371
Max degree centrality in LCC: 0.621212121212 ['Sue Sylvester']
Min degree centrality in LCC: 0.0151515151515 ['Wes', 'David']
Average betweenness centrality in LCC: 0.0107353730542
Max betweenness centrality in LCC: 0.311627967786 ['Emily']
Average shortest path length in LCC: 2.71765968867
Average betweenness centrality in LCC: 0.0168388825105
Max betweenness centrality in LCC: 0.233381987752 ['Sue Sylvester']
Average shortest path length in LCC: 2.09452736318
Average clustering coefficient: 0.575063779572
Average clustering coefficient: 0.814941085642
Chronologically, we see a general trend in which the newer networks are
smaller and simpler than the older networks. The newer networks are
simpler in that their largest connected components (LCCs) tend to be
centered around a single well-connected character, as evidenced by the
correlation we found between minimum closeness centrality and release
date (see above). They are also simpler in that they tend to have
smaller scaling exponents and are therefore less “scale free” than the
older networks. We also found moderate evidence suggesting that newer
networks may be more clustered.
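For reference, the average clustering coefficient can be sketched as follows. The sketch assumes the standard local definition (the fraction of a node's neighbor pairs that are themselves linked) and the common convention that nodes with fewer than two neighbors contribute 0. The poster does not state which convention was used, and the function names are hypothetical.

```python
from itertools import combinations

def local_clustering(adj, node):
    """Fraction of pairs of a node's neighbors that are linked to each other."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # assumed convention for nodes with fewer than two neighbors
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2 * links / (k * (k - 1))

def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, node) for node in adj) / len(adj)

# Toy network (not study data): a triangle A-B-C with a pendant node D on A.
adj = {"A": {"B", "C", "D"}, "B": {"A", "C"}, "C": {"A", "B"}, "D": {"A"}}
avg_c = average_clustering(adj)
```

Averaging `local_clustering` over nodes of equal degree would give the kind of curve shown in a k-C(k) plot.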
One marginally significant relationship (p < 0.056, not reported above)
was a decrease in the maximum closeness centrality as the number of
characters in a fictional work or series increases. This may reflect a
similarity with real-world social networks, in which there is a limit to
how many people even the most connected person can know.
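The direction of this relationship can be checked against the values printed on the poster. The sketch below pairs each network's character count with its maximum closeness centrality in the LCC and computes a Pearson correlation. The poster does not say which correlation statistic was used, so Pearson's r is an assumption, and `pearson_r` is a hypothetical helper.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# (number of characters, max closeness centrality in LCC) as printed on the poster.
reported = {
    "Lord of the Rings": (717, 0.393214682981),
    "Star Wars": (515, 0.485542168675),
    "Harry Potter": (808, 0.555417185554),
    "Lost": (392, 0.560171919771),
    "Pretty Little Liars": (165, 0.5770609319),
    "Glee": (67, 0.709677419355),
    "Twilight Saga": (94, 0.882352941176),
}
sizes = [chars for chars, _ in reported.values()]
max_closeness = [c for _, c in reported.values()]
r = pearson_r(sizes, max_closeness)

# t statistic for testing r against zero, with n - 2 degrees of freedom.
n = len(sizes)
t = r * math.sqrt((n - 2) / (1 - r ** 2))
```

With these seven points r comes out negative, roughly -0.75, which agrees with the reported decrease in maximum closeness centrality as the number of characters grows.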
# of characters: 67
# of links: 389
Network density: 0.175938489371
CLEAN Glee.txt
Average shortest path length in LCC: 1.8126984127
Findings and Discussion
Largest degree: 41 ['Sue Sylvester']
Smallest degree: 1
Median degree: 5
Average degree: 11.6119402985
Number of connected components: 3
Size of Largest Connected Component (LCC): 162
Number of connected components: 2
Size of Largest Connected Component (LCC): 91
TwilightSaga.txt
Average closeness centrality in LCC: 0.367724983243
Max closeness centrality in LCC: 0.560171919771 ['Kate Austen']
Min closeness centrality in LCC: 0.248253968254 ['Dr. Ian McVay']
# of characters: 94
# of links: 1181
Network density: 0.270189887898
Average clustering coefficient: 0.654784988167
To our knowledge, our study is the first to compare multiple fictional
social networks. However, it has several limitations, the most important
of which is the small sample size. While we strove to represent a diverse
set of time periods and intended audiences, we had no systematic way of
choosing these works. There are also limitations associated with our use
of Wikipedia. Since Wikipedia is publicly editable, its reliability is
not guaranteed. Furthermore, the different fictional works were likely
documented under different conditions, which adds unknown variation to
our data set. We will address these limitations in future studies.
Because gathering data from Wikipedia manually was tedious and
time-consuming (a single network could take up to 15 hours to
construct), one of our most important future goals is to develop an
automated network reconstruction technique. Our other future goals
include comparing fictional social networks to real social networks,
looking at other forms of media, and adapting our research methodology
for use in the classroom.
References
[1] Watts, D. J. (2004) The “new” science of networks. Annu. Rev. Sociol. 30:243–270.
[2] Choi, Y.-M. & Kim, H.-J. (2007) A directed network of Greek and Roman mythology. Physica A 382:665–671.
[3] Gleiser, P. M. (2007) How to become a superhero. J. Stat. Mech. P09020.
Acknowledgments
We thank Dr. Shelley Dionne of Binghamton University and Dr. Peter
Dionne for their help in conducting this research project. This research
was supported in part by the National Science Foundation Grant #BCS1027752. Any opinions, findings, and conclusions or recommendations
expressed in this material are those of the authors and do not necessarily
reflect the views of the National Science Foundation.