A Comparative Study on the Social Networks of Fictional Characters

Jessica Calderone, Emma Valentine, Josie Trichka (students), Julie Gallagher (assistant principal)
Maine-Endwell High School, Endwell, NY 13760, USA

Benjamin James Bush, Jin Akaishi, Hiroki Sayama
Collective Dynamics of Complex Systems Research Group, Binghamton University, Binghamton, NY 13902-6000, USA

Abstract

Network science [1] is becoming an important part of a scientist’s toolkit. It is used in many different fields, from biology to comparative literature, and has been applied to social network analysis of characters in fictional worlds [2,3]. To our knowledge, however, no comparative studies have been conducted on multiple modern fictional worlds of the kind high school students are most familiar with. To fill this gap, we analyzed social network characteristics across several modern fictional worlds. Specifically, we investigated social networks from Lord of the Rings, Star Wars, Harry Potter, Lost, Twilight Saga, Pretty Little Liars, and Glee. These series were chosen for their popularity among high school students and their parents. Social networks were reconstructed using Wikipedia and other online resources. We found interesting changes in network properties over time: the largest connected components (LCCs) of newer networks tended to be smaller and simpler than the LCCs of older networks. The newer networks were also less “scale free” and possibly more clustered than the older networks.

Method

Our research was based upon seven well-known TV/book series. The Harry Potter series was chosen because it was one of the most popular book series with today’s teenagers. The Lord of the Rings was popular with both adults and teenagers. Star Wars was enjoyed by people of all ages. On the other hand, Pretty Little Liars and The Twilight Saga had more limited audiences, mainly teenage girls. After we collected the data for the five book series listed above, we extended our research to TV series in hopes of finding surprising differences between the two. We chose Glee because it is popular among all age groups. Lost was much larger and had a completely different story line than Glee, which caught our attention.

For each series, except Pretty Little Liars, we used Wikipedia to extract the fictional social network data. Each TV/book series had a “List Of Characters” entry on Wikipedia. Starting with this list, we then navigated to the Wikipedia page for each character, if such a page was available. These character pages typically contained information about the character’s fictional social connections. If such a page was not available, a summary of the character’s connections was typically found on the “List Of Characters” page itself. These connections were then manually recorded using a text editor. For Pretty Little Liars, network data was gathered directly from the text of the books themselves.

The collected data was first visualized using Mathematica (shown below). Using Mathematica and the NetworkX Python package, we then calculated various network metrics, which are summarized in the figures shown below. We also used Microsoft Excel and Mathematica to find possible correlations between the properties of the networks and the series’ original release dates. We inspected the results by looking for similarities and differences among the networks that stood out. We recorded any interesting patterns we discovered for later consideration.

Results

Older networks had largest connected components (LCCs) with longer average path lengths and smaller minimum closeness centrality than the LCCs of newer networks. We also found that the newer networks tended to have smaller scaling exponents than the older networks. The data also begins to suggest that newer networks may be more clustered than older networks. The actual networks, their detailed properties, degree distributions and k-C(k) plots are shown below.

[Figure: five panels plotting network properties against release date (1960 to 2010): size of largest connected components, k, average clustering coefficient, minimum closeness centrality in LCC, and average shortest path length in LCC. Correlation coefficients (magnitude) and p-values shown across the panels, in no particular order: 0.861389 (p 0.0127324), 0.84113 (p 0.0177035), 0.818427 (p 0.0244047), 0.794617 (p 0.0327589), 0.663762 (p 0.103992).]

[Figures: for each network, the network diagram, the degree distribution, and the k-C(k) plot, each plot with fitted parameters a and k.]

Lord of the Rings (CLEAN LOTR network 1 .txt)
# of characters: 717
# of links: 2267
Network density: 0.00883180227983
Largest degree: 71 ['Gandalf']
Smallest degree: 1
Median degree: 4
Average degree: 6.32357043236
Number of connected components: 5
Size of Largest Connected Component (LCC): 708
Average closeness centrality in LCC: 0.250745816339
Max closeness centrality in LCC: 0.393214682981 ['Sauron']
Min closeness centrality in LCC: 0.108535462082 ['Earendil of Gondor']
Average degree centrality in LCC: 0.00903794979982
Max degree centrality in LCC: 0.100424328147 ['Sauron']
Min degree centrality in LCC: 0.001414427157 ['Earendil of Gondor']
Average betweenness centrality in LCC: 0.00442566789243
Max betweenness centrality in LCC: 0.198725254487 ['Sauron']
Average shortest path length in LCC: 4.12452153206
Average clustering coefficient: 0.36671580193

Star Wars (star wars.txt)
# of characters: 515
# of links: 815
Network density: 0.00615768199161
Largest degree: 59 ['Princess Leia']
Smallest degree: 0
Median degree: 1
Average degree: 3.16504854369
Number of connected components: 91
Size of Largest Connected Component (LCC): 404
Average closeness centrality in LCC: 0.300284103224
Max closeness centrality in LCC: 0.485542168675 ['Princess Leia']
Min closeness centrality in LCC: 0.186746987952 ['Tsabong Lah']
Average degree centrality in LCC: 0.00975358081714
Max degree centrality in LCC: 0.146401985112 ['Princess Leia']
Min degree centrality in LCC: 0.00248138957816 ['Tsabong Lah']
Average betweenness centrality in LCC: 0.00595528609563
Max betweenness centrality in LCC: 0.231131798374 ['Princess Leia']
Average shortest path length in LCC: 3.39402501044
Average clustering coefficient: 0.132147236125

Harry Potter (harry potter.txt)
# of characters: 808
# of links: 1595
Network density: 0.00489221784632
Largest degree: 152 ['Harry Potter']
Smallest degree: 0
Median degree: 1.0
Average degree: 3.94801980198
Number of connected components: 316
Size of Largest Connected Component (LCC): 447
Average closeness centrality in LCC: 0.31512276333
Max closeness centrality in LCC: 0.555417185554 ['Harry Potter']
Min closeness centrality in LCC: 0.184985483202 ['Caspar Crouch', 'Harfang Longbottom']
Average degree centrality in LCC: 0.0155195072281
Max degree centrality in LCC: 0.340807174888 ['Harry Potter']
Min degree centrality in LCC: 0.00224215246637 ['Caspar Crouch', 'Harfang Longbottom']
Average betweenness centrality in LCC: 0.0051209425483
Max betweenness centrality in LCC: 0.395473462821 ['Harry Potter']
Average shortest path length in LCC: 3.27881943399
Average clustering coefficient: 0.212460518177

Lost (Lost Networkrevised1 10 11.txt)
# of characters: 392
# of links: 1605
Network density: 0.0209431598726
Largest degree: 95 ['James Ford']
Smallest degree: 1
Median degree: 3.0
Average degree: 8.1887755102
Number of connected components: 1
Size of Largest Connected Component (LCC): 392
Average closeness centrality in LCC: 0.367724983243
Max closeness centrality in LCC: 0.560171919771 ['Kate Austen']
Min closeness centrality in LCC: 0.248253968254 ['Dr. Ian McVay']
Average degree centrality in LCC: 0.0209431598726
Max degree centrality in LCC: 0.242966751918 ['Kate Austen']
Min degree centrality in LCC: 0.00255754475703 ['Dr. Ian McVay']
Average betweenness centrality in LCC: 0.00452947065114
Max betweenness centrality in LCC: 0.149796210967 ['John Locke']
Average shortest path length in LCC: 2.76649355394
Average clustering coefficient: 0.677815465323

Twilight Saga (TwilightSaga.txt)
# of characters: 94
# of links: 1181
Network density: 0.270189887898
Largest degree: 78 ['Bella Swan']
Smallest degree: 1
Median degree: 20.5
Average degree: 25.1276595745
Number of connected components: 2
Size of Largest Connected Component (LCC): 91
Average closeness centrality in LCC: 0.569226150784
Max closeness centrality in LCC: 0.882352941176 ['Bella Swan']
Min closeness centrality in LCC: 0.348837209302 ['Joshua Uley']
Average degree centrality in LCC: 0.287667887668
Max degree centrality in LCC: 0.866666666667 ['Bella Swan']
Min degree centrality in LCC: 0.0111111111111 ['Joshua Uley']
Average betweenness centrality in LCC: 0.00913144283931
Max betweenness centrality in LCC: 0.173959604211 ['Bella Swan']
Average shortest path length in LCC: 1.8126984127
Average clustering coefficient: 0.654784988167

Pretty Little Liars (PLL.txt)
# of characters: 165
# of links: 555
Network density: 0.0410199556541
Largest degree: 55 ['Emily']
Smallest degree: 0
Median degree: 3
Average degree: 6.72727272727
Number of connected components: 3
Size of Largest Connected Component (LCC): 162
Average closeness centrality in LCC: 0.376882505691
Max closeness centrality in LCC: 0.5770609319 ['Emily']
Min closeness centrality in LCC: 0.228045325779 ['Jade Smythe']
Average degree centrality in LCC: 0.0424814048002
Max degree centrality in LCC: 0.341614906832 ['Emily']
Min degree centrality in LCC: 0.00621118012422 ['Jade Smythe']
Average betweenness centrality in LCC: 0.0107353730542
Max betweenness centrality in LCC: 0.311627967786 ['Emily']
Average shortest path length in LCC: 2.71765968867
Average clustering coefficient: 0.575063779572

Glee (CLEAN Glee.txt)
# of characters: 67
# of links: 389
Network density: 0.175938489371
Largest degree: 41 ['Sue Sylvester']
Smallest degree: 1
Median degree: 5
Average degree: 11.6119402985
Number of connected components: 1
Size of Largest Connected Component (LCC): 67
Average closeness centrality in LCC: 0.49681395974
Max closeness centrality in LCC: 0.709677419355 ['Sue Sylvester']
Min closeness centrality in LCC: 0.295964125561 ['Wes', 'David']
Average degree centrality in LCC: 0.175938489371
Max degree centrality in LCC: 0.621212121212 ['Sue Sylvester']
Min degree centrality in LCC: 0.0151515151515 ['Wes', 'David']
Average betweenness centrality in LCC: 0.0168388825105
Max betweenness centrality in LCC: 0.233381987752 ['Sue Sylvester']
Average shortest path length in LCC: 2.09452736318
Average clustering coefficient: 0.814941085642

Findings and Discussion

Chronologically, we see a general trend in which the newer networks are smaller and simpler than the older networks. The newer networks are simpler in that their largest connected components (LCCs) tend to be centered around a single well-connected character, as evidenced by the correlation we found between minimum closeness centrality and release date (see above). The newer networks are also simpler in that they tend to have smaller scaling exponents, and are therefore less “scale free” than the older networks. We also found moderately convincing evidence suggesting that newer networks may be more clustered. One marginally significant (p < 0.056, not reported above) relationship we observed was a decrease in the maximum closeness centrality as the number of characters in a fictional work or series increases. This may reflect a similarity with real-world social networks, in which there is a limit to how many people even the most connected person can know.

To our knowledge, our study is the first to comparatively examine multiple fictional social networks. However, our study has several limitations, the most important of which is our small sample size. While we did strive to represent a diverse set of time periods and intended audiences, we had no systematic way of choosing these works. There are also limitations associated with our use of Wikipedia. Since Wikipedia is publicly editable, its reliability is not guaranteed. Furthermore, the different fictional works were likely documented under different conditions, which adds unknown variation to our data set. We will address these limitations in future studies. Since gathering data from Wikipedia manually was such a tedious and time-consuming task (a single network could take up to 15 hours to construct), one of our most important future goals is to develop an automated network reconstruction technique. Our other future goals include comparing fictional social networks to real social networks, looking at other forms of media, and adapting our research methodology for use in the classroom.
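The per-network quantities and the release-date comparison described above can be illustrated with a small sketch. It is not the authors' pipeline, which used Mathematica, NetworkX, and Excel. It is a plain-Python stand-in under stated assumptions: the link file format (one "Name A, Name B" pair per line), the function names, and the sample characters are all hypothetical, and closeness is taken as (n - 1) divided by the sum of hop distances within the LCC.

```python
from collections import deque
from math import sqrt


def parse_links(text):
    """Build an undirected adjacency map from lines such as 'Alice, Bob'.

    Assumed (hypothetical) format: one link per line, names separated by a
    comma; a line holding a single name records a character with no links.
    """
    adj = {}
    for raw in text.splitlines():
        names = [part.strip() for part in raw.split(",") if part.strip()]
        for name in names:
            adj.setdefault(name, set())
        if len(names) == 2 and names[0] != names[1]:
            a, b = names
            adj[a].add(b)
            adj[b].add(a)
    return adj


def bfs_distances(adj, source):
    """Hop counts from source to every character reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for neighbor in adj[current]:
            if neighbor not in dist:
                dist[neighbor] = dist[current] + 1
                queue.append(neighbor)
    return dist


def density(n, m):
    """Fraction of possible links present among n characters with m links."""
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0


def summarize(adj):
    """Size, density, and LCC path statistics for one network."""
    if not adj:
        raise ValueError("network has no characters")
    seen, components = set(), []
    for name in adj:
        if name not in seen:
            component = set(bfs_distances(adj, name))
            seen |= component
            components.append(component)
    lcc = max(components, key=len)  # largest connected component
    closeness, total = {}, 0
    for name in lcc:
        distance_sum = sum(bfs_distances(adj, name).values())
        total += distance_sum
        closeness[name] = (len(lcc) - 1) / distance_sum if distance_sum else 0.0
    ordered_pairs = len(lcc) * (len(lcc) - 1)
    links = sum(len(neighbors) for neighbors in adj.values()) // 2
    return {
        "characters": len(adj),
        "links": links,
        "density": density(len(adj), links),
        "components": len(components),
        "lcc_size": len(lcc),
        "avg_path_length": total / ordered_pairs if ordered_pairs else 0.0,
        "min_closeness": min(closeness.values()),
        "max_closeness": max(closeness.values()),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy network: a chain of four characters plus one with no links.
SAMPLE = """
Alice, Bob
Bob, Carol
Carol, Dave
Eve
"""
stats = summarize(parse_links(SAMPLE))
print(stats)
```

As a sanity check on the density formula, `density(717, 2267)` gives about 0.00883, which agrees with the density reported for the 717-character network. Passing a list of release years and the matching list of a per-network property (for example minimum closeness centrality) to `pearson` yields the kind of correlation coefficient discussed in the findings; the years themselves are not listed on the poster and would have to be supplied.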

References

[1] Watts, D. J. (2004) The “new” science of networks. Annu. Rev. Sociol. 30:243–270.
[2] Choi, Y.-M. & Kim, H.-J. (2007) A directed network of Greek and Roman mythology. Physica A 382:665–671.
[3] Gleiser, P. M. (2007) How to become a superhero. J. Stat. Mech. P09020.

Acknowledgments

We thank Dr. Shelley Dionne of Binghamton University and Dr. Peter Dionne for their help in conducting this research project. This research was supported in part by the National Science Foundation Grant #BCS1027752. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.