Mingo Sanchez
Professor Eric Gaze and Professor Jen Jack Gieseking
Data Driven Societies – INTD 2420
14 May 2015
Under the Umbrella: What Data Visualizations Reveal about the Hong Kong Pro-Democracy
Movement
Introduction
In the past decade, technology has given rise to a societal shift unlike any other before it.
While technology is always evolving, the ubiquity of computers, smart devices, and social media
in recent years has fundamentally altered the ways in which we operate and communicate.
Previously, our social networks were limited primarily to those around us in physical space. With
the rise of the Internet, however, we now have the ability to reach nearly anyone on the entire
planet in minutes. With an entire world of connections at our disposal via websites like
Facebook, Reddit, and Twitter, new social phenomena we would not have dreamed of ten years
ago are now commonplace. Twitter in particular has been a catalyst behind many movements
ranging from small-scale protests to full-blown revolutions all across the globe. When I heard
about the so-called “Umbrella Revolution” that began to gain momentum in Hong Kong last
September, I knew that the protests had the potential to grow exponentially due to extensive
media coverage and presence on social media both domestically (i.e., in Hong Kong) and
internationally. While past research has been done on how political movements spread on social
media websites like Twitter, I was particularly interested in studying the different types of people
present in these online networks. I had previously heard about how Twitter can be used by
political figures, media outlets, and even self-proclaimed “hashtivists” (short for hashtag
activists) to reach a large audience. As such, I was curious about whether and how these
different types of Twitter users would play a role in the discussions taking place on Twitter
during large-scale political movements like the Hong Kong protests. Ultimately, I chose to study
two different, but related hashtags on Twitter: #UmbrellaMovement and #UmbrellaRevolution.
Background and Literature Review
There have been several prominent revolutions sparked by social media platforms like
Twitter in the past several years. That being said, the social and political atmospheres in these
countries were and are all vastly different from the unique culture in Hong Kong. As such, I felt
that there were two important components to consider when trying to understand the current
movement in Hong Kong: the sociopolitical climate of Hong Kong, as well as the general nature
of political movements on Twitter. Because the unrest in Hong Kong started before protests took
to Twitter, I began my research by learning more about Hong Kong and its relationship with
mainland China.
In 1841, Hong Kong was taken from China and became a British dependency. It was not
until 1997 that Hong Kong was returned to Chinese control. Because of the strong British
influence in Hong Kong, it was decided that Hong Kong would only be returned to China with
certain stipulations about the territory’s relative autonomy. These agreements formed Hong
Kong’s Basic Law, also known as “one country, two systems.”1 Under the Basic Law, Hong
Kong’s citizens have several rights not granted to mainland Chinese citizens, including the right
of assembly and the right to develop a democracy. Former top-ranking Chinese official Lu Ping
was even quoted as saying, “How Hong Kong develops its democracy in the future is completely
within the sphere of the autonomy of Hong Kong. The central government will not interfere.”2
1. Euan McKirdy, “‘One Country, Two Systems’: How Hong Kong Remains Distinct from
China,” CNN, September 30, 2014.
These promises of autonomy helped to mitigate the unease of Hong Kong citizens, although
recent developments have upset this balance. Last year, resentment of Beijing officials in Hong
Kong began to rise after the government announced it would not allow Hong Kong residents to
have full control over electing a chief executive in 2017.3 Instead of having the free elections
they were promised, Hong Kong citizens were told that a nominating committee would be
formed to select two or three candidates, each of whom must “have the endorsement of more
than half of all the members of the nominating committee.”4 Protesters, composed mainly of
student activists and members of the older Occupy Central movement, believe that CCP officials
in Beijing plan to use this committee to screen political candidates in Hong Kong.5 It is
important to note that not all Hong Kong citizens support the protesters. A study
conducted by the School of Journalism and Communication at the Chinese University of Hong
Kong found that in December 2014, 33.9 percent of respondents supported the
pro-democracy protesters, while 42.3 percent opposed the protests.6
2. McKirdy, “‘One Country, Two Systems,’” CNN, September 30, 2014.
3. Rishi Iyengar, “Hong Kong’s Umbrella Revolutionaries are Slowly Coming Back to the
Streets,” Time, April 14, 2015.
4. Yang Yi, “NPC Decides on Nominating Committee for HKSAR Chief Executive Selection,”
Xinhuanet, August 31, 2014.
5. “Hong Kong’s Democracy Debate,” BBC, October 7, 2014.
6. School of Journalism and Communication, the Chinese University of Hong Kong, “Public
Opinion & Political Development in Hong Kong Survey Results,” December 18, 2014, 10.
After doing background research about Hong Kong and its pro-democracy protests, I
shifted my attention to Twitter and social networks. Twitter is a fascinating social media
platform because of the unique way in which it facilitates communication; unlike websites like
Facebook and MySpace, on which social connections are bidirectional, Twitter is unique in that
users “follow” other users. Whenever a user tweets a message, all of his or her followers are able
to see those tweets and reply to or retweet them. Because relationships on Twitter are directed,
users’ networks are more akin to audiences than groups of friends. Additionally, the immediacy
of Twitter allows for ideas to spread rapidly. In his paper “The Data-Driven Society,” Alex
Pentland stresses that maximizing idea flow is essential in any social network.7 Unsurprisingly,
the unique qualities of Twitter as a medium have given rise to entirely new types of movements.
As Marshall McLuhan says in his book Understanding Media: The Extensions of Man, “the
personal and social consequences of any medium—that is, of any extension of ourselves—result
from the new scale that is introduced into our affairs by each extension of ourselves, or by any
new technology.”8 In the case of Twitter, citizen journalism and digital activism have become
much more prevalent as a result of the social network. Dhiraj Murthy explains in his article
“Twitter: Microphone for the Masses?” that the user base on Twitter is sufficiently large to allow
for the nearly immediate spread of ideas across a large population.9 As a result, news
organizations are quickly adopting Twitter as a new means of reaching consumers and citizen
journalists are becoming more and more prominent.
7. Alex Pentland, “The Data-Driven Society,” Scientific American 309, no. 4 (2013): 82.
8. Marshall McLuhan, “The Medium Is the Message,” in Understanding Media: The Extensions
of Man (New York: McGraw-Hill, 1964), 1.
9. Dhiraj Murthy, “Twitter: Microphone for the Masses?” Media, Culture & Society 33, no. 5
(2011): 786.
The Arab Spring that took place in the Middle East and North Africa in 2011 was concrete proof of the
power of Twitter in allowing people to communicate in order to bring about political change. In
2011, several researchers at the University of Washington found that social media platforms,
especially Twitter, Facebook, and YouTube, were critical in allowing for conversations that led
to political uprisings and subsequent calls for democracy in both Tunisia and Egypt.10 Another
similar study published in 2013 found that protesters used Twitter extensively during the
revolution in Egypt and the civil war in Syria. Tweets concerning these events were primarily in
two languages: English and Arabic.11 In both of these uprisings, the groups of English and
Arabic users were largely disconnected from one another.12 Furthermore, in both movements
there appeared to be groups of domestic “elite” users tweeting news in English, supporting
Murthy’s view that Twitter is primarily used as a platform for speaking to an audience.13 The
prevalence of a small number of influential users in the networks during the Egyptian and Syrian
uprisings is also consistent with the idea in network theory that non-random networks contain
central nodes or hubs that “dominate the structure of all networks in which they are present.”14
For my research, I was particularly interested in studying influential users tweeting about the
Hong Kong pro-democracy protests.
10. Philip Howard et al., “Opening Closed Regimes: What Was the Role of Social Media During
the Arab Spring?” PITPI, working paper, January 2011, 23.
11. Axel Bruns, Tim Highfield, and Jean Burgess, “The Arab Spring and Social Media Audiences:
English and Arabic Twitter Users and Their Networks,” American Behavioral Scientist
57, no. 7 (2013): 889–894.
12. Ibid., 889, 892.
13. Ibid., 885–886.
14. Albert-László Barabási, Linked: How Everything is Connected to Everything Else and What it
Means for Business, Science, and Everyday Life (New York: Basic Books, 2014), 64.
Methods and Hypothesis
In order to collect tweets containing #UmbrellaMovement and #UmbrellaRevolution, I
used the TAGS Google Sheet template created by Martin Hawksey.15 This TAGS software
allowed me to collect several thousand tweets every day from February 2, 2015 to March 6,
2015. Because the tweets were not necessarily posted on the same day they were collected using
TAGS, my data set actually ended up containing tweets from January 29, 2015 to March 6, 2015.
Although TAGS could not access the entire Twitter data set, I was able to collect over
18,000 unique tweets without having to request special access to the Twitter API. I collected
tweets at approximately the same time every day (10:00 EST) in order to minimize the number
of duplicate tweets collected.
15. Martin Hawksey, “#TAGS,” accessed May 13, 2015.
After collecting over a month’s worth of Twitter data, I removed duplicate tweets using
Microsoft Excel. In order to make the data easier to process, I sorted my complete data set by
date and created a separate data set containing only tweets with geolocation data. I excluded
tweets whose coordinates placed them at the intersection of the equator and the Prime Meridian
(i.e., 0 degrees latitude and longitude) because the locations of these tweets were almost certainly
masked. I used several tools to analyze my data set in different ways: Wordle and Voyant-Tools
were useful for performing text analysis on my Twitter data set; CartoDB allowed me to map my
geolocated tweets and perform spatial analysis on my data set; and Gephi was essential for
viewing the structure of the network of users in my data set and performing social network
analysis. All of these tools are free to use and available online.
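The cleaning steps described above were carried out in Microsoft Excel. As a rough illustration only, the same logic could be sketched in Python as follows. The column names (id_str, created_at, lat, lon) and the sample rows are assumptions made for this example and are not taken from the actual TAGS archive.

```python
def clean_tweets(rows):
    """Drop duplicate tweets, sort by date, and separate usable geolocated tweets.

    Each row is a dict. The keys id_str, created_at, lat, and lon are
    hypothetical column names, not confirmed TAGS headers.
    """
    seen_ids = set()
    unique = []
    for row in rows:
        if row["id_str"] in seen_ids:
            continue  # same tweet collected on more than one day
        seen_ids.add(row["id_str"])
        unique.append(row)

    # Assumes ISO-style timestamps, which sort chronologically as plain strings.
    unique.sort(key=lambda row: row["created_at"])

    geolocated = []
    for row in unique:
        if not row["lat"] or not row["lon"]:
            continue  # no geolocation attached to this tweet
        lat, lon = float(row["lat"]), float(row["lon"])
        if lat == 0.0 and lon == 0.0:
            continue  # (0, 0) is treated as a masked location
        geolocated.append(row)
    return unique, geolocated


# Made-up rows for illustration only.
sample = [
    {"id_str": "1", "created_at": "2015-02-01T10:00:00", "lat": "22.28", "lon": "114.16"},
    {"id_str": "2", "created_at": "2015-01-30T09:00:00", "lat": "0", "lon": "0"},
    {"id_str": "1", "created_at": "2015-02-01T10:00:00", "lat": "22.28", "lon": "114.16"},
    {"id_str": "3", "created_at": "2015-01-29T08:00:00", "lat": "", "lon": ""},
]
unique, geolocated = clean_tweets(sample)
geo_share = 100 * len(geolocated) / len(unique)  # percent of unique tweets with usable coordinates
print(len(unique), len(geolocated), round(geo_share, 2))
```

The last calculation mirrors how the share of geolocated tweets in a data set can be expressed as a percentage of all unique tweets.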
I had four major questions before collecting my data: (1) Where are people tweeting
from? (2) What are people tweeting about? (3) Who is tweeting? (4) What patterns emerge from
my data set? Due to the extensive media coverage in the United States about the Hong Kong
protests, I expected to see two primary sources of tweets in my geolocated data set: the United
States and Hong Kong. Because Twitter is banned in mainland China, I did not expect to see any
geolocated tweets from areas in China other than Hong Kong. I also expected there would not be
many people tweeting negatively about the protests using #UmbrellaMovement and
#UmbrellaRevolution due to the pro-democracy nature of the hashtags. As such, I predicted that
most of the frequently used terms in my data set would be hashtags related to
#UmbrellaMovement and #UmbrellaRevolution and other terms related to Hong Kong and
democracy. As for the types of users I expected to see in my data set, I predicted that there would
be many relatively uninfluential users and several major hubs. Based on what past researchers
observed in the Middle East, I expected these hubs to primarily be of two types: media sources
and activists. I did not expect there to be a difference in tweeting activity between these two
types of influential users; it seemed that protesters would likely tweet more during important
events, as would citizen journalists and news organizations.
Analysis and Discussion
Before analyzing my Twitter data set, I first wanted to understand the makeup of the groups of
people with varying levels of support for the protests. Using RStudio, I created
the visualization below:
Figure 1. R graph showing breakdowns of different levels of support for
protests by age. Data collected by the School of Journalism and
Communication at the Chinese University of Hong Kong, 2014.
This visualization suggests that while people who support the protests are of all different ages,
the majority of those opposed to the protests are aged 40 and over. In some
respects, these results are unsurprising; it makes sense that people who have a certain level of
stability (e.g., those who have a job and family) would not want to challenge the government. On
the other hand, many of these older people probably lived in Hong Kong before it was returned
to Chinese control. In this sense, it seems strange that people who grew up with democracy
would not want to stand up for their right to vote. This data set complements my Twitter data set
in that it shows that not all people involved in the discussion about the Hong Kong protests are in
favor of them.
To answer my first research question about where people were tweeting from, I made the
following cluster map using CartoDB. I should note that only 0.66 percent of my tweets were
geolocated, so the map below might not be an entirely representative sample of my data set.
Figure 2. Cluster map showing geolocated tweets containing the hashtags
#UmbrellaMovement or #UmbrellaRevolution. Data from Twitter. Made
using CartoDB.
As expected, the largest cluster of tweets is in Hong Kong, where the pro-democracy movement
is taking place. There are several isolated tweets along the East Coast of the United States, but
they are not close enough in proximity to one another to form a cluster. The only other cluster of
tweets, in fact, is centered in London. This may be because of Britain’s close ties to Hong Kong,
as well as the fact that there have been protests in London about the situation in Hong Kong.16
In order to understand what people were tweeting about in my data set, I decided to make
a word cloud visualization using Wordle. After filtering out words I deemed to be irrelevant to
my analysis – including words such as is, and, the, as well as my hashtags themselves,
#UmbrellaMovement and #UmbrellaRevolution – I uploaded the resulting text to Wordle to
create the following visualization:
Figure 3. Word cloud of tweets containing the hashtags #UmbrellaMovement
or #UmbrellaRevolution. Data from Twitter. Made using Wordle.
As I had predicted, many of the most frequently used terms are related to Hong Kong, democracy,
and the protests. These include #OccupyHK, #HK, #OccupyCentral, and #democracy (Wordle
removes hashtag symbols). The large size of #Newsbit suggests many of the tweets in my data
set came from news organizations or citizen journalists. Because I was curious about the type of
conversations that were happening in my data set as well as the content, I did not remove the
word RT from my data set. The fact that RT dominates this word cloud shows that people
tweeting about the Hong Kong protests are sharing other people’s ideas very frequently. Several
Twitter usernames also appear in the word cloud, most notably @hk928umbrella. Although I had
not yet done an analysis of the users in my network when I made this word cloud, I expected this
node to be the center of my Twitter data set due to the relatively large size of @hk928umbrella
in the word cloud.
16. “Hong Kong Protest Outside Chinese Embassy in London,” BBC, October 1, 2014.
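A word cloud is essentially a picture of term counts, so the filtering and counting behind it can be sketched in a few lines of Python. This is a simplified illustration and not a description of how Wordle works internally. The stop-word list contains only the examples named above, the tokenizing pattern is an assumption, and the sample tweets are invented.

```python
import re
from collections import Counter

# Words filtered out before counting: the examples named in the text plus
# the two collection hashtags. A real stop-word list would be longer.
STOP_WORDS = {"is", "and", "the", "#umbrellamovement", "#umbrellarevolution"}

# Keep a leading # or @ so hashtags and usernames stay distinct from plain words.
TOKEN = re.compile(r"[#@]?\w+")


def term_counts(tweets, stop_words=STOP_WORDS):
    """Count how often each term appears across a list of tweet texts."""
    counts = Counter()
    for text in tweets:
        for token in TOKEN.findall(text.lower()):
            if token not in stop_words:
                counts[token] += 1
    return counts


# Invented tweets for illustration only.
sample_tweets = [
    "RT @hk928umbrella: The march is on #UmbrellaMovement #OccupyHK",
    "RT @hk928umbrella: #UmbrellaRevolution and #democracy #HK",
    "Students gather in Admiralty #UmbrellaMovement #OccupyHK",
]
counts = term_counts(sample_tweets)
print(counts.most_common(3))
```

Leaving RT in the counts, as in this sketch, is what allows retweeting to show up as a dominant term.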
Below is a visualization showing the influence of users in my data set, as determined by
betweenness centrality:
Figure 4. Gephi visualization of influential users tweeting using the hashtags
#UmbrellaMovement or #UmbrellaRevolution. Data from Twitter. Tweets
converted to Gephi-ready format using Deen Freelon’s t2g.py script.17
Unsurprisingly, @hk928umbrella is by far the most central node in my data set. In fact,
@hk928umbrella was so influential that I needed to adjust the scaling of nodes so that other
nodes would be visible. Because of the scaling changes I made, the relative sizes of the other
influential users compared to @hk928umbrella are somewhat misleading: none of the other
nodes come even close to being as central to the network as @hk928umbrella. Several of the
other influential users in the network, such as @2legit2trip and @PRHacks, also appeared in the
word cloud. The only large node that appears to be somewhat separated from the others is
@rightnowio_feed, which appears in the pink subgroup of users to the left of the main network.
17. Deen Freelon, “T2G: Convert (all) Twitter Mentions to Gephi Format,” DFreelon.org,
accessed May 14, 2015.
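Betweenness centrality measures how often a node lies on the shortest paths between other pairs of nodes. The sketch below shows one way to compute it in Python for a network built from Twitter mentions, using Brandes' algorithm on a directed, unweighted graph. It is not the code used by Gephi or by t2g.py. The edge direction (author to mentioned user), the unnormalized scores, and the usernames in the sample are all assumptions made for illustration.

```python
import re
from collections import defaultdict, deque

MENTION = re.compile(r"@(\w+)")


def mention_edges(tweets):
    """Yield (author, mentioned_user) pairs from (author, text) tuples."""
    for author, text in tweets:
        for name in MENTION.findall(text):
            if name.lower() != author.lower():
                yield author.lower(), name.lower()


def betweenness(edges):
    """Unnormalized betweenness centrality (Brandes' algorithm, directed graph)."""
    adj = defaultdict(set)
    nodes = set()
    for u, v in edges:
        adj[u].add(v)
        nodes.update((u, v))

    score = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # Breadth-first search from s, counting shortest paths to each node.
        order = []
        preds = defaultdict(list)
        sigma = dict.fromkeys(nodes, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating path dependencies.
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score


# Invented tweets and usernames for illustration only.
sample = [
    ("alice", "RT @hub: march today"),
    ("bob", "thanks @hub"),
    ("hub", "cc @carol @dave"),
]
scores = betweenness(mention_edges(sample))
print(max(scores, key=scores.get))
```

In this toy network every path from alice or bob to carol or dave passes through hub, so hub receives all of the betweenness. This is the same kind of dominance a single account can have in a real mention network.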
After identifying the most influential users in my network, I visited each of their Twitter
pages to see what types of users they were. Interestingly, nearly all of the most central users were
protesters from Hong Kong. There were two primary exceptions: @hk928umbrella, which is an
account operated by a group of volunteers to share news about the Hong Kong protests, and
@rightnowio_feed, which is an international online news organization. The latter account was
the only influential user in my data set that was not from Hong Kong or specifically related to the
protests. In order to compare the two types of influential users in my network – activists and
news organizations – I compared the relative frequencies of tweets of four users, two of each
type, over a period of one week. Below is a graph of these relative frequencies:
Figure 5. Line graph of tweet patterns of influential users between 1/29/15
and 2/4/15. Red lines show patterns of protesters, while blue lines represent
news organizations. Data from Twitter. Made using Microsoft Excel.
It is important to note that @hk928umbrella tweeted many thousands of times, compared to
several dozen times for each of the other three accounts. This is why I graphed relative
frequencies of tweeting instead of raw tweet counts. I chose to study the week of January 29th to
February 4th because on February 1st, activists held their largest demonstration in months.18
While I had expected the tweeting patterns of news organizations and protesters to be somewhat
similar, I was surprised to see that there was a much larger spike in the tweeting of activists than
the tweeting of news organizations during major events. Overall, the tweeting patterns of news
organizations were relatively consistent, whereas protesters tended to tweet much more during
large demonstrations and events.
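The reason for graphing relative frequencies can be illustrated with a short Python sketch. It assumes that "relative frequency" means each day's tweet count divided by that user's total for the week, which is one common definition and may differ from the calculation done in Excel. The daily counts are invented.

```python
def relative_frequencies(daily_counts):
    """Convert raw daily tweet counts into each day's share of the weekly total."""
    total = sum(daily_counts)
    if total == 0:
        return [0.0] * len(daily_counts)
    return [count / total for count in daily_counts]


# Invented counts for illustration: one very active account with a flat
# pattern and one low-volume account with a spike on the fourth day.
high_volume = [1000, 1000, 1000, 1000, 1000, 1000, 1000]
low_volume = [1, 1, 1, 10, 1, 1, 1]

rel_high = relative_frequencies(high_volume)
rel_low = relative_frequencies(low_volume)
print([round(x, 3) for x in rel_high])
print([round(x, 3) for x in rel_low])
```

On a raw-count axis the low-volume account would be flattened against zero, whereas the normalized series puts both accounts on the same 0 to 1 scale, so the shapes of their tweeting patterns can be compared directly.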
Together, these visualizations reveal several things about the people discussing the
pro-democracy protests in Hong Kong. First and foremost, the group of people involved in the
conversation is not homogeneous. Rather, there are people of a wide variety of ages in many
areas of the world participating in the discussion about the protests. As Figure 1 shows, people in
Hong Kong have varying levels of support for the protesters, with older Hong Kong citizens
being more likely to oppose the protests and younger citizens being more likely to support
them. Figure 2, although it only displays about 0.66 percent of the total data set, shows
that the discussion about the Hong Kong pro-democracy protests is not limited to Hong Kong,
although it is certainly centered in Hong Kong. This is not at all surprising considering the
protests themselves are taking place in Hong Kong. The second largest cluster of tweets is in
London, perhaps due to activist activity in the city and the close relationship between Hong
Kong and Britain. Although the other tweets in the geolocated data set do not form clusters, they
nevertheless show that people all over the world are tweeting using #UmbrellaMovement and
#UmbrellaRevolution. While there are many people discussing the Hong Kong protests on
Twitter, Figure 4 shows that several incredibly influential users dominate the discussion.
18. Lauren Hilgers, “Hong Kong’s Umbrella Revolution Isn’t Over Yet,” New York Times
Magazine, February 22, 2015.
In addition to showing that the conversation about the protests is widespread and diverse
– although several users are far more influential on Twitter than others – these visualizations
provide key insights into the nature of the discussion. Figure 3 shows that much of the
conversation on Twitter about the Hong Kong protests consists of retweets. This indicates that
the sharing of information and ideas is vital to the conversation about the Hong Kong protests on
Twitter. Furthermore, the most frequently used words in the data set suggest that people
discussing the Hong Kong protests use Twitter in two primary ways: (1) for planning
demonstrations and (2) for sharing news about the protests. Figure 4 shows exactly how these
conversations take place: ideas are shared by one of several particularly influential users. The
tweets of these users reach nearly all other users in the network, allowing ideas to be shared
quickly with a large number of people.
Perhaps the most interesting finding of my research is illustrated in Figure 5. Not only are
there several different types of influential users in the network; there are also unique patterns of
tweeting for the different types of hubs. This is not at all what I had expected to see; I had
initially assumed that regardless of the type of user, people would post more during important
events and less during periods of relative unimportance. What I instead found was that the two
particularly influential news organizations in my data set had a relatively constant rate of
tweeting. In direct contrast to this, activists tended to post many more tweets during important
demonstrations – such as the one on February 1st – with relatively few tweets in between.
Although I was only able to identify two of these tweeting “signatures” in my data set due to the
fact that there were only two types of hubs in my network, it seems likely that there would be
similar patterns for other types of users in different contexts. I would be fascinated to see if
distinct tweeting archetypes could be identified for different types of users. This may be an
interesting direction for future research.
Conclusion
The Hong Kong protests have been a fascinating example of how political movements
can take on an entirely different form through the use of technology. Whereas demonstrations in
the past were largely limited to isolated regions or countries, new media platforms like Twitter
have allowed these conversations to take place on a global scale. Furthermore, the structure of
Twitter is uniquely suitable to directing messages towards large audiences almost
instantaneously. The nature of communication on Twitter allows both activists and news
organizations or citizen journalists to spread their message with relative ease. As such, it is
unsurprising that the most central users in the Twitter discussion about the Hong Kong protests
are news organizations and protesters. These two types of users appear to have distinct tweeting
signatures. That we might be able to identify types of people based on how they tweet is an
exciting prospect not just for the study of political movements on Twitter, but also for social
media research in general.
Works Cited
Barabási, Albert-László. Linked: How Everything is Connected to Everything Else and What it
Means for Business, Science, and Everyday Life. New York: Basic Books, 2014.
Bruns, Axel, Tim Highfield, and Jean Burgess. “The Arab Spring and Social Media Audiences:
English and Arabic Twitter Users and Their Networks.” American Behavioral Scientist
57, no. 7 (2013): 871–898.
Freelon, Deen. “T2G: Convert (all) Twitter Mentions to Gephi Format.” DFreelon.org, accessed
May 14, 2015. http://dfreelon.org/2013/05/14/t2g-convert-all-twitter-mentions-to-gephiformat.
Hawksey, Martin. “#TAGS.” Accessed May 13, 2015. https://tags.hawksey.info.
Hilgers, Lauren. “Hong Kong’s Umbrella Revolution Isn’t Over Yet.” New York Times
Magazine, February 22, 2015. http://www.nytimes.com/2015/02/22/magazine/hongkongs-umbrella-revolution-isnt-over-yet.html.
“Hong Kong Protest Outside Chinese Embassy in London.” BBC, October 1, 2014.
http://www.bbc.com/news/uk-29452299.
“Hong Kong’s Democracy Debate.” BBC, October 7, 2014.
http://www.bbc.com/news/world-asia-china-27921954.
Howard, Philip, Aiden Duffy, Deen Freelon, Muzammil Hussain, Will Mari, and Marwa Mazaid.
“Opening Closed Regimes: What Was the Role of Social Media During the Arab
Spring?” PITPI. Working paper, January 2011.
http://pitpi.org/wp-content/uploads/2013/02/2011_Howard-Duffy-Freelon-Hussain-Mari-Mazaid_pITPI.pdf.
Iyengar, Rishi. “Hong Kong’s Umbrella Revolutionaries are Slowly Coming Back to the
Streets.” Time, April 14, 2015. http://time.com/3814943/occupy-hong-kong-chinaumbrella-revolution-democracy.
McKirdy, Euan. “‘One Country, Two Systems’: How Hong Kong Remains Distinct from
China.” CNN, September 30, 2014. http://www.cnn.com/2014/09/29/world/asia/hongkong-protest-backgrounder.
McLuhan, Marshall. “The Medium Is the Message.” In Understanding Media: The Extensions of
Man, 1–11. New York: McGraw-Hill, 1964.
Murthy, Dhiraj. “Twitter: Microphone for the Masses?” Media, Culture & Society 33, no. 5
(2011): 779–789.
Pentland, Alex. “The Data-Driven Society.” Scientific American 309, no. 4 (2013): 78–83.
School of Journalism and Communication, the Chinese University of Hong Kong. “Public
Opinion & Political Development in Hong Kong Survey Results.” December 18, 2014.
http://www.com.cuhk.edu.hk/ccpos/images/news/TaskForce_PressRelease_141218_English.pdf.
Yi, Yang. “NPC Decides on Nominating Committee for HKSAR Chief Executive Selection.”
Xinhuanet, August 31, 2014.
http://news.xinhuanet.com/english/china/201408/31/c_133609213.htm.