ChinaAir: Study of air quality in Chinese cities via Twitter

Carter Yagemann ([email protected])

Abstract
In recent months, western media has drawn attention to China’s air pollution problems. Outlets claim that the air pollution in major Chinese cities is unacceptably high and endangers the health of their citizens. Some outlets also speculate that if air pollution continues to worsen, it could start to affect neighboring countries. ChinaAir is a preliminary one-month study of China’s air pollution which takes advantage of new air quality sensors that tweet their measurements to Twitter at hourly intervals. By collecting this data, the ChinaAir study has found that while Beijing, Shanghai, Guangzhou, and Chengdu do have air pollution at levels which need immediate addressing, there does not yet appear to be a correlation between one city’s air pollution and another’s, let alone between one country’s air pollution and another’s. However, given the short duration of this study, the findings serve more to demonstrate the nontrivial nature of air pollution dispersion than to suggest that spreading air pollution is not a problem.

I. INTRODUCTION

China’s air pollution problem has been gaining a lot of attention in the western media in recent months. Many outlets claim that the pollution in major Chinese cities has reached unacceptable levels [4], and some go as far as to claim that if the air pollution continues to worsen, it will start to threaten the air quality of other countries such as America [2].

Recently, the United States embassy in Beijing has taken action to raise awareness of this issue [1]. In a bold political move, the embassy deployed an air quality sensor which is configured to report its air pollution measurements as hourly tweets on the popular social media platform Twitter [5].
Making this near real-time information publicly available sparked great controversy, but ultimately led to the Chinese government deploying its own sensors which likewise post to Twitter [6][7][8]. This poses a rare opportunity for study and analysis without the need for major financial or logistical resources, and it is this opportunity which led to the present study.

Claims made by news media need to be accepted with caution and challenged or verified whenever possible. With that in mind, ChinaAir is a preliminary one-month study which leverages the tweeting air quality sensors in Beijing, Shanghai, Guangzhou, and Chengdu to explore China’s air pollution problem.

II. PROBLEM DEFINITION

The objectives of the ChinaAir study are twofold. First, ChinaAir seeks to compare collected data against western media claims to evaluate the severity of China’s air pollution in Beijing, Shanghai, Guangzhou, and Chengdu. How bad is China’s air pollution, and is it something which Chinese citizens should be worried about? Second, the ChinaAir study aims to provide an initial glimpse into the correlation between one city’s air pollution and another’s. Western media is concerned that China’s air pollution could start negatively impacting other countries such as America. How valid is this claim, and to what degree should we be concerned?

III. METHODOLOGY

ChinaAir looks at four major Chinese cities: Beijing, Shanghai, Guangzhou, and Chengdu. Each has an air quality sensor which tweets its measurements to Twitter on an hourly basis. Figure 1 contains an example tweet. The tweets consist of five sections separated by semicolons. The first is a timestamp for when the measurement was taken. The second is the size of the particles the air sensor is measuring. In the case of the feeds ChinaAir used, measurements were for particles 2.5 micrometers in size or smaller. These particles are of particular interest because they pose the greatest threat to human health [9].
Particles of this size are able to reach the deepest parts of the respiratory tract and pass into the bloodstream. The next two numbers give the actual pollution measurement in two different units. For this study, the second measurement was used, which is the pollution level in terms of the Air Quality Index (AQI) [10]. Lastly, the corresponding human-readable classification for that AQI measurement is given.

Figure 1. An example tweet from the BeijingAir Twitter feed.

The data for this study was aggregated using a Python script scheduled to run every hour. This code has been open sourced and is therefore publicly available [3]. The script is very simple. First, using Twitter’s public API, the script grabs the latest tweet made by the air quality sensor. Next, the script converts the measurement from a semicolon-separated list to a comma-separated list. This is done so the final output will be in comma-separated values (CSV) format, which makes later importing into spreadsheet applications easier. Finally, the script writes the new entry to a file. To collect measurements from multiple feeds, the script was modified to run this procedure for each feed.

Once the data is collected, cleaning is necessary. Duplicates are removed, as are unneeded tweets. Unneeded tweets include the 24-hour average pollution, which is reported at the end of each day, and the occasional no-data tweet, which resulted from errors or maintenance on the air sensors. For analysis of the individual cities, all the remaining data entries were used, while for cross-city analysis, only entries for timeframes where all four air sensors successfully provided measurements were used. In other words, no averaging was used to fill in missing data points.

IV. ANALYSIS

Figures 2, 3, 4, and 5 show one-week slices of the air pollution measurements from the four cities of interest.
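As a brief aside on the methodology, the parsing and cleaning rules described in Section III can be sketched as follows. This is a minimal illustration and not the published script [3]: the sample tweets, the timestamp layout, the "avg" marker used to spot 24-hour summaries, and all function names are assumptions made for the example.

```python
import csv
import io


def parse_tweet(text):
    """Split a sensor tweet into its five semicolon-separated sections.

    Returns None for tweets that are not a single hourly measurement.
    """
    fields = [part.strip() for part in text.split(";")]
    if len(fields) != 5:
        return None  # assumed: "no data" tweets carry fewer sections
    timestamp, particle, concentration, aqi, label = fields
    if "avg" in particle.lower():
        return None  # assumed marker for the daily 24-hour average
    try:
        return (timestamp, particle, float(concentration), int(aqi), label)
    except ValueError:
        return None  # non-numeric measurement, treated as no data


def clean(tweets):
    """Parse tweets, dropping unusable ones and repeated timestamps."""
    seen = set()
    rows = []
    for text in tweets:
        row = parse_tweet(text)
        if row is None or row[0] in seen:
            continue
        seen.add(row[0])
        rows.append(row)
    return rows


def align(city_rows):
    """Keep only timestamps for which every city has an AQI value."""
    per_city = {c: {r[0]: r[3] for r in rows} for c, rows in city_rows.items()}
    common = set.intersection(*(set(m) for m in per_city.values()))
    # Timestamps are ordered as plain strings here, not as parsed dates.
    return {ts: {c: m[ts] for c, m in per_city.items()} for ts in sorted(common)}


def to_csv(rows):
    """Render cleaned rows as comma-separated values."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


# Hypothetical tweets in the five-section layout described in the text.
beijing = clean([
    "03-20-2015 14:00; PM2.5; 12.0; 50; Good",
    "03-20-2015 14:00; PM2.5; 12.0; 50; Good",  # duplicate
    "03-20-2015 00:00 to 03-20-2015 23:59; PM2.5 24hr avg; 80.0; 164; Unhealthy",
    "03-20-2015 15:00; PM2.5; No Data",
    "03-20-2015 16:00; PM2.5; 35.4; 100; Moderate",
])
shanghai = clean([
    "03-20-2015 14:00; PM2.5; 55.4; 150; Unhealthy for Sensitive Groups",
    "03-20-2015 15:00; PM2.5; 35.4; 100; Moderate",
])
aligned = align({"Beijing": beijing, "Shanghai": shanghai})
print(to_csv(beijing))
print(aligned)
```

In this sketch only the 14:00 entry survives the cross-city step, because it is the only hour for which both hypothetical feeds reported a measurement. No missing values are interpolated.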
First, looking at the cities individually, we can see that the air pollution varies significantly from one day to the next. Shanghai, for example, can be around 50 one day (moderate, but acceptable air quality) and 200 the next (unhealthy for everyone). While the four cities had good days and bad days, all of them reached the unhealthy range on a weekly basis, with Beijing and Shanghai being slightly worse than Guangzhou and Chengdu. This trend carried across the full month of data that was collected. While Beijing’s worst measurements did exceed 250, no measurement exceeded the AQI’s maximum value of 500.

Figure 2. One week of Beijing AQI measurements.
Figure 3. One week of Chengdu AQI measurements.
Figure 4. One week of Guangzhou AQI measurements.
Figure 5. One week of Shanghai AQI measurements.

To check for correlations in the air pollution between cities, scatter plots were made with one axis containing the AQI measurements for one city and the other axis containing the AQI measurements of the other city for the same timeframe. From these, linear regressions were calculated along with their corresponding coefficients of determination (R²). For all the tests except one, one week of data points (roughly 160) was used. For the fifth test, roughly 400 data points were used to test for correlations over longer periods of time. Figure 6 shows the results of these five tests. None of the tests produced a significant correlation: the highest R² was only about 0.37, and only over one week. As the timeframe increased from one week to roughly one month, the correlation decreased even more.

City Pair                        R²
Beijing - Chengdu                0.1132034734
Beijing - Shanghai               0.3704225117
Guangzhou - Chengdu              0.0089537311
Shanghai - Guangzhou             0.245955262
Chengdu - Shanghai (long term)   0.007356376
Figure 6. Tests for correlation in air pollution between cities.

V. DISCUSSION

First, looking at the cities individually, I believe this study provides evidence in support of the claim that China has an air pollution problem. While the air pollution varied significantly from day to day, all four examined cities were hitting unhealthy levels on a weekly basis. Beijing and Shanghai appear to have more air pollution than Guangzhou and Chengdu, but not by much depending on the day. It would be interesting to explore the factors which might influence the differences in air pollution between the four cities, like population density and economic composition, but such research is beyond the scope of this study. Overall, this first finding fell in line with my expectations.

More interesting, though, were the correlations, or lack thereof, between the air pollution levels of the different cities. I was expecting there to be a correlation between at least one pair, but this ended up not being the case. And as the datasets increased in size, the correlations only decreased. This finding could be used to argue that America shouldn’t fear Chinese air pollution spreading to the States anytime soon, but given the preliminary nature of this study, I advise against jumping to such a conclusion. Rather, I would propose that the problem of how air pollution spreads is nontrivial and worthy of more in-depth study. I intentionally picked cities in a north, south, east, west orientation, hoping that weather or wind would cause a correlation to appear, but this was not the case. Unfortunately for this study, it appears that air pollution is a more complicated matter which cannot be exposed with a simple linear correlation test.
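For reference, the linear correlation test used in the Analysis section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the study does not say which tool computed the regressions, the function name `r_squared` is hypothetical, and the two short series are made-up values rather than collected data. For a straight-line fit with an intercept, the coefficient of determination equals the square of the Pearson correlation coefficient, which is what the function computes.

```python
def r_squared(xs, ys):
    """Coefficient of determination for the least-squares line y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long series with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross products.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical hourly AQI values for two cities at the same timestamps.
city_a = [50, 80, 120, 160, 200]
city_b = [2 * value + 10 for value in city_a]  # exactly linear in city_a
unrelated = [1, -1, -1, 1]                     # no linear trend against 1..4

print(r_squared(city_a, city_b))
print(r_squared([1, 2, 3, 4], unrelated))
```

A value near 1 means one city's readings are almost a straight-line function of the other's, while a value near 0 means a straight line explains almost none of the variation. A low value does not rule out a nonlinear or time-delayed relationship.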
Granted, the cities which this study examined are hundreds of miles apart from each other, but I considered this acceptable since the claim I was trying to challenge concerned pollution traversing the entire Pacific Ocean. That said, it takes time for particles to move from one place to another, so perhaps one mistake which led to this study finding no correlations was the choice of intervals. Perhaps wind is a contributing factor on a larger scale, such as a day-to-day or week-to-week scale.

VI. CLOSING REMARKS

I must disclaim that my primary profession isn’t environmental studies but rather computer science. Given this and the short timeframe over which this study was conducted, I do not expect the ChinaAir study to be the deciding factor in any reasonable argument. Instead, I hope that this small study will promote cross-disciplinary research, which I believe is not only valuable but necessary for the advancement of science, and inspire individuals to challenge and personally verify the claims which are all too frequently stated as fact by news media. While no one person has the time, resources, or patience to test every claim they come across, a society where individual knowledge is composed mostly of second-hand information is a society prone to misconceptions and skewed views.

VII. REFERENCES

[1] http://www.wired.com/2015/03/opinion-us-embassy-beijing-tweeted-clear-air/
[2] http://www.smithsonianmag.com/science-nature/air-pollution-china-is-spreading-across-pacific-us-180949395/
[3] https://bitbucket.org/carter-yagemann/beijingair
[4] http://www.nytimes.com/2013/01/13/science/earth/beijing-air-pollution-off-the-charts.html
[5] https://twitter.com/BeijingAir
[6] https://twitter.com/Guangzhou_Air
[7] https://twitter.com/CGShanghaiAir
[8] https://twitter.com/CGChengduAir
[9] https://www.health.ny.gov/environmental/indoors/air/pmq_a.htm
[10] http://airnow.gov/index.cfm?action=aqibasics.aqi