Infectious diseases impose a significant burden on the U.S. public health system. The rise of HIV/AIDS in the late 1970s, pandemic H1N1 flu in 2009, the H3N2 epidemic during the 2012-2013 winter season, the Ebola virus disease outbreak in 2015, and the Zika virus scare in 2016 have demonstrated how susceptible people are to such contagious diseases. Virtually every year, influenza outbreaks occur in various forms, with consequences of varying severity.
The annual impact of seasonal influenza outbreaks in the United States is reported to be an average of 610,660 undiscounted life-years lost, 3.1 million hospitalized days, 31.4 million outpatient visits, and a total of $87.1 billion in economic burden. In response to this growing burden, new data analytics techniques and technologies capable of detecting, tracking, mapping, and managing such diseases have emerged in recent years. In particular, digital surveillance systems have shown promise in their capacity to discover public health-seeking patterns and transform these discoveries into actionable strategies.
This project demonstrated that social media can be used as an effective method for early detection of influenza outbreaks. We used a Big Data platform to analyze Twitter data and monitor influenza activity in the United States. Our Big Data analytics methods comprised temporal analysis, spatial analysis, and text mining. In the temporal analysis, we examined whether Twitter data could be adapted for the nowcasting of influenza outbreaks. In the spatial analysis, we used the geospatial properties of Twitter data to map flu outbreaks and identify influenza hotspots. Text analytics was performed to identify the most frequently mentioned flu symptoms and treatments in tweets.
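As a rough illustration of the text-analytics step, the sketch below counts how many tweets mention each term from a small symptom vocabulary. It is a minimal example under stated assumptions, not the method used in the project: the `SYMPTOM_TERMS` list and the `count_symptom_mentions` function are hypothetical names introduced here, and whole-word matching on lowercased text stands in for whatever text-mining pipeline was actually applied.

```python
import re
from collections import Counter

# Hypothetical symptom vocabulary (an assumption for illustration only).
SYMPTOM_TERMS = ("fever", "cough", "sore throat", "headache", "chills")


def count_symptom_mentions(tweets, terms=SYMPTOM_TERMS):
    """Count the number of tweets that mention each term at least once."""
    counts = Counter()
    for text in tweets:
        lowered = text.lower()
        for term in terms:
            # Whole-word match, so "feverish" does not count as "fever".
            if re.search(r"\b" + re.escape(term) + r"\b", lowered):
                counts[term] += 1
    return counts
```

Calling `count_symptom_mentions(tweets).most_common(3)` on a list of tweet texts would then give the three most frequently mentioned terms. A real pipeline would also need to handle misspellings, slang, and negation ("no fever"), which this sketch ignores.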
The IBM InfoSphere BigInsights platform was employed to analyze two sets of flu activity data:
Twitter data were used to monitor flu outbreaks in the United States, and the Cerner HealthFacts data warehouse was used to track real-world clinical encounters. A large volume of flu-related tweets was collected using the Twitter Streaming API and then ingested into a Hadoop cluster.
Our findings demonstrated that the integration of social media and medical records can be a valuable supplement to the existing surveillance systems.
Our results confirmed that flu-related traffic on social media is closely related to actual flu outbreaks.
This has been shown by other researchers as well (St Louis & Zorlu, 2012; Broniatowski, Paul, & Dredze, 2013). We performed a time-series analysis to obtain the spatial-temporal cross-correlation between the two trends (91%) and observed that clinical flu encounters lag behind online posts. In addition, our location analysis revealed several public locations from which a majority of the tweets originated. These findings can help health officials and governments develop more accurate and timely forecasting models during outbreaks and inform individuals about the locations they should avoid during that time period.
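The idea of one series lagging another can be made concrete with a lagged cross-correlation. The sketch below is a minimal illustration, not the analysis performed in the project: it assumes two equal-length weekly count series, and the function names (`pearson`, `lagged_correlation`, `best_lag`) are hypothetical. For each candidate lag, it correlates tweet counts in week t with clinical encounters in week t + lag, and reports the lag with the highest Pearson correlation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def lagged_correlation(tweets, encounters, max_lag):
    """Correlate tweets[t] with encounters[t + lag] for lag = 0..max_lag."""
    result = {}
    for lag in range(max_lag + 1):
        # Drop the last `lag` tweet weeks and the first `lag` encounter weeks
        # so that the two slices line up.
        x = tweets[: len(tweets) - lag]
        y = encounters[lag:]
        result[lag] = pearson(x, y)
    return result


def best_lag(tweets, encounters, max_lag):
    """Return the lag (in time steps) with the highest correlation."""
    corr = lagged_correlation(tweets, encounters, max_lag)
    return max(corr, key=corr.get)
```

If the best lag is greater than zero, the tweet series leads the clinical series by that many time steps, which is the pattern that would make social media useful as an early signal. With short series, `max_lag` should stay small so that each correlation is still computed over enough overlapping points.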
Questions for Discussion
1. Why would social media be able to serve as an early predictor of flu outbreaks?
2. What other variables might help in predicting such outbreaks?
3. Why would this be a good problem to solve using the Big Data technologies mentioned in this chapter?