Power of Predictive Analytics: Using Emotion Classification of Twitter Data for Predicting 2016 US Presidential Elections

Satish Mahadevan Srinivasan; Raghvinder Sangwan; Colin Neill; Tianhai Zu

Authors

Satish Mahadevan Srinivasan Penn State Great Valley
Raghvinder Sangwan Penn State Great Valley
Colin Neill Penn State Great Valley
Tianhai Zu Department of Operations, Business Analytics, and Information Systems, University of Cincinnati, OH 45221, USA

Keywords:

machine learning, emotion classification, lexicon-based classifier, predictive analytics, social media, twitter

Abstract

Predictive analytics using Twitter feeds is becoming a popular field of research. A tweet holds a wealth of information on how individuals express and communicate their feelings and emotions within their social networks. Large-scale collection, cleaning, and mining of tweets can help capture not only an individual’s emotions but also those of a larger group. However, capturing a large volume of tweets and identifying the emotions expressed in them is challenging. First, the classification algorithms previously employed for classifying emotions have achieved only low-to-moderate accuracies, making it difficult to predict the outcome of an event precisely. Second, the available emotion-annotated datasets are diverse and none is specific to a particular domain, which has limited the usefulness of supervised algorithms for classification. In this study, we demonstrate the potential of a lexicon-based classifier, NRC, for mining emotions and sentiments in tweets. Using the NRC classifier, we first determined the emotions and sentiments expressed in the tweets and then used them to predict the swing direction of 19 US states toward the candidates in the 2016 US presidential election. Comparing the NRC-based predictions with the actual outcome of the election, we observed approximately 90% accuracy, a performance superior to that of the mainstream pollsters. This indicates the potential that emotion- and sentiment-based classification holds for predicting the outcomes of significant social and political events.
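The lexicon-based pipeline described in the abstract can be illustrated with a minimal Python sketch. It assumes a tiny hypothetical lexicon in the style of the NRC Emotion Lexicon, where each word maps to some of eight emotions plus positive/negative sentiment. The lexicon entries, the tokenizer, the names `score_tweet`, `net_sentiment`, `predict_state`, and `accuracy`, and the "highest net sentiment wins the state" rule are all illustrative assumptions, not the authors' actual method.

```python
import re
from collections import Counter

# Hypothetical toy lexicon in the style of the NRC Emotion Lexicon (word -> labels).
# These entries are illustrative only; they are NOT taken from the real NRC lexicon.
TOY_LEXICON: dict[str, set[str]] = {
    "win": {"anticipation", "joy", "positive", "surprise", "trust"},
    "great": {"joy", "positive", "trust"},
    "hope": {"anticipation", "joy", "positive", "surprise", "trust"},
    "corrupt": {"anger", "disgust", "negative"},
    "liar": {"anger", "disgust", "negative"},
    "afraid": {"fear", "negative"},
}

# Simple lowercase word tokenizer (an assumption; real tweet preprocessing is richer).
TOKEN_RE = re.compile(r"[a-z']+")


def score_tweet(text: str, lexicon: dict[str, set[str]] = TOY_LEXICON) -> Counter:
    """Count how many tokens in a tweet are associated with each emotion/sentiment label."""
    counts: Counter = Counter()
    for token in TOKEN_RE.findall(text.lower()):
        counts.update(lexicon.get(token, ()))  # add 1 for every label of a matched word
    return counts


def net_sentiment(tweets: list[str]) -> int:
    """Positive minus negative label counts over all tweets (hypothetical aggregation rule)."""
    total: Counter = Counter()
    for tweet in tweets:
        total += score_tweet(tweet)
    return total["positive"] - total["negative"]


def predict_state(tweets_by_candidate: dict[str, list[str]]) -> str:
    """Predict the candidate with the highest net sentiment in one state (assumed rule)."""
    return max(tweets_by_candidate, key=lambda c: net_sentiment(tweets_by_candidate[c]))


def accuracy(predicted: dict[str, str], actual: dict[str, str]) -> float:
    """Fraction of states whose predicted winner matches the actual winner."""
    hits = sum(predicted[state] == actual[state] for state in actual)
    return hits / len(actual)


if __name__ == "__main__":
    # Toy, made-up tweets for two hypothetical candidates "A" and "B" in one state.
    toy_state = {
        "A": ["Great rally, hope A will win", "A is a liar"],
        "B": ["B is corrupt", "afraid of what B will do"],
    }
    print(score_tweet("Great rally, hope A will win"))
    print(predict_state(toy_state))  # -> A
```

Counting lexicon hits per label keeps the classifier fully unsupervised, which is the property the abstract contrasts with supervised algorithms that depend on domain-specific annotated datasets.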

Author Biographies

Satish Mahadevan Srinivasan, Penn State Great Valley

Satish M. Srinivasan, assistant professor of information science, received his B.E. in Information Technology from Bharathidasan University, India, and his M.S. in Industrial Engineering and Management from the Indian Institute of Technology Kharagpur, India. He earned his Ph.D. in Information Technology from the University of Nebraska at Omaha. Prior to joining Penn State Great Valley, he worked as a postdoctoral research associate at the University of Nebraska Medical Center, Omaha. Dr. Srinivasan teaches courses related to database design; data mining; data collection and cleaning; computer, network, and web security; and business process management. His research interests include data aggregation in partially connected networks, fault tolerance, software engineering, social network analysis, data mining, machine learning, Big Data and predictive analytics, and bioinformatics.

Raghvinder Sangwan, Penn State Great Valley

Raghu S. Sangwan, associate professor of software engineering, holds a Ph.D. in computer and information sciences from Temple University. He joined Penn State in 2003 after more than seven years in industry, where he worked mostly with large software-intensive systems in the domains of health care, automation, transportation, and mining. His teaching and research involve the analysis, design, and development of software-intensive systems, their architecture, and automatic and semi-automatic approaches to assessing their design and code complexity. He actively consults for Siemens Corporate Research in Princeton, New Jersey, and also holds a visiting scientist appointment at the Software Engineering Institute at Carnegie Mellon University in Pittsburgh, Pennsylvania. He is a senior member of the IEEE and ACM.

Colin Neill, Penn State Great Valley

Colin J. Neill, associate professor of software engineering and systems engineering and director of engineering programs, earned his Ph.D. in software and systems engineering, M.Sc. in communication systems, and B.Eng. in electrical engineering from the University of Wales, Swansea, United Kingdom. He teaches a wide range of software and systems engineering courses in system design and architecture, project management, and systems thinking. Prior to joining Penn State, Dr. Neill worked on time- and mission-critical system modeling and design, and on manufacturing systems and production management, with the University of Wales, Swansea; Oxford University; the Rover Car Company; and British Aerospace. He is the author of over 80 articles on the development and evolution of complex software and systems and on their management and governance. He is a Senior Member of the IEEE, a member of INCOSE, and serves as associate editor-in-chief of Innovations in Systems and Software Engineering. As Director of Engineering Programs, Dr. Neill oversees the Division’s portfolio of graduate degree programs delivered both in residence and online.