Power of Predictive Analytics: Using Emotion Classification of Twitter Data for Predicting 2016 US Presidential Elections
Keywords:
machine learning, emotion classification, lexicon-based classifier, predictive analytics, social media, Twitter
Abstract
Predictive analytics using Twitter feeds is becoming a popular field of research. A tweet holds a wealth of information on how an individual expresses and communicates feelings and emotions within their social network. Large-scale collection, cleaning, and mining of tweets can capture not only an individual's emotions but also those of a larger group. However, capturing a large volume of tweets and identifying the emotions expressed in them is a challenging task. Classification algorithms previously used for emotion classification have achieved only low-to-moderate accuracy, making it difficult to predict the outcome of an event precisely. Moreover, the available emotion-annotated datasets are diverse and none is specific to a particular domain, which has limited the potential of supervised algorithms for classification. In this study, we demonstrate the potential of a lexicon-based classifier, NRC, which can mine emotions and sentiments in tweets. Using the NRC classifier, we first determined the emotions and sentiments within the tweets and then used them to predict the swing direction of 19 US states towards the candidates of the 2016 US presidential election. Comparing the NRC predictions with the actual outcome of the election, we observed ~90% accuracy, a performance superior to that of mainstream pollsters, indicating the potential that emotion- and sentiment-based classification holds for predicting the outcome of significant social and political events.
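As a rough illustration of how a lexicon-based classifier of this kind might operate, the sketch below tags each tweet by looking up its words in a word-to-emotion lexicon and then compares net sentiment across candidates. It is only a minimal sketch under stated assumptions: the six-word `TOY_LEXICON` is a hypothetical stand-in and not the actual NRC lexicon, the function names are invented for this example, and the aggregation rule (positive minus negative counts, highest score wins) is an assumption, since the abstract does not state how the swing direction was computed.

```python
import re
from collections import Counter

# Hypothetical toy lexicon for illustration only; NOT the real NRC lexicon.
# Each word maps to the emotion and sentiment labels it is associated with.
TOY_LEXICON = {
    "love": {"joy", "positive"},
    "hope": {"anticipation", "joy", "positive"},
    "win": {"joy", "positive"},
    "fear": {"fear", "negative"},
    "angry": {"anger", "negative"},
    "corrupt": {"disgust", "negative"},
}


def tag_tweet(text):
    """Count the emotion and sentiment labels triggered by words in one tweet."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        for label in TOY_LEXICON.get(token, ()):
            counts[label] += 1
    return counts


def net_sentiment(tweets):
    """Sum label counts over many tweets and return positive minus negative."""
    total = Counter()
    for tweet in tweets:
        total.update(tag_tweet(tweet))
    return total["positive"] - total["negative"]


def predict_lean(tweets_by_candidate):
    """Assumed rule: the candidate with the highest net sentiment is the predicted lean."""
    scores = {name: net_sentiment(tweets) for name, tweets in tweets_by_candidate.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Made-up tweets for two placeholder candidates in one state.
    sample = {
        "A": ["I love this, hope we win!"],
        "B": ["So angry, so corrupt", "I fear the worst"],
    }
    print(predict_lean(sample))
```

A lexicon lookup needs no labeled training data, which is the property the abstract contrasts with supervised algorithms; the cost is that words missing from the lexicon contribute nothing to the score.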
License
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).