Empirical comparison of sentiment analysis techniques for social media

Hameed, et al.

IJAAS
International Journal of Advanced and Applied Sciences
EISSN: 2313-3724, Print ISSN: 2313-626X, Frequency: 12

Volume 5, Issue 4 (April 2018), Pages: 115-123
----------------------------------------------
Original Research Paper
Author(s): Maria Hameed ¹, Faizan Tahir ², M. Ali Shahzad ¹
Affiliation(s):
¹ Department of Computer Science, University of Sargodha, Lahore, Pakistan
² Department of Computer Science, Virtual University of Pakistan, Faisalabad, Pakistan
https://doi.org/10.21833/ijaas.2018.04.015
Abstract: Heavy use of the internet now produces a huge amount of data through social networks such as Twitter, Facebook, Orkut and Tumblr. These microblogging sites are used daily to share people's opinions and suggestions on particular topics, and such feeds are useful for decision making and for drawing conclusions. Analysis of these feeds aims to assess what people think and say about a personality or topic. Sentiment analysis is a type of text classification that labels a text as negative, positive or neutral, and it can be performed with various techniques, including machine learning. In this work, we compare recent sentiment analysis techniques: Naïve Bayes, Bagging, Random Forest, Decision Tree, Support Vector Machine and Maximum Entropy. The purpose of the study is to provide an empirical analysis of existing classification techniques for social media, with a view to good classification performance and better information retrieval. A comprehensive comparative framework is designed to compare these techniques. Various benchmark datasets (UCI, KEEL) available in different repositories are used for the comparison. We present an empirical analysis of the six classifiers. The analysis shows that the Support Vector Machine performs much better than the other classifiers.
Efforts are made to draw conclusions about the different algorithms on the basis of numerical and graphical metrics, in order to determine which algorithm is optimal.
© 2018 The Authors. Published by IASE. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Keywords: Sentiment analysis, Social media, UCI database, KEEL, Support vector machine
Article History: Received 29 November 2017, Received in revised form 15 February 2018, Accepted 27 February 2018
Citation: Hameed M, Tahir F, and Shahzad MA (2018). Empirical comparison of sentiment analysis techniques for social media. International Journal of Advanced and Applied Sciences, 5(4): 115-123
Permanent Link: http://www.science-gate.com/IJAAS/2018/V5I4/Hameed.html
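
The abstract treats sentiment analysis as text classification into positive, negative or neutral, and lists Naïve Bayes among the six classifiers compared. As a rough illustration only, the sketch below implements a multinomial Naïve Bayes classifier with add-one smoothing in plain Python. It is not the authors' experimental setup: the whitespace tokenizer, the names NaiveBayesSentiment, tokenize and accuracy, and the six training sentences are all hypothetical, invented for this example rather than drawn from the UCI or KEEL datasets.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


class NaiveBayesSentiment:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def predict(self, text):
        # Tokens never seen in training carry no evidence, so skip them.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / self.total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, texts, labels):
    """Fraction of texts whose predicted label matches the true label."""
    hits = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return hits / len(labels)


# Hypothetical toy data, invented for illustration only.
TRAIN = [
    ("i love this phone", "positive"),
    ("great service and great food", "positive"),
    ("i hate the new update", "negative"),
    ("terrible battery and awful screen", "negative"),
    ("the store opens at nine", "neutral"),
    ("the package arrives on monday", "neutral"),
]

model = NaiveBayesSentiment().fit([t for t, _ in TRAIN], [y for _, y in TRAIN])
```

With this toy data, model.predict("i love the food") returns "positive". A comparison of the kind the abstract describes would train each classifier on the same benchmark data and compare scores such as accuracy across them.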