doi:10.21833/ijaas.2020.05.004

IJAAS
International Journal of Advanced and Applied Sciences
EISSN: 2313-3724, Print ISSN: 2313-626X, Frequency: 12

Volume 7, Issue 5 (May 2020), Pages: 20-26
----------------------------------------------
Original Research Paper
Title: Semi-supervised method for sensitivity based documents’ classification for online service providers
Author(s): Sharaf J. Malebary, Shakeel Ahmad
Affiliation(s): Faculty of Computing and Information Technology in Rabigh (FCITR), King Abdulaziz University, Jeddah, Saudi Arabia
Corresponding author's ORCID profile: https://orcid.org/0000-0003-4339-3791
Digital Object Identifier: https://doi.org/10.21833/ijaas.2020.05.004
Abstract: In today’s digital era, many service-providing companies operate on the web, where a service is the logical product of a company and is delivered over the Internet. Providers offer services such as online counselling, online doctor consultation, cloud services, and web hosting to their customers. When customers face problems, they may send text messages to their providers. One option is for providers to resolve these issues on a First-Come-First-Serve basis, but there should also be a way to detect sensitive issues that may need to be resolved first. How can this sensitivity be determined? A large body of text-based research already determines polarity as positive or negative. Other classifications have also been investigated, such as aspect versus non-aspect, subjective versus objective, and spam versus not spam. Classifying text as sensitive or not sensitive, however, has not yet been considered for service providers. This paper presents a strategy for sensitivity-based classification using Latent Semantic Indexing (LSI). The purpose of LSI is to rank documents with respect to a given query. In this study, however, a mechanism is provided to generate the query automatically, based on general sensitive words and the words from all documents.
The approach is semi-supervised because 4782 sensitive words were labeled from various sources and then used in an unsupervised procedure to detect the sensitivity of each document. The lists of documents sorted by the LSI scores generated from the sensitive query were checked manually and found to be highly satisfactory. The topmost document in each list was the most sensitive, and the last document was the least sensitive.
© 2020 The Authors. Published by IASE. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Keywords: Service, Sentiment analysis, Supervised learning method, Unsupervised learning method, Latent semantic indexing
Article History: Received 4 November 2019, Received in revised form 5 February 2020, Accepted 7 February 2020
Acknowledgment: No Acknowledgment.
Compliance with ethical standards
Conflict of interest: The authors declare that they have no conflict of interest.
Citation: Malebary SJ and Ahmad S (2020). Semi-supervised method for sensitivity based documents’ classification for online service providers. International Journal of Advanced and Applied Sciences, 7(5): 20-26
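The ranking idea described in the abstract (build a query from sensitive words that also occur in the documents, then sort the documents by their LSI score against that query) might look roughly like the sketch below. It is only an illustration under stated assumptions, not the authors' implementation: it assumes raw term counts, a rank of k = 2, the classical query folding q_k = qᵀ U_k S_k⁻¹, and cosine similarity as the score. The names `tokenize` and `rank_by_sensitivity`, the toy messages, and the four-word list are all hypothetical stand-ins; the paper's 4782-word list is not reproduced here.

```python
import re
import numpy as np


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def rank_by_sensitivity(documents, sensitive_words, k=2):
    """Return (doc_index, score) pairs sorted from most to least sensitive.

    Assumes at least one document contains at least one word.
    """
    tokens = [tokenize(d) for d in documents]
    vocab = sorted({w for doc in tokens for w in doc})
    row = {w: i for i, w in enumerate(vocab)}

    # Term-document matrix of raw counts (terms x documents).
    A = np.zeros((len(vocab), len(documents)))
    for j, doc in enumerate(tokens):
        for w in doc:
            A[row[w], j] += 1.0

    # Automatic query: sensitive words that also occur in the collection.
    q = np.zeros(len(vocab))
    for w in sensitive_words:
        if w.lower() in row:
            q[row[w.lower()]] = 1.0

    # Truncated SVD, A ~ U_k S_k V_k^T, keeping only non-zero singular values.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, int(np.sum(s > 1e-10)))
    U_k, s_k, docs_k = U[:, :k], s[:k], Vt[:k, :].T

    # Fold the query into the latent space: q_k = q^T U_k S_k^{-1}.
    q_k = (q @ U_k) / s_k

    # Score each document by cosine similarity to the folded query.
    q_norm = np.linalg.norm(q_k)
    scores = []
    for j in range(len(documents)):
        denom = q_norm * np.linalg.norm(docs_k[j])
        scores.append(float(q_k @ docs_k[j] / denom) if denom > 0 else 0.0)

    return sorted(enumerate(scores), key=lambda p: p[1], reverse=True)


# Hypothetical customer messages and a tiny illustrative word list.
documents = [
    "my account was hacked and money stolen",
    "please change my display name",
    "fraud alert money stolen from account",
    "how to change the theme color",
]
sensitive_words = ["hacked", "stolen", "fraud", "suicide"]

for doc_index, score in rank_by_sensitivity(documents, sensitive_words):
    print(f"{score:+.3f}  {documents[doc_index]}")
```

Words from the list that never appear in the collection (here "suicide") are simply left out of the query, which is one plausible reading of generating the query from sensitive words together with the document vocabulary. A production version would likely add stop-word removal and term weighting, neither of which is specified in the abstract.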