International Journal of

ADVANCED AND APPLIED SCIENCES

EISSN: 2313-3724, Print ISSN: 2313-626X

Frequency: 12


 Volume 8, Issue 1 (January 2021), Pages: 11-19

----------------------------------------------

 Original Research Paper

 Title: Text mining: A survey of Arabic root extraction algorithms

 Author(s): Manar Ahmed Mohammed Hamza 1, 2, Tarig Mohamed Ahmed 3, 4, Anwer Mustafa Mohamedsalih Hilal 1, 2, *

 Affiliation(s):

 1 Department of Computer and Self Development, Prince Sattam bin Abdulaziz University, Al-Kharj, Saudi Arabia
 2 Faculty of Computer Science and Information Technology, Omdurman Islamic University, Omdurman, Sudan
 3 Department of Information Technology, Faculty of Computing and Information Technology, King Abdul-Aziz University, Jeddah, Saudi Arabia
 4 Department of Computer Sciences, University of Khartoum, Khartoum, Sudan


 * Corresponding Author. 

  Corresponding author's ORCID profile: https://orcid.org/0000-0002-4658-8941

 Digital Object Identifier: 

 https://doi.org/10.21833/ijaas.2021.01.002

 Abstract:

Arabic is the official spoken and written language of all Arab countries and is one of the oldest known languages. This paper explains and discusses prior work on Arabic root extraction and stemming algorithms. Text mining has attracted the interest of scientists, researchers, and users because of the availability of big data and of deep learning algorithms that can analyze very large sets of unstructured data. Basic algorithms are used for text extraction and classification, information retrieval systems, and indexing; among them are algorithms that extract the root of a word in different natural languages. A number of papers and articles deal with extracting the root from the Arabic word. This paper presents a brief background and a comprehensive overview of a number of algorithms that process Arabic text to extract the root and stem of a word, covering light, heavy, hybrid, leading, and Markovian approaches. It then compares and discusses a number of selected algorithms in terms of accuracy, data set, and stemming method, with regard to their strengths and weaknesses.

 © 2020 The Authors. Published by IASE.

 This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

 Keywords: Accuracy, Arabic word root, Stemming algorithm, Text mining

 Article History: Received 6 May 2020, Received in revised form 22 July 2020, Accepted 17 August 2020

 Acknowledgment:

This publication was supported by the Deanship of Scientific Research at Prince Sattam bin Abdulaziz University, Al-Kharj, Saudi Arabia.

 Compliance with ethical standards

 Conflict of interest: The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

 Citation:

  Hamza MAM, Ahmed TM, and Hilal AMM (2021). Text mining: A survey of Arabic root extraction algorithms. International Journal of Advanced and Applied Sciences, 8(1): 11-19


 Figures

 No Figure 

 Tables

 Table 1
