
Volume 12, Issue 4 (April 2025), Pages: 71-78

----------------------------------------------
Original Research Paper
A light but efficient switch transformer model for Arabic text simplification
Author(s):
Aeshah Alsughayyir 1, *, Abdullah Alshanqiti 2
Affiliation(s):
1College of Computer Science and Engineering, Taibah University, Madinah 42353, Saudi Arabia
2Faculty of Computer and Information Systems, Islamic University of Madinah, Madinah 42351, Saudi Arabia
* Corresponding Author.
Corresponding author's ORCID profile: https://orcid.org/0000-0003-3710-7103
Digital Object Identifier (DOI)
https://doi.org/10.21833/ijaas.2025.04.009
Abstract
Simplifying Arabic text remains a significant challenge in Natural Language Understanding (NLU), and current models struggle to perform it well. Recent studies have focused on simplifying texts with complex linguistic structures to improve readability, both for human readers and for downstream Natural Language Processing (NLP) tasks. This study addresses the challenge in the context of low-resource Arabic NLP by introducing a split-and-rephrase approach built on a sequence-to-sequence switch transformer model, called ATSimST. Experiments on the ATSC dataset show that ATSimST outperforms existing advanced text generation models for Arabic. Its gains in SARI, BLEU, METEOR, and ROUGE scores demonstrate that ATSimST produces high-quality simplifications that are both semantically faithful and close to human-written references. These results confirm the model's effectiveness and highlight its potential to significantly advance Arabic text simplification.
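For readers unfamiliar with the architecture named in the title: the defining ingredient of a switch transformer (Fedus et al., 2022) is a feed-forward sublayer that routes each token to a single expert network, scaling the parameter count while keeping the per-token compute of one expert. The sketch below is a minimal, illustrative PyTorch rendering of that top-1 routing. The class name, hyperparameters, and the omission of the load-balancing loss and capacity limits are all simplifications of ours; this is not the authors' ATSimST implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SwitchFFN(nn.Module):
    """Top-1 ("switch") mixture-of-experts feed-forward sublayer.

    Each token is routed to exactly one expert MLP, so the layer adds
    parameters (num_experts MLPs) without adding per-token compute.
    Illustrative sketch only: the load-balancing loss and expert
    capacity limits of the original switch transformer are omitted.
    """

    def __init__(self, d_model: int, d_ff: int, num_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)  # token -> expert logits
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); flatten into a stream of tokens
        tokens = x.reshape(-1, x.size(-1))
        probs = F.softmax(self.router(tokens), dim=-1)
        gate, expert_idx = probs.max(dim=-1)  # top-1 routing decision
        out = torch.zeros_like(tokens)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i
            if mask.any():
                # scale by the gate value so the routing stays differentiable
                out[mask] = gate[mask].unsqueeze(-1) * expert(tokens[mask])
        return out.view_as(x)

# Example: route a batch of 2 sequences of 16 token embeddings
layer = SwitchFFN(d_model=512, d_ff=1024, num_experts=4)
y = layer(torch.randn(2, 16, 512))
print(y.shape)  # torch.Size([2, 16, 512])

Because only one expert fires per token, such a layer can stay "light" at inference time even as experts are added, which is consistent with the efficiency framing in the title.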
© 2025 The Authors. Published by IASE.
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Keywords
Arabic text simplification, Natural language understanding, Low-resource NLP, Transformer model, Text generation
Article history
Received 14 November 2024, Received in revised form 27 March 2025, Accepted 15 April 2025
Acknowledgment
No Acknowledgment.
Compliance with ethical standards
Conflict of interest: The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Citation:
Alsughayyir A and Alshanqiti A (2025). A light but efficient switch transformer model for Arabic text simplification. International Journal of Advanced and Applied Sciences, 12(4): 71-78
----------------------------------------------