Optimization of Arabic text classification using SVM integrated with word embedding models on a novel dataset

Authors: Abdulaziz M. Alayba *, Mohammed Altamimi

Affiliations:

Department of Information and Computer Science, College of Computer Science and Engineering, University of Ha'il, Ha'il 81481, Saudi Arabia

Abstract

Arabic linguistics covers various areas such as morphology, syntax, semantics, historical linguistics, applied linguistics, pragmatics, and computational linguistics. The Arabic language presents major challenges for natural language processing (NLP) due to its complex morphological and semantic structure. In text classification tasks, effective feature selection is essential, and word embedding techniques have recently proven successful in representing textual data in a continuous vector space, capturing both semantic and morphological relationships. This study introduces a new, balanced Arabic text dataset for classification and examines the performance of combining word embedding models (Word2Vec, GloVe, and fastText) with a Support Vector Machine (SVM) classifier. The approach converts dense vector representations of Arabic text into single-value features for SVM input. Experimental results show that this method significantly outperforms the benchmark Term Frequency–Inverse Document Frequency (TF-IDF) approach, offering more accurate and reliable classification by effectively capturing Arabic contextual information.
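The abstract does not spell out how the dense word vectors are reduced to SVM inputs. As a minimal sketch under stated assumptions, one common way to build a fixed-length feature vector per document is to average the vectors of its tokens, so that each embedding dimension contributes one scalar feature. The embedding table, its three-dimensional vectors, and the `document_vector` helper below are hypothetical illustrations, not the authors' implementation. In practice the vectors would come from a trained Word2Vec, GloVe, or fastText model.

```python
from typing import Dict, List

# Hypothetical toy embedding table (assumption): real vectors would be
# loaded from a trained Word2Vec, GloVe, or fastText model.
EMBEDDINGS: Dict[str, List[float]] = {
    "كرة": [1.0, 0.0, 2.0],
    "القدم": [3.0, 2.0, 0.0],
    "اقتصاد": [0.0, 4.0, 1.0],
}
DIM = 3  # dimensionality of the toy vectors above


def document_vector(
    tokens: List[str],
    embeddings: Dict[str, List[float]] = EMBEDDINGS,
    dim: int = DIM,
) -> List[float]:
    """Average the embeddings of known tokens into one fixed-length vector."""
    # Out-of-vocabulary tokens are skipped rather than mapped to zeros.
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        # No known tokens: fall back to an all-zero feature vector.
        return [0.0] * dim
    # Each output position is a single scalar feature for the classifier.
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# The third token is out of vocabulary and is ignored by the average.
features = document_vector(["كرة", "القدم", "غير_معروف"])
print(features)
```

Vectors produced this way have the same length for every document, which is what a classifier such as an SVM expects. Mean pooling is only one possible reduction; the paper itself may use a different one.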

Keywords

Arabic linguistics, Natural language processing, Text classification, Word embedding, Support vector machine

DOI

https://doi.org/10.21833/ijaas.2025.09.013

Citation (APA)

Alayba, A. M., & Altamimi, M. (2025). Optimization of Arabic text classification using SVM integrated with word embedding models on a novel dataset. International Journal of Advanced and Applied Sciences, 12(9), 140–151. https://doi.org/10.21833/ijaas.2025.09.013