International Journal of Advanced and Applied Sciences

Int. j. adv. appl. sci.

EISSN: 2313-3724

Print ISSN: 2313-626X

Volume 3, Issue 9 (September 2016), Pages: 59-66

Title: Artificial intelligence and natural language processing: the Arabic corpora in online translation software 

Author(s):  Mohammed Abdulmalik Ali *


Department of English, Prince Sattam Bin Abdulaziz University, AlKharj, Saudi Arabia



Although 5% of the world's population speaks Arabic, a mere 1% of Internet content worldwide is in the Arabic language. This disproportionately small online presence of Arabic, compared with other languages, may have many causes, including a lack of experts in the field of the Arabic language. This study investigates the impact of the Machine Translation (MT) software and TM tools that are widely used by the Arab community for academic and business purposes. It aims to determine whether a paradigm shift from Arabic Localization to Arabic Globalization is possible, thereby facilitating the use of NLP techniques in the human interface with the computer. To this end, several machine translation systems (e.g., SYSTRAN, IBM Watson) are examined for their content and applications, to determine whether they can be used without human intervention while retaining the meaning of the original text.

© 2016 The Authors. Published by IASE.

This is an open access article under the CC BY-NC-ND license.

Keywords: Arabic corpora, Online content, Translation, Software

Article History: Received 25 May 2016, Received in revised form 28 August 2016, Accepted 20 September 2016

Digital Object Identifier:


Ali MA (2016). Artificial intelligence and natural language processing: the Arabic corpora in online translation software. International Journal of Advanced and Applied Sciences, 3(9): 59-66

