PhageVir: An evaluation of computational intelligence models for the precise identification of phage virion proteins

Nashwan Alromema; Hussnain Arshad; Sharaf J. Malebary; Faisal Binzagr; Yaser Daanial Khan

International Journal of Advanced and Applied Sciences (IJAAS)
EISSN: 2313-3724, Print ISSN: 2313-626X, Frequency: 12

Volume 12, Issue 5 (May 2025), Pages: 129-147

Original Research Paper

Author(s): Nashwan Alromema ¹,*, Hussnain Arshad ², Sharaf J. Malebary ³, Faisal Binzagr ¹, Yaser Daanial Khan ⁴

Affiliation(s):
¹ Department of Computer Science, Faculty of Computing and Information Technology-Rabigh, King Abdulaziz University, Jeddah, Saudi Arabia
² Department of Artificial Intelligence, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
³ Department of Information Technology, Faculty of Computing and Information Technology-Rabigh, King Abdulaziz University, Jeddah, Saudi Arabia
⁴ Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan

* Corresponding author. Corresponding author's ORCID profile: https://orcid.org/0000-0001-6208-2863

Digital Object Identifier (DOI): https://doi.org/10.21833/ijaas.2025.05.013

Abstract

This study presents PhageVir, an enhanced computational model developed to predict Phage Virion Proteins (PVPs), which are essential for bacteriophage infection and replication. PhageVir integrates advanced feature selection methods, including the Position Relative Incidence Matrix (PRIM) and the Reverse Position Relative Incidence Matrix (RPRIM), to effectively capture key sequence features and positional dependencies within protein sequences. Several machine learning and deep learning algorithms were employed, including LightGBM, Random Forest, Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), Recurrent Neural Network (RNN), and Artificial Neural Network (ANN), to classify PVPs based on sequential data. Model performance was evaluated through independent set testing, self-consistency testing, and cross-validation, using metrics such as accuracy (ACC), specificity (Sp), sensitivity (SN), Z-score, and Matthews correlation coefficient (MCC). The CNN model demonstrated strong performance in cross-validation, achieving an accuracy of 0.833, sensitivity of 0.832, specificity of 0.834, a correlation coefficient of 0.665, an AUC score of 0.927, and a Z-score of 1.37. The results confirm the effectiveness of the proposed computational approach for accurate PVP classification. Beyond its predictive power, PhageVir offers valuable biological insights into phage infection mechanisms, supporting advancements in phage therapy and antibacterial treatments.

© 2025 The Authors. Published by IASE. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Keywords

Phage virion proteins, Computational model, Feature selection, Deep learning, Phage therapy

Article history

Received 8 January 2025, Received in revised form 29 April 2025, Accepted 3 May 2025

Funding

This Project was funded by the Deanship of Scientific Research (DSR) at King Abdulaziz University, Jeddah, under grant no. (GPIP: 1785-830-2024).

Acknowledgment

This Project was funded by the Deanship of Scientific Research (DSR) at King Abdulaziz University, Jeddah, under grant no. (GPIP: 1785-830-2024). The authors, therefore, acknowledge with thanks DSR for technical and financial support.

Compliance with ethical standards

Conflict of interest: The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Citation: Alromema N, Arshad H, Malebary SJ, Binzagr F, and Khan YD (2025). PhageVir: An evaluation of computational intelligence models for the precise identification of phage virion proteins.
International Journal of Advanced and Applied Sciences, 12(5): 129-147
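
The abstract reports accuracy (ACC), sensitivity (SN), specificity (Sp), and the Matthews correlation coefficient (MCC). As a minimal sketch of how these four metrics follow from a binary confusion matrix, the Python snippet below uses the standard textbook definitions. The function name `binary_metrics`, the `BinaryMetrics` type, and the example counts are illustrative assumptions; they are not taken from the paper, whose dataset sizes and class balance are not stated in this section.

```python
import math
from typing import NamedTuple


class BinaryMetrics(NamedTuple):
    accuracy: float
    sensitivity: float
    specificity: float
    mcc: float


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> BinaryMetrics:
    """Compute ACC, Sn, Sp and MCC from confusion-matrix counts."""
    total = tp + fn + tn + fp
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total
    # Sensitivity: fraction of true PVPs that are predicted as PVPs.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of true non-PVPs that are predicted as non-PVPs.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # MCC is conventionally set to 0 when any marginal sum is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return BinaryMetrics(accuracy, sensitivity, specificity, mcc)


# Hypothetical counts (NOT from the paper): 1000 PVPs and 1000 non-PVPs,
# chosen so that Sn = 0.832 and Sp = 0.834 as in the reported CNN result.
example = binary_metrics(tp=832, fn=168, tn=834, fp=166)
print(example)
```

With these hypothetical balanced counts, the accuracy comes out at 0.833 and the MCC at about 0.666, close to the reported 0.665. That is only a consistency check under the balanced-class assumption; the actual class proportions may differ.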
