Volume 12, Issue 5 (May 2025), Pages: 129-147
----------------------------------------------
Original Research Paper
PhageVir: An evaluation of computational intelligence models for the precise identification of phage virion proteins
Author(s):
Nashwan Alromema (1, *), Hussnain Arshad (2), Sharaf J. Malebary (3), Faisal Binzagr (1), Yaser Daanial Khan (4)
Affiliation(s):
(1) Department of Computer Science, Faculty of Computing and Information Technology-Rabigh, King Abdulaziz University, Jeddah, Saudi Arabia
(2) Department of Artificial Intelligence, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
(3) Department of Information Technology, Faculty of Computing and Information Technology-Rabigh, King Abdulaziz University, Jeddah, Saudi Arabia
(4) Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
* Corresponding Author.
Corresponding author's ORCID profile: https://orcid.org/0000-0001-6208-2863
Digital Object Identifier (DOI)
https://doi.org/10.21833/ijaas.2025.05.013
Abstract
This study presents PhageVir, an enhanced computational model for predicting phage virion proteins (PVPs), which are essential for bacteriophage infection and replication. PhageVir uses position-based feature extraction methods, the Position Relative Incidence Matrix (PRIM) and the Reverse Position Relative Incidence Matrix (RPRIM), to capture key sequence features and positional dependencies within protein sequences. Several machine learning and deep learning algorithms were used to classify PVPs from sequence data: LightGBM, Random Forest, Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), Recurrent Neural Network (RNN), and Artificial Neural Network (ANN). Model performance was evaluated through self-consistency testing, independent set testing, and cross-validation, using accuracy (ACC), sensitivity (Sn), specificity (Sp), the Matthews correlation coefficient (MCC), the area under the ROC curve (AUC), and the Z-score. In cross-validation, the CNN model performed strongly, achieving an ACC of 0.833, Sn of 0.832, Sp of 0.834, MCC of 0.665, AUC of 0.927, and Z-score of 1.37. These results indicate that the proposed computational approach is effective for accurate PVP classification. Beyond prediction, PhageVir offers biological insights into phage infection mechanisms, supporting advances in phage therapy and antibacterial treatment.
© 2025 The Authors. Published by IASE.
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Keywords
Phage virion proteins, Computational model, Feature selection, Deep learning, Phage therapy
Article history
Received 8 January 2025, Received in revised form 29 April 2025, Accepted 3 May 2025
Funding
This project was funded by the Deanship of Scientific Research (DSR) at King Abdulaziz University, Jeddah, under grant no. (GPIP: 1785-830-2024).
Acknowledgment
This project was funded by the Deanship of Scientific Research (DSR) at King Abdulaziz University, Jeddah, under grant no. (GPIP: 1785-830-2024). The authors therefore gratefully acknowledge the DSR for technical and financial support.
Compliance with ethical standards
Conflict of interest: The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Citation:
Alromema N, Arshad H, Malebary SJ, Binzagr F, and Khan YD (2025). PhageVir: An evaluation of computational intelligence models for the precise identification of phage virion proteins. International Journal of Advanced and Applied Sciences, 12(5): 129-147
Figures
Figs. 1–18
Tables
Tables 1–12