International Journal of

ADVANCED AND APPLIED SCIENCES

EISSN: 2313-3724, Print ISSN: 2313-626X

Frequency: 12 issues per year


 Volume 12, Issue 5 (May 2025), Pages: 129-147

----------------------------------------------

 Original Research Paper

PhageVir: An evaluation of computational intelligence models for the precise identification of phage virion proteins

 Author(s): 

 Nashwan Alromema 1, *, Hussnain Arshad 2, Sharaf J. Malebary 3, Faisal Binzagr 1, Yaser Daanial Khan 4

 Affiliation(s):

  1Department of Computer Science, Faculty of Computing and Information Technology-Rabigh, King Abdulaziz University, Jeddah, Saudi Arabia
  2Department of Artificial Intelligence, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
  3Department of Information Technology, Faculty of Computing and Information Technology-Rabigh, King Abdulaziz University, Jeddah, Saudi Arabia
  4Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan


 * Corresponding Author. 

   Corresponding author's ORCID profile:  https://orcid.org/0000-0001-6208-2863

 Digital Object Identifier (DOI)

  https://doi.org/10.21833/ijaas.2025.05.013

 Abstract

This study presents PhageVir, an enhanced computational model for predicting phage virion proteins (PVPs), which are essential for bacteriophage infection and replication. PhageVir integrates feature extraction techniques, including the Position Relative Incidence Matrix (PRIM) and the Reverse Position Relative Incidence Matrix (RPRIM), to capture key sequence features and positional dependencies within protein sequences. Several machine learning and deep learning algorithms, namely LightGBM, Random Forest, Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), Recurrent Neural Network (RNN), and Artificial Neural Network (ANN), were employed to classify PVPs from sequence data. Model performance was evaluated through independent set testing, self-consistency testing, and cross-validation, using accuracy (ACC), sensitivity (Sn), specificity (Sp), Matthews correlation coefficient (MCC), area under the ROC curve (AUC), and Z-score. In cross-validation, the CNN model performed strongly, achieving an ACC of 0.833, Sn of 0.832, Sp of 0.834, MCC of 0.665, AUC of 0.927, and Z-score of 1.37. These results support the effectiveness of the proposed computational approach for accurate PVP classification. Beyond prediction, PhageVir may offer biological insight into phage infection mechanisms, supporting advances in phage therapy and antibacterial treatment.
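The abstract reports ACC, Sn, Sp, and MCC without stating their formulas. The sketch below assumes the standard binary confusion-matrix definitions, with PVPs as the positive class. The function name `binary_metrics` and the counts are hypothetical: the counts describe an imagined balanced set of 1000 PVPs and 1000 non-PVPs, chosen only so that Sn and Sp match the reported CNN values. They are not the paper's data.

```python
import math


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Standard binary classification metrics from confusion-matrix counts.

    Assumes PVP is the positive class and non-PVP the negative class.
    """
    total = tp + fn + tn + fp
    acc = (tp + tn) / total            # fraction of all sequences classified correctly
    sn = tp / (tp + fn)                # sensitivity: recall on PVPs
    sp = tn / (tn + fp)                # specificity: recall on non-PVPs
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal count is zero
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "Sn": sn, "Sp": sp, "MCC": mcc}


# Hypothetical balanced counts (NOT from the paper), picked to give Sn = 0.832, Sp = 0.834
m = binary_metrics(tp=832, fn=168, tn=834, fp=166)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
# prints ACC: 0.833, Sn: 0.832, Sp: 0.834, MCC: 0.666 (one per line)
```

Under these assumptions, the reported Sn and Sp imply an MCC of about 0.666, close to the reported 0.665. The small gap may come from rounding or from a benchmark whose classes are not exactly balanced.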

 © 2025 The Authors. Published by IASE.

 This is an open access article under the CC BY-NC-ND license ( http://creativecommons.org/licenses/by-nc-nd/4.0/).

 Keywords

 Phage virion proteins, Computational model, Feature selection, Deep learning, Phage therapy

 Article history

 Received 8 January 2025, Received in revised form 29 April 2025, Accepted 3 May 2025

 Funding

This project was funded by the Deanship of Scientific Research (DSR) at King Abdulaziz University, Jeddah, under grant no. GPIP: 1785-830-2024.

 Acknowledgment

This project was funded by the Deanship of Scientific Research (DSR) at King Abdulaziz University, Jeddah, under grant no. GPIP: 1785-830-2024. The authors therefore gratefully acknowledge the DSR for technical and financial support.

  Compliance with ethical standards

  Conflict of interest: The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

 Citation:

 Alromema N, Arshad H, Malebary SJ, Binzagr F, and Khan YD (2025). PhageVir: An evaluation of computational intelligence models for the precise identification of phage virion proteins. International Journal of Advanced and Applied Sciences, 12(5): 129-147


 Figures

  Fig. 1 to Fig. 18

 Tables

  Table 1 to Table 12
