International Journal of

ADVANCED AND APPLIED SCIENCES

EISSN: 2313-3724, Print ISSN: 2313-626X

Frequency: 12


 Volume 10, Issue 7 (July 2023), Pages: 219-223

----------------------------------------------

 Original Research Paper

Respiratory disease classification using selected data mining techniques

 Author(s): 

 Abrahem P. Anqui *

 Affiliation(s):

 College of Technology, Cebu Technological University-Naga Campus, City of Naga, Cebu, Philippines


 * Corresponding Author. 

  Corresponding author's ORCID profile: https://orcid.org/0009-0000-1231-569X

 Digital Object Identifier: 

 https://doi.org/10.21833/ijaas.2023.07.024

 Abstract:

Lung cancer has a high mortality rate and continues to claim many lives worldwide. Early detection substantially improves the prospects for successful treatment and recovery. Although various classification methods have been used to identify illnesses, their accuracy has often been suboptimal. In this paper, we employ Linear Discriminant Analysis (LDA) as both a classifier and a dimensionality reduction model to improve the accuracy of predicting the presence of lung cancer. The study predicts the occurrence of lung cancer from a set of predictor variables: gender, age, allergy, swallowing difficulty, coughing, fatigue, alcohol consumption, wheezing, shortness of breath, yellowish fingers, chronic disease, smoking, chest pain, anxiety, and peer pressure. The goal is to enable early diagnosis, leading to timely and effective interventions. The results show that LDA achieves an accuracy of 92.2% in predicting the presence of lung cancer, surpassing the C4.5 and Naïve Bayes classifiers. This finding underscores the potential of LDA as a tool for the early detection of lung cancer, which may ultimately contribute to improved patient outcomes. By applying LDA, we hope to advance medical diagnostics and improve the prospects for successful lung cancer management and treatment.
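
To illustrate how a two-class LDA classifier of the kind named in the abstract separates cases from non-cases, the following is a minimal sketch in plain Python. It is not the author's implementation and does not use the study's dataset: the data are synthetic, the two numeric features stand in for the predictor variables, and every name (`fit_lda`, `predict`, `make_class`) is hypothetical. It will not reproduce the reported 92.2% figure.

```python
import math
import random


def fit_lda(X, y):
    """Fit a two-class LDA on 2-feature rows using a pooled covariance matrix."""
    groups = {0: [], 1: []}
    for row, label in zip(X, y):
        groups[label].append(row)
    n = len(X)
    # Per-class feature means
    means = {c: [sum(r[j] for r in rows) / len(rows) for j in range(2)]
             for c, rows in groups.items()}
    # Pooled within-class scatter, then covariance (divide by n - number of classes)
    s = [[0.0, 0.0], [0.0, 0.0]]
    for c, rows in groups.items():
        for r in rows:
            d = [r[0] - means[c][0], r[1] - means[c][1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    s = [[v / (n - 2) for v in row] for row in s]
    # Closed-form inverse of the 2x2 covariance matrix
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    # Discriminant direction w = inv(S) (mean_1 - mean_0)
    diff = [means[1][j] - means[0][j] for j in range(2)]
    w = [inv[i][0] * diff[0] + inv[i][1] * diff[1] for i in range(2)]
    # Intercept: boundary passes through the midpoint, shifted by the log prior ratio
    mid = [(means[0][j] + means[1][j]) / 2 for j in range(2)]
    b = -(w[0] * mid[0] + w[1] * mid[1]) + math.log(len(groups[1]) / len(groups[0]))
    return w, b


def predict(w, b, row):
    """Return 1 if the linear discriminant score is positive, else 0."""
    return 1 if w[0] * row[0] + w[1] * row[1] + b > 0 else 0


def make_class(rng, center, count):
    """Draw synthetic 2-feature rows around a class center (assumed unit variance)."""
    return [[rng.gauss(center, 1.0), rng.gauss(center, 1.0)] for _ in range(count)]


rng = random.Random(0)
# Synthetic stand-in data: class 0 centered at 0, class 1 centered at 3
X_train = make_class(rng, 0.0, 80) + make_class(rng, 3.0, 80)
y_train = [0] * 80 + [1] * 80
X_test = make_class(rng, 0.0, 20) + make_class(rng, 3.0, 20)
y_test = [0] * 20 + [1] * 20

w, b = fit_lda(X_train, y_train)
correct = sum(predict(w, b, row) == label for row, label in zip(X_test, y_test))
acc = correct / len(y_test)
print(f"held-out accuracy on synthetic data: {acc:.3f}")
```

The sketch scores each row with a single linear function and thresholds it at zero, which is the decision rule LDA yields when both classes are assumed to share one covariance matrix. Comparing against C4.5 or Naïve Bayes, as the paper does, would require fitting those classifiers on the same split.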

 © 2023 The Authors. Published by IASE.

 This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

 Keywords: Lung cancer, Early detection, Linear discriminant analysis, Predictive accuracy, Medical diagnostics

 Article History: Received 25 February 2023, Received in revised form 11 June 2023, Accepted 12 June 2023

 Acknowledgment 

No Acknowledgment.

 Compliance with ethical standards

 Conflict of interest: The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

 Citation:

 Anqui AP (2023). Respiratory disease classification using selected data mining techniques. International Journal of Advanced and Applied Sciences, 10(7): 219-223


