Application of machine learning and CRISP-DM methodology for accurate severity classification of dengue

DOI:

https://doi.org/10.24054/rcta.v1i43.2822

Keywords:

Data Science, CRISP-DM, Dengue, Machine Learning

Abstract

This project classifies the severity of dengue cases in Casanare, Colombia, using machine learning (ML) and the CRISP-DM methodology. The target variable, "final classification," separates cases into two categories: dengue without warning signs and dengue with warning signs. Of the several models and techniques tested, RandomForest performed best, with a reported accuracy of 100%. More accurate classification enables early and precise identification of case severity, which in turn can improve medical care and intervention strategies. The analysis used the "Dengue Cases in Casanare by hospital service, person type, symptoms, and hospital status" database.
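As a minimal sketch of how an accuracy figure for a two-class target like "final classification" is computed, the following standard-library Python tallies true and predicted labels and derives accuracy and per-class recall. The label strings, function names, and toy data are assumptions made for illustration; they are not taken from the Casanare database or from the authors' code.

```python
from collections import Counter

# Hypothetical label names for the two classes of "final classification".
NO_WARNING = "without warning signs"
WARNING = "with warning signs"


def confusion_counts(y_true, y_pred):
    """Count (true label, predicted label) pairs."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return Counter(zip(y_true, y_pred))


def accuracy(y_true, y_pred):
    """Fraction of cases whose predicted class equals the true class."""
    if not y_true:
        raise ValueError("at least one case is required")
    counts = confusion_counts(y_true, y_pred)
    correct = sum(n for (t, p), n in counts.items() if t == p)
    return correct / len(y_true)


def recall(y_true, y_pred, label):
    """Fraction of cases with true class `label` predicted as `label`."""
    counts = confusion_counts(y_true, y_pred)
    actual = sum(n for (t, _), n in counts.items() if t == label)
    if actual == 0:
        raise ValueError(f"no cases with true label {label!r}")
    return counts[(label, label)] / actual


# Made-up toy labels, for illustration only (not the Casanare data).
y_true = [NO_WARNING, NO_WARNING, NO_WARNING, WARNING,
          WARNING, WARNING, WARNING, NO_WARNING]
y_pred = [NO_WARNING, NO_WARNING, WARNING, WARNING,
          WARNING, NO_WARNING, WARNING, NO_WARNING]

print("accuracy:", accuracy(y_true, y_pred))
print("recall (with warning signs):", recall(y_true, y_pred, WARNING))
```

An accuracy of 100% corresponds to every (true, predicted) pair falling on the diagonal of this tally, that is, `accuracy(y_true, y_pred) == 1.0` on the evaluated cases.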

Author Biographies

Carlos Alberto Mejia Rodriguez, Universidad Popular del Cesar

Systems Engineer, Specialist in Educational Informatics, Master in E-learning. Member of the GIDEATIC Research Group.

Miguel Alberto Rincon Pinzon, Universidad Popular del Cesar

Systems Engineer, Bachelor of Foreign Language: English, Master in Educational Technology Management. Leader of the GIDEATIC Research Group, which is classified in category C and endorsed by the Universidad Popular del Cesar.

Luis Manuel Palmera Quintero, Universidad Popular del Cesar

Systems Engineer, Master's in Information Technology Governance.

Lina Marcela Arevalo Vergel, Universidad Popular del Cesar

Industrial Engineer, Specialist in Occupational Health, Specialist in Project Management, Specialist in Digital Technologies Applied to Education. Teacher-researcher in the GIDEATIC Research Group.

Published

2024-03-16 — Updated on 2024-03-16

How to Cite

Mejia Rodriguez, C. A., Rincon Pinzon, M. A., Palmera Quintero, L. M., & Arevalo Vergel, L. M. (2024). Application of machine learning and CRISP-DM methodology for accurate severity classification of dengue. COLOMBIAN JOURNAL OF ADVANCED TECHNOLOGIES, 1(43), 78–85. https://doi.org/10.24054/rcta.v1i43.2822
