Application of machine learning and CRISP-DM methodology for accurate severity classification of dengue
DOI:
https://doi.org/10.24054/rcta.v1i43.2822Keywords:
Data Science, CRISP-DM, Dengue, Machine LearningAbstract
The project focuses on accurately classifying the severity of Dengue cases in Casanare, Colombia, using Machine Learning (ML) and the CRISP-DM methodology. The target variable is "final classification," which categorizes cases into Dengue without warning signs and Dengue with warning signs. Several models and techniques were tested, with 'RandomForest' standing out as the most effective due to its high performance, achieving an accuracy of 100%. This improvement in classification will enable early and precise identification of case severity, which, in turn, can enhance medical care and intervention strategies. The "Dengue Cases in Casanare by hospital service, person type, symptoms, and hospital status" database was used to support the analysis.
Downloads
References
Medina L., E. H. Big Data: Los Datos como Generadores de Valor. Universidad Peruana de Ciencias Aplicadas. 2023.
Casas R., J., Nin G., J., & Julbe L., F. (2019). Big data: análisis de datos en entornos masivos. Editorial UOC.
López M., J. J. y Zarza, G. (2017). La ingeniería del big data: cómo trabajar con datos. Editorial UOC. Barcelona, España.
Maldonado, S. (2022). Analytics y Big Data: ciencia de los Datos aplicada al mundo de los negocios. RIL editores.
Suarez L, A. A., Vazquez S., C. R., & Huffel, S. Van. (2018). Machine learning approaches for ambulatory electrocardiography signal processing.
Rios Insua, D., & Gomez-Ullate Oteiza, D. (2019). Big data: conceptos, tecnologias y aplicaciones. Editorial CSIC Consejo Superior de Investigaciones Cientificas.
Arnst, M., Louppe, G., Van Hulle, R., Gillet, L., Bureau, F., & Denoel, V. (2022). A hybrid stochastic model and its Bayesian identification for infectious disease screening in a university campus with application to massive COVID-19 screening at the University of Liège. Mathematical Biosciences, 347. https://doi.org/10.1016/j.mbs.2022.108805
Gutierrez-Barbosa, H., Medina-Moreno, S., Zapata, J. C., & Chua, J. V. (2020). Dengue Infections in Colombia: Epidemiological Trends of a Hyperendemic Country. Tropical Medicine and Infectious Disease, 5(4).
Gangula, R., Thirupathi, L., Parupati, R., Sreeveda, K., & Gattoju, S. (2023). Ensemble machine learning based prediction of dengue disease with performance and accuracy elevation patterns. Materials Today: Proceedings, 80, 3458–3463. https://doi.org/10.1016/j.matpr.2021.07.270
Castillo Romero, J. A. (2019). Big data. IFCT128PO. IC Editorial.
Organización Mundial de La Salud. (2023). Dengue y dengue grave. WHO.
Kadenic, M. D., Koumaditis, K., & Junker-Jensen, L. (2023). Mastering scrum with a focus on team maturity and key components of scrum. Information and Software Technology, 153, 107079. https://doi.org/10.1016/j.infsof.2022.107079
Treatments for dengue: a Global Dengue Alliance to address unmet needs. (2023). The Lancet Global Health. https://doi.org/10.1016/S2214-109X(23)00362-5
Nariya, M. K., Mills, C. E., Sorger, P. K., & Sokolov, A. (2023). Paired evaluation of machine-learning models characterizes effects of confounders and outliers. Patterns, 4(8), 100791. https://doi.org/10.1016/j.patter.2023.100791-
Menoyo R., D., Garcia L., E., & Garcia C., A. (2021). Fundamentos de la ciencia de datos. Editorial Universidad de Alcala.
Minguillon, J., Casas, J., & Minguillon, J. (2017). Mineria de datos: modelos y algoritmos. Editorial UOC.
Kotu, V., & Deshpande, B. (2019). Chapter 14 - Feature Selection. In V. Kotu & B. Deshpande (Eds.), Data Science (Second Edition) (pp. 467–490). Morgan Kaufmann. https://doi.org/10.1016/B978-0-12-814761-0.00014-9
Caballero, R., & Martin, E. (2022). Las bases de big data y de la inteligencia artificial. Los libros de la Catarata.
Edgar, T. W., & Manz, D. O. (2017). Chapter 4 - Exploratory Study. In T. W. Edgar & D. O. Manz (Eds.), Research Methods for Cyber Security (pp. 95–130). Syngress. https://doi.org/10.1016/B978-0-12-805349-2.00004-2
Denoux, T., Kanjanatarakul, O., & Sriboonchitta, S. (2019). A new evidential K-nearest neighbor rule based on contextual discounting with partially supervised learning. International Journal of Approximate Reasoning, 113, 287–302. https://doi.org/10.1016/j.ijar.2019.07.009
Malik, A., Javeri, Y. T., Shah, M., & Mangrulkar, R. (2022). Chapter 11 - Impact analysis of COVID-19 news headlines on global economy. In R. C. Poonia, B. Agarwal, S. Kumar, M. S. Khan, G. Marques, & J. Nayak (Eds.), Cyber-Physical Systems (pp. 189–206). Academic Press. https://doi.org/10.1016/B978-0-12-824557-6.00001-7
Additional Files
Published
Versions
- 2024-03-16 (3)
- 2024-03-16 (2)
- 2024-03-16 (1)
How to Cite
Issue
Section
License
Copyright (c) 2024 COLOMBIAN JOURNAL OF ADVANCED TECHNOLOGIES
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.