Generation of synthetic data to evaluate the bovine leukosis infectious disease

DOI:

https://doi.org/10.24054/rcta.v1i41.2556

Keywords:

Machine learning, Bovine infectious diseases, Synthetic data, Leucosis

Abstract

Projects in the animal health sector face technological and scientific limitations, owing both to the lack of consistent, reliable information and to the high cost of collecting information for farmers. Legal restrictions on disclosure, such as data protection laws, further delay the development of policies and strategies as well as decision-making. Generating synthetic data from an original data set emerges as a solution to this limited availability of information. This paper presents a study that evaluated three methods for generating synthetic data that reflect the behavior of a bovine disease observed in a real data set. The work compared machine learning algorithms, tools, and model-based methods with the aim of improving the realism of synthetic data on disease behavior. The goal was to find the best model for generating synthetic data, using the bovine mastitis infectious disease as a case study because not enough data are available for it. To validate the synthetic data, the original data set was contrasted with the synthetic one, so that the selected method would be the one that produces synthetic data with qualities similar to those of the original.

Published

2023-10-19 — Updated on 2023-05-18

How to Cite

Ballesteros-Ricaurte, J. A., González-Sanabria, J. S., & Ordóñez, H. (2023). Generation of synthetic data to evaluate the bovine leukosis infectious disease. COLOMBIAN JOURNAL OF ADVANCED TECHNOLOGIES, 1(41), 115–122. https://doi.org/10.24054/rcta.v1i41.2556 (Original work published October 19, 2023)