This is an outdated version published on 2022-08-07. See the most recent version.

MACHINE LEARNING AND THE REUTERS COLLECTION-21578 IN DOCUMENT CLASSIFICATION

Authors

Juan Jose Paniagua Medina https://orcid.org/0009-0001-1835-286X
Everardo Vargas Rodriguez https://orcid.org/0000-0001-5480-3384
Rafael Guzman Cabrera https://orcid.org/0000-0002-9320-7021

DOI:

https://doi.org/10.24054/rcta.v2i40.2344

Keywords:

Document classification, naive Bayes, logistic regression, SVM

Abstract

Documents are now so easy to produce that the resulting volume of information is almost impossible to organize without automatic methods. Automatic document classification can be defined as an action that an artificial system performs on a set of structured or unstructured documents, using the words contained in each document to determine the class to which it belongs. This paper presents several classification experiments on the Reuters-21578 collection to compare the performance of naive Bayes, support vector machine (SVM), and logistic regression classifiers. The results show how the classifiers perform, how they behave when cleaning techniques are applied to reduce document size, and how they respond to different classification scenarios.
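To make the word-based classification idea concrete, the following is a minimal sketch of one of the three classifier families named in the abstract, a multinomial naive Bayes classifier with Laplace smoothing, written with the Python standard library only. It is not the authors' implementation. The stop-word list, the function names, and the four training sentences are assumptions made for illustration; stop-word removal stands in for the unspecified cleaning techniques, and the sentences are invented rather than taken from Reuters-21578.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list for illustration; the paper's list is not given.
STOP_WORDS = {"the", "a", "of", "in", "to", "and", "for", "on", "is"}


def tokenize(text, remove_stop_words=True):
    """Lowercase, keep alphabetic tokens, and optionally drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens


def train_naive_bayes(docs, labels):
    """Collect per-class document counts, per-class word counts, and the vocabulary."""
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log posterior, using Laplace smoothing."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    # Words never seen in training carry no evidence, so they are skipped.
    tokens = [t for t in tokenize(text) if t in vocab]
    best_label, best_score = None, -math.inf
    for label, n_docs in class_docs.items():
        score = math.log(n_docs / total_docs)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for t in tokens:
            score += math.log((word_counts[label][t] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training set. "grain" and "crude" are topic labels in Reuters-21578,
# but these sentences are invented and do not come from the collection.
train_docs = [
    "Wheat and corn exports rose in the harvest season",
    "Farmers report a strong wheat harvest",
    "Oil prices climbed as crude output fell",
    "Crude oil barrels shipped to the refinery",
]
train_labels = ["grain", "grain", "crude", "crude"]

model = train_naive_bayes(train_docs, train_labels)
prediction_grain = predict(model, "The wheat harvest is strong")
prediction_crude = predict(model, "Refinery demand for crude oil")
```

Calling `tokenize` with `remove_stop_words=False` keeps every token, which gives a simple way to compare the classifier with and without this cleaning step on the same data.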

Published

2023-05-02 (updated on 2022-08-07)

How to Cite

Paniagua Medina, J. J., Vargas Rodriguez, E., & Guzman Cabrera, R. (2022). MACHINE LEARNING AND THE REUTERS COLLECTION-21578 IN DOCUMENT CLASSIFICATION. REVISTA COLOMBIANA DE TECNOLOGIAS DE AVANZADA (RCTA), 2(40). https://doi.org/10.24054/rcta.v2i40.2344 (Original work published May 2, 2023)