This is an outdated version, published on 2022-08-07.

MACHINE LEARNING AND THE REUTERS COLLECTION-21578 IN DOCUMENT CLASSIFICATION

Authors

Juan Jose Paniagua Medina https://orcid.org/0009-0001-1835-286X
Everardo Vargas Rodriguez https://orcid.org/0000-0001-5480-3384
Rafael Guzman Cabrera https://orcid.org/0000-0002-9320-7021

DOI:

https://doi.org/10.24054/rcta.v2i40.2344

Keywords:

Document classification, naive Bayes, logistic regression, SVM

Abstract

Documents are now very easy to produce, so the volume of available information is enormous and almost impossible to organize without automatic methods. Automatic document classification can be defined as an action performed by an artificial system on a set of structured or unstructured documents: the system uses the words contained in the documents to determine the class to which a test document belongs. This paper presents several classification experiments on the Reuters-21578 collection that evaluate the performance of naive Bayes, support vector machine (SVM), and logistic regression classifiers. The results show the performance of each classifier, its behavior when text-cleaning techniques are applied to reduce document size, and its behavior across different classification scenarios.
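To make the idea of classifying a document by the words it contains more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a crude cleaning step (lowercasing, keeping alphabetic tokens, and dropping a small hypothetical stop-word list) followed by a multinomial naive Bayes classifier with add-one (Laplace) smoothing. The four training sentences are invented toy data, not documents from the collection; only the labels `earn` and `grain` are borrowed from Reuters-21578 topic categories. The names `clean`, `STOP_WORDS`, and `MultinomialNB` are illustrative.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stop-word list used only for illustration.
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "and", "on", "for", "its"}


def clean(text):
    """Lowercase, keep alphabetic tokens, and drop stop words to shrink the document."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


class MultinomialNB:
    """Multinomial naive Bayes over word counts with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # class -> number of training documents
        self.word_counts = defaultdict(Counter)  # class -> word -> count
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = clean(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        self.n_docs = len(labels)
        return self

    def log_posterior(self, tokens, c):
        # Unnormalized log P(c | doc) = log P(c) + sum of log P(w | c) over the tokens.
        score = math.log(self.class_counts[c] / self.n_docs)
        denom = self.total_words[c] + len(self.vocab)
        for t in tokens:
            if t in self.vocab:  # ignore words never seen in training
                score += math.log((self.word_counts[c][t] + 1) / denom)
        return score

    def predict(self, doc):
        tokens = clean(doc)
        return max(self.class_counts, key=lambda c: self.log_posterior(tokens, c))


# Invented toy training set (not Reuters-21578 text).
train_docs = [
    "Net profit rose and the quarterly dividend was raised",
    "Company reports higher earnings per share for the quarter",
    "Wheat and corn exports increased this season",
    "Grain harvest estimates for wheat were revised upward",
]
train_labels = ["earn", "earn", "grain", "grain"]

model = MultinomialNB().fit(train_docs, train_labels)
print(model.predict("Quarterly earnings and dividend beat estimates"))  # earn
print(model.predict("Corn and wheat shipments to Asia"))                # grain
```

Only naive Bayes is sketched here; an SVM or a logistic regression classifier would typically be trained on the same cleaned bag-of-words features, and the cleaning step is where document size is reduced before any classifier sees the text.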

Downloads

PDF (Spanish, Spain)

Published

2023-05-02 (updated on 2022-08-07)

How to Cite

Paniagua Medina, J. J., Vargas Rodriguez, E., & Guzman Cabrera, R. (2022). MACHINE LEARNING AND THE REUTERS COLLECTION-21578 IN DOCUMENT CLASSIFICATION. COLOMBIAN JOURNAL OF ADVANCED TECHNOLOGIES, 2(40). https://doi.org/10.24054/rcta.v2i40.2344 (Original work published May 2, 2023)