MACHINE LEARNING AND THE REUTERS COLLECTION-21578 IN DOCUMENT CLASSIFICATION
DOI:
https://doi.org/10.24054/rcta.v2i40.2344
Keywords:
Document classification, naive Bayes, logistic regression, SVM
Abstract
Documents are now so easy to produce that the resulting volume of information is almost impossible to organize without automatic methods. Automatic document classification can be defined as an action executed by an artificial system on a set of structured or unstructured documents: the words contained in a document are used to determine the class to which it belongs. This paper presents several classification experiments on the Reuters-21578 collection to observe the performance of naive Bayes classifiers, support vector machines (SVM), and logistic regression. The results show how the classifiers perform, how they behave when cleaning techniques are applied to reduce the size of the documents, and how they behave under different classification scenarios.
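To make the idea of word-based classification concrete, the following is a minimal sketch of a multinomial naive Bayes classifier with add-one smoothing, preceded by a simple cleaning step (lowercasing and stop-word removal). It is not the implementation used in the paper. The stop-word list, the `tokenize` helper, the `NaiveBayes` class, and the four training sentences are all assumptions made for illustration; the sentences are invented and are not taken from Reuters-21578.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list standing in for a "cleaning" step; a real
# experiment would likely use a fuller list and possibly stemming.
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "to", "and", "is", "for"}


def tokenize(text, remove_stop_words=True):
    """Lowercase the text, keep alphabetic tokens, optionally drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.n_docs = len(labels)
        return self

    def predict(self, text):
        # Words never seen in training carry no evidence and are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            score = math.log(n / self.n_docs)  # log prior
            for t in tokens:
                # Smoothed log likelihood of the word given the class.
                count = self.word_counts[label][t]
                score += math.log((count + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy corpus; the labels are illustrative topic names.
TRAIN = [
    ("Wheat and corn exports rose in the quarter", "grain"),
    ("The harvest of wheat is expected to fall", "grain"),
    ("Crude oil prices climbed on supply fears", "crude"),
    ("OPEC agreed to cut crude oil output", "crude"),
]

model = NaiveBayes().fit([t for t, _ in TRAIN], [c for _, c in TRAIN])
prediction = model.predict("Oil output fell as crude prices dropped")
print(prediction)
```

In this sketch the cleaning step shrinks each document before counting, which is the kind of size reduction the abstract refers to; calling `tokenize` with `remove_stop_words=False` keeps every token and gives a simple way to compare the two settings.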
Versions
- 2022-07-25 (6)
- 2022-07-25 (5)
- 2022-07-25 (4)
- 2022-08-07 (3)
- 2023-07-19 (2)
- 2023-05-02 (1)
License
Copyright (c) 2023 REVISTA COLOMBIANA DE TECNOLOGIAS DE AVANZADA (RCTA)
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.