MACHINE LEARNING AND THE REUTERS-21578 COLLECTION IN DOCUMENT CLASSIFICATION
DOI: https://doi.org/10.24054/rcta.v2i40.2344
Keywords: Document classification, naive Bayes, logistic regression, SVM
Abstract
Documents are now so easy to produce that the resulting volume of information is almost impossible to organize without automatic methods. Automatic document classification can be defined as an action performed by an artificial system on a set of structured or unstructured documents: the system uses the words contained in a document to determine the class to which it belongs. This paper presents several classification experiments on the Reuters-21578 collection that compare the performance of naive Bayes classifiers, support vector machines (SVM), and logistic regression. The results show how the classifiers perform, how they behave when cleaning techniques are applied to reduce document size, and how they behave under different classification scenarios.
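As a rough illustration of how the words in a document can determine its class, the following is a minimal sketch of a multinomial naive Bayes classifier with add-one smoothing. It is not the authors' implementation. The four-document training set, the labels "grain" and "crude", and the helper names `tokenize`, `train_naive_bayes`, and `predict` are all assumptions made for this example, and none of the text is taken from Reuters-21578.

```python
import math
from collections import Counter, defaultdict

PUNCT = ".,;:!?()\"'"

def tokenize(text):
    # Lowercase, split on whitespace, and strip surrounding punctuation.
    tokens = (w.strip(PUNCT).lower() for w in text.split())
    return [t for t in tokens if t]

def train_naive_bayes(docs):
    """Count documents per class and word occurrences per class.

    docs is a list of (text, label) pairs.
    """
    class_doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        class_doc_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return class_doc_counts, word_counts, vocab

def predict(model, text):
    """Return the class with the highest log-posterior score."""
    class_doc_counts, word_counts, vocab = model
    total_docs = sum(class_doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_doc_counts.items():
        # Log prior: fraction of training documents in this class.
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothed log likelihood of the word.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy corpus, invented for this sketch.
TRAIN = [
    ("wheat harvest exports rise", "grain"),
    ("corn and wheat prices fall", "grain"),
    ("crude oil output cut", "crude"),
    ("oil prices rise on opec cut", "crude"),
]

model = train_naive_bayes(TRAIN)
print(predict(model, "wheat exports fall"))  # grain
print(predict(model, "opec oil output"))     # crude
```

The same word-count representation could in principle be fed to a logistic regression or SVM classifier. Naive Bayes is shown here only because it fits in a few lines without external libraries.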
Versions
- 2022-07-25 (6)
- 2022-07-25 (5)
- 2022-07-25 (4)
- 2022-08-07 (3)
- 2023-07-19 (2)
- 2023-05-02 (1)
License
Copyright 2023 REVISTA COLOMBIANA DE TECNOLOGIAS DE AVANZADA (RCTA)
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International license.