Exploring gender bias in the classification of occupations in Colombia using machine learning

Deimer de Jesús Ramos Cuello; Alveiro Alonso Rosado Gomez; Maritza Liliana Calderón Benavides

doi:10.24054/rcta.v2i44.3010

Authors

Deimer de Jesús Ramos Cuello, Universidad Autónoma de Bucaramanga, https://orcid.org/0009-0005-1108-0549
Alveiro Alonso Rosado Gomez, Universidad Francisco de Paula Santander, https://orcid.org/0000-0003-2932-3383
Maritza Liliana Calderón Benavides, Universidad Autónoma de Bucaramanga, https://orcid.org/0000-0001-8658-9036

Keywords:

machine learning, supervised learning, fairness in artificial intelligence, word embeddings, natural language processing

Abstract

This article explores the use of Word2Vec and FastText to convert occupation names into vector representations and analyze their gender polarity. Two Colombian databases were used, and their data were prepared and cleaned. Classifiers were then used to evaluate how gender polarity affects the classification of occupations and salaries. ANOVA and Tukey tests were used for the statistical analysis. Models such as ExtraTreesClassifier and XGBClassifier showed smaller accuracy differences between genders, with the results suggesting that the models tend to classify men more accurately. However, after manipulating the variables related to professional titles, the models' predictions showed no clear preference for a specific gender. The study highlights the importance of addressing systemic biases in semantic representations, which can perpetuate existing prejudices.
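
The abstract does not state how gender polarity is computed, so the following is only a minimal sketch of one common formulation: the cosine similarity between an occupation vector and a gender direction built from two anchor words. The function names, the anchor words "hombre" and "mujer", and the three-dimensional toy vectors are all assumptions made for illustration; in practice the vectors would come from a trained Word2Vec or FastText model.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def gender_polarity(word_vec, male_vec, female_vec):
    """Hypothetical polarity score (not necessarily the authors' definition).

    Positive values lean toward the male anchor, negative values toward
    the female anchor, and values near zero are roughly neutral.
    """
    direction = male_vec - female_vec
    return cosine(word_vec, direction)


# Toy vectors invented for this example; they are NOT real embeddings.
toy_vectors = {
    "hombre": np.array([1.0, 0.0, 0.2]),
    "mujer": np.array([-1.0, 0.0, 0.2]),
    "ingeniero": np.array([0.6, 0.8, 0.1]),
    "enfermera": np.array([-0.7, 0.7, 0.1]),
    "docente": np.array([0.0, 1.0, 0.3]),
}

anchors = ("hombre", "mujer")
polarity = {
    word: gender_polarity(vec, toy_vectors["hombre"], toy_vectors["mujer"])
    for word, vec in toy_vectors.items()
    if word not in anchors
}

for word, score in polarity.items():
    print(f"{word}: {score:+.3f}")
```

A score like this could then be added as a feature to a classifier, which is one way to study how polarity affects predictions. Whether the study used this exact score is not confirmed by the abstract.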
