Exploring gender bias in Colombian occupation classification using machine learning

Authors

DOI:

https://doi.org/10.24054/rcta.v2i44.3010

Keywords:

machine learning, supervised learning, equity in artificial intelligence, word embeddings, natural language processing

Abstract

This paper explores the use of Word2Vec and FastText to convert occupation names into vector representations and analyze their gender polarity. Two Colombian databases were cleaned and prepared for the analysis. Using several classifiers, we evaluated how gender polarity affects the classification of occupations and salaries, and applied ANOVA and Tukey tests for the statistical analysis. Models such as ExtraTreesClassifier and XGBClassifier showed minor differences in accuracy between genders, suggesting a tendency to classify men more accurately. However, after manipulating the variables related to professional titles, the models' predictions showed no clear preference for a specific gender. The study highlights the importance of addressing systemic biases in semantic representations, which can perpetuate existing prejudices.
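
The abstract does not spell out how gender polarity is computed, so the following is only a minimal sketch of one common approach from the word-embedding bias literature: projecting each occupation vector onto a gender direction. The anchor words ("hombre", "mujer"), the function names, and the three-dimensional toy vectors are assumptions made for illustration; in practice the vectors would come from a trained Word2Vec or FastText model.

```python
import numpy as np


def unit(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)


def gender_polarity(word_vec, male_vec, female_vec):
    """Cosine between a word vector and the (male - female) direction.

    Positive values lean toward the male anchor, negative values toward
    the female anchor, and values near zero are roughly neutral.
    """
    direction = unit(male_vec - female_vec)
    return float(np.dot(unit(word_vec), direction))


# Toy vectors invented for this example; they are NOT taken from the paper
# or from any trained embedding model.
toy_vectors = {
    "hombre": np.array([1.0, 0.0, 0.2]),
    "mujer": np.array([-1.0, 0.0, 0.2]),
    "ingeniero": np.array([0.6, 0.8, 0.1]),
    "enfermera": np.array([-0.7, 0.6, 0.1]),
    "docente": np.array([0.0, 1.0, 0.3]),
}

anchors = ("hombre", "mujer")
polarity = {
    word: gender_polarity(vec, toy_vectors["hombre"], toy_vectors["mujer"])
    for word, vec in toy_vectors.items()
    if word not in anchors
}

# Print occupations from most female-leaning to most male-leaning.
for word, score in sorted(polarity.items(), key=lambda kv: kv[1]):
    print(f"{word:10s} {score:+.3f}")
```

A score of this kind could then be added as a feature, or altered, before training a classifier, which is one way to read the abstract's "manipulating the variables related to professional titles"; the paper itself may define polarity differently.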

Published

2024-07-19

How to Cite

Ramos Cuello, D. de J., Rosado Gomez, A. A., & Calderón Benavides, M. L. (2024). Exploring gender bias in Colombian occupation classification using machine learning. COLOMBIAN JOURNAL OF ADVANCED TECHNOLOGIES, 2(44), 83–88. https://doi.org/10.24054/rcta.v2i44.3010
