Exploring gender bias in Colombian occupation classification using machine learning





Keywords: machine learning, supervised learning, equity in artificial intelligence, word embeddings, natural language processing


This paper explores the use of Word2Vec and FastText to convert occupation names into vector representations and to analyze their gender polarity. Data from two Colombian databases were cleaned and prepared for the analysis. Using several classifiers, we evaluated how gender polarity affects the classification of occupations and salaries, and we applied ANOVA and Tukey tests for the statistical analysis. Models such as ExtraTreesClassifier and XGBClassifier showed small differences in accuracy between genders, with men tending to be classified slightly more accurately. However, after manipulating the variables related to professional titles, the models' predictions showed no clear preference toward a specific gender. The study highlights the importance of addressing systemic biases in semantic representations, which can perpetuate existing prejudices.
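
As an illustration of the idea of gender polarity in an embedding space, the following is a minimal sketch, not the authors' procedure. It assumes polarity is measured as the difference between a word's cosine similarity to a male anchor and to a female anchor. The function names, the anchor words ("hombre", "mujer"), and the toy 3-dimensional vectors are all hypothetical and chosen only for demonstration.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def gender_polarity(word_vec, male_vec, female_vec):
    """Assumed polarity score: positive means closer to the male anchor,
    negative means closer to the female anchor."""
    return cosine(word_vec, male_vec) - cosine(word_vec, female_vec)

# Toy vectors invented for illustration; real vectors would come from
# a trained Word2Vec or FastText model.
toy = {
    "hombre": np.array([1.0, 0.0, 0.0]),
    "mujer": np.array([0.0, 1.0, 0.0]),
    "ingeniero": np.array([0.9, 0.1, 0.4]),
    "enfermera": np.array([0.1, 0.9, 0.4]),
}

polarity = {
    word: gender_polarity(toy[word], toy["hombre"], toy["mujer"])
    for word in ("ingeniero", "enfermera")
}

for word, score in polarity.items():
    print(f"{word}: {score:+.3f}")
```

With a trained model, the toy dictionary could be replaced by lookups into the model's word vectors. A score of this kind could then serve as one input feature for the classifiers mentioned in the abstract, although the exact feature construction used in the study is not stated here.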




Ramos Cuello, D. de J., Rosado Gomez, A. A., & Calderón Benavides, M. L. (2024). Exploring gender bias in Colombian occupation classification using machine learning. COLOMBIAN JOURNAL OF ADVANCED TECHNOLOGIES, 2(44), 83–88.

