Comparison of Vision Transformer and convolutional networks for safe-driving detection
DOI: https://doi.org/10.24054/rcta.v1i47.3824
Keywords: driver assistance, convolutional neural networks, drowsiness detection, Haar classifier, safe driving, transfer learning, computer vision
Abstract
This paper presents the results of comparing the training of deep learning architectures applied to the development of safe-driving systems. A database of 670 images of drivers inside the vehicle was captured and divided into three subsets for training two architectures, one based on convolutional neural networks (CNN) and the other on vision transformers: 70% of the images were used for training, 20% for validation, and the remaining 10% were reserved for testing. The two architectures are compared to contrast their pattern-recognition capability when classifying three driving states: normal, distracted, and drowsy. In both cases, focusing the learning proved necessary to improve training performance, so a prior face-segmentation stage based on a Haar classifier was added. With this stage, precision levels of 98% for the CNN and 87% for the Transformer network were obtained, with average inference times of 0.1 and 0.52, F1-scores of 98.9% and 82.2%, and recall of 98.8% and 80.6%, respectively; the per-class statistical metrics show a high degree of confidence in the recognition of each class. The comparison was run on a computer with a 2.3 GHz Core i9 processor, 24 GB of RAM, and an RTX 4080 GPU with 12 GB of memory, using MATLAB as the programming environment.
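As a concrete illustration of the pipeline described above, the following is a minimal MATLAB sketch of a Haar-based face-segmentation stage followed by the 70/20/10 data split, using the Computer Vision and Deep Learning Toolboxes. The dataset folder name, the 224x224 crop size, and the use of the default frontal-face detector model are assumptions for illustration only and do not reproduce the authors' implementation.

```matlab
% Illustrative sketch (assumed folder layout and parameters, not the authors' code):
% crop faces with a Viola-Jones (Haar) detector, then split 70/20/10.

imds = imageDatastore("driver_states", ...            % hypothetical dataset folder
    "IncludeSubfolders", true, "LabelSource", "foldernames");

faceDetector = vision.CascadeObjectDetector();        % default frontal-face Haar model

for k = 1:numel(imds.Files)
    I = readimage(imds, k);
    bbox = faceDetector(I);                           % [x y w h] rows, one per detection
    if ~isempty(bbox)
        face = imresize(imcrop(I, bbox(1, :)), [224 224]);
        imwrite(face, imds.Files{k});                 % replace the image with its face crop
    end
end

% 70% training, 20% validation, remaining 10% reserved for testing
[imdsTrain, imdsVal, imdsTest] = splitEachLabel(imds, 0.7, 0.2, "randomized");
```

The resulting datastores could then feed either architecture (CNN or vision transformer) for training and evaluation.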
License
Copyright 2026 Robinson Jiménez Moreno, Anny Astrid Espitia Cubillos, Javier Eduardo Martínez Baquero

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.





