Safe driving detection by Vision Transformer and convolutional networks comparison

DOI:

https://doi.org/10.24054/rcta.v1i47.3824

Keywords:

driving assistant, convolutional neural networks, drowsiness detection, haar classifier, safe driving, transfer learning, computer vision

Abstract

This paper presents the results of comparing deep learning architectures trained for the development of safe driving systems. A database of 670 images of drivers inside vehicles was generated and divided into three subsets: 70% of the images were used for training, 20% for validation, and the remaining 10% for testing. Two architectures, one based on convolutional neural networks (CNNs) and one on vision transformers, were compared on their pattern recognition capability in classifying three driving states: normal, distracted, and sleep. In both cases it became evident that learning had to be focused to improve performance, so a preliminary face segmentation stage based on a Haar classifier was added. With this stage, accuracy levels of 98% for the CNN and 87% for the transformer network were obtained, with average inference times of 0.1 and 0.52 seconds, F1 scores of 98.9% and 82.2%, and recall rates of 98.8% and 80.6%, respectively; the per-class statistical metrics demonstrate a high degree of confidence in the recognition of each class. The comparison was performed on a computer with a 2.3 GHz Core i9 processor, 24 GB of RAM, and an RTX 4080 GPU with 12 GB of memory, using MATLAB.


References

Y. Y. Wang and H. Y. Wei, “Safe Driving Capacity of Autonomous Vehicles,” in 2018 IEEE 88th Vehicular Technology Conference (VTC-Fall), 2018, pp. 1–5. doi:10.1109/VTCFall.2018.8690822.

J. W. Lee, B. J. Park, K. H. Kim, and H.K. Choi, “A testbed for development and test of the safe driving system,” in 2016 International Conference on Information and Communication Technology Convergence (ICTC), 2016, pp. 1149–1151. doi:10.1109/ICTC.2016.7763392.

G. Salzillo, C. Natale, G. B. Fioccola, and E. Landolfi, “Evaluation of Driver Drowsiness based on Real-Time Face Analysis,” in 2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2020, pp. 328–335. doi:10.1109/SMC42975.2020.9283133.

E. Karakullukcu, “Leveraging convolutional neural networks for image-based classification of feature matrix data,” Expert Syst. Appl., 2025, vol. 281, p. 127625, doi:10.1016/j.eswa.2025.127625.

A. Abdullah, W. S. Wong, and D. Albashish, “EB-CNN: Ensemble of branch convolutional neural network for image classification,” Pattern Recognit. Lett., 2025, vol. 189, pp. 1–7, doi:10.1016/j.patrec.2024.12.017.

Y. L. Chen, C. L. Lin, Y. C. Lin, and T. C. Chen, “Transformer-CNN for small image object detection,” Signal Process. Image Commun., 2024, vol. 129, p. 117194, doi:10.1016/j.image.2024.117194.

Y. Y. Wang and H. Y. Wei, “Road Capacity and Throughput for Safe Driving Autonomous Vehicles,” IEEE Access, 2020, vol. 8, pp. 95779–95792, doi:10.1109/ACCESS.2020.2995312.

K. Aati, M. Houda, S. Alotaibi, A. M. Khan, N. Alselami, and O. Benjeddou, “Analysis of Road Traffic Accidents in Dense Cities: Geotech Transport and ArcGIS,” Transp. Eng., 2024, vol. 16, p. 100256, doi:10.1016/j.treng.2024.100256.

H. T. N. Le and H. Q. T. Ngo, “Application of the vision-based deep learning technique for waste classification using the robotic manipulation system,” Int. J. Cogn. Comput. Eng., 2025, vol. 6, pp. 391–400, doi:10.1016/j.ijcce.2025.02.005.

I. Shad, Z. Zhang, M. Asim, M. Al-Habib, S. A. Chelloug, and A. A. El-Latif, “Deep learning-based image processing framework for efficient surface litter detection in Computer Vision applications,” J. Radiat. Res. Appl. Sci., 2025, vol. 18, no. 2, p. 101534, doi:10.1016/j.jrras.2025.101534.

M. Ciranni, V. Murino, F. Odone, and V. P. Pastore, “Computer vision and deep learning meet plankton: Milestones and future directions,” Image Vis. Comput., 2024, vol. 143, p. 104934, doi:10.1016/j.imavis.2024.104934.

A. Khan, Z. Rauf, A. Sohail, A. R. Khan, H. Asif, A. Asif, and U. Farooq, “A survey of the vision transformers and their CNN-transformer based variants,” Artif. Intell. Rev., 2023, vol. 56, no. 3, pp. 2917–2970, doi:10.1007/s10462-023-10595-0.

X. Sun, L. Jin, H. Wang, Z. Huo, Y. He, and G. Wang, “Spatial awareness enhancement based single-stage anchor-free 3D object detection for autonomous driving,” Displays, 2024, vol. 85, p. 102821, doi:10.1016/j.displa.2024.102821.

Y. Zhou and X. Zeng, “Towards comprehensive understanding of pedestrians for autonomous driving: Efficient multi-task-learning-based pedestrian detection, tracking and attribute recognition,” Robotics and Autonomous Systems, 2024, vol. 171, p. 104580, doi:10.1016/j.robot.2023.104580.

C. M. Farmer, “Potential lives saved by in-vehicle alcohol detection systems,” Traffic Injury Prevention, 2021, vol. 22, no. 1, pp. 7–12, doi:10.1080/15389588.2020.1836366.

Z. Wang, Z. Li, Z. Li, Y. Xu, F. Qi, and J. Kong, “A low cost and effective multi-instance abnormal driving behavior detection system under edge computing,” Computers & Security, 2023, vol. 132, p. 103362, doi:10.1016/j.cose.2023.103362.

Y. X. Chew, S. F. Abdul Razak, S. Yogarayan, and S. N. M. S. Ismail, “Dual-Modal Drowsiness Detection to Enhance Driver Safety,” Computers, Materials and Continua, 2024, vol. 81, no. 3, pp. 4397–4417, doi:10.32604/cmc.2024.056367.

Y. Sun, R. Wang, H. Zhang, N. Ding, S. Ferreira, and X. Shi, “Driving fingerprinting enhances drowsy driving detection: Tailoring to individual driver characteristics,” Accident Analysis & Prevention, 2024, vol. 208, p. 107812, doi:10.1016/j.aap.2024.107812.

K. Zhang, D. Wu, Q. Liu, F. Dong, J. Liu, L. Jiang, and Y. Yuan, “Algorithm for drowsiness detection based on hybrid brain network parameter optimization,” Biomedical Signal Processing and Control, 2024, vol. 94, p. 106344, doi:10.1016/j.bspc.2024.106344.

X. Lin, Z. Huang, W. Ma, and W. Tang, “EEG-based driver drowsiness detection based on simulated driving environment,” Neurocomputing, 2025, vol. 616, p. 128961, doi:10.1016/j.neucom.2024.128961.

X. Feng, S. Dai, and Z. Guo, “Pseudo-label-assisted subdomain adaptation network with coordinate attention for EEG-based driver drowsiness detection,” Biomedical Signal Processing and Control, 2025, vol. 101, p. 107132, doi:10.1016/j.bspc.2024.107132.

F. Wang, M. Ma, R. Fu, and X. Zhang, “EEG-based detection of driving fatigue using a novel electrode,” Sensors and Actuators A: Physical, 2024, vol. 365, p. 114895, doi:10.1016/j.sna.2023.114895.

F. Wang, D. Chen, and X. Zhang, “Real-time Driving Fatigue Detection of ECG Signals Acquired Based on Novel Electrodes Using Wavelet Scattering Networks,” Measurement, 2025, vol. 243, p. 116438, doi:10.1016/j.measurement.2024.116438.

Y. Liu, Z. Xiang, Z. Yan, J. Jin, L. Shu, L. Zhang, and X. Xu, “CEEMDAN fuzzy entropy based fatigue driving detection using single-channel EEG,” Biomedical Signal Processing and Control, 2024, vol. 95, part A, p. 106460, doi:10.1016/j.bspc.2024.106460.

I. Latreche, S. Slatnia, O. Kazar, and S. Harous, “An optimized deep hybrid learning for multi-channel EEG-based driver drowsiness detection,” Biomedical Signal Processing and Control, 2025, vol. 99, p. 106881, doi:10.1016/j.bspc.2024.106881.

J. Chen, Y. Cui, H. Wang, E. He, and A. Alhudhaif, “Deep learning approach for detection of unfavorable driving state based on multiple phase synchronization between multi-channel EEG signals,” Information Sciences, 2024, vol. 658, p. 120070, doi:10.1016/j.ins.2023.120070.

W. Yu and Q. Huang, “A deep encoder-decoder network for anomaly detection in driving trajectory behavior under spatio-temporal context,” International Journal of Applied Earth Observation and Geoinformation, 2022, vol. 115, p. 103115, doi:10.1016/j.jag.2022.103115.

S. Albawi, T. A. Mohammed, and S. Al-Zawi, “Understanding of a convolutional neural network,” in 2017 International Conference on Engineering and Technology (ICET), Antalya, Turkey, 2017, pp. 1–6, doi:10.1109/ICEngTechnol.2017.8308186.

L. Lin, S. Wang, J. Yang, and F. Wei, “A multi-aware graph convolutional network for driver drowsiness detection,” Knowledge-Based Systems, 2024, vol. 305, p. 112643, doi:10.1016/j.knosys.2024.112643.

F. Wei, J. Yang, Y. Wang, L. Lin, and H. Zhang, “Prior knowledge-guided multi-information graph convolutional network for driver drowsiness detection,” Expert Systems with Applications, 2025, vol. 275, p. 127028, doi:10.1016/j.eswa.2025.127028.

M. Elhenawy, M. Masoud, N. Haworth, K. Young, A. Rakotonirainy, R. Grzebieta, and A. Williamson, “Detection of driver distraction in the Australian naturalistic driving study videos using pre-trained models and transfer learning,” Transportation Research Part F: Traffic Psychology and Behaviour, 2023, vol. 97, pp. 31–43, doi:10.1016/j.trf.2023.06.016.

B. Kanigoro and B. Asdyo, “Facial Landmark and YOLOv5 Drowsiness Detection System,” Procedia Computer Science, 2024, vol. 245, pp. 548–554, doi:10.1016/j.procs.2024.10.281.

Y. Ma, Z. Xie, S. Chen, F. Qiao, and Z. Li, “Real-time detection of abnormal driving behavior based on long short-term memory network and regression residuals,” Transportation Research Part C: Emerging Technologies, 2023, vol. 146, p. 103983, doi:10.1016/j.trc.2022.103983.

N. Wang, T. Pu, Y. Zhang, Y. Liu, and Z. Zhang, “More appropriate DenseNetBL classifier for small sample tree species classification using UAV-based RGB imagery,” Heliyon, 2023, vol. 9, no. 10, p. e20467, doi:10.1016/j.heliyon.2023.e20467.

L. Zhang, K. Yang, Y. Han, J. Li, W. Wei, H. Tan, P. Yu, K. Zhang, and X. Yang, “TSD-DETR: A lightweight real-time detection transformer of traffic sign detection for long-range perception of autonomous driving,” Engineering Applications of Artificial Intelligence, 2025, vol. 139, part A, p. 109536, doi:10.1016/j.engappai.2024.109536.

A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, “An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale.” Preprint, submitted June 3, 2021, doi:10.48550/arXiv.2010.11929.

H. Touvron, M. Cord, A. El-Nouby, J. Verbeek, and H. Jégou, “Three things everyone should know about vision transformers,” in Computer Vision–ECCV 2022, edited by S. Avidan, G. Brostow, M. Cissé, G. M. Farinella, and T. Hassner, vol. 13684, pp. 497–515. Cham: Springer Nature Switzerland, 2022, doi:10.1007/978-3-031-20053-3_29.

P. Viola, and M. J. Jones, “Robust Real-Time Face Detection”, International Journal of Computer Vision, 2004, vol. 57, pp. 137–154, doi:10.1023/B:VISI.0000013087.49260.fb.

Published

2026-01-01
