Safe driving detection by Vision Transformer and convolutional networks comparison

DOI:

https://doi.org/10.24054/rcta.v1i47.3824

Keywords:

driving assistant, convolutional neural networks, drowsiness detection, haar classifier, safe driving, transfer learning, computer vision

Abstract

This paper presents the results of comparing deep learning architectures trained for the development of safe driving systems. A database of 670 images of drivers inside vehicles was generated and divided into three subsets: 70% of the images were used for training, 20% for validation, and the remaining 10% for testing. Two architectures, one based on convolutional neural networks (CNNs) and one on vision transformers, were compared on their pattern recognition capability in classifying three driving states: normal, distracted, and sleep. In both cases it became evident that learning had to be focused to improve performance, so a preliminary face segmentation stage based on a Haar classifier was added. With this stage, accuracy levels of 98% for the CNN and 87% for the transformer network were obtained, with average inference times of 0.1 and 0.52 seconds, F1 scores of 98.9% and 82.2%, and recall rates of 98.8% and 80.6%, respectively; the per-class statistical metrics demonstrate a high degree of confidence in the recognition of each class. The comparison was performed on a computer with a 2.3 GHz Core i9 processor, 24 GB of RAM, and an RTX 4080 GPU with 12 GB of memory, using MATLAB.


References

Y. Y. Wang and H. Y. Wei, “Safe Driving Capacity of Autonomous Vehicles,” in 2018 IEEE 88th Vehicular Technology Conference (VTC-Fall), 2018, pp. 1–5. doi:10.1109/VTCFall.2018.8690822.

J. W. Lee, B. J. Park, K. H. Kim, and H.K. Choi, “A testbed for development and test of the safe driving system,” in 2016 International Conference on Information and Communication Technology Convergence (ICTC), 2016, pp. 1149–1151. doi:10.1109/ICTC.2016.7763392.

G. Salzillo, C. Natale, G. B. Fioccola, and E. Landolfi, “Evaluation of Driver Drowsiness based on Real-Time Face Analysis,” in 2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2020, pp. 328–335. doi:10.1109/SMC42975.2020.9283133.

E. Karakullukcu, “Leveraging convolutional neural networks for image-based classification of feature matrix data,” Expert Syst. Appl., 2025, vol. 281, p. 127625, doi:10.1016/j.eswa.2025.127625.

A. Abdullah, W. S. Wong, and D. Albashish, “EB-CNN: Ensemble of branch convolutional neural network for image classification,” Pattern Recognit. Lett., 2025, vol. 189, pp. 1–7, doi:10.1016/j.patrec.2024.12.017.

Y. L. Chen, C. L. Lin, Y. C. Lin, and T. C. Chen, “Transformer-CNN for small image object detection,” Signal Process. Image Commun., 2024, vol. 129, p. 117194, doi:10.1016/j.image.2024.117194.

Y. Y. Wang and H. Y. Wei, “Road Capacity and Throughput for Safe Driving Autonomous Vehicles,” IEEE Access, 2020, vol. 8, pp. 95779–95792, doi:10.1109/ACCESS.2020.2995312.

K. Aati, M. Houda, S. Alotaibi, A. M. Khan, N. Alselami, and O. Benjeddou, “Analysis of Road Traffic Accidents in Dense Cities: Geotech Transport and ArcGIS,” Transp. Eng., 2024, vol. 16, p. 100256, doi:10.1016/j.treng.2024.100256.

H. T. N. Le and H. Q. T. Ngo, “Application of the vision-based deep learning technique for waste classification using the robotic manipulation system,” Int. J. Cogn. Comput. Eng., 2025, vol. 6, pp. 391–400, doi:10.1016/j.ijcce.2025.02.005.

I. Shad, Z. Zhang, M. Asim, M. Al-Habib, S. A. Chelloug, and A. A. El-Latif, “Deep learning-based image processing framework for efficient surface litter detection in Computer Vision applications,” J. Radiat. Res. Appl. Sci., 2025, vol. 18, no. 2, p. 101534, doi:10.1016/j.jrras.2025.101534.

M. Ciranni, V. Murino, F. Odone, and V. P. Pastore, “Computer vision and deep learning meet plankton: Milestones and future directions,” Image Vis. Comput., 2024, vol. 143, p. 104934, doi:10.1016/j.imavis.2024.104934.

A. Khan, Z. Rauf, A. Sohail, A. R. Khan, H. Asif, A. Asif, and U. Farooq, “A survey of the vision transformers and their CNN-transformer based variants,” Artif. Intell. Rev., 2023, vol. 56, no. 3, pp. 2917–2970, doi:10.1007/s10462-023-10595-0.

X. Sun, L. Jin, H. Wang, Z. Huo, Y. He, and G. Wang, “Spatial awareness enhancement based single-stage anchor-free 3D object detection for autonomous driving,” Displays, 2024, vol. 85, p. 102821, doi:10.1016/j.displa.2024.102821.

Y. Zhou and X. Zeng, “Towards comprehensive understanding of pedestrians for autonomous driving: Efficient multi-task-learning-based pedestrian detection, tracking and attribute recognition,” Robotics and Autonomous Systems, 2024, vol. 171, p. 104580, doi:10.1016/j.robot.2023.104580.

C. M. Farmer, “Potential lives saved by in-vehicle alcohol detection systems,” Traffic Injury Prevention, 2021, vol. 22, no. 1, pp. 7–12, doi:10.1080/15389588.2020.1836366.

Z. Wang, Z. Li, Z. Li, Y. Xu, F. Qi, and J. Kong, “A low cost and effective multi-instance abnormal driving behavior detection system under edge computing,” Computers & Security, 2023, vol. 132, p. 103362, doi:10.1016/j.cose.2023.103362.

Y. X. Chew, S. F. Abdul Razak, S. Yogarayan, and S. N. M. S. Ismail, “Dual-Modal Drowsiness Detection to Enhance Driver Safety,” Computers, Materials and Continua, 2024, vol. 81, no. 3, pp. 4397–4417, doi:10.32604/cmc.2024.056367.

Y. Sun, R. Wang, H. Zhang, N. Ding, S. Ferreira, and X. Shi, “Driving fingerprinting enhances drowsy driving detection: Tailoring to individual driver characteristics,” Accident Analysis & Prevention, 2024, vol. 208, p. 107812, doi:10.1016/j.aap.2024.107812.

K. Zhang, D. Wu, Q. Liu, F. Dong, J. Liu, L. Jiang, and Y. Yuan, “Algorithm for drowsiness detection based on hybrid brain network parameter optimization,” Biomedical Signal Processing and Control, 2024, vol. 94, p. 106344, doi:10.1016/j.bspc.2024.106344.

X. Lin, Z. Huang, W. Ma, and W. Tang, “EEG-based driver drowsiness detection based on simulated driving environment,” Neurocomputing, 2025, vol. 616, p. 128961, doi:10.1016/j.neucom.2024.128961.

X. Feng, S. Dai, and Z. Guo, “Pseudo-label-assisted subdomain adaptation network with coordinate attention for EEG-based driver drowsiness detection,” Biomedical Signal Processing and Control, 2025, vol. 101, p. 107132, doi:10.1016/j.bspc.2024.107132.

F. Wang, M. Ma, R. Fu, and X. Zhang, “EEG-based detection of driving fatigue using a novel electrode,” Sensors and Actuators A: Physical, 2024, vol. 365, p. 114895, doi:10.1016/j.sna.2023.114895.

F. Wang, D. Chen, and X. Zhang, “Real-time Driving Fatigue Detection of ECG Signals Acquired Based on Novel Electrodes Using Wavelet Scattering Networks,” Measurement, 2025, vol. 243, p. 116438, doi:10.1016/j.measurement.2024.116438.

Y. Liu, Z. Xiang, Z. Yan, J. Jin, L. Shu, L. Zhang, and X. Xu, “CEEMDAN fuzzy entropy based fatigue driving detection using single-channel EEG,” Biomedical Signal Processing and Control, 2024, vol. 95, part A, p. 106460, doi:10.1016/j.bspc.2024.106460.

I. Latreche, S. Slatnia, O. Kazar, and S. Harous, “An optimized deep hybrid learning for multi-channel EEG-based driver drowsiness detection,” Biomedical Signal Processing and Control, 2025, vol. 99, p. 106881, doi:10.1016/j.bspc.2024.106881.

J. Chen, Y. Cui, H. Wang, E. He, and A. Alhudhaif, “Deep learning approach for detection of unfavorable driving state based on multiple phase synchronization between multi-channel EEG signals,” Information Sciences, 2024, vol. 658, p. 120070, doi:10.1016/j.ins.2023.120070.

W. Yu and Q. Huang, “A deep encoder-decoder network for anomaly detection in driving trajectory behavior under spatio-temporal context,” International Journal of Applied Earth Observation and Geoinformation, 2022, vol. 115, p. 103115, doi:10.1016/j.jag.2022.103115.

S. Albawi, T. A. Mohammed, and S. Al-Zawi, “Understanding of a convolutional neural network,” in 2017 International Conference on Engineering and Technology (ICET), Antalya, Turkey, 2017, pp. 1–6, doi:10.1109/ICEngTechnol.2017.8308186.

L. Lin, S. Wang, J. Yang, and F. Wei, “A multi-aware graph convolutional network for driver drowsiness detection,” Knowledge-Based Systems, 2024, vol. 305, p. 112643, doi:10.1016/j.knosys.2024.112643.

F. Wei, J. Yang, Y. Wang, L. Lin, and H. Zhang, “Prior knowledge-guided multi-information graph convolutional network for driver drowsiness detection,” Expert Systems with Applications, 2025, vol. 275, p. 127028, doi:10.1016/j.eswa.2025.127028.

M. Elhenawy, M. Masoud, N. Haworth, K. Young, A. Rakotonirainy, R. Grzebieta, and A. Williamson, “Detection of driver distraction in the Australian naturalistic driving study videos using pre-trained models and transfer learning,” Transportation Research Part F: Traffic Psychology and Behaviour, 2023, vol. 97, pp. 31–43, doi:10.1016/j.trf.2023.06.016.

B. Kanigoro and B. Asdyo, “Facial Landmark and YOLOv5 Drowsiness Detection System,” Procedia Computer Science, 2024, vol. 245, pp. 548–554, doi:10.1016/j.procs.2024.10.281.

Y. Ma, Z. Xie, S. Chen, F. Qiao, and Z. Li, “Real-time detection of abnormal driving behavior based on long short-term memory network and regression residuals,” Transportation Research Part C: Emerging Technologies, 2023, vol. 146, p. 103983, doi:10.1016/j.trc.2022.103983.

N. Wang, T. Pu, Y. Zhang, Y. Liu, and Z. Zhang, “More appropriate DenseNetBL classifier for small sample tree species classification using UAV-based RGB imagery,” Heliyon, 2023, vol. 9, no. 10, p. e20467, doi:10.1016/j.heliyon.2023.e20467.

L. Zhang, K. Yang, Y. Han, J. Li, W. Wei, H. Tan, P. Yu, K. Zhang, and X. Yang, “TSD-DETR: A lightweight real-time detection transformer of traffic sign detection for long-range perception of autonomous driving,” Engineering Applications of Artificial Intelligence, 2025, vol. 139, part A, p. 109536, doi:10.1016/j.engappai.2024.109536.

A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, “An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale.” Preprint, submitted June 3, 2021, doi:10.48550/arXiv.2010.11929.

H. Touvron, M. Cord, A. El-Nouby, J. Verbeek, and H. Jégou, “Three things everyone should know about vision transformers,” in Computer Vision–ECCV 2022, edited by S. Avidan, G. Brostow, M. Cissé, G. M. Farinella, and T. Hassner, vol. 13684, pp. 497–515. Cham: Springer Nature Switzerland, 2022, doi:10.1007/978-3-031-20053-3_29.

P. Viola, and M. J. Jones, “Robust Real-Time Face Detection”, International Journal of Computer Vision, 2004, vol. 57, pp. 137–154, doi:10.1023/B:VISI.0000013087.49260.fb.

Published

2026-01-01
