Keywords

Autonomous vehicles, Vision Transformers, end-to-end learning, behavior cloning, steering angle and speed prediction, imitation learning

Document Type

Article

Abstract

End-to-end autonomous driving systems have recently attracted significant interest due to the rapid advancement of deep learning algorithms. This progress has enabled the development of intelligent agents that can drive vehicles as well as, and in some scenarios potentially better than, humans. This study investigates the use of Vision Transformers (ViTs) in end-to-end autonomous driving systems to predict vehicle steering angle and speed through behavior cloning. The paper proposes two models: a single-head ViT-MLP architecture that jointly predicts steering angle and speed, and a multi-head model with two separate heads, one for steering angle prediction and the other for speed prediction. The models were trained on the comma2k19 dataset, which includes 33 hours of highway driving. The results indicate that the single-head model outperformed the multi-head architecture and previous methods, achieving an average MAE of 0.198. Although the model demonstrates strong performance in stable conditions, such as lane-keeping, its efficacy decreases during sudden maneuvers, such as overtaking, and in challenging lighting and weather conditions. These findings highlight the potential of ViTs for building cost-effective highway autonomous systems, but also point to the need for improved robustness through sensor fusion or additional training features.
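The single-head design described above can be sketched as a ViT encoder whose class token feeds one shared MLP that emits both targets. The sketch below uses PyTorch; all hyperparameters (image size, patch size, embedding width, depth) and layer names are illustrative assumptions, not the paper's actual configuration.

```python
import torch
import torch.nn as nn


class ViTSteeringSpeed(nn.Module):
    """Minimal single-head ViT-MLP sketch: one shared MLP head jointly
    predicts [steering_angle, speed]. Hyperparameters are placeholders,
    not those used in the paper."""

    def __init__(self, img_size=224, patch=16, dim=192, depth=4, heads=3):
        super().__init__()
        n_patches = (img_size // patch) ** 2
        # Non-overlapping patch embedding via strided convolution
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, n_patches + 1, dim))
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=4 * dim,
            batch_first=True,
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        # Single MLP head with two outputs: steering angle and speed
        self.head = nn.Sequential(
            nn.LayerNorm(dim), nn.Linear(dim, 64),
            nn.GELU(), nn.Linear(64, 2),
        )

    def forward(self, x):
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2)  # (B, N, dim)
        cls = self.cls_token.expand(x.shape[0], -1, -1)
        tokens = torch.cat([cls, tokens], dim=1) + self.pos_embed
        feats = self.encoder(tokens)
        return self.head(feats[:, 0])  # (B, 2): [steering, speed]


# One forward pass on a dummy batch of two RGB frames
out = ViTSteeringSpeed()(torch.randn(2, 3, 224, 224))
```

The multi-head variant would replace the final `nn.Linear(64, 2)` with two parallel single-output MLPs reading the same class-token features; training in either case is plain behavior cloning, i.e. regressing the logged human steering and speed with an L1 or MSE loss.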
