This project was created as an assessment for the Self-Driving Car Nanodegree Program by Udacity. The goal is to drive a car autonomously in a simulator using a deep neural network (DNN) trained on human driving behavior. For that purpose, Udacity provided the simulator and a basic Python script to connect a DNN to it. The simulator has two modes. In the "training mode" the car can be controlled through a keyboard or a gamepad to generate data. More information about the data and its structure can be found in the corresponding section. In the "autonomous mode", however, the car receives its input commands from the Python script.
The following animations show the final model controlling the car on two different tracks.
Track 1 | Track 2 |
---|---|
This project requires Python 3.5 and the following Python libraries installed:
Only needed for driving in the simulator:
The drive script needs the path to the model definition as an argument. The definition has to be a JSON file generated by Keras. In addition, the model weights have to be located at the same path as the model definition and have to have the same filename (except for the file extension, of course). So if all the necessary files (drive.py, model.json, model.h5) are in the same directory, the script can be executed with the following command:
python drive.py model.json
The script will automatically connect to the simulator and send commands as soon as it enters autonomous mode.
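The loading step inside the drive script presumably looks similar to the following sketch. This is not the exact code from drive.py; it assumes the JSON file was written with Keras' model.to_json() and uses standard Keras calls:

```python
from keras.models import model_from_json

# Restore the architecture from the JSON definition ...
with open('model.json') as f:
    model = model_from_json(f.read())

# ... and the weights from the .h5 file with the same base name.
model.load_weights('model.h5')
model.compile(optimizer='adam', loss='mse')
```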
To retrain the model it is enough to execute the model.py script without any arguments. Some parameters are set as constants at the beginning of the script and can easily be modified, for example to set the path to the training data. An overview of the constants with their default values is shown below.
IMG_SIZE = [160, 320]
CROPPING = (54, 0, 0, 0)
SHIFT_OFFSET = 0.2
SHIFT_RANGE = 0.2
BATCH_SIZE = 128
# Patience for early stopping
PATIENCE = 3
# Maximum number of epochs; training might stop earlier.
NB_EPOCH = 50
TRAINING_DATA_PATHS = ['data/track1_central/driving_log.csv',
'data/track1_recovery/driving_log.csv',
'data/track1_reverse/driving_log.csv',
'data/track1_recovery_reverse/driving_log.csv',
'data/track2_central/driving_log.csv']
VALIDATION_DATA_PATHS = ['data/track1_test/driving_log.csv',
'data/track2_test/driving_log.csv']
During "training mode" the simulator records three images with a frequency of 10hz. Next to a camera centered at the car there are also two additional cameras recording with an offset to the left and right respectively. This allows to apply an approach described in a paper by Nvidia. A sample of the recorded images is shown in the following table:
Left | Center | Right |
---|---|---|
Besides the images, the simulator also creates a log file while recording, containing information such as the current steering angle, the speed and the corresponding image paths. The displayed image shows an extract of the log file containing all the features.
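To illustrate how the log can be used, the following sketch reads such a file and pairs the left and right camera images with a corrected steering angle, following the Nvidia approach mentioned above. The column order, the absence of a header row and the STEERING_CORRECTION value are assumptions made for this example, not values taken from model.py:

```python
import csv

STEERING_CORRECTION = 0.2  # assumed offset for the side cameras

samples = []
with open('data/track1_central/driving_log.csv') as f:
    # assumed column order: center, left, right, steering, throttle, brake, speed
    for center, left, right, steering, throttle, brake, speed in csv.reader(f):
        angle = float(steering)
        # The center image keeps the recorded angle; the side cameras get a
        # shifted angle so the model learns to steer back towards the middle.
        samples.append((center.strip(), angle))
        samples.append((left.strip(), angle + STEERING_CORRECTION))
        samples.append((right.strip(), angle - STEERING_CORRECTION))
```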
The data used for training the model can be downloaded here. It contains the following folders:
Used for training
Name | Number of Images | Description |
---|---|---|
track1_central | 8,978 | driving centered on the road |
track1_recovery | 2,369 | driving from the side of the road back to the center |
track1_reverse | 9,254 | driving as centered on the road as possible in the opposite direction |
track1_recovery_reverse | 2,396 | driving from the side of the road back to the center in the opposite direction |
track2_central | 19,274 | driving centered on the road on the second track in both directions |
total | 42,271 | |
Used for validation
Name | Number of Images | Description |
---|---|---|
track1_test | 2,882 | driving centered on the road for one round on track 1 |
track2_test | 2,924 | driving centered on the road for one round on track 2 |
total | 5,806 | |
To validate the model, one round on each track was recorded separately and used as the validation set. Instead of using a test set, the models were finally evaluated in the simulator, since this is the only reliable way to determine their performance.
The pretrained model can be obtained through the following links:
For this project a technique called transfer learning was used to reuse a pretrained model for a different task. In this case the model used is the VGG16 from the Visual Geometry Group, trained on the ImageNet dataset. Since the initial problem the model was trained on is quite different from the problem at hand, the last block was removed, rebuilt with slightly different parameters and retrained. Instead of three convolution layers followed by a max pooling layer, three convolution layers with subsampling and no pooling layer are used. The top layer was built from scratch to be able to predict continuous values instead of classes.
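A minimal sketch of this modification (Keras 1.x style, TensorFlow dimension ordering) could look as follows. The input shape, the exact layer sizes and which layers are frozen are illustrative assumptions, not necessarily the values used in model.py:

```python
from keras.applications.vgg16 import VGG16
from keras.layers import Convolution2D, Dense, Flatten, Input
from keras.models import Model

# Cropped camera image as input (the shape is an assumption for this sketch).
input_tensor = Input(shape=(106, 320, 3))
base = VGG16(include_top=False, weights='imagenet', input_tensor=input_tensor)

# Keep the pretrained weights of the reused blocks fixed.
for layer in base.layers:
    layer.trainable = False

# Rebuilt last block: three convolutions with subsampling, no pooling layer.
x = base.get_layer('block4_pool').output
x = Convolution2D(512, 3, 3, subsample=(2, 2), activation='relu', border_mode='same')(x)
x = Convolution2D(512, 3, 3, subsample=(2, 2), activation='relu', border_mode='same')(x)
x = Convolution2D(512, 3, 3, subsample=(2, 2), activation='relu', border_mode='same')(x)

# Top layers built from scratch for regression: a single continuous output.
x = Flatten()(x)
x = Dense(512, activation='relu')(x)
prediction = Dense(1)(x)

model = Model(input=input_tensor, output=prediction)
model.compile(optimizer='adam', loss='mse')
```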
The complete architecture can be seen in the following image or in the console output.
Besides the transfer learning approach, models built from scratch were evaluated as well, mainly architectures based on Nvidia's paper and one proposed by comma.ai. They all worked and were able to control the car in the simulator. In the end the transfer learning model controlled the car best on both tracks while requiring only a short training time.
During training an image generator provides data to the model. Since Keras' vanilla ImageDataGenerator is mainly suited for classification problems, I extended the implementation to work better with continuous labels. The two main differences are that flow_from_directory takes the labels as a parameter instead of inferring them from folder names, and that transform functions for the labels can be attached to the various random image transformations. The latter allows generating randomly transformed images with accordingly modified expected values. One particular use case is randomly flipping road images: if an image gets flipped, the sign of the steering angle has to be changed as well. Other changes include the option to pass a function as the rescale parameter and the option to crop images.
The following code snippet shows an example usage of the modified ImageDataGenerator. Images will be normalized to a range from -1 to 1, randomly flipped, horizontally shifted and also cropped from the top by 32 pixels. The lambda function passed to width_shift_value_transform modifies the steering angle based on how much the image was shifted, to teach the model to correct for the shift.
SHIFT_OFFSET = 0.2
SHIFT_RANGE = 0.2
datagen = RegressionImageDataGenerator(rescale=lambda x: x / 127.5 - 1.,
                                       horizontal_flip=True,
                                       horizontal_flip_value_transform=lambda val: -val,
                                       width_shift_range=SHIFT_RANGE,
                                       width_shift_value_transform=lambda val, shift: val - ((SHIFT_OFFSET / SHIFT_RANGE) * shift),
                                       cropping=(32, 0, 0, 0))
Batch size was set to 128 as recommended in this paper.
The model was trained using early stopping to prevent overfitting through too much training. The maximum number of epochs was set to 50 to stay within a reasonable time frame. To evaluate the model's performance, the validation loss was calculated after every epoch. The final model trained for 8 epochs before stopping after no improvement for 5 epochs (the number of epochs to wait for an improvement), so the final weights were taken from the model after 3 epochs.
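With Keras this setup roughly corresponds to the following sketch, where train_generator and validation_generator stand for the generators described above, PATIENCE and NB_EPOCH are the constants from the top of model.py, and the sample counts correspond to the tables in the data section:

```python
from keras.callbacks import EarlyStopping, ModelCheckpoint

# Stop once the validation loss has not improved for PATIENCE epochs and
# always keep the weights of the best epoch seen so far.
callbacks = [EarlyStopping(monitor='val_loss', patience=PATIENCE, verbose=1),
             ModelCheckpoint('model.h5', monitor='val_loss', save_best_only=True, verbose=1)]

model.fit_generator(train_generator,
                    samples_per_epoch=42271,
                    nb_epoch=NB_EPOCH,
                    validation_data=validation_generator,
                    nb_val_samples=5806,
                    callbacks=callbacks)
```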
The following table shows three example images before and after the random transformations, including the corresponding steering angles.
. | Sample 1 | Sample 2 | Sample 3 |
---|---|---|---|
Original Image | |||
Original Angles | 0.461989 | -0.351643 | -0.200000 |
Transformed Image | |||
Transformed Angles | 0.61926335 | 0.4209907 | 0.27578576 |