Updated doc
aweeraman committed Feb 5, 2019
1 parent 5ae6530 commit 88d5f05
Showing 5 changed files with 62 additions and 92 deletions.
91 changes: 62 additions & 29 deletions README.md
@@ -1,4 +1,4 @@
# Udacity Deep Reinforcement Learning Nanodegree Project 1: Navigation
# Project: Navigation

This is a project that uses Deep Q-Networks to train an agent to capture yellow bananas and avoid
blue bananas through deep reinforcement learning in a Unity ML-Agents environment.
@@ -72,33 +72,6 @@
To customize hyperparameters and train the agent, execute the following:

```
$ python bananas.py --train
Mono path[0] = '/Users/anuradha/ninsei/udacity/bananas/Banana.app/Contents/Resources/Data/Managed'
Mono config path = '/Users/anuradha/ninsei/udacity/bananas/Banana.app/Contents/MonoBleedingEdge/etc'
INFO:unityagents:
'Academy' started successfully!
Unity Academy name: Academy
Number of Brains: 1
Number of External Brains : 1
Lesson number : 0
Reset Parameters :
Unity brain name: BananaBrain
Number of Visual Observations (per agent): 0
Vector Observation space type: continuous
Vector Observation space size (per agent): 37
Number of stacked Vector Observation: 1
Vector Action space type: discrete
Vector Action space size (per agent): 4
Vector Action descriptions: , , ,
Number of agents: 1
Number of actions: 4
Episode 100 Average Score: 0.785
Episode 200 Average Score: 4.03
Episode 300 Average Score: 7.21
Episode 400 Average Score: 9.00
Episode 500 Average Score: 11.44
Episode 574 Average Score: 13.02
Environment solved in 474 episodes! Average Score: 13.02
```

# Environment details
@@ -118,7 +91,62 @@
The action space for the agent consists of the following four possible actions:

To solve the environment, the agent must achieve an average score of +13 or more over 100 consecutive episodes.

# Troubleshooting Tips
## Learning algorithm

Q-Learning is an approach that builds a Q-table, which the agent uses to determine the best action
for a given state. This technique becomes impractical and inefficient in environments with a large
state space. Deep Q-Networks, on the other hand, use a neural network to approximate the Q-value of
each action for a given input state.
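
For contrast, here is a minimal sketch of the tabular Q-learning update described above. It is illustrative only: the state/action counts and the alpha and gamma values are assumptions, not taken from this project (whose state space is continuous and therefore not suited to a table).

```
import numpy as np

n_states, n_actions = 16, 4          # assumed sizes, for illustration only
alpha, gamma = 0.1, 0.99             # assumed learning rate and discount factor
Q = np.zeros((n_states, n_actions))  # the Q-table

def q_learning_update(state, action, reward, next_state, done):
    """One tabular Q-learning step: move Q(s, a) toward the TD target."""
    target = reward + (0 if done else gamma * np.max(Q[next_state]))
    Q[state, action] += alpha * (target - Q[state, action])
```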

However, Deep Q-Learning has drawbacks. A common issue is that reinforcement learning tends
to be unstable or divergent when a non-linear function approximator such as a neural network is used to
represent Q. This instability comes from the correlations present in the sequence of observations, the fact
that small updates to Q may significantly change the policy and the data distribution, and the correlations
between Q and the target values. [1]

To mitigate this, the solution uses experience replay, a biologically inspired technique that replays a
random sample of prior experiences at each update, which removes correlations in the observation sequence
and smooths changes in the data distribution.
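
As an illustration, here is a minimal sketch of a uniform replay buffer. The class name, buffer size, and batch size are assumptions for illustration; the actual implementation lives in the project code.

```
import random
from collections import deque, namedtuple

Experience = namedtuple("Experience", ["state", "action", "reward", "next_state", "done"])

class ReplayBuffer:
    """Fixed-size buffer that stores experiences and samples them uniformly at random."""

    def __init__(self, buffer_size=100_000, batch_size=64):
        self.memory = deque(maxlen=buffer_size)  # oldest experiences are discarded automatically
        self.batch_size = batch_size

    def add(self, state, action, reward, next_state, done):
        self.memory.append(Experience(state, action, reward, next_state, done))

    def sample(self):
        # Uniform random sampling breaks the temporal correlation between consecutive steps.
        return random.sample(self.memory, k=self.batch_size)

    def __len__(self):
        return len(self.memory)
```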

## Model architecture and hyperparameters

* Fully connected layer 1: Input 37 (state space), Output 32, ReLU activation
* Fully connected layer 2: Input 32, Output 32, ReLU activation
* Fully connected layer 3: Input 32, Output 4 (action space)
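
A minimal PyTorch sketch of this architecture follows. The layer sizes are taken from the list above; the class and attribute names are assumptions and not necessarily those used in the project code.

```
import torch.nn as nn
import torch.nn.functional as F

class QNetwork(nn.Module):
    """Maps a 37-dimensional state to Q-values for each of the 4 actions."""

    def __init__(self, state_size=37, action_size=4, hidden_size=32):
        super().__init__()
        self.fc1 = nn.Linear(state_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        self.fc3 = nn.Linear(hidden_size, action_size)

    def forward(self, state):
        x = F.relu(self.fc1(state))
        x = F.relu(self.fc2(x))
        return self.fc3(x)  # raw Q-values, no activation on the output layer
```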

The hyperparameters for tweaking and optimizing the learning algorithm were:

* max_t (750): maximum number of timesteps per episode
* eps_start (1.0): starting value of epsilon, for epsilon-greedy action selection
* eps_end (0.01): minimum value of epsilon
* eps_decay (0.9): multiplicative factor (per episode) for decreasing epsilon
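
For illustration, here is a minimal sketch of how these values drive epsilon-greedy action selection during training. The function and variable names are assumptions; the project's actual training loop may differ.

```
import random
import numpy as np

def select_action(q_values, eps):
    """Epsilon-greedy: explore with probability eps, otherwise exploit the best Q-value."""
    if random.random() < eps:
        return random.randrange(len(q_values))  # random action (exploration)
    return int(np.argmax(q_values))             # greedy action (exploitation)

eps, eps_end, eps_decay = 1.0, 0.01, 0.9        # values from the list above
for episode in range(1, 11):
    # ... run one episode of at most max_t steps, calling select_action(q_values, eps) ...
    eps = max(eps_end, eps_decay * eps)         # decay epsilon once per episode
```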

## Plot of rewards

Below is a training run of the above model architecture and hyperparameters:

```
Number of agents: 1
Number of actions: 4
Episode 100 Average Score: 3.97
Episode 200 Average Score: 9.51
Episode 287 Average Score: 13.12
Environment solved in 187 episodes! Average Score: 13.12
```

The plot of rewards for this run is as follows:

![Plot of rewards](https://raw.githubusercontent.com/aweeraman/deep-q-networks-navigation/master/images/plot_of_rewards.png)

## Future work

This architecture could be optimized further by training with different hyperparameters to achieve
faster and better learning. A couple of additional approaches to try out are listed below, followed by
a sketch of the Double DQN target computation:

* Double Q-Learning
* Delayed Q-Learning
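
As a starting point, here is a hedged sketch of the Double Q-Learning (Double DQN) target, which selects the next action with the online network but evaluates it with the target network. The function name and the assumption that rewards and dones are column tensors are illustrative; this is not the project's current update rule.

```
import torch

def double_dqn_targets(online_net, target_net, rewards, next_states, dones, gamma=0.99):
    """Compute Double DQN targets: action selection and evaluation use different networks."""
    with torch.no_grad():
        best_actions = online_net(next_states).argmax(dim=1, keepdim=True)  # select with online net
        next_q = target_net(next_states).gather(1, best_actions)            # evaluate with target net
        return rewards + gamma * next_q * (1 - dones)                       # zero out terminal states
```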

## Troubleshooting Tips

If you run into an error such as the following when training the agent:

@@ -131,3 +159,8 @@
Modify ~/.matplotlib/matplotlibrc and add the following line:
```
backend: TkAgg
```

## Reference

[1] Wikipedia, Q-learning (Deep Q-learning section): https://en.wikipedia.org/wiki/Q-learning#Deep_Q-learning

63 changes: 0 additions & 63 deletions Report.md

This file was deleted.

Binary file removed graph_09.png
Binary file removed graph_099.png
File renamed without changes
