This is a TensorFlow implementation of a policy gradient algorithm for the CartPole-v1 environment of OpenAI Gym. In addition to the policy network, a value network is also learned in order to reduce the variance of the policy gradient estimates during training.
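
To illustrate how a learned value estimate can reduce variance, here is a minimal sketch in plain Python. It is not taken from main.py: the function names, the discount factor `gamma`, and the example value estimates are all assumptions made for illustration. The idea is that each step's discounted return is compared against the value predicted for that state, and the difference (the advantage) is what weights the policy gradient instead of the raw return.

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each return can reuse the one that follows it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def advantages(returns, value_estimates):
    """Subtract the value estimate V(s_t) from each return G_t."""
    return [g - v for g, v in zip(returns, value_estimates)]


# CartPole yields a reward of 1.0 for every step the pole stays up.
rewards = [1.0, 1.0, 1.0]
returns = discounted_returns(rewards, gamma=0.5)  # gamma chosen for easy arithmetic

# Hypothetical outputs of a value network for the three visited states.
values = [1.5, 1.0, 1.0]
adv = advantages(returns, values)
```

In an actual training loop, the value network would typically be fit by regressing its outputs toward `returns`, while the policy's log-probabilities would be weighted by `adv`. Whether main.py follows exactly this scheme is not stated here.
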
- TensorFlow 0.11
- OpenAI Gym
$ python main.py