Soft Actor-Critic (SAC) and policy kwargs
- added Soft Actor-Critic (SAC) model
- fixed a bug in DQN where prioritized_replay_beta_iters param was not used
- fixed DDPG not saving target network parameters
- fixed bug related to shape of true_reward (@abhiskk)
- fixed example code in documentation of tf_util:Function (@JohannesAck)
- added learning rate schedule for SAC
- fixed action probability for continuous actions with actor-critic models
- added an optional parameter to action_probability to compute the likelihood of a given action being taken (see the examples after this list)
- added more flexible custom LSTM policies
- added automatic entropy coefficient optimization for SAC
- clip continuous actions at test time too for all algorithms (except SAC/DDPG where it is not needed)
- added a way to pass kwargs to the policy when creating a model (those kwargs are also saved); see the examples after this list
- fixed DQN examples in DQN folder
- added the possibility to pass an activation function for DDPG, DQN and SAC
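
Below is a minimal usage sketch (not part of the release itself) showing the new policy kwargs and the SAC learning rate schedule together. It assumes the `act_fun` and `layers` policy kwargs and a learning rate callable that receives the remaining training progress; the exact names may differ slightly from this release.

```python
import gym
import tensorflow as tf

from stable_baselines import SAC
from stable_baselines.sac.policies import MlpPolicy

env = gym.make("Pendulum-v0")

# policy_kwargs are forwarded to the policy constructor and saved with the model;
# the learning rate may also be a schedule (a callable of the remaining progress, from 1 to 0)
model = SAC(
    MlpPolicy,
    env,
    policy_kwargs=dict(act_fun=tf.nn.elu, layers=[64, 64]),  # assumed kwarg names
    learning_rate=lambda progress: 3e-4 * progress,
    verbose=1,
)
model.learn(total_timesteps=5000)
model.save("sac_pendulum")  # the policy kwargs are stored in the saved model
```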
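
A second sketch for the new optional argument of action_probability: querying the likelihood of a given continuous action with an actor-critic model. The `actions` keyword name is an assumption based on the item above, not a confirmed signature.

```python
import gym

from stable_baselines import PPO2
from stable_baselines.common.policies import MlpPolicy

env = gym.make("Pendulum-v0")
model = PPO2(MlpPolicy, env, verbose=0)
model.learn(total_timesteps=2000)

obs = env.reset()
action, _ = model.predict(obs)
# with the new optional argument, action_probability returns the likelihood
# of the given action instead of the full distribution parameters
likelihood = model.action_probability(obs, actions=action)
```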
We would like to thank our contributors (in random order): @abhiskk @JohannesAck
@EliasHasle @mrakgr @Bleyddyn,
and to welcome a new maintainer: @erniejunior