
Soft Actor-Critic (SAC) and policy kwargs

@araffin released this 17 Jan 17:16
  • added Soft Actor-Critic (SAC) model
  • fixed a bug in DQN where prioritized_replay_beta_iters param was not used
  • fixed DDPG that did not save target network parameters
  • fixed bug related to shape of true_reward (@abhiskk)
  • fixed example code in documentation of tf_util:Function (@JohannesAck)
  • added learning rate schedule for SAC
  • fixed action probability for continuous actions with actor-critic models
  • added an optional parameter to action_probability to compute the likelihood of a given action being taken (see the sketch after this list)
  • added more flexible custom LSTM policies
  • added auto entropy coefficient optimization for SAC
  • clip continuous actions at test time too for all algorithms (except SAC/DDPG where it is not needed)
  • added a means to pass kwargs to the policy when creating a model (those kwargs are also saved); see the example after this list
  • fixed DQN examples in DQN folder
  • added the possibility to pass the activation function for DDPG, DQN and SAC
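
As an illustration of the new SAC model and the policy_kwargs mechanism, here is a minimal sketch; the kwarg names (layers, act_fun) and the hyperparameters are assumptions and may differ slightly depending on your version, so check the policy docstrings:

```python
import tensorflow as tf

from stable_baselines import SAC

# policy_kwargs is forwarded to the policy constructor and saved with the model.
# 'layers' sets the hidden layer sizes and 'act_fun' the activation function
# (both names are assumptions here, check your version's policy docstring).
model = SAC('MlpPolicy', 'Pendulum-v0', verbose=1,
            policy_kwargs=dict(layers=[64, 64], act_fun=tf.nn.elu))
model.learn(total_timesteps=10000)
model.save("sac_pendulum")

# Because policy_kwargs are saved with the model, loading rebuilds the same architecture.
loaded_model = SAC.load("sac_pendulum")
```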
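
And a short sketch of the optional action argument of action_probability, used here with an actor-critic model on a continuous task; the argument name (actions) is an assumption, verify it against your version:

```python
from stable_baselines import PPO2

model = PPO2('MlpPolicy', 'Pendulum-v0')

obs = model.get_env().reset()
action, _ = model.predict(obs)

# Passing the action returns its likelihood under the current policy
# instead of the distribution parameters ('actions' is an assumed name).
likelihood = model.action_probability(obs, actions=action)
print(likelihood)
```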

We would like to thank our contributors (in random order): @abhiskk @JohannesAck
@EliasHasle @mrakgr @Bleyddyn
and to welcome a new maintainer: @erniejunior