Question about value network #45

Open
yuke93 opened this issue Jul 17, 2019 · 0 comments

Comments

yuke93 commented Jul 17, 2019

Hi Yuanming,

Thanks for releasing the code for this wonderful project!

I have a question about the value network. In net.py, new_value is predicted from fake_output and new_states. Let s_t denote fake_input, so that fake_output is s_{t+1}. new_states contains the action a_t that takes s_t to s_{t+1}. It therefore seems that the code predicts Q(s_t, a_{t-1}) and Q(s_{t+1}, a_t) rather than Q(s_t, a_t) and Q(s_{t+1}, a_{t+1}). If so, I am confused about how the policy gradients are calculated (e.g., Eqn. (7) in the paper). I may be misunderstanding something, and I'd appreciate it if you could help clarify. Thanks!
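
To make the indexing concrete, here is my reading written out in symbols. This is only a sketch of how I understand the code: f (the transition applied by an action) and \hat{Q} (whatever the value network outputs) are my own placeholder names, not identifiers from net.py or the paper.

```latex
% Assumed mapping (my reading, not confirmed by the authors):
%   s_t = fake_input,  s_{t+1} = fake_output,  a_t = action stored in new_states
\begin{align*}
s_{t+1} &= f(s_t, a_t) \\
\text{what the code seems to predict:}\quad & \hat{Q}(s_t, a_{t-1}),\ \hat{Q}(s_{t+1}, a_t) \\
\text{what I expected:}\quad & Q(s_t, a_t),\ Q(s_{t+1}, a_{t+1})
\end{align*}
```

In other words, the action paired with each state appears to be the one that produced that state, not the one taken from it.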

Yu Ke
