
Nfq refactor #980

Merged — 4 commits merged into JuliaReinforcementLearning:main on Sep 28, 2023
Conversation

@CasBex (Contributor) commented Sep 25, 2023

Hi, I implemented NFQ a while ago (#897) and have since noticed some things that could be improved. Changes are summarized below:

  1. Use a (state) -> ... -> |action space| structure for the Q-network, rather than a (state, single action) -> ... -> value structure that loops over all actions (sketched below)
  2. Explicitly use gradient / optimise! instead of Flux.train!
  3. Rely on the Trajectory controller for sampling rather than sampling inside the optimise! function

In addition to increased uniformity, this also consumes much less memory on the GPU than the previous version, which had to duplicate all input data |action space| times to loop over all actions. The main drawback is that it is less faithful to the original implementation [1].

[1] Riedmiller, M. (2005). Neural Fitted Q Iteration – First Experiences with a Data Efficient Neural Reinforcement Learning Method. In: Gama, J., Camacho, R., Brazdil, P.B., Jorge, A.M., Torgo, L. (eds) Machine Learning: ECML 2005. Lecture Notes in Computer Science, vol. 3720. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11564096_32
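The sketch below illustrates changes (1) and (2). It is not code from this PR; `state_dim`, `n_actions`, `q_old`, and `q_new` are made-up names, and the loss is a plain regression against a stand-in target matrix rather than the masked NFQ target.

```julia
using Flux

state_dim, n_actions = 4, 2

# (1) Old layout: (state, one-hot action) -> scalar Q-value. Evaluating every
#     action for a batch of states requires duplicating the batch n_actions times.
q_old = Chain(Dense(state_dim + n_actions => 64, relu), Dense(64 => 1))

# (1) New layout: state -> one Q-value per action, so a single forward pass
#     covers the whole action space.
q_new = Chain(Dense(state_dim => 64, relu), Dense(64 => n_actions))

states  = rand(Float32, state_dim, 32)   # batch of 32 states
targets = rand(Float32, n_actions, 32)   # stand-in regression targets (illustrative only)

# (2) Explicit gradient / update step instead of Flux.train!.
opt_state = Flux.setup(Adam(1f-3), q_new)
grads = Flux.gradient(m -> Flux.mse(m(states), targets), q_new)
Flux.update!(opt_state, q_new, grads[1])
```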

@CasBex (Contributor, author) commented Sep 25, 2023

Unrelated, but during testing I tried to use the EpisodeSampleRatioController and couldn't get it working: it complains that NamedTuple has no field terminal. I didn't find any examples of it in this repo or in RLTrajectories, so it may be worthwhile to pursue this in another issue.

@HenriDeh (Member) left a comment

Good changes; this implements NFQ in a more idiomatic way that better reflects RL.jl's design.
I only have one question (see below).

@HenriDeh (Member) replied:

> Unrelated, but during testing I tried to use the EpisodeSampleRatioController and couldn't get it working: it complains that NamedTuple has no field terminal. I didn't find any examples of it in this repo or in RLTrajectories, so it may be worthwhile to pursue this in another issue.

Yes, it's new and currently unused. Can you share a stack trace? I'll open a PR to fix this.

@HenriDeh merged commit dd19ee0 into JuliaReinforcementLearning:main on Sep 28, 2023 (9 of 12 checks passed).