
Nfq refactor #980

Merged — 4 commits merged into JuliaReinforcementLearning:main on Sep 28, 2023
Conversation

@CasBex (Contributor) commented Sep 25, 2023

Hi, I implemented NFQ a while ago (#897) and have since noticed some things that could be improved. Changes are summarized below:

  1. Use a (state) -> ... -> |action space| structure for the Q-network, rather than a (state, single action) -> ... -> value structure that loops over all actions (sketched below)
  2. Explicitly use gradient / optimise! instead of Flux.train!
  3. Rely on the Trajectory controller for sampling rather than sampling inside the optimise! function

In addition to increased uniformity, this also consumes much less memory on the GPU than the previous version, which had to duplicate all input data |action space| times to loop over all actions. The main drawback is that it is less faithful to the original implementation [1].

[1] Riedmiller, M. (2005). Neural Fitted Q Iteration – First Experiences with a Data Efficient Neural Reinforcement Learning Method. In: Gama, J., Camacho, R., Brazdil, P.B., Jorge, A.M., Torgo, L. (eds) Machine Learning: ECML 2005. Lecture Notes in Computer Science, vol. 3720. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11564096_32
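The sketch below illustrates changes (1) and (2). It is not code from this PR; `state_dim`, `n_actions`, `q_old`, and `q_new` are made-up names, and the loss is a plain regression against a stand-in target matrix rather than the masked NFQ target.

```julia
using Flux

state_dim, n_actions = 4, 2

# (1) Old layout: (state, one-hot action) -> scalar Q-value. Evaluating every
#     action for a batch of states requires duplicating the batch n_actions times.
q_old = Chain(Dense(state_dim + n_actions => 64, relu), Dense(64 => 1))

# (1) New layout: state -> one Q-value per action, so a single forward pass
#     covers the whole action space.
q_new = Chain(Dense(state_dim => 64, relu), Dense(64 => n_actions))

states  = rand(Float32, state_dim, 32)   # batch of 32 states
targets = rand(Float32, n_actions, 32)   # stand-in regression targets (illustrative only)

# (2) Explicit gradient / update step instead of Flux.train!.
opt_state = Flux.setup(Adam(1f-3), q_new)
grads = Flux.gradient(m -> Flux.mse(m(states), targets), q_new)
Flux.update!(opt_state, q_new, grads[1])
```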

@CasBex (Contributor, author) commented Sep 25, 2023

Unrelated, but during testing I tried to use the EpisodeSampleRatioController and couldn't get it working: it complains that NamedTuple has no field terminal. I didn't find any examples of it in this repo or in RLTrajectories, so it may be worthwhile to pursue this in another issue.

@HenriDeh (Member) left a comment

Good changes; this implements NFQ in a more idiomatic way that better reflects RL.jl's design.
I only have one question (see below).

@HenriDeh (Member) replied:

> Unrelated, but during testing I tried to use the EpisodeSampleRatioController and couldn't get it working: it complains that NamedTuple has no field terminal. I didn't find any examples of it in this repo or in RLTrajectories, so it may be worthwhile to pursue this in another issue.

Yes, it's new and currently unused. Can you share a stack trace? I'll open a PR to fix this.

@HenriDeh merged commit dd19ee0 into JuliaReinforcementLearning:main on Sep 28, 2023 (9 of 12 checks passed).