Move the Critic into the Optimizer #4939
Conversation
…evelop-critic-optimizer
@@ -483,7 +495,7 @@ def update(self, batch: AgentBuffer, num_sequences: int) -> Dict[str, float]:
        offset = 1 if self.policy.sequence_length > 1 else 0
        next_memories_list = [
            ModelUtils.list_to_tensor(
                batch[BufferKey.MEMORY][i][self.policy.m_size // 2 :]
Are there changes to memories in this PR? Why was this line changed?
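For readers landing here, a minimal sketch of what slicing at `m_size // 2` would mean if the stored memory vector packs the actor's recurrent memories in the first half and the critic's in the second (an assumption for illustration; the buffer layout is not shown in this diff):

```python
import numpy as np

# Hypothetical layout: one timestep's memory vector of length m_size,
# actor memories first, critic memories second. Values are illustrative.
m_size = 8
stored_memory = np.arange(m_size, dtype=np.float32)

actor_half = stored_memory[: m_size // 2]   # first half: actor memories
critic_half = stored_memory[m_size // 2 :]  # second half: what the slice above keeps

print(actor_half)   # [0. 1. 2. 3.]
print(critic_half)  # [4. 5. 6. 7.]
```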
    pass


class ValueNetwork(nn.Module, Critic):
Maybe later, but we could rename SimpleCritic to mirror SimpleActor
@@ -156,7 +156,31 @@ def forward(
        return encoding, memories


-class ValueNetwork(nn.Module):
+class Critic(abc.ABC):
I think it is time we split networks.py into networks.py, actor.py and critic.py. Maybe in a subsequent PR.
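For reference, a rough sketch of the interface split under discussion. The method name `critic_pass`, its signature, and the toy body are assumptions for this sketch; the real definitions live in networks.py in the PR:

```python
import abc
from typing import Dict, List, Optional, Tuple

import torch
from torch import nn


class Critic(abc.ABC):
    @abc.abstractmethod
    def critic_pass(
        self,
        inputs: List[torch.Tensor],
        memories: Optional[torch.Tensor] = None,
        sequence_length: int = 1,
    ) -> Tuple[Dict[str, torch.Tensor], Optional[torch.Tensor]]:
        """Return value estimates (one per reward signal) and new memories."""


class ValueNetwork(nn.Module, Critic):
    """Toy implementation so the sketch runs; the real one encodes observations."""

    def __init__(self, stream_names: List[str], encoding_size: int = 4):
        super().__init__()
        self.value_heads = nn.ModuleDict(
            {name: nn.Linear(encoding_size, 1) for name in stream_names}
        )

    def critic_pass(self, inputs, memories=None, sequence_length=1):
        encoding = inputs[0]  # stand-in for the real observation encoder
        values = {n: head(encoding) for n, head in self.value_heads.items()}
        return values, memories
```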
* Use correct memories (t-1 instead of t) for training
* Add presence check
* Proper critic memories for PPO
* Still somewhat broken but cleaner
* Fix indexing issue
* Fix padding issues
* Fix more indexing bugs
* Fix SAC
* Fix non-lstm PPO
* Tweak SAC tests
* Fix AgentProcessor tests
* Fix PPO tests
* Code cleanup
* Code cleanup
* Fix SAC test
* Append the right memories
* Don't pad when not needed
* Refactor some stuff
* Remove random print statement
        next_obs: List[np.ndarray],
        done: bool,
        agent_id: str = "",
    ) -> Tuple[Dict[str, np.ndarray], Dict[str, float], Optional[AgentBufferField]]:
docstring
Added
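For the record, one hedged guess at the docstring that was added. Everything here, including the method name and the `batch` parameter, is inferred from the visible tail of the signature, not quoted from the PR:

```python
from typing import Dict, List, Optional, Tuple

import numpy as np

# Stand-ins so the sketch is self-contained; in ml-agents these come from
# mlagents.trainers.buffer.
AgentBuffer = dict
AgentBufferField = list


def get_trajectory_value_estimates(  # method name assumed; only the tail of
    batch: AgentBuffer,              # the signature is visible in the diff
    next_obs: List[np.ndarray],
    done: bool,
    agent_id: str = "",
) -> Tuple[Dict[str, np.ndarray], Dict[str, float], Optional[AgentBufferField]]:
    """
    Get value estimates and critic memories for a trajectory.

    :param batch: AgentBuffer holding one trajectory (assumed parameter).
    :param next_obs: Observations after the trajectory ends, used to
        bootstrap the final value if the episode did not terminate.
    :param done: True if the trajectory ended the episode.
    :param agent_id: ID of the agent the trajectory belongs to.
    :return: Per-reward-signal value estimates over the trajectory, the
        bootstrap value for next_obs, and (when the critic is recurrent)
        the memories needed to start each sequence during the update.
    """
    raise NotImplementedError  # docstring sketch only
```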
@@ -358,6 +370,21 @@ def get_action_stats(
        action, log_probs, entropies = self.action_model(encoding, masks)
        return action, log_probs, entropies, memories

    def get_stats(
docstring
Putting the docstring in the interface seems to be what we have done in the past.
ok, I forgot this was implementing the interface
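A guess at the shape of `get_stats`, based on `get_action_stats` above: it re-evaluates stored actions instead of sampling new ones, which is what an update needs once value estimation has moved into the optimizer. The signature, the stand-in types, and the described semantics are all assumptions for this sketch:

```python
from typing import Any, List, Optional, Tuple

import torch

# Stand-ins for mlagents.trainers.torch types; the real classes carry the
# sampled actions and their log probabilities.
AgentAction = Any
ActionLogProbs = Any


def get_stats(  # method sketch; lives on the actor alongside get_action_stats
    self,
    inputs: List[torch.Tensor],
    actions: AgentAction,
    masks: Optional[torch.Tensor] = None,
    memories: Optional[torch.Tensor] = None,
    sequence_length: int = 1,
) -> Tuple[ActionLogProbs, torch.Tensor]:
    """
    Get log probabilities and entropies for already-taken actions under the
    current policy, rather than sampling new ones as get_action_stats does.
    """
    raise NotImplementedError  # interface sketch only
```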
        if leftover > 0:
            first_seq_obs = _obs[0:leftover]
            first_seq_len = leftover
        else:
            first_seq_obs = _obs[0 : self.policy.sequence_length]
Suggested change:
-if leftover > 0:
-    first_seq_obs = _obs[0:leftover]
-    first_seq_len = leftover
-else:
-    first_seq_obs = _obs[0 : self.policy.sequence_length]
+if leftover > 0:
+    first_seq_len = leftover
+    first_seq_obs = _obs[0:first_seq_len]
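To make the control flow concrete, a standalone sketch of how the leftover-first splitting behaves, with the suggested reordering applied. The helper name and the surrounding loop are invented for illustration:

```python
import numpy as np


def split_into_sequences(_obs: np.ndarray, sequence_length: int):
    """
    Hypothetical helper mirroring the logic above: when the trajectory
    length is not a multiple of sequence_length, the leftover timesteps
    form the first, shorter sequence; every later sequence is full length.
    """
    leftover = len(_obs) % sequence_length
    first_seq_len = leftover if leftover > 0 else sequence_length
    sequences = [_obs[0:first_seq_len]]
    for start in range(first_seq_len, len(_obs), sequence_length):
        sequences.append(_obs[start : start + sequence_length])
    return sequences


print([len(s) for s in split_into_sequences(np.arange(7), 4)])  # [3, 4]
```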
    ) -> Tuple[Dict[str, torch.Tensor], AgentBufferField, torch.Tensor]:
        """
        Evaluate the batch sequence-by-sequence, assembling the result. This enables us to get the
        intermediate memories for the critic.
Describe the inputs and their dimensions. It is not clear what initial_memory's shape is.
Added a comment
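For readers of this thread, one guess at the kind of comment that was added. The parameter names and shapes below are assumptions based on typical single-trajectory LSTM evaluation, not quoted from the diff:

```python
# :param initial_memory: Memory the critic starts the trajectory with,
#     e.g. shape (1, 1, m_size // 2) -- a single batch element, since this
#     method walks one trajectory sequence-by-sequence.
# :param tensor_obs: List of observation tensors, each of shape
#     (trajectory_len, *obs_shape).
# :param sequence_length: Number of timesteps the LSTM is unrolled over.
```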
        seq_obs.append(first_seq_obs)

        for _ in range(first_seq_len):
            all_next_memories.append(initial_memory.squeeze().detach().numpy())
Suggested change:
-all_next_memories.append(initial_memory.squeeze().detach().numpy())
+all_next_memories.append(_mem.squeeze().detach().numpy())
Hard to understand why all_next_memories is a concatenation of initial memories...
Refactored this to be a bit clearer
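A sketch of what the refactor plausibly looks like (loop structure and sizes invented; the point is that `_mem`, carried across sequences, is what gets stored for each timestep, so the first sequence repeats the initial memory and later sequences repeat the memory carried over from the previous one):

```python
import torch

# Each timestep's "next memory" is the memory the critic held at the
# start of that timestep's sequence.
m_size, sequence_length, trajectory_len = 4, 3, 7
_mem = torch.zeros(1, 1, m_size)  # critic memory at the start of the trajectory
all_next_memories = []

leftover = trajectory_len % sequence_length
first_seq_len = leftover if leftover > 0 else sequence_length
start = 0
seq_len = first_seq_len
while start < trajectory_len:
    for _ in range(seq_len):
        all_next_memories.append(_mem.squeeze().detach().numpy())
    # The real code would run the critic here and carry its memory forward:
    # values, _mem = self.critic.critic_pass(seq_obs, _mem, seq_len)
    start += seq_len
    seq_len = sequence_length

print(len(all_next_memories))  # 7: one stored memory per timestep
```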
        _mem = initial_memory
        seq_obs = []

        first_seq_len = self.policy.sequence_length
Any reason for padding the end of the first sequence and not the last sequence?
Neither is padded in this function, but the buffer pads the front of the first sequence, so this function accounts for that. I left the padding in the buffer as-is and still need to experiment with that change; it will likely be a future PR.
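To illustrate the buffer behaviour described in this reply, a self-contained toy (the buffer's real padding code is not shown in this thread; this just reproduces the layout it produces):

```python
import numpy as np

# A 7-step trajectory with sequence_length 4 gets one zero prepended so
# the *first* sequence is full length -- which is why evaluation treats
# the first sequence specially instead of padding again.
sequence_length = 4
trajectory = np.arange(1, 8)  # 7 real timesteps, labelled 1..7
pad = -len(trajectory) % sequence_length  # = 1
padded = np.concatenate([np.zeros(pad, dtype=trajectory.dtype), trajectory])
print(padded.reshape(-1, sequence_length))
# [[0 1 2 3]
#  [4 5 6 7]]
```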
Then why is the first sequence handled differently?
* [MLA-1768] retrain Match3 scene (Unity-Technologies#4943)
* improved settings and move to default_settings
* update models
* Release 13 versions. (Unity-Technologies#4946) - updated release tag validation script to automate the updating of files with release tags that need to be changed as part of the pre-commit operation.
* Update readme for release_13. (Unity-Technologies#4951)
* Update docs to pass doc validation. (Unity-Technologies#4953)
* update defines, compile out Initialize body on non-desktop (Unity-Technologies#4957)
* Masking Discrete Actions typos (Unity-Technologies#4961) (Unity-Technologies#4964) Co-authored-by: Philipp Siedler <[email protected]>
* Adding references to the Extensions package to help promote it. (Unity-Technologies#4967) (Unity-Technologies#4968) Co-authored-by: Marwan Mattar <[email protected]> Co-authored-by: Chris Elion <[email protected]>
* Fix release link validations. (Unity-Technologies#4970)
* Adding the Variable length observation to the readme and to the overview of ML-Agents
* -
* forgot a dot
* InputActuatorComponent to allow the generation of an action space from an InputActionAsset (Unity-Technologies#4881) (Unity-Technologies#4974)
* handle no plugins found (Unity-Technologies#4969) (Unity-Technologies#4973)
* Tick extension version. (Unity-Technologies#4975)
* adding a comic and readding removed feaures docs
* Update 2018 project version to fix burst errors. (Unity-Technologies#4977) (Unity-Technologies#4978)
* Add an example project for the InputSystemActuator. (Unity-Technologies#4976) (Unity-Technologies#4980)
* Update barracuda, swtich Agents in Sorter use Burst. (Unity-Technologies#4979) (Unity-Technologies#4981)
* Set ignore done=False in GAIL (Unity-Technologies#4971)
* MultiAgentGroup Interface (Unity-Technologies#4923)
* add SimpleMultiAgentGroup
* add group reward field to agent and proto
* Fix InputActuatorComponent tests asmdef. (Unity-Technologies#4994) (Unity-Technologies#4995)
* Fix asmdef? (Unity-Technologies#4994) (Unity-Technologies#4996)
* Make TrainingAnalyticsSideChannel internal (Unity-Technologies#4999)
* [MLA-1783] built-in actuator type (Unity-Technologies#4950)
* Add component menues for some sensors and actuators. (Unity-Technologies#5001)
* Add component menues for some sensors and actuators. (Unity-Technologies#5001) (Unity-Technologies#5002)
* Fixing the number of layers in the config of PyramidsRND
* Merge master -> release_13_branch-to-master
* Fix RpcCommunicator merge.
* Move the Critic into the Optimizer (Unity-Technologies#4939) Co-authored-by: Ervin Teng <[email protected]>
* master -> main. (Unity-Technologies#5010)
* Adding links to CV/Robotics/GameSim (Unity-Technologies#5012)
* Make GridSensor a non allocating object after initialization. (Unity-Technologies#5014) Co-authored-by: Chris Elion <[email protected]>
* Modified the model_serialization to have correct inputs and outputs
* switching from CamelCase to snake_case
* Fix gpu pytests (Unity-Technologies#5019)
* Move tensors to cpu before converting it to numpy
* Adding a name field to BufferSensorComponent
* Adding a note to the CHANGELOG about var len obs
* Adding a helper method for creating observation placeholder names and removed the _h and _c placeholders
* Adding a custom editor for BufferSensorComponent
* Changing Sorter to use the new Editor and serialization
* adding inheritdoc
* Update cattrs dependencies to support python3.9 (Unity-Technologies#4821)
* Python Dataflow for Group Manager (Unity-Technologies#4926)
* Make buffer type-agnostic
* Edit types of Apped method
* Change comment
* Collaborative walljump
* Make collab env harder
* Add group ID
* Add collab obs to trajectory
* Fix bug; add critic_obs to buffer
* Set group ids for some envs
* Pretty broken
* Less broken PPO
* Update SAC, fix PPO batching
* Fix SAC interrupted condition and typing
* Fix SAC interrupted again
* Remove erroneous file
* Fix multiple obs
* Update curiosity reward provider
* Update GAIL and BC
* Multi-input network
* Some minor tweaks but still broken
* Get next critic observations into value estimate
* Temporarily disable exporting
* Use Vince's ONNX export code
* Cleanup
* Add walljump collab YAML
* Lower max height
* Update prefab
* Update prefab
* Collaborative Hallway
* Set num teammates to 2
* Add config and group ids to HallwayCollab
* Fix bug with hallway collab
* Edits to HallwayCollab
* Update onnx file meta
* Make the env easier
* Remove prints
* Make Collab env harder
* Fix group ID
* Add cc to ghost trainer
* Add comment to ghost trainer
* Revert "Add comment to ghost trainer" This reverts commit 292b6ce.
* Actually add comment to ghosttrainer
* Scale size of CC network
* Scale value network based on num agents
* Add 3rd symbol to hallway collab
* Make comms one-hot
* Fix S tag
* Additional changes
* Some more fixes
* Self-attention Centralized Critic
* separate entity encoder and RSA
* clean up args in mha
* more cleanups
* fixed tests
* entity embeddings work with no max Integrate into CC
* remove group id
* very rough sketch for TeamManager interface
* One layer for entity embed
* Use 4 heads
* add defaults to linear encoder, initialize ent encoders
* add team manager id to proto
* team manager for hallway
* add manager to hallway
* send and process team manager id
* remove print
* small cleanup
* default behavior for baseTeamManager
* add back statsrecorder
* update
* Team manager prototype (Unity-Technologies#4850)
* remove group id
* very rough sketch for TeamManager interface
* add team manager id to proto
* team manager for hallway
* add manager to hallway
* send and process team manager id
* remove print
* small cleanup Co-authored-by: Chris Elion <[email protected]>
* Remove statsrecorder
* Fix AgentProcessor for TeamManager Should work for variable decision frequencies (untested)
* team manager
* New buffer layout, TeamObsUtil, pad dead agents
* Use NaNs to get masks for attention
* Add team reward to buffer
* Try subtract marginalized value
* Add Q function with attention
* Some more progress - still broken
* use singular entity embedding (Unity-Technologies#4873)
* I think it's running
* Actions added but untested
* Fix issue with team_actions
* Add next action and next team obs
* separate forward into q_net and baseline
* might be right
* forcing this to work
* buffer error
* COMAA runs
* add lambda return and target network
* no target net
* remove normalize advantages
* add target network back
* value estimator
* update coma config
* add target net
* no target, increase lambda
* remove prints
* cloud config
* use v return
* use target net
* adding zombie to coma2 brnch
* add callbacks
* cloud run with coma2 of held out zombie test env
* target of baseline is returns_v
* remove target update
* Add team dones
* ntegrate teammate dones
* add value clipping
* try again on cloud
* clipping values and updated zombie
* update configs
* remove value head clipping
* update zombie config
* Add trust region to COMA updates
* Remove Q-net for perf
* Weight decay, regularizaton loss
* Use same network
* add base team manager
* Remove reg loss, still stable
* Black format
* add team reward field to agent and proto
* set team reward
* add maxstep to teammanager and hook to academy
* check agent by agent.enabled
* remove manager from academy when dispose
* move manager
* put team reward in decision steps
* use 0 as default manager id
* fix setTeamReward Co-authored-by: Vincent-Pierre BERGES <[email protected]>
* change method name to GetRegisteredAgents
* address comments
* Revert C# env changes
* Remove a bunch of stuff from envs
* Remove a bunch of extra files
* Remove changes from base-teammanager
* Remove remaining files
* Remove some unneeded changes
* Make buffer typing neater
* AgentProcessor fixes
* Back out trainer changes
* use delegate to avoid agent-manager cyclic reference
* put team reward in decision steps
* fix unregister agents
* add teamreward to decision step
* typo
* unregister on disabled
* remove OnTeamEpisodeBegin
* change name TeamManager to MultiAgentGroup
* more team -> group
* fix tests
* fix tests
* Use attention tests from master
* Revert "Use attention tests from master" This reverts commit 78e052b.
* Use attention from master
* Renaming fest
* Use NamedTuples instead of attrs classes
* Bug fixes
* remove GroupMaxStep
* add some doc
* Fix mock brain
* np float32 fixes
* more renaming
* Test for team obs in agentprocessor
* Test for group and add team reward
* doc improve Co-authored-by: Ervin T. <[email protected]>
* store registered agents in set
* remove unused step counts
* Global group ids
* Fix Trajectory test
* Remove duplicated files
* Add team methods to AgentAction
* Buffer fixes (cherry picked from commit 2c03d2b)
* Add test for GroupObs
* Change AgentAction back to 0 pad and add tests
* Addressed some comments
* Address some comments
* Add more comments
* Rename internal function
* Move padding method to AgentBufferField
* Fix slicing typing and string printing in AgentBufferField
* Fix to-flat and add tests
* Rename GroupmateStatus to AgentStatus
* Update comments
* Added GroupId, GlobalGroupId, GlobalAgentId types
* Update comment
* Make some agent processor properties internal
* Rename add_group_status
* Rename store_group_status, fix test
* Rename clear_group_obs Co-authored-by: Andrew Cohen <[email protected]> Co-authored-by: Ruo-Ping Dong <[email protected]> Co-authored-by: Chris Elion <[email protected]> Co-authored-by: andrewcoh <[email protected]> Co-authored-by: Vincent-Pierre BERGES <[email protected]>
* Removing some scenes (Unity-Technologies#4997)
* Removing some scenes, All the Static and all the non variable speed environments. Also removed Bouncer, PushBlock, WallJump and reacher. Removed a bunch of visual environements as well. Removed 3DBallHard and FoodCollector (kept Visual and Grid FoodCollector)
* readding 3DBallHard
* readding pushblock and walljump
* Removing tennis
* removing mentions of removed environments
* removing unused images
* Renaming Crawler demos
* renaming some demo files
* removing and modifying some config files
* new examples image?
* removing Bouncer from build list
* replacing the Bouncer environment with Match3 for llapi tests
* Typo in yamato test
* Fix issue with queuing input events that stomp on others. (Unity-Technologies#5034)

Co-authored-by: Chris Elion <[email protected]>
Co-authored-by: Chris Goy <[email protected]>
Co-authored-by: Marwan Mattar <[email protected]>
Co-authored-by: vincentpierre <[email protected]>
Co-authored-by: andrewcoh <[email protected]>
Co-authored-by: Ruo-Ping Dong <[email protected]>
Co-authored-by: Ervin Teng <[email protected]>
Co-authored-by: Andrew Cohen <[email protected]>