v0.6.0: compiled losses and partial steps

@vmoens vmoens released this 22 Oct 21:42

What's Changed

We introduce wrappers for ML-Agents and OpenSpiel. See the OpenSpiel and ML-Agents wrapper documentation for details.

We introduce support for partial steps (#2377, #2381), allowing you to run rollouts that end only when all envs are done, without resetting those that have already reached a termination point.
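
The idea behind partial steps can be illustrated with a toy, library-free sketch. It is not TorchRL's implementation: `ToyBatchedEnv` and `rollout_until_all_done` are hypothetical names, and the per-env episode lengths are made up for illustration. In TorchRL itself, this behaviour is exposed through the `break_when_all_done` option of `rollout` (#2381).

```python
from dataclasses import dataclass, field


@dataclass
class ToyBatchedEnv:
    """Hypothetical batch of counters; sub-env i terminates after lengths[i] steps."""

    lengths: list
    counts: list = field(init=False)

    def __post_init__(self):
        self.counts = [0] * len(self.lengths)

    def done(self):
        # One done flag per sub-env.
        return [c >= n for c, n in zip(self.counts, self.lengths)]

    def partial_step(self, active):
        # Step only the sub-envs flagged as active; the others are left untouched.
        for i, is_active in enumerate(active):
            if is_active:
                self.counts[i] += 1


def rollout_until_all_done(env, max_steps):
    """Run until every sub-env is done, never resetting the finished ones."""
    steps_taken = 0
    while steps_taken < max_steps and not all(env.done()):
        active = [not d for d in env.done()]
        env.partial_step(active)
        steps_taken += 1
    return steps_taken


env = ToyBatchedEnv(lengths=[2, 5, 3])
total_steps = rollout_until_all_done(env, max_steps=100)
print(total_steps, env.counts)
```

In this sketch the rollout length is set by the slowest sub-env, and the sub-envs that finish early simply stop advancing instead of being reset and restarted.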

We add the ability to pass replay buffers directly to data collectors, which avoids synchronized inter-process communication and drastically speeds up data collection. See the collectors' documentation for more information.
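
A minimal, library-free sketch of this data flow: when a buffer is handed to the collector, the collector writes transitions into it directly instead of returning batches for the caller to copy. `ToyReplayBuffer` and `ToyCollector` are hypothetical stand-ins, not TorchRL classes, and the sketch runs in a single process, so it illustrates the design only, not the speed-up.

```python
import random


class ToyReplayBuffer:
    """Hypothetical fixed-capacity buffer that overwrites its oldest entries."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.storage = []
        self.cursor = 0

    def extend(self, items):
        for item in items:
            if len(self.storage) < self.capacity:
                self.storage.append(item)
            else:
                self.storage[self.cursor] = item
            self.cursor = (self.cursor + 1) % self.capacity

    def sample(self, batch_size, rng):
        # Uniform sampling with replacement.
        return [rng.choice(self.storage) for _ in range(batch_size)]


class ToyCollector:
    """Hypothetical collector emitting fake transitions in fixed-size batches."""

    def __init__(self, frames_per_batch, total_frames, replay_buffer=None):
        self.frames_per_batch = frames_per_batch
        self.total_frames = total_frames
        self.replay_buffer = replay_buffer

    def __iter__(self):
        frame = 0
        while frame < self.total_frames:
            batch = [{"step": frame + i} for i in range(self.frames_per_batch)]
            frame += self.frames_per_batch
            if self.replay_buffer is not None:
                # Write straight into the buffer; nothing is handed back.
                self.replay_buffer.extend(batch)
                yield None
            else:
                yield batch


rng = random.Random(0)
buffer = ToyReplayBuffer(capacity=8)
collector = ToyCollector(frames_per_batch=4, total_frames=12, replay_buffer=buffer)
yielded = list(collector)
sample = buffer.sample(batch_size=2, rng=rng)
print(len(buffer.storage), len(sample))
```

The training loop then samples from `buffer` on its own schedule, and the collector never has to ship a batch back to the caller.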

The GAIL algorithm has also been integrated in the library (#2273).

We ensure that all loss modules are compatible with torch.compile without graph breaks (for a typical build). Compiled losses usually run about 2x faster than their eager counterparts.

Finally, we have sadly decided not to support Gymnasium v1.0 and future releases as the new autoreset API is fundamentally incompatible with TorchRL. Furthermore, it does not guarantee the same level of reproducibility as previous releases. See this discussion for more information.

We provide wheels for aarch64 machines. Since we are unable to upload them to PyPI, they are attached to these release notes.

Deprecations

  • [Deprecation] Deprecate default num_cells in MLP (#2395) by @vmoens
  • [Deprecations] Deprecate in view of v0.6 release #2446 by @vmoens

New environments

New features

  • [Feature] Add group_map support to MLAgents wrappers (#2491) by @kurtamohler
  • [Feature] Add scheduler for alpha/beta parameters of PrioritizedSampler (#2452) Co-authored-by: Vincent Moens by @LTluttmann
  • [Feature] Check number of kwargs matches num_workers (#2465) Co-authored-by: Vincent Moens by @antoine.broyelle
  • [Feature] Compiled and cudagraph for policies #2478 by @vmoens
  • [Feature] Consistent Dropout (#2399) Co-authored-by: Vincent Moens by @depictiger
  • [Feature] Deterministic sample for Masked one-hot #2440 by @vmoens
  • [Feature] Dict specs in vmas (#2415) Co-authored-by: Vincent Moens by @matteobettini
  • [Feature] Ensure transformation keys have the same number of elements (#2466) by @f.broyelle
  • [Feature] Make benchmarked losses compatible with torch.compile #2405 by @vmoens
  • [Feature] Partial steps in batched envs #2377 by @vmoens
  • [Feature] Pass replay buffers to MultiaSyncDataCollector #2387 by @vmoens
  • [Feature] Pass replay buffers to SyncDataCollector #2384 by @vmoens
  • [Feature] Prevent loading existing mmap files in storages if they already exist #2438 by @vmoens
  • [Feature] RNG for RBs (#2379) by @vmoens
  • [Feature] Randint on device for buffers #2470 by @vmoens
  • [Feature] SAC compatibility with composite distributions. (#2447) by @albertbou92
  • [Feature] Store MARL parameters in module (#2351) by @vmoens
  • [Feature] Support wrapping IsaacLab environments with GymEnv (#2380) by @yu-fz
  • [Feature] TensorDictMap #2306 by @vmoens
  • [Feature] TensorDictMap Query module #2305 by @vmoens
  • [Feature] TensorDictMap hashing functions #2304 by @vmoens
  • [Feature] break_when_all_done in rollout #2381 by @vmoens
  • [Feature] inline hold_out_net #2499 by @vmoens
  • [Feature] replay_buffer_chunk #2388 by @vmoens

New Algorithms

  • [Algorithm] GAIL (#2273) Co-authored-by: Vincent Moens by @Sebastian.dittert

Fixes

  • [BugFix, CI] Set TD_GET_DEFAULTS_TO_NONE=1 in all CIs (#2363) by @vmoens
  • [BugFix] Add MultiCategorical support in PettingZoo action masks (#2485) Co-authored-by: Vincent Moens by @matteobettini
  • [BugFix] Allow for composite action distributions in PPO/A2C losses (#2391) by @albertbou92
  • [BugFix] Avoid reshape(-1) for inputs to DreamerActorLoss (#2496) by @kurtamohler
  • [BugFix] Avoid reshape(-1) for inputs to objectives modules (#2494) Co-authored-by: Vincent Moens by @kurtamohler
  • [BugFix] Better dumps/loads (#2343) by @vmoens
  • [BugFix] Extend RB with lazy stack #2453 by @vmoens
  • [BugFix] Extend RB with lazy stack (revamp) #2454 by @vmoens
  • [BugFix] Fix Compose input spec transform (#2463) Co-authored-by: Louis Faury @louisfaury
  • [BugFix] Fix DeviceCastTransform #2471 by @vmoens
  • [BugFix] Fix LSTM in GAE with vmap (#2376) by @vmoens
  • [BugFix] Fix MARL-DDPG tutorial and other MODE usages (#2373) by @vmoens
  • [BugFix] Fix displaying of tensor sizes in buffers #2456 by @vmoens
  • [BugFix] Fix dumps for SamplerWithoutReplacement (#2506) by @vmoens
  • [BugFix] Fix get-related errors (#2361) by @vmoens
  • [BugFix] Fix invalid CUDA ID error when loading Bounded variables across devices (#2421) by @cbhua
  • [BugFix] Fix listing of updated keys in collectors (#2460) by @vmoens
  • [BugFix] Fix old deps tests #2500 by @vmoens
  • [BugFix] Fix support for MiniGrid envs (#2416) by @kurtamohler
  • [BugFix] Fix tictactoeenv.py #2417 by @vmoens
  • [BugFix] Fixes to RenameTransform (#2442) Co-authored-by: Vincent Moens by @thomasbbrunner
  • [BugFix] Make sure keys are exclusive in envs (#1912) by @vmoens
  • [BugFix] TensorDictPrimer updates spec instead of overwriting (#2332) Co-authored-by: Vincent Moens by @matteobettini
  • [BugFix] Use a RL-specific NO_DEFAULT instead of TD's one (#2367) by @vmoens
  • [BugFix] compatibility to new Composite dist log_prob/entropy APIs #2435 by @vmoens
  • [BugFix] torch 2.0 compatibility fix #2475 by @vmoens

Performance

  • [Performance] Faster CatFrames.unfolding with padding="same" (#2407) by @kurtamohler
  • [Performance] Faster PrioritizedSliceSampler._padded_indices (#2433) by @kurtamohler
  • [Performance] Faster SliceSampler._tensor_slices_from_startend (#2423) by @kurtamohler
  • [Performance] Faster target update using foreach (#2046) by @vmoens

Documentation

  • [Doc] Better doc for inverse transform semantic #2459 by @vmoens
  • [Doc] Correct minor erratum in knowledge_base entry (#2383) by @depictiger
  • [Doc] Document losses in README.md #2408 by @vmoens
  • [Doc] Fix README example (#2398) by @vmoens
  • [Doc] Fix links to tutos (#2409) by @vmoens
  • [Doc] Fix pip3install typos in Readme (#2342) by @TheRisenPhoenix
  • [Doc] Fix policy in getting started (#2429) by @vmoens
  • [Doc] Fix tutorials for release #2476 by @vmoens
  • [Doc] Fix wrong default value for flatten_tensordicts in ReplayBufferTrainer (#2502) by @vmoens
  • [Doc] Minor fixes to comments and docstrings (#2443) by @thomasbbrunner
  • [Doc] Refactor README (#2352) by @vmoens
  • [Docs] Use more appropriate ActorValueOperator in PPOLoss documentation (#2350) by @GaetanLepage
  • [Documentation] README rewrite and broken links (#2023) by @vmoens

Not user facing

New Contributors

As always, we want to show how appreciative we are of the vibrant open-source community that keeps TorchRL alive.

Full Changelog: v0.5.0...v0.6.0