## RT-1

The RT-1 model from the paper ["RT-1: Robotics Transformer for Real-World Control at Scale"](https://www.roboticsproceedings.org/rss19/p025.pdf) by _Brohan et al._ was modified and fine-tuned on LaNMP. This model was trained and run on an NVIDIA 3090 GPU.

<img src="./models/main_models/rt1/figures/rt1.png" width="450px"></img>

A forked implementation of <a href="https://github.com/Rohan138/rt1-pytorch.git">RT1 (Robotic Transformer)</a>, originally inspired by the <a href="https://ai.googleblog.com/2022/12/rt-1-robotics-transformer-for-real.html">Google Research</a> paper.

This implementation of RT-1 was pretrained on the <a href="https://sites.google.com/view/bridgedata">Bridge</a> dataset and further fine-tuned on our LaNMP dataset for evaluation. Details of the repository are given below.
### Setup Instructions

```bash
git clone git@github.com:h2r/LaNPM-Dataset.git
cd models/main_models/rt1
pip install -e .
```

### Overview of files

This repository has seven critical files/folders, whose use cases are described below:

1) ```main.py```: used to pretrain RT-1 on the Bridge dataset. Modifying this file to accommodate different datasets requires changing the ```observation_space``` and ```action_space``` according to the dataset being loaded, as well as changing the dataset keys in ```rt1_pytorch/tokenizers/action_tokenizer.py```. Running this file saves a series of checkpoints and logs losses using Weights & Biases.
2) ```main_ft.py```: used to fine-tune RT-1 on the LaNMP dataset. This file has the ```observation_space```, ```action_space```, and PyTorch ```DataLoader``` already modified to accommodate fine-tuning on the LaNMP dataset (AI2Thor). Running this file saves a series of checkpoints and logs losses using Weights & Biases.
3) ```main_ft_eval.py```: used to run RT-1 in inference mode on the LaNMP dataset. This file has the ```observation_space```, ```action_space```, and PyTorch ```DataLoader``` already modified to accommodate the LaNMP dataset (AI2Thor). The file iterates over all checkpoints saved during fine-tuning and runs RT-1 in inference mode on the validation dataset for each checkpoint, logging the test losses using Weights & Biases.
4) ```ai2thor_env.py```: contains a Gym-style environment class to load and take steps in the AI2Thor environment. This file is used to generate real-time trajectories based on the action tokens generated by a fine-tuned RT-1 model (specific to AI2Thor). The main ```step()``` function executes the action generated by RT-1 and returns a success message along with information about the environment state, e.g. object or agent metadata, which can be saved to capture the trajectory taken by the agent for a given task.
5) ```rollout_ai2thor.py```: interfaces between the fine-tuned RT-1 model (loaded from a checkpoint after fine-tuning on LaNMP) and the ```ai2thor_env.py``` Gym environment, sending observations from the AI2Thor environment to RT-1 and executing the action tokens proposed by RT-1 in AI2Thor. Note that this file should not be run on a headless machine, since it requires the AI2Thor simulator GUI.
6) ```rt1_pytorch/rt1_policy.py```: contains the RT-1 model implementation in PyTorch. The ```loss()``` function performs the forward pass of RT-1 for training, and the ```act()``` function performs the forward pass during inference.
7) ```lanmp_dataloader/rt1_dataloader.py```: contains the ```DatasetManager``` class that extracts trajectories from the LaNMP ```sim_data.hdf5``` dataset file. The script automatically separates train and validation subsets according to different splits, e.g. k-fold by scene, task-wise, or for the diversity ablation. The ```DatasetManager``` also handles tokenizing/detokenizing the raw trajectory data into 256 discrete buckets, while also chunking trajectories across non-overlapping window lengths of 6 steps.
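
The tokenizing and chunking behavior described for ```lanmp_dataloader/rt1_dataloader.py``` can be sketched as follows. This is an illustrative reconstruction, not the repository's actual code: the function names, bucket edges, and uniform-binning scheme are assumptions; only the 256-bucket count and 6-step window come from the description above.

```python
import numpy as np

N_BINS = 256   # discrete buckets, per the DatasetManager description
WINDOW = 6     # non-overlapping window length in steps

def tokenize(values, low, high, n_bins=N_BINS):
    # Map continuous action values in [low, high] to integer buckets 0..n_bins-1
    # (uniform binning is an assumption for illustration).
    clipped = np.clip(values, low, high)
    return np.round((clipped - low) / (high - low) * (n_bins - 1)).astype(int)

def detokenize(tokens, low, high, n_bins=N_BINS):
    # Invert the binning back to approximate continuous values.
    return low + tokens.astype(float) / (n_bins - 1) * (high - low)

def chunk_trajectory(steps, window=WINDOW):
    # Split one trajectory into non-overlapping windows of `window` steps;
    # the final chunk may be shorter than `window`.
    return [steps[i:i + window] for i in range(0, len(steps), window)]
```

Detokenizing a token recovers the original value only up to the bucket resolution, so round-tripping introduces a small quantization error.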

### Details about file arguments

Most relevant files in this repository accept the same set of arguments, detailed below:
* ```dataset```: only for the ```main.py``` file; specifies the dataset on which the RT-1 model should be pretrained
* ```train-split```: specifies what fraction of the loaded dataset should be used for training vs. evaluation
* ```eval-split```: specifies what fraction of the loaded dataset should be used for evaluation vs. training
* ```epochs```: total number of passes over all batches of the training set
* ```lr```: learning rate for the cross-entropy loss of RT-1
* ```train-batch-size```: the number of trajectories from which to sample data for the current training batch
* ```eval-batch-size```: the number of trajectories from which to sample data for the current evaluation batch
* ```trajectory-length```: the window size (context history of ```trajectory-length``` previous images) used for each trajectory when feeding data to the RT-1 model; set to 6 based on the RT-1 implementation
* ```sentence-transformer```: the language embedding to apply to the language-specified task
* ```device```: the device onto which to load the model/data during training/inference
* ```eval-freq```: the interval of batches at which to run evaluation/inference on the validation dataset (currently set to 0 in ```main_ft.py```)
* ```checkpoint-freq```: the interval of batches at which to save a checkpoint during training
* ```checkpoint-dir```: the directory path at which to save checkpoints during training
* ```load-checkpoint```: (optional) path of the pretrained checkpoint to load for further fine-tuning
* ```wandb```: boolean determining whether to log to Weights & Biases
* ```eval-scene```: the AI2Thor scene number in the dataset that is held out of the training set for evaluation during k-fold cross-validation across scenes
* ```split-type```: determines the split type (i.e. k-fold by scene, task-wise, or diversity ablation) between train and evaluation used by the ```DatasetManager``` in ```rt1_dataloader.py```
* ```num-diversity-scenes```: only if ```split-type``` is ```diversity-ablation```; determines the total number of scenes over which to perform the diversity ablation, i.e. a maximum of 4 for LaNMP simulation data
* ```max-diversity-trajectories```: only if ```split-type``` is ```diversity-ablation```; determines the total number of trajectories that are divided evenly across the ```num-diversity-scenes``` scenes
* ```train-subbatch```: the batch size to use during training/fine-tuning
* ```eval-subbatch```: the batch size to use during evaluation
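
The shared flags above can be sketched as an ```argparse``` parser. This is a hypothetical reconstruction for illustration only: the defaults shown here are made up, and the actual scripts may define additional flags or different defaults.

```python
import argparse

def build_parser():
    # Hypothetical reconstruction of the shared CLI described above;
    # defaults are illustrative, not the repository's actual values.
    p = argparse.ArgumentParser(description="RT-1 training/fine-tuning flags (sketch)")
    p.add_argument("--dataset", default="bridge")
    p.add_argument("--train-split", type=float, default=0.9)
    p.add_argument("--eval-split", type=float, default=0.1)
    p.add_argument("--epochs", type=int, default=1)
    p.add_argument("--lr", type=float, default=1e-4)
    p.add_argument("--trajectory-length", type=int, default=6)  # per RT-1's context window
    p.add_argument("--sentence-transformer", default=None)
    p.add_argument("--device", default="cuda")
    p.add_argument("--checkpoint-freq", type=int, default=100)
    p.add_argument("--checkpoint-dir", default="checkpoints/")
    p.add_argument("--load-checkpoint", default=None)
    p.add_argument("--wandb", action="store_true")  # off unless the flag is passed
    return p

# Example invocation: override a couple of flags, keep the rest at defaults.
args = build_parser().parse_args(["--epochs", "5", "--device", "cpu"])
```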

### Checkpoint samples

The following sample checkpoints can be loaded into the RT-1 model. They can be found on the supplementary Google Drive associated with this project:
* ```sample_checkpoints/pretrained_bridge```: the final checkpoint saved when pretraining the RT-1 model on the Bridge dataset
* ```sample_checkpoints/task_gen```: the final checkpoint saved after fine-tuning the RT-1 model on the task-wise split for the task-generalization experiment
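
Since ```main_ft_eval.py``` iterates over every checkpoint saved during fine-tuning, collecting them in training order might look like the sketch below. The ```checkpoint_<batch>.pt``` naming scheme is an assumption for illustration; the repository's actual file names may differ.

```python
from pathlib import Path

def list_checkpoints(checkpoint_dir):
    # Gather checkpoint files and sort them numerically by the batch index
    # embedded in the (assumed) name checkpoint_<batch>.pt, so that evaluation
    # visits them in the order they were saved rather than lexicographically.
    paths = Path(checkpoint_dir).glob("checkpoint_*.pt")
    return sorted(paths, key=lambda p: int(p.stem.split("_")[-1]))
```

Numeric sorting matters here: a plain lexicographic sort would place ```checkpoint_10.pt``` before ```checkpoint_2.pt```.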

### Additional notes

When running any of the fine-tuning or pretraining scripts, please ensure the following modules are loaded:
```bash
module load cuda/11.8.0-lpttyok
module load cudnn/8.7.0.84-11.8-lg2dpd5
```

To be continued...

## ALFRED Seq2Seq

The ALFRED Seq2Seq model from the paper ["ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks"](https://openaccess.thecvf.com/content_CVPR_2020/papers/Shridhar_ALFRED_A_Benchmark_for_Interpreting_Grounded_Instructions_for_Everyday_Tasks_CVPR_2020_paper.pdf) by _Shridhar et al._ was modified and fine-tuned on LaNMP.

```bash
python models/eval/eval_seq2seq.py --model_path exp/best_test_fold1.pth --gpu --
```
* The command assumes it is run on a machine with a GUI in order to run the AI2THOR simulator, i.e. not on a headless machine.
* To run other models instead of the "fold1" model, change any part of the command containing "fold1" to the desired model, e.g. "task" for the "best_test_task.pth" model.
* More details on all the command-line arguments can be found at `LaNMP-Dataset/models/main_models/eval/eval_seq2seq.py`.
MIT License

Copyright (c) 2022 Phil Wang

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.