From f86a9be14cb4cdb9cc6c8545f33c7d45e94531af Mon Sep 17 00:00:00 2001
From: shaoanlu
Date: Fri, 3 May 2024 01:58:04 +0900
Subject: [PATCH] Update learning_note.md

---
 learning_note.md | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/learning_note.md b/learning_note.md
index 22548ce..4e7146c 100644
--- a/learning_note.md
+++ b/learning_note.md
@@ -1,7 +1,8 @@
 ## Learning note
 ### Main insight
-- The model should learn the residuals (velocity, gradient of the denoising) if possible. This greatly stabilizes the training.
+- The model should learn the residuals (gradient of the denoising) if possible. This greatly stabilizes the training.
 - Advantages of diffusion model: 1) capability of modeling multi-modality, 2) stable training, and 3) temporally output consistency.
+- Iteratively add training data of failure modes to turn extrapolation into interpolation.
 ### Scribbles
 - The trained policy does not 100% reach the goal without collision (there is no collision in its training data).
 - Unable to recover from OOD data.
@@ -14,9 +15,18 @@
 - Even though the loss curve appears saturated, the performance of the controller can still improve as training continues.
 - The training loss curves of the diffusion model are extremely smooth btw.
 - On the contrary, it might be difficult to know if the model is overfitting or not by looking at the trajectory as well as the the denoising process.
+  - But in general I feel there is little harm in training the diffusion model as long as possible.
 - DDPM and DDIM samplers yield the best result.
 - Inference is not in real-time. The controller is set to sun 100Hz.
+### Possible reasons for failures in collision avoidance
+1. There is no data containing collisions in the training data.
+2. A policy learned with imitation learning can exhibit accumulated errors during closed-loop control.
+
+When the quadrotor gets too close to the obstacles (due to 2), the input state becomes OOD (due to 1); therefore, the diffusion policy is unable to recover from such a situation.
+
+- Possible fix: adding training data in which the quadrotor recovers from collision.
+
 ### Things that didn't work
 - Tried encoding distances to each obstacle. Did not observe improvement in terms of collision avoidance.
 - Tried using vision encoder to replace obstacle encoding. Didn't see performance improvement.
\ No newline at end of file
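---

Note for reviewers: the "learn the residuals (gradient of the denoising)" insight above corresponds to the standard epsilon-prediction objective in DDPM-style training, where the network predicts the added noise rather than the clean trajectory. A minimal NumPy sketch of that objective, under assumptions (`model`, `cond`, and the cosine-free linear `alphas_cumprod` schedule here are hypothetical placeholders, not the repository's actual code):

```python
import numpy as np

def ddpm_residual_loss(model, x0, cond, alphas_cumprod, rng):
    """Epsilon-prediction DDPM loss: the model predicts the injected noise
    (the 'residual') instead of regressing the clean trajectory x0 directly."""
    # sample one diffusion timestep per trajectory in the batch
    t = rng.integers(0, len(alphas_cumprod), size=x0.shape[0])
    # broadcast the cumulative noise level over the trajectory dimensions
    a_bar = alphas_cumprod[t].reshape(-1, *([1] * (x0.ndim - 1)))
    eps = rng.standard_normal(x0.shape)
    # forward diffusion: corrupt x0 with noise at level t
    x_t = np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps
    # the network sees the noisy sample, the timestep, and the conditioning
    eps_pred = model(x_t, t, cond)
    # MSE between predicted and true noise
    return np.mean((eps_pred - eps) ** 2)
```

Because the target `eps` is zero-mean unit-variance regardless of the data scale, this objective is well conditioned at every noise level, which is one common explanation for the stable, smooth loss curves noted above.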