Update learning_note.md
shaoanlu committed May 2, 2024
1 parent 8d25002 commit f86a9be
Showing 1 changed file with 11 additions and 1 deletion.
## Learning note
### Main insight
- The model should learn the residuals (the gradient of the denoising process) if possible. This greatly stabilizes training.
- Advantages of the diffusion model: 1) it can model multi-modality, 2) training is stable, and 3) its outputs are temporally consistent.
- Iteratively add training data covering failure modes so that extrapolation becomes interpolation.
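The "learn the residuals" point can be made concrete with the velocity parameterization used in some diffusion-model trainings; below is a minimal numpy sketch (the schedule values and function name are illustrative, not from this note). Given a clean action `x0` and noise `eps` at a noise level with coefficients `alpha_t, sigma_t` (with `alpha_t**2 + sigma_t**2 == 1`), the model can regress either the noise itself or the velocity `v = alpha_t * eps - sigma_t * x0`:

```python
import numpy as np

def diffusion_targets(x0, eps, alpha_t, sigma_t):
    """Given a clean sample x0 and Gaussian noise eps at one noise level,
    return the noisy input x_t and two candidate regression targets:
    the added noise (epsilon-prediction) and the velocity (v-prediction).
    Illustrative sketch; assumes alpha_t**2 + sigma_t**2 == 1."""
    x_t = alpha_t * x0 + sigma_t * eps          # forward noising step
    v_t = alpha_t * eps - sigma_t * x0          # velocity target
    return x_t, eps, v_t

rng = np.random.default_rng(0)
x0 = rng.normal(size=4)    # a clean action sample (made-up data)
eps = rng.normal(size=4)   # the injected Gaussian noise
x_t, eps_target, v_t = diffusion_targets(x0, eps, alpha_t=0.8, sigma_t=0.6)
# The clean sample is recoverable from x_t and v_t:
# x0 == alpha_t * x_t - sigma_t * v_t
```

Regressing `v_t` (rather than `x0` directly) is one concrete instance of "learning the residual / gradient of the denoising," which is likely why it stabilizes training.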
### Scribbles
- The trained policy does not reach the goal collision-free 100% of the time (there are no collisions in its training data).
- It is unable to recover from OOD states.
- Even though the loss curve appears saturated, the performance of the controller can still improve as training continues.
- The training loss curves of the diffusion model are extremely smooth btw.
- Conversely, it can be difficult to tell whether the model is overfitting by looking at the trajectory or the denoising process.
- But in general I feel there is little harm in training the diffusion model for as long as possible.
- DDPM and DDIM samplers yield the best results.
- Inference is not real-time. The controller is set to run at 100 Hz.
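For reference, the DDPM and DDIM samplers mentioned above share the same trained denoiser and differ mainly in the reverse-update step. A deterministic DDIM update can be sketched as follows (numpy sketch; `eps_pred` stands in for the trained model's noise prediction, and the schedule coefficients are illustrative):

```python
import numpy as np

def ddim_step(x_t, eps_pred, alpha_t, sigma_t, alpha_prev, sigma_prev):
    """One deterministic DDIM update: estimate the clean sample from the
    predicted noise, then re-noise it to the previous (smaller) noise level.
    Illustrative sketch; assumes alpha**2 + sigma**2 == 1 at each level."""
    x0_pred = (x_t - sigma_t * eps_pred) / alpha_t   # implied clean sample
    return alpha_prev * x0_pred + sigma_prev * eps_pred
```

Because the update is deterministic, DDIM can take far fewer steps than DDPM, which matters for the real-time budget of a 100 Hz controller.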

### Possible reasons for failures on collision avoidance
1. There is no collision data in the training set.
2. A policy learned with imitation learning can exhibit accumulated error during closed-loop control.

When the quadrotor gets too close to the obstacles (due to 2), the input state becomes OOD (due to 1); the diffusion policy is therefore unable to recover from such a situation.

- Possible fix: add training data in which the quadrotor recovers from near-collision states.
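One cheap way to operationalize the OOD diagnosis above is a nearest-neighbor distance check against the training states. This is a hypothetical helper, not something from the note; `k` and `threshold` are made-up tuning knobs:

```python
import numpy as np

def is_ood(state, train_states, k=5, threshold=1.0):
    """Flag a state as out-of-distribution when its mean distance to the
    k nearest training states exceeds a tuned threshold.
    Hypothetical sketch for detecting the near-obstacle OOD states."""
    d = np.linalg.norm(train_states - state, axis=1)  # distance to each training state
    return float(np.sort(d)[:k].mean()) > threshold
```

Such a check could be used to decide when to hand control back to a fallback controller, or to identify which states need the recovery demonstrations suggested above.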

### Things that didn't work
- Tried encoding the distance to each obstacle. Did not observe improvement in collision avoidance.
- Tried using a vision encoder to replace the obstacle encoding. Saw no performance improvement.
