This is obviously very problem-dependent as usual, but the perturbation noise is set to a standard deviation of 0.02 in the config (and, per the paper, apparently we don't want to lower it over training, which seems a little odd: one would think we'd want to shrink it as we converge toward a maximum, or at the very least lower the SD along with the L2 coefficient whenever we reduce the learning rate).

It also seems they run around 10000 episodes per gradient estimate / optimization step. I was wondering how the researchers arrived at these values, since both seem critical to the performance and efficiency of this kind of RL. Are there any rough guidelines or intuitions for setting them, or any empirical evidence/studies to reference?
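For concreteness, here is a minimal sketch (my own, not from the repo) of the vanilla ES gradient estimator, just to show exactly where the two hyperparameters in question enter: `sigma` is the perturbation SD and `n_samples` is the number of episodes/rollouts per update. The function name and signature are hypothetical.

```python
import numpy as np

def es_gradient(f, theta, sigma=0.02, n_samples=10000, rng=None):
    """Estimate grad of E[f(theta + sigma*eps)] w.r.t. theta,
    where eps ~ N(0, I): grad ~= (1/(n*sigma)) * sum_i f(theta + sigma*eps_i) * eps_i."""
    rng = rng or np.random.default_rng(0)
    eps = rng.standard_normal((n_samples, theta.size))   # one perturbation per episode
    returns = np.array([f(theta + sigma * e) for e in eps])
    # Normalizing the returns (z-score here; the paper uses rank-based fitness
    # shaping) decouples the step size from the raw reward scale.
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)
    return (returns[:, None] * eps).sum(axis=0) / (n_samples * sigma)
```

The 1/sigma factor is why shrinking sigma is not a free lunch: it sharpens the finite-difference probe but inflates the variance of the estimate, so it trades off directly against `n_samples`; that coupling may be part of why the authors hold sigma fixed rather than annealing it.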