How to plot the Hessian max eigenvalue spectra? #12
Hi @Dong1P,

Thank you for your support. I did not release the code for the Hessian eigenvalue spectra visualization (e.g., Fig. 1c and Fig. 4) yet. Instead, I provide some useful information below.

Hessian Max Eigenvalue Spectrum: My implementation uses PyHessian (https://github.com/amirgholami/PyHessian), and the pseudo-code below is extremely simple. Source: Appendix A3 in "Blurs Behave Like Ensembles: Spatial Smoothings to Improve Accuracy, Uncertainty, and Robustness" (ICML 2022). It calculates and gathers the top-k (e.g., top-5) Hessian eigenvalues mini-batch-wise by using power iteration.

```python
from pyhessian import hessian
from tqdm import tqdm

max_eigens = []  # a list of batch-wise top-k Hessian max eigenvalues
model = model.cuda()
for xs, ys in tqdm(dataset_train):
    # measure Hessian max eigenvalues with NLL + L2 on data-augmented (`transform`) datasets
    hessian_comp = hessian(model, data=(xs, ys), transform=transform, weight_decay=weight_decay, cuda=True)
    # collect top-5 Hessian eigenvalues by using power iteration (https://en.wikipedia.org/wiki/Power_iteration)
    top_eigenvalues, top_eigenvector = hessian_comp.eigenvalues(top_n=5)
    max_eigens = max_eigens + top_eigenvalues  # aggregate top-5 max eigenvalues
```

PyHessian does not support … Visualization: Hessian spectra (a list of real values, i.e., `max_eigens` above) …
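As a minimal plotting sketch (not the released visualization code; it assumes seaborn and matplotlib, which are my choice here rather than part of the original implementation), the collected `max_eigens` list can be drawn as a density plot:

```python
# Minimal sketch (assumption: seaborn/matplotlib; not the released code).
# It draws the spectrum of the batch-wise top-k Hessian max eigenvalues.
import matplotlib.pyplot as plt
import seaborn as sns

sns.histplot(max_eigens, bins=50, kde=True, stat="density")
plt.xlabel("Hessian max eigenvalue")
plt.ylabel("Density")
plt.savefig("hessian_max_eigenvalue_spectrum.png", bbox_inches="tight")
```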
Thanks for your great work; I have learned a lot from it. @xxxnell
Hi @yukimmmmiao, thank you for the kind words. I assumed that the largest Hessian values have a dominant influence on optimization (Ghorbani et al. (ICML 2019); see also Liu et al. (NeurIPS 2020)). I agree that the smallest Hessian eigenvalues also play an important role in optimization. To be clear, the algorithm produces the eigenvalues that are greatest in absolute value, so the Hessian spectrum contains not only the largest eigenvalues but also the smallest (negative) eigenvalues. However, this algorithm neglects near-zero Hessian values, and I would like to leave a detailed analysis of near-zero Hessian values for future work. In my code, the NN weights are fixed values. The Hessian values were measured by using saved checkpoints in separate jobs, not during the optimization tasks, for simplicity.
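As a minimal sketch of that checkpoint-based setup (assumptions: `build_model()`, the checkpoint path, and `dataloader_train` are hypothetical placeholders, and the call below uses stock PyHessian's `hessian(model, criterion, data=...)` signature rather than the modified version in the pseudo-code above):

```python
import torch
from pyhessian import hessian

# Hypothetical placeholders: build_model(), "checkpoint.pth", dataloader_train.
model = build_model()
model.load_state_dict(torch.load("checkpoint.pth", map_location="cpu"))  # fixed, pre-trained weights
model = model.cuda().eval()  # no optimization here; only Hessian measurement

criterion = torch.nn.CrossEntropyLoss()  # NLL loss for classification
max_eigens = []
for xs, ys in dataloader_train:
    hessian_comp = hessian(model, criterion, data=(xs, ys), cuda=True)
    # power iteration returns the eigenvalues largest in absolute value, so large
    # negative eigenvalues are included, but near-zero eigenvalues are not covered
    top_eigenvalues, _ = hessian_comp.eigenvalues(top_n=5)
    max_eigens += top_eigenvalues
```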
Hi @xxxnell, do you have any tips on what arguments to use for the …
Hi @dgcnz, thank you for reaching out. The occurrence of negative Hessian eigenvalues largely depends on the dataset and model configuration. I suspect you're working with smaller datasets, e.g., CIFAR, with data augmentations, and using a small model, e.g., a Ti-sized model.
Thanks for your answer, @xxxnell 😄. We're currently testing on Rotational MNIST, which, as far as I understand, may be too small/easy to consistently find negative eigenvalues? Also, the datasets you tested for obtaining negative Hessian eigenvalues were 10% of CIFAR and ImageNet, right? Did you by any chance test on a smaller dataset? For context, we're comparing a CNN with a rotationally equivariant CNN, and we were hoping to find a pattern similar to the one your work found for ViT vs ResNet.
Unfortunately, I haven't tested on datasets smaller/easier than CIFAR. The conf… Please feel free to reach out via email ([email protected] or [email protected]) if you'd like to provide more detailed information about your settings. I'd be happy to discuss at some point.
I read your paper and learned a lot from it.
I would also like to see the code for plotting the Hessian max eigenvalue spectra.
May I know if you have any plans to update it?
Best,