-
Notifications
You must be signed in to change notification settings - Fork 811
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Torch implementation of ParametricUMAP #1103
base: master
Are you sure you want to change the base?
Conversation
Hello @jh83775! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2024-03-26 11:59:02 UTC |
Thanks for this. There is currently work underway to support Keras 3 (the initial support is done) such that multiple backends, including pytorch, could be used. You may want to check in on the refactoring discussion on PR #1101 |
That looks helpful, thanks. I'll update to reflect the discussions on that thread. Looks like the keras-pytorch version might supersede this work, although hopefully this can still be useful for those more familiar with torch than keras |
I think there's still merit in this; I just don't have a good sense of
where things will settle to know what the right way to support the various
options will be.
…On Sat, Mar 23, 2024 at 9:37 AM jh83775 ***@***.***> wrote:
Thanks for this. There is currently work underway to support Keras 3 (the
initial support is done) such that multiple backends, including pytorch,
could be used. You may want to check in on the refactoring discussion on PR
#1101 <#1101>
That looks helpful, thanks. I'll update to reflect the discussions on that
thread. Looks like the keras-pytorch version might supersede this work,
although hopefully this can still be useful for those more familiar with
torch than keras
—
Reply to this email directly, view it on GitHub
<#1103 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AC3IUBIK675PBJBNBEXQC53YZWATDAVCNFSM6AAAAABFEUQLBOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAMJWGQ4TOOBQHE>
.
You are receiving this because you commented.Message ID:
***@***.***>
|
So as far as I can tell this will just sit alongside the existing Keras3 implementation (which should now support a pytorch backed) just fine as an extra pytorch specific ParametricUMAP. I would be happy to just merge this if you think it has sufficient value as-is for a standalone pytorch speciifc implementation. |
I'm also glad to see this implementation. Thanks! I do think the overlap in functionality between this and the keras implementation is pretty high. This code is missing some key functionality (it's been a few days since I read it). One of the benefits of parametric UMAP is it allows you to balance the global MDS loss and auto encoder loss, which iirc are not in this implementation. One thing that the keras code is missing that this code has is the data iterator. Currently to run the keras code, even if you use torch as a backend you still rely on on the tensorflow dataset, meaning you need to have keras, tensorflow, and torch installed. If we pull the iterator from this code into the keras implementation we will overcome that issue (this is something I've been meaning to do but haven't had time yet) |
@turian I think this is out of the scope of the umap python library, but I give an example of how to build your own graph and iterator here: https://colab.research.google.com/drive/1WkXVZ5pnMrm17m0YgmtoNjM_XHdnE5Vp?usp=sharing You could switch out the graph and the interator for one that supports a graph larger than memory and an iterator that grabs from that graph. |
Hi All,
I've popped in a pull request with a version of ParametricUMAP written in PyTorch. It mostly follows the discussions in the threads below:
I'm not quite sure where it best fits in the codebase - currently I've just put it in umap/torch.py so you can import with
from umap.torch import ParametricUMAP
, which feels intuitive. It should be fairly easy to edit and include a nonparametric UMAP as well, via thetorch.nn.Embedding
class, but I can put in a fresh pull request for that when I find the time. There could also be a discussion around what parameters are accessible to the user while not bloating the number of optional args. For example, in my implementation the user can manually change the batch size and learning rate, but not the optimizer or negative sample rate. What is included in the optional args is relatively arbitrary based on what I tended to be changing the most for my use case. It will be relatively easy to expose any currently hardcoded parameter choices to the user though.I've also added a few examples in a Jupyter notebook (see
notebooks/Parametric_UMAP/08.0-torch-parametric-umap.ipynb
) to give an indication of how to get started.If any PyTorch experts want to take a look, I'll gladly add in any optimizations (I'm sure there will be a few things here and there that can be changed and improve the runtime).
I've tested it with pytorch==1.12.1 & CUDA==11.6. I think it should work with other versions too, but haven't tested more widely.