DRAFT: Redesign underlying storage #39
base: main
Conversation
I would refer to FluxML/Flux.jl#1448 and the related issues and PRs. In particular, the storage change here reverts back to Flux's original implementation. This PR is more sophisticated about its indexing, so it may not have the same issues as the original implementation, but it will need a bunch of performance tests when you're ready. The linked PR above has a good selection of the important ones. I would add that these are just preliminary comments; I will need more time to dig through the actual code changes.
Can you explain the motivation more? As in: any such modification is going to add complexity, which is a cost not necessarily for users, but for anyone maintaining the package in the future.
My motivation is primarily the setting of arbitrary axes as "one hot".
I have commented on #36 to follow up that work. Unfortunately, I think this PR is going to be a non-starter, since the primary motivation for the current storage is having a contiguous array of indices for GPU performance. Fundamentally, I don't think this PR will be able to meet that performance goal. I'm open to being shown that I'm wrong, though. If we are going to support an arbitrary axis in the package, then I think it is better to follow #36's approach or to add the axis to the struct as originally suggested in #35. A caveat being @mcabbott's point about the complexity added to the library. If the only downstream operation you care about is matrix multiplication, then it is easier to just use a permuted array in user code for that one function. Adding it to the package means it has to support not just matrix multiplication but every other present and future operation we define on
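For intuition on why the contiguous index array matters: multiplying a dense matrix by a one-hot matrix reduces to gathering columns by a flat integer index array, which is a single bulk read that maps well to GPUs. A minimal sketch of this (in NumPy purely for illustration; the package itself is Julia, and nothing here is the package's actual code):

```python
import numpy as np

# Hypothetical stand-in for the current storage: one contiguous,
# flat array of integer indices, one per one-hot column.
indices = np.array([2, 0, 3, 1])

W = np.arange(12.0).reshape(3, 4)  # a dense 3x4 weight matrix

# W times the one-hot matrix is just a column gather by the indices:
result = W[:, indices]

# Equivalent to materializing the one-hot matrix and multiplying:
onehot = np.eye(4)[:, indices]
assert np.array_equal(result, W @ onehot)
```

The point is that the gather needs only the flat index array; any storage scheme that scatters the indices across wrapper objects has to reassemble this array first.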
Ah sorry, I misread, or forgot that this is what #36 does. I think that, in any approach to writing a new, more flexible array type, you will probably need to store which axis is one-hot in the type, to make indexing fast and to use dispatch to call other operations, as you do here with e.g. onehotaxis. And once you do this, you will have much the same type-stability issues as in constructing a PermutedDimsArray.
After working on #36 and #35 and getting bogged down with the type inference, I'm trying the other route. I think it might also help with:
The concept: instead of a datatype holding the indices directly, keep an Array of OneHotVectors (essentially indices, but with type metadata) plus some metadata recording which axis is the one-hot one. This PR is a draft of the idea but is already functional for some things, and matmul was fast in the limited tests I made.
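To make the proposed layout concrete, here is a rough sketch of the idea (again in Python for illustration; the class names only loosely mirror the Julia types, the axis handling is simplified, and none of this is the PR's actual code):

```python
import numpy as np

class OneHotVector:
    """Loose stand-in for a OneHotVector: an index plus a length,
    i.e. an index carrying type metadata."""
    def __init__(self, index, length):
        self.index = index
        self.length = length

class OneHotArray:
    """Sketch of the proposed storage: an array of OneHotVectors plus
    metadata recording which axis is the one-hot one."""
    def __init__(self, vectors, onehot_axis=0):
        self.vectors = list(vectors)
        self.onehot_axis = onehot_axis

    def indices(self):
        # A flat integer index array can still be recovered on demand,
        # e.g. so that matmul can remain a column gather.
        return np.array([v.index for v in self.vectors])

W = np.arange(12.0).reshape(3, 4)
oh = OneHotArray(OneHotVector(i, 4) for i in (2, 0, 3))

# matmul against the one-hot array is still a column gather:
result = W[:, oh.indices()]
assert np.array_equal(result, W @ np.eye(4)[:, [2, 0, 3]])
```

The open performance question from the discussion above is whether recovering that flat index array from the wrapper objects can be made cheap enough, especially on GPU, compared with storing it contiguously in the first place.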
Your input is appreciated :)
PR Checklist