Assertion statements in attention implementation #264
Comments
Hi @dnnspark, thanks for your message! Replying to 1. first: can you walk me through the problem? I may be a little tired, but I don't see the issue right now. dim_k is defined as dim_key // num_heads (ok, the choice of letters is probably not great), so it looks like we're talking about the same thing.
ok, 2. now:
Let me know if this helps, I can definitely follow up on this assert.
Hi @blefaudeux, thanks for checking! For 1, because the input of the In this case, For 2, I see; that makes a lot of sense for self-attention. And agreed, it may need some adjustments to be used for the more general cross-attention use case. Thanks!
Aah yes, I see your point now: it implicitly assumes the same dimension everywhere, and that's bad. It can be fixed. I'm trying to get out of a CI quagmire and will submit a PR, or feel free to do that if you fancy it.
Oh, let me just volley a PR right now for 1. and this will be fixed. One sec.
Hey @blefaudeux, on second thought, I think you're right about this dimension issue. In my example above, there's a flaw:
It's actually not the same shape as the query input: (4, 24, 300). So I think the dimensions of all inputs (query, key, value) always have to be the same. In that case, the first assertion is actually correct, even though the name is a bit confusing (which makes your PR still legit).
An afterthought on my side is that this assert is not in the right place anyway, unless the projection conserves dimensions (I thought that was partly your point in your explanation, actually). We check the dimensions pre-projection, then project, then split into heads (which is where a dimension misfit would be visible), but one could imagine an initial misfit that is "fixed" by differentiated projections (not saying that this would be a good thing to do, but I believe it would work). I'll try to fix that in the PR.
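To make the ordering argument concrete, here is a minimal shape-only sketch in plain Python. It is not the xformers code: the helper names, the key shape, the common projection width, and the layout that folds heads into the batch axis are all assumptions made for illustration. Only the query shape (4, 24, 300) comes from this thread.

```python
def projected_shape(shape, out_features):
    """Shape after a linear projection over the last axis (hypothetical helper)."""
    batch, seq_len, _ = shape
    return (batch, seq_len, out_features)


def split_heads_shape(shape, num_heads):
    """Shape after splitting the last axis into heads (assumed layout: heads folded into batch)."""
    batch, seq_len, dim = shape
    # This is the point where a dimension misfit actually becomes visible.
    if dim % num_heads != 0:
        raise ValueError(f"dim {dim} is not divisible by num_heads {num_heads}")
    return (batch * num_heads, seq_len, dim // num_heads)


num_heads = 8
query_shape = (4, 24, 300)  # query shape quoted in the thread
key_shape = (4, 100, 512)   # assumed key shape, for illustration only
inner_dim = 256             # assumed common projection width

# A check on the raw input would reject this query, since 300 % 8 == 4.
pre_projection_ok = query_shape[2] % num_heads == 0

# Checking after the projection accepts both inputs.
q_heads = split_heads_shape(projected_shape(query_shape, inner_dim), num_heads)
k_heads = split_heads_shape(projected_shape(key_shape, inner_dim), num_heads)
```

In this sketch the raw query fails a divisibility check while the projected query passes it, which is the "misfit fixed by differentiated projections" case: the constraint only matters at the head split.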
It turns out that some of the checks were not correct (they were undue constraints); this is fixed with the attached PR.
@dnnspark I think that this is fixed with the PR which landed yesterday?
❓ Questions and Help
I'm trying to implement Perceiver using xformers, and stumbled upon two assertion statements.
The first one is this one: Doesn't this have to be t.shape[2] % self.dim_head == 0, to be consistent with the error message one line below?
The second one is this one: why does the query projection have to preserve the dimension? I'm trying to implement a cross-attention scenario in which query and key come from different sources (so they have different dimensions) and the linear projections make sure they end up with the same dimension (i.e. N x D_{query} -> N x d for query and M x D_{key} -> M x d for key). However, the assertion above enforces that the query projection preserves the dimension. What's the point of this assertion (btw, this assertion does not exist in the original torchtext implementation)? Or, what's the right way to implement this idea?
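For reference, the idea in the question can be sketched with nothing but the standard library. This is a single-head toy in plain Python with nested lists, not the xformers or torchtext API; the sizes, the random weights, and every helper name are assumptions made for illustration. It only shows that separate projections N x D_{query} -> N x d and M x D_{key} -> M x d are enough for the attention product to be well defined.

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both given as nested lists."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Small random matrix standing in for learned weights or inputs."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def cross_attention(query, context, w_q, w_k, w_v):
    """Single-head cross-attention; query and context may have different widths."""
    q = matmul(query, w_q)    # N x D_query -> N x d
    k = matmul(context, w_k)  # M x D_key   -> M x d
    v = matmul(context, w_v)  # M x D_key   -> M x d
    d = len(q[0])
    k_t = [list(col) for col in zip(*k)]  # d x M
    scores = matmul(q, k_t)               # N x M
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return weights, matmul(weights, v)    # N x M, N x d


rng = random.Random(0)
N, M, D_QUERY, D_KEY, D = 3, 5, 6, 10, 4  # assumed toy sizes
query = random_matrix(N, D_QUERY, rng)
context = random_matrix(M, D_KEY, rng)
w_q = random_matrix(D_QUERY, D, rng)
w_k = random_matrix(D_KEY, D, rng)
w_v = random_matrix(D_KEY, D, rng)

weights, out = cross_attention(query, context, w_q, w_k, w_v)
```

The only shape requirement in this sketch is that the projected query and key share the width d; the raw widths D_{query} and D_{key} never need to match.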