add convnext encoder, pytorch transformer decoder #162
Conversation
Thank you very much!
In fact, I just re-checked the formulas and corrected them, and added some data augmentation methods. For example, removing space control commands like \hspace and \vspace, because it is hard to measure the exact spacing.
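A minimal sketch of what stripping those spacing commands could look like (the regex and function name here are illustrative, not the PR's actual code; the pattern assumes no nested braces inside the length argument, which holds for typical lengths like {1em} or {0.5cm}):

```python
import re

# Matches \hspace{...}, \vspace{...}, and starred variants like \vspace*{...}.
SPACING_CMD = re.compile(r"\\[hv]space\*?\{[^{}]*\}")

def strip_spacing_commands(latex: str) -> str:
    """Remove hard-to-measure space control commands from a LaTeX formula."""
    cleaned = SPACING_CMD.sub("", latex)
    # Collapse any double spaces left behind by the removal.
    return re.sub(r"  +", " ", cleaned).strip()
```

For example, `strip_spacing_commands(r"a \hspace{1em} b")` returns `"a b"`, while formulas without spacing commands pass through unchanged.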
I'm wondering how this is even working without the proper positional embedding in the encoder (#130).
The positional embedding is designed for ViT; a CNN's feature extraction method is not like the Transformer's (though they may share some theory in certain aspects).
I know what you mean. But my understanding is that adding the positional information will stabilize the performance.
Well, I agree that adding position information may help performance, but I have no time to test it; it would be nice if you had time to do so. According to this research, though, position information added to a CNN can either help or hurt performance. It is true that the Transformer has a larger receptive field, but that doesn't mean its performance will be better. At least ConvNeXt (with convolution kernel size 7) outperforms ViT and even Swin Transformer according to their paper (link here). Actually, the reason for the Transformer's success is still controversial: some researchers attribute it to the global receptive field (or attention), while others attribute it to the architecture itself (ConvNeXt's architecture was designed following the Transformer's), or even to the patches (here is a paper that discusses this).
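For reference, one common way to give a flattened CNN feature map the position information discussed above is a fixed 2D sin-cos encoding, where half the channels encode the row index and half the column index. This is a framework-agnostic sketch with illustrative names, not code from this PR; in the actual model the resulting table would be a tensor added to the encoder output before it reaches the transformer decoder:

```python
import math

def sincos_position_row(pos: int, dim: int, temperature: float = 10000.0) -> list:
    """Standard 1D sinusoidal encoding for a single position (dim must be even)."""
    row = []
    for i in range(dim // 2):
        freq = pos / (temperature ** (2 * i / dim))
        row.extend([math.sin(freq), math.cos(freq)])
    return row

def positional_encoding_2d(h: int, w: int, dim: int) -> list:
    """Build an (h*w) x dim table: the first dim/2 channels encode the row
    index and the last dim/2 encode the column index, so every cell of the
    feature map gets a unique location signature after flattening."""
    assert dim % 4 == 0, "dim must be divisible by 4"
    table = []
    for y in range(h):
        for x in range(w):
            table.append(sincos_position_row(y, dim // 2)
                         + sincos_position_row(x, dim // 2))
    return table
```

Whether adding such an encoding on top of ConvNeXt features helps here is exactly the open question in this thread, since the convolutions already carry implicit position information through padding.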
Thank you for the input and the papers. I will have a look. |
Great! I am also curious about it, but I just don't have much time :)
I got higher scores on the dataset I built myself, but there seems to be not much improvement on the dataset you provided, so the main bottleneck may be the dataset itself. Anyway, I decided to open the PR to make the project stronger.