Proxito: improving our 404 handler #10061
Edit: Sorry about the blunt message, I was going out the door :) Would like to sync up on this! Edit 2: Let's clarify that since #9657 is ready to be merged very soon, it can either just be merged into the v2 refactor or adapted so that it doesn't complicate it. It's a choice we'll make on the call today, Feb 28.
I'm not 100% convinced of this change. We have a lot more 200 requests than 404s, and API calls are a lot slower than Python logic + DB queries. Also, API calls to the storage are paid 😄 . At first sight, this does not look like a good change, in my opinion. Are there other strong benefits to this change?
If we are not convinced about checking whether the file exists, we can still implement this for normal 404s to avoid hitting proxito twice. And since we are behind CF, 200s will be cached.
How is that? Can we expand on this? I'm not sure I follow what "this" refers to in this sentence.
@humitos we are using the custom nginx 404 handler for both normal 404s and 404s after hitting storage.
We can get rid of the one used for normal 404s and keep the one from the internal redirect. I still think it's worth simplifying both.
Gotcha! I understand what you are saying now. Thanks!

If we return a 404 the first time the request is managed by El Proxito, we already know it's going to be a 404; however, it will still be managed by our 404 handler a second time. I understand that we will be gaining some extra milliseconds here. The median full response is 23ms for this view (see https://onenr.io/0ERzYEk18wr), so the saving will be less than that, and it will apply only to 404 pages. I'm not seeing too much benefit here. (Note that this may also be reduced with #6321.)

On the other hand, it will force us to have different workflows for "404 that happens at the storage" versus "404 that happens at the Python code", making the code a little harder to follow and understand.

I'm still not convinced about this change because I'm not seeing enough benefit to justify doing this work at this time.
You need to take into consideration the 404 view handler (https://onenr.io/0OQMEq5ZxQG): that's 88.5ms, which puts a full 404 response at more than 100ms.
For me, this would simplify things, as we won't need to extract information like project/version/file in two places.
Right, but those 88.5ms include all the checks against S3 and other stuff, not only the call to the unresolver. The second call to "unresolve" is the only time we will be saving with this refactor.
This is an initial usage of the new proxito implementation to serve files. It's under a feature flag, so it can be enabled per-project.

- We still need to adapt our 404 handler to make use of this new implementation (it could be simplified with #10061).
- We are still using our re paths to match the URLs, but we don't make use of the captured parameters; we use the unresolver for that. This also means that things like #2292 won't be solved till we do the whole migration.
- There is a lot of repetition from the original get method, but some of it could be simplified with #10065.

Tests will pass once we have merged one private PR that we have pending.

> **Note**
> We don't support custom urlconfs yet, so we shouldn't enable it for those projects yet.
Currently, our 404 handling is done via nginx; that is: if there is a 404 in our doc serving logic, nginx will forward the request to our application again via the _proxito_404_ URL (see readthedocs.org/readthedocs/proxito/urls.py, lines 121 to 125 at 870e664).
This means that the proxito middleware will be run twice! And so will our "unresolver" logic (old or new implementation).
Why are we doing this? This was probably done for our internal nginx redirect that serves the documentation from S3: we don't check whether the file exists, we just serve it directly, and if it isn't found, we execute our 404 logic. Why are we doing this for non-internal redirects too? Probably to have our logic in just one place.
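For context, the pattern described above looks roughly like this in nginx (a hedged sketch, not our actual config; the upstream name `proxito` and the `@proxito_404` location are illustrative):

```nginx
# Serve docs through the application; when it (or the S3 internal
# redirect) produces a 404, nginx re-sends the request to the app's
# dedicated 404 URL -- so the app runs twice for every 404.
location / {
    proxy_pass http://proxito;
    proxy_intercept_errors on;
    error_page 404 = @proxito_404;
}

location @proxito_404 {
    # Second round trip into the application, via the _proxito_404_ URL.
    proxy_pass http://proxito/_proxito_404_$request_uri;
}
```

This is why the proxito middleware and unresolver run twice on every 404: the `error_page` directive turns one client request into two application requests.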
Instead of doing that, I propose handling everything in the current request, with no more double trips from nginx to our application:

- Handle the 404 for non-internal-redirect requests directly in our code.
- Before serving files via the internal redirect, check whether the file exists; this means doing a HEAD request to storage:
https://github.com/jschneier/django-storages/blob/6b7c78fe8ab57b4f27770db203d106da16bf35ea/storages/backends/s3boto3.py#L460-L464
This could increase our load a little, but it will decrease the load of 404 pages a LOT, since we won't be executing our logic twice for the same request. This leaves one edge case to cover: what happens if the file is deleted from storage after we have checked that it exists? I'm fine with just serving a plain 404 in that case; it shouldn't be that common.
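The proposed flow can be sketched like this (a minimal illustration, not the actual Read the Docs code; `FakeStorage` and `serve_docs` are hypothetical names, and in the real implementation the storage would be a django-storages S3 backend whose `exists()` performs a HEAD request against the bucket):

```python
class FakeStorage:
    """Stand-in for the S3 storage backend (illustrative only)."""

    def __init__(self, files):
        self._files = set(files)

    def exists(self, name):
        # The real backend issues a HEAD request to S3 here.
        return name in self._files


def serve_docs(storage, path):
    """Serve a docs file: internal redirect if it exists, 404 otherwise."""
    if storage.exists(path):
        # nginx picks up X-Accel-Redirect and streams the file from storage.
        return {"status": 200, "headers": {"X-Accel-Redirect": f"/proxito/{path}"}}
    # File is missing: run the 404 logic right here, in the same request,
    # instead of bouncing back through nginx a second time.
    return {"status": 404, "body": "contextualized 404 page"}


storage = FakeStorage({"en/latest/index.html"})
```

The key point is that both branches are decided in a single application request: the happy path costs one extra HEAD call, and the 404 path skips the second trip through the middleware and unresolver entirely.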
This together with the new proxito implementation will allow us to serve more contextualized 404s, like per subproject/translation/etc.