Proxito: improving our 404 handler #10061
Edit: Sorry about the blunt message, I was going out the door :) Would like to sync up on this! Edit 2: Let's clarify that since #9657 is ready to be merged very soon, it can either just be merged into the v2 refactor or adapted so that it doesn't complicate it. It's a choice we'll make on the call today, Feb 28.
I'm not 100% convinced of this change. We have a lot more 200 requests than 404s, and API calls are a lot slower than Python logic + DB queries. Also, API calls to the storage are paid 😄 . At first sight, this does not look like a good change, in my opinion. Are there other strong benefits to this change?
If we are not convinced about checking whether the file exists, we can still implement this for normal 404s to avoid hitting proxito twice. And since we are behind CF, 200s will be cached.
How is that? Can we expand on this? I'm not sure I follow what "this" refers to in this sentence.
@humitos we are using the custom nginx 404 handler for both normal 404s and 404s after hitting storage.
We can get rid of the one used for normal 404s and keep the one from the internal redirect. I still think it's worth simplifying both.
Gotcha! I understand what you are saying now. Thanks!

If we return a 404 the first time the request is managed by El Proxito, we already know it's going to be a 404; however, it will still be managed by our 404 handler a second time. I understand that we will be gaining some extra milliseconds here. The median full response is 23ms for this view (see https://onenr.io/0ERzYEk18wr), so the saving will be less than that, and it will apply only to 404 pages. I'm not seeing too much benefit here. (Note that this may also be reduced with #6321.)

On the other hand, it will force us to have different workflows for "404 that happens at the storage" versus "404 that happens at the Python code", making the code a little harder to follow and understand.

I'm still not convinced about this change because I'm not seeing enough benefit to justify doing this work at this time.
You need to take into consideration the 404 view handler (https://onenr.io/0OQMEq5ZxQG): that's 88.5ms, which puts a full 404 response at more than 100ms.
For me, this would simplify things, as we won't need to extract information like project/version/file in two places.
Right, but those 88.5ms include all the checks against S3 and other stuff, not only the call to the unresolver. The second call to "unresolve" is the only time we will be saving with this refactor.
This is an initial usage of the new proxito implementation to serve files. It's under a feature flag, so it can be enabled per-project.

- We still need to adapt our 404 handler to make use of this new implementation (it could be simplified with #10061).
- We are still using our re paths to match the URLs, but we don't make use of the captured parameters; we use the unresolver for that. This also means that things like #2292 won't be solved till we do the whole migration.
- There is a lot of repetition from the original get method, but some of it could be simplified with #10065.

Tests will pass once we have merged one private PR that we have pending.

> **Note**
> We don't support custom urlconfs yet, so we shouldn't enable it for those projects yet.
Currently, our 404 handling is done via nginx; that is: if there is a 404 in our doc serving logic, nginx will forward the request to our application again via the _proxito_404_ URL (see readthedocs.org/readthedocs/proxito/urls.py, lines 121 to 125 at 870e664).
This means that the proxito middleware will be run twice! And so will our "unresolver" logic (old or new implementation).
Why are we doing this? This was probably done for our internal nginx redirect that serves the documentation from S3: we don't check whether the file exists, we just serve it directly, and if it isn't found, we execute our 404 logic. Why are we doing this for non-internal redirects too? Probably to have our logic in just one place.
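For context, the pattern described above looks roughly like this in nginx (a hedged sketch, not our actual config; the upstream name `proxito` and the `@proxito_404` location are illustrative):

```nginx
# Serve docs through the application; when it (or the S3 internal
# redirect) produces a 404, nginx re-sends the request to the app's
# dedicated 404 URL -- so the app runs twice for every 404.
location / {
    proxy_pass http://proxito;
    proxy_intercept_errors on;
    error_page 404 = @proxito_404;
}

location @proxito_404 {
    # Second round trip into the application, via the _proxito_404_ URL.
    proxy_pass http://proxito/_proxito_404_$request_uri;
}
```

This is why the proxito middleware and unresolver run twice on every 404: the `error_page` directive turns one client request into two application requests.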
Instead of doing that, I propose handling everything in the current request, with no more double trips from nginx to our application:

- Handle the 404 for non-internal-redirect requests directly in our code.
- Before serving files via the internal redirect, check whether the file exists; this means doing a HEAD request to storage:
https://github.com/jschneier/django-storages/blob/6b7c78fe8ab57b4f27770db203d106da16bf35ea/storages/backends/s3boto3.py#L460-L464
This could increase our load a little, but it will decrease the load of 404 pages a LOT, since we won't be executing our logic twice for the same request. This leaves one edge case to cover: what happens if the file is deleted from storage after we have checked that it exists? I'm fine with just serving a plain 404 in that case; it shouldn't be that common.
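The proposed flow can be sketched like this (a minimal illustration, not the actual Read the Docs code; `FakeStorage` and `serve_docs` are hypothetical names, and in the real implementation the storage would be a django-storages S3 backend whose `exists()` performs a HEAD request against the bucket):

```python
class FakeStorage:
    """Stand-in for the S3 storage backend (illustrative only)."""

    def __init__(self, files):
        self._files = set(files)

    def exists(self, name):
        # The real backend issues a HEAD request to S3 here.
        return name in self._files


def serve_docs(storage, path):
    """Serve a docs file: internal redirect if it exists, 404 otherwise."""
    if storage.exists(path):
        # nginx picks up X-Accel-Redirect and streams the file from storage.
        return {"status": 200, "headers": {"X-Accel-Redirect": f"/proxito/{path}"}}
    # File is missing: run the 404 logic right here, in the same request,
    # instead of bouncing back through nginx a second time.
    return {"status": 404, "body": "contextualized 404 page"}


storage = FakeStorage({"en/latest/index.html"})
```

The key point is that both branches are decided in a single application request: the happy path costs one extra HEAD call, and the 404 path skips the second trip through the middleware and unresolver entirely.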
This together with the new proxito implementation will allow us to serve more contextualized 404s, like per subproject/translation/etc.