Proxito 404: check for files in the DB instead of storage #10512
Comments
I understand that we don't need files like If I'm correct here, I think it's a great idea what you are proposing. By the way, we are getting rid of
This also looks like a good idea to me. It simplifies the code and makes everything more standardized, in my opinion.
Yeah, we don't need to know if those files exist: if we are in the 404 handler, it means the file wasn't found in storage.
We talked about this in a call, and we're 👍 on moving forward. Big wins:
So, now one question, what do we want to do first?
@stsewd I think the DB call work seems useful to start with, assuming the size of the table doesn't make it slow :)
We have all the information we need in the DB already. Closes #10512
What's the problem this feature will solve?
Our 404 handler is slooooow. This is mainly because we hit our storage to check if a file exists;
in the worst case, we make 6 calls to storage.
Describe the solution you'd like
Instead of relying on calls to storage to check if a given file exists, we can use our DB for that. How's that? Well, as part of our build process, we create an HTMLFile object for each HTML file found in the user docs.
readthedocs.org/readthedocs/projects/tasks/search.py
Lines 138 to 147 in f7d390c
So we already keep track of every HTML file. This table is BIG, so we may want to trim it a little first. I don't think we need to keep track of every HTML file; we just need to track index.html/README.html and 404.html files.
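To make the idea concrete, here is a minimal sketch of replacing several storage checks with one DB query. It uses an in-memory SQLite table as a stand-in; the table name, columns, and the `existing_paths` helper are all hypothetical and not the actual HTMLFile model or Proxito code.

```python
import sqlite3

# Hypothetical stand-in for the HTMLFile table: one row per tracked HTML file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE htmlfile (project TEXT, version TEXT, path TEXT)")
conn.executemany(
    "INSERT INTO htmlfile VALUES (?, ?, ?)",
    [
        ("docs", "latest", "index.html"),
        ("docs", "latest", "guides/index.html"),
        ("docs", "latest", "404.html"),
    ],
)


def existing_paths(conn, project, version, candidates):
    """Return the subset of candidate paths that exist, using a single query."""
    candidates = list(candidates)
    if not candidates:
        return set()
    # One placeholder per candidate so the values are passed as parameters.
    placeholders = ", ".join("?" for _ in candidates)
    rows = conn.execute(
        "SELECT path FROM htmlfile "
        f"WHERE project = ? AND version = ? AND path IN ({placeholders})",
        [project, version, *candidates],
    )
    return {path for (path,) in rows}


# All candidate files for a 404 are checked in one round trip.
found = existing_paths(
    conn,
    "docs",
    "latest",
    ["guides/index.html", "guides/README.html", "404.html"],
)
```

The point of the sketch is only that every candidate path can be resolved in one query, rather than one storage call per candidate.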
What about search?
We rely heavily on the HTMLFile model for search, but I think we can do just fine without it. When indexing, we can just walk the storage, as we do when creating the HTML files:
readthedocs.org/readthedocs/projects/tasks/search.py
Lines 107 to 110 in f7d390c
And we can get the search ignore/ranking patterns from the config of the build object attached to the version.
readthedocs.org/readthedocs/builds/models.py
Lines 310 to 327 in 84f889a
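A rough sketch of what indexing without HTMLFile could look like: walk the files and apply the ignore/ranking patterns per path. This assumes a local directory as a stand-in for storage, an ignore setting shaped as a list of glob patterns, and a ranking setting shaped as a pattern-to-rank mapping. The function names are hypothetical, and the real matching rules may differ.

```python
import os
from fnmatch import fnmatchcase


def iter_html_files(root):
    """Yield .html file paths under root, relative to root, with '/' separators."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.endswith(".html"):
                full = os.path.join(dirpath, name)
                yield os.path.relpath(full, root).replace(os.sep, "/")


def search_settings(path, ignore_patterns, ranking_patterns):
    """Return (ignored, rank) for a path; the first matching ranking pattern wins."""
    ignored = any(fnmatchcase(path, pattern) for pattern in ignore_patterns)
    rank = 0  # assumed default when no ranking pattern matches
    for pattern, value in ranking_patterns.items():
        if fnmatchcase(path, pattern):
            rank = value
            break
    return ignored, rank
```

With something like this, the indexing task would compute the ignore flag and rank on the fly for each file it finds, instead of reading them from HTMLFile rows.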
If we want, we can also skip that work and just hit that table as is right now.
Additional context
A little related to #10061, where I talk about not hitting our application twice for 404s that aren't from the storage.