-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Backport] fix: Prevent error page recursion (#35209) #35259
[Backport] fix: Prevent error page recursion (#35209) #35259
Conversation
We sometimes see rendering errors in the error page itself, which then cause another attempt at rendering the error page. I'm not sure _exactly_ how the loop is occurring, but it looks something like this: 1. An error is raised in a view or middleware and is not caught by application code 2. Django catches the error and calls the registered uncaught error handler 3. Our handler tries to render an error page 4. The rendering code raises an error 5. GOTO 2 (until some sort of server limit is reached) By catching all errors raised during error-page render and substituting in a hardcoded string, we can reduce server resources, avoid logging massive sequences of recursive stack traces, and still give the user *some* indication that yes, there was a problem. This should help address openedx#35151 At least one of these rendering errors is known to be due to a translation error. There's a separate issue for restoring translation quality so that we avoid those issues in the future (openedx/openedx-translations#549) but in general we should catch all rendering errors, including unknown ones. Testing: - In `lms/envs/devstack.py` change `DEBUG` to `False` to ensure that the usual error page is displayed (rather than the debug error page). - Add line `1/0` to the top of the `student_dashboard` function in `common/djangoapps/student/views/dashboard.py` to make that view error. - In `lms/templates/static_templates/server-error.html` replace `static.get_platform_name()` with `None * 7` to make the error template itself produce an error. - Visit <http://localhost:18000/dashboard>. Without the fix, the response takes 10 seconds and produces a 6 MB, 85k line set of stack traces and the page displays "A server error occurred. Please contact the administrator." With the fix, the response takes less than a second and produces three stack traces (one of which contains the error page's rendering error).
Thanks for the pull request, @cmltaWt0! What's next?Please work through the following steps to get your changes ready for engineering review: 🔘 Get product approvalIf you haven't already, check this list to see if your contribution needs to go through the product review process.
🔘 Provide contextTo help your reviewers and other members of the community understand the purpose and larger context of your changes, feel free to add as much of the following information to the PR description as you can:
🔘 Get a green buildIf one or more checks are failing, continue working on your changes until this is no longer the case and your build turns green. 🔘 Let us know that your PR is ready for review:Who will review my changes?This repository is currently maintained by Where can I find more information?If you'd like to get more details on all aspects of the review process for open source pull requests (OSPRs), check out the following resources:
When can I expect my changes to be merged?Our goal is to get community contributions seen and reviewed as efficiently as possible. However, the amount of time that it takes to review and merge a PR can vary significantly based on factors such as:
💡 As a result it may take up to several weeks or months to complete a review and merge your PR. |
3d50dd8
into
openedx:open-release/redwood.master
@cmltaWt0 🎉 Your pull request was merged! Please take a moment to answer a two question survey so we can improve your experience in the future. |
@cmltaWt0 thank you! I think this error has been going on for at least 5 years without an easy way to reproduce or fix. Thanks for fixing it! |
This is a backport of #35209
Fix details
We sometimes see rendering errors in the error page itself, which then cause another attempt at rendering the error page. I'm not sure exactly how the loop is occurring, but it looks something like this:
By catching all errors raised during error-page render and substituting in a hardcoded string, we can reduce server resources, avoid logging massive sequences of recursive stack traces, and still give the user some indication that yes, there was a problem.
This should help address #35151
At least one of these rendering errors is known to be due to a translation error. There's a separate issue for restoring translation quality so that we avoid those issues in the future (openedx/openedx-translations#549) but in general we should catch all rendering errors, including unknown ones.
Testing:
lms/envs/devstack.py
changeDEBUG
toFalse
to ensure that the usual error page is displayed (rather than the debug error page).1/0
to the top of thestudent_dashboard
function incommon/djangoapps/student/views/dashboard.py
to make that view error.lms/templates/static_templates/server-error.html
replacestatic.get_platform_name()
withNone * 7
to make the error template itself produce an error.Without the fix, the response takes 10 seconds and produces a 6 MB, 85k line set of stack traces and the page displays "A server error occurred. Please contact the administrator."
With the fix, the response takes less than a second and produces three stack traces (one of which contains the error page's rendering error).
Description
Describe what this pull request changes, and why. Include implications for people using this change.
Design decisions and their rationales should be documented in the repo (docstring / ADR), per
OEP-19, and can be
linked here.
Useful information to include:
"Developer", and "Operator".
changes.
Supporting information
Link to other information about the change, such as Jira issues, GitHub issues, or Discourse discussions.
Be sure to check they are publicly readable, or if not, repeat the information here.
Testing instructions
Please provide detailed step-by-step instructions for testing this change.
Deadline
"None" if there's no rush, or provide a specific date or event (and reason) if there is one.
Other information
Include anything else that will help reviewers and consumers understand the change.