-
-
Notifications
You must be signed in to change notification settings - Fork 7.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deal with collation-related index corruption #14860
Deal with collation-related index corruption #14860
Conversation
f9ab8d3
to
2da9b04
Compare
The script should be finished, although it may use some testing and maybe some optimization. The documentation has been started in mastodon/documentation#811 |
9614753
to
928adc1
Compare
928adc1
to
37b6caa
Compare
Changed the PR to leave the default database configuration as-is, because the Similarly removed the recommendation to re-create the database with the I have left the warning in the post-migration hook, since I think it is important to draw attention to the issue, but I am not sure it's a good idea since it would make it noisy for little value beyond the first time it is run. |
22f8198
to
c6bfaec
Compare
c6bfaec
to
80571a7
Compare
ping @Gargron |
@@ -48,6 +48,17 @@ namespace :db do | |||
end | |||
end | |||
|
|||
task :post_migration_hook do | |||
at_exit do | |||
unless %w(C POSIX).include?(ActiveRecord::Base.connection.execute('SELECT datcollate FROM pg_database WHERE datname = current_database();').first['datcollate']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, so this would throw this warning on my server every time I run db:migrate, even though I have not updated glibc after indexing. This will probably scare a lot of people. We need to come up with something that only warns you if you actually have an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, I think the only reliable way to check that there is actually an issue is to use amcheck
. However, there are two issues with that:
- it will impact performances for a few seconds to a few minutes
- it requires superuser access to the database, which Mastodon typically doesn't have
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could remove that altogether and instead just rely on the documentation and release notes… maybe catch some specific migration errors that are likely to be caused by this… but I'm afraid those solutions would not be enough to convince people to look into silent corruptions…
@prompt.warn 'Please make sure to stop Mastodon and have a backup.' | ||
exit(1) unless @prompt.yes?('Continue?') | ||
|
||
deduplicate_accounts! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of our CLI-related classes are like this, but here I'm starting to wish we could extract things into their own classes and then just call them, e.g. AccountDeduplicator
, CustomEmojiDeduplicator
, etc. Don't think I'm going to ask you to do it right now though. Bonus of that would be each deduplicator class maintaining its own model stubs so you know exactly what needs what.
This tool goes through the database to detect and fix duplicates. This operation is very slow and may cause data loss (of data that would be inaccessible without intervention because of the existing index corruptions). It tries its best to make sensible decisions, and asks the user in some cases.
Avoids followers hash cache being incorrect, among other things
80571a7
to
a94e394
Compare
Starting up my instance now displays the warning of unsafe database storage, or whatever the wording is. It instructs me to go to the following link, which leads to a 404 error https://docs.joinmastodon.org/admin/troubleshooting/index-corruption/#am-i-affected What is the proper command to run (in docker) to fix the issue. My server no longer loads, throwing up 502 errors. |
That's very quick. Well. The warning is just a warning, and I did say to @ThibG it should not be too scary. The warning doesn't do anything, it definitely isn't the reason why you're getting 502 errors, you probably didn't restart everything (in any case there is no need to guess just look at what the logs say) |
Is this error related? docker-compose run --rm web bin/tootctl feeds build |
This would be the reason your server doesn't start and it looks like actual filesystem-level corruption (unrelated to index corruption and the warning generated by this PR). Judging by:
Something is wrong with your disks -- maybe full, maybe broken, it doesn't sound good. |
I just want to make sure I don’t break anything - how would I run to command to fix the database (command found in the link) while using Docker to run my instance? - thanks! |
* Add tootctl maintenance fix-duplicates This tool goes through the database to detect and fix duplicates. This operation is very slow and may cause data loss (of data that would be inaccessible without intervention because of the existing index corruptions). It tries its best to make sensible decisions, and asks the user in some cases. * Add warning message in db:migrate hook * Clear Rails cache after being done with database deduplication Avoids followers hash cache being incorrect, among other things
* Fix 2FA/sign-in token sessions being valid after password change (#14802) If someone tries logging in to an account and is prompted for a 2FA code or sign-in token, even if the account's password or e-mail is updated in the meantime, the session will show the prompt and allow the login process to complete with a valid 2FA code or sign-in token * Fix Move handler not being triggered when failing to fetch target (#15107) When failing to fetch the target account, the ProcessingWorker fails as expected, but since it hasn't cleared the `move_in_progress` flag, the next attempt at processing skips the `Move` activity altogether. This commit changes it to clear the flag when encountering any unexpected error on fetching the target account. This is likely to occur because, of, e.g., a timeout, when many instances query the same actor at the same time. * Fix slow distinct queries where grouped queries are faster (#15287) About 2x speed-up on inboxes query * Fix possible inconsistencies in tag search (#14906) Do not downcase the queried tag before passing it to postgres when searching: - tags are not downcased on creation - `arel_table[:name].lower.matches(pattern)` generates an ILIKE anyway - if Postgres and Rails happen to use different case-folding rules, downcasing before query but not before insertion may mean that some tags with some casings are not searchable * Fix updating account counters when account_stat is not yet created (#15108) * Fix account processing failing because of large collections (#15027) Fixes #15025 * Fix downloading remote media files when server returns empty filename (#14867) Fixes #14817 * Fix webfinger redirect handling in ResolveAccountService (#15187) * Fix webfinger redirect handling in ResolveAccountService ResolveAccountService#process_webfinger! handled a one-step webfinger redirection, but only accepting the result if it matched the exact URI passed as input, defeating the point of a redirection check. Instead, use the same logic as in `ActivityPub::FetchRemoteAccountService`, updating the resulting `acct:` URI with the result of the first webfinger query. * Add tests * Remove dependency on unused and unmaintained http_parser.rb gem (#14574) It seems that years ago, the “http” gem dependend on the “http_parser.rb” gem (it now depends on the “http-parser” gem), and, still years ago, we pulled it from git in order to benefit from a bugfix that wasn't released yet (#7467). * Add tootctl maintenance fix-duplicates (#14860, #15201, #15264, #15349, #15359) * Fix old migration script not being able to run if it fails midway (#15361) * Fix old migration script not being able to run if it fails midway Improve the robustness of a migration script likely to fail because of database corruption so it can run again once database corruptions are fixed. * Display a specific error message in case of index corruption Co-authored-by: Eugen Rochko <[email protected]> Co-authored-by: Claire <[email protected]> Co-authored-by: Eugen Rochko <[email protected]> Co-authored-by: Claire <[email protected]>
* Fix 2FA/sign-in token sessions being valid after password change (mastodon#14802) If someone tries logging in to an account and is prompted for a 2FA code or sign-in token, even if the account's password or e-mail is updated in the meantime, the session will show the prompt and allow the login process to complete with a valid 2FA code or sign-in token * Fix Move handler not being triggered when failing to fetch target (mastodon#15107) When failing to fetch the target account, the ProcessingWorker fails as expected, but since it hasn't cleared the `move_in_progress` flag, the next attempt at processing skips the `Move` activity altogether. This commit changes it to clear the flag when encountering any unexpected error on fetching the target account. This is likely to occur because, of, e.g., a timeout, when many instances query the same actor at the same time. * Fix slow distinct queries where grouped queries are faster (mastodon#15287) About 2x speed-up on inboxes query * Fix possible inconsistencies in tag search (mastodon#14906) Do not downcase the queried tag before passing it to postgres when searching: - tags are not downcased on creation - `arel_table[:name].lower.matches(pattern)` generates an ILIKE anyway - if Postgres and Rails happen to use different case-folding rules, downcasing before query but not before insertion may mean that some tags with some casings are not searchable * Fix updating account counters when account_stat is not yet created (mastodon#15108) * Fix account processing failing because of large collections (mastodon#15027) Fixes mastodon#15025 * Fix downloading remote media files when server returns empty filename (mastodon#14867) Fixes mastodon#14817 * Fix webfinger redirect handling in ResolveAccountService (mastodon#15187) * Fix webfinger redirect handling in ResolveAccountService ResolveAccountService#process_webfinger! handled a one-step webfinger redirection, but only accepting the result if it matched the exact URI passed as input, defeating the point of a redirection check. Instead, use the same logic as in `ActivityPub::FetchRemoteAccountService`, updating the resulting `acct:` URI with the result of the first webfinger query. * Add tests * Remove dependency on unused and unmaintained http_parser.rb gem (mastodon#14574) It seems that years ago, the “http” gem dependend on the “http_parser.rb” gem (it now depends on the “http-parser” gem), and, still years ago, we pulled it from git in order to benefit from a bugfix that wasn't released yet (mastodon#7467). * Add tootctl maintenance fix-duplicates (mastodon#14860, mastodon#15201, mastodon#15264, mastodon#15349, mastodon#15359) * Fix old migration script not being able to run if it fails midway (mastodon#15361) * Fix old migration script not being able to run if it fails midway Improve the robustness of a migration script likely to fail because of database corruption so it can run again once database corruptions are fixed. * Display a specific error message in case of index corruption Co-authored-by: Eugen Rochko <[email protected]> Co-authored-by: Claire <[email protected]> Co-authored-by: Eugen Rochko <[email protected]> Co-authored-by: Claire <[email protected]>
A number of Mastodon instances have corrupted database indexes because of changing collation rules.
Such issues are most of the time discovered when trying to run some migrations (e.g., #14443) but can remain unnoticed for a long time, and cause issues that can be difficult to diagnose and repair.
To finally deal with that, we should:
C
collation rules, which do not get updated and do not suffer from those issuesdb:migrate
to display a warning if the database could be affected by the issue