fix(dict): Remove only corrections if a space could be inserted as well
The typo dictionary words.csv previously contained a bunch of problematic entries such as:

    abouta,about
    algorithmi,algorithm
    attachen,attach
    shouldbe,should
    anumber,number

These resulted in wrong automatic corrections whenever the following spaces (indicated by ␣) were accidentally omitted:

    about␣a
    algorithm␣i developed
    attach␣en masse
    should␣be
    a␣number

Many of these entries were introduced by taking entries from the codespell-dict and removing corrections containing spaces (since typos currently doesn't support them), e.g. the codespell dictionary contains:

    abouta->about a, about,
    shouldbe->should, should be,

This commit updates `tests/verify.rs` to automatically remove corrections of the form `{correction}{common_word},{correction}` or `{common_word}{correction},{correction}`, where `{common_word}` is one of the 1000 most frequent English words. The exception is when `{correction}` itself also ends (or, respectively, starts) with `{common_word}`, since we still want to correct e.g. "extrememe" to "extreme".

The top-1000-most-frequent-words.csv file was generated by running:

    curl https://norvig.com/ngrams/count_1w.txt \
      | head -n1024 \
      | awk '{print $1;}' \
      | grep -vE '^([^ia]|al|re)$' \
      > top-1000-most-frequent-words.csv
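For illustration, the removal rule described above could be sketched roughly as follows. This is not the actual code in `tests/verify.rs`: the function names `is_missing_space` and `retain_real_typos` are hypothetical, each entry is assumed to have a single correction, and the common-word list is assumed to be already loaded into a set.

```rust
use std::collections::HashSet;

/// Hypothetical helper: returns true if `typo` is just `correction` with a
/// common word glued to its end or start, i.e. the "typo" may really be two
/// words with a missing space.
fn is_missing_space(typo: &str, correction: &str, common_words: &HashSet<&str>) -> bool {
    // Form `{correction}{common_word}`, e.g. "abouta" = "about" + "a".
    // Exception: keep the entry if the correction already ends in that word,
    // e.g. "extrememe" -> "extreme" (ends in "me").
    if let Some(suffix) = typo.strip_prefix(correction) {
        if common_words.contains(suffix) && !correction.ends_with(suffix) {
            return true;
        }
    }
    // Form `{common_word}{correction}`, e.g. "anumber" = "a" + "number".
    // Exception: keep the entry if the correction already starts with that word.
    if let Some(prefix) = typo.strip_suffix(correction) {
        if common_words.contains(prefix) && !correction.starts_with(prefix) {
            return true;
        }
    }
    false
}

/// Hypothetical helper: drops every `(typo, correction)` pair that matches
/// the missing-space pattern and keeps the rest in their original order.
fn retain_real_typos<'a>(
    entries: &[(&'a str, &'a str)],
    common_words: &HashSet<&str>,
) -> Vec<(&'a str, &'a str)> {
    entries
        .iter()
        .copied()
        .filter(|&(typo, correction)| !is_missing_space(typo, correction, common_words))
        .collect()
}
```

With a common-word set containing "a", "be" and "me", this sketch would drop `abouta,about` and `shouldbe,should` but keep `extrememe,extreme`, because "extreme" itself ends in "me".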