Erroneous "Average Word Size Too High" error #2335

Closed
cryptaliagy opened this issue Nov 10, 2022 · 2 comments · Fixed by #2336 or #2331

Comments


cryptaliagy commented Nov 10, 2022

I was trying to use cSpell while writing a markdown file, but cSpell reported an error saying "Average Word Size is Too High". The file is mostly regular text (currently ~600 words by cSpell's definition of a word) with a few links. At first I thought the links might be the problem, since the longest ones could be skewing the average, but removing a handful of them did not help (though it turns out that was because I had to restart VSCode for the setting to take effect).

I wrote this little script to check the average word size and make sure it wasn't too high:

import re
import argparse
import sys
import io
import typing


class ParsedArgs(typing.Protocol):
    file: io.TextIOWrapper
    regex: str


def make_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "file",
        type=argparse.FileType("r"),
        help="input file",
    )
    parser.add_argument(
        "-r",
        "--regex",
        default=r"[\s,{}[\]]",
    )
    return parser


def main(args: list[str]):
    parser = make_parser()
    parsed_args: ParsedArgs = parser.parse_args(args)

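    # argparse.FileType has already opened the file; read it all, then close it.
    with parsed_args.file as file:
        data = file.read()

    # Split on the separator pattern and drop the empty strings that
    # consecutive separators leave behind.
    words = [word for word in re.split(parsed_args.regex, data) if word]

    # Avoid dividing by zero when the file contains no words.
    if not words:
        print("No words found")
        return

    print(
        "Average word size:",
        sum(len(word) for word in words) / len(words),
    )
    print(
        "Total word count:",
        len(words),
    )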


if __name__ == "__main__":
    main(sys.argv[1:])

Which produced the output:

Average word size:  7.871794871794871
Total word count:  585

That is well under the average-size blocking threshold. However, I noticed that the "More Info..." button linked to the cSpell.blockCheckingWhenTextChunkSizeGreaterThan setting, so I added a check for the maximum word size and found that one of the links was being recognized as a single word of length 631.
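
The max-size check itself is not shown in this issue. A minimal sketch of what such a check might look like follows, reusing the script's default separator pattern. The function name `longest_chunk` and the sample text are made up for illustration, and the split only approximates how cSpell actually breaks text into chunks.

```python
import re

# Same separator pattern as the script's default --regex value.
SEPARATORS = r"[\s,{}[\]]"


def longest_chunk(text: str, pattern: str = SEPARATORS) -> str:
    """Return the longest non-empty piece of text between separators."""
    chunks = [chunk for chunk in re.split(pattern, text) if chunk]
    # default="" avoids a ValueError when the text has no chunks at all.
    return max(chunks, key=len, default="")


if __name__ == "__main__":
    # Hypothetical sample: short words plus one very long URL-like token.
    sample = "see https://example.com/" + "a" * 600 + " for details"
    longest = longest_chunk(sample)
    print("Max word size:", len(longest))
    print("Starts with:", longest[:40])
```

Printing the start of the longest chunk makes it easier to find the offending link in the file, which the average alone cannot reveal.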

Updating cSpell.blockCheckingWhenTextChunkSizeGreaterThan to 700 solved my problem, but it took me a lot longer than it needed to because the error message sent me down the wrong path.

I'd also recommend raising that setting in general to accommodate long URLs. The general guidance I've seen in the past (such as in this post) is that URLs should be below ~2000 characters, so maybe raising it to even half that as a default could be useful?

Collaborator

Jason3S commented Nov 10, 2022

@taliamax,

Thank you for investigating the issue. I'm sorry that a copy/paste error took so much of your time.

@github-actions
Contributor

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 11, 2022