Erroneous "Average Word Size Too High" error #2335

cryptaliagy · 2022-11-10T14:24:46Z

I was trying to use cSpell while writing a Markdown file, but cSpell returned an error saying "Average Word Size is Too High". The file is mostly regular text (currently ~600 words by cSpell's definition of a word) with a few links. Originally, I thought the links might be the problem (the longest ones could be skewing the average), but removing a handful of them did not help (though it turns out that was because I had to restart VSCode for the setting to take effect).

I wrote this little script to check the average word size and make sure it wasn't too high:

import re
import argparse
import sys
import io
import typing


class ParsedArgs(typing.Protocol):
    file: io.TextIOWrapper
    regex: str


def make_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "file",
        type=argparse.FileType("r"),
        help="input file",
    )
    parser.add_argument(
        "-r",
        "--regex",
        default=r"[\s,{}[\]]",
    )
    return parser


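def main(args: list[str]):
    parser = make_parser()
    parsed_args: ParsedArgs = parser.parse_args(args)

    # Read the whole file; the context manager closes it afterwards.
    with parsed_args.file as file:
        data = file.read()

    # Split on the separator pattern and drop the empty strings it leaves behind.
    words = [word for word in re.split(parsed_args.regex, data) if word]

    # Avoid dividing by zero on an empty (or separator-only) file.
    if not words:
        print("No words found")
        return

    print(
        "Average word size:",
        sum(len(word) for word in words) / len(words),
    )
    print(
        "Total word count:",
        len(words),
    )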


if __name__ == "__main__":
    main(sys.argv[1:])

This produced the output:

Average word size:  7.871794871794871
Total word count:  585

That is well under the limit for average word size. However, I noticed that the "More Info..." button linked to the cSpell.blockCheckingWhenTextChunkSizeGreaterThan setting, so I added a check for the maximum word size and found that one of the links was being treated as a single word of length 631.
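For illustration, a maximum-word-size check along those lines might look like the sketch below. It is not the exact code used here: the `longest_word` helper and the sample text are hypothetical, and it assumes the same default separator pattern as the script above, which only approximates how cSpell itself splits text into chunks.

```python
import re

# Same default separator pattern as the script above (an approximation,
# not cSpell's own tokenizer).
DEFAULT_SPLIT = r"[\s,{}[\]]"


def longest_word(text: str, pattern: str = DEFAULT_SPLIT) -> tuple[str, int]:
    """Return the longest token and its length ("" and 0 if there are none)."""
    words = [word for word in re.split(pattern, text) if word]
    longest = max(words, key=len, default="")
    return longest, len(longest)


if __name__ == "__main__":
    # Hypothetical sample: a short sentence containing one long link.
    sample = "See [docs](https://example.com/" + "a" * 40 + ") for details."
    word, size = longest_word(sample)
    print("Max word size:", size)
    print("Longest word:", word)
```

Comparing the reported size against the configured cSpell.blockCheckingWhenTextChunkSizeGreaterThan value shows whether a single long token, such as a URL, is what triggers the block.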

Updating cSpell.blockCheckingWhenTextChunkSizeGreaterThan to 700 solved my problem, but it took me a lot longer than it needed to because the error message sent me down the wrong path.

I'd also recommend raising that setting in general to accommodate long URLs. The general guidance I've seen in the past (such as in this post) is that URLs should be below ~2000 characters, so maybe raising it to even half that as a default could be useful?

Jason3S · 2022-11-10T17:44:38Z

@taliamax,

Thank you for investigating the issue. I'm sorry that a copy/paste error took so much of your time.

github-actions · 2022-12-11T05:33:11Z

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

Jason3S added a commit that referenced this issue Nov 10, 2022

fix: Correct error message for Maximum Word Length Exceeded

37da14e

fix: #2335

Jason3S mentioned this issue Nov 10, 2022

fix: Correct error message for Maximum Word Length Exceeded #2336

Merged

Jason3S closed this as completed in #2336 Nov 10, 2022

Jason3S added a commit that referenced this issue Nov 10, 2022

fix: Correct error message for Maximum Word Length Exceeded (#2336)

d4ce7fc

fix: #2335

github-actions bot mentioned this issue Nov 10, 2022

chore(main): release code-spell-checker 2.11.1 #2331

Merged

github-actions bot locked as resolved and limited conversation to collaborators Dec 11, 2022
