utils.full_process executed when processor=None #319

sdennler · 2021-08-01T09:14:51Z

Great and very helpful tool! Thank you!

I noticed that even when process.extractOne (and the related functions) is called with processor=None, utils.full_process is still executed several times, probably because of this line:

fuzzywuzzy/fuzzywuzzy/process.py

Line 100 in 8895162

pre_processor = partial(utils.full_process, force_ascii=True)

The following snippet produces the same output twice:

from fuzzywuzzy import process

query = "123   ....  "
choices = ["123", query]

print(process.extract(query, choices))
print(process.extract(query, choices, processor=None))

Output:

[('123', 100), ('123   ....  ', 100)]
[('123', 100), ('123   ....  ', 100)]

I would expect the exact 1:1 match to rank higher when no processor is used, so something like this:

[('123', 100), ('123   ....  ', 100)]
[('123   ....  ', 100), ('123', 90)]


maxbachmann · 2021-08-04T07:46:56Z

In fuzzywuzzy, the processor argument only lets you apply additional preprocessing. It does not provide a way to disable the preprocessing that happens inside the scorer. So when calling

process.extract(query, choices, processor=None)

the strings are still preprocessed, since the default scorer, fuzz.WRatio, preprocesses its inputs by default. To disable this, you would have to use:

process.extract(query, choices, processor=None, scorer=partial(fuzz.WRatio, full_process=False))

I agree that this is very counter-intuitive, which is why I implemented the behavior you expected in RapidFuzz.
