This repository has been archived by the owner on Aug 26, 2024. It is now read-only.

Installing python-Levenshtein as suggested by the warnings gives different results. #318

Open
JeremyThiesen opened this issue Jul 22, 2021 · 1 comment

Comments

@JeremyThiesen

JeremyThiesen commented Jul 22, 2021

I was running this code:

from fuzzywuzzy import fuzz
partial_ratio = fuzz.partial_ratio('more than fifty', 'i know that because a lion run fifty mile per hour and a cheetah run about eighty mile per hour and sixty-five be more than fifty and be slow than eighty')
print(partial_ratio)

With fuzzywuzzy version 0.18.0, it prints 100. It also emits the following user warning.

UserWarning: Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning
  warnings.warn('Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning')

After installing python-Levenshtein version 0.12.2, the preceding code block returns 87 instead, which is incorrect since the shorter string is an exact substring of the longer one.
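
To illustrate why 100 is the expected answer, here is a minimal sketch of a difflib-based partial ratio using only the standard library. The function name `partial_ratio_sketch` is hypothetical, and the alignment logic is an assumption about how the pure-Python fallback behaves, not a copy of the library's code.

```python
from difflib import SequenceMatcher


def partial_ratio_sketch(s1, s2):
    """Hypothetical stand-in for a difflib-based partial ratio (0-100)."""
    shorter, longer = (s1, s2) if len(s1) <= len(s2) else (s2, s1)

    # Each matching block is (a, b, size): position in shorter, position in longer.
    blocks = SequenceMatcher(None, shorter, longer).get_matching_blocks()

    scores = []
    for block in blocks:
        # Align a window of len(shorter) in the longer string on this block.
        start = max(block.b - block.a, 0)
        window = longer[start:start + len(shorter)]
        ratio = SequenceMatcher(None, shorter, window).ratio()
        if ratio > 0.995:
            return 100  # effectively an exact match for this window
        scores.append(ratio)

    return int(round(100 * max(scores)))


query = 'more than fifty'
text = ('i know that because a lion run fifty mile per hour and a cheetah '
        'run about eighty mile per hour and sixty-five be more than fifty '
        'and be slow than eighty')

print(query in text)                      # the query is an exact substring
print(partial_ratio_sketch(query, text))
```

Because the query occurs verbatim in the longer string, the best-aligned window is identical to the query, so any correct partial ratio should report a perfect score here.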

@maxbachmann

This issue has already been reported: #79
The implementation in python-Levenshtein returns incorrect results in some cases. Your options are:

  1. use the slower difflib-based version (and possibly suppress the warning)
  2. use the python-Levenshtein version, which can return incorrect results for any ratio that uses partial_ratio
  3. use RapidFuzz (I am the author), which is a fast implementation that gives results similar to the difflib-based implementation

It would be possible to fix this behavior in fuzzywuzzy/python-Levenshtein. However, since neither project is really maintained anymore, it is unclear if or when this will be fixed.
