This repository has been archived by the owner on Aug 26, 2024. It is now read-only.
Referring to the description of token_set_ratio in the original blog post: if the SORTED_INTERSECTION is a strict subset of STRING2, the result ratio will be 100. E.g.,
fuzz.token_set_ratio("Deep Learning", "Python Machine Learning: Machine Learning and Deep Learning with Python, scikit-learn, and TensorFlow 2")
yields 100. This is patently incorrect, and does not uphold the purported intuition ("because the SORTED_INTERSECTION component is always exactly the same, the scores increase when (a) that makes up a larger percentage of the full string, and (b) the string remainders are more similar").
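The behavior can be reproduced outside the library with a minimal sketch. It assumes the three-way comparison described in the blog post (sorted intersection, intersection plus remainder of string 1, intersection plus remainder of string 2). `difflib.SequenceMatcher` stands in for the library's ratio function, tokenization is a plain lowercase split on whitespace, and the names `ratio` and `token_set_sketch` are hypothetical; this is not the library's code.

```python
from difflib import SequenceMatcher


def ratio(a: str, b: str) -> int:
    """Similarity of two strings on a 0-100 scale (stand-in for the library's ratio)."""
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def token_set_sketch(s1: str, s2: str) -> int:
    """Approximate the token-set comparison described in the blog post."""
    tokens1 = set(s1.lower().split())
    tokens2 = set(s2.lower().split())

    # Sorted intersection and the sorted remainders of each string.
    sect = " ".join(sorted(tokens1 & tokens2))
    rest1 = " ".join(sorted(tokens1 - tokens2))
    rest2 = " ".join(sorted(tokens2 - tokens1))

    # Intersection followed by each remainder.
    combined1 = (sect + " " + rest1).strip()
    combined2 = (sect + " " + rest2).strip()

    # When rest1 is empty, combined1 == sect, so the first comparison is 100.
    return max(
        ratio(sect, combined1),
        ratio(sect, combined2),
        ratio(combined1, combined2),
    )


score = token_set_sketch(
    "Deep Learning",
    "Python Machine Learning: Machine Learning and Deep Learning "
    "with Python, scikit-learn, and TensorFlow 2",
)
print(score)
```

Because every token of the first string appears in the second, its remainder is empty and the intersection is compared against itself, which forces the maximum to 100 regardless of how long the second string is.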
Looking at fuzz._token_set, we see that it returns
It appears the assumption is that the string remainder will never be empty. Perhaps something like this is more appropriate:
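As one hypothetical illustration of such a guard (an assumption for discussion, not the library's code and not necessarily the change intended here), the comparisons against the bare intersection could be skipped whenever either remainder is empty. `difflib.SequenceMatcher` again stands in for the library's ratio function, and the names `ratio` and `guarded_token_set_sketch` are invented for this sketch.

```python
from difflib import SequenceMatcher


def ratio(a: str, b: str) -> int:
    """Similarity of two strings on a 0-100 scale (stand-in for the library's ratio)."""
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def guarded_token_set_sketch(s1: str, s2: str) -> int:
    """Token-set comparison that ignores the bare intersection when a remainder is empty."""
    tokens1 = set(s1.lower().split())
    tokens2 = set(s2.lower().split())

    sect = " ".join(sorted(tokens1 & tokens2))
    rest1 = " ".join(sorted(tokens1 - tokens2))
    rest2 = " ".join(sorted(tokens2 - tokens1))

    combined1 = (sect + " " + rest1).strip()
    combined2 = (sect + " " + rest2).strip()

    # Always compare the two combined strings against each other.
    candidates = [ratio(combined1, combined2)]

    # Assumption: only compare against the bare intersection when both
    # remainders are non-empty, so a subset match cannot trivially score 100.
    if rest1 and rest2:
        candidates.append(ratio(sect, combined1))
        candidates.append(ratio(sect, combined2))

    return max(candidates)


subset_score = guarded_token_set_sketch(
    "Deep Learning",
    "Python Machine Learning: Machine Learning and Deep Learning "
    "with Python, scikit-learn, and TensorFlow 2",
)
same_score = guarded_token_set_sketch("Deep Learning", "deep learning")
print(subset_score, same_score)
```

With this guard, two strings with identical token sets still score 100, while a short string whose tokens all appear in a much longer one is scored by how much of the longer string it covers. Whether that trade-off is desirable for existing users of `token_set_ratio` is a separate question.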