How to get fuzzy index? #293
Comments
I haven't thought about it deeply, but I think the problem definition is ambiguous. You probably need a minimum threshold that you would consider a match before you can implement anything. Given that, a naive solution would be to iterate over the string and return the first index at which the fuzzy score against your search key meets the threshold. There might be more optimized solutions that scale to longer strings. Hope this helps.
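The naive approach described above could be sketched roughly as follows. This is only an illustration under assumptions, not part of fuzzywuzzy: `first_fuzzy_index` and `similarity` are hypothetical names, the default threshold of 80 is arbitrary, and the scorer is a standard-library `difflib` stand-in on a 0-100 scale. If fuzzywuzzy is installed, passing `scorer=fuzz.ratio` should work the same way.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in scorer on a 0-100 scale (assumption: not fuzzywuzzy itself)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def first_fuzzy_index(key, text, threshold=80, scorer=similarity):
    """Return the first index whose len(key)-wide window scores >= threshold, else -1."""
    width = len(key)
    if width == 0 or width > len(text):
        return -1
    # Slide a window of the key's length over the text, left to right.
    for start in range(len(text) - width + 1):
        if scorer(key, text[start:start + width]) >= threshold:
            return start
    return -1
```

Fixing the window to the key's length keeps the sketch simple; it will miss matches that need insertions or deletions to line up, so a real solution may want to try a few window widths.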
@mridu-enigma I don't understand. Do you want the index of the first token match, or the index where the maximum similarity (density) is observed? Say you are searching for
@acslater00 I haven't grokked the entire code; perhaps you could shed some light on the correct behavior.
fuzz.partial_ratio searches for the best alignment of the shorter string to the longer string. The order in which you pass the two strings does not matter as long as their lengths are not similar (for strings of similar length the results can differ).
Since you mention that you match a key against a phrase, I assume the key always has to be shorter than the phrase, so you might be able to implement it the following way:

    from fuzzywuzzy import fuzz

    def your_scorer(s1, s2):
        # s1 is the key, s2 the phrase: a key longer than the phrase cannot match.
        if len(s1) > len(s2):
            return 0
        return fuzz.partial_ratio(s1, s2)
I am using fuzzywuzzy to look for key-phrase-like terms in corpora.
FWIW, when there's a tie I'd like it to be broken by the earliest match, so: is there a way to get the fuzzy index of a match? I tried all the functions in fuzz and process (using dir() to discover funcs like QRatio, WRatio, etc.)
For instance, I want some mechanism that ranks
fuzz.partial_ratio('alex', 'alexa not')
higher than
fuzz.partial_ratio('alex', 'not alexa')
, but for fuzzy matches (that's a simplistic example). How can I achieve this?
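One possible way to get the tie-breaking described above is to record where the best-scoring window starts and sort on score first, position second. The following is a minimal sketch under assumptions: `best_window` and `rank_by_score_then_position` are hypothetical helpers, not fuzzywuzzy functions, and the score comes from the standard-library `difflib` rather than from `fuzz.partial_ratio`, so the numbers may differ from fuzzywuzzy's.

```python
from difflib import SequenceMatcher


def best_window(key, text):
    """Return (score, index) of the best len(key)-wide window; earliest window wins ties."""
    width = len(key)
    best_score, best_index = 0, -1
    for start in range(max(len(text) - width, 0) + 1):
        window = text[start:start + width]
        score = round(100 * SequenceMatcher(None, key, window).ratio())
        if score > best_score:  # strict '>' keeps the earliest index on ties
            best_score, best_index = score, start
    return best_score, best_index


def rank_by_score_then_position(key, phrases):
    """Sort phrases by best score (descending), then by match index (ascending)."""
    scored = [(best_window(key, phrase), phrase) for phrase in phrases]
    scored.sort(key=lambda item: (-item[0][0], item[0][1]))
    return [phrase for _, phrase in scored]
```

With this, `rank_by_score_then_position('alex', ['not alexa', 'alexa not'])` puts `'alexa not'` first: both phrases contain an exact window, and the earlier index breaks the tie.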