Uniform treatment of strings in Unicode. #20

tlaunay · 2013-03-15T10:43:09Z

This PR treats all strings uniformly as Unicode. Non-ASCII characters are now taken into account when processing strings, which allows matches on Cyrillic, Chinese, Greek, etc.

Also removed some unused imports and updated the tests.
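
As a rough illustration of what Unicode-aware preprocessing means here, the sketch below replaces every run of non-letter, non-digit characters with a single space while keeping non-ASCII letters. It is not the code from this PR: the `preprocess` name and the exact regex are assumptions made for the example, and it relies on Python 3, where `\w` in a `str` pattern matches Unicode letters and digits by default.

```python
import re

# Assumption: "non letters/digits" means anything outside \w, plus "_".
# On Python 3, \w in a str pattern is Unicode-aware by default.
_NON_ALNUM = re.compile(r"[\W_]+")


def preprocess(text):
    """Hypothetical helper: replace runs of non-letter, non-digit characters
    with one space, then lowercase and trim. Non-ASCII letters are kept."""
    return _NON_ALNUM.sub(" ", text).lower().strip()


print(preprocess("Москва, Россия!"))  # москва россия
print(preprocess("北京 / 上海"))        # 北京 上海
print(preprocess("New-York_City"))    # new york city
```

With an ASCII-only filter, the first two inputs would collapse to empty strings and could not match anything.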

josegonzalez · 2013-04-01T18:35:16Z

Are all these commits supposed to be here? If so, I'll pester people at sg so that this gets merged where possible.

(I'm one of the people you contacted on the 5th, sorry for the late reply!)

tlaunay · 2013-04-02T07:21:40Z

Yes, they are, we are working together! No problem with the late reply. :)

josegonzalez · 2013-04-02T14:41:44Z

@acslater00 given our internal usage of fuzzywuzzy, does it make more sense to have functions like u_partial_token_set_ratio() which work on unicode strings, or perhaps have a unicode=False argument that can be toggled to get the new code?

If so, I can work with @tlaunay to make the required changes.
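
The two API shapes raised in this comment could look roughly like the sketch below. It is not fuzzywuzzy's implementation: `_clean` is a hypothetical helper, `difflib.SequenceMatcher` stands in for the real scorer, `token_sort_ratio` is used as a simpler stand-in for `u_partial_token_set_ratio()`, and the flag is named `force_ascii` (with a default of `True`, also an assumption) rather than `unicode`.

```python
import re
from difflib import SequenceMatcher

_NON_ALNUM = re.compile(r"[\W_]+")


def _clean(text, force_ascii):
    """Hypothetical helper: optionally drop non-ASCII characters, then normalize."""
    if force_ascii:
        text = text.encode("ascii", "ignore").decode("ascii")
    return _NON_ALNUM.sub(" ", text).lower().strip()


def token_sort_ratio(s1, s2, force_ascii=True):
    """Option A: one function, with a flag selecting ASCII-only or Unicode handling."""
    a = " ".join(sorted(_clean(s1, force_ascii).split()))
    b = " ".join(sorted(_clean(s2, force_ascii).split()))
    if not a or not b:
        return 0  # nothing left to compare after preprocessing
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def u_token_sort_ratio(s1, s2):
    """Option B: a separate Unicode entry point that delegates to option A."""
    return token_sort_ratio(s1, s2, force_ascii=False)


print(token_sort_ratio("Россия Москва", "Москва Россия"))                     # 0
print(token_sort_ratio("Россия Москва", "Москва Россия", force_ascii=False))  # 100
print(u_token_sort_ratio("Россия Москва", "Москва Россия"))                   # 100
```

A flag keeps the public surface small and lets existing callers keep their current behaviour, while a `u_` prefix doubles the number of functions; option B can always be layered on top of option A as shown.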

Pull Request #20 Augmented With force_ascii parameter

josegonzalez · 2013-05-03T16:57:35Z

This was merged, thanks @tlaunay and @lerignoux for the pull request!

Tristan Launay and others added 12 commits March 15, 2013 10:37

Uniform treatment of strings in Unicode. Non-ASCII chars are now considered in strings, which allows for matches in Cyrillic, Chinese, Greek, etc.

516558a

Added file for processing strings.

54e0389

Unicode support in benchmark.py

9e40951

ENG-741: having a true benchmark, to see when we improve stuff

53edd59

Proper benchmark display. Introduce methods to explicitly do all the unicode preprocessing *before* using fuzz lib.

c5fdad5

Simplified processing of strings with built-in regex code in python. Also fixed empty string detection in token_sort_ratio.

ff0eff3

Fixed comment.

98355c7

Re-upped the limit on benchmark, now that performance is not an issue anymore.

8f744ec

ENG-741 cut long lines in fuzzy wizzy benchmark

58f7f5e

ENG-741 commented code removed not erased for review from creator

c062f19

Fixed Unicode flag for tests.

4d378f1

ENG-741 fixed benchmark line length

666e8cd

Added a test for non letters/digits replacements.

6e07a3a

acslater00 pushed a commit that referenced this pull request May 3, 2013

Merge pull request #23 from seatgeek/pr/20

b486605

Pull Request #20 Augmented With force_ascii parameter

acslater00 merged commit 6e07a3a into seatgeek:master May 3, 2013

mlampros mentioned this pull request Dec 16, 2017

How to deal with Chinese characters? mlampros/fuzzywuzzyR#3

Closed
