Merge pull request #147 from SethMMorton/remove-python2-mentions-in-d…

…ocumentation Remove python2 mentions in documentation
SethMMorton · Jan 31, 2022 · 473348f · 473348f
2 parents 24d7a4c + 1606199
commit 473348f
Show file tree

Hide file tree

Showing 4 changed files with 97 additions and 67 deletions.
diff --git a/README.rst b/README.rst
@@ -34,11 +34,10 @@ Simple yet flexible natural sorting in Python.
     - `Installation`_
     - `How to Run Tests`_
     - `How to Build Documentation`_
-    - `Deprecation Schedule`_
+    - `Dropped Deprecated APIs`_
     - `History`_
 
-**NOTE**: Please see the `Deprecation Schedule`_ section for changes in
-``natsort`` version 7.0.0.
+**NOTE**: Please see the `Dropped Deprecated APIs`_ section for changes.
 
 Quick Description
 -----------------
@@ -103,7 +102,7 @@ Quick Examples
 - `Locale-Aware Sorting (or "Human Sorting")`_
 - `Further Customizing Natsort`_
 - `Sorting Mixed Types`_
-- `Handling Bytes on Python 3`_
+- `Handling Bytes`_
 - `Generating a Reusable Sorting Key and Sorting In-Place`_
 - `Other Useful Things`_
 
@@ -241,26 +240,23 @@ when you sort:
     >>> a = ['4.5', 6, 2.0, '5', 'a']
     >>> natsorted(a)
     [2.0, '4.5', '5', 6, 'a']
-    >>> # On Python 2, sorted(a) would return [2.0, 6, '4.5', '5', 'a']
-    >>> # On Python 3, sorted(a) would raise an "unorderable types" TypeError
+    >>> # sorted(a) would raise an "unorderable types" TypeError
 
-Handling Bytes on Python 3
-++++++++++++++++++++++++++
+Handling Bytes
+++++++++++++++
 
-``natsort`` does not officially support the `bytes` type on Python 3, but
+``natsort`` does not officially support the `bytes` type, but
 convenience functions are provided that help you decode to `str` first:
 
 .. code-block:: pycon
 
     >>> from natsort import as_utf8
     >>> a = [b'a', 14.0, 'b']
-    >>> # On Python 2, natsorted(a) would would work as expected.
-    >>> # On Python 3, natsorted(a) would raise a TypeError (bytes() < str())
+    >>> # natsorted(a) would raise a TypeError (bytes() < str())
     >>> natsorted(a, key=as_utf8) == [14.0, b'a', 'b']
     True
     >>> a = [b'a56', b'a5', b'a6', b'a40']
-    >>> # On Python 2, natsorted(a) would would work as expected.
-    >>> # On Python 3, natsorted(a) would return the same results as sorted(a)
+    >>> # natsorted(a) would return the same results as sorted(a)
     >>> natsorted(a, key=as_utf8) == [b'a5', b'a6', b'a40', b'a56']
     True
 
@@ -446,27 +442,8 @@ use ``tox``:
 
 This will place the documentation in ``build/sphinx/html``.
 
-Deprecation Schedule
---------------------
-
-Dropped Python 3.4 and Python 3.5 Support
-+++++++++++++++++++++++++++++++++++++++++
-
-``natsort`` version 8.0.0 dropped support for Python < 3.6.
-
-Dropped Python 2.7 Support
-++++++++++++++++++++++++++
-
-``natsort`` version 7.0.0 dropped support for Python 2.7.
-
-The version 6.X branch will remain as a "long term support" branch where bug
-fixes are applied so that users who cannot update from Python 2.7 will not be
-forced to use a buggy ``natsort`` version (bug fixes will need to be requested;
-by default only the 7.X branch will be updated).
-New features would not be added to version 6.X, only bug fixes.
-
 Dropped Deprecated APIs
-+++++++++++++++++++++++
+-----------------------
 
 In ``natsort`` version 6.0.0, the following APIs and functions were removed
 

diff --git a/docs/api.rst b/docs/api.rst
@@ -83,8 +83,8 @@ Convenience Functions
 
 .. _bytes_help:
 
-Help With Bytes On Python 3
-+++++++++++++++++++++++++++
+Help With Bytes
++++++++++++++++
 
 The official stance of :mod:`natsort` is to not support `bytes` for
 sorting; there is just too much that can go wrong when trying to automate

diff --git a/docs/examples.rst b/docs/examples.rst
@@ -349,10 +349,10 @@ Just like the :func:`sorted` built-in function, you can supply the
     >>> natsorted(a, reverse=True)
     ['a10', 'a9', 'a4', 'a2', 'a1']
 
-Sorting Bytes on Python 3
--------------------------
+Sorting Bytes
+-------------
 
-Python 3 is rather strict about comparing strings and bytes, and this
+Python is rather strict about comparing strings and bytes, and this
 can make it difficult to deal with collections of both. Because of the
 challenge of guessing which encoding should be used to decode a bytes
 array to a string, :mod:`natsort` does *not* try to guess and automatically
@@ -368,8 +368,7 @@ array, so you can use the key on any arbitrary collection of data.
 
     >>> from natsort import as_ascii
     >>> a = [b'a', 14.0, 'b']
-    >>> # On Python 2, natsorted(a) would would work as expected.
-    >>> # On Python 3, natsorted(a) would raise a TypeError (bytes() < str())
+    >>> # natsorted(a) would raise a TypeError (bytes() < str())
     >>> natsorted(a, key=as_ascii) == [14.0, b'a', 'b']
     True
 

diff --git a/docs/howitworks.rst b/docs/howitworks.rst
@@ -413,11 +413,18 @@ is done, we can see how comparisons can be done in the expected manner.
     >>> a > b
     True
 
-Comparing Different Types on Python 3
-+++++++++++++++++++++++++++++++++++++
+.. note::
+
+    The actual :meth:`decompose_path_into_components`-equivalent function in
+    :mod:`natsort` actually has a few more heuristics than shown here so that
+    it is not over-zealous in what it defines as a path suffix, but this has
+    been omitted in this how-to for clarity.
+
+Comparing Different Types
++++++++++++++++++++++++++
 
 `The second major special case I encountered was sorting of different types`_.
-If you are on Python 2 (i.e. legacy Python), this mostly doesn't matter *too*
+On Python 2 (i.e. legacy Python), this mostly didnt't matter *too*
 much since it uses an arbitrary heuristic to allow traditionally un-comparable
 types to be compared (such as comparing ``'a'`` to ``1``). However, on Python 3
 (i.e. Python) it simply won't let you perform such nonsense, raising a
@@ -662,9 +669,9 @@ These can be summed up as follows:
 #. :mod:`locale` is a thin wrapper over your operating system's *locale*
    library, so if *that* is broken (like it is on BSD and OSX) then
    :mod:`locale` is broken in Python.
-#. Because of a bug in legacy Python (i.e. Python 2), there is no uniform
+#. Because of a bug in legacy Python (i.e. Python 2), there was no uniform
    way to use the :mod:`locale` sorting functionality between legacy Python
-   and Python 3.
+   and Python (luckily this is no longer an issue now that Python 2 is EOL).
 #. People have differing opinions of how capitalization should affect word
    order.
 #. There is no built-in way to handle locale-dependent thousands separators
@@ -715,23 +722,13 @@ easy... just call the :meth:`str.swapcase` method on the input.
     >>> sorted(a, key=lambda x: x.swapcase())
     ['apple', 'banana', 'corn', 'Apple', 'Banana', 'Corn']
 
-The last (i call it *IGNORECASE*) should be super easy, right?
-Simply call :meth:`str.lowercase` on the input. This will work but may
-not always give the correct answer on non-latin character sets. It's
-a good thing that in Python 3.3
-:meth:`str.casefold` was introduced, which does a better job of removing
-all case information from unicode characters in
-non-latin alphabets.
+The last (i call it *IGNORECASE*) is pretty easy.
+Simply call :meth:`str.casefold` on the input (it's like :meth:`std.lowercase`
+but does a better job on non-latin character sets).
 
 .. code-block:: pycon
 
-    >>> def remove_case(x):
-    ...     try:
-    ...         return x.casefold()
-    ...     except AttributeError:  # Legacy Python backwards compatibility
-    ...         return x.lowercase()
-    ...
-    >>> sorted(a, key=remove_case)
+    >>> sorted(a, key=lambda x: x.casefold())
     ['Apple', 'apple', 'Banana', 'banana', 'corn', 'Corn']
 
 The middle case (I call it *GROUPLETTERS*) is less straightforward.
@@ -742,7 +739,7 @@ with its lowercase version and then the original character.
 
     >>> import itertools
     >>> def groupletters(x):
-    ...     return ''.join(itertools.chain.from_iterable((remove_case(y), y) for y in x))
+    ...     return ''.join(itertools.chain.from_iterable((y.casefold(), y) for y in x))
     ...
     >>> groupletters('Apple')
     'aAppppllee'
@@ -904,13 +901,69 @@ characters; otherwise, numbers won't be parsed properly. Therefore, it must
 be applied as part of the :func:`coerce_to_int`/:func:`coerce_to_float`
 functions in a manner similar to :func:`groupletters`.
 
-As you might have guessed, there is a small problem.
-It turns out the there is a bug in the legacy Python implementation of
-:func:`locale.strxfrm` that causes it to outright fail for :func:`unicode`
-input (https://bugs.python.org/issue2481). :func:`locale.strcoll` works,
-but is intended for use with ``cmp``, which does not exist in current Python
-implementations. Luckily, the :func:`functools.cmp_to_key` function
-makes :func:`locale.strcoll` behave like :func:`locale.strxfrm`.
+Unicode Support With Local
+++++++++++++++++++++++++++
+
+Remember how in the `Basic Unicode Support`_ section I mentioned that we
+use the "decompressed" Unicode normalization form (e.g. NFD) on all inputs
+to ensure the order is as expected?
+
+If you have been following along so far, you probably expect that it is not
+that easy. You would be correct.
+
+It turns out that some locales (but not all) expect the input to be in
+"compressed form" (e.g. NFC) or the ordering is not as you might expect.
+`Check out this issue for a real-world example`_. Here's a relevant
+snippet of code
+
+.. code-block:: pycon
+
+    In [1]: import locale, unicodedata
+
+    In [2]: a = ['Aš', 'Cheb', 'Česko', 'Cibulov', 'Znojmo', 'Žilina']
+
+    In [3]: locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
+    Out[3]: 'en_US.UTF-8'
+
+    In [4]: sorted(a, key=locale.strxfrm)
+    Out[4]: ['Aš', 'Česko', 'Cheb', 'Cibulov', 'Žilina', 'Znojmo']
+
+    In [5]: sorted(a, key=lambda x: locale.strxfrm(unicodedata.normalize("NFD", x)))
+    Out[5]: ['Aš', 'Česko', 'Cheb', 'Cibulov', 'Žilina', 'Znojmo']
+
+    In [6]: sorted(a, key=lambda x: locale.strxfrm(unicodedata.normalize("NFC", x)))
+    Out[6]: ['Aš', 'Česko', 'Cheb', 'Cibulov', 'Žilina', 'Znojmo']
+
+    In [7]: locale.setlocale(locale.LC_ALL, 'de_DE.UTF-8')
+    Out[7]: 'de_DE.UTF-8'
+
+    In [8]: sorted(a, key=locale.strxfrm)
+    Out[8]: ['Aš', 'Česko', 'Cheb', 'Cibulov', 'Žilina', 'Znojmo']
+
+    In [9]: sorted(a, key=lambda x: locale.strxfrm(unicodedata.normalize("NFD", x)))
+    Out[9]: ['Aš', 'Česko', 'Cheb', 'Cibulov', 'Žilina', 'Znojmo']
+
+    In [10]: sorted(a, key=lambda x: locale.strxfrm(unicodedata.normalize("NFC", x)))
+    Out[10]: ['Aš', 'Česko', 'Cheb', 'Cibulov', 'Žilina', 'Znojmo']
+
+    In [11]: locale.setlocale(locale.LC_ALL, 'cs_CZ.UTF-8')
+    Out[11]: 'cs_CZ.UTF-8'
+
+    In [12]: sorted(a, key=locale.strxfrm)
+    Out[12]: ['Aš', 'Cibulov', 'Česko', 'Cheb', 'Znojmo', 'Žilina']
+
+    In [13]: sorted(a, key=lambda x: locale.strxfrm(unicodedata.normalize("NFD", x)))
+    Out[13]: ['Aš', 'Česko', 'Cibulov', 'Cheb', 'Žilina', 'Znojmo']
+
+    In [14]: sorted(a, key=lambda x: locale.strxfrm(unicodedata.normalize("NFC", x)))
+    Out[14]: ['Aš', 'Cibulov', 'Česko', 'Cheb', 'Znojmo', 'Žilina']
+
+Two out of three locales sort the same data in the same order no matter how the unicode
+input was normalized, but Czech seems to care how the input is formatted!
+
+So, everthing mentioned in `Basic Unicode Support`_ is conditional on whether
+or not the user wants to use the :mod:`locale` library or not. If not, then
+"NFD" normalization is used. If they do, "NFC" normalization is used.
 
 Handling Broken Locale On OSX
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -1121,3 +1174,4 @@ what the rest of the world assumes.
 .. _really good: https://hypothesis.readthedocs.io/en/latest/
 .. _testing strategy: https://docs.pytest.org/en/latest/
 .. _check out some official Unicode documentation: https://unicode.org/reports/tr15/
+.. _Check out this issue for a real-world example: https://github.com/SethMMorton/natsort/issues/140