Merge pull request #147 from SethMMorton/remove-python2-mentions-in-documentation

Remove python2 mentions in documentation
SethMMorton authored Jan 31, 2022
2 parents 24d7a4c + 1606199 commit 473348f
Showing 4 changed files with 97 additions and 67 deletions.
43 changes: 10 additions & 33 deletions README.rst
@@ -34,11 +34,10 @@ Simple yet flexible natural sorting in Python.
- `Installation`_
- `How to Run Tests`_
- `How to Build Documentation`_
- `Deprecation Schedule`_
- `Dropped Deprecated APIs`_
- `History`_

**NOTE**: Please see the `Deprecation Schedule`_ section for changes in
``natsort`` version 7.0.0.
**NOTE**: Please see the `Dropped Deprecated APIs`_ section for changes.

Quick Description
-----------------
@@ -103,7 +102,7 @@ Quick Examples
- `Locale-Aware Sorting (or "Human Sorting")`_
- `Further Customizing Natsort`_
- `Sorting Mixed Types`_
- `Handling Bytes on Python 3`_
- `Handling Bytes`_
- `Generating a Reusable Sorting Key and Sorting In-Place`_
- `Other Useful Things`_

@@ -241,26 +240,23 @@ when you sort:
>>> a = ['4.5', 6, 2.0, '5', 'a']
>>> natsorted(a)
[2.0, '4.5', '5', 6, 'a']
>>> # On Python 2, sorted(a) would return [2.0, 6, '4.5', '5', 'a']
>>> # On Python 3, sorted(a) would raise an "unorderable types" TypeError
>>> # sorted(a) would raise an "unorderable types" TypeError
Handling Bytes on Python 3
++++++++++++++++++++++++++
Handling Bytes
++++++++++++++

``natsort`` does not officially support the `bytes` type on Python 3, but
``natsort`` does not officially support the `bytes` type, but
convenience functions are provided that help you decode to `str` first:

.. code-block:: pycon
>>> from natsort import as_utf8
>>> a = [b'a', 14.0, 'b']
>>> # On Python 2, natsorted(a) would would work as expected.
>>> # On Python 3, natsorted(a) would raise a TypeError (bytes() < str())
>>> # natsorted(a) would raise a TypeError (bytes() < str())
>>> natsorted(a, key=as_utf8) == [14.0, b'a', 'b']
True
>>> a = [b'a56', b'a5', b'a6', b'a40']
>>> # On Python 2, natsorted(a) would would work as expected.
>>> # On Python 3, natsorted(a) would return the same results as sorted(a)
>>> # natsorted(a) would return the same results as sorted(a)
>>> natsorted(a, key=as_utf8) == [b'a5', b'a6', b'a40', b'a56']
True
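As a rough illustration of the decode-first idea (not the library's actual implementation), a key function along these lines decodes `bytes` and passes everything else through untouched. The name ``decode_if_bytes`` is hypothetical, and UTF-8 is an assumed encoding.

```python
def decode_if_bytes(value, encoding="utf-8"):
    """Return ``value`` decoded to ``str`` if it is ``bytes``, else unchanged.

    Hypothetical helper for illustration only; it is not part of natsort.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


data = [b"b", "a", b"c"]
# Comparing bytes to str directly raises a TypeError, so decode inside the key.
# The original objects are returned; only the comparison uses decoded text.
result = sorted(data, key=decode_if_bytes)
```

Because the decoding happens only in the key, the sorted list still contains the original `bytes` objects.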
@@ -446,27 +442,8 @@ use ``tox``:
This will place the documentation in ``build/sphinx/html``.

Deprecation Schedule
--------------------

Dropped Python 3.4 and Python 3.5 Support
+++++++++++++++++++++++++++++++++++++++++

``natsort`` version 8.0.0 dropped support for Python < 3.6.

Dropped Python 2.7 Support
++++++++++++++++++++++++++

``natsort`` version 7.0.0 dropped support for Python 2.7.

The version 6.X branch will remain as a "long term support" branch where bug
fixes are applied so that users who cannot update from Python 2.7 will not be
forced to use a buggy ``natsort`` version (bug fixes will need to be requested;
by default only the 7.X branch will be updated).
New features would not be added to version 6.X, only bug fixes.

Dropped Deprecated APIs
+++++++++++++++++++++++
-----------------------

In ``natsort`` version 6.0.0, the following APIs and functions were removed

4 changes: 2 additions & 2 deletions docs/api.rst
@@ -83,8 +83,8 @@ Convenience Functions

.. _bytes_help:

Help With Bytes On Python 3
+++++++++++++++++++++++++++
Help With Bytes
+++++++++++++++

The official stance of :mod:`natsort` is to not support `bytes` for
sorting; there is just too much that can go wrong when trying to automate
9 changes: 4 additions & 5 deletions docs/examples.rst
@@ -349,10 +349,10 @@ Just like the :func:`sorted` built-in function, you can supply the
>>> natsorted(a, reverse=True)
['a10', 'a9', 'a4', 'a2', 'a1']
Sorting Bytes on Python 3
-------------------------
Sorting Bytes
-------------

Python 3 is rather strict about comparing strings and bytes, and this
Python is rather strict about comparing strings and bytes, and this
can make it difficult to deal with collections of both. Because of the
challenge of guessing which encoding should be used to decode a bytes
array to a string, :mod:`natsort` does *not* try to guess and automatically
@@ -368,8 +368,7 @@ array, so you can use the key on any arbitrary collection of data.
>>> from natsort import as_ascii
>>> a = [b'a', 14.0, 'b']
>>> # On Python 2, natsorted(a) would would work as expected.
>>> # On Python 3, natsorted(a) would raise a TypeError (bytes() < str())
>>> # natsorted(a) would raise a TypeError (bytes() < str())
>>> natsorted(a, key=as_ascii) == [14.0, b'a', 'b']
True
108 changes: 81 additions & 27 deletions docs/howitworks.rst
@@ -413,11 +413,18 @@ is done, we can see how comparisons can be done in the expected manner.
>>> a > b
True
Comparing Different Types on Python 3
+++++++++++++++++++++++++++++++++++++
.. note::

The actual :meth:`decompose_path_into_components`-equivalent function in
:mod:`natsort` has a few more heuristics than shown here so that
it is not over-zealous in what it defines as a path suffix, but this has
been omitted in this how-to for clarity.

Comparing Different Types
+++++++++++++++++++++++++

`The second major special case I encountered was sorting of different types`_.
If you are on Python 2 (i.e. legacy Python), this mostly doesn't matter *too*
On Python 2 (i.e. legacy Python), this mostly didn't matter *too*
much since it uses an arbitrary heuristic to allow traditionally un-comparable
types to be compared (such as comparing ``'a'`` to ``1``). However, on Python 3
(i.e. Python) it simply won't let you perform such nonsense, raising a
@@ -662,9 +669,9 @@ These can be summed up as follows:
#. :mod:`locale` is a thin wrapper over your operating system's *locale*
library, so if *that* is broken (like it is on BSD and OSX) then
:mod:`locale` is broken in Python.
#. Because of a bug in legacy Python (i.e. Python 2), there is no uniform
#. Because of a bug in legacy Python (i.e. Python 2), there was no uniform
way to use the :mod:`locale` sorting functionality between legacy Python
and Python 3.
and Python (luckily this is no longer an issue now that Python 2 is EOL).
#. People have differing opinions of how capitalization should affect word
order.
#. There is no built-in way to handle locale-dependent thousands separators
@@ -715,23 +722,13 @@ easy... just call the :meth:`str.swapcase` method on the input.
>>> sorted(a, key=lambda x: x.swapcase())
['apple', 'banana', 'corn', 'Apple', 'Banana', 'Corn']
The last (i call it *IGNORECASE*) should be super easy, right?
Simply call :meth:`str.lowercase` on the input. This will work but may
not always give the correct answer on non-latin character sets. It's
a good thing that in Python 3.3
:meth:`str.casefold` was introduced, which does a better job of removing
all case information from unicode characters in
non-latin alphabets.
The last (I call it *IGNORECASE*) is pretty easy.
Simply call :meth:`str.casefold` on the input (it's like :meth:`str.lower`
but does a better job on non-Latin character sets).

.. code-block:: pycon
>>> def remove_case(x):
... try:
... return x.casefold()
... except AttributeError: # Legacy Python backwards compatibility
... return x.lowercase()
...
>>> sorted(a, key=remove_case)
>>> sorted(a, key=lambda x: x.casefold())
['Apple', 'apple', 'Banana', 'banana', 'corn', 'Corn']
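A small illustration of the difference between the two methods, using the German sharp s as an assumed example (it does not appear in the original text):

```python
words = ["Straße", "STRASSE", "strasse"]

# str.lower() leaves "ß" unchanged, so two distinct spellings remain.
lowered = {w.lower() for w in words}

# str.casefold() expands "ß" to "ss", so all three spellings collapse to one.
folded = {w.casefold() for w in words}
```

With `str.lower` the set still holds both ``'straße'`` and ``'strasse'``, whereas case-folding leaves only ``'strasse'``, which is what a case-insensitive comparison needs.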
The middle case (I call it *GROUPLETTERS*) is less straightforward.
@@ -742,7 +739,7 @@ with its lowercase version and then the original character.
>>> import itertools
>>> def groupletters(x):
... return ''.join(itertools.chain.from_iterable((remove_case(y), y) for y in x))
... return ''.join(itertools.chain.from_iterable((y.casefold(), y) for y in x))
...
>>> groupletters('Apple')
'aAppppllee'
@@ -904,13 +901,69 @@ characters; otherwise, numbers won't be parsed properly. Therefore, it must
be applied as part of the :func:`coerce_to_int`/:func:`coerce_to_float`
functions in a manner similar to :func:`groupletters`.

As you might have guessed, there is a small problem.
It turns out that there is a bug in the legacy Python implementation of
:func:`locale.strxfrm` that causes it to outright fail for :func:`unicode`
input (https://bugs.python.org/issue2481). :func:`locale.strcoll` works,
but is intended for use with ``cmp``, which does not exist in current Python
implementations. Luckily, the :func:`functools.cmp_to_key` function
makes :func:`locale.strcoll` behave like :func:`locale.strxfrm`.
Unicode Support With Locale
+++++++++++++++++++++++++++

Remember how in the `Basic Unicode Support`_ section I mentioned that we
use the "decomposed" Unicode normalization form (e.g. NFD) on all inputs
to ensure the order is as expected?

If you have been following along so far, you probably expect that it is not
that easy. You would be correct.

It turns out that some locales (but not all) expect the input to be in
"composed" form (e.g. NFC); otherwise the ordering is not as you might expect.
`Check out this issue for a real-world example`_. Here's a relevant
snippet of code:

.. code-block:: pycon
In [1]: import locale, unicodedata
In [2]: a = ['Aš', 'Cheb', 'Česko', 'Cibulov', 'Znojmo', 'Žilina']
In [3]: locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
Out[3]: 'en_US.UTF-8'
In [4]: sorted(a, key=locale.strxfrm)
Out[4]: ['Aš', 'Česko', 'Cheb', 'Cibulov', 'Žilina', 'Znojmo']
In [5]: sorted(a, key=lambda x: locale.strxfrm(unicodedata.normalize("NFD", x)))
Out[5]: ['Aš', 'Česko', 'Cheb', 'Cibulov', 'Žilina', 'Znojmo']
In [6]: sorted(a, key=lambda x: locale.strxfrm(unicodedata.normalize("NFC", x)))
Out[6]: ['Aš', 'Česko', 'Cheb', 'Cibulov', 'Žilina', 'Znojmo']
In [7]: locale.setlocale(locale.LC_ALL, 'de_DE.UTF-8')
Out[7]: 'de_DE.UTF-8'
In [8]: sorted(a, key=locale.strxfrm)
Out[8]: ['Aš', 'Česko', 'Cheb', 'Cibulov', 'Žilina', 'Znojmo']
In [9]: sorted(a, key=lambda x: locale.strxfrm(unicodedata.normalize("NFD", x)))
Out[9]: ['Aš', 'Česko', 'Cheb', 'Cibulov', 'Žilina', 'Znojmo']
In [10]: sorted(a, key=lambda x: locale.strxfrm(unicodedata.normalize("NFC", x)))
Out[10]: ['Aš', 'Česko', 'Cheb', 'Cibulov', 'Žilina', 'Znojmo']
In [11]: locale.setlocale(locale.LC_ALL, 'cs_CZ.UTF-8')
Out[11]: 'cs_CZ.UTF-8'
In [12]: sorted(a, key=locale.strxfrm)
Out[12]: ['Aš', 'Cibulov', 'Česko', 'Cheb', 'Znojmo', 'Žilina']
In [13]: sorted(a, key=lambda x: locale.strxfrm(unicodedata.normalize("NFD", x)))
Out[13]: ['Aš', 'Česko', 'Cibulov', 'Cheb', 'Žilina', 'Znojmo']
In [14]: sorted(a, key=lambda x: locale.strxfrm(unicodedata.normalize("NFC", x)))
Out[14]: ['Aš', 'Cibulov', 'Česko', 'Cheb', 'Znojmo', 'Žilina']
Two out of three locales sort the same data in the same order no matter how the unicode
input was normalized, but Czech seems to care how the input is formatted!

So, everything mentioned in `Basic Unicode Support`_ is conditional on whether
the user wants to use the :mod:`locale` library. If not, then
"NFD" normalization is used. If they do, "NFC" normalization is used.

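The rule above can be sketched as follows. This is a minimal illustration, not the library's actual code; the function name ``normalize_for_sorting`` and its ``use_locale`` parameter are hypothetical.

```python
import unicodedata


def normalize_for_sorting(text, use_locale=False):
    """Hypothetical helper: choose the normalization form as described above.

    NFC (composed) when locale-aware sorting is requested, NFD otherwise.
    """
    form = "NFC" if use_locale else "NFD"
    return unicodedata.normalize(form, text)


composed = "\u010c"      # "Č" as a single code point
decomposed = "C\u030c"   # "C" followed by a combining caron

# Without locale: the single code point is split into base letter + accent.
nfd = normalize_for_sorting(composed)
# With locale: the base letter + accent pair is merged into one code point.
nfc = normalize_for_sorting(decomposed, use_locale=True)
```

The normalized string would then be passed on to :func:`locale.strxfrm` (in the locale case) or compared directly.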
Handling Broken Locale On OSX
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -1121,3 +1174,4 @@ what the rest of the world assumes.
.. _really good: https://hypothesis.readthedocs.io/en/latest/
.. _testing strategy: https://docs.pytest.org/en/latest/
.. _check out some official Unicode documentation: https://unicode.org/reports/tr15/
.. _Check out this issue for a real-world example: https://github.com/SethMMorton/natsort/issues/140
