datetime and time equality and hashing problematic #116035
Comments
A quick scan of @pganssle 's https://blog.ganssle.io/articles/2018/02/a-curious-case-datetimes.html documents that this happens but I didn't spot any justification for it... |
I don't know if I've ever actually gotten to the bottom of why this requirement exists, but I've been meaning to figure it out for years. I don't have time to track it down at the moment, but it was the result of a discussion between @abalkin and @tim-one, with the result documented here. This is the message where Tim explains that you can't make two dates differing only by their I don't really understand the concern with |
Also I don't think this should be labeled as a "bug" — I don't know what the labels are used for, but if we were to change this behavior it would be breaking a thoroughly documented feature with a decently long public discussion history behind it. This is not a situation where no one thought about this, it's more that they thought about it and picked a specific set of trade-offs. If we want to change it, it would be because we decided that we would prefer a different set of trade-offs (and prefer them so much that it is worth making breaking changes to Not saying that we definitely can't do it or anything, just that it's not really a "bug" (and it's certainly not getting backported to feature-frozen branches). |
Removed the "bug" label because it's working as designed and documented. Plus the chance that it will change is approximately 0 😉. It's not PEP 495 particularly at work here, but a basic rule throughout datetime: objects overwhelmingly work in "naïve time", the details of which are primarily spelled out across various footnotes on tables of operations. It means that timezone info is overwhelmingly ignored, unless two aware datetime objects have two different tzinfo members. Then things like equality comparison effectively convert to UTC under the covers before the comparison occurs. But if two datetime objects have the same tzinfo member (as in your second example), comparison treats them as naïve datetimes. Everything related to time zone is ignored then. That was intended from the start, and has worked that way from the start (long before PEP 495 came along, although that PEP had to try to preserve backward compatibility too). The intent has always been that if you want timezone-aware arithmetic (including comparison), you should convert to UTC first (a zone where "naïve arithmetic" is always the right thing to do), do the arithmetic, and convert back when you're done. A consequence is that, ya, you "shouldn't" be using aware datetimes as dict keys - unless naïve comparison is what you want (which, BTW, usually is what I want). @pganssle, as I recall, datetime.combine(dt.date(), dt.time()) to get back an object identical in all respects to I don't believe any other use was envisioned for giving |
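A minimal sketch of what "the same tzinfo member" appears to mean in practice, assuming CPython's behaviour of checking object identity rather than zone name. `ZoneInfo.no_cache` is used here only to obtain a second, distinct object for the same zone; the variable names are illustrative and the example needs time zone data to be installed.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

cached = ZoneInfo("Europe/London")             # the constructor returns a cached instance
same = ZoneInfo("Europe/London")               # so this is the very same object
separate = ZoneInfo.no_cache("Europe/London")  # same rules, but a distinct object

t1 = datetime(2024, 10, 27, 1, fold=0, tzinfo=cached)
t2_same = datetime(2024, 10, 27, 1, fold=1, tzinfo=same)
t2_separate = datetime(2024, 10, 27, 1, fold=1, tzinfo=separate)

print(cached is same)       # True
print(t1 == t2_same)        # True: shared tzinfo object, compared as naive wall times
print(t1 == t2_separate)    # False: distinct tzinfo objects, compared via UTC offsets
```

So the result of `==` can depend on how the tzinfo objects were obtained, not only on which zone they describe.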
Good to see you're alive and well @tim-one, it's been a while!
Just because something is working as designed and documented doesn't mean there isn't a bug in either; in this case it would appear to be in the design.
I don't doubt it, but that also doesn't mean it's not a bug ;-)
Yeah, I mean this appears to be the crux of the behaviour problem. I think it's somewhat defensible, although I still view it as a design failure, that "folds are completely ignored when there's no tzinfo". But I honestly can't see how this can be seen as anything but a bug:
>>> t1 = datetime(2024, 10, 27, 1, fold=0, tzinfo=ZoneInfo('Europe/London'))
>>> str(t1)
'2024-10-27 01:00:00+01:00'
>>> t2 = datetime(2024, 10, 27, 1, fold=1, tzinfo=ZoneInfo('Europe/London'))
>>> str(t2)
'2024-10-27 01:00:00+00:00'
>>> t1 == t2
True
These are fundamentally two different points in time, in what way can they be defended as being equal? :-)
Where are these two things documented? They honestly feel like they should be in a |
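For the "no tzinfo" case referred to above, a small sketch of how `fold` is ignored by both equality and hashing on naive `datetime` and `time` objects. The helper `same_wall_time_and_fold` is hypothetical, added only to show one way of telling such values apart explicitly.

```python
from datetime import datetime, time

naive_first = datetime(2024, 10, 27, 1, fold=0)
naive_second = datetime(2024, 10, 27, 1, fold=1)
print(naive_first == naive_second)              # True: fold is not compared
print(hash(naive_first) == hash(naive_second))  # True: fold is not hashed either
print(len({naive_first, naive_second}))         # 1

clock_first = time(1, fold=0)
clock_second = time(1, fold=1)
print(clock_first == clock_second)              # True
print(len({clock_first, clock_second}))         # 1


def same_wall_time_and_fold(a, b):
    """Hypothetical helper: equality that also requires matching fold values."""
    return a == b and a.fold == b.fold


print(same_wall_time_and_fold(naive_first, naive_second))  # False
```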
It's because a lot of stuff around datetime handling is actually more context-dependent than you would think. They represent different absolute times, but the same civil time, because civil times with backwards offset changes have inherent ambiguities. If you look at how the temporal library in Javascript has drawn their abstraction boundaries, you'll see why it is more complicated here — the If we had the option (and maybe we will one day), I'd probably build a new So it's easily defensible why those things are equal: they both represent 01:00 on 2024-10-27 in London. The fact that 01:00 on 2024-10-27 happened twice and these represent the two different instances only matters if you are using a different concept of equality (specifically, the one where two datetimes are equal if they represent the same offset from epoch). I also realized that yes, this particular >>> from datetime import datetime, timedelta
>>> from zoneinfo import ZoneInfo
>>> dt1 = datetime(2024, 10, 27, 1, tzinfo=ZoneInfo("Europe/London"))
>>> dt1 == dt1.astimezone(ZoneInfo("America/New_York"))
False
>>> dt2 = dt1 - timedelta(seconds=1)
>>> dt2 == dt2.astimezone(ZoneInfo("America/New_York"))
True
Any time changing In that context, reading Tim's message in 2015 makes more sense, because if you are preserving hash/equality equivalence, either you need a situation where the hash depends on the fold, in which case equality must also depend on the fold for intra-zone comparisons, or you need to do something like what they did in PEP 495, where the ambiguous datetimes don't compare equal to datetimes in other zones, and thus it's fine for them to have different hash values. Presumably reading through the original thread you can see why this particular set of trade-offs was made rather than another set. I would be surprised if it wasn't carefully considered, since years and years into my work with |
Your It's what Guido intended, and is how it's always worked. You're not required to agree with the design, but fighting against such old and established decisions is almost certainly futile. The docs could be clearer, and I've long thought they could be materially improved by adding a less formal introductory section pointing out the consequences of this design decision. They can be surprising. But then the very concept of "time zones" with changing offsets is an unprincipled political mess. As a general thing, while the docs are pretty good about pointing out where For datetime
I could have sworn that at one time it also explicitly said that if both comparands are aware and have the same tzinfo attributes, comparison acts as if both comparands were naïve (as is done in the footnote for Paul wrote that datetime was "optimized" for "civil time", but it's more that Guido fundamentally only cared about "civil time". If you needed any form of time zone aware arithmetic, fine, convert to UTC, do naïve arithmetic there, and convert back. Indeed, he cared so little about time zones that the original design deliberately omitted any way to disambiguate the "problem times" around DST transitions. I recall, at the time, his astonishment upon discovering that a C Which is the one original decision which eventually got reworked, to add I expect I'd agree with Paul about what would constitute a better design (stronger wall between naïve and aware). But in real life I don't have problems, because I don't fight the design: when I need aware arithmetic, I convert to UTC first.
>>> utc = ZoneInfo("utc")
>>> t2.astimezone(utc) == t1.astimezone(utc)
False
>>> print(t2.astimezone(utc) - t1.astimezone(utc))
1:00:00 |
Ya, it's seemingly random doc rot. I don't know when that info got lost, but at least the 3.8 docs were (to my eyes) quite clear about this:
Although everywhere the docs say (or said) tzinfo is ignored, they should add that EDIT: looks like the docs here dropped this important info starting with version 3.11. No idea why. Worse, they seem now to imply that equality does make time zone adjustments in all cases:
So I'll at least agree that the docs have become infested with bugs 😦. |
@serhiy-storchaka, "git blame" says you're the source of the "doc rot" 😉:
Could you please skim this issue and restore former text appropriately? The major thing is that all text has vanished explicitly saying that comparison of two datetime objects with a common tzinfo attribute ignores the tzinfo attribute entirely. That's extremely important to know, but the docs no longer say so. This applies to all 6 comparison operators. UTC adjustments are done only when comparing two aware datetime objects with distinct tzinfo members. For extra credit, it looks like the docs were never updated to add that |
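As an illustration of the point that this applies to the ordering operators too, here is a hedged sketch (names are illustrative, and time zone data is assumed to be installed) in which sorting aware datetimes that share a tzinfo gives wall-clock order rather than chronological order across a fold.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

london = ZoneInfo("Europe/London")
before_change = datetime(2024, 10, 27, 1, 50, fold=0, tzinfo=london)  # 00:50 UTC
after_change = datetime(2024, 10, 27, 1, 10, fold=1, tzinfo=london)   # 01:10 UTC

# Same tzinfo object: "<" compares wall times only, so 01:50 is not before 01:10.
print(before_change < after_change)  # False

# Plain sorting follows the wall clock; sorting on a UTC key follows real elapsed time.
wall_order = sorted([before_change, after_change])
utc_order = sorted(
    [before_change, after_change],
    key=lambda dt: dt.astimezone(timezone.utc),
)
print([str(dt) for dt in wall_order])
print([str(dt) for dt in utc_order])
```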
Ignoring the tzinfo attribute if it is identical in both comparands looked like a trivial optimization to me, so I omitted it. I did not notice that it ignores also the fold attribute. Now it looks like a bug to me. See also #72787. |
…mparisons if tzinfo is the same
Thanks for taking a look!
Then please read this entire page. It's the very opposite of "trivial", and isn't an "optimization" at all: it can make profound difference to semantics. Chris gave an example in his initial complaint here:
They are different when converted to UTC, but because they have the same tzinfo attribute, the common tzinfo attribute (and |
…mparisons if tzinfo is the same (pythonGH-116187) This mostly restores information removed in c12240e (pythonGH-114749). (cherry picked from commit 05b0490) Co-authored-by: Serhiy Storchaka <[email protected]>
It looked like optimization to me because I forgot about |
Not quite. Ambiguous times are a fact of life, and time zone conversion always had to "pick one". The only difference adding Since Python didn't ship any concrete tzinfo classes at first, even at the first release of So, to cover all bases, it was documented from the start that comparison, like |
[@cjw296]
I expect that most apps would use a datetime instead, but first converted to UTC. While that's roughly equivalent to a timestamp, local system time has nothing to do with it, and sticking to datetimes avoids all the annoying cases converting to a float can stumble into. Converting to UTC is precisely defined and lossless, and will remain so for thousands of years to come (eventually failing around |
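A minimal sketch of the "convert to UTC first" advice applied to dictionary keys. The helper name `utc_key` is hypothetical, and the example assumes time zone data for Europe/London is available.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

london = ZoneInfo("Europe/London")
first = datetime(2024, 10, 27, 1, 30, fold=0, tzinfo=london)   # 01:30 BST
second = datetime(2024, 10, 27, 1, 30, fold=1, tzinfo=london)  # 01:30 GMT, an hour later


def utc_key(dt: datetime) -> datetime:
    """Hypothetical helper: normalise an aware datetime to UTC before using it as a key."""
    if dt.utcoffset() is None:
        raise ValueError("naive datetimes have no defined UTC equivalent here")
    return dt.astimezone(timezone.utc)


# Keys sharing a tzinfo collide, because fold takes no part in == or hash().
local_keys = {first: "first pass", second: "second pass"}
print(len(local_keys))  # 1

# Keys normalised to UTC stay distinct.
utc_keys = {utc_key(first): "first pass", utc_key(second): "second pass"}
print(len(utc_keys))  # 2

# Converting back recovers the wall time together with its fold.
print(utc_key(second).astimezone(london).fold)  # 1
```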
With the docs amended, are there more actionable items left here? |
None left to my eyes, but Chris may yet disagree. In the meantime, I added the "docs" label. |
BTW, here's a specific example where the meaning of
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo
east = ZoneInfo("US/Eastern")
dt1 = datetime(2016, 3, 13, 2, 30, tzinfo=east)
dt2 = dt1 + timedelta(hours=1)
print(dt1) # 2016-03-13 02:30:00-05:00
print(dt2) # 2016-03-13 03:30:00-04:00
print(dt1 == dt2) # False
utc = ZoneInfo("utc")
print(dt1.astimezone(utc) == dt2.astimezone(utc)) # True
This is shortly after the start of DST in that zone. The time "2:30" doesn't exist on a local wall clock (although it does if you forget to move the clock forward at 2am - which is how conversion to UTC acts in this case). It's not a time in "a fold", but one in "a gap". But in "naïve time", they both exist, and are an hour apart - and both subtraction and EDIT - In case it wasn't apparent, I'll point out explicitly that nothing about this example changed when |
…mparisons if tzinfo is the same (pythonGH-116187) This mostly restores information removed in c12240e (pythonGH-114749).
@pganssle - can you point me to where your definition of "civil time" is coming from? While we'd hopefully all agree that the best response to "let's meet at 1:30am on the 27th Oct 2024" would be "what? no! I'll be asleep in bed", I can certainly think of situations where it would be "well, which 1:30am on the 27th Oct 2024?". My point being that even in "civil time", more information is used and folks don't just go "well, they're the same time, there's no difference" - if you're being paid to play a 1hr set at 1:30am on that date, I'm very sure you'll find out which one it is...
Yeah, it looks like they've done a more thorough job than Python 😞 @tim-one - you're throwing Guido under the bus here, but sounds like he probably deserves it in this case! "fighting against such old and established decisions is almost certainly futile" - in terms of changing the implementation, I guess, which is definitely saddening, but dates, times and zones are always saddening. However, an overhaul of the docs with way bigger warnings about this "surprising" behaviour might help folks in the future.
Again, with hindsight, this feels like a shame - opening an even bigger Pandora's box than just getting a standardized ZoneInfo class into Python and making the API surface you highlight as problematic smaller from the start.
Can you give an example for this in relation to datetimes? Is it just rounding at the sub-second level or is there more?
Yeah, I separately moaned about this on #116111 and #116038. |
Here is an opposite example:
from datetime import datetime, UTC
from zoneinfo import ZoneInfo
east = ZoneInfo("US/Eastern")
dt1 = datetime(2016, 11, 6, 1, 30, tzinfo=east, fold=0)
dt2 = datetime(2016, 11, 6, 1, 30, tzinfo=east, fold=1)
print(dt1) # 2016-11-06 01:30:00-04:00
print(dt2) # 2016-11-06 01:30:00-05:00
print(dt1 == dt2) # True
print(dt2 - dt1) # 0:00:00
print(dt2.timestamp() - dt1.timestamp()) # 3600.0
print(dt1.astimezone(UTC) == dt2.astimezone(UTC)) # False
print(dt2.astimezone(UTC) - dt1.astimezone(UTC)) # 1:00:00
There are perhaps good reasons for this, but the fact that datetime objects which refer to different points in time are considered equal is confusing. It means that only datetime objects in UTC and raw timestamps can be used to identify the time. Other time zones can only be used to format and parse the local time. |
@serhiy-storchaka, yes, the introduction of DST is confusing, period. In your example, show a person the local wall clock at Or, in most of the US, just watch random Facebook feeds this Sunday to enjoy a variety of annoyed posts from people who got burned by the switch to DST 😉. |
Docs patches can be contributed by anyone 😉. I was too close to this from the start to really grok the intensity of the complaints. Life is easy if you play along with the design: if you care about time zone effects, convert to UTC and work in that. There are no surprises in UTC (or, for that matter, in any other fixed-offset zone). That advice is crucial, but I don't see any value in laboriously documenting all the things that can go wrong if someone is determined to fight the design.
Not inclined to spend time on it. It was again never intended that lossless conversion to a float be possible. If you try, you're on your own. The docs are strong enough already (although, ya, don't even mention possible rounding effects):
Same thing: if you want the time zone to affect intrazone computation, convert to UTC first. |
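On the rounding effects mentioned above, a small sketch of how distinct datetimes can collapse to the same float timestamp. It assumes the usual 64-bit IEEE 754 float, whose resolution near the year 9999 is coarser than one microsecond; the variable names are illustrative.

```python
from datetime import datetime, timedelta, timezone

one_us = timedelta(microseconds=1)

# Near the present, a float still has room for microsecond resolution.
near = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(near.timestamp() == (near + one_us).timestamp())  # False

# Far in the future, two datetimes one microsecond apart share a timestamp.
far = datetime(9999, 1, 1, tzinfo=timezone.utc)
print(far == far + one_us)                              # False
print(far.timestamp() == (far + one_us).timestamp())    # True
```

Comparing the UTC datetimes themselves avoids this, since they keep exact microsecond precision.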
Bug report
Bug description:
I'm sure folks will pull out reasoning behind this from the inscrutable section in PEP 495 but this behaviour is honestly extremely surprising at best and just flat out wrong on first glance:
Even when deliberately specifying as explicitly as possible two points in time which are absolutely not the same (these are two points during the UK DST backwards transition later this year):
This has really unpleasant implications for things like storing points in time in a dictionary:
The inverse of this issue is reported in #115845.
Yes, I know the timestamp can be used:
>>> dt1.timestamp() == dt2.timestamp()
False
...but not on time objects, and timestamp also brings the local machine's timezone into play:
Concretely, it would be a lot less confusing if:
- time objects, and datetime objects where tzinfo is None, are equal and hashed the same only if all of their attributes including fold are the same.
- datetime objects where tzinfo is not None are equal and hashed the same only if they represent the exact same point in time.
CPython versions tested on:
3.12
Operating systems tested on:
No response
Linked PRs