-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Tar timestamp conversion from/to string and DateTimeOffset #71038
Conversation
Tagging subscribers to this area: @dotnet/area-system-io Issue DetailsThe tests that made comparisons between expected timestamps and timestamps extracted from extended attributes (string -> double -> DateTimeOffset) were intermittently failing due to a precision bug. The main suspicion is that double is losing precision, so I was given the suggestion here to change it to decimal. In that same comment, I added more decimals when parsing (from 6 to 9) to help with the precision as well. Re-enable test disabled here: #69997
|
/azp run runtime-extra-platforms |
Azure Pipelines successfully started running 1 pipeline(s). |
Do we understand where the loss occurs? It seems unlikely it's long-double-long since as @tannergooding pointed out that would require a value over 2^52. I wonder whether we are papering over an issue that may manifest elsewhere. If we are lacking a local repro to debug, we could disable the test for now (or make it tolerant), and leave the issue open. Then later add logging/asserts and loop it until we find a debuggable repro? |
runtime/src/libraries/System.Formats.Tar/src/System/Formats/Tar/TarHelpers.cs Lines 122 to 123 in 7ab7f83
Today (GMT: Tuesday, June 21, 2022 2:39:51 PM), there have been 1,655,769,600 seconds since 1/1/1970. In the above code, we are multiplying seconds since epoch by UPDATE: My original "seconds since 1970" number was actually "seconds since year 0". I've updated the math above to (hopefully) be correct. |
Thank you. Somehow I read it as 10^52 😀 |
double secondsSinceEpoch = GetSecondsSinceEpochFromDateTimeOffset(timestamp); | ||
return secondsSinceEpoch.ToString("F9", CultureInfo.InvariantCulture); // 6 decimals, no commas | ||
decimal secondsSinceEpoch = GetSecondsSinceEpochFromDateTimeOffset(timestamp); | ||
return secondsSinceEpoch.ToString("F9", CultureInfo.InvariantCulture); // 9 decimals, no commas |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just G
(the default) will print the shortest roundtrippable string. G9
will print the shortest string or up to 9
significant digits, whichever is lesser. However, it will also print exponentials.
There is actually a decent amount of "inconsistency" in how the formatting APIs work between the different format specifiers and its a bit frustrating at times, but we can't easily change existing ones due to back-compat.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, it will also print exponentials.
I had initially tested G
and because it shows E+XX
like in the screenshot above, I decided to use F
instead.
@tannergooding just to make sure we are all in agreement, using F
is ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In what scenarios?
double d = 1655815010.199999998;
Console.WriteLine(d.ToString("G9", CultureInfo.InvariantCulture)); // 1.65581501E+09
d = 1.199999998;
Console.WriteLine(d.ToString("G9", CultureInfo.InvariantCulture)); // 1.2
d = 123456789.199999998;
Console.WriteLine(d.ToString("G9", CultureInfo.InvariantCulture)); // 123456789 (lol wat?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And F9
with double
does not seem to improve:
double d = 1655815010.199999998;
Console.WriteLine(d.ToString("F9", CultureInfo.InvariantCulture)); // 1655815010.200000048
d = 1.199999998;
Console.WriteLine(d.ToString("F9", CultureInfo.InvariantCulture)); // 1.199999998
d = 123456789.199999998;
Console.WriteLine(d.ToString("F9", CultureInfo.InvariantCulture)); // 123456789.200000003
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are using a decimal
in the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With decimal, F9
works fine:
decimal d = 1655815010.199999998M;
Console.WriteLine(d.ToString("F9", CultureInfo.InvariantCulture)); // 1655815010.199999998
d = 1.199999998M;
Console.WriteLine(d.ToString("F9", CultureInfo.InvariantCulture)); // 1.199999998
d = 123456789.199999998M;
Console.WriteLine(d.ToString("F9", CultureInfo.InvariantCulture)); // 123456789.199999998
But G9
behaves similarly to the example above for double:
decimal d = 1655815010.199999998M;
Console.WriteLine(d.ToString("G9", CultureInfo.InvariantCulture)); //1.65581501E+09
d = 1.199999998M;
Console.WriteLine(d.ToString("G9", CultureInfo.InvariantCulture)); // 1.2
d = 123456789.199999998M;
Console.WriteLine(d.ToString("G9", CultureInfo.InvariantCulture)); // 123456789
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll use G
as suggested above, and as explained via chat.
I am also going to remove some calls to this conversion code where it's not necessary. This code to convert timestamps is only required when converting from gnu to pax or viceversa (to store atime and ctime as required by the format).
} | ||
else if (originalEntry.Format is TarEntryFormat.Ustar or TarEntryFormat.V7) | ||
{ | ||
CompareDateTimeOffsets(initialNow, actualAccessTime); | ||
CompareDateTimeOffsets(initialNow, actualChangeTime); | ||
AssertExtensions.GreaterThanOrEqualTo(actualAccessTime, initialNow); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this GreaterThanOrEqualTo
and not just Assert.Equal
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In these cases, the initialNow
timestamp is DateTimeOffset.UtcNow
, which is generated shortly before invoking the constructors that take the other
entry. Since the constructors generate the mtime
timestamp automatically using UtcNow
, I can't know the exact expected value, but I can at least verify that the value is larger than the timestamp I saved before calling the constructors.
The cases that do an Equal
comparison are the ones where the existing mtime
is used as the value to store for atime
and ctime
, so I know the exact expected value to compare.
@eerhardt the latest commits take care of the following:
|
/azp run runtime-extra-platforms |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run runtime-extra-platforms |
Azure Pipelines successfully started running 1 pipeline(s). |
@@ -383,10 +383,10 @@ private void ReadPosixAndGnuSharedAttributes(Span<byte> buffer) | |||
private void ReadGnuAttributes(Span<byte> buffer) | |||
{ | |||
// Convert byte arrays | |||
int aTime = TarHelpers.GetTenBaseNumberFromOctalAsciiChars(buffer.Slice(FieldLocations.ATime, FieldLengths.ATime)); | |||
long aTime = TarHelpers.GetTenBaseLongFromOctalAsciiChars(buffer.Slice(FieldLocations.ATime, FieldLengths.ATime)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to create a test that has a date time in 2039 to make sure we can handle dates past 2038.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added tests for the epochalypse and the max upper limit in octal.
@@ -397,7 +396,7 @@ private void CollectExtendedAttributesFromStandardFieldsIfNeeded() | |||
_extendedAttributes.Add(PaxEaName, _name); | |||
|
|||
bool containsATime = _extendedAttributes.ContainsKey(PaxEaATime); | |||
bool containsCTime = _extendedAttributes.ContainsKey(PaxEaATime); | |||
bool containsCTime = _extendedAttributes.ContainsKey(PaxEaCTime); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a test that would have caught this bug? If not, we should add one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do, kinda:
- The
TarWriter_WriteEntry_Pax_Tests.WritePaxAttributes_Timestamps_AutomaticallyAdded
checks this for thePaxTarEntry(entryFormat, string)
constructor, when the user does not explicitly set atime and ctime. - The
PaxTarEntry_Conversion_Tests.Constructor_ConversionFrom*
tests are semi-related: they check that atime and ctime are always in the dictionary after conversion, in the constructor itself, not at write time. - We don't have a test for the conversion constructors, so I'm adding one.
But now that you mention it, it seems these two conditions will never be false, making the code unreachable, so I am removing the code. Here's why:
When constructing a PaxTarEntry
, whether it is new or it is being converted from another entry, all constructors ensure to add atime
and ctime
to the extended attributes dictionary if the user did not provide them (see the method AddNewAccessAndChangeTimestampsIfNotExist
). Consider also that the extended attributes dictionary is exposed to the user as an IReadOnlyDictionary<string, string>
getter-only property, so no new items can be added after construction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update: Both the existing test and the new test I'm adding pass after removing the unreachable code.
new DateTimeOffset((long)(secondsSinceUnixEpoch * TimeSpan.TicksPerSecond) + DateTime.UnixEpoch.Ticks, TimeSpan.Zero); | ||
|
||
// Converts the specified DateTimeOffset to the number of seconds that have passed since the Unix Epoch. | ||
internal static double GetSecondsSinceEpochFromDateTimeOffset(DateTimeOffset dateTimeOffset) => | ||
((double)(dateTimeOffset.UtcDateTime - DateTime.UnixEpoch).Ticks) / TimeSpan.TicksPerSecond; | ||
internal static decimal GetSecondsSinceEpochFromDateTimeOffset(DateTimeOffset dateTimeOffset) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be internal
? Is it only called from this class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
// Converts the array to an octal base number, then transforms it to ten base and returns it. | ||
internal static long GetTenBaseLongFromOctalAsciiChars(Span<byte> buffer) | ||
{ | ||
string str = GetTrimmedAsciiString(buffer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need to create an intermediate string just to parse it into an integer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be able to use System.Buffers.Text.Utf8Parser
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see a Utf8Parser API that parses the integer from an "octal" string. I see it supports x
- hexstring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that a new allocation is not needed. This code can be improved to instead of returning a string, it returns an int representing the length of the ROS for slicing.
The method checks that the last character(s) in the ROS are either a 0
(null char) or a 32
(space). All other characters are not trimmed. Which means that if an unexpected non-numeric character is found, it will cause the conversion to fail.
Do you mind if I address this request later? I'd like to get this PR merged just for the DateTimeOffsets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mind if I address this request later?
I think that is fine.
entry._header._mTime = TarHelpers.GetDateTimeOffsetFromSecondsSinceEpoch(status.MTime); | ||
entry._header._aTime = TarHelpers.GetDateTimeOffsetFromSecondsSinceEpoch(status.ATime); | ||
entry._header._mTime = info.LastWriteTimeUtc; | ||
entry._header._aTime = info.LastAccessTimeUtc; | ||
entry._header._cTime = TarHelpers.GetDateTimeOffsetFromSecondsSinceEpoch(status.CTime); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From looking at the FileStatus code:
runtime/src/libraries/System.Private.CoreLib/src/System/IO/FileStatus.Unix.cs
Lines 309 to 318 in fdc3b51
return UnixTimeToDateTimeOffset(_fileCache.MTime, _fileCache.MTimeNsec); | |
} | |
internal void SetLastWriteTime(string path, DateTimeOffset time, bool asDirectory) | |
=> SetAccessOrWriteTime(path, time, isAccessTime: false, asDirectory); | |
private static DateTimeOffset UnixTimeToDateTimeOffset(long seconds, long nanoseconds) | |
{ | |
return DateTimeOffset.FromUnixTimeSeconds(seconds).AddTicks(nanoseconds / NanosecondsPerTick); | |
} |
It looks like status.XTime
gets added to status.XTimeNsec
to make this DateTimeOffset. We will be respecting that for M and A time, but it looks like we don't respect that for C time now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Changing this logic to:
entry._header._cTime = DateTimeOffset.FromUnixTimeSeconds(status.CTime).AddTicks(status.CTimeNsec / 100 /* nanoseconds per tick */);
…to store as DateTimeOffset.
…e a dot. The DateTimeOffset comparison done afterwards should suffice.
… entry. Add tests to ensure we always add those entries to the dictionary on construction.
I'm rebasing on top of the latest bits in main, which contain the fixes for the timestamp and Apple-specific test failures. |
72698b6
to
cad5599
Compare
/azp run runtime-extra-platforms |
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Thanks @carlossanlop.
The tvOS failure is unrelated to this change, and it is not critical: The In other Unix platforms, extracting fifos does not throw, but apparently it does on tvOS. I'll get it fixed later. Here's the callstack:
|
The tests that made comparisons between expected timestamps and timestamps extracted from extended attributes (string -> double -> DateTimeOffset) were intermittently failing due to a precision bug. The main suspicion is that double is losing precision, so I was given the suggestion here to change it to decimal. In that same comment, I added more decimals when parsing (from 6 to 9) to help with the precision as well.
Fixes #69474
Fixes #70060
Re-enable test disabled here: #69997