Fix Tar timestamp conversion from/to string and DateTimeOffset #71038

Merged
12 commits merged on Jun 23, 2022

carlossanlop
Member

The tests that made comparisons between expected timestamps and timestamps extracted from extended attributes (string -> double -> DateTimeOffset) were intermittently failing due to a precision bug. The main suspicion is that double is losing precision, so I was given the suggestion here to change it to decimal. In that same comment, I added more decimals when parsing (from 6 to 9) to help with the precision as well.

Fixes #69474
Fixes #70060

Re-enable test disabled here: #69997

@ghost

ghost commented Jun 21, 2022

Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.

Author: carlossanlop
Assignees: carlossanlop
Labels: area-System.IO

Milestone: 7.0.0

@carlossanlop
Member Author

/azp run runtime-extra-platforms

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@danmoseley
Member

Do we understand where the loss occurs? It seems unlikely it's long-double-long since as @tannergooding pointed out that would require a value over 2^52. I wonder whether we are papering over an issue that may manifest elsewhere.

If we are lacking a local repro to debug, we could disable the test for now (or make it tolerant), and leave the issue open. Then later add logging/asserts and loop it until we find a debuggable repro?

@eerhardt
Member

eerhardt commented Jun 21, 2022

> It seems unlikely it's long-double-long since as @tannergooding pointed out that would require a value over 2^52.

internal static DateTimeOffset GetDateTimeOffsetFromSecondsSinceEpoch(double secondsSinceUnixEpoch) =>
new DateTimeOffset((long)(secondsSinceUnixEpoch * TimeSpan.TicksPerSecond) + DateTime.UnixEpoch.Ticks, TimeSpan.Zero);

Today (GMT: Tuesday, June 21, 2022 2:39:51 PM), there have been 1,655,769,600 seconds since 1/1/1970.

In the above code, we are multiplying seconds since epoch by TimeSpan.TicksPerSecond (10,000,000):

[image]

UPDATE: My original "seconds since 1970" number was actually "seconds since year 0". I've updated the math above to (hopefully) be correct.
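
The arithmetic behind this point can be checked with a small sketch. It reuses the seconds value quoted above and only standard BCL calls; the variable names and the "one extra tick" sample are illustrative assumptions, not code from the PR.

```csharp
using System;

// Seconds since the Unix epoch quoted above, multiplied by ticks per second.
long seconds = 1_655_769_600;
long ticks = seconds * TimeSpan.TicksPerSecond;   // 16,557,696,000,000,000

// A double has a 53-bit significand, so above 2^53 not every integer is representable.
long limit = 1L << 53;                            // 9,007,199,254,740,992
Console.WriteLine(ticks > limit);                 // True

// Illustrative value: one tick (100 ns) past the timestamp above.
long oddTicks = ticks + 1;
long viaDouble = (long)(double)oddTicks;          // round trip through double
long viaDecimal = (long)(decimal)oddTicks;        // round trip through decimal

Console.WriteLine(viaDouble == oddTicks);         // False: the odd tick count is rounded to an even neighbor
Console.WriteLine(viaDecimal == oddTicks);        // True: decimal holds 28-29 significant digits
```

In this range a double can only step in increments of two ticks, which is consistent with timestamps that differ by a tick only some of the time.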

@danmoseley
Member

Thank you. Somehow I read it as 10^52 😀

- double secondsSinceEpoch = GetSecondsSinceEpochFromDateTimeOffset(timestamp);
- return secondsSinceEpoch.ToString("F9", CultureInfo.InvariantCulture); // 6 decimals, no commas
+ decimal secondsSinceEpoch = GetSecondsSinceEpochFromDateTimeOffset(timestamp);
+ return secondsSinceEpoch.ToString("F9", CultureInfo.InvariantCulture); // 9 decimals, no commas
Member

Don't we want G9?

[image]

Member

Hmm, maybe 9 is too small.

[image]

Member

@eerhardt eerhardt Jun 21, 2022

Member

@tannergooding tannergooding Jun 21, 2022

Just G (the default) will print the shortest roundtrippable string. G9 will print the shortest string or up to 9 significant digits, whichever is lesser. However, it will also print exponentials.

There is actually a decent amount of "inconsistency" in how the formatting APIs work between the different format specifiers and it's a bit frustrating at times, but we can't easily change existing ones due to back-compat.

Member Author

> However, it will also print exponentials.

I had initially tested G and because it shows E+XX like in the screenshot above, I decided to use F instead.

@tannergooding just to make sure we are all in agreement, using F is ok?

Member Author

@carlossanlop carlossanlop Jun 21, 2022

In what scenarios?

double d = 1655815010.199999998;
Console.WriteLine(d.ToString("G9", CultureInfo.InvariantCulture)); // 1.65581501E+09
d = 1.199999998;
Console.WriteLine(d.ToString("G9", CultureInfo.InvariantCulture)); // 1.2
d = 123456789.199999998;
Console.WriteLine(d.ToString("G9", CultureInfo.InvariantCulture)); // 123456789 (lol wat?)

Member Author

@carlossanlop carlossanlop Jun 21, 2022

And F9 with double does not seem to improve:

double d = 1655815010.199999998;
Console.WriteLine(d.ToString("F9", CultureInfo.InvariantCulture)); // 1655815010.200000048
d = 1.199999998;
Console.WriteLine(d.ToString("F9", CultureInfo.InvariantCulture)); // 1.199999998
d = 123456789.199999998;
Console.WriteLine(d.ToString("F9", CultureInfo.InvariantCulture)); // 123456789.200000003

Member

We are using a decimal in the code.

Member Author

With decimal, F9 works fine:

decimal d = 1655815010.199999998M;
Console.WriteLine(d.ToString("F9", CultureInfo.InvariantCulture)); // 1655815010.199999998
d = 1.199999998M;
Console.WriteLine(d.ToString("F9", CultureInfo.InvariantCulture)); // 1.199999998
d = 123456789.199999998M;
Console.WriteLine(d.ToString("F9", CultureInfo.InvariantCulture)); // 123456789.199999998

But G9 behaves similarly to the example above for double:

decimal d = 1655815010.199999998M;
Console.WriteLine(d.ToString("G9", CultureInfo.InvariantCulture)); // 1.65581501E+09
d = 1.199999998M;
Console.WriteLine(d.ToString("G9", CultureInfo.InvariantCulture)); // 1.2
d = 123456789.199999998M;
Console.WriteLine(d.ToString("G9", CultureInfo.InvariantCulture)); // 123456789

Member Author

I'll use G as suggested above, and as explained via chat.

I am also going to remove some calls to this conversion code where it's not necessary. This code to convert timestamps is only required when converting from gnu to pax or vice versa (to store atime and ctime as required by the format).
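
A minimal sketch of the round trip that "G" is meant to preserve, assuming the timestamp string is read back with decimal.Parse and the invariant culture. The helper names ToTimestampString and FromTimestampString are hypothetical and are not the names used in TarHelpers.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: DateTimeOffset -> seconds since the Unix epoch, formatted as a decimal string.
static string ToTimestampString(DateTimeOffset timestamp)
{
    decimal seconds = (decimal)(timestamp.UtcDateTime - DateTime.UnixEpoch).Ticks / TimeSpan.TicksPerSecond;
    return seconds.ToString("G", CultureInfo.InvariantCulture);
}

// Hypothetical helper: decimal string -> DateTimeOffset.
static DateTimeOffset FromTimestampString(string text)
{
    decimal seconds = decimal.Parse(text, NumberStyles.Float, CultureInfo.InvariantCulture);
    return new DateTimeOffset((long)(seconds * TimeSpan.TicksPerSecond) + DateTime.UnixEpoch.Ticks, TimeSpan.Zero);
}

// A sample timestamp with an odd number of ticks, so the fraction needs all 7 digits.
DateTimeOffset original = new DateTimeOffset(2022, 6, 21, 12, 50, 10, TimeSpan.Zero).AddTicks(1_999_999);
string text = ToTimestampString(original);
DateTimeOffset roundTripped = FromTimestampString(text);

Console.WriteLine(text);
Console.WriteLine(roundTripped == original); // True
```

The sketch only checks that the value survives the string conversion; it does not assert a particular number of decimal places, which matches the intent of dropping the fixed "F9" width.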

  }
  else if (originalEntry.Format is TarEntryFormat.Ustar or TarEntryFormat.V7)
  {
- CompareDateTimeOffsets(initialNow, actualAccessTime);
- CompareDateTimeOffsets(initialNow, actualChangeTime);
+ AssertExtensions.GreaterThanOrEqualTo(actualAccessTime, initialNow);
Member

Why is this GreaterThanOrEqualTo and not just Assert.Equal?

Member Author

In these cases, the initialNow timestamp is DateTimeOffset.UtcNow, which is generated shortly before invoking the constructors that take the other entry. Since the constructors generate the mtime timestamp automatically using UtcNow, I can't know the exact expected value, but I can at least verify that the value is larger than the timestamp I saved before calling the constructors.

The cases that do an Equal comparison are the ones where the existing mtime is used as the value to store for atime and ctime, so I know the exact expected value to compare.

@carlossanlop
Member Author

carlossanlop commented Jun 22, 2022

@eerhardt the latest commits take care of the following:

  • Avoid using the conversion methods when it is not needed: There is a case where we create a FileSystemInfo object, and we can reuse 2 of the 3 Last*TimeUtc properties.
  • Fixed a bug checking two times for a key in the dictionary, instead of two different ones.
  • Using 'G' for string formatting of the decimal, as recommended. I also modified the related test to not require finding a dot character in the string. The important thing is that later that timestamp should be able to do the roundtrip, and the tests should pass.
    • The particular case where the string timestamp might not contain a dot and decimal portion is when a v7/ustar/gnu entry is converted to pax, and the mtime field is then copied into the extended attributes dictionary. The mtime field in that case did not have a decimal portion; it was obtained directly from the old header mtime field, which holds an integer.
  • Process ten base fields containing timestamp seconds as long, not int (commit is below).

@carlossanlop
Member Author

/azp run runtime-extra-platforms

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@carlossanlop
Member Author

/azp run runtime-extra-platforms

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@@ -383,10 +383,10 @@ private void ReadPosixAndGnuSharedAttributes(Span<byte> buffer)
  private void ReadGnuAttributes(Span<byte> buffer)
  {
  // Convert byte arrays
- int aTime = TarHelpers.GetTenBaseNumberFromOctalAsciiChars(buffer.Slice(FieldLocations.ATime, FieldLengths.ATime));
+ long aTime = TarHelpers.GetTenBaseLongFromOctalAsciiChars(buffer.Slice(FieldLocations.ATime, FieldLengths.ATime));
Member

It would be good to create a test that has a date time in 2039 to make sure we can handle dates past 2038.

Member Author

Added tests for the epochalypse and the max upper limit in octal.
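
To make the 2038 concern concrete, here is a small sketch, assuming the timestamp field holds octal ASCII digits as in the tar header. It uses Convert.ToInt64 with base 8 rather than the PR's helper, and the 2039 field value is made up for illustration.

```csharp
using System;

// The largest second count an int can hold.
DateTimeOffset lastInt32Second = DateTimeOffset.FromUnixTimeSeconds(int.MaxValue);
Console.WriteLine(lastInt32Second.UtcDateTime.ToString("u")); // 2038-01-19 03:14:07Z

// Illustrative field value: 2039-01-01T00:00:00Z written as octal digits.
long seconds2039 = new DateTimeOffset(2039, 1, 1, 0, 0, 0, TimeSpan.Zero).ToUnixTimeSeconds();
string octalField = Convert.ToString(seconds2039, 8);

// Parsing as long keeps the value intact; it no longer fits in an int.
long parsed = Convert.ToInt64(octalField, 8);
Console.WriteLine(parsed > int.MaxValue);                           // True
Console.WriteLine(DateTimeOffset.FromUnixTimeSeconds(parsed).Year); // 2039
```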

@@ -397,7 +396,7 @@ private void CollectExtendedAttributesFromStandardFieldsIfNeeded()
  _extendedAttributes.Add(PaxEaName, _name);

  bool containsATime = _extendedAttributes.ContainsKey(PaxEaATime);
- bool containsCTime = _extendedAttributes.ContainsKey(PaxEaATime);
+ bool containsCTime = _extendedAttributes.ContainsKey(PaxEaCTime);
Member

Is there a test that would have caught this bug? If not, we should add one.

Member Author

We do, kinda:

  • The TarWriter_WriteEntry_Pax_Tests.WritePaxAttributes_Timestamps_AutomaticallyAdded checks this for the PaxTarEntry(entryFormat, string) constructor, when the user does not explicitly set atime and ctime.
  • The PaxTarEntry_Conversion_Tests.Constructor_ConversionFrom* tests are semi-related: they check that atime and ctime are always in the dictionary after conversion, in the constructor itself, not at write time.
  • We don't have a test for the conversion constructors, so I'm adding one.

But now that you mention it, it seems these two conditions will never be false, making the code unreachable, so I am removing the code. Here's why:

When constructing a PaxTarEntry, whether it is new or it is being converted from another entry, all constructors make sure to add atime and ctime to the extended attributes dictionary if the user did not provide them (see the method AddNewAccessAndChangeTimestampsIfNotExist). Consider also that the extended attributes dictionary is exposed to the user as an IReadOnlyDictionary<string, string> getter-only property, so no new items can be added after construction.

Member Author

Update: Both the existing test and the new test I'm adding pass after removing the unreachable code.

  new DateTimeOffset((long)(secondsSinceUnixEpoch * TimeSpan.TicksPerSecond) + DateTime.UnixEpoch.Ticks, TimeSpan.Zero);

  // Converts the specified DateTimeOffset to the number of seconds that have passed since the Unix Epoch.
- internal static double GetSecondsSinceEpochFromDateTimeOffset(DateTimeOffset dateTimeOffset) =>
- ((double)(dateTimeOffset.UtcDateTime - DateTime.UnixEpoch).Ticks) / TimeSpan.TicksPerSecond;
+ internal static decimal GetSecondsSinceEpochFromDateTimeOffset(DateTimeOffset dateTimeOffset) =>
Member

Does this need to be internal? Is it only called from this class?

Member Author

Fixed

// Converts the array to an octal base number, then transforms it to ten base and returns it.
internal static long GetTenBaseLongFromOctalAsciiChars(Span<byte> buffer)
{
string str = GetTrimmedAsciiString(buffer);
Member

Do we really need to create an intermediate string just to parse it into an integer?

Member

We should be able to use System.Buffers.Text.Utf8Parser

Member

I don't see a Utf8Parser API that parses the integer from an "octal" string. I see it supports x - hexstring.

Member Author

I agree that a new allocation is not needed. This code can be improved so that, instead of returning a string, it returns an int representing the length of the ROS (ReadOnlySpan) to use for slicing.

The method checks that the last character(s) in the ROS are either a 0 (null char) or a 32 (space). All other characters are not trimmed. Which means that if an unexpected non-numeric character is found, it will cause the conversion to fail.

Do you mind if I address this request later? I'd like to get this PR merged just for the DateTimeOffsets.

Member

> Do you mind if I address this request later?

I think that is fine.
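
One way the intermediate string could be avoided is to trim trailing nulls and spaces on the span and accumulate the octal digits directly. This is only a sketch of that idea under stated assumptions: ParseOctalAsLong is a hypothetical name, and the eventual follow-up change in the runtime repository may look different.

```csharp
using System;
using System.Text;

// Hypothetical helper: parses octal ASCII digits from a header field without allocating a string.
static long ParseOctalAsLong(ReadOnlySpan<byte> buffer)
{
    // Trim trailing null (0) and space (32) bytes, the only padding this sketch accepts.
    int length = buffer.Length;
    while (length > 0 && (buffer[length - 1] == 0 || buffer[length - 1] == 32))
    {
        length--;
    }

    long value = 0;
    foreach (byte b in buffer.Slice(0, length))
    {
        // Any other non-octal byte makes the conversion fail, as described above.
        if (b < (byte)'0' || b > (byte)'7')
        {
            throw new FormatException($"Unexpected byte {b} in octal field.");
        }
        value = checked(value * 8 + (b - (byte)'0'));
    }
    return value;
}

byte[] field = Encoding.ASCII.GetBytes("17777777777\0"); // int.MaxValue in octal, null-terminated
Console.WriteLine(ParseOctalAsLong(field));              // 2147483647
```

The checked multiplication is a design choice for the sketch: an oversized field throws OverflowException instead of silently wrapping.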

- entry._header._mTime = TarHelpers.GetDateTimeOffsetFromSecondsSinceEpoch(status.MTime);
- entry._header._aTime = TarHelpers.GetDateTimeOffsetFromSecondsSinceEpoch(status.ATime);
+ entry._header._mTime = info.LastWriteTimeUtc;
+ entry._header._aTime = info.LastAccessTimeUtc;
  entry._header._cTime = TarHelpers.GetDateTimeOffsetFromSecondsSinceEpoch(status.CTime);
Member

From looking at the FileStatus code:

return UnixTimeToDateTimeOffset(_fileCache.MTime, _fileCache.MTimeNsec);
}
internal void SetLastWriteTime(string path, DateTimeOffset time, bool asDirectory)
=> SetAccessOrWriteTime(path, time, isAccessTime: false, asDirectory);
private static DateTimeOffset UnixTimeToDateTimeOffset(long seconds, long nanoseconds)
{
return DateTimeOffset.FromUnixTimeSeconds(seconds).AddTicks(nanoseconds / NanosecondsPerTick);
}

It looks like status.XTime gets added to status.XTimeNsec to make this DateTimeOffset. We will be respecting that for M and A time, but it looks like we don't respect that for C time now.

Member Author

Good point. Changing this logic to:

            entry._header._cTime = DateTimeOffset.FromUnixTimeSeconds(status.CTime).AddTicks(status.CTimeNsec / 100 /* nanoseconds per tick */);
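
The combination of whole seconds and a nanosecond remainder can be sketched as follows. The sample values are made up for illustration; only DateTimeOffset.FromUnixTimeSeconds and AddTicks come from the snippets above.

```csharp
using System;

const long NanosecondsPerTick = 100; // one tick is 100 ns

// Illustrative stat-like values: whole seconds plus a nanosecond remainder.
long seconds = 1_655_815_810;
long nanoseconds = 199_999_998;

DateTimeOffset withFraction = DateTimeOffset.FromUnixTimeSeconds(seconds).AddTicks(nanoseconds / NanosecondsPerTick);
DateTimeOffset secondsOnly = DateTimeOffset.FromUnixTimeSeconds(seconds);

// Dropping the nanosecond field loses the sub-second part of the timestamp.
Console.WriteLine((withFraction - secondsOnly).Ticks); // 1999999
```

The integer division truncates the last two nanosecond digits, since a tick cannot represent anything finer than 100 ns.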

@carlossanlop
Member Author

I'm rebasing on top of the latest bits in main, which contain the fixes for the timestamp and Apple-specific test failures.

@carlossanlop
Member Author

/azp run runtime-extra-platforms

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Member

@eerhardt eerhardt left a comment

Looks good! Thanks @carlossanlop.

@carlossanlop
Member Author

The tvOS failure is unrelated to this change, and it is not critical:

The ExtractToFile_SpecialFile_Unelevated_Throws method threw UnauthorizedAccessException when attempting to extract the fifo from the archive into disk:

https://github.com/dotnet/runtime/blame/051b4828c7d3a0cad3289830ef9fd2120f45bb2b/src/libraries/System.Formats.Tar/tests/TarReader/TarReader.ExtractToFile.Tests.Unix.cs#L39

In other Unix platforms, extracting fifos does not throw, but apparently it does on tvOS. I'll get it fixed later.

Here's the callstack:

[14:55:07.7170830] 2022-06-22 14:55:07.706 System.Formats.Tar.Tests[58680:78698627]    Exception messages: System.UnauthorizedAccessException : Access to the path '/private/var/mobile/Containers/Data/Application/11374FD2-B51C-49C4-844D-F40F062ACE94/tmp/1pslf5q1.gf2/output' is denied.
[14:55:07.7171130] ---- System.IO.IOException : Operation not permitted
[14:55:07.7189880] 2022-06-22 14:55:07.708 System.Formats.Tar.Tests[58680:78698627]    Exception stack traces:    at Interop.ThrowExceptionForIoErrno(ErrorInfo errorInfo, String path, Boolean isDirectory, Func`2 errorRewriter)
[14:55:07.7190220]    at System.Formats.Tar.TarEntry.ExtractAsFifo(String destinationFileName)
[14:55:07.7190300] 2022-06-22 14:55:07.708 System.Formats.Tar.Tests[58680:78698627]    at System.Formats.Tar.TarEntry.ExtractToFileInternal(String filePath, String linkTargetPath, Boolean overwrite)
[14:55:07.7190360]    at System.Formats.Tar.TarEntry.ExtractToFile(String destinationFileName, Boolean overwrite)
[14:55:07.7201670] 2022-06-22 14:55:07.709 System.Formats.Tar.Tests[58680:78698627]    at System.Formats.Tar.Tests.TarReader_ExtractToFile_Tests.ExtractToFile_SpecialFile_Unelevated_Throws()

@carlossanlop carlossanlop merged commit 70e9ca0 into dotnet:main Jun 23, 2022
@carlossanlop carlossanlop deleted the FixTarTimestamps branch June 23, 2022 00:22
@ghost ghost locked as resolved and limited conversation to collaborators Jul 23, 2022
6 participants