[API Proposal]: TarReader should not assume same format for all entries in a tar file #69544
Comments
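Tagging subscribers to this area: @dotnet/area-system-io
Issue Details
Background and motivation
The user @Bio2hazard was kind enough to report this bug when testing the new System.Formats.Tar APIs. The tar file contains 4 entries:
RegularFile ustar entries.
gnu metadata entry of type LongLink.
RegularFile entry with the actual data that the previous LongLink metadata entry describes.
I confirmed this by inspecting the archive with a Hex Editor. The archive also shows that all 4 entries have magic and version metadata fields following the ustar rules.
The current behavior of our TarReader is to throw FormatException when it first reads an entry of a particular format and then encounters a subsequent entry in a different format.
Surprisingly, if this same file is opened and traversed with SharpCompress or 7-zip, they can traverse all the entries without problems. This means that:
I am opening this proposal to discuss the possibility of becoming as flexible as SharpCompress and 7-zip, at least when it comes to the TarReader. The TarWriter should keep its current behavior.
@bartonjs @stephentoub @eerhardt @adamsitnik @jozkee @jeffhandley @baronfel
API Proposal
Remove the Format property from TarReader:
namespace System.Formats.Tar;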
public class TarReader : IDisposable
{
- public TarFormat Format { get; private set; }
}
API Usage
using FileStream fs = File.OpenRead("archive.tar"); // Archive with intermixed entries
using TarReader reader = new(fs);
TarEntry entry = reader.GetNextEntry();
if (entry is V7TarEntry v7Entry)
{
// do something if v7
}
else if (entry is UstarTarEntry ustarEntry)
{
// do something if ustar
}
else if (entry is PaxTarEntry paxEntry)
{
// do something if pax
}
else if (entry is GnuTarEntry gnuEntry)
{
// do something if gnu
}
Alternative Designs
Optionally, but not really needed, we could move the Format property to TarEntry:
namespace System.Formats.Tar;
public abstract class TarEntry
{
+ public TarFormat Format { get; private set; }
}
Which would allow using the property instead of casting:
using FileStream fs = File.OpenRead("archive.tar"); // Archive with intermixed entries
using TarReader reader = new(fs);
TarEntry entry = reader.GetNextEntry();
switch (entry.Format)
{
case TarFormat.V7:
// do something if v7
break;
case TarFormat.Ustar:
// do something if ustar
break;
case TarFormat.Pax:
// do something if pax
break;
case TarFormat.Gnu:
// do something if gnu
break;
}
Risks
Low.
I think adding the
I assume GNU-format global attributes still have to be literally the first entry to work as expected, and not simply the first GNU-format entry in the archive. If not, is there a way of making the behavior apparent to a user who might want to read from them?
What does this do to the writer? Now that there's evidence that the various existing TAR programs don't care about hybrid format files, it seems like the writer shouldn't convert TarEntry objects by itself; just using an initial mode for what kind of format to use for the easy methods. Once it's NOT converting, then maybe there's missing API to perform the conversion, like each
Re-reading the top, I see you already talked about the writer continuing to convert. To me that made sense when an archive had to use only one family of entries, but it makes less sense if the majority of tools support hybrid archives. The ability to convert probably has some tidiness/goodness, but it feels like it should maybe just be extra/special API (like converting copy-style ctors).
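The "converting copy-style ctors" idea can be sketched roughly as below. This assumes the constructors found in the shipped .NET 7 System.Formats.Tar API, where each concrete entry type accepts another TarEntry; at the time of this comment they were only a suggestion. The entry name is a placeholder.

```csharp
using System;
using System.Formats.Tar;

// Create an entry in the ustar format (the name "file.txt" is a placeholder).
UstarTarEntry ustarEntry = new(TarEntryType.RegularFile, "file.txt");

// Produce a pax-format copy through a converting constructor
// instead of relying on the writer to convert implicitly.
PaxTarEntry paxEntry = new(ustarEntry);

Console.WriteLine($"{ustarEntry.Format} -> {paxEntry.Format}");
```

With conversion made explicit like this, a writer would not need to convert entries on its own.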
From our offline conversation, Jeremy, removing the Format property from TarReader and moving it to TarEntry makes the most sense, so I'll edit the proposal and mark it as ready for review.
Looks good as proposed.
namespace System.Formats.Tar
{
public partial class TarReader
{
- public TarFormat Format { get; }
}
public partial class TarEntry
{
+ public TarFormat Format { get; }
}
}
Background and motivation
The user @Bio2hazard was kind enough to report this bug when testing the new System.Formats.Tar APIs published in .NET 7 Preview 4 with a tar file he generated with his own work tools. He shared the file here.
The tar file contains 4 entries:
RegularFile ustar entries.
gnu metadata entry of type LongLink.
RegularFile entry with the actual data that the previous LongLink metadata entry describes.
I confirmed this by inspecting the archive with a Hex Editor.
The archive also shows that all 4 entries have magic and version metadata fields following the ustar rules, even the 3rd entry, which is clearly not supported by ustar because its entry type is L. In the gnu format, the magic and version fields are slightly different from those of the POSIX formats (ustar and pax). I describe those differences here.
The current behavior of our TarReader is to throw FormatException when, after initially reading an entry of a particular format, it encounters a subsequent entry in a different format, or an entry whose type is unsupported by the initially assumed format of the whole archive.
Surprisingly, if this same file is opened and traversed with SharpCompress or 7-zip, they can traverse all the entries without problems. This means that:
I am opening this proposal to discuss the possibility of becoming as flexible as SharpCompress and 7-zip, at least when it comes to the TarReader.
The TarWriter should keep its current behavior: the user should specify the initial format in the constructor, and if an unsupported entry is inserted, it should be converted to that format if possible, or throw an exception if the file is unsupported. This behavior aligns with the Unix tar tool, which fails if, for example, the user tries to add a block device entry, or a long path entry, when creating a v7 or ustar archive (these two formats do not support those types of files).
@bartonjs @stephentoub @eerhardt @adamsitnik @jozkee @jeffhandley @baronfel
API Proposal
Remove the Format property from TarReader to stop assuming all entries are in the same format:
Remove:
namespace System.Formats.Tar;
public class TarReader : IDisposable
{
-    public TarFormat Format { get; }
}
Add:
public class TarEntry
{
+    public TarFormat Format { get; }
}
API Usage
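A minimal sketch of how the per-entry property might be consumed, assuming the proposed TarEntry.Format property is available and that GetNextEntry returns null once the archive is exhausted. The file name is the same placeholder used elsewhere in this thread.

```csharp
using System;
using System.Formats.Tar;
using System.IO;

using FileStream fs = File.OpenRead("archive.tar"); // Archive with intermixed entries
using TarReader reader = new(fs);

// Each entry reports its own format, so nothing is assumed about the archive as a whole.
TarEntry? entry;
while ((entry = reader.GetNextEntry()) is not null)
{
    Console.WriteLine($"{entry.Name}: {entry.Format} ({entry.EntryType})");
}
```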
Alternative Designs
One alternative is to just not remove anything, and if we encounter an entry of a different format than the first one, we just switch the Format property to Unknown, but keep returning entries to the user when they call GetNextEntry. We would have to document this very clearly.
Risks
Low.