[API Proposal]: Allow multiple Global Extended Attributes entries in Tar archives with PAX format #69935
Comments
Tagging subscribers to this area: @dotnet/area-system-io
Issue Details
Background and motivation
In the initial Tar proposal, the assumption was that the PAX format allowed one special entry known as Global Extended Attributes (GEA from now on), at the beginning of the archive, to allow overriding the extended attributes of all the subsequent entries in the archive. This was incorrect.
The FreeBSD tar spec does not explain much about the GEA entries, except that they exist.
Both the OpenGroup pax manual and the GNU tar manual explain the format of the name in the GEA entry, which is $TMPDIR/GlobalHead.%p.%n.
But there is no mention of when to expect more than one entry and what they mean.
Then I recently found this spec: IBM z/OS 2.5.0 pax manual, which has a clear and detailed description of how the GEA entry should work:
API Proposal
New:
namespace System.Formats.Tar
{
public sealed partial class PaxGlobalExtendedAttributesTarEntry : System.Formats.Tar.PosixTarEntry
{
public PaxGlobalExtendedAttributesTarEntry(System.Collections.Generic.IEnumerable<System.Collections.Generic.KeyValuePair<string, string>> globalExtendedAttributes) { }
public System.Collections.Generic.IReadOnlyDictionary<string, string> GlobalExtendedAttributes { get { throw null; } }
}
}
Remove:
public sealed partial class TarReader : System.IDisposable
{
- public System.Collections.Generic.IReadOnlyDictionary<string, string>? GlobalExtendedAttributes { get { throw null; } }
}
Modify:
public sealed partial class TarWriter : System.IDisposable
{
- public TarWriter(System.IO.Stream archiveStream, System.Collections.Generic.IEnumerable<System.Collections.Generic.KeyValuePair<string, string>>? globalExtendedAttributes = null, bool leaveOpen = false) { }
+ public TarWriter(System.IO.Stream archiveStream, bool leaveOpen = false) { }
}
API Usage
Before
We could only add one GEA entry:
// Write
Dictionary<string, string> attributes = new();
attributes["SomeAttributeName"] = "I'm an extended attribute!";
using MemoryStream ms = new();
using (TarWriter writer = new(ms, attributes, leaveOpen: true))
{
    // Add some more entries if desired
}
// Read (the writer must be disposed first so the archive is complete)
ms.Position = 0;
using TarReader reader = new(ms, leaveOpen: false);
reader.GetNextEntry(); // Advance the reader to detect format and GEA entry
// Access the values of the single GEA
Console.WriteLine(reader.GlobalExtendedAttributes["SomeAttributeName"]); // "I'm an extended attribute!"
After
With this proposed change, we could now add multiple GEA entries:
// Write
Dictionary<string, string> attributes1 = new();
attributes1["attr1"] = "I'm extended attribute 1!";
PaxGlobalExtendedAttributesTarEntry gea1 = new(attributes1);
Dictionary<string, string> attributes2 = new();
attributes2["attr2"] = "I'm extended attribute 2!";
PaxGlobalExtendedAttributesTarEntry gea2 = new(attributes2);
using MemoryStream ms = new();
using (TarWriter writer = new(ms, leaveOpen: true)) // Default format is PAX for this constructor
{
    writer.WriteEntry(gea1);
    // Add some more entries if desired, they'll be affected by gea1
    writer.WriteEntry(gea2);
    // Add some more entries if desired, they'll be affected by gea2
}
// Read (the writer must be disposed first so the archive is complete)
ms.Position = 0;
using TarReader reader = new(ms, leaveOpen: false);
PaxGlobalExtendedAttributesTarEntry readGea1 = reader.GetNextEntry() as PaxGlobalExtendedAttributesTarEntry;
Console.WriteLine(readGea1.GlobalExtendedAttributes["attr1"]); // "I'm extended attribute 1!"
// Multiple calls of GetNextEntry for the other entries, until reaching the next GEA entry
PaxGlobalExtendedAttributesTarEntry readGea2 = reader.GetNextEntry() as PaxGlobalExtendedAttributesTarEntry;
Console.WriteLine(readGea2.GlobalExtendedAttributes["attr2"]); // "I'm extended attribute 2!"
Alternative Designs
Reuse PaxTarEntry
We could avoid adding a new class to represent a GEA entry, and instead reuse the existing PaxTarEntry class. Having the entryName argument in the constructors isn't really necessary, since the name is exposed in its own property with a getter and a setter, so the constructors could be modified as follows:
public sealed partial class GnuTarEntry : System.Formats.Tar.PosixTarEntry
{
- public GnuTarEntry(System.Formats.Tar.TarEntryType entryType, string entryName) { }
+ public GnuTarEntry(System.Formats.Tar.TarEntryType entryType) { }
}
public sealed partial class PaxTarEntry : System.Formats.Tar.PosixTarEntry
{
- public PaxTarEntry(System.Formats.Tar.TarEntryType entryType, string entryName) { }
+ public PaxTarEntry(System.Formats.Tar.TarEntryType entryType) { }
- public PaxTarEntry(System.Formats.Tar.TarEntryType entryType, string entryName, System.Collections.Generic.IEnumerable<System.Collections.Generic.KeyValuePair<string, string>> extendedAttributes) { }
+ public PaxTarEntry(System.Formats.Tar.TarEntryType entryType, System.Collections.Generic.IEnumerable<System.Collections.Generic.KeyValuePair<string, string>> extendedAttributes) { }
}
public sealed partial class UstarTarEntry : System.Formats.Tar.PosixTarEntry
{
- public UstarTarEntry(System.Formats.Tar.TarEntryType entryType, string entryName) { }
+ public UstarTarEntry(System.Formats.Tar.TarEntryType entryType) { }
}
public sealed partial class V7TarEntry : System.Formats.Tar.TarEntry
{
- public V7TarEntry(System.Formats.Tar.TarEntryType entryType, string entryName) { }
+ public V7TarEntry(System.Formats.Tar.TarEntryType entryType) { }
}
Discarded design
I also considered avoiding a new entry type and instead writing the dictionary directly:
Dictionary<string, string> attributes = new();
attributes["hello"] = "world";
writer.WriteGlobalExtendedAttributes(attributes);
But then how should we give the user the GEA entries in a reader? GetNextEntry returns a TarEntry. Having a dictionary of dictionaries that could hold the GEA dictionaries would be too messy and confusing.
Risks
Low. The APIs are new in 7.0, so there is still time to improve them.
@bartonjs @jeffhandley @adamsitnik @jozkee @tmds
Had a short conversation with @bartonjs offline about this; the main proposal makes more sense than the alternative. Marking this as ready.
namespace System.Formats.Tar
{
public sealed partial class PaxGlobalExtendedAttributesTarEntry : System.Formats.Tar.PosixTarEntry
{
public PaxGlobalExtendedAttributesTarEntry(System.Collections.Generic.IEnumerable<System.Collections.Generic.KeyValuePair<string, string>> globalExtendedAttributes) { }
public System.Collections.Generic.IReadOnlyDictionary<string, string> GlobalExtendedAttributes { get { throw null; } }
}
// Conversion constructors. They move the DataStream value from other to [new].
public partial class GnuTarEntry
{
public GnuTarEntry(TarEntry other);
}
public partial class PaxTarEntry
{
public PaxTarEntry(TarEntry other);
}
public partial class UstarTarEntry
{
public UstarTarEntry(TarEntry other);
}
public partial class V7TarEntry
{
public V7TarEntry(TarEntry other);
}
}
namespace System.Formats.Tar
{
public sealed partial class TarReader : System.IDisposable
{
- public System.Collections.Generic.IReadOnlyDictionary<string, string>? GlobalExtendedAttributes { get { throw null; } }
}
public sealed partial class TarWriter : System.IDisposable
{
- public TarWriter(System.IO.Stream archiveStream, System.Collections.Generic.IEnumerable<System.Collections.Generic.KeyValuePair<string, string>>? globalExtendedAttributes = null, bool leaveOpen = false) { }
+ public TarWriter(System.IO.Stream archiveStream) {}
- public TarWriter(Stream archiveStream, TarEntryFormat archiveFormat, bool leaveOpen = false);
+ public TarWriter(Stream archiveStream, TarEntryFormat archiveFormat = TarEntryFormat.Pax, bool leaveOpen = false);
}
- public partial enum TarFormat { ... }
+ public partial enum TarEntryFormat { ... }
}
Shouldn't the default to So it should look like this:
namespace System.Formats.Tar
{
public sealed partial class TarReader : System.IDisposable
{
- public System.Collections.Generic.IReadOnlyDictionary<string, string>? GlobalExtendedAttributes { get { throw null; } }
}
public sealed partial class TarWriter : System.IDisposable
{
- public TarWriter(System.IO.Stream archiveStream, System.Collections.Generic.IEnumerable<System.Collections.Generic.KeyValuePair<string, string>>? globalExtendedAttributes = null, bool leaveOpen = false) { }
- public TarWriter(Stream archiveStream, TarEntryFormat archiveFormat, bool leaveOpen = false);
+ public TarWriter(Stream archiveStream, bool leaveOpen = false);
- public void WriteEntry(string fileName, string entryName);
+ public void WriteEntry(string fileName, string entryName, TarEntryFormat archiveFormat = TarEntryFormat.Pax);
}
- public partial enum TarFormat { ... }
+ public partial enum TarEntryFormat { ... }
}
No, fileName-based entries will be written in the format that was specified in the TarWriter constructor:
using TarWriter writer = new(archiveStream, TarEntryFormat.Pax, leaveOpen: false);
// A) Saves in the default format specified in the writer constructor
writer.WriteEntry("path/to/file.txt", "entryName.txt");
// B) If the user wants to write entries in different formats (extremely rare, but valid scenario), they'll need to do this:
GnuTarEntry gnuEntry = new(TarEntryType.RegularFile, "entryName2.txt");
using FileStream fs = File.OpenRead("path/to/file2.txt");
gnuEntry.DataStream = fs;
writer.WriteEntry(gnuEntry); // Written in gnu format
TarEntryFormat is not settable. It matches the class that was used to construct the entry: #69544 (comment)
GnuTarEntry entry = new(TarEntryType.RegularFile, "entryName.txt");
Console.WriteLine(entry.Format); // Getter only, prints 'Gnu'
See comment B in my first code example.
TarEntryFormat, Conversion and GEA changes have been fixed.
Background and motivation
In the initial Tar proposal, the assumption was that the PAX format allowed one special entry known as Global Extended Attributes (GEA from now on), at the beginning of the archive, to allow overriding the extended attributes of all the subsequent entries in the archive. This was incorrect.
The FreeBSD tar spec does not explain much about the GEA entries, except that they exist.
Both the OpenGroup pax manual and the GNU tar manual explain the format of the name in the GEA entry, which is $TMPDIR/GlobalHead.%p.%n, and describe the suffix number as:
But there is no mention of when to expect more than one entry and what they mean.
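To make the name pattern concrete, here is a minimal sketch of how such a name could be expanded, assuming %p stands for the process ID and %n for the running GEA entry number, as described later in this issue. The GeaNameSketch class and BuildGeaName method are hypothetical names for illustration only, not part of the proposed API.

```csharp
using System;
using System.IO;

// Fixed inputs, so the result is easy to check by eye.
Console.WriteLine(GeaNameSketch.BuildGeaName("/tmp", 1234, 1)); // /tmp/GlobalHead.1234.1

// Values taken from the current process and platform.
Console.WriteLine(GeaNameSketch.BuildGeaName(Path.GetTempPath(), Environment.ProcessId, 2));

// Hypothetical helper, not part of System.Formats.Tar.
static class GeaNameSketch
{
    // Expands the "$TMPDIR/GlobalHead.%p.%n" pattern.
    public static string BuildGeaName(string tmpDir, int processId, int sequenceNumber)
    {
        // Drop any trailing separator so the result has exactly one before "GlobalHead".
        string dir = tmpDir.TrimEnd('/', '\\');
        return $"{dir}/GlobalHead.{processId}.{sequenceNumber}";
    }
}
```

Since the writer is the only component that knows the sequence number, it makes sense for the name to be generated by the writer rather than supplied by the caller.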
Then I recently found this spec: IBM z/OS 2.5.0 pax manual, which has a clear and detailed description of how the GEA entry should work:
API Proposal
New:
Remove:
public sealed partial class TarReader : System.IDisposable
{
-    public System.Collections.Generic.IReadOnlyDictionary<string, string>? GlobalExtendedAttributes { get { throw null; } }
}
Modify:
API Usage
Before
We could only add one GEA entry:
After
With this proposed change, we could now add multiple GEA entries:
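As a rough model of what it could mean for later entries to be "affected" by a GEA entry, the sketch below tracks the global attributes currently in effect using plain dictionaries. It assumes the POSIX pax reading, where a global value applies to all following entries until a later GEA entry supplies another value for the same key, and an entry's own extended attributes take precedence. GeaModel, ApplyGlobal and Effective are hypothetical names; nothing here is part of the proposed API or a statement of what TarReader does.

```csharp
using System;
using System.Collections.Generic;

var globals = new Dictionary<string, string>();

// First GEA entry is encountered.
GeaModel.ApplyGlobal(globals, new Dictionary<string, string> { ["attr1"] = "one" });
var a = GeaModel.Effective(globals, new Dictionary<string, string>());
Console.WriteLine(a["attr1"]); // one

// Second GEA entry overrides attr1 and adds attr2.
GeaModel.ApplyGlobal(globals, new Dictionary<string, string> { ["attr1"] = "uno", ["attr2"] = "two" });
// This entry carries its own attr2, which wins over the global one.
var b = GeaModel.Effective(globals, new Dictionary<string, string> { ["attr2"] = "local" });
Console.WriteLine($"{b["attr1"]}, {b["attr2"]}"); // uno, local

// Hypothetical model, for illustration only.
static class GeaModel
{
    // Merge a newly read GEA entry into the globals currently in effect (assumed per-key override).
    public static void ApplyGlobal(Dictionary<string, string> current, IReadOnlyDictionary<string, string> gea)
    {
        foreach (var kvp in gea)
        {
            current[kvp.Key] = kvp.Value;
        }
    }

    // Effective attributes of one entry: globals first, then the entry's own attributes on top.
    public static Dictionary<string, string> Effective(
        IReadOnlyDictionary<string, string> current,
        IReadOnlyDictionary<string, string> entryAttributes)
    {
        var result = new Dictionary<string, string>(current);
        foreach (var kvp in entryAttributes)
        {
            result[kvp.Key] = kvp.Value;
        }
        return result;
    }
}
```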
Alternative Designs
Reuse PaxTarEntry
We could avoid adding a new class to represent a GEA entry, and instead reuse the existing PaxTarEntry class. But there's a problem: it would be confusing to create a TarEntryType.GlobalExtendedAttributes entry, because the constructor expects an entryName argument, and in a GEA entry the name is created internally by TarWriter: the name depends on the platform's $TMPDIR, on the process ID, and on the current GEA entry number, which is stored internally by TarWriter.
Having the entryName argument isn't really necessary, since we expose it in its own property with a getter and a setter. So there's a clean way of reusing PaxTarEntry if we make the following modifications:
- The constructors no longer take entryName as an argument, and the user should set it manually later. If the user attempts to pass an entry without a name to TarWriter.WriteEntry, an exception is thrown, except if the entry is of type TarEntryType.GlobalExtendedAttributes, because TarWriter is in charge of writing the name.
- The PaxTarEntry.Name field would throw if the user attempts to set it on a GEA entry.
Discarded design
I also considered avoiding a new entry type and instead writing the dictionary directly:
But then how should we give the user the GEA entries in a reader? GetNextEntry returns a TarEntry. Having a dictionary of dictionaries that could hold the GEA dictionaries would be too messy and confusing.
Risks
Low. The APIs are new in 7.0, so there is still time to improve them.
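For comparison with the discarded design, here is a minimal sketch of how a consumer could pick GEA entries out of the normal GetNextEntry loop with a type check, assuming the API shape of the main proposal (a PaxGlobalExtendedAttributesTarEntry returned as a regular TarEntry). GeaReadSketch and Describe are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Formats.Tar;
using System.IO;

foreach (string line in GeaReadSketch.Describe())
{
    Console.WriteLine(line);
}

// Hypothetical wrapper, for illustration only.
static class GeaReadSketch
{
    public static List<string> Describe()
    {
        using MemoryStream ms = new();
        using (TarWriter writer = new(ms, leaveOpen: true)) // PAX is the default format
        {
            writer.WriteEntry(new PaxGlobalExtendedAttributesTarEntry(
                new Dictionary<string, string> { ["attr1"] = "first" }));
            writer.WriteEntry(new PaxTarEntry(TarEntryType.RegularFile, "a.txt"));
            writer.WriteEntry(new PaxGlobalExtendedAttributesTarEntry(
                new Dictionary<string, string> { ["attr2"] = "second" }));
            writer.WriteEntry(new PaxTarEntry(TarEntryType.RegularFile, "b.txt"));
        }

        ms.Position = 0;
        List<string> seen = new();
        using TarReader reader = new(ms, leaveOpen: true);
        while (reader.GetNextEntry() is TarEntry entry)
        {
            if (entry is PaxGlobalExtendedAttributesTarEntry gea)
            {
                // A GEA entry arrives in archive order, like any other entry.
                seen.Add($"GEA with {gea.GlobalExtendedAttributes.Count} attribute(s)");
            }
            else
            {
                seen.Add(entry.Name);
            }
        }
        return seen;
    }
}
```

With this shape the reader needs no extra state: the position of each GEA entry in the stream already tells the caller which entries come after it.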
@bartonjs @jeffhandley @adamsitnik @jozkee @tmds