-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
System.IO.Packaging.Package should have a way to handle common errors with malformed packages #26084
Comments
@karelz How can I get this proposal approved? I'm willing to implement it - many users of the Open XML SDK would appreciate this |
@JeremyKuhne @pjanotti is it something you can help shepherd further? If it is out of scope of your knowledge, let's just ask @twsouthwick to bring it to API review and discuss it there ... |
What I would like to see is how this proposal would be used to solve the given problem. How does an |
@JeremyKuhne I don't envision the context being that important. In .NET 4.5 Uri started throwing exceptions on malformed Uris instead of just passing them through. As people start moving to .NET Core, they'll hit this more and more on office documents in DocumentFormat.OpenXml that used to work. This allows them to either sanitize the entry, or potentially subclass Uri to hold the original and also sanitize it so it'll open the package correctly. So, this could be used by the following: Package.Open("path", new PackageSettings { UriProvider = new MalformedProvider() });
class MalformedProvider : IUriProvider
{
Uri CreateUri(string uriString, UriKind uriKind)
{
if(Uri.TryCreate(uriString, uriKind, out var result))
{
return result;
}
else
{
return new MalformedUri(uriString, uriKind);
}
}
}
internal class MalformedUri : Uri
{
public MalformedUri(string uriString, UriKind uriKind)
: base("/", uriKind)
{
Original = uriString;
}
public string Original { get; }
} Another route that could be taken to fix this would be to allow |
@JeremyKuhne, I support @twsouthwick's statement that context would not be important. In practice, this often happens because Excel, for example, interprets some text containing an "@" as a For example, since we have been waiting for a fix for quite some time now, I implemented my own workaround in which I am using |
@JeremyKuhne and @twsouthwick, I thought about this a little more and came up with the idea to encode the malformed URI in the public static Uri FromMalformedUriString([NotNull] string malformedUri)
{
if (malformedUri == null) throw new ArgumentNullException(nameof(malformedUri));
string unescapedMalformedUri = Uri.UnescapeDataString(malformedUri);
string escapedMalformedUri = Uri.EscapeDataString(unescapedMalformedUri);
return new Uri("http://malformed-uri/?value=" + escapedMalformedUri);
} The original malformed URI can be retrieved as follows: public static string GetMalformedUriString([NotNull] Uri uri)
{
if (uri == null) throw new ArgumentNullException(nameof(uri));
if (uri.Scheme != "http") throw new ArgumentException();
if (uri.Host != "malformed-uri") throw new ArgumentException();
if (uri.AbsolutePath != "/") throw new ArgumentException();
string query = uri.Query;
if (string.IsNullOrEmpty(query)) throw new ArgumentException();
if (!query.StartsWith("?value=")) throw new ArgumentException();
string escapedDataValue = query.Substring(7);
string unescapeDataValue = Uri.UnescapeDataString(escapedDataValue);
return Uri.EscapeUriString(unescapeDataValue);
} And here's a unit test that tests this with some sample malformed URIs from actual Excel workbooks (noting that I created the third one to test unencoded strings). [Theory]
[InlineData("mailto:CM@SAP:%20Create/Change%20Work%20Center")]
[InlineData("mailto:CM@SAP:%20User%20Assignement")]
[InlineData("mailto:CM@SAP: Create/Change Work Center")]
public void CanCreateWellFormedUris(string malformedUri)
{
Assert.Throws<UriFormatException>(() => new Uri(malformedUri));
// Create a valid Uri that encodes the malformed URI.
Uri validUri = MalformedUriFixer.FromMalformedUriString(malformedUri);
string malformedUriString = MalformedUriFixer.GetMalformedUriString(validUri);
_output.WriteLine("Original malformed URI: " + malformedUri);
_output.WriteLine("Encoded malformed URI: " + validUri.AbsoluteUri);
_output.WriteLine("Restored malformed URI: " + malformedUriString);
Assert.Equal(
Uri.EscapeUriString(Uri.UnescapeDataString(malformedUri)),
malformedUriString);
} |
I think we can add the IPackage interface like this code 00a2d82 And we can custom handle exception with the custom class that inherit the IPackage interface |
@JeremyKuhne I've taken another look at this since OpenXml users have been hitting it. After playing around with usage, I've updated my proposal to the following new APIs for System.IO.Packaging. I'm happy to take this throw the API proposal process, but haven't done that before, so any guidance would be helpful. namespace System.IO.Packaging
{
public abstract partial class Package
{
protected Package(PackageSettings settings) { }
public static System.IO.Packaging.Package Open(System.IO.Stream stream, PackageSettings settings) { throw null; }
public static System.IO.Packaging.Package Open(string path, PackageSettings settings) { throw null; }
}
public partial class PackageSettings
{
public FileMode PackageMode { get { throw null; } set { throw null; } }
public FileAccess PackageAccess { get { throw null; } set { throw null; } }
public FileShare PackageShare { get { throw null; } set { throw null; } }
public IUriFactory UriFactory { get { throw null; } set { throw null; } }
}
public interface IUriFactory
{
System.Uri CreateUri(string url, System.UriKind kind);
string GetOriginalString(System.Uri uri);
}
public class DefaultUriFactory : IUriFactory
{
public static IUriFactory Instance => throw null;
protected DefaultUriFactory() { }
public virtual Uri CreateUri(string url, UriKind kind) => throw null;
public virtual string GetOriginalString(Uri uri) => throw null;
}
} The This would allow someone to intercept the creation and writing of the Uri instances so that they can do whatever they want with it. For example, here are some test cases using it: [Fact]
public void MalformedHyperlinkThrows()
{
using (var package = Package.Open("malformed_hyperlink.xlsx"))
{
var part = package.GetPart(new Uri("/xl/worksheets/sheet1.xml", UriKind.Relative));
Assert.Throws<UriFormatException>(() => part.GetRelationships());
}
}
[Fact]
public void MalformedHyperlinkHandle()
{
const string ExpectedUri = "mailto:mailto@one@";
var settings = new PackageSettings
{
UriFactory = new MalformedUriFactory()
};
using (var package = Package.Open("malformed_hyperlink.xlsx", settings))
{
var part = package.GetPart(new Uri("/xl/worksheets/sheet1.xml", UriKind.Relative));
var relationship = Assert.Single(part.GetRelationships());
var malformed = Assert.IsType<MalformedUri>(relationship.TargetUri);
Assert.Equal(ExpectedUri, malformed.Uri);
}
}
[Fact]
public void MalformedHyperlinkRoundtrip()
{
const string InvalidUri = "mailto:mailto@one@";
var settings = new PackageSettings
{
UriFactory = new MalformedUriFactory(),
PackageMode = FileMode.OpenOrCreate,
PackageAccess = FileAccess.ReadWrite
};
using (var stream = new MemoryStream())
{
using (var package = Package.Open(stream, settings))
{
Assert.Empty(package.GetRelationships());
package.CreateRelationship(new MalformedUri(InvalidUri), TargetMode.External, "relationship");
}
using (var package = Package.Open(stream, settings))
{
var relationship = Assert.Single(package.GetRelationships());
var malformed = Assert.IsType<MalformedUri>(relationship.TargetUri);
Assert.Equal(InvalidUri, malformed.Uri);
}
}
}
private class MalformedUriFactory : IUriFactory
{
public Uri CreateUri(string url, UriKind kind)
{
if (Uri.TryCreate(url, kind, out var result))
{
return result;
}
return new MalformedUri(url);
}
public string GetOriginalString(Uri uri)
{
if (uri is MalformedUri malformed)
{
return malformed.Uri;
}
return uri.OriginalString;
}
internal class MalformedUri : Uri
{
public MalformedUri(string uriString)
: base("http://unknown")
{
Uri = uriString;
}
public string Uri { get; }
public override string ToString() => Uri;
} |
@twsouthwick Thanks for putting time into this. Note that I'm no longer on the libraries team, @carlossanlop can help move this along. |
Is there any updated news? |
@carlossanlop Any thoughts on this API? I haven't done an API review, but happy to help move something like this along. As more users are moving to .NET Core/.NET 5, this will become a problem for many LOB apps that rely on DocumentFormat.OpenXml. |
@danmosemsft Similar #1544, this one is blocking people from moving to .NET Core/.NET 5 who use DocumentFormat.OpenXml as documents that would open just fine using System.IO.Packaging when targeting < .NET 4.5 but now fail to open with no way to handle it. |
cc @ericstj whose team owns these API's and can evaluate. Unless it's bug fix level, it would be 6.0 work at this point. |
Correct, the soonest this could go in is 6.0. Some thoughts on the proposal. It sounds like the primary purpose of this issue is to add an For example: what if you just added a 1 overload to Open with default parameters? EG: Another concern I have: why are we adding API to deal with malformed packages? Can you instead just convert those packages so that they are no longer malformed? For example, go through and replace any malformed URL with a URL-encoded string. That seems like the typical thing we'd recommend folks to do when we discovered a bad format and want them to get off of it. /cc @dotnet/ncl on the Uri details. |
Thanks @ericstj. I've added a tool into the OpenXml library to handle sanitizing it (dotnet/Open-XML-SDK#793). Downside is that this requires rewriting the document, which may have only been read only before. However, it is interesting that the format itself appears to handle malformed Uris just fine and don't have the restrictions that System.Uri has. ie Word/Excel/etc happily generate files with these values. The roundtripping here is so that the values can be rewritten so the document can be read without modifying anything. OriginalString can't be overwritten to output the original, malformed Uri, so a derived type could be used to ensure the roundtripping. This would allow the OpenXml library to transparently handle the change so people don't have to modify (potentially old) LOB apps to handle this case. Having a way to parse a Uri with a relaxed mode similar to pre-.NET 4.5 would be ideal, so hopefully @dotnet/ncl can give some feedback for that. |
Regarding
In summary, I would strongly urge against making |
Is there any updated news? |
System.IO.Packaging.Package
provides a simple abstraction over a container to hold multiple objects in a single entity. This is commonly used in nupkg packages, vsix extensions, appx, and also Office documents. The package format makes heavy use of URIs to maintain relationships between various components. However, Office will allow a user to generate an invalid URI for one of these relationships, and then opening via System.IO.Packaging fails with no easy way of recovery besides manually updating the file, which necessitates a deeper understanding of the file format that should be abstracted by the library.This is to work around a breaking change that occurred between 4.0->4.5. This proposal is to provide a way to work around that change.
Example
There are multiple issues filed on the OpenXml SDK project (which abstracts the Office format to strongly typed classes) and related projects where users have been hit by this. For example:
A simple Office document that will fail can be created by the following:
mailto:one@
)Package
:This will fail with the following exception:
Proposed API
Details
PackageSettings
class would allow for easily adding additional settings at a later point without adding more factory methods (i.e.Package.Open(...)
). Since theFileMode
andFileAccess
properties are used on all factory methods, these are added to the settings object.Package.Open(...)
methods listed extend the current factory methods that take aStream
or a path with an optionalFileShare
.IUriProvider
would essentially be this line: https://source.dot.net/#System.IO.Packaging/System/IO/Packaging/InternalRelationshipCollection.cs,363The text was updated successfully, but these errors were encountered: