Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolve colliding file names on case-insensitive file systems #7837

Open
sschuberth opened this issue Oct 30, 2020 · 13 comments
Open

Resolve colliding file names on case-insensitive file systems #7837

sschuberth opened this issue Oct 30, 2020 · 13 comments

Comments

@sschuberth
Copy link
Member

sschuberth commented Oct 30, 2020

When cloning this repo on Windows I get:

warning: the following paths have collided (e.g. case-sensitive paths
on a case-insensitive filesystem) and only one from the same
colliding group is in the working tree:

  'curations/nuget/nuget/-/DiscUtils.yaml'
  'curations/nuget/nuget/-/Discutils.yaml'
  'curations/nuget/nuget/-/EMGU.CV.yaml'
  'curations/nuget/nuget/-/Emgu.CV.yaml'
  'curations/nuget/nuget/-/GitLink.yaml'
  'curations/nuget/nuget/-/gitlink.yaml'
  'curations/nuget/nuget/-/Microsoft.AspNetCore.Authentication.JwtBearer.yaml'
  'curations/nuget/nuget/-/microsoft.aspnetcore.authentication.jwtbearer.yaml'
  'curations/nuget/nuget/-/MSBuildTasks.yaml'
  'curations/nuget/nuget/-/MsBuildTasks.yaml'
  'curations/nuget/nuget/-/React.Core.yaml'
  'curations/nuget/nuget/-/react.core.yaml'
  'curations/nuget/nuget/-/StructureMap.yaml'
  'curations/nuget/nuget/-/structuremap.yaml'
  'curations/nuget/nuget/-/System.Data.HashFunction.SpookyHash.yaml'
  'curations/nuget/nuget/-/system.data.hashfunction.spookyhash.yaml'
  'curations/nuget/nuget/-/System.Windows.Interactivity.WPF.yaml'
  'curations/nuget/nuget/-/system.windows.interactivity.wpf.yaml'
  'curations/nuget/nuget/-/TinyMCE.JQuery.yaml'
  'curations/nuget/nuget/-/TinyMCE.jQuery.yaml'
  'curations/nuget/nuget/-/ValueInjecter.yaml'
  'curations/nuget/nuget/-/valueinjecter.yaml'
  'curations/nuget/nuget/-/VsWhere.yaml'
  'curations/nuget/nuget/-/vswhere.yaml'
  'curations/pypi/pypi/-/Pillow.yaml'
  'curations/pypi/pypi/-/pillow.yaml'
  'curations/pypi/pypi/-/Resource.yaml'
  'curations/pypi/pypi/-/resource.yaml'

I believe the contents of these pairs of files should be merged, and the capitalization should be aligned to the spelling used in the respective registry. Would you agree with that rationale @capfei @fossygirl?

@fossygirl
Copy link
Member

Sorry, I don't know much about casing. @MichaelTsengLZ @jeffmcaffer or @jeffmendoza may be able to help here.

@sschuberth
Copy link
Member Author

I've proposed a PR at #7838.

@jeffmcaffer
Copy link
Member

Thanks @sschuberth for finding this. I vaguely recall doing a deep dive on case preservation and sensitivity and feel like there's a doc or issue around that talks about our handling strategy. Seem to recall that Dan did this work back in the day. Can't find it right now. On the surface it would appear that this curation is the right thing to do. I am however concerned about the root cause and whether this this change will either perpetuate or exacerbate the problem (the latter seems unlikely).

It is entirely possible that the actual package changed their casing at some point in time. That seems especially possible in the NuGet world (seen it before). In that case (hah! pun) we'd have to fall back on the aforementioned casing analysis. My vague recollection is that we ended up at case preserving but case insensitive. The basic assumption being that we did not find any providers that were case sensitive (There were some old cases in npm land I think but they were old and appeared to addressed).

/cc @nellshamrell

@jeffmcaffer
Copy link
Member

clearlydefined/service#157 talks about this somewhat at the implementation level. Still feel like there was an analysis of all the supported (and some prime potential) package managers with some summarization / discussion.

@sschuberth
Copy link
Member Author

At least for NuGet, given that the API requires the package ID to be lower case, maybe it makes sense to align on all lower case file names?

Or actually, does the casing of the file name ever matter? I guess not, as the name from the coordinates in the YAML should be taken for any lookups. Which means we could always (and for any package manager) align on lower case file names.

@jeffmcaffer
Copy link
Member

I'm hesitant to make any concrete/general call on this without consulting the prior research. Casing is fraught with oddities. For this particular PR it should be possible to validate that with the changes merged, the curations are still applied to the relevant component/version.

Stepping back a bit, @sschuberth, how did this curation come about with different casing? Is this coming from ORT tooling using lowercase? or some other workflow?

@sschuberth
Copy link
Member Author

how did this curation come about with different casing?

I have no idea. To the best of my knowledge none of these curations were initially created by ORT or by myself in person. My brief checks just showed that the initial commit each was done by @clearlydefinedbot.

@jeffmcaffer
Copy link
Member

yeah, that likely means that the PR was entered via the website and we are unable to do proper attribution (there was a technical challenge IIRC)

Anyway, I believe that NuGet is case preserving but not case sensitive. The metadata for DiscUtils.Core has mixed case (see below) while the protocol/service, as you point out, lowercases everything. So the user view of the package is mixed case.

<package xmlns="http://schemas.microsoft.com/packaging/2013/05/nuspec.xsd">
  <metadata>
    <id>DiscUtils.Core</id>
    <version>0.15.1-ci0007</version>

Reaching way back I believe we our goal was to present users with the most authentic naming so are attempting to preserve the casing. So while the protocol is all lower, the package itself and the user view is mixed.

That still leaves us with the challenge of what casing to use in the underlying implementation given that Git is case-sensitive (your point here). I would have said that if the system is case IN-sensitive, we should toLower() the coordinates for storage. Again, reaching back, I recall there may have been a few places where we look at the storage layer and reverse engineer coordinates. That's not a great and we may have stopped doing that.

HAH! I found the doc
https://github.com/clearlydefined/clearlydefined/blob/533afead4c95e3ace9dbd727e2a166d89d229af8/docs/providers.md
Unfortunately that doesn't quite help in as much it captures observed behavior, not quite the reasoning why.

Path forward.

  • This is an awesome discussion and it's likely a good thing to be revisited and properly documented (with reasoning). That should likely be done in either a service or clearlydefined (near the doc) repo issue. Might get lost here
  • The PR cited here IMO should NOT be merged as it would vary from the stated behavior. Better to be consistently wrong (if we are at all) than inconsistently right. While that may make it LOOK like we split the curation, if we go by the casing in the package itself, we'll always have curation casing that matches the targeted package's casing.

@sschuberth
Copy link
Member Author

I would have said that if the system is case IN-sensitive, we should toLower() the coordinates for storage. Again, reaching back, I recall there may have been a few places where we look at the storage layer and reverse engineer coordinates. That's not a great and we may have stopped doing that.

So, what's missing for me in the path forward is:

  1. Find out whether there still are place where you reverse engineer coordinates from file path in the storage layer.
  2. If there are such places, get rid of them by looking at the file contents instead of the file path to deduce coordinates.
  3. Align on all lower case paths, but keep file contents to be case preserving.
  4. Ensure that no conflicting casing can be added via the website. That is, existing curations must be amended even if the casing is different but the system is case-preserving and case-insensitive.

The PR cited here IMO should NOT be merged

Then please feel free to close my PR (with a rationale pointing back to here) so that it does not accidentally get merged.

@ariel11
Copy link
Contributor

ariel11 commented Oct 18, 2021

@nellshamrell - unclear if this Issue can be closed?

@sschuberth
Copy link
Member Author

It's easy to verify the problem persists. Just try to clone this repo on Windows and you'll still get

Updating files: 100% (6359/6359), done.
warning: the following paths have collided (e.g. case-sensitive paths
on a case-insensitive filesystem) and only one from the same
colliding group is in the working tree:

  'curations/git/github/Azure/azure-sdk-for-go.yaml'
  'curations/git/github/azure/azure-sdk-for-go.yaml'
  'curations/nuget/nuget/-/7Zip4PowerShell.yaml'
  'curations/nuget/nuget/-/7Zip4Powershell.yaml'
  'curations/nuget/nuget/-/CodeProject.ObjectPool.yaml'
  'curations/nuget/nuget/-/codeproject.objectpool.yaml'
  'curations/nuget/nuget/-/DiscUtils.yaml'
  'curations/nuget/nuget/-/Discutils.yaml'
  'curations/nuget/nuget/-/Elmah.MVC.yaml'
  'curations/nuget/nuget/-/Elmah.Mvc.yaml'
  'curations/nuget/nuget/-/EMGU.CV.yaml'
  'curations/nuget/nuget/-/Emgu.CV.yaml'
  'curations/nuget/nuget/-/EnvDTE.yaml'
  'curations/nuget/nuget/-/envdte.yaml'
  'curations/nuget/nuget/-/EnvDTE100.yaml'
  'curations/nuget/nuget/-/envdte100.yaml'
  'curations/nuget/nuget/-/EnvDTE80.yaml'
  'curations/nuget/nuget/-/envdte80.yaml'
  'curations/nuget/nuget/-/EnvDTE90.yaml'
  'curations/nuget/nuget/-/envdte90.yaml'
  'curations/nuget/nuget/-/EnvDTE90a.yaml'
  'curations/nuget/nuget/-/envdte90a.yaml'
  'curations/nuget/nuget/-/GitLink.yaml'
  'curations/nuget/nuget/-/gitlink.yaml'
  'curations/nuget/nuget/-/Jint.yaml'
  'curations/nuget/nuget/-/jint.yaml'
  'curations/nuget/nuget/-/LINQKit.yaml'
  'curations/nuget/nuget/-/LinqKit.yaml'
  'curations/nuget/nuget/-/Microsoft.AspNetCore.Authentication.JwtBearer.yaml'
  'curations/nuget/nuget/-/microsoft.aspnetcore.authentication.jwtbearer.yaml'
  'curations/nuget/nuget/-/MongoDB.LibMongocrypt.yaml'
  'curations/nuget/nuget/-/MongoDB.Libmongocrypt.yaml'
  'curations/nuget/nuget/-/MSBuildTasks.yaml'
  'curations/nuget/nuget/-/MsBuildTasks.yaml'
  'curations/nuget/nuget/-/PDFsharp-MigraDoc-GDI.yaml'
  'curations/nuget/nuget/-/PDFsharp-MigraDoc-gdi.yaml'
  'curations/nuget/nuget/-/React.Core.yaml'
  'curations/nuget/nuget/-/react.core.yaml'
  'curations/nuget/nuget/-/Semver.yaml'
  'curations/nuget/nuget/-/semver.yaml'
  'curations/nuget/nuget/-/StructureMap.yaml'
  'curations/nuget/nuget/-/structuremap.yaml'
  'curations/nuget/nuget/-/System.Data.HashFunction.SpookyHash.yaml'
  'curations/nuget/nuget/-/system.data.hashfunction.spookyhash.yaml'
  'curations/nuget/nuget/-/System.Windows.Interactivity.WPF.yaml'
  'curations/nuget/nuget/-/system.windows.interactivity.wpf.yaml'
  'curations/nuget/nuget/-/TinyMCE.JQuery.yaml'
  'curations/nuget/nuget/-/TinyMCE.jQuery.yaml'
  'curations/nuget/nuget/-/ValueInjecter.yaml'
  'curations/nuget/nuget/-/valueinjecter.yaml'
  'curations/nuget/nuget/-/VsWhere.yaml'
  'curations/nuget/nuget/-/vswhere.yaml'
  'curations/nuget/nuget/-/Xamarin.GooglePlayServices.Gcm.yaml'
  'curations/nuget/nuget/-/xamarin.googleplayservices.gcm.yaml'
  'curations/pypi/pypi/-/Pillow.yaml'
  'curations/pypi/pypi/-/pillow.yaml'
  'curations/pypi/pypi/-/Resource.yaml'
  'curations/pypi/pypi/-/resource.yaml'
  'curations/pypi/pypi/-/wxPython.yaml'
  'curations/pypi/pypi/-/wxpython.yaml'

@nellshamrell
Copy link
Contributor

@ariel11 let's keep it open. I don't know when I will have time to address this, but it's good to keep it on our backlog.

@Littlerock505

This comment was marked as spam.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants