Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MPOM-244] upload SHA-512 only for source-release to staging repo #40

Merged
merged 6 commits into from
Jun 10, 2021

Conversation

kwin
Copy link
Member

@kwin kwin commented May 3, 2021

https://issues.apache.org/jira/browse/MPOM-244
The SHA-512 checksum should be attached similar to the source-release.

pom.xml Outdated Show resolved Hide resolved
Copy link
Member

@michael-o michael-o left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I consider this change non-sense and incomplete:

  • ASF releases are solely distributed through dist.apache.org
  • Generating checksums on the fly for remote repos is done by Maven Resolver
  • Uploading checksums for a subset of items makes no sense

Note that Maven Resolver uses checksums to detect transport corruption only. Verification must happen with signatures.

@michael-o
Copy link
Member

michael-o commented May 3, 2021

@hboutemy WDYT?

@kwin
Copy link
Member Author

kwin commented May 3, 2021

ASF releases are solely distributed through dist.apache.org

Not in all Apache projects, look at https://repo1.maven.org/maven2/org/apache/sling/org.apache.sling.api/2.9.0/ or https://repo1.maven.org/maven2/org/apache/felix/org.apache.felix.scr/2.1.8/. Maven Central has the source-release artifacts as well.

Generating checksums on the fly for remote repos is done by Maven Resolver

Only MD5 and SHA1. SHA512 has been explicitly disabled for performance reasons (https://issues.apache.org/jira/browse/MRESOLVER-140)

Uploading checksums for a subset of items makes no sense

ASF policy enforces that only for those releases (https://infra.apache.org/release-distribution.html#sigs-and-sums). For the binaries it is never checked (because as you said, Maven will only use MD5/SHA1 to detect transport corruption) and ASF doesn't care about the binaries anyhow.

@michael-o
Copy link
Member

ASF releases are solely distributed through dist.apache.org

Not in all Apache projects, look at https://repo1.maven.org/maven2/org/apache/sling/org.apache.sling.api/2.9.0/ or https://repo1.maven.org/maven2/org/apache/felix/org.apache.felix.scr/2.1.8/. Maven Central has the source-release artifacts as well.

They must comply with: https://infra.apache.org/release-distribution.html. They can provide more if they like, but not less than that.

Generating checksums on the fly for remote repos is done by Maven Resolver

Only MD5 and SHA1. SHA512 has been explicitly disabled for performance reasons (https://issues.apache.org/jira/browse/MRESOLVER-140)

What holds you off to enable this?

Uploading checksums for a subset of items makes no sense

ASF policy enforces that only for those releases (https://infra.apache.org/release-distribution.html#sigs-and-sums). For the binaries it is never checked (because as you said, Maven will only use MD5/SHA1 to detect transport corruption) and ASF doesn't care about the binaries anyhow.

ASF never provides binary. The only official release is a source tarball/zip. Binaries are courtesy. So if you want to point someone to an Apache release, point to downloads.apache.org. That's it.

@kwin
Copy link
Member Author

kwin commented May 3, 2021

To be honest, it is a bit tough to discuss with you. Also language like "non-sense" does not really help. But let me try to respond to your complaints:

They must comply with: https://infra.apache.org/release-distribution.html. They can provide more if they like, but not less than that.

Yes, even Apache Sling and Felix do that, but this is IMHO not relevant in the context of this PR.

What holds you off to enable this?

This is only possible with command line options and not via pom.xml configuration. To distribute that among a team is much more effort than on relying on a plugin available from Maven Central which is even compatible with Apache Maven prior 3.8.1 (which is IMHO the first release shipping with Maven Resolver 1.6.x supporting SHA-512 on demand)

ASF never provides binary. The only official release is a source tarball/zip. Binaries are courtesy. So if you want to point someone to an Apache release, point to downloads.apache.org. That's it.

Same as above. IMHO not really relevant in the context of this PR. In fact a lot of ASF projects provide binaries via Maven Central although ASF only mandates distribution of source archives. The whole point about this PR is being able to use the Nexus Staging Repo as the single source for dist and Maven Central (as that eases validating a release candidate and therefore voting on a release).

@michael-o
Copy link
Member

Does this work for you? I also believe that if you attach checksums Resolver will calculate checksums of checksums.

@kwin
Copy link
Member Author

kwin commented May 4, 2021

I decided against using maven-resolver-impl for calculating SHA-512 for the following reasons:

  1. It requires Resolver 1.6+ which only ships with Maven 3.8.1
  2. It will calculate SHA-512 for all artifacts instead only the source-release ones which is IMHO too much overhead and not really useful

Although the SHA-512 artifact being calculated by the checksum-maven-plugin will get another MD5/SHA1 checksum when being deployed that IMHO doesn't do any harm.

@kwin
Copy link
Member Author

kwin commented May 19, 2021

@hboutemy I would appreciate your input on this as well.

@hboutemy
Copy link
Member

hboutemy commented May 21, 2021

IIUC, in the end, you end-up about publishing sha512 checksum to Central only for source-release? => adding that key aspect to the PR title, will need to see if/how/when to update Jira
I must admit that this will ease people's filing Apache dist area, because copying from the build workspace is cumbersome, and I don't expect usual release manager thinks about it after doing 'mvn release:perform', then he has hard time to fill Apache dist area and recalculate the sha512 once the vote is done
I like this limited scope sha512 idea (even if that looks strange at first :) ): @michael-o don't you struggle to get the sha512 when committing source-release to Apache dista area?

on the implementation side, this means that the current commit is big because there was POM reformatting (change of order of 2 plugins), but the real update is about adding <attachChecksums>true</attachChecksums> to checksum plugin for source-release: nice
question: is the reordering necessary?

@hboutemy hboutemy changed the title MPOM-244 upload SHA512 to staging repo as well MPOM-244 upload source-release SHA512 to staging repo as well May 21, 2021
@kwin
Copy link
Member Author

kwin commented May 21, 2021

The reordering is necessary due to nicoulaj/checksum-maven-plugin#112 as both plugins leverage the same phase. Therefore we have to first attach the SHA-512 checksum before calculating the signature. The maven-gpg-plugin will not sign SHA-512 files (https://github.com/apache/maven-gpg-plugin/blob/878edefba0f80b55701ea347a69b58c9bbeca37d/src/main/java/org/apache/maven/plugins/gpg/GpgSignAttachedMojo.java#L53)

@kwin kwin changed the title MPOM-244 upload source-release SHA512 to staging repo as well MPOM-244 upload SHA-512 only for source-release to staging repo May 21, 2021
@hboutemy
Copy link
Member

hboutemy commented May 21, 2021

I understand why you need to exclude asc from checksum, if asc was produced before checksum
I don't understand why we need to ensure that asc was produced before checksum: if luckily it was not produced yet, let's benefit from that luck, isn't it?

@kwin
Copy link
Member Author

kwin commented May 21, 2021

Let me summarize the status quo (without PR):

  1. gpg-maven-plugin:sign executed in phase verify, calculates ASC files for almost all attached files (except for MD5, SHA1 and ASC)
  2. checksum-maven-plugin:files executed in phase verify, calculates the checksum based on ${project.artifactId}-${project.version}-source-release.xxx filename (considering the extension) but does not attach it

With this PR the order is inverted:

  1. checksum-maven-plugin:artifacts executed in phase verify, calculates the checksum based on source-release classifier (disregarding the extension) and attaching it to project
  2. gpg-maven-plugin:sign executed in phase verify, calculates ASC files for almost all attached files (except for MD5, SHA1, SHA-512 and ASC)

Without the inverted order the checksum-maven-plugin would calculate a checksum for the *.asc for source-release as well, which is certainly not desired (nicoulaj/checksum-maven-plugin#112).

@hboutemy
Copy link
Member

ah, ok, I didn't get that attaching checksums required to switch from files goal to artifacts, and no exclusion based on extension exists yet...
thanks for the explanation
@michael-o do you get the idea of publishing source-release sha512 to remote repository to ease filling Apache distribution?

@hboutemy
Copy link
Member

@kwin wouldn't files goal https://checksum-maven-plugin.nicoulaj.net/files-mojo.html be an option? It supports attach, and would not suffer from the ordering prerequisite (which is not something that is guaranteed in a phase

@kwin
Copy link
Member Author

kwin commented May 22, 2021

files goal does never attach with a classifier, therefore we can't use it

@hboutemy
Copy link
Member

ok, then it's safer to bind the checksum-maven-plugin:artifacts goal to post-integration-test phase

@kwin
Copy link
Member Author

kwin commented May 22, 2021

I can do that but for clarity reason I would still reorder in the pom

Update: Done in 5b2579d

@michael-o michael-o self-requested a review May 26, 2021 17:55
Copy link
Member

@michael-o michael-o left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have checked the change. Two questions:

  • Will Resolver generate a checksum of the checksum when the checkusm is attached?
  • What will happen if Resolver is configured to generate SHA-512 sums?

@kwin
Copy link
Member Author

kwin commented May 26, 2021

I answered the first question already in #40 (comment):

Although the SHA-512 artifact being calculated by the checksum-maven-plugin will get another MD5/SHA1 checksum when being deployed that IMHO doesn't do any harm.

For the 2nd question, I have not tried, but most probably resolver will just overwrite the existing file (with the same content), for now this isn't an issue, because I don't see a reason why ASF projects should generate SHA-512 by Maven Resolver, it won't ever be enabled by default for performance reasons.

@michael-o
Copy link
Member

I answered the first question already in #40 (comment):

Although the SHA-512 artifact being calculated by the checksum-maven-plugin will get another MD5/SHA1 checksum when being deployed that IMHO doesn't do any harm.

For the 2nd question, I have not tried, but most probably resolver will just overwrite the existing file (with the same content), for now this isn't an issue, because I don't see a reason why ASF projects should generate SHA-512 by Maven Resolver, it won't ever be enabled by default for performance reasons.

Even if Resolver overwrites, Nexus Staging might reject this because the file is uploaded twice. You should check that.

@michael-o
Copy link
Member

@hboutemy I wonder what will happen during a release: Maven Resolver uploads SHA-512, this change uploads it. Will Nexus staging reject it? This should be tested before merged.

@hboutemy
Copy link
Member

hboutemy commented Jun 2, 2021

the day Maven Resolver will be configured to upload sha512, we'll remove this hack (because, yes, it's a hack to ease our release process)
When do you expect Maven Resolver to publish sha512 in addition to sha1 and md5 in real world?

@michael-o
Copy link
Member

the day Maven Resolver will be configured to upload sha512, we'll remove this hack (because, yes, it's a hack to ease our release process)
When do you expect Maven Resolver to publish sha512 in addition to sha1 and md5 in real world?

People can already configure it. I just want to avoid surprises for all Apache folks. E.g., Mark Thomas does this actively now.

@michael-o
Copy link
Member

See here: https://repo1.maven.org/maven2/org/apache/tomcat/tomcat-catalina/10.0.6/

Produced by Resoler via Ant.

@hboutemy
Copy link
Member

hboutemy commented Jun 2, 2021

FYI, I added some instructions to calculate sha512 in dist: apache/maven-site@29bc21e
and while at it, I recalculated checksums for all our latest releases and found 6 fingerprints not exactly using the expected format (fixed in revision 48099): this shows concrete example how people struggle to add the sha512 to dist area...

@hboutemy
Copy link
Member

hboutemy commented Jun 2, 2021

People can already configure it. I just want to avoid surprises for all Apache folks. E.g., Mark Thomas does this actively now.

can you show me, please? I don't know how to do that

notice: I mean using Maven, because that's the Maven ASF parent POM that we're tweaking...

@michael-o
Copy link
Member

FYI, I added some instructions to calculate sha512 in dist: apache/maven-site@29bc21e
and while at it, I recalculated checksums for all our latest releases and found 6 fingerprints not exactly using the expected format (fixed in revision 48099): this shows concrete example how people struggle to add the sha512 to dist area...

I have done this massively in Core ITs: apache/maven-integration-testing@672728a

@michael-o
Copy link
Member

People can already configure it. I just want to avoid surprises for all Apache folks. E.g., Mark Thomas does this actively now.

can you show me, please? I don't know how to do that

notice: I mean using Maven, because that's the Maven ASF parent POM that we're tweaking...

See https://maven.apache.org/resolver/configuration.html for aether.checksums.algorithms=SHA-512. Resolver should generate and update them. Please try in your setup.

@hboutemy
Copy link
Member

hboutemy commented Jun 2, 2021

oh nice, I didn't know that content!
I ran mvn -Papache-release deploy -Daether.checksums.algorithms="SHA-512,SHA-1,MD5" and got id 86 of https://repository.apache.org/content/repositories/snapshots/org/apache/apache/24-SNAPSHOT/
no problem for now

@hboutemy
Copy link
Member

hboutemy commented Jun 2, 2021

changed pom.xml version to 23 and ran the same mvn -Papache-release deploy -Daether.checksums.algorithms="SHA-512,SHA-1,MD5"
=> perfect staging repository: https://repository.apache.org/content/repositories/orgapacheapache-1020/org/apache/apache/23/

@michael-o
Copy link
Member

changed pom.xml version to 23 and ran the same mvn -Papache-release deploy -Daether.checksums.algorithms="SHA-512,SHA-1,MD5"
=> perfect staging repository: https://repository.apache.org/content/repositories/orgapacheapache-1020/org/apache/apache/23/

That is nice. One can manually delete the redundant files from staging.

@elharo elharo changed the title MPOM-244 upload SHA-512 only for source-release to staging repo [MPOM-244] upload SHA-512 only for source-release to staging repo Jun 8, 2021
@hboutemy
Copy link
Member

hboutemy commented Jun 9, 2021

I did not test yet, but why don't we just add <aether.checksums.algorithms>SHA-512,SHA-1,MD5</aether.checksums.algorithms> property to the parent POM?
wouldn't it activate sha512 publishing by Maven for absolutely every artifact?

@kwin
Copy link
Member Author

kwin commented Jun 9, 2021

@hboutemy See my answer in #40 (comment). SHA-512 checksum are not enabled by default for performance reasons: https://issues.apache.org/jira/browse/MRESOLVER-138. Although this affects mainly downloads it shows that SHA-512 checksums won't be checked automatically by Maven (probably will never be). Therefore the additional SHA512 only makes sense for artifacts not (primarily) downloaded via Maven.

@hboutemy
Copy link
Member

@kwin ok, thanks for the link
let's merge this

@hboutemy hboutemy merged commit b08ef16 into apache:master Jun 10, 2021
@michael-o
Copy link
Member

@kwin Thanks for bearing with me. I am very picky when it affects all ASF committers.

@ctubbsii
Copy link
Member

This change bugs me, mostly because the checksum files it creates aren't following any kind of standard. I would have preferred the content of the generated file not just be a raw checksum, but actually follow GNU coreutils default format or, more preferably, the BSD "tag" format (see https://man7.org/linux/man-pages/man1/sha512sum.1.html). Following a standard file format allows one to quickly and easily use standard tools to verify the checksum (shasum -c *.sha512). Having this plugin enabled by default, and not easily disabled by deactivating a profile or setting a property or something, makes it very frustrating. I'm already doing my own thing to generate checksums, and this generates redundant ones that are far less useful and it's difficult to disable it.

@hboutemy
Copy link
Member

hboutemy commented Oct 20, 2021

reading your comment, it makes me feel that for a very long time, we're probably conflating:

  • hash file format for Maven repository format (= hash only, defined by Maven repository format and Maven code that reads/writes it)
  • hash file format for Apache distribution

@kwin
Copy link
Member Author

kwin commented Oct 20, 2021

ASF mandates this format: https://www.apache.org/info/verification.html

@michael-o
Copy link
Member

michael-o commented Oct 20, 2021

@ctubbsii @hboutemy @kwin There are many many issues conflated here and a lot of misunderstanding in general. I had a very lengthly talk about this in general with @cstamas recently.

Disclaimer: I will not talk about signatures here.

First of all, Maven repo (e.g., Central) != Apache dist area. The checksums in repo are solely for bitrot NOTHING ELSE. The format is basically an implementation detail of Maven Resolver, though the parsing is lenient. @kwin You maybe remember that we have talked about this.
Now let's get to the reference from @kwin: There is a lot of wrong information on this page. The heading says "Checking Hashes", then it talks about checksums. Checksums are not the same as hashes. Throughout they completely confuse cryptographic hasing with integrity checks (checksums/bitrot). I quote:

There are lots of checksum algorithms ; we use SHA-1, SHA-256, SHA-512 and MD5.

Those are not checksum algorithms.

@ctubbsii I here do agree with you that the dist area is most designed for human consumption (or curl, etc) and not Maven Resolver. Therefore proper checksums are highly advised. I highly favorize BSD tags since they are default, obviously on BSD systems including macOS, OpenSSL generates them as well by default and GNU sum tools can produce and consume them with ease.

Upshot: Lets discuss a proper solution for the Apache dist area for all Maven-based projects.

PS: You can of course abuse a cryptographic hashing algorithm like SHA-x for checkums, but there are much much better alternatives like xxHash. I consider SHA-2 for Maven Central as mostly pointless and pure waste of CPU cycles. See also https://www.mail-archive.com/[email protected]/msg125281.html.

@kwin
Copy link
Member Author

kwin commented Oct 20, 2021

I do agree with

I consider SHA-2 for Maven Central as mostly pointless and pure waste of CPU cycles.

but the same is true for MD5 and SHA1.
It would have been wise to use a (non-secure) hash = checksum for Maven in the first place, but this is outside the scope of this issue.

The format of hashes in Apache Dist are standardized among all ASF projects and the information from https://www.apache.org/info/verification.html implies that you MUST(!) use the raw hash in the files!

@michael-o
Copy link
Member

@michael-o
Copy link
Member

I do agree with

I consider SHA-2 for Maven Central as mostly pointless and pure waste of CPU cycles.

but the same is true for MD5 and SHA1. It would have been wise to use a (non-secure) hash = checksum for Maven in the first place, but this is outside the scope of this issue.

Exactly!

The format of hashes in Apache Dist are standardized among all ASF projects and the information from https://www.apache.org/info/verification.html implies that you MUST(!) use the raw hash in the files!

I don't see this requirement of raw hash. Can you explicitly show me?

@kwin
Copy link
Member Author

kwin commented Oct 20, 2021

You are right that the format is not stated on https://www.apache.org/info/verification.html but I would conclude that from https://infra.apache.org/release-signing.html#basic-facts

and another file containing a SHA or MD5) checksum.

@michael-o
Copy link
Member

You are right that the format is not stated on https://www.apache.org/info/verification.html but I would conclude that from https://infra.apache.org/release-signing.html#basic-facts

and another file containing a SHA or MD5) checksum.

But not explicit. I don't know who to talk to, but this should be clarified explicitly as it causes confusion as you see.

@khmarbaise
Copy link
Member

The referenced page of apache explicitly mentions the usage of shasum/md5sum/certUtil tool which I have mentioned as well in MNG-6784.

The format from my point of view is a consequence by using the appropriate tools... if not otherwise explicitly mentioned I would assume that the format is the one which can be consumed by the given tools. This is currently not the case.

@michael-o
Copy link
Member

The referenced page of apache explicitly mentions the usage of shasum/md5sum/certUtil tool which I have mentioned as well in MNG-6784.

The format from my point of view is a consequence by using the appropriate tools... if not otherwise explicitly mentioned I would assume that the format is the one which can be consumed by the given tools. This is currently not the case.

...and they all produce different output. Windows does not even work:

C:\Users\osipovmi>certUtil -hashfile .cache SHA1
CertUtil: -hashfile-Befehl ist fehlgeschlagen: 0x80070002 (WIN32: 2 ERROR_FILE_NOT_FOUND)
CertUtil: Das System kann die angegebene Datei nicht finden.

Alternatively:

PS C:\Users\osipovmi> get-filehash -Path .\.viminfo

Algorithm       Hash                                                                   Path
---------       ----                                                                   ----
SHA256          7A576985C92BCF96B763ED0DB1AFC0474B1B585567BC7C45F335BE4DE0BF06F4       C:\Users\osipovmi\.viminfo
PS C:\Users\osipovmi> get-filehash -Path .\.viminfo  | Format-list


Algorithm : SHA256
Hash      : 7A576985C92BCF96B763ED0DB1AFC0474B1B585567BC7C45F335BE4DE0BF06F4
Path      : C:\Users\osipovmi\.viminfo

What now?

@ctubbsii
Copy link
Member

A lot of this discussion is out of scope for MPOM. The point I was trying to make here in #40 (comment) is that this shouldn't be running by default in the apache-release profile if it's going to produce files with less overall utility to the staging process than the ones projects might already be creating for themselves. Therefore, it doesn't add much value to those MPOM users. I don't think MPOM should decide the format for the whole project, but I'm happy to advocate for the BSD "tag" style in the appropriate venue. For here, I'd be happy if it could be disabled more easily, or configured to provide the output format I prefer, so it's not just being a bother.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants