Resolves #931: Fixing problems with encoding in UseDepVersion and PomHelper #932
@slachiewicz please review
*/
public static StringBuilder readFile(Path path) throws IOException {
    try (BufferedReader reader = Files.newBufferedReader(path)) {
        return reader.lines().collect(StringBuilder::new, StringBuilder::append, StringBuilder::append);
Can the line-ending characters matter here?
The old implementation read the raw bytes from the file, but here the line-ending characters are dropped.
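To make the concern concrete, here is a minimal sketch of what happens to line terminators when the result of `BufferedReader.lines()` is collected this way. The class and method names (`LineEndingDemo`, `joinLines`) are hypothetical, and a `StringReader` stands in for the file so the example is self-contained.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical demo class, not part of the plugin.
public class LineEndingDemo {

    // Uses the same collector as the method under review.
    // BufferedReader.lines() yields each line without its terminator,
    // so "\n" and "\r\n" never reach the StringBuilder.
    static String joinLines(String content) {
        try (BufferedReader reader = new BufferedReader(new StringReader(content))) {
            return reader.lines()
                    .collect(StringBuilder::new, StringBuilder::append, StringBuilder::append)
                    .toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String pom = "<project>\r\n  <version>1.0</version>\r\n</project>\n";
        String joined = joinLines(pom);
        // The joined text is shorter than the input by exactly the terminator characters.
        System.out.println(joined.length() + " of " + pom.length() + " chars kept");
        System.out.println(joined);
    }
}
```

If the original line endings have to survive a read-modify-write cycle of the POM, the content would need to be read without splitting it into lines.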
Easier to debug if tests go wrong, if I remember correctly.
I can drop the newlines; it will make matching regexes easier.
Converting to a draft -- doesn't quite work if I change the encoding of the file to ISO-8859-1.
Still in a draft, but for the record: the issue is caused by a discrepancy between -Dfile.encoding and the actual encoding of the file. The system property determines the value of Charset.defaultCharset(), which is used to convert the byte[] read in Files.read() into a string using new String(byte[]).
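A minimal sketch of the failure mode, assuming a UTF-8 file decoded as ISO-8859-1. The charset is passed explicitly here to simulate what `new String(byte[])` does when `Charset.defaultCharset()` does not match the file; the class name `CharsetMismatchDemo` and the sample text are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the plugin.
public class CharsetMismatchDemo {

    // Decodes raw bytes with an explicit charset. new String(byte[]) behaves
    // like this with Charset.defaultCharset() as the second argument.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "caf\u00e9" is "cafe" with an acute accent on the e, written as an
        // escape so the source file's own encoding does not matter.
        String original = "<name>caf\u00e9</name>";
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);

        String right = decode(utf8Bytes, StandardCharsets.UTF_8);
        String wrong = decode(utf8Bytes, StandardCharsets.ISO_8859_1);

        System.out.println("matching charset round-trips: " + right.equals(original));
        // The two UTF-8 bytes of the accented letter become two separate characters.
        System.out.println("mismatched charset length: " + wrong.length()
                + " vs original " + original.length());
    }
}
```

Pure ASCII content decodes identically under both charsets, which may be why the problem only shows up for POMs containing other characters.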
So, I don't really know whether this is a bug or not. In 2.14.2 the plugin did not need to use the raw file, so it relied completely on Maven to process the file. In version 2.15.0, the plugin processes the raw POM and is thus susceptible to encoding problems.
In this case, the problem is caused by the user having the wrong value of the -Dfile.encoding system property, which causes String(byte[]) to use the wrong encoding (the one given by the user) to convert the byte array to a string. This is also why I wasn't able to reproduce the issue initially.
The same problem will occur wherever we process the raw POM and the user's system is misconfigured.
So, as an attempt to solve this problem -- I'm using Plexus's XmlReader with its detection capability to detect the encoding. Please tell me what you think of this solution.
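As a rough illustration of the idea behind encoding detection, the sketch below reads the `encoding` attribute of the XML declaration and decodes the bytes with that charset instead of the platform default. It is a simplified standard-library stand-in, not the Plexus implementation: it ignores byte-order marks and UTF-16, and the names `PomEncodingSniffer`, `detect` and `readXml` are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, a simplified stand-in for real XML encoding detection.
public class PomEncodingSniffer {

    // Matches the encoding attribute of an XML declaration such as
    // <?xml version="1.0" encoding="ISO-8859-1"?>
    private static final Pattern ENCODING = Pattern.compile(
            "<\\?xml[^>]*encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    // Returns the declared charset, or UTF-8 (the XML default) when the
    // declaration is absent or names a charset this JVM does not support.
    static Charset detect(byte[] content) {
        // The declaration itself is ASCII, so decoding the head as ISO-8859-1
        // is lossless for ASCII-compatible encodings.
        int headLength = Math.min(content.length, 256);
        String head = new String(content, 0, headLength, StandardCharsets.ISO_8859_1);
        Matcher matcher = ENCODING.matcher(head);
        if (matcher.find() && Charset.isSupported(matcher.group(1))) {
            return Charset.forName(matcher.group(1));
        }
        return StandardCharsets.UTF_8;
    }

    // Decodes the document with the detected charset rather than the platform default.
    static String readXml(byte[] content) {
        return new String(content, detect(content));
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><name>caf\u00e9</name>";
        byte[] latin1Bytes = xml.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("detected: " + detect(latin1Bytes));
        System.out.println("round-trips: " + readXml(latin1Bytes).equals(xml));
    }
}
```

The point of the design is that the result no longer depends on -Dfile.encoding: the document itself says how it should be decoded.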
…tring to circumvent problems with encoding Uses Plexus XmlStreamReader to guess file encoding.
There is a method - @ajarmoniuk What do you think?
Even better! 👍
### What changes were proposed in this pull request?
The pr aims to update some maven plugins to newest version. include:
- versions-maven-plugin from 2.15.0 to 2.16.0
- maven-source-plugin from 3.2.1 to 3.3.0
- maven-surefire-plugin from 3.1.0 to 3.1.2
- maven-dependency-plugin from 3.5.0 to 3.6.0

### Why are the changes needed?
- versions-maven-plugin
  1. Release Notes: https://github.com/mojohaus/versions/releases/tag/2.16.0
  2. Bug Fix:
     - Resolves: display-dependency-updates only shows updates from the most major allowed segment (mojohaus/versions#966) ajarmoniuk
     - Resolves mojohaus/versions#931: Fixing problems with encoding in UseDepVersion and PomHelper (mojohaus/versions#932) ajarmoniuk
     - Resolves mojohaus/versions#916: Partially reverted mojohaus/versions#799. (mojohaus/versions#924) ajarmoniuk
     - Resolves mojohaus/versions#954: Excluded plexus-container-default (mojohaus/versions#955) ajarmoniuk
     - Resolves mojohaus/versions#951: DefaultArtifactVersion::getVersion can be null (mojohaus/versions#952) ajarmoniuk
     - BoundArtifactVersion.toString() to work with NumericVersionComparator (mojohaus/versions#930) ajarmoniuk
     - Issue mojohaus/versions#925: Protect against an NPE if a dependency version is defined in dependencyManagement (mojohaus/versions#926) ajarmoniuk
- maven-source-plugin v3.2.1 VS v3.3.0: apache/maven-source-plugin@maven-source-plugin-3.2.1...maven-source-plugin-3.3.0
- maven-surefire-plugin Release Notes: https://github.com/apache/maven-surefire/releases/tag/surefire-3.1.2
- maven-dependency-plugin v3.5.0 VS v3.6.0: apache/maven-dependency-plugin@maven-dependency-plugin-3.5.0...maven-dependency-plugin-3.6.0

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

Closes #41641 from panbingkun/SPARK-44085.
Authored-by: panbingkun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
As shown in #931, changes made in 2.15.0 introduced a regression that affects POMs containing non-ASCII characters.
The problem is highly platform dependent and occurs when the actual encoding of the file differs from the charset returned by Charset.defaultCharset(), which is derived from the file.encoding system property.
In this particular case, we were using String(byte[]) to construct a String from a byte array, which assumed the wrong encoding. The file was a UTF-8 file, but file.encoding was set to latin1 or windows1252.
@slawekjaranowski please review