Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix escaped symbols breaking markdown checks #209

Conversation

lpapazow
Copy link
Contributor

@lpapazow lpapazow commented Oct 25, 2017

Create a fix for issue #205 .

Fix a single backslash before special symbols (!"#$%&'()*+,./:;<=>?@][^_`{|}~-) not escaping that symbol in .markdown files.

@lpapazow lpapazow force-pushed the markdown-test-fix-ecape-backslash-appearing branch from 75f6e88 to 1154f49 Compare October 25, 2017 08:09
public interface LineFormatterCallback {
/**
* Used in AbstractStaticCheck's findLineNumber method to apply rules
* about escaping special characters to a line in the .mardown file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo .mardown -> markdown

But maybe it is better .md as this is the file extension that we are using in the check, or just say from the markdown file (without the .)

* Used in AbstractStaticCheck's findLineNumber method to apply rules
* about escaping special characters to a line in the .mardown file
*
* @param line - an unparsed line from the .markdown file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

Copy link
Member

@martinvw martinvw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is really an optimal solution I also do not see sufficient tests which prove that this actually resolves the problem and does not have some nasty corner cases where it might make things worse


@Override
public String formatLine(String line) {
Pattern needsEscape = Pattern.compile("\\\\" + Escaping.ESCAPABLE);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But you can not be sure that all characters which have to escaped are actually escaped are you sure this will suffice for the most common or better all cases? Should we maybe determine the line number in a different way instead of trying to match it again?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given such changes I would expect new tests? Or maybe changed tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Escaping.ESCAPABLE constant is defined in the external markdown library and is used in the markdown parsing logic. It will cover all the cases in which the markdown parser will make changes to the original text.

The reason why I didn't include any new tests is that when I use the verify method that is provided by com.puppycrawl.tools.checkstyle.BaseCheckTestSupport and I provide no error messages to that method, it doesn't fail the test, it only logs the error. Thus, every test that I include, would pass even if it shouldn't.

Do you have any ideas how to create tests that could fail?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please discuss it with some of your college's first

* @return the number of the line starting from 1, where the searched text occurred for the first
* time
* @throws NoResultException when no match was found
*/
protected int findLineNumber(FileText fileContent, String searchedText, int startLineNumber)
throws NoResultException {
protected int findLineNumberFormatted(FileText fileContent, String searchedText, int startLineNumber,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imho this is not the best way forward it looks a little over designed to me with this callback logic, isn't there an easy way to solve this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also seems that source from which we receive the string has not the same origin as lines we are searching through which seems dangerous for a lot of reasons imho.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sources differ is because the searchedText is generated by the markdown parsing logic and the fileContent is the raw text in the markdown file.

Copy link
Member

@martinvw martinvw Nov 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I did understand that, but isn't this part of the problem and the thing we would actually like to resolve, can't we just request the parsed text from markdown and search through that?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martinvw I will try to jump in here, because we discussed it in the office with @lpapazow and @doandzhi : Originally the check is reporting errors into the raw markdown file and for this reason we want to find the exact line in the raw file and report the error there. So if we get the parsed text and report the errors into its lines, we will probably hit an inconsistent line. For this reason we decided to parse a bit the special symbols (taken from the parser) in order to find the correct raw content. What do you think, are you tend to agree or disagree ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand what we try to achieve but it feels a little fragile to me, in most languages there are more character which can be escaped than the ones you have to escape so mismatches are possible. Without any proper tests that is hard to tell without me diving into the code myself.

I found an issue referring to the actual underlying problem that the markdown Nodes do not have a reference back to the original location: commonmark/commonmark-java#1

Given that the issue is there, seems it cannot be done (easily) so we might go for this current solution but not without proper tests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that good example where escaping is not always performed can be seen here:

http://docs.openhab.org/addons/bindings/netatmo/readme.html
https://github.com/openhab/openhab2-addons/edit/master/addons/binding/org.openhab.binding.netatmo/README.md on line 49 there is a unescaped _ which does not cause problems in some views but in others it does. Docs renders fine, github view renders fine but diff's don't.

*/
protected int findLineNumber(FileText fileContent, String searchedText, int startLineNumber)
throws NoResultException {
return findLineNumberFormatted(fileContent, searchedText, startLineNumber, null);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like null I would have prefered a no-op LineFormatterCallback if we go for this solution than you also do not need the null check, just let it explode if you are called with a null

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this suggestion, it does seem to make the code more robust. I will rewrite it in the suggested way.

}

@Override
public void log(int line, String message) {
MarkdownCheck.this.log(line, message);
}
};
MarkdownVisitor visitor = new MarkdownVisitor(callBack, fileText);
MarkdownVisitor visitor = new MarkdownVisitor(callBack, fileText, new LineFormatterCallback() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the design stays as is, I would make the LineFormatterCallback a FunctionalInterface

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great suggestion, I will use it.

@lpapazow lpapazow force-pushed the markdown-test-fix-ecape-backslash-appearing branch 5 times, most recently from a6ed210 to 01c9beb Compare November 10, 2017 16:19
@lpapazow lpapazow force-pushed the markdown-test-fix-ecape-backslash-appearing branch from 01c9beb to b43b1f6 Compare November 10, 2017 16:23
@svilenvul
Copy link
Contributor

svilenvul commented Jan 12, 2018

@lpapazow, do you have any updates on this PR ?

@svilenvul
Copy link
Contributor

Replaced by #245

@svilenvul svilenvul closed this Jan 19, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants