-
-
Notifications
You must be signed in to change notification settings - Fork 128
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix identifier case sensitivity bugs #237
fix identifier case sensitivity bugs #237
Conversation
Currently, this is just used to set identifiers without needing to modify the environment, but this could (should?) be extended to other options
Facilitates debug and error handling
This sets up the ability to have one diff file create a simulated filesystem and a second to simulate an edit of an existing file
Defaults True for backwards compatibility
Several related bugs that stem from a diff that contains both deletions and additions. Specifically, the line numbers aren't counted correctly, leading to - issue URL can't be inserted because it can't find the right line in the latest file - generated issue references the wrong line number - closed issue references the wrong line number See GitHub issue alstr#236 The last item might not have any actual impact as (I think) it's just informational. But it'd still be better if it reported the correct line number of the deletion, which necessarily has to be relative to the _old_ file's line number, not the updated file's. As there is no solution in place yet for these bugs, the unittest is marked as an expected failure
…ng of TODOs Some parts of the code were using case-insensitive matches when searching for an identifier (i.e. TODO and todo would both be acceptable) whereas other parts of the code would search for a strict case-sensitive match (only TODO, not todo). This inconsistency led to many issues, among them - alstr#216 - alstr#224 - alstr#225 Further, the identifier match wasn't being done using word breaks, meaning an identifier of "FIX" would match (case-insensitively) with "suffix" or "prefix" if those words happened to appear in a comment. (See alstr#234). Issue alstr#230 was also preventing issue labels from being applied properly if the identifier case did not match exactly the canonical version. i.e. "todo" would generate an issue, but the associated labels wouldn't be applied because the exact identifier used wasn't "TODO". This commit resolves all of these issues.
Earlier commit resolved issue alstr#234
@@ -2,7 +2,7 @@ class Issue(object): | |||
"""Basic Issue model for collecting the necessary info to send to GitHub.""" | |||
|
|||
def __init__(self, title, labels, assignees, milestone, body, hunk, file_name, | |||
start_line, num_lines, markdown_language, status, identifier, ref, issue_url, issue_number): | |||
start_line, num_lines, markdown_language, status, identifier, identifier_actual, ref, issue_url, issue_number, start_line_within_hunk=1): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
identifier_actual
is the actual identifier that was matched, whereas identifier
is the canonical class of identifier that was matched. For example, for the following comment:
# todo: do the thing
identifier_actual
is todo
and identifier
is TODO
for key in [x for x in vars(self).keys() if x not in ("hunk")]: | ||
selflist.append(f'"{key}": "{getattr(self, key)}"') | ||
selflist.append((f'"hunk": "{self.hunk}"')) | ||
return '\n'.join(selflist) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is mainly just for debug, but it's also utilized in the error path of the unit tests. It just provides us a way to print
an Issue
@@ -24,22 +25,30 @@ class TodoParser(object): | |||
ISSUE_URL_PATTERN = re.compile(r'(?<=Issue URL:\s).+', re.IGNORECASE) | |||
ISSUE_NUMBER_PATTERN = re.compile(r'/issues/(\d+)', re.IGNORECASE) | |||
|
|||
def __init__(self): | |||
def __init__(self, options=dict()): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All changes in this section (through new line number 51) have to do with allowing the identifiers
to be specified as an argument to the TodoParser
constructor. I found this a cleaner method of configuring TodoParser
when using it in the unit tests. The old method of setting it via an environment variable is still supported, and still has the highest priority, so this is backwards-compatible.
As an aside, it may be helpful to eventually support all configurable items via this options
dictionary, but that's a separate discussion.
@@ -232,32 +247,95 @@ def parse(self, diff_file): | |||
+ (r'(?!(' + '|'.join(suff_escape_list) + r'))' if len(suff_escape_list) > 0 | |||
else '') | |||
+ r'\s*.+$)') | |||
comments = re.finditer(comment_pattern, block['hunk'], re.MULTILINE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And here's the huge diff that I couldn't find a way to make smaller. This is the real crux of the fix to the various issues.
Previously, self._extract_issue_if_exists()
was called both from the marker['type'] == 'line'
block and the corresponding else
block (for block-style comments). It's now called in just one place, after the if/else
block. For both branches, a contiguous_comments_and_positions
list is constructed. As before, comments which are contiguous are grouped together on the assumption that they're related to the same issue. But what's new here is that this is now also tracking the position (relative to the start of the hunk) where the comment begins. Having this knowledge is what allows us to later differentiate same-titled issues within the same file.
body=[], | ||
hunk=hunk_info['hunk'], | ||
file_name=hunk_info['file'], | ||
start_line=hunk_info['start_line'] + comment_block['start'] + line_number_within_comment_block, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
start_line
is now a function of the hunk's position within the diff and the comment's position within the hunk. When applicable, it also factors in the comment's position within its comment block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the 1st of the two files mentioned earlier that helps us expose #236
self._original_addSubTest = result.addSubTest | ||
result.addSubTest = self._addSubTest | ||
|
||
super().run(result) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some necessary plumbing in order to track how many subtests have failed in test test_line_numbering_with_deletions
below.
break | ||
else: | ||
matching_issues.append(issue) | ||
return matching_issues |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Essentially a generic version of cont_issues_for_file_type()
that lets us find issues which match any arbitrary set of fields. These are AND-ed together. i.e. The issue's fields would need to match everything in fields
in order to be part of the list that's returned.
'', | ||
'Unexpected issues:', | ||
'\n=========================\n'.join(map(str, unexpected_issues))]) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Helper function for below tests
@@ -112,6 +128,70 @@ def test_liquid_issues(self): | |||
def test_lua_issues(self): | |||
self.assertEqual(count_issues_for_file_type(self.raw_issues, 'lua'), 2) | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alstr, ready for your review. Thanks |
Great stuff! |
This PR fixes several bugs related to case-sensitivity vs case-insensitive matching of the identifier. Unfortunately, as much as I tried to break it up into smaller PRs, the diff on TodoParser.py is pretty significant on this one. For starters, I recommend viewing it by ignoring whitespace, as several blocks changed indentation.
I'll add inline comments to try to explain the major changes. But ultimately, as stated in the commit logs, these changes are about closing (and/or resolving the real root cause of) the following issues:
It also adds a test for, but does not resolve, #236