Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix identifier case sensitivity bugs #237

Merged
merged 9 commits into from
Nov 12, 2024

Conversation

rgalonso
Copy link
Contributor

This PR fixes several bugs related to case-sensitivity vs case-insensitive matching of the identifier. Unfortunately, as much as I tried to break it up into smaller PRs, the diff on TodoParser.py is pretty significant on this one. For starters, I recommend viewing it by ignoring whitespace, as several blocks changed indentation.

I'll add inline comments to try to explain the major changes. But ultimately, as stated in the commit logs, these changes are about closing (and/or resolving the real root cause of) the following issues:

It also adds a test for, but does not resolve, #236

Currently, this is just used to set identifiers
without needing to modify the environment, but this
could (should?) be extended to other options
Facilitates debug and error handling
These tests capture the existing issues alstr#234 and
issue alstr#235. As no solution is in place yet, they're
marked as expected failures.
This sets up the ability to have one diff file
create a simulated filesystem and a second to
simulate an edit of an existing file
Defaults True for backwards compatibility
Several related bugs that stem from a diff that
contains both deletions and additions. Specifically,
the line numbers aren't counted correctly, leading
to
- issue URL can't be inserted because it can't find
  the right line in the latest file
- generated issue references the wrong line number
- closed issue references the wrong line number

See GitHub issue alstr#236

The last item might not have any actual impact as
(I think) it's just informational. But it'd still
be better if it reported the correct line number
of the deletion, which necessarily has to be
relative to the _old_ file's line number, not the
updated file's.

As there is no solution in place yet for these
bugs, the unittest is marked as an expected failure
…ng of TODOs

Some parts of the code were using case-insensitive
matches when searching for an identifier (i.e.
TODO and todo would both be acceptable) whereas
other parts of the code would search for a strict
case-sensitive match (only TODO, not todo). This
inconsistency led to many issues, among them
- alstr#216
- alstr#224
- alstr#225

Further, the identifier match wasn't being done
using word breaks, meaning an identifier of "FIX"
would match (case-insensitively) with "suffix" or
"prefix" if those words happened to appear in a
comment. (See alstr#234).

Issue alstr#230 was also preventing issue labels from
being applied properly if the identifier case
did not match exactly the canonical version. i.e.
"todo" would generate an issue, but the associated
labels wouldn't be applied because the exact
identifier used wasn't "TODO".

This commit resolves all of these issues.
@@ -2,7 +2,7 @@ class Issue(object):
"""Basic Issue model for collecting the necessary info to send to GitHub."""

def __init__(self, title, labels, assignees, milestone, body, hunk, file_name,
start_line, num_lines, markdown_language, status, identifier, ref, issue_url, issue_number):
start_line, num_lines, markdown_language, status, identifier, identifier_actual, ref, issue_url, issue_number, start_line_within_hunk=1):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

identifier_actual is the actual identifier that was matched, whereas identifier is the canonical class of identifier that was matched. For example, for the following comment:

# todo: do the thing

identifier_actual is todo and identifier is TODO

for key in [x for x in vars(self).keys() if x not in ("hunk")]:
selflist.append(f'"{key}": "{getattr(self, key)}"')
selflist.append((f'"hunk": "{self.hunk}"'))
return '\n'.join(selflist)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is mainly just for debug, but it's also utilized in the error path of the unit tests. It just provides us a way to print an Issue

@@ -24,22 +25,30 @@ class TodoParser(object):
ISSUE_URL_PATTERN = re.compile(r'(?<=Issue URL:\s).+', re.IGNORECASE)
ISSUE_NUMBER_PATTERN = re.compile(r'/issues/(\d+)', re.IGNORECASE)

def __init__(self):
def __init__(self, options=dict()):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All changes in this section (through new line number 51) have to do with allowing the identifiers to be specified as an argument to the TodoParser constructor. I found this a cleaner method of configuring TodoParser when using it in the unit tests. The old method of setting it via an environment variable is still supported, and still has the highest priority, so this is backwards-compatible.

As an aside, it may be helpful to eventually support all configurable items via this options dictionary, but that's a separate discussion.

@@ -232,32 +247,95 @@ def parse(self, diff_file):
+ (r'(?!(' + '|'.join(suff_escape_list) + r'))' if len(suff_escape_list) > 0
else '')
+ r'\s*.+$)')
comments = re.finditer(comment_pattern, block['hunk'], re.MULTILINE)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here's the huge diff that I couldn't find a way to make smaller. This is the real crux of the fix to the various issues.

Previously, self._extract_issue_if_exists() was called both from the marker['type'] == 'line' block and the corresponding else block (for block-style comments). It's now called in just one place, after the if/else block. For both branches, a contiguous_comments_and_positions list is constructed. As before, comments which are contiguous are grouped together on the assumption that they're related to the same issue. But what's new here is that this is now also tracking the position (relative to the start of the hunk) where the comment begins. Having this knowledge is what allows us to later differentiate same-titled issues within the same file.

body=[],
hunk=hunk_info['hunk'],
file_name=hunk_info['file'],
start_line=hunk_info['start_line'] + comment_block['start'] + line_number_within_comment_block,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

start_line is now a function of the hunk's position within the diff and the comment's position within the hunk. When applicable, it also factors in the comment's position within its comment block.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the 1st of the two files mentioned earlier that helps us expose #236

self._original_addSubTest = result.addSubTest
result.addSubTest = self._addSubTest

super().run(result)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some necessary plumbing in order to track how many subtests have failed in test test_line_numbering_with_deletions below.

break
else:
matching_issues.append(issue)
return matching_issues
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Essentially a generic version of cont_issues_for_file_type() that lets us find issues which match any arbitrary set of fields. These are AND-ed together. i.e. The issue's fields would need to match everything in fields in order to be part of the list that's returned.

'',
'Unexpected issues:',
'\n=========================\n'.join(map(str, unexpected_issues))])

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Helper function for below tests

@@ -112,6 +128,70 @@ def test_liquid_issues(self):
def test_lua_issues(self):
self.assertEqual(count_issues_for_file_type(self.raw_issues, 'lua'), 2)


Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Helps us test for issues #234 and #235

@rgalonso
Copy link
Contributor Author

@alstr, ready for your review. Thanks

@alstr alstr merged commit f54fbeb into alstr:master Nov 12, 2024
1 check passed
@alstr
Copy link
Owner

alstr commented Nov 12, 2024

Great stuff!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants