fix identifier case sensitivity bugs #237

rgalonso · 2024-11-12T00:24:09Z

This PR fixes several bugs related to case-sensitive vs. case-insensitive matching of the identifier. Unfortunately, as much as I tried to break it up into smaller PRs, the diff on TodoParser.py is pretty significant on this one. For starters, I recommend viewing it with whitespace changes ignored, as several blocks changed indentation.

I'll add inline comments to try to explain the major changes. But ultimately, as stated in the commit logs, these changes are about closing (and/or resolving the real root cause of) the following issues:

It also adds a test for, but does not resolve, #236

Several related bugs stem from a diff that contains both deletions and additions. Specifically, the line numbers aren't counted correctly, leading to
- issue URL can't be inserted because it can't find the right line in the latest file
- generated issue references the wrong line number
- closed issue references the wrong line number

See GitHub issue alstr#236

The last item might not have any actual impact as (I think) it's just informational. But it'd still be better if it reported the correct line number of the deletion, which necessarily has to be relative to the _old_ file's line number, not the updated file's.

As there is no solution in place yet for these bugs, the unittest is marked as an expected failure

…ng of TODOs

Some parts of the code were using case-insensitive matches when searching for an identifier (i.e. TODO and todo would both be acceptable) whereas other parts of the code would search for a strict case-sensitive match (only TODO, not todo). This inconsistency led to many issues, among them
- alstr#216
- alstr#224
- alstr#225

Further, the identifier match wasn't being done using word breaks, meaning an identifier of "FIX" would match (case-insensitively) with "suffix" or "prefix" if those words happened to appear in a comment. (See alstr#234).

Issue alstr#230 was also preventing issue labels from being applied properly if the identifier case did not match exactly the canonical version. i.e. "todo" would generate an issue, but the associated labels wouldn't be applied because the exact identifier used wasn't "TODO".

This commit resolves all of these issues.

rgalonso · 2024-11-12T00:27:09Z

Issue.py

@@ -2,7 +2,7 @@ class Issue(object):
    """Basic Issue model for collecting the necessary info to send to GitHub."""

    def __init__(self, title, labels, assignees, milestone, body, hunk, file_name,
-                 start_line, num_lines, markdown_language, status, identifier, ref, issue_url, issue_number):
+                 start_line, num_lines, markdown_language, status, identifier, identifier_actual, ref, issue_url, issue_number, start_line_within_hunk=1):


identifier_actual is the actual identifier that was matched, whereas identifier is the canonical class of identifier that was matched. For example, for the following comment:

# todo: do the thing

identifier_actual is todo and identifier is TODO
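
To illustrate the distinction, here is a minimal sketch that assumes the match is done with a case-insensitive regular expression anchored on word boundaries. The helper name `match_identifier` and the `IDENTIFIERS` list are hypothetical and are not taken from the PR.

```python
import re

# Hypothetical list of canonical identifiers (an assumption for this sketch).
IDENTIFIERS = ["TODO", "FIX"]


def match_identifier(comment_line, identifiers=IDENTIFIERS):
    """Return (identifier, identifier_actual) for the first match, or None."""
    for canonical in identifiers:
        # \b keeps "FIX" from matching inside words such as "suffix" or "prefix".
        pattern = re.compile(r'\b' + re.escape(canonical) + r'\b', re.IGNORECASE)
        found = pattern.search(comment_line)
        if found:
            # canonical is the class of identifier; group(0) is the text as written.
            return canonical, found.group(0)
    return None
```

With this sketch, `match_identifier("# todo: do the thing")` yields the canonical `TODO` together with the actual text `todo`, while a comment that only mentions "suffix" yields no match.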

rgalonso · 2024-11-12T00:28:00Z

Issue.py

+        for key in [x for x in vars(self).keys() if x not in ("hunk")]:
+            selflist.append(f'"{key}": "{getattr(self, key)}"')
+        selflist.append((f'"hunk": "{self.hunk}"'))
+        return '\n'.join(selflist)


This is mainly just for debug, but it's also utilized in the error path of the unit tests. It just provides us a way to print an Issue
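
A rough sketch of the same idea on a cut-down class, for anyone who wants to try it in isolation. `MiniIssue` is a hypothetical stand-in with only a few fields, not the real Issue model. One detail worth noting: `("hunk")` without a trailing comma is just the string `"hunk"`, so `not in` becomes a substring test; the sketch uses a one-element tuple instead.

```python
class MiniIssue:
    """Hypothetical stand-in for Issue with only a few fields."""

    def __init__(self, title, identifier, identifier_actual, hunk):
        self.title = title
        self.identifier = identifier
        self.identifier_actual = identifier_actual
        self.hunk = hunk

    def __str__(self):
        # ("hunk",) is a real one-element tuple, so this is a membership test.
        lines = [f'"{key}": "{getattr(self, key)}"'
                 for key in vars(self) if key not in ("hunk",)]
        # Emit the (possibly multi-line) hunk last so the short fields stay together.
        lines.append(f'"hunk": "{self.hunk}"')
        return '\n'.join(lines)
```

Calling `print(MiniIssue(...))` then lists each field on its own line, with the hunk at the end.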

rgalonso · 2024-11-12T00:30:51Z

TodoParser.py

@@ -24,22 +25,30 @@ class TodoParser(object):
    ISSUE_URL_PATTERN = re.compile(r'(?<=Issue URL:\s).+', re.IGNORECASE)
    ISSUE_NUMBER_PATTERN = re.compile(r'/issues/(\d+)', re.IGNORECASE)

-    def __init__(self):
+    def __init__(self, options=dict()):


All changes in this section (through new line number 51) have to do with allowing the identifiers to be specified as an argument to the TodoParser constructor. I found this a cleaner method of configuring TodoParser when using it in the unit tests. The old method of setting it via an environment variable is still supported, and still has the highest priority, so this is backwards-compatible.

As an aside, it may be helpful to eventually support all configurable items via this options dictionary, but that's a separate discussion.
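
The priority order described above (environment first, then the options argument, then defaults) could be sketched roughly as follows. Everything named here is an assumption made for illustration: the environment variable name `EXAMPLE_IDENTIFIERS`, the `identifiers` key, the JSON encoding, and the function `resolve_identifiers` are not taken from the PR.

```python
import json
import os

ENV_VAR = "EXAMPLE_IDENTIFIERS"  # hypothetical name, not the action's real variable
DEFAULT_IDENTIFIERS = ["TODO"]   # hypothetical default


def resolve_identifiers(options=None, environ=None):
    """Pick identifiers from the environment first, then options, then defaults."""
    # None is used as the default because a dict() default is created once
    # and shared between calls, which matters if it is ever mutated.
    options = options if options is not None else {}
    environ = environ if environ is not None else os.environ

    raw = environ.get(ENV_VAR)
    if raw:
        # The environment keeps the highest priority, for backwards compatibility.
        return json.loads(raw)
    if "identifiers" in options:
        return list(options["identifiers"])
    return list(DEFAULT_IDENTIFIERS)
```

In a unit test this allows something like `resolve_identifiers({"identifiers": ["FIX"]}, environ={})` without touching the process environment.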

rgalonso · 2024-11-12T00:39:11Z

TodoParser.py

@@ -232,32 +247,95 @@ def parse(self, diff_file):
                                       + (r'(?!(' + '|'.join(suff_escape_list) + r'))' if len(suff_escape_list) > 0
                                          else '')
                                       + r'\s*.+$)')
-                    comments = re.finditer(comment_pattern, block['hunk'], re.MULTILINE)


And here's the huge diff that I couldn't find a way to make smaller. This is the real crux of the fix to the various issues.

Previously, self._extract_issue_if_exists() was called both from the marker['type'] == 'line' block and the corresponding else block (for block-style comments). It's now called in just one place, after the if/else block. For both branches, a contiguous_comments_and_positions list is constructed. As before, comments which are contiguous are grouped together on the assumption that they're related to the same issue. But what's new here is that this is now also tracking the position (relative to the start of the hunk) where the comment begins. Having this knowledge is what allows us to later differentiate same-titled issues within the same file.
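
As a simplified illustration of grouping contiguous comments while remembering where each group starts, consider the sketch below. It assumes line-style comments with a single marker and tracks positions as line offsets within the hunk; the function name `group_contiguous_comments` and the dictionary layout are hypothetical, and the real parser works with regex matches rather than a plain line scan.

```python
def group_contiguous_comments(hunk, marker="#"):
    """Return [{'start': line_offset, 'lines': [...]}, ...] for runs of adjacent comment lines."""
    groups = []
    current = None
    for offset, line in enumerate(hunk.splitlines()):
        stripped = line.strip()
        if stripped.startswith(marker):
            if current is None:
                # A new run begins; remember where it starts within the hunk.
                current = {"start": offset, "lines": []}
                groups.append(current)
            current["lines"].append(stripped)
        else:
            # Any non-comment line ends the current run.
            current = None
    return groups
```

Two comments with identical text elsewhere in the same hunk end up in separate groups with different `start` values, which is the information needed to tell them apart.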

rgalonso · 2024-11-12T00:41:22Z

TodoParser.py

+                    body=[],
+                    hunk=hunk_info['hunk'],
+                    file_name=hunk_info['file'],
+                    start_line=hunk_info['start_line'] + comment_block['start'] + line_number_within_comment_block,


start_line is now a function of the hunk's position within the diff and the comment's position within the hunk. When applicable, it also factors in the comment's position within its comment block.
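
A small worked example of that sum, mirroring the expression in the diff. The function name and the sample numbers are made up for illustration, and the sketch assumes both offsets are counted from zero.

```python
def compute_start_line(hunk_start_line, comment_block_start, line_number_within_comment_block):
    """Combine the three positions into a line number in the file."""
    return hunk_start_line + comment_block_start + line_number_within_comment_block


# Hypothetical numbers: the hunk starts at line 40 of the file, the comment
# block begins 3 lines into the hunk, and the identifier sits on the second
# line of that block (offset 1).
example_start_line = compute_start_line(40, 3, 1)
```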

rgalonso · 2024-11-12T00:47:16Z

tests/test_new_py.diff

This is the first of the two files mentioned earlier that help us expose #236

rgalonso · 2024-11-12T00:48:13Z

tests/test_process_diff.py

+            self._original_addSubTest = result.addSubTest
+            result.addSubTest = self._addSubTest
+
+        super().run(result)


Some necessary plumbing in order to track how many subtests have failed in test test_line_numbering_with_deletions below.

rgalonso · 2024-11-12T00:50:40Z

tests/test_todo_parser.py

+                break
+        else:
+            matching_issues.append(issue)
+    return matching_issues


Essentially a generic version of count_issues_for_file_type() that lets us find issues which match any arbitrary set of fields. These are AND-ed together, i.e. the issue's fields would need to match everything in fields in order to be part of the list that's returned.
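
The AND-ed matching can be sketched with a `for`/`else` loop, as below. The function name `get_issues_matching` and the use of `getattr` are assumptions for this sketch; only the `break` / `else` / `append` shape is visible in the diff excerpt.

```python
from types import SimpleNamespace


def get_issues_matching(raw_issues, fields):
    """Return the issues whose attributes equal every key/value pair in fields."""
    matching_issues = []
    for issue in raw_issues:
        for key, expected in fields.items():
            if getattr(issue, key, None) != expected:
                # One mismatch is enough to reject this issue.
                break
        else:
            # The inner loop finished without break: all fields matched.
            matching_issues.append(issue)
    return matching_issues


# Hypothetical sample data standing in for parsed Issue objects.
sample_issues = [
    SimpleNamespace(title="do the thing", identifier="TODO", markdown_language="python"),
    SimpleNamespace(title="do the thing", identifier="FIX", markdown_language="python"),
    SimpleNamespace(title="other", identifier="TODO", markdown_language="lua"),
]
```

For example, asking for `{"identifier": "TODO", "markdown_language": "python"}` returns only the first sample issue, since both conditions must hold.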

rgalonso · 2024-11-12T00:51:20Z

tests/test_todo_parser.py

+           '',
+           'Unexpected issues:',
+           '\n=========================\n'.join(map(str, unexpected_issues))])
+


Helper function for the tests below

rgalonso · 2024-11-12T00:52:24Z

tests/test_todo_parser.py

@@ -112,6 +128,70 @@ def test_liquid_issues(self):
    def test_lua_issues(self):
        self.assertEqual(count_issues_for_file_type(self.raw_issues, 'lua'), 2)

+


Helps us test for issues #234 and #235

rgalonso · 2024-11-12T00:53:29Z

@alstr, ready for your review. Thanks

alstr · 2024-11-12T10:13:32Z

Great stuff!

rgalonso added 9 commits November 11, 2024 17:52

refactor: add optional argument to TodoParser to set configuration

e42bca6

Currently, this is just used to set identifiers without needing to modify the environment, but this could (should?) be extended to other options

feat: add ability to print Issue object

dd15f79

Facilitates debug and error handling

test: add tests to capture additional known issues

872a997

These tests capture the existing issues alstr#234 and issue alstr#235. As no solution is in place yet, they're marked as expected failures.

test: allow multiple diff files to be consumed

69e2360

This sets up the ability to have one diff file create a simulated filesystem and a second to simulate an edit of an existing file

refactor: optionally output log if _standardTest fails

c643aec

Defaults True for backwards compatibility

test: allow test to be an expected success now that bug is fixed

7ce7825

Earlier commit resolved issues alstr#216, alstr#224, alstr#225

test: allow test to be an expected success now that bug is fixed

8c28d09

Earlier commit resolved issue alstr#234

alstr merged commit f54fbeb into alstr:master Nov 12, 2024
1 check passed

This was referenced Nov 12, 2024

random line of existing source inserted into file #225

Closed

TODOs with same text are confused #216

Closed
