Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python: Handle diagnostics writing for BuiltinModuleExtractable #16940

Merged
merged 5 commits into from
Jul 12, 2024

Conversation

RasmusWL
Copy link
Member

@RasmusWL RasmusWL commented Jul 9, 2024

In the hopes this will fix a problem encountered in the wild (that I couldn't reproduce locally).

I couldn't find any tests that I could easily extend to test the new behavior either, so what I did was to locally alter BuiltinExtractor to always throw an exception. I observed this causes us to generate Exception: 'BuiltinModuleExtractable' object has no attribute 'path' in the logs.

With the commits from this PR, we now properly log the failure reason:

[2024-07-09 14:07:25] [build-stdout] [ERROR] [15] Failed to extract module _datetime: <exception-reason>
[2024-07-09 14:07:25] [build-stdout] [TRACEBACK] [15] "semmle/worker.py", line 263, in _extract_loop
[2024-07-09 14:07:25] [build-stdout] [TRACEBACK] [15] "semmle/extractors/super_extractor.py", line 34, in process
[2024-07-09 14:07:25] [build-stdout] [TRACEBACK] [15] "semmle/extractors/builtin_extractor.py", line 24, in process

and generate diagnostics such as (EDIT: updated to not include location after
354394d)

{"timestamp": "2024-07-12T13:40:41.728383", "source": {"id": "py/diagnostics/internal-error", "name": "Internal error in Python extractor", "extractorName": "python"}, "severity": "error", "plaintextMessage": "Internal error", "visibility": {"statusPage": false, "cliSummaryTable": false, "telemetry": true}, "attributes": {"traceback": ["\"semmle/worker.py\", line 263, in _extract_loop", "\"semmle/extractors/super_extractor.py\", line 34, in process", "\"semmle/extractors/builtin_extractor.py\", line 24, in process"], "args": ["<exception-reason>"]}}

I'm not 100% sure if the tooling will be happy with the fake "file" here, so want to confirm that before merging this PR.

EDIT: I forgot to mention I also tried inserting an exception in internal_error_message function, to verify that error handling also works 👍

[2024-07-09 14:15:07] [build-stdout] [WARN] [7] Failed to write diagnostics: <exception-reason>
[2024-07-09 14:15:07] [build-stdout] [TRACEBACK] [7] "semmle/worker.py", line 263, in _extract_loop
[2024-07-09 14:15:07] [build-stdout] [TRACEBACK] [7] "semmle/extractors/super_extractor.py", line 34, in process
[2024-07-09 14:15:07] [build-stdout] [TRACEBACK] [7] "semmle/extractors/builtin_extractor.py", line 24, in process
[2024-07-09 14:15:07] [build-stdout] [TRACEBACK] [7] "semmle/worker.py", line 290, in _extract_loop
[2024-07-09 14:15:07] [build-stdout] [TRACEBACK] [7] "semmle/logging.py", line 387, in internal_error_message

@RasmusWL RasmusWL requested a review from a team as a code owner July 9, 2024 12:17
@github-actions github-actions bot added the Python label Jul 9, 2024
sidshank
sidshank previously approved these changes Jul 9, 2024
Copy link
Contributor

@sidshank sidshank left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I like the extra information being logged, and presumably the shift from unhandled exception to warning should allow extraction to continue (and not fail).

tausbn
tausbn previously approved these changes Jul 9, 2024
Copy link
Contributor

@tausbn tausbn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. 👍
I'm still somewhat surprised that this error is occurring at all, but I think this should fix it.

@molson504x
Copy link

molson504x commented Jul 10, 2024

Hey @RasmusWL - chiming in because you'd commented you weren't sure how to test this out and couldn't reproduce locally. This is how we found the problem in the wild, and how I kind of diagnosed it as a hybrid between an environment problem and a python extractor problem. This client that we are rolling out CodeQL to is using AmazonLinux to build their runner images (yep, I know, that's where this becomes an environment issue), and AmazonLinux handles cryptography functions in a weird way because of FIPS compliance. I targeted the python hashlib library because of finding this in the debug output of CodeQL:

[2024-07-09 15:42:05] [build-stdout] [DEBUG] [24] Trying built-in extractor on module _blake2
Error: 7-09 15:42:05] [build-stdout] [ERROR] [24] Exception: 'BuiltinModuleExtractable' object has no attribute 'path'

I wrote a small python script which would try to create a blake2 hash, and ran it in python 3.9 in AL2, AL2023, and Ubuntu (Ubuntu was running 3.10). Here's the script:

from hashlib import blake2b
h = blake2b()
h.update(b'Hello world')
print(h.hexdigest())

On AmazonLinux, it failed because blake2b (and blake2s) don't exist in hashlib. On Ubuntu, it works fine and produced a hex digest. According to the docs hashlib should exist in 3.9, as should blake2b and blake2s. Weird, right?

In theory, running this same python script through CodeQL's Python Extractor in an AmazonLinux container and on Ubuntu should be a good sanity check to test this. AmazonLinux is available from dockerhub, so in theory you should be able to spin up a AL2 or AL2023 container locally and just run the CodeQL CLI there to reproduce the error.

Some of the internal tooling would not be too happy about this :D
@RasmusWL RasmusWL dismissed stale reviews from tausbn and sidshank via 354394d July 12, 2024 11:38
@RasmusWL RasmusWL requested a review from tausbn July 12, 2024 11:38
@RasmusWL
Copy link
Member Author

Thanks @molson504x, that's really valuable context 👍 When the change from this PR is applied, we will still have the same underlying problem with blake2, it just wont cause the extractor to crash. (and we'll investigate the underlying problem separately 👍)

oh, I forgot to actually submit this comment 🙈

@RasmusWL RasmusWL merged commit 1de2943 into github:main Jul 12, 2024
8 checks passed
@RasmusWL RasmusWL deleted the rasmuswl/BuiltinModuleExtractable branch July 12, 2024 12:46
@molson504x
Copy link

Thanks! Yeah I'm not saying blake2 was the only one, just the one I happened to notice and used as my test. My theory is amazonlinux is using some kind of fips crypto library that doesn't include, among others, blake2 algorithms which would explain the pathing problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants