fix: do not spam the log with checksum related INFO messages when downloading using transfer_manager #1357

Merged: 2 commits into googleapis:main on Oct 9, 2024

Conversation

@rafalh rafalh commented Oct 2, 2024

The download_chunks_concurrently function does not allow setting the checksum field in download_kwargs. It also does not set the field itself, so it takes the default value of "md5" (see Blob._prep_and_do_download). Because ranged downloads do not return checksums, this produces a large number of INFO messages (tens or hundreds):

INFO google.resumable_media._helpers - No MD5 checksum was returned from the service while downloading ...
(which happens for composite objects), so client-side content integrity checking is not being performed.

To fix this, set the checksum field to None, which disables checksum checking for individual chunks. Note that transfer_manager has its own checksum checking logic (enabled by the crc32c_checksum argument).
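
The effect described above can be sketched in isolation. This is a minimal simulation, not the library's code: `download_range`, `download_in_chunks`, `count_info_messages`, and the logger name `sketch.ranged_download` are all hypothetical stand-ins, and the only behavior assumed is the one stated in the description (a requested per-chunk checksum cannot be verified on a ranged download, so one INFO message is logged per chunk).

```python
import logging

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("sketch.ranged_download")
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the demo output out of the root logger


def download_range(start, end, checksum="md5"):
    """Hypothetical stand-in for one ranged download; returns the byte count."""
    # Assumption: a ranged response carries no checksum, so a requested
    # check can only be skipped, and the skip is reported at INFO level.
    if checksum is not None:
        logger.info(
            "No %s checksum was returned for bytes %d-%d",
            checksum.upper(), start, end,
        )
    return end - start + 1


def download_in_chunks(size, chunk_size, **download_kwargs):
    """Download `size` bytes as consecutive ranges of at most `chunk_size`."""
    total = 0
    for start in range(0, size, chunk_size):
        end = min(start + chunk_size, size) - 1
        total += download_range(start, end, **download_kwargs)
    return total


class ListHandler(logging.Handler):
    """Collects log records in memory so they can be counted."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


def count_info_messages(**download_kwargs):
    """Run a 10-chunk download and return how many records were logged."""
    handler = ListHandler()
    logger.addHandler(handler)
    try:
        download_in_chunks(1000, 100, **download_kwargs)
    finally:
        logger.removeHandler(handler)
    return len(handler.records)
```

With the default `checksum="md5"` the sketch logs one message per chunk; passing `checksum=None` logs none, which is the behavior this PR aims for on individual chunks.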

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #1358 🦕

conventional-commit-lint-gcf bot commented Oct 2, 2024

🤖 I detect that the PR title and the commit message differ and there's only one commit. To use the PR title for the commit history, you can use Github's automerge feature with squashing, or use automerge label. Good luck human!

-- conventional-commit-lint bot
https://conventionalcommits.org/

@product-auto-label product-auto-label bot added size: xs Pull request size is extra small. api: storage Issues related to the googleapis/python-storage API. labels Oct 2, 2024
@rafalh rafalh changed the title Fix checksum related INFO messages spamming the log when downloading … Fix checksum related INFO messages spamming the log when downloading using transfer_manager Oct 2, 2024
@rafalh rafalh changed the title Fix checksum related INFO messages spamming the log when downloading using transfer_manager fix: do not spam the log with checksum related INFO messages when downloading using transfer_manager Oct 2, 2024
@rafalh rafalh marked this pull request as ready for review October 2, 2024 10:44
@rafalh rafalh requested review from a team as code owners October 2, 2024 10:44
@cojenco cojenco added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Oct 4, 2024
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Oct 4, 2024

@cojenco cojenco left a comment

I think this is a reasonable patch to avoid passing the default checksum, md5, into the subsequent call to blob._prep_and_do_download. @andrewsg any thoughts or concerns about this change?

@rafalh Could you update the corresponding unit tests in https://github.com/googleapis/python-storage/blob/main/tests/unit/test_transfer_manager.py?
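
A unit test for this kind of change might look roughly like the sketch below. It does not reproduce the tests in test_transfer_manager.py: `download_chunks` is a hypothetical, simplified stand-in for the function under test, and `blob.download` is a made-up method on a `unittest.mock.Mock`, used only to record the keyword arguments each chunk receives.

```python
from unittest import mock


def download_chunks(blob, chunk_ranges, download_kwargs=None):
    """Hypothetical stand-in: download each (start, end) range from `blob`."""
    # Copy the caller's kwargs, then disable per-chunk checksum checking.
    kwargs = dict(download_kwargs or {})
    kwargs["checksum"] = None
    for start, end in chunk_ranges:
        blob.download(start=start, end=end, **kwargs)


def test_chunks_are_downloaded_without_checksum():
    blob = mock.Mock()  # records every call made on it

    download_chunks(blob, [(0, 99), (100, 199)], {"timeout": 60})

    assert blob.download.call_count == 2
    for call in blob.download.call_args_list:
        # Each chunk must opt out of checksum checking explicitly...
        assert call.kwargs["checksum"] is None
        # ...while caller-supplied kwargs are still passed through.
        assert call.kwargs["timeout"] == 60
```

Asserting on the recorded keyword arguments, rather than on log output, keeps the test independent of how the underlying download helper reports a missing checksum.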

andrewsg commented Oct 4, 2024

This makes sense to me. Thank you for your submission!

rafalh commented Oct 8, 2024

I fixed the tests and got them to pass locally (I ran only tests/unit/test_transfer_manager.py). I had to change download_chunks_concurrently so that it copies the download_kwargs dict argument before modifying it (which is a good approach anyway). Otherwise, some tests modified the DOWNLOAD_KWARGS global variable and caused other tests to fail.
I also removed the unused expected_download_kwargs variables in some tests.
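
The test interference mentioned here comes from mutating a dict that the caller still owns. A minimal sketch of the difference, assuming a module-level `DOWNLOAD_KWARGS` shared between tests (its value below is made up) and two hypothetical helpers, `prepare_mutating` and `prepare_copying`:

```python
# Shared module-level dict, as a test module might define it.
# The contents are an assumption for illustration only.
DOWNLOAD_KWARGS = {"timeout": 60}


def prepare_mutating(download_kwargs):
    """Hypothetical helper: writes into the caller's dict (the problem)."""
    download_kwargs["checksum"] = None
    return download_kwargs


def prepare_copying(download_kwargs):
    """Hypothetical helper: works on a shallow copy (the fix)."""
    download_kwargs = dict(download_kwargs)
    download_kwargs["checksum"] = None
    return download_kwargs


# The copying variant leaves the shared dict untouched.
prepared = prepare_copying(DOWNLOAD_KWARGS)
print(prepared)         # has both "timeout" and "checksum"
print(DOWNLOAD_KWARGS)  # still only "timeout"
```

A shallow copy is enough in this sketch because only a top-level key is added; nested values would still be shared with the caller.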

@cojenco cojenco added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Oct 9, 2024
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Oct 9, 2024
@cojenco cojenco added the owlbot:run Add this label to trigger the Owlbot post processor. label Oct 9, 2024
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Oct 9, 2024

@cojenco cojenco left a comment

LGTM, thank you for working on this!

@cojenco cojenco merged commit 42392ef into googleapis:main Oct 9, 2024
14 checks passed

Successfully merging this pull request may close these issues.

download_chunks_concurrently spams the log with INFO messages about checksum not being returned
4 participants