
Mark tests as proprietary #544

Open · mivermeu wants to merge 3 commits into master

Conversation

mivermeu (Contributor)
This PR aims to resolve #543 by adding markers to any test that requires non-public data.

  1. A new pytest marker, "proprietary", was added and applied to the relevant tests. If all tests in a TestCase class would be marked, the class itself was marked instead; if all tests in a module would be marked, the module as a whole was marked with pytestmark = pytest.mark.proprietary (see the sketch below).
  2. Proprietary test files that were loaded on module import are now loaded in class methods or pytest fixtures instead.
  3. CI/CD pipelines were updated with the -m 'proprietary or not proprietary' option.
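
As a rough sketch of what changes 1 and 2 look like in practice (test and file names below are illustrative, not taken from the PR):

```python
import pytest

# The marker would be registered once (e.g. in pytest.ini or conftest.py)
# so pytest doesn't warn about an unknown mark:
#
#     [pytest]
#     markers =
#         proprietary: test requires access to non-public data

# Change 1a: mark an individual test.
@pytest.mark.proprietary
def test_private_dataset():
    ...

# Change 1b: if every method in a TestCase class needs the data,
# mark the class once instead.
@pytest.mark.proprietary
class TestPrivatePhantom:
    def test_analysis(self):
        ...

# Change 1c: if every test in the module needs the data, a single
# module-level assignment replaces all per-test marks:
#
#     pytestmark = pytest.mark.proprietary

# Change 2: rather than loading a proprietary file at import time
# (which breaks collection for anyone without bucket access), defer
# the load to a fixture that only runs when a selected test asks for it.
@pytest.fixture
def private_phantom_path():
    return "tests_basic/data/private_phantom.dcm"  # hypothetical path
```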

jrkerns (Owner) commented Feb 11, 2025

Thanks for starting this. I think we can avoid a lot of code changes by remembering that pytest runs all tests by default, so adding -m 'proprietary or not proprietary' plus the addopts everywhere is unnecessary. We run these tests all the time locally and in our CI/CD, so defaulting to not proprietary puts the onus on every RadMachine developer to remember to always add -m 'proprietary or not proprietary', which we will definitely forget 😉. Alternatively, you could add a mark only to the public datasets, say @pytest.mark.public, which are pretty small right now (I think just the demos), and run pytest -m public. That would be just a few lines of code.
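
A minimal sketch of that alternative (the test name here is illustrative, not an actual test in the suite):

```python
import pytest

# Hypothetical: tag only the (currently small) set of tests that use
# public demo data, then select just those with `pytest -m public`.
@pytest.mark.public
def test_catphan_demo():
    ...
```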

I appreciate the start here. One project I've never gotten around to is a public dataset site/archive, something like the NIH Imaging Data Commons but with a much smaller scope focused on QA. Although I'd like to make a nice site with filtering, etc., a pragmatic start would be a spreadsheet that could later be converted. Google Forms could be used here for entering relevant information such as phantom type. Anonymization would have to be thought about, although at the beginning we could simply place that on the uploader and later perform a best-effort anonymization and/or hash the uploader, clinic, etc. I can make a stub for that and create a new issue to discuss it. I've been looking for an excuse to use Anvil. Integrating that with the public tests here would be quite nice!

mivermeu (Contributor, Author) commented Feb 12, 2025

Thanks for the feedback! I think you make a good point about omitting the -m 'proprietary or not proprietary' and addopts changes. It's better to run into some failed tests due to access permissions than to think everything is fine because half of the tests didn't run at all.

I'd still advocate for a proprietary marker over a public one, since it's the tests that use proprietary data that need attention; this way we have a clear indicator of which test data is still non-public. Besides, by pytest's count there are currently 4977 tests (impressive!), 3441 of which use non-public data and 1536 of which do not, so a public marker would still mean a lot of markers.
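
With the proprietary marker in place, anyone without access to the private data can deselect those tests using pytest's standard -m option:

```bash
pytest -m "not proprietary"
```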

I've added a line to the Contributing page of the documentation warning people of the proprietary test data. Feel free to change anything of course.

> One project I've never gotten around to is a public dataset site/archive, something like the NIH Imaging Data Commons but with a much smaller scope focused on QA.

Anvil looks like an appropriate tool for the job and I'd be interested to see a start of this project. I think this is a great idea!

Development

Successfully merging this pull request may close these issues.

Unit tests fail without Google storage bucket access