
Mark tests as proprietary #544

Open · mivermeu wants to merge 3 commits into master

Conversation

mivermeu (Contributor)
This PR aims to resolve #543 by adding markers to any test that requires non-public data.

  1. A new pytest marker, "proprietary", was added and applied to the relevant tests. If all tests in a TestCase class would be marked, the class itself was marked instead; if all tests in a module would be marked, the module as a whole was marked with pytestmark = pytest.mark.proprietary (see the sketch below).
  2. Proprietary test files that were loaded on module import are now loaded in class methods or pytest fixtures instead.
  3. CI/CD pipelines were updated with the -m 'proprietary or not proprietary' option.
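
As a rough sketch of what changes 1 and 2 look like in practice (test and file names below are illustrative, not taken from the PR):

```python
import pytest

# The marker would be registered once (e.g. in pytest.ini or conftest.py)
# so pytest doesn't warn about an unknown mark:
#
#     [pytest]
#     markers =
#         proprietary: test requires access to non-public data

# Change 1a: mark an individual test.
@pytest.mark.proprietary
def test_private_dataset():
    ...

# Change 1b: if every method in a TestCase class needs the data,
# mark the class once instead.
@pytest.mark.proprietary
class TestPrivatePhantom:
    def test_analysis(self):
        ...

# Change 1c: if every test in the module needs the data, a single
# module-level assignment replaces all per-test marks:
#
#     pytestmark = pytest.mark.proprietary

# Change 2: rather than loading a proprietary file at import time
# (which breaks collection for anyone without bucket access), defer
# the load to a fixture that only runs when a selected test asks for it.
@pytest.fixture
def private_phantom_path():
    return "tests_basic/data/private_phantom.dcm"  # hypothetical path
```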

jrkerns (Owner) commented Feb 11, 2025

Thanks for starting this. I think we can avoid a lot of code changes by remembering that pytest runs all tests by default, so adding -m 'proprietary or not proprietary' plus the addopts everywhere is unnecessary. We run these tests all the time locally and in our CI/CD, so defaulting to not proprietary puts the onus on every RadMachine developer to remember to always add -m 'proprietary or not proprietary', which we will definitely forget 😉. Alternatively, you could add a mark only to the public datasets, say @pytest.mark.public, which are pretty small right now (I think just the demos), and run pytest -m public. That would be just a few lines of code.
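
A minimal sketch of that alternative (the test name here is illustrative, not an actual test in the suite):

```python
import pytest

# Hypothetical: tag only the (currently small) set of tests that use
# public demo data, then select just those with `pytest -m public`.
@pytest.mark.public
def test_catphan_demo():
    ...
```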

I appreciate the start here. One project I've never gotten around to is a public dataset site/archive, something like the NIH Imaging Data Commons but with a much smaller scope focused on QA. Although I'd like to make a nice site with filtering, etc., a pragmatic start would be a spreadsheet that could later be converted. Google Forms could be used here for entering relevant information such as phantom type. Anonymization would have to be thought about, although at the beginning we could simply place that on the uploader and later perform a best-effort anonymization and/or hash the uploader, clinic, etc. I can make a stub for that and create a new issue to discuss it. I've been looking for an excuse to use Anvil. Integrating that with the public tests here would be quite nice!

mivermeu (Contributor, Author) commented Feb 12, 2025

Thanks for the feedback! I think you make a good point about omitting the -m 'proprietary or not proprietary' and addopts changes. It's better to run into some failed tests due to access permissions than to think everything is fine because half of the tests didn't run at all.

I'd still advocate for a proprietary marker over a public one, since it's the tests that use proprietary data that need attention; this way we have a clear indicator of which test data is still non-public. Besides, by pytest's count there are currently 4977 tests (impressive!), 3441 of which use non-public data and 1536 of which do not, so a public marker would still mean a lot of markers.
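
With the proprietary marker in place, anyone without access to the private data can deselect those tests using pytest's standard -m option:

```bash
pytest -m "not proprietary"
```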

I've added a line to the Contributing page of the documentation warning people of the proprietary test data. Feel free to change anything of course.

> One project I've never gotten around to is a public dataset site/archive, something like the NIH Imaging Data Commons but with a much smaller scope focused on QA.

Anvil looks like an appropriate tool for the job and I'd be interested to see a start of this project. I think this is a great idea!

Development

Successfully merging this pull request may close these issues.

Unit tests fail without Google storage bucket access