Unify functionality of super and hyperrun #871

Merged: 16 commits into master from unify_super_hyperrun, Aug 22, 2024

@dachengx (Collaborator) commented Aug 19, 2024

What is the problem / what does the code in this PR do

Caveat: the saved superrun can differ between _combining_subruns = True and _combining_subruns = False when the subruns combined under _combining_subruns = False have a "long range force", as in the example in test_only_combining_superruns. This is not usually the case in real data processing, but users need to be clear about what they are doing.
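
To illustrate the caveat, here is a toy sketch in plain Python, not strax code. It assumes that "long range force" means an algorithm whose output for one entry depends on neighbouring data that may sit in another subrun; that reading, the merge_close helper, and the numbers are all made up for illustration.

```python
def merge_close(times, max_gap):
    """Group sorted times; neighbours at most max_gap apart share a group."""
    groups = []
    for t in sorted(times):
        if groups and t - groups[-1][-1] <= max_gap:
            groups[-1].append(t)
        else:
            groups.append([t])
    return groups


# Two hypothetical subruns; the last entry of A is close to the first of B
subrun_a = [0, 10, 95]
subrun_b = [100, 200]

# Analogue of _combining_subruns=True: process each subrun, then combine
per_subrun = merge_close(subrun_a, 10) + merge_close(subrun_b, 10)

# Analogue of _combining_subruns=False: merge the inputs, then process
combined_first = merge_close(subrun_a + subrun_b, 10)

print(per_subrun)
print(combined_first)
```

In this toy case the two orders disagree only because 95 and 100 are grouped across the subrun boundary; without such cross-boundary dependence both orders give the same groups.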

Sorry for the back-and-forth. After this PR:

  1. When run_id starts with _ and _combining_subruns is True in get_iter, the context makes the targets for each subrun and combines them according to the run metadata.
  2. When run_id starts with _ and _combining_subruns is False in get_iter, the context makes the subruns' targets until the dependency tree starts to support superruns (allow_superrun is True). The first plugin whose allow_superrun is True merges the depends_on of the subruns according to the run metadata, and all subsequent processing is done in the scope of the superrun.
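
The two cases above can be sketched with a small toy model. This is not the code in get_components: ToyPlugin, TOY_PLUGINS and toy_plan are hypothetical names, and the three-plugin chain is an assumption that loosely mirrors the test in this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyPlugin:
    depends_on: tuple
    allow_superrun: bool


# Hypothetical dependency chain (assumption, not the real plugin registry)
TOY_PLUGINS = {
    "records": ToyPlugin(depends_on=(), allow_superrun=False),
    "peaks": ToyPlugin(depends_on=("records",), allow_superrun=True),
    "peak_classification": ToyPlugin(depends_on=("peaks",), allow_superrun=True),
}


def toy_plan(run_id, target, combining_subruns=False, plugins=TOY_PLUGINS):
    """Return (loaded_from_subruns, computed_in_superrun_scope)."""
    if not run_id.startswith("_"):
        raise ValueError("toy model only covers superruns (run_id starts with '_')")
    if combining_subruns:
        # Case 1: the target is made per subrun and only combined here
        return {target}, set()
    # Case 2: walk down from the target until allow_superrun is False
    loaded, computed = set(), set()
    stack = [target]
    while stack:
        name = stack.pop()
        if plugins[name].allow_superrun:
            computed.add(name)
            stack.extend(plugins[name].depends_on)
        else:
            loaded.add(name)
    return loaded, computed


loaded, computed = toy_plan("_superrun", "peak_classification")
loaded_combining, computed_combining = toy_plan(
    "_superrun", "peak_classification", combining_subruns=True
)
```

In the toy model, records is the only data type loaded from the subruns in case 2, while in case 1 only the target itself is loaded.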

To achieve this, the main change is that superruns, and especially their subruns, are now handled in Plugin rather than in Rechunker.

Also, completely deprecate storage_converter.

Can you briefly describe how it works?

Change the logic in get_components.

Can you give a minimal working example (or illustrate with a figure)?

The most important test might be test_only_combining_superruns and test_loaders_and_savers in TestSuperRuns. Please look at them.

components = self.context.get_components(self.superrun_name, "peak_classification")
# records is loaded because its plugin does not allow_superrun
assert "records" in components.loaders
# peaks is not loaded: although the target is peak_classification,
# peaks already allows superruns (allow_superrun is True)
assert "peaks" not in components.loaders
# peaks and lone_hits should both be saved
assert "peaks" in components.savers
assert "lone_hits" in components.savers
# and of course peak_classification should be saved
assert "peak_classification" in components.savers

# With _combining_subruns=True, the targeted data_type is first made
# for each subrun individually, and the subruns are then combined.
components = self.context.get_components(
    self.superrun_name, "peak_classification", _combining_subruns=True
)
assert len(components.loaders) == 1
assert "peak_classification" in components.loaders

When _combining_subruns is True, context will process each subrun to the targeted data_type, and then combine subruns.

Please include the following if applicable:

  • Update the docstring(s)
  • Update the documentation
  • Tests to check the (new) code is working as desired.
  • Does it solve one of the open issues on github?

Please make sure that all automated tests have passed before asking for a review (you can save the PR as a draft otherwise).

@coveralls commented Aug 22, 2024

Coverage Status

coverage: 90.283% (+0.5%) from 89.802%
when pulling 35662fb on unify_super_hyperrun
into 95f8ca2 on master.

@dachengx dachengx marked this pull request as ready for review August 22, 2024 02:19
@dachengx dachengx merged commit ac232cc into master Aug 22, 2024
8 checks passed
@dachengx dachengx deleted the unify_super_hyperrun branch August 22, 2024 07:32
2 participants