Implement Hyperrun #838

Merged
merged 7 commits into from
May 14, 2024

dachengx
Collaborator

@dachengx dachengx commented May 2, 2024

What is the problem / what does the code in this PR do

Hyperruns. Hyperruns are a kind of superrun: processing a hyperrun also depends on run_metadata, defined in the same way as for superruns (see https://straxen.readthedocs.io/en/latest/tutorials/SuperrunsExample.html#Define-a-superrun:).

If a hyperrun has run_id __000000, the plugin.run_id of the plugins used in processing will be 000000, and the subruns of the hyperrun will be mixed (concatenated) during processing.
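
A minimal sketch of the naming rule and the mixing step described above. plugin_run_id and mix_subruns are hypothetical helpers, not strax functions. The only rule assumed is the one in the text: the hyperrun id __000000 maps to the plugin run_id 000000, read here as stripping a leading "__". The subruns are assumed to be passed in time order.

```python
import numpy as np
from typing import List

# Assumption: a hyperrun id carries a leading "__", as in the "__000000" example.
HYPERRUN_PREFIX = "__"


def plugin_run_id(run_id: str) -> str:
    """Return the run_id the processing plugins would see (hypothetical helper)."""
    if run_id.startswith(HYPERRUN_PREFIX):
        # "__000000" -> "000000": strip the hyperrun prefix
        return run_id[len(HYPERRUN_PREFIX):]
    # Any other id is returned unchanged in this sketch
    return run_id


def mix_subruns(subruns: List[np.ndarray]) -> np.ndarray:
    """Concatenate the subruns' data (assumed already in time order) into one array."""
    return np.concatenate(subruns)


print(plugin_run_id("__000000"))  # 000000
print(mix_subruns([np.arange(3), np.arange(3, 6)]))  # [0 1 2 3 4 5]
```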

A new attribute, allow_hyperrun, is added to the strax.Plugin class. A data_type that depends on a data_type whose allow_hyperrun is True must itself have allow_hyperrun set to True. In other words: if your parent is True, you must be True as well.
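
The inheritance rule can be expressed as a simple consistency check over the dependency graph. The sketch below is hypothetical and is not the check strax performs internally. It assumes plain dictionaries: one mapping each data_type to the data_types it depends on, and one mapping each data_type to its allow_hyperrun flag. The data_type names are borrowed from the test described below.

```python
from typing import Dict, List


def check_allow_hyperrun(
    depends_on: Dict[str, List[str]], allow_hyperrun: Dict[str, bool]
) -> None:
    """Raise if a data_type disallows hyperruns while one of its parents allows them.

    Hypothetical helper: a parent with allow_hyperrun=False places no constraint
    on its children; a parent with allow_hyperrun=True forces its children to True.
    """
    for data_type, parents in depends_on.items():
        for parent in parents:
            if allow_hyperrun[parent] and not allow_hyperrun[data_type]:
                raise ValueError(
                    f"{data_type!r} depends on {parent!r}, which allows hyperruns, "
                    f"so {data_type!r} must also set allow_hyperrun = True"
                )


# Consistent: both the parent and the child allow hyperruns.
check_allow_hyperrun({"ranges": [], "sum": ["ranges"]}, {"ranges": True, "sum": True})
```

Calling it with {"ranges": True, "sum": False} raises a ValueError, because sum depends on ranges.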

Can you briefly describe how it works?

The key feature of a hyperrun is that the chunks of its subruns can be loaded together.

For superruns, the subruns are still made and loaded separately. For a hyperrun, the combination of subruns really is treated as a single run during processing.

For example, in the added test, the data_type sum depends on ranges. The "data" in ranges is just a sequence of consecutive numbers with an offset, like [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] or [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]. sum is computed from the "data" in ranges as ranges["data"] + ranges["data"][::-1], so you get [9] * 10 or [29] * 10, respectively.
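
The per-run computation can be reproduced with NumPy. This is a sketch, not the test's actual plugin code. It assumes ranges is a structured array with an integer "data" field, and make_ranges is a hypothetical helper that builds the offset sequence.

```python
import numpy as np


def make_ranges(offset: int, n: int = 10) -> np.ndarray:
    """Hypothetical helper: n consecutive numbers starting at offset, in a 'data' field."""
    ranges = np.zeros(n, dtype=[("data", np.int64)])
    ranges["data"] = np.arange(n) + offset
    return ranges


def compute_sum(ranges: np.ndarray) -> np.ndarray:
    """Element-wise sum of 'data' with its reverse, as the sum data_type is described."""
    return ranges["data"] + ranges["data"][::-1]


print(compute_sum(make_ranges(0)))   # [9 9 9 9 9 9 9 9 9 9]
print(compute_sum(make_ranges(10)))  # [29 29 29 29 29 29 29 29 29 29]
```

Every element is the same because the i-th and (n-1-i)-th entries of a consecutive sequence always add up to first + last.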

First, note that the test has 3 runs. The "data" in the ranges of the 3 subruns are [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [10, 11, 12, 13, 14, 15, 16, 17, 18, 19], and [20, 21, 22, 23, 24, 25, 26, 27, 28, 29].

  1. If the loaded sum is from a superrun, the sum will be [9] * 10 + [29] * 10 + [49] * 10, because the subruns are still computed separately.
  2. If the loaded sum is from a hyperrun, the sum will be [29] * 30, because the subruns are first combined into a single "data" array of length 30, and only then is ranges["data"] + ranges["data"][::-1] evaluated. So the subruns really are loaded as if they were the same run.
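
The two cases above can be contrasted in a few lines of NumPy. This is an illustration of the arithmetic only, not of how strax loads data. Plain arrays stand in for the subruns' "data", and the difference is simply whether the computation runs before or after concatenation.

```python
import numpy as np

# "data" of the three subruns: 10 consecutive numbers each, offsets 0, 10, 20.
subruns = [np.arange(10) + 10 * i for i in range(3)]


def compute_sum(data: np.ndarray) -> np.ndarray:
    return data + data[::-1]


# Superrun: each subrun is computed separately, then the results are concatenated.
superrun_sum = np.concatenate([compute_sum(d) for d in subruns])

# Hyperrun: the subruns are concatenated first, then computed as one run of length 30.
hyperrun_sum = compute_sum(np.concatenate(subruns))
```

superrun_sum equals [9] * 10 + [29] * 10 + [49] * 10, while hyperrun_sum equals [29] * 30.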

Can you give a minimal working example (or illustrate with a figure)?

The example is in the added test.

Please include the following if applicable:

  • Update the docstring(s)
  • Update the documentation
  • Tests to check the (new) code is working as desired.
  • Does it solve one of the open issues on GitHub?

Please make sure that all automated tests have passed before asking for a review (you can save the PR as a draft otherwise).

@dachengx dachengx marked this pull request as ready for review May 2, 2024 20:50
@dachengx dachengx marked this pull request as draft May 2, 2024 21:01
@coveralls

coveralls commented May 3, 2024

coverage: 90.719% (-0.5%) from 91.17% when pulling a007517 on hyperrun into 954bc39 on master.

@dachengx dachengx marked this pull request as ready for review May 3, 2024 04:12
@yuema137 yuema137 self-requested a review May 7, 2024 22:41
strax/chunk.py (outdated)
@yuema137
Collaborator

yuema137 commented May 7, 2024

Hi @dachengx, this PR looks good to me, but I'm a little bit confused:

  • Why do we add hyperrun on top of superrun? It seems to me that their duties largely overlap, and the restrictions and checks for hyperrun could be applied to superrun as well.
    It would be great if you could provide a scenario where this feature is needed.

@dachengx
Collaborator Author

dachengx commented May 8, 2024

Sorry, this PR is actually not fully ready yet. I will write a test module for it, and the test will show how hyperruns and superruns differ.

@dachengx
Collaborator Author

@yuema137 the test has been added.

@yuema137 yuema137 self-requested a review May 14, 2024 02:53
Collaborator

@yuema137 yuema137 left a comment

Hi @dachengx, I read the changes carefully, and now I think I understand this much better:

  • In get_components, hyperruns are treated like normal runs, which means there is only a single loader. For superruns, a loader is defined for each subrun.
  • The difference is that for hyperruns, additional checks guarantee that the targets to load satisfy the hyperrun requirements.
  • Therefore, from the point of view of strax/straxen, a hyperrun is exactly equivalent to a single run. So when you use exhaust plugins, the processor regards several runs as a single one and gives you the whole chunk. For normal plugins, there should be no difference.

The test works fine for me. Please merge it if my understanding is correct. Otherwise please let me know.

@dachengx dachengx merged commit d66cd35 into master May 14, 2024
8 of 9 checks passed
@dachengx dachengx deleted the hyperrun branch May 14, 2024 05:27
dachengx added a commit to XENONnT/straxen that referenced this pull request May 16, 2024
@dachengx dachengx mentioned this pull request May 16, 2024
4 tasks
@zihaoxu98

Thanks for this hyper-interesting PR. Although I cannot follow all of the code, my understanding from the test case is that we always need an ExhaustPlugin when we want to concatenate the chunks and do the computation, right? I am asking because I found that the documentation of ExhaustPlugin is missing, and the test gives the wrong result if I change it to a normal Plugin.

@yuema137
Collaborator

@zihaoxu98 The purpose of Hyperrun is to let strax regard the runlist as a single run (this is not the case for superruns, where strax still treats the subruns as separate runs). It doesn't make observable changes for normal plugins because the data is still chunked, and the computation is done chunk by chunk.
Only with ExhaustPlugin, which tries to combine whatever chunks it can find within a single run, is the computation performed on the combined big chunk as we expect. So basically, ExhaustPlugin merges all possible chunks within a single run, and Hyperrun breaks the boundary between runs.
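
The distinction between chunk-by-chunk processing and exhaust-style processing can be mimicked with plain functions. This is a hypothetical model, not the strax Plugin or ExhaustPlugin API. run_normal_plugin and run_exhaust_plugin are illustrative names, and each subrun is assumed to arrive as one chunk inside the hyperrun.

```python
import numpy as np
from typing import Callable, List


def compute_sum(data: np.ndarray) -> np.ndarray:
    return data + data[::-1]


def run_normal_plugin(chunks: List[np.ndarray], compute: Callable) -> np.ndarray:
    """Chunk-by-chunk processing: compute never sees more than one chunk at a time."""
    return np.concatenate([compute(chunk) for chunk in chunks])


def run_exhaust_plugin(chunks: List[np.ndarray], compute: Callable) -> np.ndarray:
    """Exhaust-style processing: merge every chunk of the (hyper)run, then compute once."""
    return compute(np.concatenate(chunks))


# Assumption: inside the hyperrun, each subrun arrives as one chunk.
hyperrun_chunks = [np.arange(10) + 10 * i for i in range(3)]
normal_result = run_normal_plugin(hyperrun_chunks, compute_sum)
exhaust_result = run_exhaust_plugin(hyperrun_chunks, compute_sum)
```

With these assumptions, normal_result still equals [9] * 10 + [29] * 10 + [49] * 10, and only exhaust_result gives [29] * 30. This matches the observation that swapping the ExhaustPlugin for a normal Plugin changes the test result.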
