Include `chunk_number` in lineage: Per chunk storage
#863
@FaroutYLq no hurry, I need to test it further. Maybe you are able to hire more people to review.
Maybe we can add the hash (maybe sha256) of metadata of
Hi, thanks for the effort. I need a bit more time to digest this, but let me start by asking some dumb questions:
- Do we have any plugin that is supposed to have no chunks at all (only metadata) when computation is finished? I vaguely remember seeing something like this before, but I am not sure whether it was just because it failed somewhere. Unfortunately, I don't have an example on hand.
- If so, will this PR break things?
- In this PR, is it expected to have different hashes for different chunks in the same plugin?
I am happy to review this and hope to learn more about the core, but I want to figure out these questions first before diving into it.
Is this feature really needed? I have the feeling it will cause us more trouble than it gains us. Can you give a few use cases why this feature is required?
It is needed in reprocessing when we want to process a run but do not want to wait for all chunks to be processed in sequence. A similar homemade feature already exists in outsource, but according to @FaroutYLq, it is not very compatible with strax. So this feature is needed.
Hey, @WenzDaniel. Do you agree with this PR now? We can also inspect it in detail together. You can also list your concerns below in the conversation. Thanks!
What is the problem / what does the code in this PR do
The `chunk_number` was passed to `Context.get_iter` to load only a specific chunk. But the previous implementation has weaknesses:
- `chunk_number` is not tracked by the lineage, so if you process twice with a different `chunk_number`, the lineage is the same but the results are not.
- There is no way to choose which `data_type` to load by chunk number.
This PR makes sure that `chunk_number` is tracked by the lineage, and lets you assign which chunks to load for which `data_type`.
This PR is designed following this plan: https://xe1t-wiki.lngs.infn.it/doku.php?id=xenon:xenonnt:analysis:analysis_tools_team:sr2_processing
We will later make outsource more compatible with strax via this PR.
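To illustrate why tracking `chunk_number` in the lineage matters, here is a minimal sketch in plain Python. It does not use strax: `lineage_hash`, the plugin name, the version string, and the `threshold` option are all hypothetical stand-ins, and the only assumption is that a lineage is a JSON-serializable mapping that gets hashed into a storage key.

```python
import hashlib
import json


def lineage_hash(lineage: dict) -> str:
    """Hash a JSON-serializable lineage into a short key (hypothetical helper, not strax API)."""
    payload = json.dumps(lineage, sort_keys=True).encode()
    return hashlib.sha1(payload).hexdigest()[:10]


# Hypothetical lineage: plugin name, version, and configuration per data_type.
base = {"peaklets": ["Peaklets", "1.0.0", {"threshold": 5}]}

# Without chunk_number in the lineage, two runs that processed different
# chunks end up with the same key, although their stored data differ.
key_chunk0 = lineage_hash(base)
key_chunk1 = lineage_hash(base)

# With chunk_number stored as a configuration, the keys are distinct.
with_chunk0 = {"peaklets": ["Peaklets", "1.0.0", {"threshold": 5, "chunk_number": [0]}]}
with_chunk1 = {"peaklets": ["Peaklets", "1.0.0", {"threshold": 5, "chunk_number": [1]}]}
key_with_chunk0 = lineage_hash(with_chunk0)
key_with_chunk1 = lineage_hash(with_chunk1)
```

In this sketch the first pair of keys collides and the second pair does not, which is the behaviour the weakness above describes and the change is meant to fix.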
Depends on #856
Can you briefly describe how it works?
- `Context.__add_lineage_to_plugin` adds `chunk_number` as a configuration of `Plugin`.
- `chunk_number` is passed to functions as the argument `chunk_number: ty.Optional[ty.Dict[str, ty.List[int]]] = None`, e.g. it can be `{"raw_records": [0, 1]}` or `{"peaklets": [1], "lone_hits": [0]}`. The latter example shows the functionality this PR adds.
- `Context.merge_per_chunk_storage` combines the per chunk storage into normal storage (where `chunk_number` is not a configuration of `Plugin`).
Functionality not implemented:
- `chunk_number` will not be passed to run selection, because we will not send the metadata including `chunk_number` to the DB.
Vulnerability of this PR:
Can you give a minimal working example (or illustrate with a figure)?
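The following is only a rough sketch in plain Python, not the PR author's example and not strax code. It mimics the `chunk_number` argument type described in this PR and the idea of merging per chunk results back into one ordered result. The helpers `chunks_to_load` and `merge_per_chunk` are hypothetical, and the rule that a `data_type` absent from the dictionary loads all chunks is an assumption made for illustration.

```python
import typing as ty

ChunkNumber = ty.Optional[ty.Dict[str, ty.List[int]]]


def chunks_to_load(
    data_type: str, n_chunks: int, chunk_number: ChunkNumber = None
) -> ty.List[int]:
    """Return the chunk indices to load for data_type (hypothetical helper)."""
    # Assumption: no selection for this data_type means "load every chunk".
    if chunk_number is None or data_type not in chunk_number:
        return list(range(n_chunks))
    requested = sorted(set(chunk_number[data_type]))
    out_of_range = [i for i in requested if not 0 <= i < n_chunks]
    if out_of_range:
        raise ValueError(f"{data_type}: chunks {out_of_range} do not exist")
    return requested


def merge_per_chunk(per_chunk: ty.Dict[int, ty.List[int]], n_chunks: int) -> ty.List[int]:
    """Concatenate per chunk results in chunk order; every chunk must be present."""
    missing = [i for i in range(n_chunks) if i not in per_chunk]
    if missing:
        raise ValueError(f"cannot merge, missing chunks: {missing}")
    merged: ty.List[int] = []
    for i in range(n_chunks):
        merged.extend(per_chunk[i])
    return merged


selection = {"peaklets": [1], "lone_hits": [0]}
peaklet_chunks = chunks_to_load("peaklets", 3, selection)
raw_record_chunks = chunks_to_load("raw_records", 3, selection)
# Chunks processed out of order are merged back in chunk order.
merged = merge_per_chunk({1: [30, 40], 0: [10, 20]}, n_chunks=2)
```

Refusing to merge when a chunk is missing is a design choice of this sketch: it keeps a partially processed run from being mistaken for complete normal storage.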
Please include the following if applicable:
Please make sure that all automated tests have passed before asking for a review (you can save the PR as a draft otherwise).