
Simpler chunk length check, avoid recursion limit crash #803

Merged: 2 commits merged into AxFoundation:master from JelleAalbers:fix_check_chunk on Feb 11, 2024

Conversation

JelleAalbers (Member)

What is the problem / what does the code in this PR do

This fixes a crash that @cfuselli reliably sees after restrax has been running for some time.

Can you briefly describe how it works?

In #758 we made StorageBackend check that the chunk length matches the length specified in the metadata, by wrapping the class attribute _read_chunk inside __new__. But __new__ runs every time we make a new instance. Applications that create backends many times (like a running restrax session) therefore pile up many layers of wrapping, and eventually hit Python's recursion limit.
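
A minimal sketch of that failure mode (Backend and check_chunk_n here are toy stand-ins, not the actual strax code): every instantiation re-wraps the class attribute, so a single _read_chunk call eventually has to unwind more call layers than Python allows.

```python
import sys


def check_chunk_n(f):
    """Toy stand-in for the length-check decorator."""
    def wrapped(*args, **kwargs):
        # The real decorator would compare the result's length to the metadata.
        return f(*args, **kwargs)
    return wrapped


class Backend:
    """Toy stand-in for a strax StorageBackend subclass."""

    def __new__(cls, *args, **kwargs):
        # Runs on every instantiation, so the class attribute is wrapped
        # again each time: one extra call layer per instance created.
        if "_read_chunk" in cls.__dict__:
            cls._read_chunk = check_chunk_n(getattr(cls, "_read_chunk"))
        return super().__new__(cls)

    def _read_chunk(self):
        return "data"


for _ in range(sys.getrecursionlimit()):
    Backend()

Backend()._read_chunk()  # RecursionError: maximum recursion depth exceeded
```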

This keeps the check (and its unit test) but implements it more simply. Since _read_chunk is only used in StorageBackend._read_and_format_chunk, we can just add the check there. It will still apply to every storage backend.
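
Roughly, the new placement looks like this (an illustrative sketch with simplified names and signatures, not the exact diff in this PR):

```python
import numpy as np


class StorageBackend:
    """Simplified stand-in for strax's StorageBackend."""

    def _read_and_format_chunk(self, chunk_info, dtype, compressor):
        # Every backend's data already flows through this method, so a plain
        # length check here replaces the per-instance wrapping of _read_chunk.
        data = self._read_chunk(chunk_info, dtype, compressor)
        if len(data) != chunk_info["n"]:
            raise ValueError(
                f"Chunk has {len(data)} items, but the metadata "
                f"says it should have {chunk_info['n']}")
        # ... in strax, `data` would then be wrapped in a strax.Chunk ...
        return data


class DummyBackend(StorageBackend):
    def _read_chunk(self, chunk_info, dtype, compressor):
        return np.zeros(chunk_info["n"], dtype=dtype)


DummyBackend()._read_and_format_chunk({"n": 5}, np.int64, None)  # passes
```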

@dachengx please let me know if I missed a specific reason behind the implementation in #758!

@coveralls commented Feb 7, 2024

Coverage Status: 91.452% (-0.02%) from 91.476%, when pulling 4a0ef00 on JelleAalbers:fix_check_chunk into 2aa293a on AxFoundation:master.

```python
def __new__(cls, *args, **kwargs):
    """Mandatorily wrap _read_chunk in a check_chunk_n decorator."""
    if "_read_chunk" in cls.__dict__:
        method = getattr(cls, "_read_chunk")
```
@dachengx (Collaborator) commented on these lines, Feb 7, 2024:

Thanks @JelleAalbers for the improvement. I understand the request from Carlo and your new implementation. I implemented #758 to force-check the chunk size. Can we do something like hasattr(cls._read_chunk, '__closure__') and cls._read_chunk.__closure__ is not None here to test whether _read_chunk is decorated?
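
For illustration, here is a sketch of the guard that suggestion would add (check_chunk_n is again a hypothetical stand-in, not merged code): re-wrap only when _read_chunk has no closure yet.

```python
def check_chunk_n(f):
    """Toy stand-in for the length-check decorator."""
    def wrapped(*args, **kwargs):
        return f(*args, **kwargs)
    return wrapped


class Backend:
    def __new__(cls, *args, **kwargs):
        if "_read_chunk" in cls.__dict__:
            method = getattr(cls, "_read_chunk")
            # A closure-based wrapper carries a non-empty __closure__,
            # so an already-wrapped method is left alone.
            already_wrapped = (
                hasattr(method, "__closure__")
                and method.__closure__ is not None)
            if not already_wrapped:
                cls._read_chunk = check_chunk_n(method)
        return super().__new__(cls)

    def _read_chunk(self):
        return "data"


for _ in range(5000):
    Backend()

assert Backend()._read_chunk() == "data"  # no recursion blow-up
```

One caveat: __closure__ is also non-empty for any _read_chunk that happens to be defined inside another function, which is presumably why the linked StackOverflow answer discusses more robust ways to detect decoration.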

Or use some method like https://stackoverflow.com/a/68583930.

Because I wanted to make sure the chunks are really checked, I applied the wrapper at the lowest level, just before strax.load_file.

@JelleAalbers (Member, Author) replied:

Sure, I understand the check is important, and the storage class hierarchy is confusing enough that I see why you reached for decorators. There would be a way to fix the decorator solution with more metaprogramming, but at the risk of introducing more hard-to-spot bugs.

I think _read_and_format_chunk is the better place to do this check. The _read_chunk method returns a numpy array, not a strax Chunk: it is _read_and_format_chunk's job to turn that array into a chunk. Creating a chunk already triggers checks, e.g. whether the data matches the time range and whether the start and end times are integers. So it makes sense to check at the same point that the chunk length matches the metadata.

There is no code that calls _read_chunk directly -- at least not in the AxFoundation or XENONnT GitHub. The only reason I'd write something like that is to avoid all the checks that come with making a strax chunk, like when trying to read data whose metadata is corrupted. Then you don't want the chunk length check either.

@dachengx (Collaborator) commented Feb 8, 2024

@FaroutYLq did you check the chunk size manually in outsource but not using _read_chunk?

@FaroutYLq (Collaborator) replied:

> @FaroutYLq did you check the chunk size manually in outsource but not using _read_chunk?

No, I was using strax.load_file(files[i], compressor=compressor, dtype=dtype) for each existing file. Is it more recommended to use _read_chunk now?

@dachengx (Collaborator) replied:

> No, I was using strax.load_file(files[i], compressor=compressor, dtype=dtype) for each existing file. Is it more recommended to use _read_chunk now?

No, not really.

@dachengx merged commit d610f8a into AxFoundation:master on Feb 11, 2024. 9 checks passed.
@JelleAalbers deleted the fix_check_chunk branch on February 21, 2024.