Add subruns information in DataKey of superruns to track metadata #866

Merged (4 commits) on Aug 16, 2024

@dachengx dachengx commented Aug 11, 2024

What is the problem / what does the code in this PR do

The subruns of a superrun are not checked when a saved superrun is loaded after the superrun has been redefined. The MWE below gives an example.

Can you briefly describe how it works?

This PR does the following things:

  1. Add a subruns argument to DataKey, loaded from the metadata of the superrun. If the metadata is missing, an error is raised.
  2. Add Context.get_datakey so that the context can build the decorated DataKey. Replace every direct call to strax.DataKey in Context, including the one in Context.key_for, with Context.get_datakey.
  3. Include the deterministic hash of subruns in the run_id of DataKey, so that if everything is the same except the subruns, DataKey.__repr__ still changes.
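The idea in step 3 can be sketched in plain Python. This is only an illustration under stated assumptions, not the code of this PR: the helpers subruns_hash, decorated_run_id and key_repr are hypothetical names, and the SHA-1 plus base32 scheme is an assumed stand-in for strax's own deterministic hash. The subrun specification is assumed to be a JSON-serializable mapping keyed by subrun ID.

```python
import hashlib
import json
from base64 import b32encode


def subruns_hash(subruns, length=10):
    """Hash a subrun specification deterministically (hypothetical helper)."""
    # Sorting the keys makes the JSON text, and therefore the hash,
    # independent of the insertion order of the mapping.
    text = json.dumps(subruns, sort_keys=True)
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return b32encode(digest)[:length].decode("ascii").lower()


def decorated_run_id(run_id, subruns=None):
    """Append the subruns hash to run_id; ordinary runs are left unchanged."""
    if subruns is None:
        return run_id
    return f"{run_id}_{subruns_hash(subruns)}"


def key_repr(run_id, data_type, lineage_hash, subruns=None):
    """Build a 'run_id-data_type-lineage_hash' string like the keys shown in this PR."""
    return "-".join([decorated_run_id(run_id, subruns), data_type, lineage_hash])


# The values "all" are an assumption for illustration only.
full = {"0": "all", "1": "all", "2": "all"}
partial = {"0": "all", "1": "all"}
print(key_repr("_superrun_test", "records", "j3nd2fjbiq", full))
print(key_repr("_superrun_test", "records", "j3nd2fjbiq", partial))
```

In this sketch, redefining the superrun with fewer subruns changes only the run_id part of the key, so data saved under the old definition is no longer matched.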

We do not include subruns in the lineage because we want the lineage hash of a given data_type to stay the same for the subruns and the superrun, as in 0-records-j3nd2fjbiq and _superrun_test_uzafu3c2wk-records-j3nd2fjbiq.
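To make the point about the two keys concrete, the short sketch below splits each key string into its three parts and compares them. The split_key helper is hypothetical and assumes that the data_type and the lineage hash contain no hyphens; it only parses the strings quoted above and does not use the strax API.

```python
def split_key(key):
    """Split 'run_id-data_type-lineage_hash' from the right (hypothetical helper)."""
    run_id, data_type, lineage_hash = key.rsplit("-", 2)
    return run_id, data_type, lineage_hash


subrun_key = "0-records-j3nd2fjbiq"
superrun_key = "_superrun_test_uzafu3c2wk-records-j3nd2fjbiq"

sub_run_id, sub_type, sub_lineage = split_key(subrun_key)
super_run_id, super_type, super_lineage = split_key(superrun_key)

# The data_type and lineage hash agree; only the run_id part differs.
print(sub_type == super_type, sub_lineage == super_lineage)
print(sub_run_id, super_run_id)
```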

Can you give a minimal working example (or illustrate with a figure)?

#!/usr/bin/env python
# coding: utf-8

import json
import datetime
import pytz
from bson import json_util

import numpy as np

import strax
from strax.testutils import Records, Peaks
import straxen


superrun_name = "_superrun_test"

# Prepare for a context
st = strax.Context(
    storage=[
        strax.DataDirectory(
            "./strax_data", provide_run_metadata=True, readonly=False, deep_scan=True
        )
    ],
    register=[Records, Peaks],
    config={"bonus_area": 42},
)
st.set_context_config({"use_per_run_defaults": False, "write_superruns": True})


def _write_run_doc(context, run_id, time, endtime):
    """Function which writes a dummy run document."""
    run_doc = {"name": run_id, "start": time, "end": endtime}
    with open(context.storage[0]._run_meta_path(str(run_id)), "w") as fp:
        json.dump(run_doc, fp, sort_keys=True, indent=4, default=json_util.default)


# Define run metadata of each subrun
offset_between_subruns = 10
now = datetime.datetime.now()
# datetime.replace returns a new object, so the result has to be assigned
now = now.replace(tzinfo=pytz.utc)
subrun_ids = [str(r) for r in range(3)]
for run_id in subrun_ids:
    rr = st.get_array(run_id, "records")
    time = np.min(rr["time"])
    endtime = np.max(strax.endtime(rr))

    _write_run_doc(
        st,
        run_id,
        now + datetime.timedelta(0, int(time)),
        now + datetime.timedelta(0, int(endtime)),
    )

    st.set_config({"secret_time_offset": endtime + offset_between_subruns})  # untracked option
    assert st.is_stored(run_id, "records")

st.define_run(superrun_name, subrun_ids)
print(f"When subruns are: {list(st.run_metadata(superrun_name)['sub_run_spec'].keys())}:")
print(f"Old DataKey is {st.key_for(superrun_name, 'records')}")
st.make(superrun_name, "records")
assert st.is_stored(superrun_name, "records")

# But if you change the metadata definition...
st.define_run(superrun_name, subrun_ids[:-1])
print(f"When subruns are: {list(st.run_metadata(superrun_name)['sub_run_spec'].keys())}:")
print(f"New DataKey is {st.key_for(superrun_name, 'records')}")
assert not st.is_stored(superrun_name, "records")

The output will be

When subruns are: ['0', '1', '2']:
Old DataKey is _superrun_test_uzafu3c2wk-records-j3nd2fjbiq
When subruns are: ['0', '1']:
New DataKey is _superrun_test_zc44ingvpo-records-j3nd2fjbiq

Please include the following if applicable:

  • Update the docstring(s)
  • Update the documentation
  • Tests to check the (new) code is working as desired.
  • Does it solve one of the open issues on github?

Please make sure that all automated tests have passed before asking for a review (you can save the PR as a draft otherwise).

@dachengx dachengx marked this pull request as ready for review August 15, 2024 14:27
@dachengx

The error in the GitHub Actions run will go away once the straxen tests are updated.

@dachengx dachengx merged commit 6a17e37 into master Aug 16, 2024
6 of 8 checks passed
@dachengx dachengx deleted the check_superrun_metadata branch August 16, 2024 04:33