Develop best-practice for linking PosePipeline to analysis schemas #4

Open
peabody124 opened this issue Mar 24, 2021 · 0 comments
Labels: documentation (Improvements or additions to documentation), enhancement (New feature or request)

peabody124 (Collaborator) commented Mar 24, 2021

Right now the analysis pipeline is standalone, which is something of a strength. However, videos from experiments have their own organizational structure, which the pipeline should reflect and which would benefit from using DataJoint.

Approach 1) One option is to use the upcoming Deferred schemas feature (https://github.com/datajoint/datajoint-elements/blob/main/DesignPrinciples.md). In principle, this would allow creating a modular PosePipeline under each analysis schema, with foreign keys expressing all of the data links. The downside is that the computational framework would need to run on, and populate, each of these pipeline instances. Depending on the infrastructure, this might just mean adding tasks that check for a Video in each analysis schema and then process it.

Two other limitations, which might be more technical barriers to the datajoint-elements approach, are:

  • In my current analysis paradigm, the videos depend on different parent nodes (for example, videos collected from a cell phone versus from stationary cameras), even though I want them all to run through the same "modular" analysis pipeline. Would this require expanding the hierarchy twice? I suspect this can't even be done.
  • From the Deferred schemas document, it appears that the linking (i.e. parent) schema is determined at run time. However, the parent node name likely has to be fixed in the definition, so it would need to be forced to be the same across the different analysis pipelines.

Approach 2) The alternative, and what I'm currently doing, is to give Video a primary key of (project, filename), where each analysis pipeline uses at least one unique project name to isolate its videos. Each analysis pipeline also has a node that contains the filename, so the join

pose_pipeline.Video (filename, video_project) * analysis_pipeline.VideoLinkNode (filename, video_project)

connects the two. Overriding the key source in nodes downstream of VideoLinkNode to include the relevant join makes them populate the right data, e.g.:

@property
def key_source(self):
    # Populate only for link entries that have a matching pose_pipeline.ExposePerson entry
    return LinkNode & pose_pipeline.ExposePerson
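
To make the matching logic concrete, here is a minimal sketch that uses Python's built-in sqlite3 as a stand-in for MySQL/DataJoint. It is not PosePipeline code: the table layouts, the `subject` column, and the sample rows are assumptions made up for illustration. The natural join plays the role of DataJoint's `*`, and the `EXISTS` semijoin plays the role of the `&` restriction used in `key_source`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE video (
    video_project TEXT, filename TEXT,
    PRIMARY KEY (video_project, filename));
CREATE TABLE expose_person (
    video_project TEXT, filename TEXT,
    PRIMARY KEY (video_project, filename));
-- Hypothetical link table: matched to video by value only, no foreign key.
CREATE TABLE video_link_node (
    subject TEXT, video_project TEXT, filename TEXT,
    PRIMARY KEY (subject, video_project, filename));
""")
conn.executemany("INSERT INTO video VALUES (?, ?)",
                 [("gait", "a.mp4"), ("gait", "b.mp4"), ("other", "a.mp4")])
conn.executemany("INSERT INTO expose_person VALUES (?, ?)",
                 [("gait", "a.mp4"), ("other", "a.mp4")])
conn.executemany("INSERT INTO video_link_node VALUES (?, ?, ?)",
                 [("s1", "gait", "a.mp4"), ("s1", "gait", "b.mp4")])

# Analogue of Video * VideoLinkNode: join on the shared columns
# (video_project, filename).
joined = conn.execute("""
    SELECT subject, video_project, filename
    FROM video NATURAL JOIN video_link_node
    ORDER BY filename
""").fetchall()

# Analogue of LinkNode & ExposePerson: keep only link rows that have
# a matching pose result, without adding any columns.
key_source = conn.execute("""
    SELECT subject, video_project, filename
    FROM video_link_node AS l
    WHERE EXISTS (
        SELECT 1 FROM expose_person AS p
        WHERE p.video_project = l.video_project
          AND p.filename = l.filename)
""").fetchall()

print(joined)
print(key_source)
```

Only the link row whose video already has a pose result ends up in `key_source`; the video from the `other` project is ignored because no link row in this pipeline refers to it.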

The big downside is that deleting a VideoLinkNode entry doesn't delete the corresponding Video (and this actually blocks inserting it again if you delete it without also manually deleting the corresponding Videos), and if you repopulate the analysis on that Video, it won't repopulate correctly.
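
The failure mode can be reproduced in miniature. The sketch below again uses sqlite3 as a stand-in, with made-up tables; the `session` parent and `fk_video` child are hypothetical and exist only to contrast a value-matched link with a real foreign key. SQLite's `ON DELETE CASCADE` is used here to mimic a cascading delete and is an assumption of this sketch, not a description of how the actual schemas are declared.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
-- Approach 2 style: the two tables share key values but no foreign key.
CREATE TABLE video (
    video_project TEXT, filename TEXT,
    PRIMARY KEY (video_project, filename));
CREATE TABLE video_link_node (
    video_project TEXT, filename TEXT,
    PRIMARY KEY (video_project, filename));
-- Foreign-key style (hypothetical tables): the video depends on its parent.
CREATE TABLE session (session_id TEXT PRIMARY KEY);
CREATE TABLE fk_video (
    session_id TEXT REFERENCES session (session_id) ON DELETE CASCADE,
    filename TEXT,
    PRIMARY KEY (session_id, filename));
""")

# Value-matched link: deleting the link leaves the video behind.
conn.execute("INSERT INTO video VALUES ('gait', 'a.mp4')")
conn.execute("INSERT INTO video_link_node VALUES ('gait', 'a.mp4')")
conn.execute("DELETE FROM video_link_node")
orphaned_videos = conn.execute("SELECT COUNT(*) FROM video").fetchone()[0]

# Re-running the import then collides with the leftover primary key.
try:
    conn.execute("INSERT INTO video VALUES ('gait', 'a.mp4')")
    reinsert_blocked = False
except sqlite3.IntegrityError:
    reinsert_blocked = True

# Foreign-key link: deleting the parent removes the dependent video too.
conn.execute("INSERT INTO session VALUES ('s1')")
conn.execute("INSERT INTO fk_video VALUES ('s1', 'a.mp4')")
conn.execute("DELETE FROM session")
remaining_fk_videos = conn.execute("SELECT COUNT(*) FROM fk_video").fetchone()[0]

print(orphaned_videos, reinsert_blocked, remaining_fk_videos)
```

With the value-matched link, the orphaned row has to be cleaned up by hand before the video can be imported again; with the foreign key, the dependent row disappears along with its parent.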

Ideally, there would be a way to get the best of both worlds: to have the data-integrity benefits of foreign keys while letting some Videos point to one schema for their foreign keys and others point to a different one. This is still technically a DAG, but I don't think it can be done with MySQL.

@peabody124 added the documentation and enhancement labels Mar 24, 2021