Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keyword arguments for populate() #971

Merged
merged 15 commits into from
Feb 9, 2022

Conversation

Hendriela
Copy link
Contributor

This pull request implements optional keyword arguments in populate(), which are passed down to each make() call.

This implementation is tested and works with our database. It allows for custom arguments in make(), while not affecting normal make() methods without arguments.

It is for example useful for bool flags whether to save intermediate results during a pipeline step.

@dimitri-yatsenko
Copy link
Member

dimitri-yatsenko commented Nov 1, 2021

The idea of workflows is that they must be reproducible and that input data must be captured (data provenance). Additional custom inputs into make are not captured in the database.

Perhaps we could allow additional arguments but provide a mechanism for storing them in the database as well for complete provenance. Or we could add this feature but document the principles clearly and ensure that users understand that it's on them to ensure reproducibility and provenance.

@Hendriela
Copy link
Contributor Author

The idea of workflows is that they must be reproducible and that input data must be captured (data provenance). Additional custom inputs into make are not captured in the database.

Perhaps we could allow additional arguments but provide a mechanism for storing them in the database as well for complete provenance. Or we could add this feature but document the principles clearly and ensure that users understand that it's on them to ensure reproducibility and provenance.

That is true. This change would not be suitable to pass down arguments that affect the analysis itself.

We use it, as I said, for controlling peripheral processes like plotting intermediate results, or exporting the results in addition to inserting them in the table. Because these things do not influence the pipeline, but should be more flexible than storing them together with other parameters or hard-code them, I think it is fair if they are not captured in the database.

I agree that if users are provided with this option, workflow reproducibility can easily be (consciously or accidentally) broken. They would have to be clearly aware of the intended use of this feature (similar to the dangers of update()).

@dimitri-yatsenko
Copy link
Member

dimitri-yatsenko commented Nov 1, 2021

I agree that if users are provided with this option, workflow reproducibility can easily be (consciously or accidentally) broken. They would have to be clearly aware of the intended use of this feature (similar to the dangers of update()).

Agreed. We have seen the need for this solution as well and, with proper documentation, it makes sense. We would need to document and add test cases before including in a release.

@dimitri-yatsenko
Copy link
Member

@Hendriela Will you please make integrate the suggestions? I will then merge your PR.

@A-Baji
Copy link
Collaborator

A-Baji commented Feb 1, 2022

Sorry @Hendriela!, I did not mean to commit that to your fork.

I have reverted the commit.

apply changes requested by dmitri
@Hendriela
Copy link
Contributor Author

Thanks to @A-Baji to implement @dimitri-yatsenko s suggestions! I understood that I should implement test cases, but I have very little experience with stuff like this and so was not sure how to do it.
But if the current fork is sufficient, I am more than happy if you merge the PR for the next release!

Copy link
Collaborator

@guzman-raphael guzman-raphael left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Hendriela @A-Baji Thanks for the contributions! 🤝

These updates look great now and the feature seems quite useful. Would you just update the changelog and fix the minor style test issue? You can place the release notes within the upcoming 0.13.3 release.

Unfortunately, we keep the changelog in 2 places right now located in the following paths:

  • ./CHANGELOG.md
  • ./docs-parts/intro/Releases_lang1.rst

Copy link
Collaborator

@guzman-raphael guzman-raphael left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Hendriela @A-Baji

🎉 Nice work! 🎉
Great to see collaboration in the community.

@guzman-raphael guzman-raphael merged commit 3bdfcf3 into datajoint:master Feb 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants