Consider using conda: directives in snakemake rules #450

Closed
koen-vg opened this issue Dec 13, 2022 · 4 comments · Fixed by #484

Comments

@koen-vg
Contributor

koen-vg commented Dec 13, 2022

Conda: directives in snakemake rules.

I would propose a minor enhancement to the Snakefile, which is to add conda: directives to each rule; see https://snakemake.readthedocs.io/en/stable/snakefiles/deployment.html#integrated-package-management. The implication for running pypsa-eur is that you would no longer have to activate the pypsa-eur conda environment before calling snakemake, but instead call snakemake with the --use-conda argument.

Pros:

  • The potential for using the wrong conda environment is much reduced.
  • This makes it trivial to manage several parallel pypsa-eur installations with potentially different conda environments (different versions, different projects, development, etc.)
  • Snakemake automatically takes care of rebuilding the environment when you change the specification.
  • This makes it possible to use pypsa-eur as a snakemake module and use different conda environments in the "parent" workflow.
  • As far as I'm aware, if you don't supply the --use-conda argument, snakemake will run the workflow as usual, meaning that this change would be fully backwards compatible.

Cons:

  • Yet another keyword in every single snakemake rule.
  • Makes it slightly more cumbersome to add additional dependencies to the environment; they must now be specified in the "environment.yaml" instead of installing them "on the go" in the live environment. (Although it can be argued that forcing a more declarative style like this has advantages too, since it makes it more obvious what the environment looks like.)
  • You might still have to build the pypsa-eur environment "manually" for use in notebooks, etc., and may still need to keep this environment up to date manually.

Other implications:

  • By default, recent versions of snakemake will re-run any rules using conda environment "x" after the specification for "x" has changed. This makes a lot of sense from a reproducibility point of view, since the results of a rule could change when the software environment changes. From a practical point of view, however, it can be annoying if, for example, adding a dependency to the environment causes the whole workflow to re-run. Of course, this can be circumvented with a snakemake [...] --touch [...] strategy; it might be worth writing about that in the pypsa-eur documentation if this change is adopted.

An example of what this would look like can be found here: https://github.com/koen-vg/pypsa-eur/blob/64aed45cd3d94ab48cdaed0cc02c16b6aec7dfc5/Snakefile

Overall I think this would be a good step in the direction of making pypsa-eur easier to work with in a fully reproducible way and makes it harder to make mistakes with the conda environment.

If people are in favour of this change, I would of course be happy to open a pull request.

@koen-vg
Contributor Author

koen-vg commented Dec 13, 2022

(I could also add that if this is adopted, I'd be happy to open a similar pull request for pypsa-eur-sec.)

@fneum
Member

fneum commented Dec 13, 2022

I agree we should try out this snakemake feature. I've also seen it used over in euro-calliope, and I think it would bring a benefit.

One more pro (I think):

  • Not all packages are needed in all rules. With per-rule conda: directives, it's no longer strictly necessary to have a single conda environment for the whole workflow.

One challenge, I see:

  • Solvers in environments. This is about the only software that you kind of need to install on top of the PyPSA-Eur conda environment. One solution could be to install all the most common ones, but not all solvers are available on all platforms. Another is to ask the user to add the solver manually to the environment file before running. Yet another is to inject bash code into the Snakefile that appends the solver installation based on config.yaml. Maybe someone else sees a way out.

A useful note:

  • You can even use snakemake --conda-frontend mamba to choose mamba over conda.

@koen-vg
Contributor Author

koen-vg commented Dec 13, 2022

As for the --conda-frontend option, its default value is already "mamba"! So we get that benefit for free.

I agree that the solver is something to think about. As of today, the user is already required to install some kind of solver in the pypsa-eur environment, so action is required either way. The difference would be adding a line to environment.yaml versus running a command (mamba install [...]) once. I think I still have a preference for adding a line to environment.yaml, out of the two options.

Maybe another option could be to have multiple environment files, with different solvers, and choose the correct environment for the solve_network rule based on the configuration. This can work because the conda: statement in the snakefile can take a function (just like input: and others), and this function could return the correct environment file name. Of course, this comes with the slight disadvantage of having to maintain several solving environments.
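A minimal sketch of the selector function described above, assuming one environment file per solver. The file names in `SOLVER_ENVS` and the function name `solver_env` are hypothetical (nothing like them exists in pypsa-eur), and the config lookup assumes the solver name is stored under `solving: solver: name:`; adjust both to the real layout.

```python
# Hypothetical mapping from solver name to a dedicated conda environment file.
# These paths are placeholders, not files that ship with pypsa-eur.
SOLVER_ENVS = {
    "glpk": "envs/environment.glpk.yaml",
    "cbc": "envs/environment.cbc.yaml",
    "gurobi": "envs/environment.gurobi.yaml",
}


def solver_env(config):
    """Return the environment file for the solver named in the config.

    Assumes the solver name lives at config["solving"]["solver"]["name"].
    Raises ValueError for solvers without a registered environment file,
    so a typo in the config fails loudly instead of silently using a
    default environment.
    """
    name = config["solving"]["solver"]["name"]
    try:
        return SOLVER_ENVS[name]
    except KeyError:
        raise ValueError(
            f"no environment file registered for solver {name!r}"
        ) from None
```

Since `config` is already loaded when the Snakefile is parsed, the solve_network rule could simply use `conda: solver_env(config)`; a function of the wildcards is only needed if the solver varies per job. The cost noted above remains: every entry in the mapping is another environment file to keep in sync.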

I think bash code to install a solver in the Snakefile is more likely to lead to issues down the road.

@pz-max
Collaborator

pz-max commented Dec 13, 2022

@koen-vg having to maintain different environment.yaml files is not strictly necessary. We could write an 'updater' function that takes a base.environment.yaml and, e.g., updates it with gurobi.environment.yaml. We did that in this python script, where config.tutorial.yaml is updated by the test/configs (snakemake uses the same code to overwrite config.yamls).

However, I prefer installing all major solvers in a single environment.yaml, as here. It's just easier to maintain and allows this cool --use-conda stuff. (In my opinion the pros outweigh the cons here. I also like the backwards compatibility, i.e. still being able to run without snakemake --use-conda ...)
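The 'updater' idea can be sketched as a small merge of two environment specifications. This is an illustration only: `merge_env` is a hypothetical helper, not the script referred to above, and it assumes both specs were already parsed into dicts (for example with PyYAML's `yaml.safe_load`).

```python
import copy


def merge_env(base, extra):
    """Merge a solver-specific conda environment spec into a base spec.

    Both arguments are dicts shaped like a parsed environment.yaml
    (keys such as "name", "channels", "dependencies"). List values are
    concatenated in order, skipping items already present; any other key
    in `extra` overrides the one in `base`. Neither input is modified.
    Note: nested sections such as {"pip": [...]} are compared as whole
    items, not merged element by element.
    """
    merged = copy.deepcopy(base)
    for key, value in extra.items():
        if isinstance(value, list) and isinstance(merged.get(key), list):
            for item in value:
                if item not in merged[key]:
                    merged[key].append(copy.deepcopy(item))
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Usage with hypothetical specs: a base environment plus a gurobi add-on.
base = {
    "name": "pypsa-eur",
    "channels": ["conda-forge", "bioconda"],
    "dependencies": ["python>=3.8", "pypsa"],
}
gurobi = {"channels": ["gurobi"], "dependencies": ["gurobi"]}
combined = merge_env(base, gurobi)
```

Appended channels end up last and therefore get the lowest priority in conda's channel ordering, which is usually what a solver add-on wants. The result could be written back to a file with `yaml.safe_dump` and referenced from a conda: directive.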

@fneum mentioned this issue on Mar 8, 2023
3 participants