-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Examples page #7
Conversation
2023_05_31 JM/MT/JW meeting notes
We'll assume that the solution will include re-running the raw source notebooks to generate output (since we can't assume that notebooks in different repos will have their outputs stored.
We'll have automation that checks for
Result: Undecided. We'll start with a maximalist-ish solution for first iteration.
JW -- Existing constraints look good, though I don't like the expectation that folder and notebook name should be the same (means only one notebook per folder, which isn't necessarily a pattern I can commit to remembering/doesn't seem necessary)
MT + JW -- Each notebook folder will have a We'll automatically include new notebooks, automatically update existing notebooks, automatically remove notebooks, and organize the notebooks with categories. We'll base these updates on when things get released. We'll tentatively assign each notebook to one or more categories. JM will come up with a tentative list of categories which the package maintainers will assign to each of their notebooks. The titles will be auto-harvested from the top-level header. Some of these will need to be abbreviated since they're currently quite long.
JW + MT -- Agree (except JW may have a little technical trouble with complying with the final one, though if this replaces the Toolkit's need to support cloud-runnable notebooks maybe it'll be fine)
MT + JW -- Should centralize this in openff-docs. No benefit to copying it out.
Undecided, will revisit later. |
@mattwthompson I'm curious how you would like me to handle Also my computer can run all of the examples (in parallel) in like 5 minutes. Makes the twenty minute execution times in the Toolkit CI pretty frustrating. Also this seems to be the maximalist environment, for future reference: channels:
- conda-forge
- bioconda
dependencies:
- pip
- python=3.10
# Cookbook
- gitpython
- nbconvert
- nbformat
# Examples
- openff-toolkit-examples
- gromacs
- lammps
- rich
- jax This seems to be going too well, I should check if the notebooks are kicking up exceptions that are getting happily baked into the executed notebooks... |
# Execute notebooks in parallel for rendering as HTML | ||
if do_exec: | ||
# Context manager ensures the pool is correctly terminated if there's | ||
# an exception | ||
with Pool() as pool: | ||
# Wait a second between launching subprocesses | ||
# Workaround https://github.com/jupyter/nbconvert/issues/1066 | ||
_ = [*pool.imap_unordered(execute_notebook, delay_iterator(notebooks))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, I've always wanted to be able to do this in tests but I couldn't figure out how to get nbval to do it. I mean, I guess it's possible and I just never found out how. Still have to turn off pytest-randomly
, probably
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah figuring this out was not fun. There's a race condition in nbconvert (which executes the notebooks) so if you launch too many notebook kernels too quickly you sometimes get two of them on the same port. But fixing each problem that came up in the dumbest way possible seemed to work eventually!
I just set it in my action: https://github.com/openforcefield/openff-interchange/blob/v0.3.4/.github/workflows/examples.yaml#L34 Using the |
I'm moving around some files in Interchange; shouldn't affect things too much and doesn't need to be the last change. Just FYI: openforcefield/openff-interchange#740 |
The modified JS file introduced in this commit is from: nglviewer/nglview#1064 Once that PR makes it into a release, the logic and JavaScript added in this commit should be replaced with a pin to the appropriate version of NGLView.
OK. Now that I've been working on this for 3 weeks (:scream:), there are some things that I should note. We agreed at our meeting that we only want to update the examples in this repository when a project makes a release. This is actually a super important fact that it's taken me a while to figure out the full implications of...
I'm hoping to get the GitHub action that does the execution and pre-processing written by the end of the week... after which point this should come together very quickly. I'm thinking I'll do it in a separate PR so we can make sure it works on a release or two before merging this one and making its results visible. |
Weakly-held opinion is that re-running everything everytime anything gets a bump seems excessive but generally keeping things up-to-date is nice. I came across some notebooks in the toolkit that haven't been updated since openforcefield/openff-toolkit#1426, which is not super recent. |
The current implementation runs the notebooks for rendering as web pages from a maximalist environment stored in this repo, but allows source repos to override it for the Colab link and zip download. If the source repo doesn't provide an environment for a notebook, Colab and the zip gets the maximalist environment. So zippy examples can include minimalist environments where it makes a difference. I don't have handling of source repo-wide environments - it's either the openff-docs example environment, or a notebook environment. Running each notebook in its own environment for the web page rendering will be slow and difficult, so I don't want to do it. On the other hand, if a release includes an example that doesn't work with the openff-docs example environment, a quick PR to update that environment should be enough to get the release working again. |
@j-wags @mattwthompson @lilyminium I think this is ready to merge! I would appreciate some clicking around and checking that everything works for everyone else, and a second pair of eyes on whatever you all have time for, and then I'll merge tomorrow unless something comes up. @j-wags I haven't created a quarterly calendar event to clean up the cache because I've added an action to clean up PR folders in the cache when the PR gets merged. Combined with the fact that I've updated the action not to store histories, this should keep everything tidy. I'll test the cleanup works when this PR gets merged :/ The only automation for regenerating the cache is the scheduled nightly one; if the cache needs to be regenerated at any other time (for example, while developing a PR), then it has to be done manually. Instructions for how to do that should be in a comment on any new PR - I'll test that when I open the PR to add QCSubmit and Fragmenter. I just really didn't want to wait 30 minutes every time I make a change to any part of the repo; regenerating the cache should be relatively rare. There's also no way to trigger an RTD PR build in a GitHub action, so if the cache was automatically regenerated in PRs, you'd have to manually trigger that once its done. @lilyminium I still have NAGL 0.2.2 pinned here, because of that versioneer issue I raised at NAGL. To unpin, that needs to be fixed, as that's how the cookbook knows which tag/branch to get the example notebooks from - the environment file is solved by Mamba, and then the tag corresponding to the installed version is cloned to get the notebooks. Updating by hand if the next release doesn't fix that issue should be fine; the version needs to be updated in both the examples conda environment and the Once this is merged I'll write up some documentation on how it works so that it can be maintained/updated while I'm at OpenFE, and I'll open that PR for QCSubmit/Fragmenter. |
I spent a small amount of time poking through things and didn't observe anything notably violent. In fact things seem pretty good - pages load snappily, the content I expect to be there is there, and even the 3D renders work great! If there was one thing I could suggest as an improvement, it'd be more thumbnails. I'm partially responsible for this, so .... One of the NAGL examples ("Prepare a dataset for training") should probably be renamed to include GCNN/NAGL in the title; when I read it for the first time I didn't know what it was about and a new user might confuse it with QC/physical property dataset curation. Some of the cell outputs could do with pretty-ification, like this one which is a fair amount of information in a small number of wrapped black-and-white lines. Out of scope here but might be worth exploring? It'd be a luxury to suppress warnings generated by the runner, like this one: I could spend an hour or two going through things in great detail, the outcome of which would mostly be me wanting to re-write half of the examples and nothing to do with the automation that generates these webpages or dancing around the edges like earlier in this comment. So I think the plan to merge roughly as-is in a day or two is great. |
Agreed! Thumbnails are super time consuming, happen in the source repositories, and like all updates from source repos require a release to be updated, so hopefully if we all chip in a thumbnail or two when we have a moment of inspiration we can fix this over time.
Also agreed! This will be fixed in the next release of NAGL
Definitely worth exploring!
I think we could just hide STDERR, but I'm not sure it's a good idea because I'd like users to know that they're not doing anything wrong if they get a warning when they run it themselves. I think the ideal resolution for each warning, in order of preference, is:
My priority here is helping users understand what's going on, and helping to make outputs reproducible between rendering and running. If it pushes us to fix warnings or find pathways to avoid warnings, so much the better. So again, hopefully we can improve this over time.
I think that's my assessment too - plenty of improvements to make in examples.
Woo! |
Progress
proc_examples.py
and its dependencies), separate them from changes to Sphinx, and stick them in a second PR (Pre-process notebooks from other repositories and cache in a branch #8)main
.alert
styled like.admonition
mamba env update
Goals and motivation
This is going to be a one-stop shop for all the examples, tutorials, and cookbooks in the OpenFF world.
Rendered
Notes for discussion about how to design this:
I would like:
Maybe it would be nice if:
Possibly surprising things we can do:
Constraints:
We need to standardize and make explicit:
Thankfully most of these are already broadly the same across repositories
examples
folder is laid out and where the notebooks goI recommend something like:
This implies using the same environment for testing notebooks in CI as for running notebooks as a user, which is not current practice at least in the Toolkit. This means users might get a few dependencies they don't need, but guarantees the right dependencies are being tested.
We need to decide, in broad strokes
Some dev experience design stuff:
There's basically two extremes here, and intermediate states are possible: One is just do all the layout in MarkDown and have the Sphinx extension only take care of downloading and processing the notebooks (what OpenMM Cookbook does, means that new examples only show up with a PR to openff-docs), and the other is to have all the layout information in the notebook metadata and have the Sphinx extension take care of everything (a very basic version of which is currently implemented in this PR).
And some technical stuff:
I have some ideas that I hate:
main
and keep them up to date with CIWe need to change in existing examples