Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate to Monorepo #4765

Closed
gnodar01 opened this issue Apr 10, 2023 · 4 comments
Closed

Migrate to Monorepo #4765

gnodar01 opened this issue Apr 10, 2023 · 4 comments

Comments

@gnodar01
Copy link
Collaborator

Many motivations for this, but generally, it makes dev easier:

  • Often changes need to simultaneously be made to core and cellprofiler. Currently, core PR must be merged first before cellprofiler PR can be tested and merged. We'd like to have a single git branch be responsible for a single feature change across all necessary components, filed under a single PR.
  • Dependencies, version numbers, build systems, etc all need to be manually synchronized. With a monorepo these are still separate but can be automated more effectively.
  • We want library as a separate package, without introducing yet another repo.

We've talked about moving to poetry as well. If we want to do this simultaneously, we have at least two paths forward:

  1. We can wait until Support subprojects in a poetry project python-poetry/poetry#2270 is closed before doing anything. There is also Monorepo / Monobuild support? python-poetry/poetry#936, but that has been folded into the former.

  2. Don't wait, and do what several others have done (eg here w/ src, or here, or here): move to a pyproject.toml per subpackage (one for core, one for library, and one at the top level for cellprofiler), with a scripts directory that will run poetry <cmd> across all of them if/when necessary. Once the issue above is closed, we can migrate.

The main priority with (2) is to avoid having to create a separate virtual environment (something poetry strongly nudges towards) for each package. We already go through the trouble to make sure dependencies are non-conflicting, so we only need a single virtual environment.

If we don't want to move to poetry (either at all or yet), we might like to have a monorepo-aware build system like pants for the reasons outlined here.

Ultimately, we don't have to prioritize that either. We'd just refactor the directory to include core, and put library in its own package, with a simple setup.py:

CellProfiler/
├─ packages/
│  ├─ cellprofiler-core/
│  │  ├─ README
│  │  ├─ setup.py
│  │  ├─ cellprofiler_core/
│  │  │  ├─ __init__.py
│  │  ├─ tests/
│  │  ├─ docs/
│  ├─ cellprofiler-library/
│  │  ├─ setup.py
│  │  ├─ README
│  │  ├─ cellprofiler_library/
│  │  │  ├─ __init__.py
│  │  ├─ tests/
├─ setup.py
├─ README
├─ cellprofiler/
│  ├─ __init__.py
├─ tests/
├─ plugins/
├─ docs/
@gnodar01
Copy link
Collaborator Author

@bethac07 and I talked about it, and it may make sense just to do the latter thing, and build out the rest if/when desired. It's a bit inefficient, since for example core tests will run even if only changing cellprofiler-related code, but not that huge a deal. @callum-jpg would you want to give it a go?

@callum-jpg
Copy link
Collaborator

@gnodar01 @bethac07

I have a working example of this on my cellprofiler fork

Here's some notes:

  1. Dependencies have been bumped up a little. This was done because Poetry package resolution was immensely slow for some of the dependencies that had their version range start on old versions. Poetry talk about this at the top of their FAQ
  2. All dependencies installed for me with poetry install as you would expect, except one: mahotas. To solve this, I simply ran pip install mahotas and then re-ran poetry install. Somehow, pip install seems to find mahotas-1.4.13-cp39-cp39-macosx_10_9_universal2.whl, which doesn't exist on PyPI, which is why (I think) poetry fails on this dependency for me (this is apple silicon related).
  3. cellprofiler_core has been merged with the main cellprofiler repo. All commit history has been preserved. I did this using the following guide: https://medium.com/@chris_72272/keeping-git-history-when-converting-multiple-repos-into-a-monorepo-97641744d928
  4. Both cellprofiler_core and cellprofiler_library can be found in the subpackages directory and can be installed independently
  5. Dependencies (and relative paths to cellprofiler_core and cellprofiler_library in the case of cellprofiler) can be found in pyproject.toml files
  6. I found this interesting monorepo guide: https://www.tweag.io/blog/2023-04-04-python-monorepo-1/
    1. They report that they prefer to use pip to handle dependencies, rather than poetry, yet use poetry for its other features. This is because poetry seems to have problems installing pytorch. There is a proposed solution to this here, but I'm unsure if specifying a particular whl is a decent solution, since it pins the platform and python version.
      1. While the pytorch stuff is not super relevant, it's important to keep in mind for the RunCellpose plugin (if the plugins repo joins the monorepo). Moreover, there could be future dependencies that are poorly handles by poetry somehow

To install (I did all of this inside a conda env with python 3.9 and had no issues):

# Install poetry
curl -sSL https://install.python-poetry.org | python3 -

# Clone CellProfiler monorepo fork
https://github.com/callum-jpg/CellProfiler
cd CellProfiler

# Install cellprofiler library only
poetry install --directory subpackages/cellprofiler_library

# Install CellProfiler (and therefore core, and library)
poetry install

# Get pythonw to run GUI on mac inside conda env, if applicable
conda install python.app

# Run cellprofiler (this also works: poetry run pythonw -m cellprofiler)
pythonw -m cellprofiler

@bethac07
Copy link
Member

bethac07 commented May 8, 2023

This is fantastic, though I confess this gave me a bit of a heart attack until I remembered "wait, it has all the core commits in there".

image

I look forward to chatting in person more tomorrow to help me wrap my head around, but one thing I think we should note here - we may have preserved the histories, but one thing that IS broken is that PRs reference the (wrong) CellProfiler PR/issue with the corresponding number aka core PR 126 points at CellProfiler issue 126, because of course it doesn't realize there should be two number 126's.

I think in general we're getting far, far more value doing this than we lose with this messiness, and we can always keep the core repo (renamed if necessary to something like core-archive) around for when we want to do historical digging, but just want to make sure we make the choice conscious of all the tradeoffs/downsides. I tried a different approach that claimed to preserve histories in my fork, but it 100% did not

@bethac07
Copy link
Member

bethac07 commented May 17, 2023

@callum-jpg I know this isn't top of your to-do list right now, but I mentioned this to @hinerm he mentioned we might try using git filter-branch (presumably, specifically, --msg-filter) to allow us to edit commit messages so that instead of using the shorthand #NNN (which then breaks) , edit them to https://github.com/CellProfiler/core/pull/NNN (which shouldn't break, for as long as we keep the repo archived). Seems like worth putting a bit (but perhaps not tons) of time into when you get back to this project in order to try to minimize the impacts of the mergeback!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants