Refactor into monorepo structure #230
Please comment and let's see if we can get this locked down. It only gets more complicated the longer we wait.
Can you describe the setup and your thoughts behind it? I'm for instance a bit confused about how the Further, is there any reason why we cannot have the standard code structure of a |
A monorepo is exactly that: several different but related projects in one repo. If you just have one big thing in the root, that's a monolith, and moving away from that is exactly what this PR is about. If you think we should stick with a monolith, that's a fair position. Fundamentally, the training code neither cares about nor needs the dependencies from the preprocessing code. When I clone this repo on LUMI, I have no reason to install from the pyproject.toml in the root, because the setup is completely different and relates to training. The same goes for evaluation. Nonetheless, there are some dependencies between these things, which is the purpose of putting them in the same repo.
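To make the "several projects in one repo" idea concrete, here is a minimal sketch that lists the independent projects in a checkout by looking for top-level folders that carry their own pyproject.toml. The function name `find_projects` and the one-level-deep layout are assumptions for illustration, not something this PR defines.

```python
from pathlib import Path


def find_projects(repo_root):
    """Return the names of top-level folders that contain their own pyproject.toml.

    Assumes (hypothetically) that each sub-project lives exactly one level
    below the repository root.
    """
    root = Path(repo_root)
    # Each match is <root>/<project>/pyproject.toml; keep only the folder name.
    return sorted(path.parent.name for path in root.glob("*/pyproject.toml"))
```

A pyproject.toml in the root itself is not counted, which matches the point above: on a given machine you would install only the sub-project you actually need.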
Agreed on this 👍
Just a nitpick, but having different pyproject.toml-files does not make it less of a monorepo. It means we have multiple packages in the same repo, but that's exactly what we're looking for 👍 @rlrs The motivation here was that separate parts of DFM needed the same dependency, but with different versions, right? We don't need to have separate release pipelines or anything? If that is the case, I think @saattrupdan is right, that the easiest transition is to have a We could also go separate packages. That would just require a bit of CI-work, specifically ensuring that each project is tested independently with the right dependencies. EDIT: @rlrs came swooping in before me 😉 |
If you really prefer sticking with Poetry to manage the entire repo, perhaps this discussion is relevant: python-poetry/poetry#936
No reason to get snappy here, I just didn't know what a git submodule was in that case, but thanks for the explanation! I found this explanation of a submodule:
That sounds great. Also, @rlrs, would keeping the dependencies of the data processing and training separate resolve the repo issue?
Wasn't meant that way, was just intended as a shorthand for what you linked!
Splitting dependencies probably goes a long way here. Training requires a separate virtualenv from preprocessing requires a separate virtualenv from evaluation. As for training (and eval on lumi) it all has to run in a specific container, too. Even the venv has to be installed while in this container. I think the essence is that I don't believe it to be cleaner to have one |
It's all good 🙂
Ah, I think I understand it now: the different environments, and especially the need to run the training bit in a container, make it more complicated than simply having the same core setup and installation method. I think I get why a monorepo structure would make things easier in that case, when the setup is really quite different on the two sides. Will there be any need to import things across the data processing and training, or are they completely separate? And if so, is that possible with the monorepo structure?
We talked about this at the office, and I believe we reached a common understanding that this monorepo structure makes sense, even though the different projects really aren't connected.
So what is the conclusion from the discussion above? I am unsure. Edit: I discussed this with @MartinBernstorff, and it seems like we can clarify this much better in a 20 min call.
Regarding the CI thing, it seems like it's super easy to run separate parts of CI for different subfolders.
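As a rough sketch of the idea (not the CI configuration used in this repo), the logic boils down to mapping changed file paths to the sub-project folder they belong to and running only those test suites. GitHub Actions can express the same thing declaratively with `paths` filters on a workflow trigger. The folder names in `PROJECTS` and the function `affected_projects` are hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical sub-project folder names, for illustration only.
PROJECTS = {"data-processing", "training", "evaluation"}


def affected_projects(changed_files):
    """Return the sub-projects whose folders contain at least one changed file."""
    affected = set()
    for name in changed_files:
        parts = PurePosixPath(name).parts
        # Files directly in the repo root (e.g. README.md) belong to no sub-project.
        if len(parts) > 1 and parts[0] in PROJECTS:
            affected.add(parts[0])
    return affected
```

A CI job could feed this the output of `git diff --name-only` against the target branch and then run tests only in the returned folders, each with its own dependencies installed.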
Great, do we fix this in another PR? Also, there might be some paths that are broken now, and miscellaneous things like the Makefile no longer making sense.
Yep, let's fix these things in other PR(s).
LGTM
Making this PR now so that we can discuss how to structure the potential monorepo. I've already moved some things around, but this probably broke several things that then need to be fixed. Please write any relevant thoughts you have.