This repository has been archived by the owner on May 2, 2024. It is now read-only.

Ensure Tutor works on ARM64 via an easy-to-use plugin #35

Closed
kdmccormick opened this issue Jan 31, 2022 · 22 comments
Labels: bug (report of or fix for something that isn't working as intended), enhancement (relates to new features or improvements to existing features)

Comments

@kdmccormick
Collaborator

kdmccormick commented Jan 31, 2022

The original title was "Ensure Tutor works out-of-the-box on ARM64". We reduced the scope to "via an easy-to-use plugin" for now. We'll follow up on making this an official part of the base installation in #81.

Context

There are a few issues circulating around Open edX development on machines with the ARM64 architecture, notably the new M1 Macs. While many/all of these issues are not caused by Tutor itself, they affect the Tutor dev experience nonetheless. To make it maximally likely that 2U (largely Mac users) fully adopts Tutor, it's important that the tool feels like an "out of the box" experience as much as possible. While ideally this will involve merging fixes to upstream repos, it may also involve workarounds or doc additions in Tutor core or plugins.

Open issues

  1. Tutor's openedx image is built for x86-64
  2. MySQL 5.7 does not provide ARM64 images.
  3. frontend-app-learning does not build on ARM64

Acceptance

Ensure that it's nearly as easy (1-2 extra steps max) to set up Tutor on an ARM64 machine as it is on an x86-64 machine in all of these configurations:

  • Stable
  • Nightly, using master
  • Nightly, using a custom bind-mounted branch

and using all of these plugins:

  • tutor-mfe
  • tutor-discovery
  • tutor-forum
@ormsbee

ormsbee commented Jan 31, 2022

@kdmccormick: So there are at least a few possible pieces of work here, right?

  1. Short term (optional): Make Tutor detect the platform's architecture so that it uses MariaDB automatically instead of requiring people to set DOCKER_IMAGE_MYSQL=mariadb:10.4.
  2. Medium term (by Olive, maybe earlier, but after edx-platform code fixes are in): Change the MySQL version to 8.0, which has proper images for ARM64.
  3. Short term: The tutor-mfe ARM64 issue.
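
The detection idea in item 1 could be sketched roughly as follows. This is only an illustration, not Tutor's code: the function names and the mysql:5.7 default are assumptions made for the example, and only mariadb:10.4 comes from this thread.

```python
import platform
from typing import Optional

# Assumed defaults for illustration; only mariadb:10.4 is named in this thread.
X86_MYSQL_IMAGE = "mysql:5.7"
ARM64_MYSQL_IMAGE = "mariadb:10.4"


def is_arm64(machine: Optional[str] = None) -> bool:
    """Return True when the machine string names a 64-bit ARM CPU."""
    # platform.machine() typically reports "arm64" on macOS and "aarch64" on Linux.
    name = (machine if machine is not None else platform.machine()).lower()
    return name in ("arm64", "aarch64")


def default_mysql_image(machine: Optional[str] = None) -> str:
    """Pick a database image that is published for the given architecture."""
    return ARM64_MYSQL_IMAGE if is_arm64(machine) else X86_MYSQL_IMAGE
```

Calling default_mysql_image() with no argument inspects the current machine; passing a string makes the choice easy to test on any host.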

@kdmccormick
Collaborator Author

Thanks @ormsbee ; I factored your comment into the ticket description.

@kdmccormick
Collaborator Author

kdmccormick commented Feb 14, 2022

@regisb Please take a look at the Context, Open Issues, and Acceptance criteria above, and let me know:

  • anything I missed in my synopsis
  • your thoughts on the goal I listed
  • whether there are any paths we could take to make progress on this

Some ideas I've heard so far:

  • Provide an ARM64-ready openedx image in the overhangio DockerHub repository.
  • Detect whether a user is on ARM64, and if so, switch to MariaDB (edit: or MySQL 8!) by default.

@ormsbee

ormsbee commented Feb 14, 2022

@kdmccormick: As a heads up: per @jmbowman, the plan is for edx-platform to be MySQL 8.0 compatible in time for Nutmeg, though it will likely also support 5.7 during that release, with 5.7 support dropped for Olive.

@jmbowman

edx-platform now has a passing MySQL 8 migrations check: openedx/edx-platform#29784 . The rest of the services haven't been thoroughly checked yet, but I think that resolves all the MySQL 8 compatibility issues that have been identified so far.

@kdmccormick changed the title from "As a developer, I want Tutor to work out-of-the-box on ARM64" to "Ensure Tutor works out-of-the-box on ARM64" on Feb 14, 2022
@kdmccormick
Collaborator Author

That is good to know! It sounds to me like the best workaround for the MySQL issue is to have ARM64 users upgrade early to MySQL 8.

@regisb

regisb commented Feb 15, 2022

There are two items which are especially difficult for me:

  1. Providing arm64 Docker images: to do that, I would have to build the linux and arm64 images synchronously, with distributed workers. Currently, most of my CI does not run on arm64, but on a self-hosted Kubernetes cluster. I could switch to GitHub actions, but I'm very reluctant -- for various reasons, but mostly because of vendor lock-in. Thus, the only solution is emulation. But building the "openedx" image alone takes 3 hours (source). I do not want to delay the publication of Docker images by 3 hours for every release. In any case, it seems to me that arm64 users are mostly interested in the Tutor nightly branch -- for which no Docker image is published, and all users are required to build the images themselves. So I see little point in making drastic changes to the architecture of my CI that would provide only a minor benefit.
  2. Getting MFEs to build on arm64: I think that it makes little sense to pretend that Tutor supports arm64 if MFE images do not build on that architecture. But I have no idea what's the root cause of the issue, or how to resolve it. I could use some help here...

For these reasons, I'm not sure what's the best course of action here :-/ Maybe we should identify the issues that arm64 users face when they follow the tutorial? https://docs.tutor.overhang.io/tutorials/arm64.html

@kdmccormick
Collaborator Author

I could switch to GitHub actions, but I'm very reluctant -- for various reasons, but mostly because of vendor lock-in ... I do not want to delay the publication of Docker images by 3 hours for every release.

That makes sense. Thinking out loud: maybe 2U or tCRIL could publish a set of ARM64 images that users could opt into. As I understand it, that wouldn't require any changes on your end or in Tutor. Users would just opt-in to the alternate images by changing the *_DOCKER_IMAGE config settings to different remotes.
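
As a rough sketch of what that opt-in could amount to, the snippet below rewrites every image setting in a config mapping so that it points at a different registry. The function name, the key-matching rule, and the example registry are all hypothetical; this does not use Tutor's actual configuration API.

```python
from typing import Dict


def use_alternate_registry(config: Dict[str, str], registry: str) -> Dict[str, str]:
    """Return a copy of config with every image setting pointed at `registry`.

    Only keys containing DOCKER_IMAGE are touched (an assumed naming rule);
    the image name and tag, i.e. the part after the last "/", are kept as-is.
    """
    updated = dict(config)
    for key, image in config.items():
        if "DOCKER_IMAGE" in key:
            name_and_tag = image.rsplit("/", 1)[-1]
            updated[key] = f"{registry.rstrip('/')}/{name_and_tag}"
    return updated
```

For example, a setting of docker.io/overhangio/openedx:13.1.0 combined with a registry of ghcr.io/example-org would become ghcr.io/example-org/openedx:13.1.0, while unrelated settings are left alone.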

In any case, it seems to me that arm64 users are mostly interested in the Tutor nightly branch -- for which no Docker image is published, and all users are required to build the images themselves.

I have been running Nightly using the official nightly images instead of building my own. Is that the wrong way to do it?

I think that it makes little sense to pretend that Tutor supports arm64 if MFE images do not build on that architecture.

I agree: this would need to be fixed upstream. I did link a potential upstream solution above.

@kdmccormick kdmccormick moved this from Ungroomed (Kyle) to Ungroomed (Régis) in Tutor DevEnv Adoption (OLD BOARD) Feb 15, 2022
@regisb

regisb commented Feb 15, 2022

I have been running Nightly using the official nightly images instead of building my own. Is that the wrong way to do it?

OK you know what? I had forgotten that these images existed 😅 this is indeed the "right" way to do it. These images are updated nightly by a cron job, which means that it should be fairly easy to make that cron job also build the arm64 "openedx" images. The 3 hour delay is not such a big deal in that situation. Note however that this cron job does not build images for plugins. For instance, the "openedx-discovery" plugin does not have any "nightly" image: https://hub.docker.com/r/overhangio/openedx-discovery/tags

This changes my perspective completely. Currently, my CI builds images on Kubernetes, but honestly it's kind of a pain to build docker-in-docker. I would be interested in setting up a build cluster with both linux/amd64 and linux/arm64 platforms (Hetzner provides Mac Mini hosting, for instance). This would resolve the question of building the Docker images. The MySQL issue is an easy fix. Provided the MFE issue is also resolved, we should be able to improve the arm64 user experience.

@kdmccormick kdmccormick moved this from Ungroomed (Régis) to Ungroomed (Kyle) in Tutor DevEnv Adoption (OLD BOARD) Feb 15, 2022
@yarons

yarons commented Feb 16, 2022

Hey guys, I really like this initiative and I'm wondering if I could help.
I created some images to support running openedx on ARM64 with images pushed to the ghcr.io docker registry.

I just wanted to point out some weird things that I ran into along the way:

  1. Ubuntu does support MySQL 5.7 on ARM, but not as part of the official repository, so I had to install the .deb files with dpkg. That is pretty terrible, but it works, at least until the grand switchover to 8.
  2. Building a multi-arch openedx container in parallel takes a lot of time (3 hours, as mentioned). I was wondering if it would be better to split the process into 2 images built on different containers/machines and combine the end results with docker manifest before pushing. I'm not sure how to do it yet and I'm still trying to figure it out; note that docker manifest is still considered experimental.
  3. Splitting the buildx multi-arch build into several build processes will allow identifying the weak points and possibly changing specific build steps to run faster. Another thing that Docker and buildx support is the TARGETPLATFORM and BUILDPLATFORM build arguments.
  4. I've also noticed something called "run-on-arch" that we can use to run the AMD64 and ARM64 builds on their respective platforms and decrease the build time (basically the binfmt translation time).
  5. We've also discussed the mongo situation: as long as there's no plan to support older versions, no special work is needed. If there is a plan to support Open edX versions that require preparing a mongo 3 Docker image for ARM64, Canonical provides .deb files for mongo 3 via Launchpad, the same situation we encountered with MySQL 5.7.
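
To make the split-and-combine idea from items 2 and 3 more concrete, here is a minimal sketch that only assembles the commands rather than running them. The per-architecture tag suffix and the function name are assumptions for the example; the docker buildx build, docker manifest create, and docker manifest push subcommands are standard Docker CLI.

```python
from typing import List


def multiarch_commands(image: str, arches: List[str]) -> List[List[str]]:
    """Build the docker commands for per-architecture builds plus one manifest list.

    Each architecture gets its own `docker buildx build`, which could run on a
    separate machine; the last two commands combine and publish the results.
    """
    commands: List[List[str]] = []
    arch_tags: List[str] = []
    for arch in arches:
        # Assumed convention: suffix the tag with the architecture name.
        tag = f"{image}-{arch}"
        arch_tags.append(tag)
        commands.append(
            ["docker", "buildx", "build", "--platform", f"linux/{arch}",
             "--tag", tag, "--push", "."]
        )
    # The manifest list makes `image` resolve to the right per-arch image.
    commands.append(["docker", "manifest", "create", image, *arch_tags])
    commands.append(["docker", "manifest", "push", image])
    return commands
```

Each inner list could be handed to subprocess.run on the appropriate builder; keeping the builds as separate commands is what would let the slow steps be timed per architecture.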

This is my repo and most of the magic happens in the GitHub Actions section:
https://github.com/yarons/tutor-arm-docker
https://github.com/yarons/mysql-arm-docker (This docker has a lot of certificate acquisition code that can be removed safely but I left it so it'll look and feel a bit like the official MySQL docker for AMD64).

I'm aware that @regisb is highly against GitHub Actions due to vendor lock-in, but it should be relatively easy to create a working example on GitHub Actions and then switch over to a more open platform. (Do you have something in mind? Is the GitLab CI/CD platform good enough?)

Thank you!

@kdmccormick kdmccormick moved this from Ungroomed (Kyle) to Ungroomed (Régis) in Tutor DevEnv Adoption (OLD BOARD) Feb 16, 2022
@yarons

yarons commented Feb 19, 2022

BTW, I managed to run Tutor on ARM64 in AWS (sadly with MariaDB). It requires some tweaking to the current Dockerfiles and images (see bugs 590 and 591 on Tutor's repo).

Changing the dockerize tarball to the right architecture and fixing the permissions image (building it with "images build permissions" before running quickstart/init) should suffice.
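
Picking the tarball per architecture could look something like the sketch below, which derives a file name from a platform string such as the value of TARGETPLATFORM. The file-name pattern and the function name are assumptions made for illustration and should be checked against the actual dockerize release assets.

```python
def tarball_name(target_platform: str, version: str) -> str:
    """Build a per-architecture tarball name from a platform such as "linux/arm64".

    The "dockerize-<os>-<arch>-<version>.tar.gz" pattern is an assumption made
    for this example, not a confirmed naming scheme.
    """
    parts = target_platform.split("/")
    if len(parts) < 2:
        raise ValueError(f"expected '<os>/<arch>', got {target_platform!r}")
    # Any variant suffix (the "v7" in "linux/arm/v7") is ignored here.
    os_name, arch = parts[0], parts[1]
    return f"dockerize-{os_name}-{arch}-{version}.tar.gz"
```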

If you need any further assistance I'd love to help.

@kdmccormick
Collaborator Author

Wow, thank you @yarons! I will take a closer look next week, but at first glance I think what you have will be really helpful.

Did you end up trying to run Tutor with your custom MySQL 5.7 ARM64 image before switching to MariaDB?

@yarons

yarons commented Feb 20, 2022

Wow, thank you @yarons! I will take a closer look next week, but at first glance I think what you have will be really helpful.

Did you end up trying to run Tutor with your custom MySQL 5.7 ARM64 image before switching to MariaDB?

Not yet but thanks for the idea, I did it for @regisb , I'll try to use it for my implementation as well.

@kdmccormick added the bug and enhancement labels and removed the for:2u-pilot label on Feb 22, 2022
@kdmccormick kdmccormick moved this from Ungroomed (Régis) to Ungroomed (Kyle) in Tutor DevEnv Adoption (OLD BOARD) Mar 17, 2022
@kdmccormick kdmccormick moved this from Ungroomed (Kyle) to Groomed in Tutor DevEnv Adoption (OLD BOARD) Mar 31, 2022
@yarons

yarons commented Apr 10, 2022

@kdmccormick I'm not sure it's still relevant but I ran it with my own MySQL 5.7 Docker and it worked great.

Thanks.

@regisb

regisb commented Apr 15, 2022

Can someone with a Mac please confirm that we can close this issue? All images should build properly on the master/nightly branches.

@yarons

yarons commented Apr 15, 2022

@regisb This is highly important as all the build process was done on Ubuntu ARM64 and never tested with an M1 CPU before.

@regisb regisb moved this from Groomed to In Progress in Tutor DevEnv Adoption (OLD BOARD) Apr 15, 2022
@regisb regisb self-assigned this Apr 15, 2022
@bradenmacdonald

bradenmacdonald commented Apr 16, 2022

Can someone with a Mac please confirm that we can close this issue? All images should build properly on the master/nightly branches.

@regisb I can confirm that with the current tutor nightly + edx-platform master, the base and dev images build properly :)
However I am having some issue with npm install and libsass in particular, so I haven't been able to build the static assets. I'll investigate.

Edit: since the images built OK this must be some issue with my bind-mounted copy of edx-platform, which takes effect after the image gets built. So I assume the image is fine.

@kdmccormick
Collaborator Author

My read on the status of this issue is:

Before we can say "ARM64 works out-of-the-box", though, we'll want to make sure that:

  • Braden's plugin has a bona fide maintainer
  • Braden's plugin is mentioned in the docs in a very obvious location
    • Perhaps: it is installed by default as an official plugin, so that ARM64 users just need to enable it.
  • ARM64 images are regularly built and pushed

These issues are discussed a bit in this PR: overhangio/tutor#650

@bradenmacdonald

@kdmccormick Did people try out my plugin at the conference? I haven't heard any feedback so don't have a sense of how helpful it has been.

@kdmccormick
Collaborator Author

@bradenmacdonald Yes, a few people used it and it seemed to work great!

@ianonavy

First time trying to run Tutor, and I own an M1 Mac. I tried the plugin just now. Works great!

@kdmccormick changed the title from "Ensure Tutor works out-of-the-box on ARM64" to "Ensure Tutor works on ARM64 via an easy-to-use plugin" on Jun 22, 2022
@kdmccormick
Collaborator Author

I'm closing this to capture the fact that we now have Tutor working on ARM via an easy-to-use plugin 💯

As I mentioned in this comment we have some more work to do, which I'm splitting out into a new issue: #81


7 participants