Auto-detect the base-distro (aka glibc version) of container images #1684

Closed
4 tasks
achimnol opened this issue Nov 2, 2023 · 0 comments · Fixed by #2582
Labels
comp:agent Related to Agent component effort:hard Need to understand many components / a large extent of contextual or historical information. urgency:2 With time limit, it should be finished within it; otherwise, resolve it when no other chores.

achimnol commented Nov 2, 2023

Currently we rely on the ai.backend.base-distro label to choose the appropriate build of the statically built in-container applications that the agent injects, such as sftp-server and libbaihook.

Sometimes this label is misconfigured or omitted by human error, causing unexpected app launch failures after session containers are created.

Could we read the container image (without launching it) to determine the glibc version inside it? That would automate this process, so that the agent could mount the appropriate binaries before starting the container.

Requirements:

  • No copy of layers/files
  • No creation of containers

TODOs:

  • Attach the minimum glibc version requirement to each agent-provided binary. (Or could we also read this information automatically when starting up the agent?)
  • Create an interface to inspect an image's filesystem
    • Input: image name & tag
    • Output: the (readonly) merged view path
  • Implement the overlay concrete class for the interface.
  • Insert a glibc version detection step before creating the container in the docker backend (when preparing mounts).
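The interface item above could look roughly like the following. This is only a sketch: the names ImageFilesystemInspector, mount, unmount, merged_view and DirectoryInspector are hypothetical and not taken from the agent codebase. It assumes the merged view needs explicit cleanup, so it is exposed through a context manager. The stand-in subclass only maps image references to existing directories, so the calling code can be exercised without Docker.

```python
from abc import ABC, abstractmethod
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


class ImageFilesystemInspector(ABC):
    """Hypothetical interface: expose an image's filesystem without running it."""

    @abstractmethod
    def mount(self, image_ref: str) -> Path:
        """Return the path of a read-only merged view for "name:tag"."""

    @abstractmethod
    def unmount(self, merged_path: Path) -> None:
        """Release whatever mount() set up."""

    @contextmanager
    def merged_view(self, image_ref: str) -> Iterator[Path]:
        # Guarantee cleanup even if the inspection step raises.
        path = self.mount(image_ref)
        try:
            yield path
        finally:
            self.unmount(path)


class DirectoryInspector(ImageFilesystemInspector):
    """Stand-in for tests: image references map to pre-existing directories."""

    def __init__(self, roots: dict[str, Path]) -> None:
        self._roots = roots
        self.released: list[Path] = []

    def mount(self, image_ref: str) -> Path:
        try:
            return self._roots[image_ref]
        except KeyError:
            raise LookupError(f"unknown image: {image_ref}") from None

    def unmount(self, merged_path: Path) -> None:
        # Nothing to unmount here; just record the call.
        self.released.append(merged_path)
```

An overlay-based concrete class would implement mount() and unmount() with the actual mount and umount calls, while the glibc detection step would only depend on merged_view().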

Expected results:

  • Clearer upfront error messages when the image is not compatible with any agent-provided binaries
  • No longer need to specify the labels when building new images (no more human mistakes)

Some surveys

Inspecting the filesystem of container image without running it

Docker by default uses the overlay2 storage driver. (docs, source)
We can manually create a temporary directory and mount an overlay filesystem on it, using the list of layer diffs stored as "LowerDir" in the result of docker image inspect:

[
  {
    ...
    "GraphDriver": {
      "Data": {
        "LowerDir": "/var/lib/docker/overlay2/s4zpndcg91we84gpuypg9lqyw/diff:/var/lib/docker/overlay2/noyix80wfyriygqwrw6g7byx0/diff:/var/lib/docker/overlay2/3ivr1rq6spowycb0tm5wzdxrv/diff:/var/lib/docker/overlay2/aelr4wom8ybuazvbh454k43er/diff:/var/lib/docker/overlay2/u7yivy2ed819j6fu2c4go8k99/diff:/var/lib/docker/overlay2/nj2kibbgamju7dhzl3zmd3g7e/diff:/var/lib/docker/overlay2/2089afd8aa85e4246a4d9031169a0246ef7b427cc4776cccdea2f1c2fe51f511/diff",
        "MergedDir": "/var/lib/docker/overlay2/xtx60wfphf9xh9sg9w1txel59/merged",
        "UpperDir": "/var/lib/docker/overlay2/xtx60wfphf9xh9sg9w1txel59/diff",
        "WorkDir": "/var/lib/docker/overlay2/xtx60wfphf9xh9sg9w1txel59/work"
      },
      "Name": "overlay2"
    },
    ...
  }
]

sudo mkdir /tmp/merged
sudo mount -t overlay overlay -olowerdir=/var/lib/docker/overlay2/s4zpndcg91we84gpuypg9lqyw/diff:/var/lib/docker/overlay2/noyix80wfyriygqwrw6g7byx0/diff:/var/lib/docker/overlay2/3ivr1rq6spowycb0tm5wzdxrv/diff:/var/lib/docker/overlay2/aelr4wom8ybuazvbh454k43er/diff:/var/lib/docker/overlay2/u7yivy2ed819j6fu2c4go8k99/diff:/var/lib/docker/overlay2/nj2kibbgamju7dhzl3zmd3g7e/diff:/var/lib/docker/overlay2/2089afd8aa85e4246a4d9031169a0246ef7b427cc4776cccdea2f1c2fe51f511/diff /tmp/merged
ls -l /tmp/merged/lib/
ls -l /tmp/merged/usr/lib/
...

In most cases, we could support only the overlay driver and fall back to temporarily creating a container for inspection otherwise.
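As a rough sketch of how the agent might derive that mount command, the function below parses the output of docker image inspect and builds the argument list. It returns None when the storage driver is not overlay2, so a caller can take the fallback path. The function name overlay_mount_argv and the include_upper parameter are hypothetical. By default it uses only "LowerDir", as in the command above. Docker appears to report the image's topmost layer separately as "UpperDir", so the optional flag prepends it; that reading of the field is an assumption to verify.

```python
import json
from typing import Optional


def overlay_mount_argv(
    inspect_json: str,
    target: str,
    include_upper: bool = False,
) -> Optional[list[str]]:
    """Build a `mount -t overlay` argv from `docker image inspect` output.

    Returns None when the image is not stored by the overlay2 driver or
    when no layer directories are listed.
    """
    entries = json.loads(inspect_json)
    driver = entries[0].get("GraphDriver") or {}
    if driver.get("Name") != "overlay2":
        return None
    data = driver.get("Data") or {}
    # LowerDir is a colon-separated list; it may be absent entirely.
    layers = [p for p in data.get("LowerDir", "").split(":") if p]
    if include_upper and data.get("UpperDir"):
        # Assumption: UpperDir holds the image's topmost layer diff.
        layers.insert(0, data["UpperDir"])
    if not layers:
        return None
    # With no upperdir/workdir option the overlay mount is read-only.
    return ["mount", "-t", "overlay", "overlay",
            "-o", "lowerdir=" + ":".join(layers), target]
```

The returned list could be passed to subprocess.run() with the required privileges, and the target unmounted once the inspection is done.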

Retrieving the glibc version from libc.so.N file

Read the symbol versions and take the latest one.

$ objdump -T /root/test/usr/lib/aarch64-linux-gnu/libc.so.6 | grep GLIBC | sed 's/.*GLIBC_\([.0-9]*\).*/\1/g' | sort -Vu | tail -n 1
2.35
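The same extraction can be sketched in Python, which may be easier to call from the agent than a shell pipeline. The function name max_glibc_version is hypothetical. It takes the text printed by objdump -T and returns the highest GLIBC_x.y symbol version as a tuple of integers, so that 2.17 compares greater than 2.2.5 and 2.9.

```python
import re
from typing import Optional

# Matches versioned symbols such as GLIBC_2.17 or GLIBC_2.2.5,
# but not GLIBC_PRIVATE.
_GLIBC_RE = re.compile(r"GLIBC_(\d+(?:\.\d+)+)")


def max_glibc_version(objdump_output: str) -> Optional[tuple[int, ...]]:
    """Return the highest GLIBC symbol version found in `objdump -T` output."""
    versions = {
        tuple(int(part) for part in match.split("."))
        for match in _GLIBC_RE.findall(objdump_output)
    }
    # Tuples compare element-wise, which gives numeric version ordering.
    return max(versions) if versions else None
```

The input text could come from subprocess.run(["objdump", "-T", path], capture_output=True, text=True).stdout, where path points into the merged view of the image.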

Some examples of our binaries' glibc version requirements

$ objdump -T src/ai/backend/runner/libbaihook.ubuntu18.04.aarch64.so | grep GLIBC | sed 's/.*GLIBC_\([.0-9]*\).*/\1/g' | sort -Vu | tail -n 1
2.17

$ objdump -T src/ai/backend/runner/libbaihook.ubuntu20.04.aarch64.so | grep GLIBC | sed 's/.*GLIBC_\([.0-9]*\).*/\1/g' | sort -Vu | tail -n 1
2.17

$ objdump -T src/ai/backend/runner/libbaihook.ubuntu22.04.aarch64.so | grep GLIBC | sed 's/.*GLIBC_\([.0-9]*\).*/\1/g' | sort -Vu | tail -n 1
2.34

$ objdump -T src/ai/backend/runner/sftp-server.ubuntu16.04.aarch64.bin | grep GLIBC | sed 's/.*GLIBC_\([.0-9]*\).*/\1/g' | sort -Vu | tail -n 1
2.17

$ objdump -T src/ai/backend/runner/sftp-server.ubuntu18.04.aarch64.bin | grep GLIBC | sed 's/.*GLIBC_\([.0-9]*\).*/\1/g' | sort -Vu | tail -n 1
2.26

$ objdump -T src/ai/backend/runner/sftp-server.ubuntu20.04.aarch64.bin | grep GLIBC | sed 's/.*GLIBC_\([.0-9]*\).*/\1/g' | sort -Vu | tail -n 1
2.26
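With the requirements above attached to each build, choosing a binary for an image could be sketched as follows. The version tables repeat the aarch64 outputs listed above. The names pick_build and IncompatibleImageError are hypothetical, and so is the selection policy: take the build with the highest requirement that the image's glibc still satisfies, and break ties by build name.

```python
from typing import Mapping

Version = tuple[int, ...]

# Minimum glibc versions as reported by the objdump commands above (aarch64).
LIBBAIHOOK_REQUIREMENTS: dict[str, Version] = {
    "ubuntu18.04": (2, 17),
    "ubuntu20.04": (2, 17),
    "ubuntu22.04": (2, 34),
}
SFTP_SERVER_REQUIREMENTS: dict[str, Version] = {
    "ubuntu16.04": (2, 17),
    "ubuntu18.04": (2, 26),
    "ubuntu20.04": (2, 26),
}


class IncompatibleImageError(RuntimeError):
    """Raised when no agent-provided build can run on the image's glibc."""


def _fmt(version: Version) -> str:
    return ".".join(str(part) for part in version)


def pick_build(requirements: Mapping[str, Version], image_glibc: Version) -> str:
    """Pick the build with the highest requirement not above image_glibc."""
    usable = {name: req for name, req in requirements.items() if req <= image_glibc}
    if not usable:
        lowest = min(requirements.values())
        raise IncompatibleImageError(
            f"image provides glibc {_fmt(image_glibc)}, "
            f"but the lowest requirement is {_fmt(lowest)}"
        )
    # Hypothetical tie-break: prefer the lexicographically later build name.
    return max(usable, key=lambda name: (usable[name], name))
```

For an image with glibc 2.35, as in the libc.so.6 example earlier, this picks the ubuntu22.04 build of libbaihook. Raising before the container is created is what would give the clearer upfront error message.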
@achimnol achimnol added type:feature Add new features comp:agent Related to Agent component effort:hard Need to understand many components / a large extent of contextual or historical information. urgency:2 With time limit, it should be finished within it; otherwise, resolve it when no other chores. labels Nov 2, 2023
@achimnol achimnol added this to the 24.03 milestone Nov 2, 2023
@achimnol achimnol removed the type:feature Add new features label Oct 18, 2024