feature: split instruction used to set up environment in order to support caching when building Docker #3577

niits · 2023-02-16T06:26:28Z

Feature request

Split the Dockerfile COPY instruction into two steps:

1. Copy the entire directory except the models directory.
2. Then copy the models directory.

Motivation

Currently, the Dockerfile copies the entire current directory before installing the libraries with /home/bentoml/bento/env/python/install.sh. Because the copy layer changes whenever any file changes, the library installation is always re-run, even when only the model or the runner has changed.

COPY --chown=bentoml:bentoml . ./


# Block SETUP_BENTO_COMPONENTS
# install python packages with install.sh
RUN bash -euxo pipefail /home/bentoml/bento/env/python/install.sh

# Block SETUP_BENTO_ENTRYPOINT
RUN rm -rf /var/lib/{apt,cache,log}

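It should look like this:

# COPY copies a directory's contents, not the directory itself,
# so each directory gets its own destination.
COPY --chown=bentoml:bentoml bento.yaml Dockerfile README.md ./
COPY --chown=bentoml:bentoml apis ./apis/
COPY --chown=bentoml:bentoml env ./env/
COPY --chown=bentoml:bentoml src ./src/

RUN bash -euxo pipefail /home/bentoml/bento/env/python/install.sh

RUN rm -rf /var/lib/{apt,cache,log}

COPY --chown=bentoml:bentoml models ./models/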

Other

No response


aarnphm · 2023-02-16T11:46:35Z

I would suggest using a remote cache repository with --cache-from and --cache-to, so that the layers produced by the installation step are cached remotely.
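As a rough sketch of what such an invocation could look like, assuming Docker Buildx with its registry cache backend: the helper `buildx_command`, the image tag, and the cache reference below are hypothetical placeholders, not part of BentoML.

```python
import shlex


def buildx_command(context_dir: str, image_tag: str, cache_ref: str) -> list[str]:
    """Assemble a `docker buildx build` command that reads and writes a registry cache."""
    return [
        "docker", "buildx", "build",
        "--tag", image_tag,
        # Reuse layers that an earlier build exported to the cache repository.
        "--cache-from", f"type=registry,ref={cache_ref}",
        # Export this build's layers; mode=max also exports intermediate layers.
        "--cache-to", f"type=registry,ref={cache_ref},mode=max",
        context_dir,
    ]


if __name__ == "__main__":
    cmd = buildx_command(
        context_dir=".",
        image_tag="registry.example.com/movie_recommender:latest",  # hypothetical
        cache_ref="registry.example.com/movie_recommender:buildcache",  # hypothetical
    )
    print(shlex.join(cmd))
```

Note that a remote cache only helps when the cache key of the installation layer stays stable between builds.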

I will take this into consideration as we are in the process of improving the build workflow.

niits · 2023-02-17T17:53:18Z

@aarnphm Thank you for replying.

Since the entire current folder is copied in before the environment is installed, that layer changes every time a new version of the bento is built. As a result, the layer that holds the Python environment is never cached, because it depends on the previous instruction. Please refer to Optimizing builds with cache management.
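The dependency between layers can be illustrated with a toy model. This is a deliberately simplified sketch, not Docker's actual cache algorithm: it assumes each layer's cache key is a hash of its parent key plus the instruction, and that a COPY key also covers the copied file contents. The file contents and the function `layer_keys` are made up for illustration.

```python
import hashlib


def layer_keys(instructions, file_contents):
    """Return one cache key per instruction, chaining each key to its parent.

    `instructions` is a list of (kind, arg) tuples. For "COPY", `arg` is a list
    of source names whose contents feed into the key; for "RUN", `arg` is the
    command text.
    """
    keys, parent = [], ""
    for kind, arg in instructions:
        h = hashlib.sha256(parent.encode())
        h.update(kind.encode())
        if kind == "COPY":
            for name in sorted(arg):
                h.update(name.encode())
                h.update(file_contents[name])
        else:  # RUN: only the command text (and the parent key) matters
            h.update(arg.encode())
        parent = h.hexdigest()
        keys.append(parent)
    return keys


INSTALL = "bash -euxo pipefail /home/bentoml/bento/env/python/install.sh"

# Current layout: everything is copied before install.sh runs.
single_copy = [("COPY", ["env", "src", "models"]), ("RUN", INSTALL)]
# Proposed layout: models are copied after install.sh runs.
split_copy = [("COPY", ["env", "src"]), ("RUN", INSTALL), ("COPY", ["models"])]

bento_v1 = {"env": b"requirements", "src": b"service code", "models": b"weights v1"}
bento_v2 = {**bento_v1, "models": b"weights v2"}  # only the model changed

# Index 1 is the install layer in both layouts.
install_reused_single = (
    layer_keys(single_copy, bento_v1)[1] == layer_keys(single_copy, bento_v2)[1]
)
install_reused_split = (
    layer_keys(split_copy, bento_v1)[1] == layer_keys(split_copy, bento_v2)[1]
)
print(install_reused_single, install_reused_split)  # False True
```

In this model the install layer is reusable only when nothing copied before it has changed, which is the motivation for moving the models copy after the installation step.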

I see you have some code to cache the /root/.cache/pip directory, but it seems to work only with BuildKit. As I understand it, this means the library installation will always run when another build tool is used (we use Kaniko, and have slightly customized your source code to fit our needs).
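A rough sketch of that BuildKit-only behaviour: the conditional expression is the one from BentoML's container module, while the function `render_install_instruction` and the way the RUN line is assembled are hypothetical and only for illustration.

```python
def render_install_instruction(enable_buildkit: bool) -> str:
    """Render the RUN line for install.sh; the pip cache mount is added only for BuildKit."""
    mount = "--mount=type=cache,target=/root/.cache/pip " if enable_buildkit else ""
    return f"RUN {mount}bash -euxo pipefail /home/bentoml/bento/env/python/install.sh"


print(render_install_instruction(enable_buildkit=True))
print(render_install_instruction(enable_buildkit=False))
```

With `enable_buildkit=False` the instruction carries no cache mount, so a builder on that path relies on layer caching alone to skip the installation.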

BentoML/src/bentoml/_internal/container/__init__.py

Line 189 in 6b00baf

"--mount=type=cache,target=/root/.cache/pip " if enable_buildkit else ""

To be more specific, I have two versions of a Bento created from the same source code:

 bentoml list

 Tag                                 Size       Creation Time        Path                                                
 movie_recommender:25svplelmkcu3tdd  9.33 MiB   2023-01-03 19:54:02  ~/bentoml/bentos/movie_recommender/25svplelmkcu3tdd 
 movie_recommender:vjsd7yelmw2vtgiq  9.33 MiB   2023-01-03 19:54:02  ~/bentoml/bentos/movie_recommender/vjsd7yelmw2vtgiq 
 iris_classifier:3mfgmqelmkemhbza    24.00 KiB  2023-01-03 19:34:00  ~/bentoml/bentos/iris_classifier/3mfgmqelmkemhbza

Because the two movie_recommender versions above are generated from the same source code and have the same env folder, I expected the environment installation to be cached when building the second Bento, so that the two Docker images would share the same layer. However, that is not what happens: the two images end up with layers that have different digests.
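One way to check how many layers two images actually share is to compare their layer lists. The sketch below assumes both images exist locally and that the Docker CLI is on the PATH; the helper names are made up for illustration.

```python
import json
import subprocess


def image_layers(image: str) -> list[str]:
    """Return the layer digests of a local image via `docker image inspect`."""
    out = subprocess.run(
        ["docker", "image", "inspect", "--format", "{{json .RootFS.Layers}}", image],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return json.loads(out)


def shared_prefix(layers_a: list[str], layers_b: list[str]) -> list[str]:
    """Return the leading layers that two images have in common, in order."""
    common = []
    for a, b in zip(layers_a, layers_b):
        if a != b:
            break  # every later layer sits on a different parent
        common.append(a)
    return common
```

For example, assuming the images were tagged with the bento tags, `shared_prefix(image_layers("movie_recommender:25svplelmkcu3tdd"), image_layers("movie_recommender:vjsd7yelmw2vtgiq"))` would list the layers the two images have in common from the base image upward.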

This means we spend a lot of unnecessary storage on these two images, and updating an image takes much longer because we have to pull again almost all of a layer that accounts for more than 98% of the image size. I think improving this would greatly help CI/CD-based deployments, or simply let users pull new images much faster.

Hope you can consider this idea. If there is something I don't understand correctly about how BentoML works, please let me know. Sorry about my bad English :(

niits changed the title ~~feature: split environment settings to support cache when building docker~~ feature: split instruction used to set up environment in order to support caching when building Docker Feb 16, 2023

aarnphm mentioned this issue Feb 17, 2023

rfc: build improvement #3580

Open

smidm mentioned this issue Mar 15, 2023

feat(containerize): caching pip/conda installation layers #3673

Merged

4 tasks

aarnphm closed this as completed in #3673 Mar 16, 2023
