chore: use --no-cache-dir flag to pip in dockerfiles, to save space #11352
Using the "--no-cache-dir" flag with pip install ensures that packages downloaded by pip are not cached on the system. This is a best practice that makes pip fetch from the repository instead of using a locally cached copy. Further, in the case of Docker containers, disabling the cache reduces image size. In terms of numbers, the saving depends on the number of Python packages multiplied by their respective sizes; for heavy packages with many dependencies, not caching pip packages saves a lot. More detailed information can be found at https://medium.com/sciforce/strategies-of-docker-images-optimization-2ca9cc5719b6
Signed-off-by: Pratik Raj <[email protected]>
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst)
Perhaps it might be better to add the no cache dir environment variable, so all pip instructions automatically refrain from using the cache without having to explicitly add it to each call?
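A minimal sketch of what that suggestion could look like, assuming a reasonably recent pip. The base image tag and the package name are placeholders chosen for illustration, not taken from the Airflow Dockerfile. PIP_NO_CACHE_DIR is the environment-variable form of the --no-cache-dir option; older pip releases handled its value inconsistently, as the pip issues cited further down this thread describe.

```dockerfile
# Sketch only: the base image and the package below are placeholders.
FROM python:3.8-slim

# Environment-variable form of --no-cache-dir. Once set, every later
# "pip install" in this image skips the cache without needing the flag.
# Assumes a pip version that treats "1" as "cache disabled".
ENV PIP_NO_CACHE_DIR=1

# No explicit --no-cache-dir needed here.
RUN pip install requests
```

Images built FROM this one inherit the ENV setting, which is what makes the per-call flag unnecessary in dependent images.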
That's a good idea actually
It's what we use so we don't have to worry about adding that flag in all our dependent images 🙂
:) @Rajpratik71 Can you update the PR to use the env var instead?
On examining it, I noticed that this is a multi-stage Docker build, with a build image and the main image. All the dependencies are installed in the builder image, and after the build only the main image is used and pushed. So there is no need for this PR. Hence, closing.
As for this, old versions of pip have conflicts, which give the errors mentioned at pypa/pip/issues/5385 and pypa/pip/issues/5735. It is fixed in the latest versions at
@Rajpratik71 . Exactly. That is not a good idea. We have a multi-segmented build and the "pip install" step is done in the "build" segment. Then only the installed Python libraries from "${HOME}/.local" are copied to the final image using COPY --from. It's actually even better to keep the pip cache, because it makes rebuilds of the image much faster. In the build segment we run pip install twice: the first time to install the "current master" dependencies and then, when we build the image, with the actual dependencies from sources. This way we get faster rebuilds when setup.py changes, and we do not have to re-install everything from scratch when we iterate on the image (for example when we are running kubernetes tests). So removing the cache in this case is not a good idea at all.
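The layout described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual Airflow Dockerfile: the base image, the stage names, and the package are placeholders, and the path assumes the build runs as root so that "${HOME}/.local" is /root/.local.

```dockerfile
# Sketch only: image tag, stage names, and package are placeholders.

# "build" segment: pip runs here, and its cache stays in this stage.
FROM python:3.8-slim AS build
# --user installs into ${HOME}/.local (/root/.local when running as root).
RUN pip install --user requests

# Final image: only the installed libraries are copied over.
FROM python:3.8-slim AS main
COPY --from=build /root/.local /root/.local
# Make console scripts installed with --user available on PATH.
ENV PATH=/root/.local/bin:${PATH}
```

Because the pip cache directory lives outside /root/.local and is never copied, it does not add to the size of the final image, while the build stage can still benefit from it.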