Multistage docker (flashlight#396)
Summary:
**Original Issue**: This discussion started [here](flashlight#225 (comment))

The main point of this PR is to update the Dockerfiles to use a multistage build, reducing build time and final image size. It also adds two small but important features:

- `.dockerignore` to avoid passing unnecessary files to the build context
- `ccache` support to CMake
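
For readers unfamiliar with the pattern, the following is a minimal sketch of a multistage build with a ccache-wrapped CMake step. It is not taken from this PR: the stage name `build`, the `/tmp/project` and `/opt/project` paths, and the package list are hypothetical, and passing `CMAKE_<LANG>_COMPILER_LAUNCHER` on the command line is only one common way to enable ccache, assumed here for illustration.

```dockerfile
# Hypothetical sketch: names, paths and packages are illustrative only.

# ---- Stage 1: build stage with compilers, CMake and ccache ----
FROM ubuntu:18.04 AS build
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential cmake ccache && \
    rm -rf /var/lib/apt/lists/*

# Paths listed in .dockerignore are never sent to the build context,
# so they are not copied by this instruction.
COPY . /tmp/project

# CMAKE_<LANG>_COMPILER_LAUNCHER tells CMake to prefix compiler calls with ccache.
RUN mkdir -p /tmp/project/build && \
    cd /tmp/project/build && \
    cmake -DCMAKE_BUILD_TYPE=Release \
          -DCMAKE_INSTALL_PREFIX=/opt/project \
          -DCMAKE_C_COMPILER_LAUNCHER=ccache \
          -DCMAKE_CXX_COMPILER_LAUNCHER=ccache .. && \
    make -j"$(nproc)" && \
    make install

# ---- Stage 2: final image, containing only the installed artifacts ----
FROM ubuntu:18.04
COPY --from=build /opt/project /opt/project
```

Only the files copied with `COPY --from=build` reach the final image, which is where the size reduction comes from. Inside `docker build`, ccache only pays off across rebuilds if its cache directory survives between builds (for example through a BuildKit cache mount); that detail is left out of the sketch.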

> I'm sure that most of the packages installed with apt on some builds and especially on the final image can be removed. But you know better which ones to take out. This would further reduce the final image size

| image                          | build-time | size   |
|--------------------------------|------------|--------|
| cpu-base-consolidation-latest  | 454.4s     | 4.76GB |
| cpu-latest                     | 433.6s     | 3.38GB |
| cuda-base-consolidation-latest | 970.3s     | 13.7GB |
| cuda-latest                    | 440.3s     | 9.31GB |

Pull Request resolved: flashlight#396

Reviewed By: jacobkahn

Differential Revision: D25820782

Pulled By: tlikhomanenko

fbshipit-source-id: eef3d0550dee862d9bbfb24aa06f716c92ef75e8
Alejandro Gaston Alvarez Franceschi authored and facebook-github-bot committed Jan 26, 2021
1 parent b0fba70 commit 1f002f0
Showing 6 changed files with 422 additions and 213 deletions.
73 changes: 65 additions & 8 deletions .docker/Dockerfile-CPU
@@ -4,18 +4,75 @@
# flashlight master (git, CPU backend)
# ==================================================================

FROM flml/flashlight:cpu-base-consolidation-latest

ENV MKLROOT="/opt/intel/mkl"
FROM flml/flashlight:cpu-base-consolidation-latest as build

# ==================================================================
# flashlight with CPU backend
# ------------------------------------------------------------------
# Setup and build flashlight
RUN mkdir /root/flashlight
RUN mkdir /tmp/flashlight

COPY . /tmp/flashlight

RUN mkdir -p /tmp/flashlight/build && \
cd /tmp/flashlight/build && \
cmake -DCMAKE_BUILD_TYPE="Release" -DCMAKE_INSTALL_PREFIX="/opt/flashlight" -DFL_BACKEND="CPU" -DFL_LIBRARIES_USE_CUDA="OFF" -DFL_BUILD_TESTS="ON" -DDNNL_DIR="/opt/onednn/lib/cmake/dnnl" -DGloo_INCLUDE_DIR="/opt/gloo/include" -DGloo_NATIVE_LIBRARY="/opt/gloo/lib/libgloo.a" .. && \
make -j$(nproc) && \
make install -j$(nproc)



COPY . /root/flashlight
#############################################################################
# SECOND STAGE #
#############################################################################

RUN cd /root/flashlight && mkdir -p build && \
cd build && cmake .. -DFL_BACKEND=CPU -DFL_LIBRARIES_USE_CUDA=OFF && \
make -j$(nproc) && make install
FROM ubuntu:18.04

ENV DEBIAN_FRONTEND=noninteractive

# install dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
g++ \
# for cmake
zlib1g-dev libcurl4-openssl-dev \
# for MKL
apt-transport-https \
gpg-agent gnupg2 \
# for arrayfire CPU backend
# OpenBLAS
libopenblas-dev libfftw3-dev liblapacke-dev \
# ATLAS
libatlas3-base libatlas-base-dev libfftw3-dev liblapacke-dev \
# ssh for OpenMPI
openssh-server openssh-client \
# OpenMPI
libopenmpi-dev libomp-dev openmpi-bin \
# libsndfile
libsndfile1-dev \
# for libsndfile for ubuntu 18.04
libopus-dev \
# FFTW
libfftw3-dev \
# for kenlm
zlib1g-dev libbz2-dev liblzma-dev \
# gflags
libgflags-dev libgflags2.2 \
# for glog
libgoogle-glog-dev libgoogle-glog0v5 \
# for recipes data processing
sox \
# for python
python3-dev python3-pip python3-distutils && \
apt-get clean && \
apt-get -y autoremove && \
rm -rf /var/lib/apt/lists/*

COPY --from=build /opt/arrayfire /opt/arrayfire
COPY --from=build /opt/kenlm /opt/kenlm
COPY --from=build /opt/intel /opt/intel
COPY --from=build /opt/boost /opt/boost
COPY --from=build /opt/flashlight /opt/flashlight

ENV KENLM_ROOT_DIR=/opt/kenlm
ENV MKLROOT="/opt/intel/mkl"
ENV LD_LIBRARY_PATH="/opt/boost/lib:$LD_LIBRARY_PATH"
261 changes: 151 additions & 110 deletions .docker/Dockerfile-CPU-Base
@@ -3,38 +3,144 @@
# ------------------------------------------------------------------
# Ubuntu 18.04
# OpenMPI latest (apt)
# cmake 3.10 (git)
# MKL 2018.4-057 (apt)
# arrayfire 3.7.1 (git, CPU backend)
# oneDNN 2.0 (git)
# Gloo b7e0906 (git)
# libsndfile 4bdd741 (git)
# cmake 3.10 (official binaries)
# MKL 2020.4-912 (apt)
# arrayfire 3.7.3 (git, CPU backend)
# libsndfile latest (apt, v1.0.28-4)
# oneDNN master (git)
# Gloo 1da2117 (git)
# FFTW latest (apt)
# KenLM 4a27753 (git)
# GLOG latest (apt)
# gflags latest (apt)
# python 3.6 (apt)
# python3 latest (apt)
# boost 1.75.0 (source)
# ==================================================================

FROM ubuntu:18.04
FROM ubuntu:18.04 as cpu_base_builder

ENV APT_INSTALL="apt-get install -y --no-install-recommends"
ENV MKLROOT="/opt/intel/mkl"
ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update && \
DEBIAN_FRONTEND=noninteractive $APT_INSTALL \
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
ca-certificates \
wget \
curl \
git \
g++
# for cmake
#zlib1g-dev libcurl4-openssl-dev \
# for MKL
#apt-transport-https \
#gpg-agent gnupg2 \
# ssh for OpenMPI
#openssh-server openssh-client \
## gflags
#libgflags-dev libgflags2.2 \
## for glog
#libgoogle-glog-dev libgoogle-glog0v5 \

# Install boost 1.75
RUN cd /tmp && \
curl -sSLo - https://dl.bintray.com/boostorg/release/1.75.0/source/boost_1_75_0.tar.bz2 | tar --bzip2 -x && \
cd boost_1_75_0 && \
./bootstrap.sh --prefix=/opt/boost && \
./b2 && \
./b2 install
ENV LD_LIBRARY_PATH="/opt/boost/lib:$LD_LIBRARY_PATH"

# cmake 3.10
RUN mkdir -p /opt/cmake && \
curl -sSLo - https://cmake.org/files/v3.10/cmake-3.10.3-Linux-x86_64.tar.gz | tar -xz -C /opt/cmake --strip-components 1
ENV PATH="/opt/cmake/bin:$PATH"
# ==================================================================
# arrayfire with CPU backend https://github.com/arrayfire/arrayfire/wiki/Build-Instructions-for-Linux
# ------------------------------------------------------------------
FROM cpu_base_builder as cpu_arrayfire

RUN apt-get update && apt-get install -y --no-install-recommends \
# OpenBLAS
libopenblas-dev liblapacke-dev \
# ATLAS
libatlas3-base libatlas-base-dev liblapacke-dev \
# FFTW
libfftw3-dev

# Install ArrayFire
RUN cd /tmp && \
git clone --branch v3.7.3 --depth 1 --recursive --shallow-submodules https://github.com/arrayfire/arrayfire.git && \
mkdir -p arrayfire/build && \
cd arrayfire/build && \
cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/opt/arrayfire -DAF_BUILD_CUDA=OFF -DAF_BUILD_CPU=ON -DAF_BUILD_OPENCL=OFF -DAF_BUILD_EXAMPLES=OFF -DBUILD_TESTING=OFF -DAF_WITH_IMAGEIO=OFF .. && \
make -j$(nproc) && \
make install -j$(nproc)

# ==================================================================
# oneDNN https://github.com/oneapi-src/oneDNN#requirements-for-building-from-source
# ------------------------------------------------------------------
FROM cpu_base_builder as cpu_onednn

# Install OpenCL ICD loader and TBB
RUN apt-get update && apt-get install -y --no-install-recommends \
ocl-icd-opencl-dev libtbb-dev

RUN cd /tmp && \
git clone --branch v2.0 --depth 1 https://github.com/oneapi-src/onednn.git && \
mkdir -p onednn/build && \
cd onednn/build && \
cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/opt/onednn -DWITH_EXAMPLE=OFF -DWITH_TEST=OFF .. && \
make -j$(nproc) && \
make install -j$(nproc)
# ==================================================================
# Gloo https://github.com/facebookincubator/gloo.git
# ------------------------------------------------------------------
FROM cpu_base_builder as cpu_gloo

# Install OpenMPI
RUN apt-get update && apt-get install -y --no-install-recommends \
libopenmpi-dev

RUN cd /tmp && \
git clone --depth 1 https://github.com/facebookincubator/gloo.git && \
mkdir -p gloo/build && \
cd gloo/build && \
cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/opt/gloo -DUSE_MPI=ON .. && \
make -j$(nproc) && \
make install -j$(nproc)
# ==================================================================
# KenLM https://github.com/kpu/kenlm/blob/master/BUILDING
# ------------------------------------------------------------------
FROM cpu_base_builder as cpu_kenlm

# Install gzip, bz2 and xz
RUN apt-get update && apt-get install -y --no-install-recommends \
zlib1g-dev libbz2-dev liblzma-dev

RUN cd /tmp && \
git clone --depth 1 https://github.com/kpu/kenlm.git && \
mkdir -p kenlm/build && \
cd kenlm/build && \
cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/opt/kenlm -DCMAKE_POSITION_INDEPENDENT_CODE=ON .. && \
make -j$(nproc) && \
make install -j$(nproc)



#############################################################################
# SECOND STAGE #
#############################################################################

FROM cpu_base_builder as final

# install dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
ca-certificates \
curl \
git \
vim \
emacs \
nano \
htop \
g++ \
# for cmake
zlib1g-dev libcurl4-openssl-dev \
# for MKL
apt-transport-https \
gpg-agent gnupg2 \
apt-transport-https gpg-agent gnupg2 \
# for arrayfire CPU backend
# OpenBLAS
libopenblas-dev libfftw3-dev liblapacke-dev \
@@ -44,108 +150,43 @@ RUN apt-get update && \
openssh-server openssh-client \
# OpenMPI
libopenmpi-dev libomp-dev openmpi-bin \
# for libsndfile
autoconf automake autogen build-essential libasound2-dev \
libflac-dev libogg-dev libtool libvorbis-dev pkg-config python \
# libsndfile
libsndfile1-dev \
# for libsndfile for ubuntu 18.04
libopus-dev \
# FFTW
libfftw3-dev \
# for kenlm
zlib1g-dev libbz2-dev liblzma-dev libboost-all-dev \
zlib1g-dev libbz2-dev liblzma-dev \
# gflags
libgflags-dev libgflags2.2 \
# for glog
libgoogle-glog-dev libgoogle-glog0v5 \
# for python sox
sox
# ==================================================================
# cmake 3.10
# ------------------------------------------------------------------
RUN apt-get purge -y cmake && \
# for cmake
DEBIAN_FRONTEND=noninteractive $APT_INSTALL zlib1g-dev libcurl4-openssl-dev && \
cd /tmp && wget https://cmake.org/files/v3.10/cmake-3.10.3.tar.gz && \
tar -xzvf cmake-3.10.3.tar.gz && cd cmake-3.10.3 && \
./bootstrap --system-curl && \
make -j$(nproc) && make install && cmake --version
# ==================================================================
# MKL https://software.intel.com/en-us/mkl
# ------------------------------------------------------------------
RUN cd /tmp && wget https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS-2019.PUB && \
apt-key add GPG-PUB-KEY-INTEL-SW-PRODUCTS-2019.PUB && \
# for build in March 2020 this row should be removed to prevent from the error
# E: Failed to fetch https://apt.repos.intel.com/intelpython/binary/Packages Writing more data than expected (15520 > 15023) E: Some index files failed to download. They have been ignored, or old ones used instead.
# wget https://apt.repos.intel.com/setup/intelproducts.list -O /etc/apt/sources.list.d/intelproducts.list && \
sh -c 'echo deb https://apt.repos.intel.com/mkl all main > /etc/apt/sources.list.d/intel-mkl.list' && \
apt-get update && DEBIAN_FRONTEND=noninteractive $APT_INSTALL intel-mkl-64bit-2018.4-057
# ==================================================================
# arrayfire with CPU backend https://github.com/arrayfire/arrayfire/wiki/
# ------------------------------------------------------------------
RUN cd /tmp && git clone --recursive https://github.com/arrayfire/arrayfire.git && \
wget https://dl.bintray.com/boostorg/release/1.70.0/source/boost_1_70_0.tar.gz && tar xf boost_1_70_0.tar.gz && \
cd arrayfire && git checkout v3.7.1 && git submodule update --init --recursive && \
mkdir build && cd build && \
CXXFLAGS=-DOS_LNX cmake .. -DCMAKE_BUILD_TYPE=Release -DAF_BUILD_CUDA=OFF -DAF_BUILD_OPENCL=OFF -DAF_BUILD_EXAMPLES=OFF -DBOOST_INCLUDEDIR=/tmp/boost_1_70_0 && \
make -j$(nproc) && make install
# ==================================================================
# oneDNN https://github.com/oneapi-src/oneDNN
# ------------------------------------------------------------------
RUN cd /tmp && git clone https://github.com/oneapi-src/oneDNN && \
cd oneDNN && git checkout v2.0 && mkdir -p build && cd build && \
cmake .. && \
make -j$(nproc) && make install
# ==================================================================
# Gloo https://github.com/facebookincubator/gloo.git
# ------------------------------------------------------------------
RUN cd /tmp && git clone https://github.com/facebookincubator/gloo.git && \
cd gloo && git checkout b7e0906 && mkdir build && cd build && \
cmake .. -DUSE_MPI=ON && \
make -j$(nproc) && make install
# ==================================================================
# python (for bindings)
# ------------------------------------------------------------------
RUN PIP_INSTALL="python3 -m pip --no-cache-dir install --upgrade" && \
DEBIAN_FRONTEND=noninteractive $APT_INSTALL \
software-properties-common \
&& \
add-apt-repository ppa:deadsnakes/ppa && \
apt-get update && \
DEBIAN_FRONTEND=noninteractive $APT_INSTALL \
python3.6 \
python3.6-dev \
&& \
wget -O ~/get-pip.py \
https://bootstrap.pypa.io/get-pip.py && \
python3.6 ~/get-pip.py && \
ln -s /usr/bin/python3.6 /usr/local/bin/python3 && \
ln -s /usr/bin/python3.6 /usr/local/bin/python && \
$PIP_INSTALL \
setuptools \
&& \
$PIP_INSTALL \
numpy \
# gtest
libgtest-dev \
# for recipes data processing
sox \
tqdm
# ==================================================================
# libsndfile https://github.com/erikd/libsndfile.git
# ------------------------------------------------------------------
RUN cd /tmp && git clone https://github.com/erikd/libsndfile.git && \
cd libsndfile && git checkout 4bdd7414602946a18799b514001b0570e8693a47 && \
./autogen.sh && ./configure --enable-werror && \
make && make check && make install
# for python
python3-dev python3-pip python3-distutils && \
apt-get clean && \
apt-get -y autoremove && \
rm -rf /var/lib/apt/lists/*

# ==================================================================
# KenLM https://github.com/kpu/kenlm
# python (for bindings)
# ------------------------------------------------------------------
RUN cd /tmp && git clone https://github.com/kpu/kenlm.git && \
cd kenlm && git checkout 4a277534fd33da323205e6ec256e8fd0ff6ee6fa && \
mkdir build && cd build && \
cmake .. -DCMAKE_POSITION_INDEPENDENT_CODE=ON && \
make -j$(nproc) && make install
RUN ln -s /usr/bin/python3 /usr/local/bin/python && \
python3 -m pip --no-cache-dir install --upgrade setuptools numpy sox tqdm

# ==================================================================
# config & cleanup
# Intel MKL https://software.intel.com/en-us/mkl
# ------------------------------------------------------------------
RUN ldconfig && \
apt-get clean && \
apt-get -y autoremove && \
rm -rf /var/lib/apt/lists/* /tmp/*
RUN cd /tmp && curl -sSLo - https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS-2019.PUB | apt-key add && \
sh -c 'echo deb https://apt.repos.intel.com/mkl all main > /etc/apt/sources.list.d/intel-mkl.list' && \
apt-get update && apt-get install -y intel-mkl-64bit-2020.4-912
ENV MKLROOT="/opt/intel/mkl"

COPY --from=cpu_arrayfire /opt/arrayfire /opt/arrayfire
COPY --from=cpu_kenlm /opt/kenlm /opt/kenlm
COPY --from=cpu_gloo /opt/gloo /opt/gloo
COPY --from=cpu_onednn /opt/onednn /opt/onednn