Skip to content

Commit

Permalink
[SPARK-46005][INFRA] Upgrade scikit-learn to 1.3.2 and mlflow to …
Browse files Browse the repository at this point in the history
…2.8.1

### What changes were proposed in this pull request?
1, upgrade `scikit-learn` to the latest 1.3.2
2, sepcify the lower bound of `mlflow` (`mlflow==2.8.1` was already used in CI)

### Why are the changes needed?
`scikit-learn` was pinned in apache#39467 due to a [mlflow issue](apache#39467 (comment))

seems it has been already fixed

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
ci

### Was this patch authored or co-authored using generative AI tooling?
no

Closes apache#43905 from zhengruifeng/infra_upgrade_sklearn.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
  • Loading branch information
zhengruifeng authored and dongjoon-hyun committed Nov 20, 2023
1 parent 9c8d418 commit 8dcdac2
Showing 1 changed file with 3 additions and 3 deletions.
6 changes: 3 additions & 3 deletions dev/infra/Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -93,7 +93,7 @@ RUN Rscript -e "devtools::install_version('preferably', version='0.4', repos='ht
ENV R_LIBS_SITE "/usr/local/lib/R/site-library:${R_LIBS_SITE}:/usr/lib/R/library"

RUN pypy3 -m pip install numpy 'pandas<=2.1.3' scipy coverage matplotlib
RUN python3.9 -m pip install numpy 'pyarrow>=14.0.0' 'pandas<=2.1.3' scipy unittest-xml-reporting plotly>=4.8 'mlflow>=2.3.1' coverage matplotlib openpyxl 'memory-profiler==0.60.0' 'scikit-learn==1.1.*'
RUN python3.9 -m pip install numpy 'pyarrow>=14.0.0' 'pandas<=2.1.3' scipy unittest-xml-reporting plotly>=4.8 'mlflow>=2.8.1' coverage matplotlib openpyxl 'memory-profiler==0.60.0' 'scikit-learn>=1.3.2'

# Add Python deps for Spark Connect.
RUN python3.9 -m pip install 'grpcio>=1.48,<1.57' 'grpcio-status>=1.48,<1.57' 'protobuf==4.25.1' 'googleapis-common-protos==1.56.4'
Expand All @@ -110,7 +110,7 @@ RUN apt-get update && apt-get install -y \
python3.10 python3.10-distutils \
&& rm -rf /var/lib/apt/lists/*
RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.10
RUN python3.10 -m pip install numpy 'pyarrow>=14.0.0' 'pandas<=2.1.3' scipy unittest-xml-reporting plotly>=4.8 'mlflow>=2.3.1' coverage matplotlib openpyxl 'memory-profiler==0.60.0' 'scikit-learn==1.1.*'
RUN python3.10 -m pip install numpy 'pyarrow>=14.0.0' 'pandas<=2.1.3' scipy unittest-xml-reporting plotly>=4.8 'mlflow>=2.8.1' coverage matplotlib openpyxl 'memory-profiler==0.60.0' 'scikit-learn>=1.3.2'
RUN python3.10 -m pip install 'grpcio>=1.48,<1.57' 'grpcio-status>=1.48,<1.57' 'protobuf==4.25.1' 'googleapis-common-protos==1.56.4'
RUN python3.10 -m pip install 'torch<=2.0.1' torchvision --index-url https://download.pytorch.org/whl/cpu
RUN python3.10 -m pip install torcheval
Expand All @@ -122,7 +122,7 @@ RUN apt-get update && apt-get install -y \
python3.11 python3.11-distutils \
&& rm -rf /var/lib/apt/lists/*
RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.11
RUN python3.11 -m pip install numpy 'pyarrow>=14.0.0' 'pandas<=2.1.3' scipy unittest-xml-reporting plotly>=4.8 'mlflow>=2.3.1' coverage matplotlib openpyxl 'memory-profiler==0.60.0' 'scikit-learn==1.1.*'
RUN python3.11 -m pip install numpy 'pyarrow>=14.0.0' 'pandas<=2.1.3' scipy unittest-xml-reporting plotly>=4.8 'mlflow>=2.8.1' coverage matplotlib openpyxl 'memory-profiler==0.60.0' 'scikit-learn>=1.3.2'
RUN python3.11 -m pip install 'grpcio>=1.48,<1.57' 'grpcio-status>=1.48,<1.57' 'protobuf==4.25.1' 'googleapis-common-protos==1.56.4'
RUN python3.11 -m pip install 'torch<=2.0.1' torchvision --index-url https://download.pytorch.org/whl/cpu
RUN python3.11 -m pip install torcheval
Expand Down

0 comments on commit 8dcdac2

Please sign in to comment.