minor changes to use the latest spark binary - thanks #3

Status: Open. Wants to merge 11 commits into base branch `master`.
28 changes: 24 additions & 4 deletions Dockerfile
@@ -1,5 +1,6 @@
FROM ubuntu:14.04


MAINTAINER Pakhomov Egor <[email protected]>

RUN apt-get -y update
@@ -10,21 +11,40 @@ RUN /bin/echo debconf shared/accepted-oracle-license-v1-1 select true | /usr/bin
RUN DEBIAN_FRONTEND=noninteractive apt-get -y install oracle-java7-installer oracle-java7-set-default

RUN apt-get -y install curl
RUN curl -s http://d3kbcqa49mib13.cloudfront.net/spark-1.3.0-bin-hadoop2.4.tgz | tar -xz -C /usr/local/
RUN cd /usr/local && ln -s spark-1.3.0-bin-hadoop2.4 spark
RUN apt-get -y update
RUN apt-get install -y python-numpy python-pandas
RUN apt-get install -y python-pip
RUN apt-get install -y libopenblas-dev liblapack-dev liblapacke-dev libatlas-base-dev libatlas-dev


RUN pip install requests
RUN pip install boto


RUN curl -s http://d3kbcqa49mib13.cloudfront.net/spark-1.5.1-bin-hadoop2.6.tgz | tar -xz -C /usr/local/
RUN cd /usr/local && ln -s spark-1.5.1-bin-hadoop2.6 spark
ADD scripts/start-master.sh /start-master.sh
ADD scripts/start-worker /start-worker.sh

ADD scripts/spark-shell.sh /spark-shell.sh
ADD scripts/spark-defaults.conf /spark-defaults.conf
ADD scripts/remove_alias.sh /remove_alias.sh
ENV SPARK_HOME /usr/local/spark


RUN cp $SPARK_HOME/conf/spark-env.sh.template $SPARK_HOME/conf/spark-env.sh
RUN /bin/echo "export SPARK_WORKER_INSTANCES=2" >> $SPARK_HOME/conf/spark-env.sh
RUN ln -s $SPARK_HOME/sbin/start-slaves.sh /start-slaves.sh
RUN ln -s $SPARK_HOME/sbin/start-slave.sh /start-slave.sh


ENV SPARK_MASTER_OPTS="-Dspark.driver.port=7001 -Dspark.fileserver.port=7002 -Dspark.broadcast.port=7003 -Dspark.replClassServer.port=7004 -Dspark.blockManager.port=7005 -Dspark.executor.port=7006 -Dspark.ui.port=4040 -Dspark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory"
ENV SPARK_WORKER_OPTS="-Dspark.driver.port=7001 -Dspark.fileserver.port=7002 -Dspark.broadcast.port=7003 -Dspark.replClassServer.port=7004 -Dspark.blockManager.port=7005 -Dspark.executor.port=7006 -Dspark.ui.port=4040 -Dspark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory"

ENV SPARK_MASTER_PORT 7077
ENV SPARK_MASTER_WEBUI_PORT 8080
ENV SPARK_WORKER_PORT 8888
ENV SPARK_WORKER_WEBUI_PORT 8081
ENV SPARK_WORKER_WEBUI_PORT 9091

EXPOSE 8080 7077 8888 9091 4040 7001 7002 7003 7004 7005 7006

EXPOSE 8080 7077 8888 8081 4040 7001 7002 7003 7004 7005 7006
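The Dockerfile's upgrade pattern is to unpack a version-pinned directory and point a stable `spark` symlink at it, so `$SPARK_HOME=/usr/local/spark` and every downstream script survive version bumps unchanged. A minimal standalone sketch of that idiom, exercised in a temp directory rather than the real image:

```shell
# Sketch of the Dockerfile's symlink idiom, using a temp dir as a stand-in
# for /usr/local so it can run anywhere.
tmp=$(mktemp -d)
mkdir "$tmp/spark-1.5.1-bin-hadoop2.6"          # version-pinned unpack target
ln -s spark-1.5.1-bin-hadoop2.6 "$tmp/spark"    # stable alias for SPARK_HOME
# Scripts reference the stable path; the link records the pinned version:
readlink "$tmp/spark"   # -> spark-1.5.1-bin-hadoop2.6
```

Upgrading then only means changing the tarball URL and the symlink target, exactly the two lines this PR touches.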
4 changes: 4 additions & 0 deletions README.md
@@ -14,6 +14,10 @@ To run worker execute:
```
./start-worker.sh
```
or start 2 slaves:

$SPARK_HOME/sbin/start-slave.sh $SPARK_MASTER_IP:7077

You can run multiple workers. Every worker will be able to find the master by its container name "spark_master".

To run spark shell against this cluster execute:
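The README's `$SPARK_MASTER_IP:7077` is the standalone master endpoint that workers and shells connect to. A minimal sketch of how the scripts assemble the full `spark://` URL from the environment (the values below are placeholders standing in for what the container provides):

```shell
# Placeholder values; in a real container these come from the environment
# set up by the start scripts and `docker run --link`.
SPARK_MASTER_IP=spark_master
SPARK_MASTER_PORT=7077
MASTER_URL="spark://${SPARK_MASTER_IP}:${SPARK_MASTER_PORT}"
echo "$MASTER_URL"   # -> spark://spark_master:7077
```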
2 changes: 1 addition & 1 deletion scripts/spark-shell.sh
@@ -3,7 +3,7 @@ export SPARK_LOCAL_IP=`awk 'NR==1 {print $1}' /etc/hosts`
/remove_alias.sh # problems with hostname alias, see https://issues.apache.org/jira/browse/SPARK-6680
cd /usr/local/spark
./bin/spark-shell \
--master spark://${SPARK_MASTER_PORT_7077_TCP_ADDR}:${SPARK_MASTER_ENV_SPARK_MASTER_PORT} \
--master spark://${SPARK_MASTER_IP}:${SPARK_MASTER_PORT} \
-i ${SPARK_LOCAL_IP} \
--properties-file /spark-defaults.conf \
"$@"
4 changes: 2 additions & 2 deletions start-master.sh
@@ -1,3 +1,3 @@
#!/usr/bin/env bash
docker pull epahomov/docker-spark
docker run -d -t -P --name spark_master epahomov/docker-spark /start-master.sh "$@"
docker pull meyerson/docker-spark
docker run -d -t -P --name spark_master meyerson/docker-spark /start-master.sh "$@"
4 changes: 2 additions & 2 deletions start-worker.sh
@@ -1,3 +1,3 @@
#!/usr/bin/env bash
docker pull epahomov/docker-spark
docker run -d -t -P --link spark_master:spark_master epahomov/docker-spark /start-worker.sh "$@"
docker pull meyerson/docker-spark
docker run -d -t -P --link spark_master:spark_master meyerson/docker-spark /start-worker.sh "$@"