Cassandra tasks getting LOST status in mesos #121
Comments
Thanks @AndriiOmelianenko, I'll try to reproduce this in the next couple of days and get back to you. I'm not sure why a Spark job would kill off one of the cassandra executors.
@BenWhitehead thanks. I'm running a DCOS cluster in an OpenStack environment, and everything works fine until I run a Spark job :)
The logs of the cassandra/hdfs/kafka frameworks don't say anything useful; they just end with successful output.

So this is what happens when the tasks are getting LOST status:
Hmm, thanks for the additional details @AndriiOmelianenko. Can you show me the configuration flags that are used for the slaves? Cassandra doesn't run in a Docker container, but it looks like it may be the Docker containerizer that is trying to run the tasks, which it won't be able to do. To get the flags of a slave you can hit the Mesos slave HTTP API; in the flags I would expect to see the containerizer configuration.
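A minimal sketch of pulling the containerizer setting from a slave, assuming the standard Mesos agent `state.json` endpoint on the default agent port 5051 (the host argument is a placeholder for one of your slave IPs):

```shell
# Print the "containerizers" flag reported by a Mesos slave (agent).
# Assumes the agent exposes /state.json on port 5051 (the default).
get_containerizers() {
  curl -s "http://$1:5051/state.json" \
    | python3 -c 'import json,sys; print(json.load(sys.stdin)["flags"].get("containerizers"))'
}

# Usage: get_containerizers 10.0.0.42
```

If this prints something like `docker,mesos`, both containerizers are enabled on that slave; the full `flags` object in the same response shows the rest of the slave's startup configuration.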
@BenWhitehead yes, there are such options.
Thanks @AndriiOmelianenko. I spent some time trying to reproduce the behavior you're seeing, as I've never seen it before (nor have my colleagues). I started a DCOS cluster on AWS, then ran the following commands:
After all frameworks started their tasks and everything was healthy I ran:
The Spark job completed successfully, everything else kept running, and the cluster is still healthy. Can you create a gist with the following info about your cluster?

- Installed packages and versions
- Installed dcos services
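The requested info can be gathered with the DCOS CLI; a sketch that collects it into one gist-ready file (command names assumed from the DCOS CLI — adjust to your version):

```shell
# Collect cluster info for the gist. Assumes the dcos CLI is
# installed and pointed at your cluster.
{
  echo "== dcos cli version =="
  dcos --version
  echo "== installed packages =="
  dcos package list
  echo "== running services/frameworks =="
  dcos service
} > cluster-info.txt
```

The contents of `cluster-info.txt` can then be pasted into a gist directly.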
@AndriiOmelianenko I've finally been able to reproduce this task lost issue. A fix is in PR #129 and will ship in version 0.2.0, which will be released soon.
I have deployed a DCOS cluster and installed cassandra and spark on it.
I'm running a Spark job on one of the masters:
dcos spark run --submit-args='--class org.apache.spark.examples.SparkPi http://downloads.mesosphere.com.s3.amazonaws.com/assets/spark/spark-examples_2.10-1.4.0-SNAPSHOT.jar 10'
and after it finishes execution, a few cassandra executors fail. In Mesos it looks like this. The Spark job ran successfully (stdout):
Some cassandra executors can't even come back up after this; they keep getting LOST status every few seconds with the following stderr. Can anyone help me with this?