This repository has been archived by the owner on Jan 10, 2019. It is now read-only.

accept offers for default role * in addition to the specified mesosRole #124

Merged

@flosell flosell commented Jul 25, 2015

Currently, when a role is specified via CASSANDRA_FRAMEWORK_MESOS_ROLE, offers for the default * role are ignored.

Marathon and Chronos (since mesos/chronos#462) accept both * and the specified mesosRole in this case.
This makes it possible to share common resources while keeping some resources exclusive. For example, to run Marathon and cassandra-mesos side by side, slaves could offer mem, cpu and disk with role * but the Cassandra ports only for role cassandra.

This pull request adds this feature. What do you think?
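
As a rough illustration of the behaviour described above, here is a minimal sketch of filtering offered resources so that both the default role and the configured role are accepted. `OfferedResource` and `RoleFilter` are hypothetical stand-ins introduced for this example; they are not the Mesos protobuf types or the code in this pull request.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a Mesos resource: only the name and role matter here.
final class OfferedResource {
    final String name;
    final String role;

    OfferedResource(String name, String role) {
        this.name = name;
        this.role = role;
    }
}

// Hypothetical helper, not part of cassandra-mesos.
final class RoleFilter {
    static final String DEFAULT_ROLE = "*";

    private RoleFilter() {}

    // Keeps resources offered for the default role or for the configured role.
    static List<OfferedResource> usable(List<OfferedResource> offered, String mesosRole) {
        List<OfferedResource> result = new ArrayList<>();
        for (OfferedResource resource : offered) {
            if (DEFAULT_ROLE.equals(resource.role) || mesosRole.equals(resource.role)) {
                result.add(resource);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<OfferedResource> offered = Arrays.asList(
                new OfferedResource("cpus", "*"),
                new OfferedResource("ports", "cassandra"),
                new OfferedResource("mem", "other"));
        // Resources for roles other than "*" and "cassandra" are dropped.
        System.out.println(usable(offered, "cassandra").size());
    }
}
```

With this filter, cpu, mem and disk offered under * stay usable while ports reserved for the configured role are still picked up.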

@BenWhitehead

Thanks for submitting @flosell, feedback to follow shortly.

@@ -172,6 +180,83 @@ public static CassandraNodeTask getTaskForNode(@NotNull final CassandraNode cass
return builder;
}

@NotNull
public static Function<Resource, TreeSet<Long>> resourceToPortSet() {
return new Function<Resource, TreeSet<Long>>() {
Please make this function a private static final inner class, with a singleton instance returned by this method.

See https://github.com/mesosphere/cassandra-mesos/blob/master/cassandra-mesos-model/src/main/java/io/mesosphere/mesos/util/CassandraFrameworkProtosUtils.java#L176 for an example of how this has been done previously.
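
A minimal sketch of the pattern requested above, assuming a simplified input type: the function lives in a private static final nested class and the accessor returns one shared instance. `PortRanges` and `PortFunctions` are hypothetical names for this example, and `java.util.function.Function` is used only to keep the sketch self-contained; the `Function` type in the diff may be a different one.

```java
import java.util.TreeSet;
import java.util.function.Function;

// Hypothetical stand-in for a RANGES resource: inclusive [begin, end] pairs.
final class PortRanges {
    final long[][] ranges;

    PortRanges(long[][] ranges) {
        this.ranges = ranges;
    }
}

// Hypothetical utility class, not the project's CassandraFrameworkProtosUtils.
final class PortFunctions {
    private PortFunctions() {}

    // Returns the shared instance instead of allocating a new function per call.
    static Function<PortRanges, TreeSet<Long>> resourceToPortSet() {
        return ResourceToPortSet.INSTANCE;
    }

    private static final class ResourceToPortSet implements Function<PortRanges, TreeSet<Long>> {
        private static final ResourceToPortSet INSTANCE = new ResourceToPortSet();

        @Override
        public TreeSet<Long> apply(PortRanges input) {
            // Expand every inclusive range into its individual ports.
            TreeSet<Long> ports = new TreeSet<>();
            for (long[] range : input.ranges) {
                for (long port = range[0]; port <= range[1]; port++) {
                    ports.add(port);
                }
            }
            return ports;
        }
    }
}
```

Because the function is stateless, sharing a single instance is safe and avoids creating an anonymous class instance on every call.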

@BenWhitehead

I was able to come up with a scenario that fails when run against Mesos 0.23.0. Details below.

Start a slave with only a subset of the ports dedicated to the configured role (cass).

It looks like port 9160 is being claimed for the incorrect role * instead of cass.

Relevant Scheduler Logs
2015-07-27 18:57:27,032 DEBUG [Thread-48] i.m.m.f.c.s.CassandraScheduler - {} > resourceOffers(driver : org.apache.mesos.MesosSchedulerDriver@19675d97, offers : [id { value: "20150727-185130-16777343-5050-21272-O42" } framework_id { value: "20150727-185130-16777343-5050-21272-0000" } slave_id { value: "20150727-185130-16777343-5050-21272-S0" } hostname: "localhost" resources { name: "ports" type: RANGES ranges { range { begin: 1025 end: 2180 } range { begin: 2182 end: 3887 } range { begin: 3889 end: 5049 } range { begin: 5052 end: 8079 } range { begin: 8082 end: 9159 } range { begin: 9161 end: 65535 } } role: "*" } resources { name: "cpus" type: SCALAR scalar { value: 7.8 } role: "*" } resources { name: "mem" type: SCALAR scalar { value: 14333.0 } role: "*" } resources { name: "disk" type: SCALAR scalar { value: 224538.0 } role: "*" } resources { name: "ports" type: RANGES ranges { range { begin: 9160 end: 9160 } } role: "cass" } executor_ids { value: "cassandra.ben.node.0.executor" } ])
2015-07-27 18:57:27,033 DEBUG [Thread-48] i.m.m.f.c.s.CassandraScheduler - {offerId:20150727-185130-16777343-5050-21272-O42,hostname:localhost} > evaluateOffer(driver : id { value: "20150727-185130-16777343-5050-21272-O42" } framework_id { value: "20150727-185130-16777343-5050-21272-0000" } slave_id { value: "20150727-185130-16777343-5050-21272-S0" } hostname: "localhost" resources { name: "ports" type: RANGES ranges { range { begin: 1025 end: 2180 } range { begin: 2182 end: 3887 } range { begin: 3889 end: 5049 } range { begin: 5052 end: 8079 } range { begin: 8082 end: 9159 } range { begin: 9161 end: 65535 } } role: "*" } resources { name: "cpus" type: SCALAR scalar { value: 7.8 } role: "*" } resources { name: "mem" type: SCALAR scalar { value: 14333.0 } role: "*" } resources { name: "disk" type: SCALAR scalar { value: 224538.0 } role: "*" } resources { name: "ports" type: RANGES ranges { range { begin: 9160 end: 9160 } } role: "cass" } executor_ids { value: "cassandra.ben.node.0.executor" }, offer : {})
2015-07-27 18:57:27,033 DEBUG [Thread-48] i.m.m.f.c.scheduler.CassandraCluster - {offerId:20150727-185130-16777343-5050-21272-O42,hostname:localhost} > getTasksForOffer(offer : id { value: "20150727-185130-16777343-5050-21272-O42" } framework_id { value: "20150727-185130-16777343-5050-21272-0000" } slave_id { value: "20150727-185130-16777343-5050-21272-S0" } hostname: "localhost" resources { name: "ports" type: RANGES ranges { range { begin: 1025 end: 2180 } range { begin: 2182 end: 3887 } range { begin: 3889 end: 5049 } range { begin: 5052 end: 8079 } range { begin: 8082 end: 9159 } range { begin: 9161 end: 65535 } } role: "*" } resources { name: "cpus" type: SCALAR scalar { value: 7.8 } role: "*" } resources { name: "mem" type: SCALAR scalar { value: 14333.0 } role: "*" } resources { name: "disk" type: SCALAR scalar { value: 224538.0 } role: "*" } resources { name: "ports" type: RANGES ranges { range { begin: 9160 end: 9160 } } role: "cass" } executor_ids { value: "cassandra.ben.node.0.executor" })
2015-07-27 18:57:27,033 DEBUG [Thread-48] i.m.m.f.c.scheduler.CassandraCluster - {offerId:20150727-185130-16777343-5050-21272-O42,hostname:localhost} Attempting to launch server task for node.
2015-07-27 18:57:27,109 DEBUG [Thread-48] i.m.m.f.c.scheduler.CassandraCluster - {offerId:20150727-185130-16777343-5050-21272-O42,hostname:localhost} < getTasksForOffer(offer : id { value: "20150727-185130-16777343-5050-21272-O42" } framework_id { value: "20150727-185130-16777343-5050-21272-0000" } slave_id { value: "20150727-185130-16777343-5050-21272-S0" } hostname: "localhost" resources { name: "ports" type: RANGES ranges { range { begin: 1025 end: 2180 } range { begin: 2182 end: 3887 } range { begin: 3889 end: 5049 } range { begin: 5052 end: 8079 } range { begin: 8082 end: 9159 } range { begin: 9161 end: 65535 } } role: "*" } resources { name: "cpus" type: SCALAR scalar { value: 7.8 } role: "*" } resources { name: "mem" type: SCALAR scalar { value: 14333.0 } role: "*" } resources { name: "disk" type: SCALAR scalar { value: 224538.0 } role: "*" } resources { name: "ports" type: RANGES ranges { range { begin: 9160 end: 9160 } } role: "cass" } executor_ids { value: "cassandra.ben.node.0.executor" }) = {}, {}
2015-07-27 18:57:27,126 DEBUG [Thread-48] i.m.m.f.c.s.CassandraScheduler - {offerId:20150727-185130-16777343-5050-21272-O42,hostname:localhost} Launching task CASSANDRA_SERVER_RUN in executor cassandra.ben.node.0.executor. Details = name: "cassandra.ben.node" task_id { value: "cassandra.ben.node.0.executor.server" } slave_id { value: "20150727-185130-16777343-5050-21272-S0" } resources { name: "cpus" type: SCALAR scalar { value: 0.1 } role: "*" } resources { name: "mem" type: SCALAR scalar { value: 768.0 } role: "*" } resources { name: "disk" type: SCALAR scalar { value: 16.0 } role: "*" } resources { name: "ports" type: RANGES ranges { range { begin: 9042 end: 9042 } range { begin: 9160 end: 9160 } range { begin: 7199 end: 7199 } range { begin: 7001 end: 7001 } range { begin: 7000 end: 7000 } } role: "*" } executor { executor_id { value: "cassandra.ben.node.0.executor" } resources { name: "cpus" type: SCALAR scalar { value: 0.1 } role: "*" } resources { name: "mem" type: SCALAR scalar { value: 384.0 } role: "*" } resources { name: "disk" type: SCALAR scalar { value: 256.0 } role: "*" } command { uris { value: "http://bwN56.dev.mesosphere.io:18080/jre-7-linux.tar.gz" extract: true } uris { value: "http://bwN56.dev.mesosphere.io:18080/apache-cassandra-2.1.4-bin.tar.gz" extract: true } uris { value: "http://bwN56.dev.mesosphere.io:18080/cassandra-executor.jar" extract: false } environment { variables { name: "JAVA_OPTS" value: "-Xms256m -Xmx256m" } } value: "$(pwd)/jre*/bin/java -XX:+PrintCommandLineFlags $JAVA_OPTS -classpath cassandra-executor.jar io.mesosphere.mesos.frameworks.cassandra.executor.CassandraExecutor" } name: "cassandra.ben.node.0.executor" source: "java" } data: 
"\b\002\032\316\004\n\0052.1.4\022$apache-cassandra-2.1.4/bin/cassandra\022\002-f\032\207\004\022x\n\020\n\tLOCAL_JMX\022\003yes\n%\n\037CASSANDRA_JMX_NO_AUTHENTICATION\022\002no\n\021\n\bJMX_PORT\022\00556428\n\025\n\rMAX_HEAP_SIZE\022\004384m\n\023\n\fHEAP_NEWSIZE\022\00310m\032\376\002\n\035\n\fcluster_name\022\rcassandra.ben\n\036\n\021broadcast_address\022\t127.0.0.1\n\030\n\vrpc_address\022\t127.0.0.1\n\033\n\016listen_address\022\t127.0.0.1\n\021\n\fstorage_port\030\3306\n\025\n\020ssl_storage_port\030\3316\n\032\n\025native_transport_port\030\322F\n\r\n\brpc_port\030\310G\n\022\n\005seeds\022\t127.0.0.1\n.\n\017endpoint_snitch\022\033GossipingPropertyFileSnitch\n\037\n\025data_file_directories\"\006./data\n\"\n\023commitlog_directory\022\v./commitlog\n(\n\026saved_caches_directory\022\016./saved_caches\"\n\n\003ben\022\003ben\"\017\b\354\270\003\022\t127.0.0.10\017"
2015-07-27 18:57:27,127 DEBUG [Thread-48] i.m.m.f.c.s.CassandraScheduler - {offerId:20150727-185130-16777343-5050-21272-O42,hostname:localhost} < evaluateOffer(driver : id { value: "20150727-185130-16777343-5050-21272-O42" } framework_id { value: "20150727-185130-16777343-5050-21272-0000" } slave_id { value: "20150727-185130-16777343-5050-21272-S0" } hostname: "localhost" resources { name: "ports" type: RANGES ranges { range { begin: 1025 end: 2180 } range { begin: 2182 end: 3887 } range { begin: 3889 end: 5049 } range { begin: 5052 end: 8079 } range { begin: 8082 end: 9159 } range { begin: 9161 end: 65535 } } role: "*" } resources { name: "cpus" type: SCALAR scalar { value: 7.8 } role: "*" } resources { name: "mem" type: SCALAR scalar { value: 14333.0 } role: "*" } resources { name: "disk" type: SCALAR scalar { value: 224538.0 } role: "*" } resources { name: "ports" type: RANGES ranges { range { begin: 9160 end: 9160 } } role: "cass" } executor_ids { value: "cassandra.ben.node.0.executor" }, offer : [name: "cassandra.ben.node" task_id { value: "cassandra.ben.node.0.executor.server" } slave_id { value: "20150727-185130-16777343-5050-21272-S0" } resources { name: "cpus" type: SCALAR scalar { value: 0.1 } role: "*" } resources { name: "mem" type: SCALAR scalar { value: 768.0 } role: "*" } resources { name: "disk" type: SCALAR scalar { value: 16.0 } role: "*" } resources { name: "ports" type: RANGES ranges { range { begin: 9042 end: 9042 } range { begin: 9160 end: 9160 } range { begin: 7199 end: 7199 } range { begin: 7001 end: 7001 } range { begin: 7000 end: 7000 } } role: "*" } executor { executor_id { value: "cassandra.ben.node.0.executor" } resources { name: "cpus" type: SCALAR scalar { value: 0.1 } role: "*" } resources { name: "mem" type: SCALAR scalar { value: 384.0 } role: "*" } resources { name: "disk" type: SCALAR scalar { value: 256.0 } role: "*" } command { uris { value: "http://bwN56.dev.mesosphere.io:18080/jre-7-linux.tar.gz" extract: true } uris { value: 
"http://bwN56.dev.mesosphere.io:18080/apache-cassandra-2.1.4-bin.tar.gz" extract: true } uris { value: "http://bwN56.dev.mesosphere.io:18080/cassandra-executor.jar" extract: false } environment { variables { name: "JAVA_OPTS" value: "-Xms256m -Xmx256m" } } value: "$(pwd)/jre*/bin/java -XX:+PrintCommandLineFlags $JAVA_OPTS -classpath cassandra-executor.jar io.mesosphere.mesos.frameworks.cassandra.executor.CassandraExecutor" } name: "cassandra.ben.node.0.executor" source: "java" } data: "\b\002\032\316\004\n\0052.1.4\022$apache-cassandra-2.1.4/bin/cassandra\022\002-f\032\207\004\022x\n\020\n\tLOCAL_JMX\022\003yes\n%\n\037CASSANDRA_JMX_NO_AUTHENTICATION\022\002no\n\021\n\bJMX_PORT\022\00556428\n\025\n\rMAX_HEAP_SIZE\022\004384m\n\023\n\fHEAP_NEWSIZE\022\00310m\032\376\002\n\035\n\fcluster_name\022\rcassandra.ben\n\036\n\021broadcast_address\022\t127.0.0.1\n\030\n\vrpc_address\022\t127.0.0.1\n\033\n\016listen_address\022\t127.0.0.1\n\021\n\fstorage_port\030\3306\n\025\n\020ssl_storage_port\030\3316\n\032\n\025native_transport_port\030\322F\n\r\n\brpc_port\030\310G\n\022\n\005seeds\022\t127.0.0.1\n.\n\017endpoint_snitch\022\033GossipingPropertyFileSnitch\n\037\n\025data_file_directories\"\006./data\n\"\n\023commitlog_directory\022\v./commitlog\n(\n\026saved_caches_directory\022\016./saved_caches\"\n\n\003ben\022\003ben\"\017\b\354\270\003\022\t127.0.0.10\017" ]) = {}
2015-07-27 18:57:27,128 DEBUG [Thread-48] i.m.m.f.c.s.CassandraScheduler - {} < resourceOffers(driver : org.apache.mesos.MesosSchedulerDriver@19675d97, offers : [id { value: "20150727-185130-16777343-5050-21272-O42" } framework_id { value: "20150727-185130-16777343-5050-21272-0000" } slave_id { value: "20150727-185130-16777343-5050-21272-S0" } hostname: "localhost" resources { name: "ports" type: RANGES ranges { range { begin: 1025 end: 2180 } range { begin: 2182 end: 3887 } range { begin: 3889 end: 5049 } range { begin: 5052 end: 8079 } range { begin: 8082 end: 9159 } range { begin: 9161 end: 65535 } } role: "*" } resources { name: "cpus" type: SCALAR scalar { value: 7.8 } role: "*" } resources { name: "mem" type: SCALAR scalar { value: 14333.0 } role: "*" } resources { name: "disk" type: SCALAR scalar { value: 224538.0 } role: "*" } resources { name: "ports" type: RANGES ranges { range { begin: 9160 end: 9160 } } role: "cass" } executor_ids { value: "cassandra.ben.node.0.executor" } ])
2015-07-27 18:57:27,129 DEBUG [Thread-49] i.m.m.f.c.s.CassandraScheduler - {taskId:cassandra.ben.node.0.executor.server} > statusUpdate(driver : org.apache.mesos.MesosSchedulerDriver@19675d97, status : task_id { value: "cassandra.ben.node.0.executor.server" } state: TASK_ERROR message: "Task uses more resources cpus(*):0.1; mem(*):768; disk(*):16; ports(*):[9042-9042, 9160-9160, 7199-7199, 7001-7001, 7000-7000] than available ports(*):[1025-2180, 2182-3887, 3889-5049, 5052-8079, 8082-9159, 9161-65535]; cpus(*):7.8; mem(*):14333; disk(*):224538; ports(cass):[9160-9160]" slave_id { value: "20150727-185130-16777343-5050-21272-S0" } timestamp: 1.438048647129078E9 source: SOURCE_MASTER reason: REASON_TASK_INVALID)
2015-07-27 18:57:27,129 ERROR [Thread-49] i.m.m.f.c.s.CassandraScheduler - {taskId:cassandra.ben.node.0.executor.server} Got status TASK_ERROR for task cassandra.ben.node.0.executor.server, executor  (REASON_TASK_INVALID, healthy=false): Task uses more resources cpus(*):0.1; mem(*):768; disk(*):16; ports(*):[9042-9042, 9160-9160, 7199-7199, 7001-7001, 7000-7000] than available ports(*):[1025-2180, 2182-3887, 3889-5049, 5052-8079, 8082-9159, 9161-65535]; cpus(*):7.8; mem(*):14333; disk(*):224538; ports(cass):[9160-9160]
2015-07-27 18:57:27,129 DEBUG [Thread-49] i.m.m.f.c.scheduler.CassandraCluster - {} > recordHealthCheck(executorId : cassandra.ben.node.0.executor, details : healthy: false msg: "Removing Cassandra server task cassandra.ben.node.0.executor.server. Reason=REASON_TASK_INVALID, source=SOURCE_MASTER, message=\"Task uses more resources cpus(*):0.1; mem(*):768; disk(*):16; ports(*):[9042-9042, 9160-9160, 7199-7199, 7001-7001, 7000-7000] than available ports(*):[1025-2180, 2182-3887, 3889-5049, 5052-8079, 8082-9159, 9161-65535]; cpus(*):7.8; mem(*):14333; disk(*):224538; ports(cass):[9160-9160]\"")
2015-07-27 18:57:27,129 INFO  [Thread-49] i.m.m.f.c.scheduler.CassandraCluster - {} health check result unhealthy for node: cassandra.ben.node.0.executor. Message: 'Removing Cassandra server task cassandra.ben.node.0.executor.server. Reason=REASON_TASK_INVALID, source=SOURCE_MASTER, message="Task uses more resources cpus(*):0.1; mem(*):768; disk(*):16; ports(*):[9042-9042, 9160-9160, 7199-7199, 7001-7001, 7000-7000] than available ports(*):[1025-2180, 2182-3887, 3889-5049, 5052-8079, 8082-9159, 9161-65535]; cpus(*):7.8; mem(*):14333; disk(*):224538; ports(cass):[9160-9160]"'
2015-07-27 18:57:27,167 DEBUG [Thread-49] i.m.m.f.c.scheduler.CassandraCluster - {} < recordHealthCheck(executorId : cassandra.ben.node.0.executor, details : healthy: false msg: "Removing Cassandra server task cassandra.ben.node.0.executor.server. Reason=REASON_TASK_INVALID, source=SOURCE_MASTER, message=\"Task uses more resources cpus(*):0.1; mem(*):768; disk(*):16; ports(*):[9042-9042, 9160-9160, 7199-7199, 7001-7001, 7000-7000] than available ports(*):[1025-2180, 2182-3887, 3889-5049, 5052-8079, 8082-9159, 9161-65535]; cpus(*):7.8; mem(*):14333; disk(*):224538; ports(cass):[9160-9160]\"")
2015-07-27 18:57:27,203 TRACE [Thread-49] i.m.m.f.c.s.CassandraScheduler - {taskId:cassandra.ben.node.0.executor.server} < statusUpdate(driver : org.apache.mesos.MesosSchedulerDriver@19675d97, status : task_id { value: "cassandra.ben.node.0.executor.server" } state: TASK_ERROR message: "Task uses more resources cpus(*):0.1; mem(*):768; disk(*):16; ports(*):[9042-9042, 9160-9160, 7199-7199, 7001-7001, 7000-7000] than available ports(*):[1025-2180, 2182-3887, 3889-5049, 5052-8079, 8082-9159, 9161-65535]; cpus(*):7.8; mem(*):14333; disk(*):224538; ports(cass):[9160-9160]" slave_id { value: "20150727-185130-16777343-5050-21272-S0" } timestamp: 1.438048647129078E9 source: SOURCE_MASTER reason: REASON_TASK_INVALID)

Mesos Master
./mesos-master.sh \
    --zk=zk://localhost:2181/mesos \
    --log_dir=$HOME/opt/mesos/work/master/log_dir \
    --work_dir=$HOME/opt/mesos/work/master/work_dir \
    --quorum=1 \
    --cluster=bwN56 \
    --roles=cass
Mesos Slave
sudo ./mesos-slave.sh \
    --master=zk://localhost:2181/mesos \
    --log_dir=$MESOS_HOME/work/slave/log_dir \
    --work_dir=$MESOS_HOME/work/slave/work_dir \
    --containerizers=docker,mesos \
    --isolation=cgroups/cpu,cgroups/mem \
    --cgroups_enable_cfs \
    --executor_registration_timeout=5mins \
    --resources='ports(cass):[9160-9160];ports(*):[1025-2180,2182-3887,3889-5049,5052-8079,8082-9159,9161-65535]'

@flosell

flosell commented Jul 28, 2015

Thanks for the quick feedback!
The code style issues you commented on are fixed. I also have an idea for the scenario you mention, and I'll try to fix it in the coming days.

@BenWhitehead

Sounds good, I'll review when you're ready.

@flosell

flosell commented Jul 31, 2015

The last commit should fix the issue you mentioned. Previously, the code only tried to find a single role that provides all ports and fell back to * otherwise.
The fixed version looks up a role for every port needed and then groups the ports by role.
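
The per-port lookup described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the commit: `PortRoleAssignment` is a hypothetical class, offered ports are modelled as a plain map from role to port set, and preferring the configured role over * is an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper illustrating "find a role per port, then group by role".
final class PortRoleAssignment {
    static final String DEFAULT_ROLE = "*";

    private PortRoleAssignment() {}

    // Returns role -> needed ports to claim under that role,
    // or empty if some needed port is offered under neither role.
    static Optional<Map<String, TreeSet<Long>>> groupByRole(
            List<Long> neededPorts,
            Map<String, TreeSet<Long>> offeredByRole,
            String mesosRole) {
        Map<String, TreeSet<Long>> grouped = new TreeMap<>();
        for (Long port : neededPorts) {
            String role = roleFor(port, offeredByRole, mesosRole);
            if (role == null) {
                return Optional.empty();
            }
            grouped.computeIfAbsent(role, key -> new TreeSet<>()).add(port);
        }
        return Optional.of(grouped);
    }

    // Assumption: the configured role is tried first, then the default role.
    private static String roleFor(
            Long port, Map<String, TreeSet<Long>> offeredByRole, String mesosRole) {
        for (String role : Arrays.asList(mesosRole, DEFAULT_ROLE)) {
            TreeSet<Long> ports = offeredByRole.get(role);
            if (ports != null && ports.contains(port)) {
                return role;
            }
        }
        return null;
    }
}
```

For the slave configuration in the failing scenario, where port 9160 is reserved for cass and the remaining ports are offered under *, this grouping would claim 9160 under cass and the other Cassandra ports under *, rather than requesting all of them under a single role.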

Can you have a second look?

@BenWhitehead

Thanks for the updates @flosell, and for adding another unit test for the scenario.

BenWhitehead added a commit that referenced this pull request Aug 3, 2015
accept offers for default role * in addition to the specified mesosRole
@BenWhitehead BenWhitehead merged commit 4f9479b into mesosphere-backup:master Aug 3, 2015