Introduced ClusterManager. Externalized support for new cluster types. #3649

Merged
merged 2 commits into from
Jul 9, 2013



amitmurthy
Contributor

Design

  • Base now has only one addprocs. Its signature is: function addprocs(instances::Union(AbstractVector, Integer); tunnel=false, dir=JULIA_HOME, exename="./julia-release-basic", sshflags::Cmd=``, cman=nothing)
  • As before, the first parameter can be an integer (the number of procs) or an array of hostnames.
  • A new keyword parameter, cman, takes an object of type ClusterManager.
  • The ClusterManager is responsible for starting worker processes. It should expose a callback
    cman.launch_cb(np::Integer, config::Dict), which launches the required number of procs.
  • The config parameter above has the following keys: :dir, :exename, :exeflags, :tunnel, :sshflags and :cman. The callback can use these fields to start each instance appropriately.
  • The callback should return a tuple consisting of the response type and an array of per-instance information, i.e. one of:
    (:io_only, [io1, io2, ...])
    (:io_host, [(io1, host1), (io2, host2), ...])
    (:io_host_port, [(io1, host1, port1), (io2, host2, port2), ...])
    (:host_port, [(host1, port1), (host2, port2), ...])
    (:cmd, [cmd1, cmd2, ...])
  • Regular local procs and ssh procs are supported via RegularCluster, a ClusterManager internal to Base.
  • The code for SGE and Scyld has been moved into https://github.com/amitmurthy/ClusterManagers.jl
  • NOTE: I have not been able to test Scyld and SGE, since I do not have access to either.
  • I would like to transfer ownership of https://github.com/amitmurthy/ClusterManagers.jl to JuliaLang (or to other regular users of these technologies), since more cluster managers will probably be added to it over time by people who use such clustering technologies regularly.
  • addprocs now returns a list of the new process ids instead of :ok.
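To make the launch_cb contract above concrete, here is a minimal sketch in present-day Julia. It does not use the 2013 Base API: CmdClusterManager, cmd_launcher, validate_response, instance_matches and launch_workers are hypothetical names invented for illustration, launch_workers merely stands in for addprocs by building the config Dict and calling the callback, and treating :exeflags as a Vector{String} is an assumption (the design does not fix its type).

```julia
# Sketch of the launch_cb contract described above, in present-day Julia.
# All names are hypothetical illustrations, not the 2013 Base API.

abstract type ClusterManager end

# Mirrors cman.launch_cb(np, config): the manager carries its launcher as a field.
struct CmdClusterManager <: ClusterManager
    launch_cb::Function
end

# Checks the shape of one instance entry for a given response type.
function instance_matches(kind::Symbol, inst)
    kind === :io_only      && return inst isa IO
    kind === :io_host      && return inst isa Tuple{IO,AbstractString}
    kind === :io_host_port && return inst isa Tuple{IO,AbstractString,Integer}
    kind === :host_port    && return inst isa Tuple{AbstractString,Integer}
    kind === :cmd          && return inst isa Cmd
    return false  # unknown response type
end

# Validates a (response_type, instances) tuple and returns the instance list.
function validate_response(resp::Tuple{Symbol,AbstractVector})
    kind, instances = resp
    kind in (:io_only, :io_host, :io_host_port, :host_port, :cmd) ||
        error("unknown response type: $kind")
    for inst in instances
        instance_matches(kind, inst) ||
            error("instance $inst does not match response type $kind")
    end
    return instances
end

# Hypothetical launcher: returns one Cmd per requested worker, built from config.
# Assumes config[:exeflags] is a Vector{String}; its elements are spliced as arguments.
function cmd_launcher(np::Integer, config::Dict)
    exe = joinpath(config[:dir], config[:exename])
    return (:cmd, [`$exe $(config[:exeflags])` for _ in 1:np])
end

# Hypothetical stand-in for addprocs: builds the config Dict and invokes launch_cb.
function launch_workers(cman::ClusterManager, np::Integer;
                        dir=Sys.BINDIR, exename="julia", exeflags=String[])
    config = Dict{Symbol,Any}(:dir => dir, :exename => exename, :exeflags => exeflags,
                              :tunnel => false, :sshflags => ``, :cman => cman)
    return validate_response(cman.launch_cb(np, config))
end

# Usage: ask the manager for two worker commands (nothing is actually started).
cman = CmdClusterManager(cmd_launcher)
cmds = launch_workers(cman, 2; dir="/opt/julia/bin", exeflags=["--worker"])
```

Validating the returned tuple in one place keeps each cluster manager free to choose whichever response type suits its scheduler, while the caller still rejects malformed responses early.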

Closes #3549

@ViralBShah
Member

This is great.

disable_parallel_libs should be called by default in all cases, and the explicit call could then be removed from the Scyld code. Also, this function should really be called disable_threaded_libs.

@amitmurthy
Contributor Author

Done.

@ViralBShah
Member

Ping @nlhepler @JeffBezanson

@amitmurthy
Contributor Author

I have also submitted a pull request on METADATA.jl/devel for ClusterManagers.

When this pull request is merged, please also merge that one into METADATA.

@ViralBShah
Member

The Travis error does not seem to be related to this PR: GCC passed, and Clang failed on statistics.

ViralBShah added a commit that referenced this pull request Jul 9, 2013
Introduced ClusterManager. Externalized support for new cluster types.
ViralBShah merged commit beeb195 into JuliaLang:master Jul 9, 2013
@nlhepler
Contributor

nlhepler commented Jul 9, 2013

Looks good. Thanks Amit!

amitmurthy deleted the amitm/cluster branch July 10, 2013 02:48

Issues closed by this pull request:

Supporting multiple cluster schedulers
3 participants