Binder API for specifying, launching, and pooling notebook servers. #5
* Front end developers developing against the API (e.g. Thebe and associated contexts)
* Operators (e.g. codeneuro, try.jupyter.org, mybinder)
* Users (consuming kernels as developers, readers, scientists, researchers)
Links to each of these examples would be nice to use for context.
@rgbkrk it's probably a good idea to go into some detail about how this relates to the kernel-providers work. We don't want to end up building and maintaining two similar-but-not-quite-the-same projects.
Below are some questions, no answers. Should we rename services to resources? It's mainly semantics; my thinking is that you might want to specify as a service something like:

to specify that the notebook needs a GPU as well as access to a resource called "a-very-large-storage-system". Neither is something that needs starting, but both need to be available and influence the environment in which the kernel is started. How does the API know how to set up a service called "postgres"? Should the meaning of "postgres" be global or (potentially) specific to each provider of resources? Global is much, much harder to do, and I'm undecided how much value it would add. Thoughts on either?
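A hypothetical sketch of what such a specification might look like (the `services` key and its entries are illustrative assumptions, not an agreed schema):

```yaml
# Hypothetical spec fragment: declare resources the kernel's
# environment needs, alongside services that must be provisioned.
services:
  - gpu                          # nothing to start; influences scheduling
  - a-very-large-storage-system  # nothing to start; must be mounted/available
  - postgres                     # a service that does need starting
```

The point of the sketch is that the first two entries are environmental constraints, while the last names something a provider would actually have to launch.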
@betatim the definitions of resources/services could live in the repos themselves. A single name like 'spark' could expand to a fuller definition.
Great thoughts @betatim @minrk! Just discussed with @andrewosh, a couple comments re: services / resources
The API looks really nice, some comments:

Some of the fields in the specification file have changed (we need to update the examples in the binder repo too). Since we want to support a growing number of dependency files, we changed that field. That being said, if we're going to pull notebooks from a variety of sources (like @rgbkrk suggested in the Binder issue), we would want this field to exist, but with a more fleshed-out schema. Might still have a few issues with word choice =)

In binder-project/binder#8 I think we were debating between a couple of names, and as discussed above, other key issues to tackle are:
Maybe part of this should be creating a detailed spec document.
* `bindings` - create a new binding, which is a specification/template for a collection of resources
* `binders` - spawn a binder by `bindingID|Name` as specified by the binding template, list currently running binders
* `pools` - pre-allocate and view details about current pools of running binders
It would be nice to get a definition and an example of the concepts the resources represent. The `binders` resource "spawns a binder", but what's a binder? A container with a notebook server? A container with anything? Nothing to do with containers?
This is a container that has a notebook server as the main front end, with some amount of Spark, Postgres, etc. services running attached to it. We should lay out the discussions we had today about this, as it's much easier to reason about binders without these attached services.
Agreement here. It seems like binders and bindings could become more generic concepts and obviate the need for at least the container-spawning parts of the kernel gateway proposal (KGP?). But I'm not quite sure how the API spec'ed here would support the use cases stated in the KGP (and maybe it's not supposed to). Taking one for consideration: writing new web apps that use remote kernels (dashboards, interactive books, etc.). Say that the web app creator could define a binding to create a container environment with specific libraries preinstalled for their kernel language of choice. Say that one of the libraries stated for inclusion is a (yet-to-be-written) websocket-to-0mq bridge. The backend of the web app could then use a new binder client lib to talk to the binder server to request instantiation of the binding as a binder, passing whatever auth tokens are necessary to get an instance. After getting an instance, the app could use jupyter-js-services to talk to the kernel in the launched binder container. In this particular scenario, it feels a bit clunky to have to talk to one API to get a kernel container and another to talk to the kernel itself. Contrast this with using jupyter-js-services to both request and communicate with the kernel. But I suspect this is the tradeoff of a generic container-launching service versus separate kernel-launching and notebook-launching services. (Which is binder intended to become?)
@betatim Interestingly, I keep calling them resources in my head while thinking the name was too generic. Since I participated in plenty of bikeshedding today on this, 😄, I ended up not stating it. 😉
Today @andrewosh, @freeman-lab, @odewahn, @zischwartz, and I iterated on this spec (and names, lots of names) and ended up respeccing it into four main actions with corresponding resources:
A little bit of the iteration happened in a hackpad: https://jupyter.hackpad.com/Thebe-1012015-cwvDHMfWqJG, but we should probably come back and write up some of our reasoning. At the very least, the commits from our collaboration are in, and we can formalize this enhancement proposal a bit more.
It's worth noting that a given implementation of this doesn't necessarily have to expose `build` and could use the …
One thing that seemed to get some consensus over 🍕 was that maybe a travis.yml-style spec file is the right way to describe builds.
@rgbkrk I think something travis.yml-based is a great idea, since it provides an escape hatch for running code without resorting to starting from scratch with a custom Dockerfile. There's also the conda-style `environment.yml` to consider.
```
POST /builds/repos HTTP/1.1
Content-Type: application/json

{ ... }
```
It probably makes sense for the body here to be a little more generic, similar to the proposed `binder.yml`, and let other translation tools handle logic like "if you specify a repo and a `requirements.txt` file, it actually means you want to do a `pip install`".
The problem is mostly in that combinatorial explosion of what "dependencies" means, as well as which Python, Ruby, Go, node, etc. a user means. We're in a bit of an insular world if we encode for Python here.
Yup, exactly what we were thinking. If we instead make this generic, other modules can handle the more language-specific translation, and as a result be more open to extension by others. If the body here was like our `binder.yml` sketch, which was language-neutral, plus a few extra fields (`repository`, `contents`, ...), might that work?
Ah, yeah that makes sense. The YAML should directly map to JSON as well.
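To make the YAML-to-JSON mapping concrete, here's a hypothetical sketch of what a language-neutral build request body could look like (the field names `repository`, `contents`, `dependencies`, and `handler`, as well as the example URL, are illustrative assumptions, not an agreed schema):

```json
{
  "repository": "https://github.com/some-org/example-repo",
  "contents": ["index.ipynb"],
  "dependencies": [
    { "file": "requirements.txt", "handler": "pip" }
  ]
}
```

A translation layer could then map the `requirements.txt`/`pip` pair to the appropriate install step, keeping the API itself language-neutral.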
@rgbkrk the revision of the template is nice; a couple of fields like …
Yeah, I imagine …
Sorry I wasn't in NY to work on this with you. It's shaping up nicely. Given that the register API now allows for an arbitrary image and start command, I wonder if it removes the need for the kernel gateway incubator project completely. What do you think? Would it help if I tried to map the narratives from the kernel gateway proposal over to this API to see if they fit? For example, walk through what would happen in the "Jupyter notebook launches remote kernel" use case, complete with the API calls made? If the binder proposal covers them all, then I'll wholeheartedly help out here instead of hacking away on the kernel gateway as a separate thing. I'm still a bit lost on how the …
My sense was that the kernel gateway incubator project was also about exploring ways of mapping the current Kernel APIs but with an auth layer. No extra specs on top, but certainly a way to do discovery of kernelspecs, etc. There's certainly overlap with this proposal.
That would be wonderful.
The second one. We didn't scope it out here yet (mostly because we ran out of time in NYC).
Something tells me there are going to be pieces of each that we need, particularly things like the drop-in binaries to add to Docker images. When we started talking about these months ago, I recognized there was overlap in the proposals but wasn't sure where things would go. I imagined that a kernel gateway would actually operate in front of an API like the one specced out here.
I think it would be great if we can get binder to provide the spec part of deployments. There's still work to do on the kernel-provider side of actually making the kernel-only services, and things like auth/CORS-related configuration that will be part of a particular deployment. But I think it would be great if we could make that a use case for the specs proposed here, rather than a competing spec. Related: is it an important assumption that binder provides something that's at least a human-facing web server (e.g. a notebook application)? Is it abusing binder to spin up 'headless' services like a kernel provider? That affects things like whether binder can redirect users to the container, etc.
Side question: because this is a JEP, does that mean the binder project is going to move into the jupyter org? I ask because, unfortunately, where the project lives impacts whether some of us can contribute. :/
@freeman-lab stated a willingness to eventually move binder under Jupyter. For the time being, we can break some of these things up into projects within the jupyter incubator. More than happy to move the binder-registry repo over, though it is nice to have a Go namespace that is similar to …
Working on it. Will submit as a PR against the PR. 😱
I adapted the narratives from the kernel gateway proposal to the binder spec proposal. I did not get to adding the specific REST request/response payloads to the text; I'll wait until there's consensus that they even belong in the proposal. Along the way, I took these notes (which was the point of the exercise, I think).
Sorry for jumping in so late -- this is all shaping up really nicely. One thing not captured here is how to connect data sources into the container. We'd talked about a number of other data options, but this at least seems like something we could do quickly and that would allow some semblance of sharing data.
I'd like to standardize on either GitHub-API-style errors or JSON API errors. In the current registry PR, I'm using GitHub-API-style errors. I'll come back later today to discuss the others.
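For reference, a GitHub-API-style error body pairs a human-readable `message` with an `errors` array of machine-readable entries; a representative example for this API might look like the following (the `resource` and `field` values here are hypothetical):

```json
{
  "message": "Validation Failed",
  "errors": [
    {
      "resource": "Build",
      "field": "repository",
      "code": "missing_field"
    }
  ]
}
```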
In response to @parente:
I can't recall what binder-db was.
To me this is wholly unrelated to the API spec, but there are several classes of logs and metrics in production here:
I'm personally just going to use a Docker logging driver to forward logs to logstash and deal with splitting these later.
The registry is pretty lightweight. It's a glorified whitelist of images for the deploy/pool/whatever layer to rely on for pulling and running (which builds atop Kubernetes or Swarm). As for exploring the general-purpose PaaS solutions, I think that should be done. There's probably some thin layer above them that allows you to hit the main goals:
If someone wants to explore that, I'm certainly open to it. For the time being I'm going to run with some of the ideas here as prototypes to deal with our currently running infrastructure, but I would love to maintain less code.
@odewahn what's the difference between data going in the container versus `volumes-from`? Are you expecting lots of the frontend containers to have the same dependencies yet use different data sources? If that were the case, I'd suggest building off the same base image and adding the data directly (as well as the notebooks). Otherwise you also take on the operational burden of dealing with Docker volumes, which don't always clean up well: you now have more steps to account for, always launching these two containers for users and always making sure to delete the volume artifacts when tearing down the image.
As long as it can be routed on a path, I think that's an accidental feature (which tmpnb has). We do make specific decisions to cater to running things like the notebook, which would impact the feasibility of running other applications. Generally speaking, this is designed to be a service where a development environment is provisioned for a user on demand. On a side note, I tried running RStudio on tmpnb. It didn't work well, and I couldn't figure out how to tell it to route down the assigned path.
It would be nice to collaborate on an official one; we hadn't really thought about that. We were mostly aiming for CLI tools here.
https://github.com/jupyter/configurable-http-proxy provides the actual proxy which launch uses directly.
The way tmpnb currently works is to use the first …
@parente re:
Good call, I like adding …
👍
This thread is a goldmine for reflecting on use cases.
Closing, as there's no resolution yet despite a quality discussion. Thanks all. Happy to re-open again later.
The temporary notebook system (tmpnb) was put together to solve some immediate demands, the primary of which were:

* on-demand spawning of notebook servers (e.g. `/api/spawn`), as used by e.g. thebe

In order to make running a multi-tenant service like this simple to operate, maintain, and extend, we need a REST API that assists three classes of users:
There are four main actions:

* `build` - build an image from the contents of a GitHub repository (or possibly some other specification)
* `stage` - make one or more images ready for deployment, including specifying any additional services and resource allocation
* `deploy` - deploy a named environment, and provide status about running versions of that environment
* `pool` - pre-allocate and view details about current pools of running environments

The four resources that support these actions are:

* `builds`
* `stagings`
* `servers`
* `pools`
Some of these operations should have authorization, depending on their usage. These are assumed to be run on an API endpoint (e.g. api.mybinder.org) or potentially with a leading `/api/` path.

The README is the main source to review right now, and we can work toward a full spec in Swagger.
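As a rough sketch of how those four resources might surface as HTTP routes (the paths and verbs below are assumptions extrapolated from the resource names, not part of the spec yet):

```
POST /builds          # kick off an image build from a repository
GET  /builds/{id}     # check the status of a build
POST /stagings        # stage built image(s) with services and resource allocations
POST /servers         # deploy a named environment
GET  /servers/{name}  # status of a running environment
GET  /pools           # view current pools
POST /pools           # pre-allocate a pool of running environments
```

A Swagger document would pin down the request/response schemas for each of these routes.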
This was also brought up in binder-project/binder#8 as well as https://groups.google.com/forum/#!msg/jupyter/2K2Wuem1HB8/r4dQ_6FbEAAJ.