
Parallelism

It may seem that the future of CPUs is going to be multitudinal. Moore's law hasn't really stopped being true, it's just changed direction: from increasing the speed of CPUs to increasing the number of them.

I think about this a lot, not sure why, and really not sure why I think I can contribute to what seems to be a field pretty dominated by smart people. But nevertheless I'll address one possible solution here, using PogoScript of course (or a variation of it, at least).

Async-Async

JavaScript, with the widespread adoption of Node.js, has put a lot of emphasis on asynchronous calling conventions. This allows Node.js to outperform traditionally faster languages in IO-intensive workloads, all down to its evented nature. The idea is never to wait for an IO operation to complete, but to have a function called on completion instead. This means that a single-threaded process can handle thousands, or even hundreds of thousands, of IO operations and remain reasonably responsive.
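
The wiki's other examples are in PogoScript, but since this point is about Node itself, here's a tiny TypeScript sketch of the convention (the file name is made up):

import { readFile } from "fs";

// Ask Node to read the file, then carry on; the callback runs when the
// IO completes, so the single thread never blocks waiting for the disk.
readFile("videos.txt", "utf8", (err, data) => {
    if (err) throw err;
    console.log("read finished:", data.length, "characters");
});

console.log("read started; doing other work in the meantime");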

We're talking about a technology that allows us to reduce the number of threads in a process (in Node's case, to just one). How can this help us with parallelism?

Queuing

Let's imagine we have two functions, where the result of one is passed to the other, like this:

f(g())

Or, to use a more concrete example, let's say we're downloading a master-quality video from somewhere, and then transcoding it to a web-friendly format like H.264:

transcode to h264 (download (master video))

(The examples are in PogoScript, so identifiers are allowed to contain spaces.)

In this scenario, when a video is downloaded it can start transcoding immediately. But let's say we have lots of videos to transcode; we could write this:

for each @(master video) in (master videos)
    transcode to h264 (download (master video))

If we think about this in terms of queuing, there's a queue between the download process and the transcode process and the queue's size is one. The queue's size means that we cannot download more files than we can transcode; the downloader has to wait for the transcoder to finish (dequeue the previous video) before it can download another. This is standard imperative (and functional) semantics. It's not exactly like this in reality, but thinking that there's a queue of one between each function is useful for the time being.

Now what if we allow the queues to be of arbitrary size? This would allow us to download lots of files while the transcoder is working independently. When we download a video we add it to the queue, and when the transcoder is ready it just picks a video off the queue. When there aren't any videos in the queue, the transcoder waits. This is standard queuing semantics, but how does async JavaScript help?
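To make that concrete, here's a minimal TypeScript sketch of those queue semantics, with a stand-in transcoder in place of a real one: finished downloads enqueue their videos, and a single consumer drains the queue whenever it's idle.

// Stand-in for a real transcoder: pretend each video takes a second.
function transcodeToH264(video: string, done: () => void) {
    setTimeout(() => { console.log("transcoded", video); done(); }, 1000);
}

const queue: string[] = [];   // the queue between downloader and transcoder
let busy = false;             // is the (single) transcoder working?

// Producer side: each finished download enqueues its video and returns.
function enqueue(video: string) {
    queue.push(video);
    drain();
}

// Consumer side: pick the next video off the queue whenever idle.
function drain() {
    if (busy || queue.length === 0) return;
    busy = true;
    transcodeToH264(queue.shift()!, () => {
        busy = false;
        drain();
    });
}

["a.mov", "b.mov", "c.mov"].forEach(enqueue);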

If we were to rewrite this code to use async callbacks instead, then we could implement queuing semantics:

for each @(master video, done) in (master videos)
    download (master video) @(downloaded video)
        transcode to h264 (downloaded video, done)

(Here we're assuming we have an async for each that passes a done function to be called when the body has completed; we pass done on to the transcode to h264 function.)

In this async version, download would be called lots of times, making several outstanding download calls. When they eventually complete, we'll call transcode to h264 for each of them, making several outstanding transcode calls. When all of the transcodes are complete, the for each will be finished. That's how it would work today in Node.js. But where are the queues?
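
One possible shape for that async for each, sketched in TypeScript (forEachAsync and its parameter names are invented, not an existing API):

// Run the body for every item; call allDone once every body has called done.
function forEachAsync<T>(
    items: T[],
    body: (item: T, done: () => void) => void,
    allDone: () => void
) {
    let remaining = items.length;
    if (remaining === 0) return allDone();
    for (const item of items) {
        body(item, () => {
            remaining -= 1;
            if (remaining === 0) allDone();
        });
    }
}

forEachAsync(["a.mov", "b.mov"], (video, done) => {
    setTimeout(() => { console.log("processed", video); done(); }, 500);
}, () => console.log("all done"));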

We can introduce queues as higher-order functions. Let's say we have a limited to function that limits the number of outstanding calls made to an asynchronous function:

limited transcode = (transcode to h264) limited to 5

for each @(master video, done) in (master videos)
    download (master video) @(downloaded video)
        limited transcode (downloaded video, done)

This would limit the number of outstanding transcode processes to 5, a nice round number that (presumably) wouldn't overload our box.
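
Here's one way limited to could be implemented, sketched in TypeScript (limitedTo and AsyncFn are invented names): it wraps a callback-style async function so that at most limit calls are in flight, and any further calls wait in an internal queue.

type AsyncFn<T> = (arg: T, done: () => void) => void;

function limitedTo<T>(fn: AsyncFn<T>, limit: number): AsyncFn<T> {
    let active = 0;
    const waiting: Array<[T, () => void]> = [];

    function next() {
        if (active >= limit || waiting.length === 0) return;
        active += 1;
        const [arg, done] = waiting.shift()!;
        fn(arg, () => {
            active -= 1;
            done();
            next();  // a slot freed up: start a queued call, if any
        });
    }

    // Every call goes through the queue; next() starts it when a slot is free.
    return (arg, done) => {
        waiting.push([arg, done]);
        next();
    };
}

// e.g. const limitedTranscode = limitedTo(transcodeToH264, 5);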

Multicore

Now that we're using asynchronous calls, we have a lot of room to play with. As we've seen, we can queue calls to limit the number of outstanding processes. We can also transfer those calls to other machines on the network to reduce the load on the local machine.

The advantage is in breaking the link between making a request and getting a response. In regular languages it's strictly request/response: you make a request (call a function) and cannot do anything until it responds (returns). In asynchronous (queuing) systems, you can make many requests before getting a response and therein lies the parallelism.

What can we do with this?

First of all, we can design a language that is inherently asynchronous. Instead of using the callback style above, we should be able to use our original code:

for each @(master video) in (master videos)
    transcode to h264 (download (master video))

We can then provide some asynchronous primitives:

  • Concurrency limits

    Like the limited to example above, we limit the number of outstanding calls to a function.

  • Cross-process calls

    Instead of calling functions locally, we can send their code and arguments to other cores or other machines on the network and have them do the work. A minimal sketch of a cross-process call follows this list.
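
For instance, getting a call onto another core could be sketched with Node's worker_threads module (shipping work to other machines would need a network transport on top of the same idea). This assumes the file is compiled to JavaScript before it's run:

import { Worker, isMainThread, parentPort } from "worker_threads";

if (isMainThread) {
    // "Calling" a function on another core: send the argument over,
    // and treat the reply message as the return value.
    const worker = new Worker(__filename);
    worker.on("message", (result) => {
        console.log("result from other core:", result);
        worker.terminate();
    });
    worker.postMessage("master video.mov");
} else {
    // The other core: receive the argument, do the work, send the result back.
    parentPort!.on("message", (video: string) => {
        parentPort!.postMessage(`transcoded ${video}`);
    });
}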

These should be enough to build a more automatic parallel framework on top. Instead of the programmer specifying concurrency limits and remote calls, the framework decides for itself, based on its own understanding of load across the network.

I'd really like to write some code in this framework to see how useful it could be. I'm sure there are major differences from regular semantics that would catch people out, and it would require some thought and practice to get right.
