Understanding Java’s Project Loom

Project Loom’s Virtual Threads

Trying to get up to speed with Java 19’s Project Loom, I watched Nicolai Parlog’s talk and read several blog posts.

All of them showed how virtual threads (or fibers) can essentially scale to hundreds of thousands or millions, whereas good, old, OS-backed Java threads could only scale to a couple of thousand (TBD: check OS-thread hypothesis in real-world scenarios).

try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
    IntStream.range(0, 100_000).forEach(i -> executor.submit(() -> {  // (1)
        Thread.sleep(Duration.ofSeconds(1));
        System.out.println(i);
        return i;
    }));
}
  1. The example the blog posts used, letting 100,000 virtual threads sleep.

A hundred thousand sleeping virtual threads, fine. But could I now just as easily execute 100,000 HTTP calls in parallel, with the help of virtual threads?

// what's the difference?

for (int i = 0; i < 1000000; i++) {
    // good, old Java Threads
    new Thread( () -> getURL("https://www.marcobehler.com"))
        .start();
}


for (int i = 0; i < 1000000; i++) {
    // Java 19 virtual threads to the rescue?
    // startVirtualThread() already starts the thread, so no extra .start() call
    Thread.startVirtualThread(() -> getURL("https://www.marcobehler.com"));
}

Let’s find out.

Why are some Java calls blocking?

Here is the code from our getURL method above, which opens a URL and returns its contents as a String.

static String getURL(String url) {
    try (InputStream in = new URL(url).openStream()) {
        byte[] bytes = in.readAllBytes(); // ALERT, ALERT!
        return new String(bytes);
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}

When you open up the JavaDoc of inputStream.readAllBytes() (or are lucky enough to remember your Java 101 class), it gets hammered into you that the call is blocking, i.e. won’t return until all the bytes are read - your current thread is blocked until then.

How come I can now supposedly execute this call a million times in parallel when running inside virtual threads, but not when running inside normal threads?

Parts of the puzzle - topics you never knew you wanted to know more about after CS 101: Sockets & Syscalls.

Sockets

When you want to make an HTTP call or rather send any sort of data to another server, you (or rather the library maintainer in a layer far, far away) will open up a Socket. And accessing sockets, by default, is blocking.

// pseudo-code
Socket s = new Socket();

// blocking call, until data is available
s.read();

However, operating systems also allow you to put sockets into non-blocking mode, in which a read returns immediately when there is no data available. It is then your responsibility to check back later to find out whether there is any new data to be read.

// pseudo-code
Socket s = new Socket();

// pseudo code, consult a random Java NIO tutorial
s.setBlockingFalse(true);   // ;D

// yay, this call will return immediately, even if there is no data
s.read();
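For the curious, here is roughly what the pseudo-code above looks like with real Java NIO classes. It is only a minimal sketch: the class name NonBlockingReadSketch and the loopback server on an ephemeral port are my own assumptions, chosen so the example runs without touching the network. It shows a non-blocking read() returning 0 immediately when nothing has arrived yet, and a Selector being used to "check back later".

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original article.
class NonBlockingReadSketch {

    /** Returns {bytes read before any data was sent, bytes read after data was sent}. */
    static int[] demo() {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            // Loopback server on an ephemeral port, so no external network is needed.
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));

            try (SocketChannel client = SocketChannel.open(server.getLocalAddress());
                 SocketChannel peer = server.accept();
                 Selector selector = Selector.open()) {

                client.configureBlocking(false); // the real-world "setBlockingFalse(true)"
                ByteBuffer buffer = ByteBuffer.allocate(64);

                // Nothing has been sent yet: read() returns 0 right away instead of blocking.
                int before = client.read(buffer);

                peer.write(ByteBuffer.wrap("hello".getBytes(StandardCharsets.UTF_8)));

                // "Check back later": ask the OS to tell us when the socket is readable.
                client.register(selector, SelectionKey.OP_READ);
                selector.select(5_000); // wait at most 5 seconds
                int after = client.read(buffer);

                return new int[] {before, after};
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] result = demo();
        System.out.println("before: " + result[0] + ", after: " + result[1]);
    }
}
```

The bookkeeping around the Selector is exactly the kind of "clunky" code you have to write (or have a library write) once you leave blocking mode.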

Syscalls

When executing the getURL() call above, Java doesn’t do the network call (open up a socket, read from it, etc) itself - it asks the underlying operating system to do the call. And here’s the trick: Whenever you are using good-old Java threads, the JVM will use a blocking system call (TBD: show OS call stack.).

When run inside a virtual thread, however, the JVM will use a different system call to do the network request, which is non-blocking (e.g. epoll on Linux or kqueue on macOS), without you, as a Java programmer, having to write non-blocking code yourself, e.g. some clunky Java NIO code.

To cut a long story short (and ignoring a whole lot of details), the real difference between our getURL calls inside good, old threads and inside virtual threads is that one opens up a million blocking sockets, whereas the other opens up a million non-blocking sockets.

Now, if you tried out this (nonsensical) example in the real world™, you’d find that, depending on your operating system and on whether you are sending or receiving data, you’d run into operating system socket limits - a reminder that virtual threads are not an automagically scaling solution that frees you from knowing what you are doing (isn’t that always true? :) )
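One common way to stay below such limits is to cap how many calls are in flight at once, for example with a plain Semaphore. The following is only a sketch under a few assumptions of mine: JDK 21 or newer (or JDK 19/20 with --enable-preview), a hypothetical class BoundedCallsSketch, and a short Thread.sleep standing in for the real getURL call so that the example runs offline.

```java
import java.time.Duration;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class, not part of the original article.
class BoundedCallsSketch {

    /** Runs `tasks` simulated calls on virtual threads and returns the peak number in flight. */
    static int peakInFlight(int tasks, int maxConcurrent) {
        Semaphore permits = new Semaphore(maxConcurrent);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();

        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    permits.acquire(); // parks the virtual thread while the limit is reached
                    try {
                        int now = inFlight.incrementAndGet();
                        peak.accumulateAndGet(now, Math::max);
                        Thread.sleep(Duration.ofMillis(10)); // stand-in for getURL(...)
                    } finally {
                        inFlight.decrementAndGet();
                        permits.release();
                    }
                    return null;
                });
            }
        } // close() waits until all submitted tasks have finished
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak in flight: " + peakInFlight(1_000, 10));
    }
}
```

You still get one cheap virtual thread per task, but never more than maxConcurrent of them hold a (simulated) socket at the same time.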

Filesystem calls

While we are at it. How would virtual threads behave when working with files?

// Let's read in a million files in parallel!

for (int i = 0; i < 1000000; i++) {
    // Java 19 virtual threads to the rescue?
    // startVirtualThread() already starts the thread, so no extra .start() call
    Thread.startVirtualThread(() -> readFile(someFile));
}

With sockets it was easy, because you could just set them to non-blocking. But with file access, there is no async IO (well, except for io_uring in newer Linux kernels).

To cut a long story short, your file access call inside the virtual thread will actually be delegated to a (... drum roll ...) good-old operating system thread, to give you the illusion of non-blocking file access.

How do virtual threads work?

Even though good, old Java threads and virtual threads share the name... Threads, the comparisons/online discussions feel a bit apples-to-oranges to me.

It helped me to think of virtual threads as tasks that will eventually run on a real thread™ (called a carrier thread) AND that need the underlying native calls to do the heavy non-blocking lifting.

In the case of IO-work (REST calls, database calls, queue, stream calls etc.) this will absolutely yield benefits, and at the same time it illustrates why virtual threads won’t help at all with CPU-intensive work (or will even make matters worse). So, don’t get your hopes up, thinking about mining Bitcoins in a hundred thousand virtual threads.
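If you want to see the "task on a carrier thread" idea for yourself, a tiny sketch like the following can help. It assumes JDK 21 or newer (or JDK 19/20 with --enable-preview); the class name CarrierThreadSketch and the thread name "my-task" are made up for illustration. The exact toString() format is an implementation detail, but on current OpenJDK builds it typically includes the name of the carrier thread the task is mounted on.

```java
// Hypothetical demo class, not part of the original article.
class CarrierThreadSketch {

    /** Starts one virtual thread and returns {isVirtual, toString()} as seen from inside it. */
    static String[] describe() {
        String[] info = new String[2];
        Thread vt = Thread.ofVirtual().name("my-task").start(() -> {
            Thread self = Thread.currentThread();
            info[0] = String.valueOf(self.isVirtual());
            info[1] = self.toString(); // format is implementation-specific
        });
        try {
            vt.join(); // join() also makes the writes to `info` visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return info;
    }

    public static void main(String[] args) {
        String[] info = describe();
        System.out.println("virtual? " + info[0]);
        System.out.println("thread:  " + info[1]);
    }
}
```

By default, the pool of carrier threads is sized to the number of available processors, which is another way of seeing why piling CPU-bound work onto more virtual threads does not buy you more computing power.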

Hype & Promises

Almost every blog post on the first page of Google surrounding JDK 19 copied the following text, describing virtual threads, verbatim.

A preview of virtual threads, which are lightweight threads that dramatically
reduce the effort of writing, maintaining, and observing high-throughput,
concurrent applications. Goals include enabling server applications written
in the simple thread-per-request style to scale with near-optimal
hardware utilization (...) enable troubleshooting, debugging, and
profiling of virtual threads with existing JDK tools.

While I do think virtual threads are a great feature, I also feel paragraphs like the above will lead to a fair amount of scale hype-train’ism. Web servers like Jetty have long been using NIO connectors, where just a few threads are able to keep open hundreds of thousands or even a million connections.

The problem with real applications is them doing silly things, like calling databases, working with the file system, executing REST calls or talking to some sort of queue/stream.

And yes, it’s this type of I/O work where Project Loom will potentially shine. Loom gives you, the programmer, or maybe even more "just" the (HTTP/database/queue) library & framework maintainers, the benefit of essentially non-blocking code, without having to resort to the somewhat unintuitive async programming model (think of RxJava / Project Reactor) and all the consequences that entails (troubleshooting, debugging etc).

However, forget about automagically scaling up to a million virtual threads in real-life scenarios without knowing what you are doing. There is no free lunch.

What about the Thread.sleep example?

We started this article with making threads sleep. So, how does that work?

  • When calling Thread.sleep() on a good, old, OS-backed Java thread, you will in turn generate a native call that makes the thread sleepey-sleep for a given amount of time. Which, besides being a nonsensical scenario anyway, is quite costly for 100_000 threads.

  • In case of VirtualThread.sleep(), you will mark the virtual thread as sleeping and create a scheduled task on a good, old Java (OS-thread-based) ScheduledThreadPoolExecutor. That task will unpark / resume your virtual thread after the given [sleep-time]. Exercise for you: apples-to-oranges, again?
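To get a feeling for the virtual-thread side of this comparison, here is a minimal timing sketch. It assumes JDK 21 or newer (or JDK 19/20 with --enable-preview); the class SleepingVirtualThreadsSketch and its elapsedMillis helper are hypothetical names of mine. It lets many virtual threads sleep at once and measures the wall-clock time until all of them are done, without promising any particular number on your machine.

```java
import java.time.Duration;
import java.util.concurrent.Executors;
import java.util.stream.IntStream;

// Hypothetical demo class, not part of the original article.
class SleepingVirtualThreadsSketch {

    /** Lets `count` virtual threads each sleep for `sleep`; returns elapsed wall-clock millis. */
    static long elapsedMillis(int count, Duration sleep) {
        long start = System.nanoTime();
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            IntStream.range(0, count).forEach(i -> executor.submit(() -> {
                Thread.sleep(sleep); // parks the virtual thread, the carrier is free to run others
                return i;
            }));
        } // close() waits until every sleeper has woken up and returned
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        long millis = elapsedMillis(10_000, Duration.ofMillis(200));
        System.out.println("10_000 sleepers finished after " + millis + " ms");
    }
}
```

If the sleeps ran one after another, 10_000 sleepers of 200 ms each would need more than half an hour, so the elapsed time gives a rough idea of how little a parked virtual thread costs.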

Fin

Want to see more of these short technology deep dives? Leave a comment below.

Meanwhile, check out Load Testing: An Unorthodox Guide to find out why you should worry about other things than scale.

Acknowledgements

Thanks to Tagir Valeev, Vsevolod Tolstopyatov, and Andreas Eisele for comments/corrections/discussions.