-
-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
smarter data layout of local variables inside async function frame to save space #3069
Comments
I did a little analysis on the
From this it's pretty clear how fast not reusing bytes adds up. The 4 MiB is OK, it's from the new pub const default_stack_size = 4 * 1024 * 1024;
pub const stack_size: usize = if (@hasDecl(root, "stack_size_std_io_InStream"))
root.stack_size_std_io_InStream
else
default_stack_size;
pub const stack_align = 16;
pub fn InStream(comptime ReadError: type) type {
return struct {
const Self = @This();
pub const Error = ReadError;
pub const ReadFn = if (std.io.is_async)
async fn (self: *Self, buffer: []u8) Error!usize
else
fn (self: *Self, buffer: []u8) Error!usize;
/// Returns the number of bytes read. It may be less than buffer.len.
/// If the number of bytes read is 0, it means end of stream.
/// End of stream is not an error condition.
readFn: ReadFn,
/// Returns the number of bytes read. It may be less than buffer.len.
/// If the number of bytes read is 0, it means end of stream.
/// End of stream is not an error condition.
pub fn read(self: *Self, buffer: []u8) Error!usize {
if (std.io.is_async) {
var stack_frame: [stack_size]u8 align(stack_align) = undefined;
// TODO https://github.com/ziglang/zig/issues/3068
var result: Error!usize = undefined;
return await @asyncCall(&stack_frame, &result, self.readFn, self, buffer);
} else {
return self.readFn(self, buffer);
}
} But 1.02 GiB is not okay! We can't be having a stack size of 1.02 GiB, that's crazy town. We also don't yet request a bigger stack when it's statically known that we need more than the default (for example 16 MiB on Linux). Update: I made it output JSON so one can use Firefox's tree view widget on the json file: |
This introduces the concept of "IO mode" which is configurable by the root source file (e.g. next to `pub fn main`). Applications can put this in their root source file: ``` pub const io_mode = .evented; ``` This will populate `std.io.mode` to be `std.io.Mode.evented`. When I/O mode is evented, `std.os.read` handles EAGAIN by suspending until the file descriptor becomes available for reading. Although the std lib event loop supports epoll, kqueue, and Windows I/O Completion Ports, this integration with `std.os.read` currently only works on Linux. This integration is currently only hooked up to `std.os.read`, and not, for example, `std.os.write`, child processes, and timers. The fact that we can do this and still have a working master branch is thanks to Zig's lazy analysis, comptime, and inferred async. We can continue to make incremental progress on async std lib features, enabling more and more test cases and coverage. In addition to `std.io.mode` there is `std.io.is_async` which is equal to `std.io.mode == .evented`. In case I/O mode is async, `std.io.InStream` notices this and the read function pointer becomes an async function pointer rather than a blocking function pointer. Even in this case, `std.io.InStream` can *still be used as a blocking input stream*. Users of the API control whether it is blocking or async at runtime by whether or not the read function suspends. In case of file descriptors, for example, this might correspond to whether it was opened with `O_NONBLOCK`. The `noasync` keyword makes a function call or `await` assert that no suspension happens. This assertion has runtime safety enabled. `std.io.InStream`, in the case of async I/O, uses by default a 4 MiB frame size for calling the read function. If this is too large or too small, the application can globally increase the frame size used by declaring `pub const stack_size_std_io_InStream = 1234;` in their root source file. This way, `std.io.InStream` will only be generated once, avoiding bloat, and as long as this number is configured to be high enough, everything works fine. Zig has runtime safety to detect when `@asyncCall` is given too small of a buffer for the frame size. This merge introduces -fstack-report which can help identify large async function frame sizes and explain what is making them so big. Until #3069 is solved, it's recommended to stick with blocking IO mode. -fstack-report outputs JSON format, which can then be viewed in a GUI that represents the tree structure. As an example, Firefox does a decent job of this. One feature that is currently missing is detecting that the call stack upper bound is greater than the default for a given target, and passing this upper bound to the linker. As an example, if Zig detects that 20 MiB stack upper bound is needed - which would be quite reasonable - currently on Linux the application would only be given the default of 16 MiB. Unrelated miscellaneous change: added std.c.readv
I've got a small patch undergoing local testing that is a huge improvement: Now that's more like it. It's barely any more stack usage than the 4 MiB reserved by the I'm going to leave this open though - there's still room for improvement. The original test case still has the same flaw. There's also a case I want to consider, where an async function call inside a suspend block races with being resumed and calling an async function. |
On a more general level LLVM is also able to recycle the alloca'd stack slots as needed if they're annotated with the right lifetime hints. |
I'm blocked by this issue too, it seems that even temporary variables sometimes are not reused: const std = @import("std");
fn func1(a: i32) void {
suspend {}
}
fn func2(a: i32) i32 {
suspend {}
return a + 1;
}
fn test1(a: i32) void {
func1(a + a + a + a);
}
fn test2(a: i32) void {
_ = func2(a + a + a + a);
}
pub fn main() void {
std.debug.print("test1: {}\n", .{@sizeOf(@Frame(test1))});
std.debug.print("test2: {}\n", .{@sizeOf(@Frame(test2))});
} frame size of
In real world use cases this makes the frame size grow rapidly, and in my case it reached over 10k. |
In theory these should have the same frame size:
But currently:
The text was updated successfully, but these errors were encountered: