
smarter data layout of local variables inside async function frame to save space #3069

Open · andrewrk opened this issue Aug 15, 2019 · 4 comments
Labels: enhancement (Solving this issue will likely involve adding new logic or components to the codebase.)

@andrewrk (Member) commented:

In theory these should have the same frame size:

```zig
const std = @import("std");

pub fn main() void {
    const size1: usize = @sizeOf(@Frame(func1));
    const size2: usize = @sizeOf(@Frame(func2));
    std.debug.warn("func1={}\nfunc2={}\n", size1, size2);
}

async fn func1() void {
    {
        var x: i32 = 1234;
    }
    {
        var x: i32 = 1234;
    }
    {
        var x: i32 = 1234;
    }
    {
        var x: i32 = 1234;
    }
    {
        var x: i32 = 1234;
    }
    {
        var x: i32 = 1234;
    }
    {
        var x: i32 = 1234;
    }
    {
        var x: i32 = 1234;
    }
    {
        var x: i32 = 1234;
    }
    {
        var x: i32 = 1234;
    }
    {
        var x: i32 = 1234;
    }
}
async fn func2() void {
    var x: i32 = 1234;
}
```

But currently:

```
$ ./zig run test.zig
func1=80
func2=32
```
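In other words, since the eleven `x` variables have pairwise disjoint lifetimes, they could all share a single 4-byte slot. A hand-written sketch of the two layouts (the field names and the bookkeeping size are invented; the real `@Frame` layout is compiler-internal):

```zig
// What the numbers above suggest is currently emitted for func1:
const Func1FrameToday = extern struct {
    state: [28]u8, // resume index, awaiter, etc.; 28 + 4 would match func2's 32 bytes
    xs: [11]i32,   // one dedicated slot per scoped `var x` (72, padded to 80)
};

// What could be emitted instead, reusing one slot for all eleven scopes:
const Func1FrameIdeal = extern struct {
    state: [28]u8,
    x: i32, // shared slot; @sizeOf would then equal func2's 32 bytes
};
```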
@andrewrk added the optimization and frontend labels Aug 15, 2019
@andrewrk added this to the 0.6.0 milestone Aug 15, 2019
@andrewrk (Member, Author) commented:

I did some analysis on the async-std-lib branch, and it looks like this issue is the next blocker for async/await work. Here is a snippet of the output:

```
@Frame(std.debug.openSelfDebugInfoPosix): 1.02 GiB (padding = 21)
 @Frame(std.debug.openElfDebugInfo): 1.02 GiB (padding = 184)
  @Frame(std.debug.openDwarfDebugInfo): 844.48 MiB (padding = 34)
   @Frame(std.debug.scanAllFunctions): 612.35 MiB (padding = 209)
    @Frame(std.debug.parseDie1): 188.11 MiB (padding = 80)
     @Frame(std.debug.parseFormValue): 184.10 MiB (padding = 282)
      @Frame(std.debug.parseFormValueRef): 20.01 MiB (padding = 126)
       @Frame(std.debug.leb128.readULEB128): 4.00 MiB (padding = 43)
        @Frame(std.io.in_stream.InStream(anyerror).readByte): 4.00 MiB (padding = 46)
         @Frame(std.io.in_stream.InStream(anyerror).readNoEof): 4.00 MiB (padding = 29)
          @Frame(std.io.in_stream.InStream(anyerror).readFull): 4.00 MiB (padding = 19)
           @Frame(std.io.in_stream.InStream(anyerror).read): 4.00 MiB (padding = 8)
            [4194304]u8: 4.00 MiB
            [32]usize: 256.00 bytes
            builtin.StackTrace: 24.00 bytes (padding = 0)
```

From this it's pretty clear how quickly not reusing bytes adds up. The 4 MiB itself is OK; it comes from the new std.io.InStream:

```zig
pub const default_stack_size = 4 * 1024 * 1024;
pub const stack_size: usize = if (@hasDecl(root, "stack_size_std_io_InStream"))
    root.stack_size_std_io_InStream
else
    default_stack_size;
pub const stack_align = 16;

pub fn InStream(comptime ReadError: type) type {
    return struct {
        const Self = @This();
        pub const Error = ReadError;
        pub const ReadFn = if (std.io.is_async)
            async fn (self: *Self, buffer: []u8) Error!usize
        else
            fn (self: *Self, buffer: []u8) Error!usize;

        /// Returns the number of bytes read. It may be less than buffer.len.
        /// If the number of bytes read is 0, it means end of stream.
        /// End of stream is not an error condition.
        readFn: ReadFn,

        /// Returns the number of bytes read. It may be less than buffer.len.
        /// If the number of bytes read is 0, it means end of stream.
        /// End of stream is not an error condition.
        pub fn read(self: *Self, buffer: []u8) Error!usize {
            if (std.io.is_async) {
                var stack_frame: [stack_size]u8 align(stack_align) = undefined;
                // TODO https://github.com/ziglang/zig/issues/3068
                var result: Error!usize = undefined;
                return await @asyncCall(&stack_frame, &result, self.readFn, self, buffer);
            } else {
                return self.readFn(self, buffer);
            }
        }

        // ...
    };
}
```

But 1.02 GiB is not okay! We can't be having a stack size of 1.02 GiB, that's crazy town.
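The reason these numbers compound is that an async callee's frame is embedded directly in its caller's frame, so frame sizes nest additively up the call graph. A minimal illustration (hypothetical functions, not from the std lib):

```zig
// `inner` is inferred async because it suspends.
fn inner() void {
    var buf: [4096]u8 = undefined; // stand-in for the 4 MiB InStream buffer
    suspend;
}

fn outer() void {
    inner(); // inferred async call: @Frame(inner) is embedded in @Frame(outer)
}

// @sizeOf(@Frame(outer)) >= @sizeOf(@Frame(inner)) >= 4096,
// so every async caller up the chain pays for `buf` again.
```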

We also don't yet request a bigger stack when it's statically known that we need more than the default (for example 16 MiB on Linux).

Update: I made it output JSON so one can use Firefox's tree view widget on the json file:

[Screenshot: -fstack-report JSON opened in Firefox's tree view, 2019-09-10]

andrewrk added a commit that referenced this issue Sep 10, 2019
@andrewrk modified the milestones: 0.6.0, 0.5.0 Sep 10, 2019
@andrewrk added the enhancement and stage1 labels and removed the optimization and frontend labels Sep 10, 2019
andrewrk added a commit that referenced this issue Sep 10, 2019
andrewrk added a commit that referenced this issue Sep 10, 2019
This introduces the concept of "IO mode" which is configurable by the
root source file (e.g. next to `pub fn main`). Applications can put this
in their root source file:

```zig
pub const io_mode = .evented;
```

This will populate `std.io.mode` to be `std.io.Mode.evented`. When I/O
mode is evented, `std.os.read` handles EAGAIN by suspending until the
file descriptor becomes available for reading. Although the std lib
event loop supports epoll, kqueue, and Windows I/O Completion Ports,
this integration with `std.os.read` currently only works on Linux.

This integration is currently only hooked up to `std.os.read`, and not,
for example, `std.os.write`, child processes, and timers. The fact that
we can do this and still have a working master branch is thanks to Zig's
lazy analysis, comptime, and inferred async. We can continue to make
incremental progress on async std lib features, enabling more and more
test cases and coverage.

In addition to `std.io.mode` there is `std.io.is_async` which is equal
to `std.io.mode == .evented`. In case I/O mode is async, `std.io.InStream`
notices this and the read function pointer becomes an async function
pointer rather than a blocking function pointer. Even in this case,
`std.io.InStream` can *still be used as a blocking input stream*.
Users of the API control whether it is blocking or async at runtime by whether
or not the read function suspends. In case of file descriptors, for
example, this might correspond to whether it was opened with `O_NONBLOCK`.
The `noasync` keyword makes a function call or `await` assert that no
suspension happens. This assertion has runtime safety enabled.

`std.io.InStream`, in the case of async I/O, uses by default a 4 MiB
frame size for calling the read function. If this is too large or too
small, the application can globally increase the frame size used by
declaring `pub const stack_size_std_io_InStream = 1234;` in their root
source file. This way, `std.io.InStream` will only be generated once,
avoiding bloat, and as long as this number is configured to be high
enough, everything works fine. Zig has runtime safety to detect when
`@asyncCall` is given too small of a buffer for the frame size.

This merge introduces -fstack-report which can help identify large async
function frame sizes and explain what is making them so big. Until #3069 is
solved, it's recommended to stick with blocking IO mode.

-fstack-report outputs JSON format, which can then be viewed in a GUI
that represents the tree structure. As an example, Firefox does a decent
job of this.

One feature that is currently missing is detecting that the call stack
upper bound is greater than the default for a given target, and passing
this upper bound to the linker. As an example, if Zig detects that 20
MiB stack upper bound is needed - which would be quite reasonable -
currently on Linux the application would only be given the default of 16
MiB.

Unrelated miscellaneous change: added std.c.readv
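The override described in the commit message would look like this in practice (the 64 KiB value here is an arbitrary example, not a recommendation):

```zig
// In the application's root source file, next to `pub fn main`:
pub const io_mode = .evented;

// Shrink the frame buffer std.io.InStream reserves for @asyncCall
// from the default 4 MiB to 64 KiB. It must still fit the largest
// readFn frame actually used; @asyncCall's runtime safety will
// catch a value that is too small.
pub const stack_size_std_io_InStream = 64 * 1024;
```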
@andrewrk (Member, Author) commented:

I've got a small patch undergoing local testing that is a huge improvement:

[Screenshot: -fstack-report tree after the patch, 2019-09-10]

Now that's more like it. It's barely any more stack usage than the 4 MiB reserved by the std.io.InStream.

I'm going to leave this open, though; there's still room for improvement, and the original test case still exhibits the same flaw.

There's also a case I want to consider, where an async function call inside a suspend block races with the function being resumed and then calling another async function.

andrewrk added a commit that referenced this issue Sep 11, 2019
@LemonBoy (Contributor) commented:

> In theory these should have the same frame size

On a more general level, LLVM is also able to recycle the alloca'd stack slots as needed if they're annotated with the right lifetime hints (the llvm.lifetime.start and llvm.lifetime.end intrinsics).

@andrewrk modified the milestones: 0.5.0, 0.6.0 Sep 20, 2019
@andrewrk modified the milestones: 0.6.0, 0.7.0 Feb 11, 2020
@andrewrk modified the milestones: 0.7.0, 0.8.0 Oct 14, 2020
@andrewrk modified the milestones: 0.8.0, 0.9.0 Nov 6, 2020
@Hadron67 (Contributor) commented May 1, 2021:
I'm blocked by this issue too. It seems that even temporary variables are sometimes not reused:

```zig
const std = @import("std");

fn func1(a: i32) void {
    suspend {}
}

fn func2(a: i32) i32 {
    suspend {}
    return a + 1;
}

fn test1(a: i32) void {
    func1(a + a + a + a);
}

fn test2(a: i32) void {
    _ = func2(a + a + a + a);
}

pub fn main() void {
    std.debug.print("test1: {}\n", .{@sizeOf(@Frame(test1))});
    std.debug.print("test2: {}\n", .{@sizeOf(@Frame(test2))});
}
```

The frame sizes of test1 and test2 should differ only by the size of an i32, but the actual difference is much larger:

```
test1: 64
test2: 112
```

In real-world use cases this makes the frame size grow rapidly; in my case it exceeded 10 KiB.

@andrewrk modified the milestones: 0.9.0, 0.10.0 May 19, 2021
@Vexu removed the stage1 label Dec 8, 2022