
smarter data layout of local variables inside async function frame to save space #3069

Open · andrewrk opened this issue Aug 15, 2019 · 4 comments
Labels: enhancement (Solving this issue will likely involve adding new logic or components to the codebase.)

@andrewrk (Member) commented:

In theory these should have the same frame size:

```zig
const std = @import("std");

pub fn main() void {
    const size1: usize = @sizeOf(@Frame(func1));
    const size2: usize = @sizeOf(@Frame(func2));
    std.debug.warn("func1={}\nfunc2={}\n", size1, size2);
}

async fn func1() void {
    {
        var x: i32 = 1234;
    }
    {
        var x: i32 = 1234;
    }
    {
        var x: i32 = 1234;
    }
    {
        var x: i32 = 1234;
    }
    {
        var x: i32 = 1234;
    }
    {
        var x: i32 = 1234;
    }
    {
        var x: i32 = 1234;
    }
    {
        var x: i32 = 1234;
    }
    {
        var x: i32 = 1234;
    }
    {
        var x: i32 = 1234;
    }
    {
        var x: i32 = 1234;
    }
}
async fn func2() void {
    var x: i32 = 1234;
}
```

But currently:

```
$ ./zig run test.zig
func1=80
func2=32
```
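In other words, since the eleven `x` variables have pairwise disjoint lifetimes, they could all share a single 4-byte slot. A hand-written sketch of the two layouts (the field names and the bookkeeping size are invented; the real `@Frame` layout is compiler-internal):

```zig
// What the numbers above suggest is currently emitted for func1:
const Func1FrameToday = extern struct {
    state: [28]u8, // resume index, awaiter, etc.; 28 + 4 would match func2's 32 bytes
    xs: [11]i32,   // one dedicated slot per scoped `var x` (72, padded to 80)
};

// What could be emitted instead, reusing one slot for all eleven scopes:
const Func1FrameIdeal = extern struct {
    state: [28]u8,
    x: i32, // shared slot; @sizeOf would then equal func2's 32 bytes
};
```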
@andrewrk added the optimization and frontend labels Aug 15, 2019
@andrewrk added this to the 0.6.0 milestone Aug 15, 2019
@andrewrk (Member, Author) commented:

I did some analysis on the async-std-lib branch, and it looks like this issue is the next blocker for async/await work. Here is a snippet of the output:

```
@Frame(std.debug.openSelfDebugInfoPosix): 1.02 GiB (padding = 21)
 @Frame(std.debug.openElfDebugInfo): 1.02 GiB (padding = 184)
  @Frame(std.debug.openDwarfDebugInfo): 844.48 MiB (padding = 34)
   @Frame(std.debug.scanAllFunctions): 612.35 MiB (padding = 209)
    @Frame(std.debug.parseDie1): 188.11 MiB (padding = 80)
     @Frame(std.debug.parseFormValue): 184.10 MiB (padding = 282)
      @Frame(std.debug.parseFormValueRef): 20.01 MiB (padding = 126)
       @Frame(std.debug.leb128.readULEB128): 4.00 MiB (padding = 43)
        @Frame(std.io.in_stream.InStream(anyerror).readByte): 4.00 MiB (padding = 46)
         @Frame(std.io.in_stream.InStream(anyerror).readNoEof): 4.00 MiB (padding = 29)
          @Frame(std.io.in_stream.InStream(anyerror).readFull): 4.00 MiB (padding = 19)
           @Frame(std.io.in_stream.InStream(anyerror).read): 4.00 MiB (padding = 8)
            [4194304]u8: 4.00 MiB
            [32]usize: 256.00 bytes
            builtin.StackTrace: 24.00 bytes (padding = 0)
```

From this it's pretty clear how quickly not reusing bytes adds up. The 4 MiB itself is OK; it comes from the new std.io.InStream:

```zig
pub const default_stack_size = 4 * 1024 * 1024;
pub const stack_size: usize = if (@hasDecl(root, "stack_size_std_io_InStream"))
    root.stack_size_std_io_InStream
else
    default_stack_size;
pub const stack_align = 16;

pub fn InStream(comptime ReadError: type) type {
    return struct {
        const Self = @This();
        pub const Error = ReadError;
        pub const ReadFn = if (std.io.is_async)
            async fn (self: *Self, buffer: []u8) Error!usize
        else
            fn (self: *Self, buffer: []u8) Error!usize;

        /// Returns the number of bytes read. It may be less than buffer.len.
        /// If the number of bytes read is 0, it means end of stream.
        /// End of stream is not an error condition.
        readFn: ReadFn,

        /// Returns the number of bytes read. It may be less than buffer.len.
        /// If the number of bytes read is 0, it means end of stream.
        /// End of stream is not an error condition.
        pub fn read(self: *Self, buffer: []u8) Error!usize {
            if (std.io.is_async) {
                var stack_frame: [stack_size]u8 align(stack_align) = undefined;
                // TODO https://github.com/ziglang/zig/issues/3068
                var result: Error!usize = undefined;
                return await @asyncCall(&stack_frame, &result, self.readFn, self, buffer);
            } else {
                return self.readFn(self, buffer);
            }
        }

        // ...
    };
}
```

But 1.02 GiB is not okay! We can't be having a stack size of 1.02 GiB, that's crazy town.
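The reason these numbers compound is that an async callee's frame is embedded directly in its caller's frame, so frame sizes nest additively up the call graph. A minimal illustration (hypothetical functions, not from the std lib):

```zig
// `inner` is inferred async because it suspends.
fn inner() void {
    var buf: [4096]u8 = undefined; // stand-in for the 4 MiB InStream buffer
    suspend;
}

fn outer() void {
    inner(); // inferred async call: @Frame(inner) is embedded in @Frame(outer)
}

// @sizeOf(@Frame(outer)) >= @sizeOf(@Frame(inner)) >= 4096,
// so every async caller up the chain pays for `buf` again.
```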

We also don't yet request a bigger stack when it's statically known that we need more than the default (for example 16 MiB on Linux).

Update: I made it output JSON so one can use Firefox's tree view widget on the json file:

[Screenshot: -fstack-report JSON opened in Firefox's tree view, 2019-09-10]

andrewrk added a commit that referenced this issue Sep 10, 2019
@andrewrk modified the milestones: 0.6.0, 0.5.0 Sep 10, 2019
@andrewrk added the enhancement and stage1 labels and removed the optimization and frontend labels Sep 10, 2019
andrewrk added a commit that referenced this issue Sep 10, 2019
andrewrk added a commit that referenced this issue Sep 10, 2019
This introduces the concept of "IO mode" which is configurable by the
root source file (e.g. next to `pub fn main`). Applications can put this
in their root source file:

```zig
pub const io_mode = .evented;
```

This will populate `std.io.mode` to be `std.io.Mode.evented`. When I/O
mode is evented, `std.os.read` handles EAGAIN by suspending until the
file descriptor becomes available for reading. Although the std lib
event loop supports epoll, kqueue, and Windows I/O Completion Ports,
this integration with `std.os.read` currently only works on Linux.

This integration is currently only hooked up to `std.os.read`, and not,
for example, `std.os.write`, child processes, and timers. The fact that
we can do this and still have a working master branch is thanks to Zig's
lazy analysis, comptime, and inferred async. We can continue to make
incremental progress on async std lib features, enabling more and more
test cases and coverage.

In addition to `std.io.mode` there is `std.io.is_async` which is equal
to `std.io.mode == .evented`. In case I/O mode is async, `std.io.InStream`
notices this and the read function pointer becomes an async function
pointer rather than a blocking function pointer. Even in this case,
`std.io.InStream` can *still be used as a blocking input stream*.
Users of the API control whether it is blocking or async at runtime by whether
or not the read function suspends. In case of file descriptors, for
example, this might correspond to whether it was opened with `O_NONBLOCK`.
The `noasync` keyword makes a function call or `await` assert that no
suspension happens. This assertion has runtime safety enabled.

`std.io.InStream`, in the case of async I/O, uses by default a 4 MiB
frame size for calling the read function. If this is too large or too
small, the application can globally increase the frame size used by
declaring `pub const stack_size_std_io_InStream = 1234;` in their root
source file. This way, `std.io.InStream` will only be generated once,
avoiding bloat, and as long as this number is configured to be high
enough, everything works fine. Zig has runtime safety to detect when
`@asyncCall` is given too small of a buffer for the frame size.

This merge introduces -fstack-report which can help identify large async
function frame sizes and explain what is making them so big. Until #3069 is
solved, it's recommended to stick with blocking IO mode.

-fstack-report outputs JSON format, which can then be viewed in a GUI
that represents the tree structure. As an example, Firefox does a decent
job of this.

One feature that is currently missing is detecting that the call stack
upper bound is greater than the default for a given target, and passing
this upper bound to the linker. As an example, if Zig detects that 20
MiB stack upper bound is needed - which would be quite reasonable -
currently on Linux the application would only be given the default of 16
MiB.

Unrelated miscellaneous change: added std.c.readv
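The override described in the commit message would look like this in practice (the 64 KiB value here is an arbitrary example, not a recommendation):

```zig
// In the application's root source file, next to `pub fn main`:
pub const io_mode = .evented;

// Shrink the frame buffer std.io.InStream reserves for @asyncCall
// from the default 4 MiB to 64 KiB. It must still fit the largest
// readFn frame actually used; @asyncCall's runtime safety will
// catch a value that is too small.
pub const stack_size_std_io_InStream = 64 * 1024;
```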
@andrewrk (Member, Author) commented:

I've got a small patch undergoing local testing that is a huge improvement:

[Screenshot: -fstack-report tree after the patch, 2019-09-10]

Now that's more like it. It's barely any more stack usage than the 4 MiB reserved by the std.io.InStream.

I'm going to leave this open, though; there's still room for improvement, and the original test case still exhibits the same flaw.

There's also a case I want to consider, where an async function call inside a suspend block races with the function being resumed and then calling another async function.

andrewrk added a commit that referenced this issue Sep 11, 2019
@LemonBoy (Contributor) commented:

> In theory these should have the same frame size

On a more general level, LLVM is also able to recycle the alloca'd stack slots as needed if they're annotated with the right lifetime hints (the llvm.lifetime.start and llvm.lifetime.end intrinsics).

@andrewrk modified the milestones: 0.5.0, 0.6.0 Sep 20, 2019
@andrewrk modified the milestones: 0.6.0, 0.7.0 Feb 11, 2020
@andrewrk modified the milestones: 0.7.0, 0.8.0 Oct 14, 2020
@andrewrk modified the milestones: 0.8.0, 0.9.0 Nov 6, 2020
@Hadron67 (Contributor) commented May 1, 2021:
I'm blocked by this issue too. It seems that even temporary variables are sometimes not reused:

```zig
const std = @import("std");

fn func1(a: i32) void {
    suspend {}
}

fn func2(a: i32) i32 {
    suspend {}
    return a + 1;
}

fn test1(a: i32) void {
    func1(a + a + a + a);
}

fn test2(a: i32) void {
    _ = func2(a + a + a + a);
}

pub fn main() void {
    std.debug.print("test1: {}\n", .{@sizeOf(@Frame(test1))});
    std.debug.print("test2: {}\n", .{@sizeOf(@Frame(test2))});
}
```

The frame sizes of test1 and test2 should differ only by the size of an i32, but the actual difference is much larger:

```
test1: 64
test2: 112
```

In real-world use cases this makes the frame size grow rapidly; in my case it exceeded 10 KiB.

@andrewrk modified the milestones: 0.9.0, 0.10.0 May 19, 2021
@Vexu removed the stage1 label Dec 8, 2022