Performance of for await of (async iteration) #31979

alubbe · 2020-02-27T06:18:34Z

I hope this is the right place to ask the question and that this hasn't already been discussed to death elsewhere. Feel free to direct me elsewhere or close the issue if I've missed something.

In short, I am one of the maintainers of exceljs (which is basically a transfrom stream taking in a read stream, unzipping its contents, running an xml parser on the unzipped chunks and then emitting back excel-related events) and we're in the process of adding support for async iteration via for await of (exceljs/exceljs#1135).

In doing that, we've noticed that for await of is significantly slower than the current .on('data',..) based approach. Our benchmark is not a microbenchmark, but a full end-to-end benchmark incl. creating and analyzing excel objects in memory (exceljs/exceljs#1139). Switching to for await of (vs. handling the events in sync callbacks) decreased performance by around 60%.

I have debugged this issue (lddubeau/saxes#32) and in short, the issue arises because for every chunk/event passed into our transform stream, we emit out a magnitude greater of chunks/events. And so what's causing the performance is that the callback code would run through these emitted chunks/events mostly synchronously, whereas the current implementation of Symbol.asyncIterator on Readable calls setImmediate between each event, which is quite expensive. I wrote a simple microbenchmark to compare for of against for await of on the same array or iterator, and the difference is around 10x.

So we've come up with this 'hack' where instead of emitting one-by-one all of these chunks/events that our transform produces, we now gather them up in an array and emit that once. Or phrased another way, instead of calling this.push() for every excel related event that we produce, we call, for each chunk written into our stream, a lot of this.events.push() (where this.events is just an array that initialized in the constructor) and then finally this.push(this.events) once we're done consuming the chunk (and we also reset this.events to an empty array again). Clever, but now consuming the stream is ugly. Instead of writing `` we now write

// We'd like to write this, but it's slow
for await (const chunk of readable) { ... }

// This is within 2% of the runtime of the callback based approach, but not very ergonomic
for await (const chunks of readable) {
  for (const chunk of chunks) { ... }
}

I think this performance issue will bite a lot people because it's so easy to fall into and, at least to me, came as a surprise. I remember reading that readline has similar performance issues (and similarly to the above it produces a lot more events than it takes in) and would probably also see performance improvements from the above approach.

My question boils down to this: Is there a fundamental reason in the spec around async iteration or streams that we have to go to setImmediate if the read buffer still has stuff in it (i.e., if we could call .next() synchronously? Is it something that v8 can/will eventually optimize? If no to both questions, what should library authors do to give users all the advantages of async iteration while not sacrificing performance?

Roping in @BridgeAR as a fellow performance nerd and the only one I know here ;)

The text was updated successfully, but these errors were encountered:

alubbe · 2020-02-27T07:01:48Z

I'm sure there's some clever name for this like 'async iteration performance impacted by transform stream event amplification' ;)

devsnek · 2020-02-27T07:18:05Z

@alubbe are you saying that it calls emit('data') synchronously? If yes, nothing using promises will ever be faster.

That being said, can you point to where the async iterator implementation is calling setImmediate? I can't find such a thing.

lpinca · 2020-02-27T19:00:44Z

@nodejs/streams

mcollina · 2020-02-27T19:12:23Z

Note that you are adding an additional PassThrough in https://github.com/exceljs/exceljs/pull/1135/files#diff-a7c32d962b3105f42a2ae84812252274R60.
As a rule of thumb, each new transform that you add cuts throughput by ~30-50%.

I'm not sure it's all due to that, but that would be my first starting point for improving the perf there.

From my tests which consisted in simply doing some reading from a file, the async iterators had similar performance, but there might be something that I did overlook.

Can you produce a couple of self-runnable example that just use node core? We'll look into them if there are some ways in which we can optimize things.

alubbe · 2020-02-28T06:12:14Z

The PassThrough is used in both versions to force unzipper to support async iteration (it's a temporary hack, see ZJONSSON/node-unzipper#178), so it doesn't affect the numbers I mentioned.

Thanks for all of your input so far, I will try to build an example around readline as soon as I can.

alubbe · 2020-02-28T07:12:14Z

And here it is: https://github.com/alubbe/readline-benchmark

The repo benchmarks 3 ways of using readline. Firstly, using the stream interface:

  await runProfiling('readline stream interface', () => new Promise((resolve, reject) => {
    const rl = readline.createInterface({
      input: fs.createReadStream('big.txt'),
    });

    let i = 0;
    rl.on('line', (line) => {
      i += 1;
    });

    rl.on('error', reject);

    rl.on('close', () => {
      console.log(`Read ${i} lines`);
      resolve();
    });

On my mac (node 13.9.0), this takes between 32 and 35 ms and uses around 3 MB of memory.

Then, it benchmarks async iteration as recommended in the docs:

  await runProfiling('readline async iteration', async () => {
    const rl = readline.createInterface({
        input: fs.createReadStream('big.txt'),
      });

      let i = 0;
      for await (const line of rl) {
        i += 1;
      }
      console.log(`Read ${i} lines`);
  });

On my mac (node 13.9.0), this takes around 50 ms and uses around 9 - 10 MB of memory.

Lastly, I modified two lines in the prototype of readline's Interface to enable combining async iteration with sync interation:

  await runProfiling('readline async iteration via array of lines', async () => {
    const rl = readline.createInterface({
        input: fs.createReadStream('big.txt'),
      });

      let i = 0;
      for await (const lines of rl) {
        for (const line of lines) {
          i += 1;
        }
      }
      console.log(`Read ${i} lines`);
  });

On my mac (node 13.9.0), this takes between 30 and 34 ms and uses around 3 MB of memory.

Here's the full output:

####################################################
WARMUP: Current memory usage: 1.93 MB
WARMUP: readline stream interface profiling started
Read 128457 lines
WARMUP: readline stream interface profiling finished in 73ms
WARMUP: Current memory usage (before GC): 3.41 MB
WARMUP: Current memory usage (after GC): 1.98 MB

####################################################
RUN 1: readline stream interface profiling started
Read 128457 lines
RUN 1: readline stream interface profiling finished in 35ms
RUN 1: Current memory usage (before GC): 3.6 MB
RUN 1: Current memory usage (after GC): 2.24 MB

####################################################
RUN 2: readline stream interface profiling started
Read 128457 lines
RUN 2: readline stream interface profiling finished in 32ms
RUN 2: Current memory usage (before GC): 3.04 MB
RUN 2: Current memory usage (after GC): 2.24 MB

####################################################
RUN 3: readline stream interface profiling started
Read 128457 lines
RUN 3: readline stream interface profiling finished in 32ms
RUN 3: Current memory usage (before GC): 3.22 MB
RUN 3: Current memory usage (after GC): 2.25 MB

####################################################
WARMUP: Current memory usage: 2.24 MB
WARMUP: readline async iteration profiling started
Read 128457 lines
WARMUP: readline async iteration profiling finished in 67ms
WARMUP: Current memory usage (before GC): 3 MB
WARMUP: Current memory usage (after GC): 2.31 MB

####################################################
RUN 1: readline async iteration profiling started
Read 128457 lines
RUN 1: readline async iteration profiling finished in 68ms
RUN 1: Current memory usage (before GC): 10.17 MB
RUN 1: Current memory usage (after GC): 2.36 MB

####################################################
RUN 2: readline async iteration profiling started
Read 128457 lines
RUN 2: readline async iteration profiling finished in 49ms
RUN 2: Current memory usage (before GC): 9.21 MB
RUN 2: Current memory usage (after GC): 2.36 MB

####################################################
RUN 3: readline async iteration profiling started
Read 128457 lines
RUN 3: readline async iteration profiling finished in 48ms
RUN 3: Current memory usage (before GC): 8.86 MB
RUN 3: Current memory usage (after GC): 2.34 MB

####################################################
WARMUP: Current memory usage: 2.31 MB
WARMUP: readline async iteration via array of lines profiling started
Read 128457 lines
WARMUP: readline async iteration via array of lines profiling finished in 31ms
WARMUP: Current memory usage (before GC): 3.16 MB
WARMUP: Current memory usage (after GC): 2.31 MB

####################################################
RUN 1: readline async iteration via array of lines profiling started
Read 128457 lines
RUN 1: readline async iteration via array of lines profiling finished in 31ms
RUN 1: Current memory usage (before GC): 3.12 MB
RUN 1: Current memory usage (after GC): 2.32 MB

####################################################
RUN 2: readline async iteration via array of lines profiling started
Read 128457 lines
RUN 2: readline async iteration via array of lines profiling finished in 34ms
RUN 2: Current memory usage (before GC): 3.03 MB
RUN 2: Current memory usage (after GC): 2.33 MB

####################################################
RUN 3: readline async iteration via array of lines profiling started
Read 128457 lines
RUN 3: readline async iteration via array of lines profiling finished in 30ms
RUN 3: Current memory usage (before GC): 3.61 MB
RUN 3: Current memory usage (after GC): 2.33 MB

alubbe · 2020-02-28T07:14:06Z

This reproduces my excel.js issue quite neatly - the event amplification causes a performance dip, both in cpu and memory. Is there anything library authors, node core or v8 can do or should we recommend this for await of + for of combination for performance sensitive/low-level libraries? I remember reading that the async/await spec allows for optimizations around Promises around sync returns or Promises that have already resolved, but I have no idea if that could apply here.

mcollina · 2020-02-28T07:18:19Z

readline asynciteration has known performance issues (as it adds an internal transform stream) and therefore is not really representative as we would focus in removing that Transform instead.

alubbe · 2020-02-28T07:28:16Z

yes, but benchmark 3 uses that internal transform and is as fast as benchmark 1, which does not, right? If you'd like to try this out with something else, let me know what I should benchmark.

Also, benchmarks 2 and 3 both use the internal transform but there's a big performance difference, so I think it is an independent discussion to removing the transform (which of course is a great idea!)

alubbe · 2020-02-28T08:09:24Z

I could also write a naive readline implementation as a transform stream from scratch (just to focus on the amplification of events) and benchmark that - would that help you since it wouldn't use code that is not yet fully optimized?

mcollina · 2020-02-28T08:53:30Z

That would help, yes.

alubbe · 2020-02-28T13:58:57Z

(Btw did you mean this extra readable stream here when you said transform

node/lib/readline.js

Lines 1073 to 1104 in efec681

    
           Interface.prototype[SymbolAsyncIterator] = function() { 
        
             if (this[kLineObjectStream] === undefined) { 
        
               if (Readable === undefined) { 
        
                 Readable = require('stream').Readable; 
        
               } 
        
               const readable = new Readable({ 
        
                 objectMode: true, 
        
                 read: () => { 
        
                   this.resume(); 
        
                 }, 
        
                 destroy: (err, cb) => { 
        
                   this.off('line', lineListener); 
        
                   this.off('close', closeListener); 
        
                   this.close(); 
        
                   cb(err); 
        
                 } 
        
               }); 
        
               const lineListener = (input) => { 
        
                 if (!readable.push(input)) { 
        
                   this.pause(); 
        
                 } 
        
               }; 
        
               const closeListener = () => { 
        
                 readable.push(null); 
        
               }; 
        
               this.on('line', lineListener); 
        
               this.on('close', closeListener); 
        
               this[kLineObjectStream] = readable; 
        
             } 
        
             return this[kLineObjectStream][SymbolAsyncIterator](); 
        
           };

?)

I've created two naive readline implementations as transform streams:

class NaiveReadline extends Stream.Transform {
  constructor() {
    super({readableObjectMode: true, writableObjectMode: true});
    this.buffer = '';
  }

  _transform(chunk, _encoding, callback) {
    let pausing = false;
    chunk = this.buffer + chunk;
    const lines = chunk.split(lineEnding);
    this.buffer = lines.pop();
    for (const line of lines) {
      if (!this.push(line) && !pausing) {
        pausing = true;
        this.pause();
      }
    }

    if(!pausing) return callback();
    this.once('readable', callback);
  }
};

class NaiveReadline2 extends Stream.Transform {
  constructor() {
    super({readableObjectMode: true, writableObjectMode: true});
    this.buffer = '';
  }

  _transform(chunk, _encoding, callback) {
    chunk = this.buffer + chunk;
    const lines = chunk.split(lineEnding);
    callback(null, lines);
  }
};

I hope they are correct enough for this benchmark (obviously a ton of error handling and event listener clean up is missing).

In any case, let's consume it via stream events:

  await runProfiling('naive readline stream interface', () => new Promise((resolve, reject) => {
    const rl = new NaiveReadline();
    let i = 0;
    rl.on('data', (line) => {
      i += 1;
    });

    rl.on('error', reject);

    rl.on('end', () => {
      console.log(`Read ${i} lines`);
      resolve();
    });

    fs.createReadStream('big.txt').pipe(rl);
  }));

On my mac (node 13.9.0), this takes between 32 and 35 ms and uses around 3 MB of memory (same as node's readline).

Let's benchmark async iteration:

  await runProfiling('naive readline async iteration', async () => {
    let i = 0;
    for await (const line of fs.createReadStream('big.txt').pipe(new NaiveReadline())) {
      i += 1;
    }
    console.log(`Read ${i} lines`);
  });

On my mac (node 13.9.0), this takes around 50 ms and uses around 4 - 10 MB of memory. Again, similar to node's readline.

Lastly, let's test NaiveReadline2 with its simple 1 event -> 1 array of events setup:

  await runProfiling('naive readline2 async iteration via array of lines', async () => {
    let i = 0;
    for await (const lines of fs.createReadStream('big.txt').pipe(new NaiveReadline2())) {
      for (const line of lines) {
        i += 1;
      }
    }
    console.log(`Read ${i} lines`);
  });

On my mac (node 13.9.0), this takes around 30ms and uses around 3 MB of memory - again highlighting the original issue of event amplification slowing down for await of.

Here's the full output:

####################################################
WARMUP: Current memory usage: 1.91 MB
WARMUP: naive readline stream interface profiling started
Read 128457 lines
WARMUP: naive readline stream interface profiling finished in 47ms
WARMUP: Current memory usage (before GC): 3.93 MB
WARMUP: Current memory usage (after GC): 2 MB

####################################################
RUN 1: naive readline stream interface profiling started
Read 128457 lines
RUN 1: naive readline stream interface profiling finished in 40ms
RUN 1: Current memory usage (before GC): 4.11 MB
RUN 1: Current memory usage (after GC): 2.01 MB

####################################################
RUN 2: naive readline stream interface profiling started
Read 128457 lines
RUN 2: naive readline stream interface profiling finished in 33ms
RUN 2: Current memory usage (before GC): 2.27 MB
RUN 2: Current memory usage (after GC): 2.01 MB

####################################################
RUN 3: naive readline stream interface profiling started
Read 128457 lines
RUN 3: naive readline stream interface profiling finished in 33ms
RUN 3: Current memory usage (before GC): 2.86 MB
RUN 3: Current memory usage (after GC): 2.02 MB

####################################################
WARMUP: Current memory usage: 2.02 MB
WARMUP: naive readline async iteration profiling started
Read 128457 lines
WARMUP: naive readline async iteration profiling finished in 69ms
WARMUP: Current memory usage (before GC): 4.45 MB
WARMUP: Current memory usage (after GC): 2.08 MB

####################################################
RUN 1: naive readline async iteration profiling started
Read 128457 lines
RUN 1: naive readline async iteration profiling finished in 61ms
RUN 1: Current memory usage (before GC): 3.37 MB
RUN 1: Current memory usage (after GC): 2.09 MB

####################################################
RUN 2: naive readline async iteration profiling started
Read 128457 lines
RUN 2: naive readline async iteration profiling finished in 48ms
RUN 2: Current memory usage (before GC): 9.98 MB
RUN 2: Current memory usage (after GC): 2.1 MB

####################################################
RUN 3: naive readline async iteration profiling started
Read 128457 lines
RUN 3: naive readline async iteration profiling finished in 58ms
RUN 3: Current memory usage (before GC): 9.62 MB
RUN 3: Current memory usage (after GC): 2.13 MB

####################################################
WARMUP: Current memory usage: 2.09 MB
WARMUP: naive readline2 async iteration via array of lines profiling started
Read 128557 lines
WARMUP: naive readline2 async iteration via array of lines profiling finished in 32ms
WARMUP: Current memory usage (before GC): 2.81 MB
WARMUP: Current memory usage (after GC): 2.07 MB

####################################################
RUN 1: naive readline2 async iteration via array of lines profiling started
Read 128557 lines
RUN 1: naive readline2 async iteration via array of lines profiling finished in 29ms
RUN 1: Current memory usage (before GC): 2.83 MB
RUN 1: Current memory usage (after GC): 2.07 MB

####################################################
RUN 2: naive readline2 async iteration via array of lines profiling started
Read 128557 lines
RUN 2: naive readline2 async iteration via array of lines profiling finished in 30ms
RUN 2: Current memory usage (before GC): 3.15 MB
RUN 2: Current memory usage (after GC): 2.07 MB

####################################################
RUN 3: naive readline2 async iteration via array of lines profiling started
Read 128557 lines
RUN 3: naive readline2 async iteration via array of lines profiling finished in 31ms
RUN 3: Current memory usage (before GC): 3.33 MB
RUN 3: Current memory usage (after GC): 2.09 MB

Let me know if this is more helpful or if there's any other checks you'd like me to run.

mcollina · 2020-02-28T14:14:23Z

(Btw did you mean this extra readable stream here when you said transform

Yes.

Some notes:

the increase in memory size is expected. Async Iteration spans upon multiple micro ticks that streams currently avoids.
the first implementation is probably wrong, as it's using pause() and it should really not be used in that way from within _transform.
the second implementation is doing nothing with this.buffer.

One thing you might want to try is to ditch the Transform stream completely and async iterate the fs.createReadStream() and do the event-amplification there. This would give us a hint if it's our implementation or it's a cost of the mechanism.

alubbe · 2020-02-28T16:35:40Z

Regarding 2., I actually first wrote a version with pause() and the numbers came out the same.

Regarding your suggestion, could you write some pseudo code of what you have in mind? Are you thinking of an async iterator that yields each produced line vs. one that yields an array of lines?

mcollina · 2020-02-28T16:38:23Z

Something like:

for await (let chunk of fs.createReadStream(file) {
  const lines = ...
  for (let line of lines) {
    yield line
  }
}

alubbe · 2020-03-25T12:09:55Z

I've finally found some time to revisit this - and in short, the above is indeed a lot slower than yielding the lines array and letting the consumer iterate over it. I strongly suspect that it's the fact that for await of does not/can not run synchronously when it has the chunks available to do so. Any ideas on how to achieve a fast performance without yielding arrays?

Here is what I benchmarked (all files are in the repo):

const fs = require('fs');
const NaiveReadline = require('./NaiveReadline');
const NaiveReadline2 = require('./NaiveReadline2');
const runProfiling = require('./runProfiling');

const lineEnding = /\r?\n|\r(?!\n)/;
(async () => {
  await runProfiling('naive readline stream interface', () => new Promise((resolve, reject) => {
    const rl = new NaiveReadline();
    let i = 0;
    rl.on('data', (line) => {
      i += 1;
    });

    rl.on('error', reject);

    rl.on('end', () => {
      console.log(`Read ${i} lines`);
      resolve();
    });

    fs.createReadStream('big.txt').pipe(rl);
  }));

  await runProfiling('naive readline async iteration', async () => {
    async function* readline(stream) {
      let buffer = '';
      for await (let chunk of stream) {
        chunk = buffer + chunk;
        const lines = chunk.split(lineEnding);
        buffer = lines.pop();
        for (const line of lines) {
          yield line
        }
      }
    }

    let i = 0;
    for await (const line of readline(fs.createReadStream('big.txt'))) {
      i += 1;
    }
    console.log(`Read ${i} lines`);
  });

  await runProfiling('naive readline2 async iteration via array of lines', async () => {
    async function* readline2(stream) {
      let buffer = '';
      for await (let chunk of stream) {
        chunk = buffer + chunk;
        const lines = chunk.split(lineEnding);
        buffer = lines.pop();
        yield lines;
      }
    }

    let i = 0;
    for await (const lines of readline2(fs.createReadStream('big.txt'))) {
      for (const line of lines) {
        i += 1;
      }
    }
    console.log(`Read ${i} lines`);
  });

  await runProfiling('naive readline3 async iteration via an iterator', async () => {
    async function readline3(stream, iterator) {
      let buffer = '';
      for await (let chunk of stream) {
        chunk = buffer + chunk;
        const lines = chunk.split(lineEnding);
        buffer = lines.pop();
        for (const line of lines) {
          iterator(line)
        }
      }
    }

    let i = 0;
    await readline3(fs.createReadStream('big.txt'), line => {
      i += 1;
    })

    console.log(`Read ${i} lines`);
  });
})();

And here are the benchmarking results:

####################################################
WARMUP: Current memory usage: 1.91 MB
WARMUP: naive readline stream interface profiling started
Read 128457 lines
WARMUP: naive readline stream interface profiling finished in 94ms
WARMUP: Current memory usage (before GC): 2.87 MB
WARMUP: Current memory usage (after GC): 1.99 MB

####################################################
RUN 1: naive readline stream interface profiling started
Read 128457 lines
RUN 1: naive readline stream interface profiling finished in 42ms
RUN 1: Current memory usage (before GC): 5.93 MB
RUN 1: Current memory usage (after GC): 2.02 MB

####################################################
RUN 2: naive readline stream interface profiling started
Read 128457 lines
RUN 2: naive readline stream interface profiling finished in 37ms
RUN 2: Current memory usage (before GC): 2.27 MB
RUN 2: Current memory usage (after GC): 2.26 MB

####################################################
RUN 3: naive readline stream interface profiling started
Read 128457 lines
RUN 3: naive readline stream interface profiling finished in 34ms
RUN 3: Current memory usage (before GC): 2.61 MB
RUN 3: Current memory usage (after GC): 2.27 MB

####################################################
WARMUP: Current memory usage: 2.27 MB
WARMUP: naive readline async iteration profiling started
Read 128457 lines
WARMUP: naive readline async iteration profiling finished in 78ms
WARMUP: Current memory usage (before GC): 2.71 MB
WARMUP: Current memory usage (after GC): 2.27 MB

####################################################
RUN 1: naive readline async iteration profiling started
Read 128457 lines
RUN 1: naive readline async iteration profiling finished in 82ms
RUN 1: Current memory usage (before GC): 4.98 MB
RUN 1: Current memory usage (after GC): 2.3 MB

####################################################
RUN 2: naive readline async iteration profiling started
Read 128457 lines
RUN 2: naive readline async iteration profiling finished in 62ms
RUN 2: Current memory usage (before GC): 5.49 MB
RUN 2: Current memory usage (after GC): 2.31 MB

####################################################
RUN 3: naive readline async iteration profiling started
Read 128457 lines
RUN 3: naive readline async iteration profiling finished in 60ms
RUN 3: Current memory usage (before GC): 5.32 MB
RUN 3: Current memory usage (after GC): 2.3 MB

####################################################
WARMUP: Current memory usage: 2.29 MB
WARMUP: naive readline2 async iteration via array of lines profiling started
Read 128457 lines
WARMUP: naive readline2 async iteration via array of lines profiling finished in 27ms
WARMUP: Current memory usage (before GC): 3.43 MB
WARMUP: Current memory usage (after GC): 2.33 MB

####################################################
RUN 1: naive readline2 async iteration via array of lines profiling started
Read 128457 lines
RUN 1: naive readline2 async iteration via array of lines profiling finished in 27ms
RUN 1: Current memory usage (before GC): 3.61 MB
RUN 1: Current memory usage (after GC): 2.31 MB

####################################################
RUN 2: naive readline2 async iteration via array of lines profiling started
Read 128457 lines
RUN 2: naive readline2 async iteration via array of lines profiling finished in 46ms
RUN 2: Current memory usage (before GC): 3.27 MB
RUN 2: Current memory usage (after GC): 2.32 MB

####################################################
RUN 3: naive readline2 async iteration via array of lines profiling started
Read 128457 lines
RUN 3: naive readline2 async iteration via array of lines profiling finished in 24ms
RUN 3: Current memory usage (before GC): 3.6 MB
RUN 3: Current memory usage (after GC): 2.37 MB

####################################################
WARMUP: Current memory usage: 2.32 MB
WARMUP: naive readline3 async iteration via an iterator profiling started
Read 128457 lines
WARMUP: naive readline3 async iteration via an iterator profiling finished in 26ms
WARMUP: Current memory usage (before GC): 10.84 MB
WARMUP: Current memory usage (after GC): 2.35 MB

####################################################
RUN 1: naive readline3 async iteration via an iterator profiling started
Read 128457 lines
RUN 1: naive readline3 async iteration via an iterator profiling finished in 25ms
RUN 1: Current memory usage (before GC): 10.48 MB
RUN 1: Current memory usage (after GC): 2.36 MB

####################################################
RUN 2: naive readline3 async iteration via an iterator profiling started
Read 128457 lines
RUN 2: naive readline3 async iteration via an iterator profiling finished in 24ms
RUN 2: Current memory usage (before GC): 10.98 MB
RUN 2: Current memory usage (after GC): 2.36 MB

####################################################
RUN 3: naive readline3 async iteration via an iterator profiling started
Read 128457 lines
RUN 3: naive readline3 async iteration via an iterator profiling finished in 34ms
RUN 3: Current memory usage (before GC): 10.98 MB
RUN 3: Current memory usage (after GC): 2.37 MB

mcollina · 2020-03-25T17:20:49Z

Have you tried to use yield* lines? That should transfer control to the iterator of lines.

Overall you are correct: for await is always asynchronous for each chunk (one microtick) and it seems that causes significant issues in your specific case.

alubbe · 2020-03-25T17:30:26Z

Unfortunately, yield* lines is even slower than all the other alternatives.

Going back to the original question of this issue, is this a fundamental thing or is it something where the spec allows a type of micro-tick optimization that v8 could implement, similar to https://v8.dev/blog/fast-async ?

mcollina · 2020-03-25T19:12:22Z

The vast majority of the time is spent inside V8 and it is due to the overhead of yield a lot of data. I hope that an optimization could be done on their side.

As you can see in https://upload.clinicjs.org/public/6e4f252b53ea9834559fff3bc24d62283900d7a565cb99ef9bdae1c459deaf15/28374.clinic-flame.html#selectedNode=50&zoomedNode=&exclude=8000&merged=true, a signficant amount of time is spent in managing the generators.

Here is a flamegraph done with linux perf:
flamegraph.html.gz

As you can see most of the overhead is in the the logic V8 uses to handle async generators :/.

cc @nodejs/v8, do you think there is any optimization possible?

devsnek · 2020-03-25T19:35:09Z

yield* in async generators is pretty complex, and has to perform a lot of awaits. I think its unlikely you'd get better performance when using it.

alubbe · 2020-03-26T06:49:56Z

yeah, yield* is slow, but can we get these two to be roughly the same speed?

// 1 - every chunk gets yielded individually, easy to use, currently a lot slower than 2
for await (const chunk of iterable) { ... }

// 2 - chunks get yielded as arrays as they are produced, forces the consumer to write a second for loop, but much faster
for await (const chunks of iterable) {
  for (const chunk of chunks) { ... }
}

alubbe · 2020-03-27T05:47:33Z

I had a couple of more thoughts this morning:

I think this issue might affect a lot of use cases because it comes into play any time we take something that is 'actually' async like reading chunks from a file system or the network and then amplify it synchronously in any way, such as decompression, parsing strings into structured objects (xml, json, ...), collecting certain keywords, etc.. It does not feel like something users would write themselves a lot, but lower level libraries probably will and users will end up just thinking for-await-of is slow, I have to go back to writing stream handlers myself, thereby missing out on the control flow improvements of for-await-of (such as using await in the loop block, i.e. delaying the next iteration until the completion of something async).
If this turns out to be a fundamental issue (i.e., for await of must schedule each run to the microtask queue instead of iterating through already resolved items synchronously), then library authors could start exporting a generator to their consumers that returns an array of items and then also exporting the stream interface which is derived from that generator:

export async function* transformSomeStreamGenerator(stream, opts) {...}
export async function transfromSomeStream(stream, opts) {
  return Readable.from((function*() {
    for await (const chunks of transformSomeStreamGenerator(stream, opts)) {
      for (const chunk of chunks) {
        yield chunk
      }
    }
  })())
}

This way transfromSomeStream can be used for piping and transformSomeStreamGenerator could be used for high-performance for-await-of, where the consumer is expected to write the for (const chunk of chunks) part themselves.

Any thoughts on this?

mcollina · 2020-03-27T11:56:35Z

You are missing a few async keywords in your example.

In the 2) case, I can't see if it will have any perf benefit at all.

alubbe · 2020-03-27T12:02:37Z

You are missing a few async keywords in your example.

Added ;)

In the 2) case, I can't see if it will have any perf benefit at all.

It would allow two ways of using it:

// 1 - every chunk gets yielded individually, easy to use
// currently a lot slower than 2
for await (const chunk of transfromSomeStream(stream)) { ... }

// 2 - chunks get yielded as arrays as they are produced, 
// forces the consumer to write a second for loop, but much faster
for await (const chunks of transformSomeStreamGenerator(stream)) {
  for (const chunk of chunks) { ... }
}

Happy to write another benchmark for this - but again, this would assume there is no way for v8 to achieve something close to parity. Can anyone comment on that?

mcollina · 2020-03-27T13:04:56Z

I don't understand why you are using Readable.from() in transfromSomeStream.

alubbe · 2020-03-30T20:16:11Z

Sure, let me give you some context. I'm re-writing the internals of exceljs, throwing out streams and replacing them with generators (sync wherever possible, async when needed). However, for backwards compatibility, we still need to expose a stream-like, eventemitter-based interface to our consumers.

So, our main logic is now implemented in a parse function (https://github.com/exceljs/exceljs/pull/1135/files#diff-2fe0207cb76e1f94add746d745867d8eR103), which is an async generator. Because of the issues discussed here, it yields an array of events instead of yielding each event by itself (https://github.com/exceljs/exceljs/pull/1135/files#diff-2fe0207cb76e1f94add746d745867d8eR346).
Based on that I refactored the read function to just call parse, synchronously loop through the events array and emit each event individually (i.e., keep the existing functionality): https://github.com/exceljs/exceljs/pull/1135/files#diff-2fe0207cb76e1f94add746d745867d8eR90-R100
As you can see, for legacy reasons, this is not exactly a streams3 implementation, but if it had been (i.e., if exposing a streams3 implementations wasn't breaking backwards compatibility), I would have used

export async function read(stream, opts) {
  return Readable.from((function*() {
    for await (const chunks of parse(stream, opts)) {
      for (const chunk of chunks) {
        yield chunk
      }
    }
  })())
}

Now, my worry is that the consumers will start using for-await on read rather than parse, because it's nicer/more ergonomic to receive each event individually, rather than an array of events. And then at some point down the line their code runs on slowly on production and they forget about not having used parse, blaming the library or node.js. So in short, I would like to have for-await be equally fast on read and parse (equivalently, have parse yield each event individually).

To re-state my question, is there any way for v8 optimizations to achieve parity or something close to it? I.e., can we somehow avoid paying the full price of constantly going back to the microtask/event queue on each iteration, when we already have the next element and could just do the iteration immediately?

devsnek · 2020-11-24T14:36:01Z

es.discourse.group is an official tc39 forum full of delegates. if someone there is interested I'm sure they will offer.

conartist6 · 2020-11-24T14:38:11Z

I'll take that as a no. Ah well, thanks for the fast response.

alubbe · 2020-11-24T18:05:55Z

I can't be your champion, but your cheerleader ;) What you're proposing is exactly what I was after, because in my investigations here and on the v8 issue it became clear to me that we're leaving performance on the table, but we'd probably need to fix/improve this at the language level. I lack the knowledge to write this up, but I'm going to follow the discussions on https://es.discourse.group/t/syncandasynciterator/554

Thank you for taking this up!

LongTengDao · 2021-02-12T17:13:55Z

if for await run next next sync while object returned by next() is not a promise, things will be perfect.

though it's hopeless, I think it's a problem on es layer, not node layer...

conartist6 · 2021-02-12T19:20:49Z

That's essentially my line of thinking, and I agree that the problem is on the ES layer. I don't think you can change the semantics of the existing for-await syntax through because it explicitly turns sync iterators into async iterators. The semantics you suggest would be a breaking change in that situation. If you change them to explicitly exclude sync iterators in order to avoid the breaking change, then the API feels inconsistent to me. That is still a variant that could be considered, if anyone seemed to be considering the proposal. I'm hoping to create more demand for a solution soon.

Farenheith · 2021-11-17T10:14:56Z

Hello, @alubbe!

I don't know if this can be of any help, but I was making some tests with your benchmark trying to improve the for await performance somehow, and I came up with this code (still not faster than just listening to the events):

function getAsyncIterable(rl) {
  return {
    [Symbol.asyncIterator]() {
      let onError;
      let onClose;
      let onLine;
      let queue = {};
      let error;
      onError = (value) => {
        rl.off('close', onClose);
        rl.off('line', onLine);
        error = value;
      };
      onClose = () => {
        rl.off('error', onError);
        rl.off('line', onLine);
        queue = undefined;
      };
      onLine = (value) => {
        if (queue) {
          const node = { value };
          if (queue.last) {
            queue.last = queue.last.next = node;
          } else {
            queue.last = queue.next = node;
          }
        }
      };
      rl.on('line', onLine);
      rl.once('error', onError);
      rl.once('close', onClose);
      function next() {
        if (!queue) {
          return { done: true };
        }
        if (error) {
          throw error;
        }
        if (queue.next) {
          const { value } = queue.next;
          queue.next = queue.next.next;
          if (!queue.next) {
            queue.last = undefined;
          }
          return {
            value
          };
        } else {
          // If there's no element on the queue, I deactivate the queue filling and will just create a promise
         // That resolves with the next line event
          rl.off('line', onLine);
          return new Promise((resolve, reject) => {
            let onErrorOnce;
            let onCloseOnce;
            let onLineOnce;
            onErrorOnce = (value) => {
              rl.off('close', onCloseOnce);
              rl.off('line', onLineOnce);
              reject(value);
            };
            onCloseOnce = () => {
              rl.off('error', onErrorOnce);
              rl.off('line', onLineOnce);
              resolve({ done: true });
            };
            onLineOnce = (value) => {
              rl.off('close', onCloseOnce);
              rl.off('error', onErrorOnce);
              // Just before returning the listened value, I re-enable the queue processing
              // so no message is lost
              rl.on('line', onLine);
              resolve({ value });
            };
            rl.once('line', onLineOnce);
            rl.once('error', onErrorOnce);
            rl.once('close', onCloseOnce);
          });
        }
      }
      return {
        next
      };
    }
  };
}

await runProfiling('readline manual readline async iterable', async () => {
  const rl = readline.createInterface({
      input: fs.createReadStream('big.txt'),
    });
    const iterable = getAsyncIterable(rl)

    let i = 0;
    for await (const line of iterable) {
      i += 1;
    }
    
    console.log(`Read ${i} lines`);
});

This actually creates an iterable based on event listening, which was an idea the I took from here. With this code, I could get an iteration with duration of 49ms, opposing to the for await over rl, which took 78ms, but still slower than the event listening, 38ms.

conartist6 · 2021-11-22T18:20:05Z

@Farenheith That's interesting, but as far as I know it's not really the problem being discussed here. This thread is about the perf cost of the for await of loop itself, which is a very minor part of the cost of executing your benchmark. Converting from a stream to an iterable has moderate cost, but it need only be incurred once.

Basically you are testing stream -> iterable -> for..await and we're discussing something more like stream -> iterable -> for await -> for await -> for await -> for await -> ....

This is because the functional style often involves a chain of small incremental modifications, at which point the controlling factor for speed is the actual cost of for await.

conartist6 · 2021-11-22T21:05:10Z

Well sorry it is being discussed. I'm just not thinking about that as the most major problem, but you weren't talking to me anyway!

Farenheith · 2021-12-12T20:45:28Z

@conartist6 hello! Well, yes, mate. It was discussed, thank you for noticing it.

I agree that it's not the most important issue, as the overhead that multiples for-await-of put can affect a lot of other cases. But in this specific issue that @alubbe pointed out, this is relevant. I think, maybe, there's also a problem with the readline's Symbol.asyncIterator implementation, as with the code above I could get a performance 37% better than with the vanilla one.

About the proposal you opened, @conartist6 I really hope this someday got implemented! We have heavy use of async iterables in the place I work and I also notice some time ago that we could benefit if the for-await-of could run the code synchronously when it receives a sync result from next. I created even a package to have that functionality, and I just uploaded it with a benchmark test based on @alubbe code and some code improvements. You can see the results here if you want: https://github.com/Codibre/augmentative-iterable/runs/4499575249?check_suite_focus=true

You can notice that, while the iteration over a vanilla readline async iterator takes 97ms (readline async iteration profiling started), iterating over the Symbol.asyncIterator implementation I put above took 59ms (readline manual readline async iterable profiling started), iterating using the package, though, took 48ms (readline augmentative-iterable profiling started).

As I implemented the logic in JavaScript and already got some performance improvement, if your proposal is implemented directly in the V8, then this could be even better!

But again, I think there are some improvements that could be done in the readline's Symbol.asyncIterator implementation, otherwise, I couldn't get a performance boost with the code above (or maybe there's some bug on my code)

conartist6 · 2021-12-13T17:26:55Z

I'm still sort of working on tackling this. My plan is to implement iter-tools with generators but then create custom transpilation to compile the generators down to classes. The classes would support the standard iterator.next, but also an iterator.nextValue() method which would be the equivalent of iterator.next().value. This method would return a sentinel like Symbol.for('@iter-tools/done') to denote the termination of a sequence of values, and would also flatten promises values (i.e. wait for them).

All I have to do then is ensure that my own transforms prefer the nextValue interface, write some docs on perf, draw up some benchmarks, and start telling people that they should be using this. If it takes off, my hope is that that would generate the necessary interest in making this the sanctioned approach (that is to say, advancing my proposal).

conartist6 · 2021-12-15T22:14:55Z

Actually now that I look more carefully at your code, that's basically what it's doing. I can't interpret the benchmark results though. I'd be very curious to see how it fares in the benchmark in fluent-iterable itself. Incidentally I sent you a PR to include iter-tools in that benchmark too.

mcollina · 2021-12-16T07:53:18Z

But again, I think there are some improvements that could be done in the readline's Symbol.asyncIterator implementation, otherwise, I couldn't get a performance boost with the code above (or maybe there's some bug on my code)

If there are performance optimizations to be done in the readline module, please open a PR. I don't think the current implementation was benchmarked in any way. Please tag me on them, I'll be happy to provide guidance/review.

Farenheith · 2021-12-16T10:22:09Z

@mcollina thank you for suggesting it! I'll be happy to do that. I'll try it this weekend

Farenheith · 2021-12-21T14:28:42Z

Hello, @mcollina!

I've been studying the issue and created even a package to test the concept I had in mind: https://github.com/Farenheith/faster-readline-iterator.

It performed very well, in some cases it's 50% faster than the original async iterator, but I don't think I can implement something like that on NodeJs without introducing a breaking change, so I'm giving up lol.
The problem is that the original async iterator has a property called stream, where the Readable generated for the operation is assigned, but my version doesn't generate a Readable, so I can't possibly assign that.

Some tests have broken because of that:

=== release test-readline-async-iterators ===
Path: parallel/test-readline-async-iterators
--- stdout ---
Mismatched noop function calls. Expected exactly 1, actual 0.
...
    at Object.<anonymous> (/home/runner/work/_temp/node-v18.0.0-nightly2021-12-21738597a83c/test/parallel/test-readline-async-iterators.js:77:43)
...
=== release test-readline-async-iterators-backpressure ===
Path: parallel/test-readline-async-iterators-backpressure
--- stderr ---
/home/runner/work/_temp/node-v18.0.0-nightly2021-12-21738597a83c/test/parallel/test-readline-async-iterators-backpressure.js:21
  const highWaterMark = it.stream.readableHighWaterMark;
                                  ^

TypeError: Cannot read properties of undefined (reading 'readableHighWaterMark')
    at /home/runner/work/_temp/node-v18.0.0-nightly2021-12-21738597a83c/test/parallel/test-readline-async-iterators-backpressure.js:21:35
...
Node.js v18.0.0-pre
Command: out/Release/node /home/runner/work/_temp/node-v18.0.0-nightly2021-12-21738597a83c/test/parallel/test-readline-async-iterators-backpressure.js
=== release test-readline-async-iterators-destroy ===
Path: parallel/test-readline-async-iterators-destroy
--- stdout ---
Mismatched noop function calls. Expected exactly 1, actual 0.
...
    at Object.<anonymous> (/home/runner/work/_temp/node-v18.0.0-nightly2021-12-21738597a83c/test/parallel/test-readline-async-iterators-destroy.js:90:57)

I think I got the backpressure part covered so I could even rewrite that test, but the lack of the stream property can cause unexpected behaviors.
It was fun to go deep into it, though, so thank you for instigating me. I could understand better why my version has a performance boost: it doesn't have the overload of the use of a Readable and a generator used in the original code.
Still, the performance is not as good as just listening to the events, and probably nothing will be.

I also uploaded a version of @alubbe benchmark with two new cases:
https://github.com/Farenheith/readline-benchmark/runs/4595820661?check_suite_focus=true

faster readline async iteration: where I use the package faster-readline-iterator. In this case, I got a performance 38% better than using the original async iterator, but 48% worst than just listening to the events;
faster readline async augmentative iteration: here I use faster-readline-iterator and also augmentative-iterable, which optimizes the iteration for an async iterator that can yield non promise values. Here I got a result 5 to 10% better than the faster readline async iteration on average, so it would probably be good if implemented directly on the V8, but I'm really not going into that lol.

mcollina · 2021-12-21T16:58:04Z

It performed very well, in some cases it's 50% faster than the original async iterator, but I don't think I can implement something like that on NodeJs without introducing a breaking change, so I'm giving up lol.

I think a minimal breaking change (dropping .stream) would totally be ok for a +50% increase in throughput. Don't give up, we ship major changes often enough.

Farenheith · 2021-12-22T02:48:06Z

@mcollina thank you for your support on it. I opened the PR #41276.

alubbe · 2021-12-22T06:13:43Z

I just wanted to say that it's awesome that we have some many performance hungry people in the community that almost 2 years after this issue was raised we're still making progress here - so thanks to all of you here and happy holidays!

It's likely that Deno has the same problem as the following issue that exists in Node. nodejs/node#31979 So we tried to avoid this problem by using `AsyncGenerator<T[]>` instead.

clshortfuse · 2022-03-23T02:11:27Z

Tangently related, async () => return Promise() has a pretty harsh penalty. If you use the await syntax, make sure you don't return a Promise. Return the promise result with await. It may not affect the benchmarks much, but I've spotted some Promise objects returned in the code sample. Returning a Promise forces 3 ticks (microtasks) instead of the minimum of 1.

I have a verbose breakdown of the differences and reasoning here: https://stackoverflow.com/a/70979225/1426683

I find the @typescript-eslint/require-await ESLint rule useful for enforcing this.

conartist6 · 2023-07-31T20:37:00Z

So I've been tinkering with the design of coroutine-based async/streaming parsers for the last while now. When I proposed these changes I was unable to find a champion, but I don't think I had yet made the strongest possible case for how important this functionality is.

I've now designed and built a streaming parser framework based on co-recursion. I think there will be great demand for this implementation, but I also think that the people adopting the technology will quickly face this problem. For the moment I know how to deliver a hacked-up fix, but I still really think the proposal to amend the language has strong merit and I would loooove to find someone willing to take up the process at all.

alubbe · 2023-08-01T12:24:44Z

Could you share some details on what you implemented, what the performance differences still are to using streams and events, and what the hack is that you're referring to?

conartist6 · 2023-08-01T13:17:31Z

To start with, the hack is implementing semi-sync iterables without support from the language. I now control a large enough ecosystem that if my tools support those kinds of streams they will have an ample arrays of sources and sinks, making them usable in a practical sense. But of course if there's no standard, I'm likely to create a mess because what I'll be creating is a "transitional" way to use semi-sync iterables something like iterable[Symbol.for('custom-semi-sync-iterator')] (which I have implemented as a proof of concept).

In terms of the difference to streams and events, the most major difference is that data flow is in the complete opposite direction. Instead of data driving callbacks, data processing drives consumption of data. I consider this to be "uninverted" control flow because this is the way that we have all the facilities of structured exception handling. With inverted flow exceptions bubble up to the async callback that received a chunk of the file from disc, and that handler likely has very little context about the application!

Another win that derives from keeping control flow right-side-up is that we gain extensibility. Because the caller is consuming productions from the input stream (based on current state if necessary), you can extend a language using facilities that are remarkably similar to class extension.

pauldraper · 2024-02-17T19:06:02Z

So we've come up with this 'hack' where instead of emitting one-by-one all of these chunks/events that our transform produces, we now gather them up in an array and emit that once

Async iterable is slower than sync iterable because.........async is slower than sync.

Iteration is pretty much irrelevant, other than iteration implies it happens lots of times.

I think chunks of data (and iterating synchronously over each chunk) is the right approach, or at least one that I've accepted.

I've now designed and built a streaming parser framework based on co-recursion

You can design everything around generators. You set up a generator framework (not shown) that lets you write stuff like:

function *parseLines(source, dest) {
  let buffer = "";
  while (true) {
    const next = yield source.read();
    if (next === undefined) {
      break;
    }
    const parts = (buffer + next).split("\n");
    for (const part of parts.slice(-1)) {
      yield dest.write(part);
    }
    buffer = parts[parts.length - 1];
  }
  if (buffer) {
    yield dest.write();
  }
}

Where yield is essentially await, but it doesn't go a microtask queue, and it doesn't maintain stack traces.

devsnek added performance Issues and PRs related to the performance of Node.js. question Issues that look for answers. stream Issues and PRs related to the stream subsystem. labels Feb 27, 2020

alubbe mentioned this issue Mar 27, 2020

Stream API with cancellation lddubeau/saxes#32

Open

Farenheith mentioned this issue Nov 16, 2021

Performance improvement for faster iterations rocket-connect/for-emit-of#23

Closed

pieh mentioned this issue Jan 22, 2022

GC behaviour in sync execution kriszyp/weak-lru-cache#4

Open

conartist6 mentioned this issue Jan 27, 2022

How to mark double-ended iterator tc39/proposal-deiter#5

Open

lambdalisue mentioned this issue Feb 20, 2022

Improve performance (throughput) Shougo/ddu-source-file_rec#4

Merged

slvrtrn mentioned this issue Oct 3, 2022

Remove split2, refactor Rows for better performance ClickHouse/clickhouse-js#108

Closed

metrue mentioned this issue Nov 26, 2023

feat: next() without await honojs/hono#1685

Draft

3 tasks

conartist6 mentioned this issue May 11, 2024

Concurrent vs Stream Async Iterables tc39/proposal-async-iterator-helpers#20

Closed

Performance of for await of (async iteration) #31979

Performance of for await of (async iteration) #31979

Comments

alubbe commented Feb 27, 2020

alubbe commented Feb 27, 2020

devsnek commented Feb 27, 2020 • edited Loading

lpinca commented Feb 27, 2020 • edited Loading

mcollina commented Feb 27, 2020

alubbe commented Feb 28, 2020

alubbe commented Feb 28, 2020

alubbe commented Feb 28, 2020 • edited Loading

mcollina commented Feb 28, 2020

alubbe commented Feb 28, 2020 • edited Loading

alubbe commented Feb 28, 2020 • edited Loading

mcollina commented Feb 28, 2020

alubbe commented Feb 28, 2020

mcollina commented Feb 28, 2020

alubbe commented Feb 28, 2020

mcollina commented Feb 28, 2020

alubbe commented Mar 25, 2020 • edited Loading

mcollina commented Mar 25, 2020

alubbe commented Mar 25, 2020

mcollina commented Mar 25, 2020

devsnek commented Mar 25, 2020

alubbe commented Mar 26, 2020

alubbe commented Mar 27, 2020 • edited Loading

mcollina commented Mar 27, 2020

alubbe commented Mar 27, 2020 • edited Loading

mcollina commented Mar 27, 2020

alubbe commented Mar 30, 2020

devsnek commented Nov 24, 2020

conartist6 commented Nov 24, 2020

alubbe commented Nov 24, 2020

LongTengDao commented Feb 12, 2021 • edited Loading

conartist6 commented Feb 12, 2021

Farenheith commented Nov 17, 2021

conartist6 commented Nov 22, 2021

conartist6 commented Nov 22, 2021

Farenheith commented Dec 12, 2021 • edited Loading

conartist6 commented Dec 13, 2021 • edited Loading

conartist6 commented Dec 15, 2021

mcollina commented Dec 16, 2021

Farenheith commented Dec 16, 2021

Farenheith commented Dec 21, 2021 • edited Loading

mcollina commented Dec 21, 2021

Farenheith commented Dec 22, 2021

alubbe commented Dec 22, 2021

clshortfuse commented Mar 23, 2022 • edited Loading

conartist6 commented Jul 31, 2023

alubbe commented Aug 1, 2023

conartist6 commented Aug 1, 2023 • edited Loading

pauldraper commented Feb 17, 2024 • edited Loading

devsnek commented Feb 27, 2020 •

edited

Loading

lpinca commented Feb 27, 2020 •

edited

Loading

alubbe commented Feb 28, 2020 •

edited

Loading

alubbe commented Feb 28, 2020 •

edited

Loading

alubbe commented Feb 28, 2020 •

edited

Loading

alubbe commented Mar 25, 2020 •

edited

Loading

alubbe commented Mar 27, 2020 •

edited

Loading

alubbe commented Mar 27, 2020 •

edited

Loading

LongTengDao commented Feb 12, 2021 •

edited

Loading

Farenheith commented Dec 12, 2021 •

edited

Loading

conartist6 commented Dec 13, 2021 •

edited

Loading

Farenheith commented Dec 21, 2021 •

edited

Loading

clshortfuse commented Mar 23, 2022 •

edited

Loading

conartist6 commented Aug 1, 2023 •

edited

Loading

pauldraper commented Feb 17, 2024 •

edited

Loading