Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: progress bar for iroh add command #368

Merged
merged 9 commits into from
Oct 20, 2022

Conversation

faassen
Copy link
Contributor

@faassen faassen commented Oct 19, 2022

This introduces a progress bar for the iroh add command. It touches the following layers:

  • the various add_dir, add_file functions in iroh::resolver::unixfs_builder now return a stream of AddEvents

  • these are exposed in the iroh-api where they are mockable. The iroh-api also provides a higher level method to resolve a AddEvent stream to a CID

  • iroh::size introduces a way to calculate total size of each file on the filesystem

  • The iroh add command in the iroh crate then calculates the total size with a spinner, and then uses this to show a progress bar.

Note that indicatif, the progress bar library, allows a lot of tweaking of what we see, both color, characters and text. I made a selection to get started. Suggestions for tweaks of what the bar looks like are welcome.

I will create a follow-up PR where I add a "current file" indicator in the progress bar, but I think this is already major progress so didn't want to hold back merging. Update Getting current file information requires quite a lot of changes inside unixfs_builder with some open questions, so I am going defer the creation of this follow-up PR until later and focus on earlier wins.

@faassen faassen requested review from rklaehn, b5 and ramfox October 19, 2022 13:21
@faassen faassen marked this pull request as ready for review October 19, 2022 13:26
@faassen
Copy link
Contributor Author

faassen commented Oct 19, 2022

I've experimented using try_flatten_stream() to simplify the result of the add_file, add_dir, etc methods in the API. But as discussed with @rklaehn this does hide a real difference between not being able to initiate the stream versus having an error on the stream so I think I'll stay away from that for now.

@faassen
Copy link
Contributor Author

faassen commented Oct 19, 2022

Note to self: I need to check whether I really caught all call sites of add_file, I think I may have missed one in the gateway. Need to provide a consuming version for that place and benchmark code.

Edit: I checked and this is a false alarm; this add_file isn't used anywhere else. So it's still good to go.

@rklaehn
Copy link
Contributor

rklaehn commented Oct 20, 2022

Trying this out right now. The first thing I noticed is that adding is pretty slow. But I guess that is not this PR's fault...

image

@rklaehn
Copy link
Contributor

rklaehn commented Oct 20, 2022

At least on the mac, things start getting weird if you resize the terminal window to a smaller size than the width of the text.

image

If your terminal is already less wide than the text on startup, the delete does not work and you get one line per update:

image

Not sure if you can do anything about that, just wanted to mention it...

I guess we should assume 80 characters per line, and never exceed that to avoid running into such problems. If somebody has a tiny terminal they just have to live with it.

Is it possible to disable this in this case?

@rklaehn
Copy link
Contributor

rklaehn commented Oct 20, 2022

So as far as the API changes are concerned:

  • I have to say that the async_trait output looks horrible in docs
  • I am not sure about returning LocalBoxFuture / LocalBoxStream in the api. That is going to make things quite annoying to use from tokio etc. What's the actual reason for that?

/// An event on the add stream
pub enum AddEvent {
/// Delta of progress in bytes
ProgressDelta(u64),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned yesterday, if this event could be made self-contained you could use something like a tokio watch channel to drop progress updates. https://docs.rs/tokio/latest/tokio/sync/watch/index.html

But not the end of the world.

It would be nice to have a bit more information here. E.g. the number of objects and the number of bytes. E.g. if you are adding lots of small files you won't see much progress in terms of bytes, but there is still something happening.

Does not have to happen in this PR though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really get yet how you see using the watch channel. You mean it should keep a count of the total amount of bytes too somewhere here? That doesn't really work well with the progress bar inc() approach, and I'd have to introduce code to introduce a total. Transforming a stream seems simpler to me.

By the "number of objects" do you mean the number of files and such? Getting the file path information is something I want to do as I mentioned in the PR, but it's a bit of an effort and I think I should focus on lower hanging fruit for now.

What do you mean by the number of bytes? The number of bytes is reported here in the u64 so I don't get what you mean?

Copy link
Contributor

@rklaehn rklaehn Oct 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by the number of bytes?

Yes, the number of bytes we have, but the number of files would be nice in addition. Just so you could show how many files have been added when adding a large directory full of small files like the linux kernel. But as you said, it can happen later.

I don't really get yet how you see using the watch channel

Imagine drawing the progress information takes some time, and you push a lot of updates over the channel. Then actually drawing the progress info would slow things down. What you would want to do is to "debounce" (I think that's what they call it in frontend land) the drawing so it happens only at a reasonable rate.

But you can not do this because you need every single event in th eupdate channel. But no big deal, you could compute the aggregate and then debounce that stream using tokio watch.

iroh-api/src/api.rs Show resolved Hide resolved
@faassen
Copy link
Contributor Author

faassen commented Oct 20, 2022

Trying this out right now. The first thing I noticed is that adding is pretty slow. But I guess that is not this PR's fault...

image

Yeah, I noticed widely varying performance too. You should try it on iroh/iroh. It takes a surprisingly long time for a few kb, much longer than for other crates which are much bigger, probably because it has a bunch of small trycmd test files.

@faassen
Copy link
Contributor Author

faassen commented Oct 20, 2022

So as far as the API changes are concerned:

* I have to say that the async_trait output looks horrible in docs

* I am not sure about returning LocalBoxFuture / LocalBoxStream in the api. That is going to make things quite annoying to use from tokio etc. What's the actual reason for that?

Why does it make it more annoying to use? What could I return instead? Is the problem that they're Local? I had problems with non-local variants before but it's possible I missed something.

@rklaehn
Copy link
Contributor

rklaehn commented Oct 20, 2022

Why does it make it more annoying to use?

As soon as you are in async land, basically the default case is that your streams and futures can be polled from any thread (aka are Send). Now, this is sometimes not necessary, e.g. if you have a single threaded executor. But this is not captured in the type system. I am suffering from this all week because some parts of the rocksdb binding are not Send, so it is a pain to use in async.

E.g.

fn main() {
  // a local future
  let f = async { "whatever" }.boxed_local();
  //
  tokio::task::spawn(|| async move {
    let x = f.await; // does not work because f is not Send
  });
}

Now take a look at our codebase and you will see that spawning a task is a very frequent thing in typical async code...
image

Whether that is a good thing or not I won't comment on, it's just how it is.

So an API that only has local futures and streams will be an api that people hate...

What could I return instead?

I did a quick check. The non send requirement kinda bubbles up from the bottom, but it is just a few places where you would have to require Send.

Here is a commit that adds Send and friends in a few places to allow the API to be Send: rklaehn@59bc0b5

Now if we should do that I have no idea, I guess we would have to have a longer discussion. But this is what we would have to do.

Ping @dignifiedquire

rklaehn
rklaehn previously approved these changes Oct 20, 2022
Copy link
Contributor

@rklaehn rklaehn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checked the progress bar again. Seems to work well now.

@rklaehn
Copy link
Contributor

rklaehn commented Oct 20, 2022

Let's continue the Send discussion here: n0-computer/beetle#130

@rklaehn rklaehn self-requested a review October 20, 2022 11:37
@faassen faassen merged commit bff3560 into n0-computer:main Oct 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

2 participants