Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor and optimize fetch_crate_cratesio #180

Merged
merged 65 commits into from
Jun 22, 2022
Merged

Refactor and optimize fetch_crate_cratesio #180

merged 65 commits into from
Jun 22, 2022

Conversation

NobodyXu
Copy link
Member

@NobodyXu NobodyXu commented Jun 12, 2022

Fixed #170 .

Refactor

  • Split API extract_archive_stream: Create separate APIs for bin, zip and other tar-based formats:
    • extract_bin
    • extract_zip
    • extract_tar_based_stream
    • extract_tar_based_stream_and_visit
  • Use visitor pattern (callback) instead of the filter pattern in untar
  • Refactor out a new function create_tar_decoder out of extract_compressed_from_readable: Now extracter only contains 43 lines.
  • Replace download_and_extract_with_filter with download_tar_based_and_visit, a more versatile solution which enables fetch_crate_cratesio to be completely free of disk I/O.
  • Refactor mod drivers and extract submodules from it
  • Refactor and extract out new mod fmt for PkgFmt and add newtype PkgFmtDecomposed and TarBasedFmt.

Optimize

Added

pub trait PathExt {
    /// Similiar to `os.path.normpath`: It does not perform
    /// any fs operation.
    fn normalize_path(&self) -> Cow<'_, Path>;
}

NobodyXu added 28 commits June 11, 2022 20:15
dedicated to different tasks

Signed-off-by: Jiahao XU <[email protected]>
signature of `download_and_extract_with_filter`

Signed-off-by: Jiahao XU <[email protected]>
and `download_tar_based_and_visit`.

By using these two items, we avoid any I/O altogether.
Everything happens in memory, thus there will be no i/o related errors
or cost.

This commit does not regress anything because
`helpers::load_manifest_path` calls `Manifest::from_path_with_metadata`,
which read in the whole `Cargo.toml` file at once.

Thus this commit would not cause any OOM when the the original code
would not.

Signed-off-by: Jiahao XU <[email protected]>
Since mod `cretesio` is the only one need to have access to it.

Signed-off-by: Jiahao XU <[email protected]>
Rm unused param `path` and the unnecessary
`fs::create_dir_all` since the tar will not be extracted to disk.

Signed-off-by: Jiahao XU <[email protected]>
@NobodyXu NobodyXu mentioned this pull request Jun 12, 2022
4 tasks
NobodyXu added 12 commits June 18, 2022 17:29
It wraps a `Stream<Item = Result<Bytes, E>>` and implements `Read` and
`BufRead` on it so that it can be used on sync context.

Signed-off-by: Jiahao XU <[email protected]>
instead of `Result<Archive, BinstallError>`

Signed-off-by: Jiahao XU <[email protected]>
so that if the `io::Error` wraps a `BinstallError`, we would just unwrap
it and return the inner `BinstallError`.

Otherwise, just wrap the `io::Error` in a `BinstallError`.

Signed-off-by: Jiahao XU <[email protected]>
This have the following advantage:
 - Remove the mpsc channel, which:
    - Remove synchronization required for mpsc.
    - Remove the internal buffering of the mpsc channel, which avoid potentially OOM situation.
 - Improve data locality since it no longer needs to be sent over thread.
 - It uses `block_in_place` to avoid creating an additional blocking
   thread.

The disadvantages would be that the downloader can no longer be run in parallel to the extracter.

If the bottleneck is the decompressor, then the downloader should also pause and wait
for the decompressor to consume the data.

But if the bottleneck is the network, then that might be an issue.

Signed-off-by: Jiahao XU <[email protected]>
It was called before because `spawn_blocking` requires that, but we now
switches to `block_in_place` which no longer needs this.

Signed-off-by: Jiahao XU <[email protected]>
Use consistent codestyle for specifing trait bounds.

Signed-off-by: Jiahao XU <[email protected]>
Signed-off-by: Jiahao XU <[email protected]>
@NobodyXu NobodyXu marked this pull request as ready for review June 18, 2022 08:56
@NobodyXu
Copy link
Member Author

@passcod The 2rd round of optimization is done!

Now async_extracter only contains 129 lines and extracter contains 43 lines, though the github diff generates a lot of garbage diff.
The diff for driver.rs is also significantly reduced.

Feel free to review at anytime.

Copy link
Member

@passcod passcod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Superficial review only, will do a more in-depth one tonight

src/drivers/vfs.rs Outdated Show resolved Hide resolved
src/drivers/visitor.rs Outdated Show resolved Hide resolved
src/errors.rs Show resolved Hide resolved
src/errors.rs Outdated Show resolved Hide resolved
src/fmt.rs Outdated Show resolved Hide resolved
src/fmt.rs Outdated Show resolved Hide resolved
src/helpers.rs Outdated Show resolved Hide resolved
src/helpers.rs Show resolved Hide resolved
src/main.rs Outdated Show resolved Hide resolved
Copy link
Member

@passcod passcod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, wrong button.

src/errors.rs Show resolved Hide resolved
@passcod
Copy link
Member

passcod commented Jun 21, 2022

I like it, but I'll do a final pass before approvinyy and merging after a sleep.

@passcod passcod merged commit 3b5ea35 into cargo-bins:main Jun 22, 2022
@NobodyXu NobodyXu deleted the refactor-and-optimize/AsyncExtracter branch June 22, 2022 03:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
PR: improvement PR that improves existing features or code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AsyncExtracter TODOs
2 participants