Parallelize remove_dir_all #4172
Comments
Well, I'm not convinced that a parallel |
Why not?
I mean, for one, there are no other methods in Tokio like this. More importantly, I really don't feel like maintaining an implementation of |
Oh wow, that actually explains some bugs in other projects I've worked on! I wouldn't mind putting a Here's my initial attempt:
use std::{
    fs, io,
    path::{Path, PathBuf},
};

use tokio::task::JoinHandle;

async fn fast_remove_dir_all(path: &Path) -> io::Result<()> {
    let path = path.to_path_buf();
    // A symlink is unlinked directly and never followed.
    let path = tokio::task::spawn_blocking(move || -> io::Result<Option<PathBuf>> {
        let filetype = fs::symlink_metadata(&path)?.file_type();
        if filetype.is_symlink() {
            fs::remove_file(&path)?;
            Ok(None)
        } else {
            Ok(Some(path))
        }
    })
    .await??;

    match path {
        None => Ok(()),
        Some(path) => remove_dir_all_recursive(path).await,
    }
}

async fn remove_dir_all_recursive(path: PathBuf) -> io::Result<()> {
    let path_copy = path.clone();
    // Delete this directory's files on a blocking thread and spawn one task per subdirectory.
    let tasks = tokio::task::spawn_blocking(move || -> io::Result<_> {
        let mut tasks = Vec::new();
        for child in fs::read_dir(&path)? {
            let child = child?;
            if child.file_type()?.is_dir() {
                tasks.push(spawn_remove_dir_all_recursive(&child.path()));
            } else {
                fs::remove_file(child.path())?;
            }
        }
        Ok(tasks)
    })
    .await??;

    // Wait for every subdirectory task before removing the directory itself.
    for result in futures::future::join_all(tasks).await {
        result??;
    }

    tokio::task::spawn_blocking(move || fs::remove_dir(path_copy)).await??;
    Ok(())
}

fn spawn_remove_dir_all_recursive(path: &Path) -> JoinHandle<io::Result<()>> {
    tokio::task::spawn(remove_dir_all_recursive(path.to_path_buf()))
}
It's kind of ugly IMO, but I've seen it be 4X faster than straight
Dunno if you have ideas for improvements? I'm new to tokio, so I'm guessing there's some more efficient way of doing things that I don't know about.
PS: I put together this crate to be able to reproducibly create these large file trees. Here were my params:
$ ftzz g ../tmp -n 2M -r 1000
$ ftzz g ../tmp -n 1M
$ ftzz g ../tmp -n 10K
Slightly faster and prettier version:
use std::{
    fs::{read_dir, remove_dir, remove_file, symlink_metadata},
    io,
    path::{Path, PathBuf},
};

use tokio::task::JoinHandle;

async fn fast_remove_dir_all(path: &Path) -> io::Result<()> {
    let path = path.to_path_buf();
    // A symlink is unlinked directly and never followed.
    let path = tokio::task::spawn_blocking(move || -> io::Result<_> {
        let filetype = symlink_metadata(&path)?.file_type();
        if filetype.is_symlink() {
            remove_file(&path)?;
            Ok(None)
        } else {
            Ok(Some(path))
        }
    })
    .await??;

    match path {
        None => Ok(()),
        Some(path) => spawn_remove_dir_all_recursive(path).await?,
    }
}

// Runs entirely on a blocking-pool thread (see spawn_remove_dir_all_recursive).
async fn remove_dir_all_recursive(path: PathBuf) -> io::Result<()> {
    let mut tasks = Vec::new();
    for child in read_dir(&path)? {
        let child = child?;
        if child.file_type()?.is_dir() {
            tasks.push(spawn_remove_dir_all_recursive(child.path()));
        } else {
            remove_file(child.path())?;
        }
    }

    // Waiting here blocks the current blocking-pool thread until every subdirectory task finishes.
    for task in tasks {
        task.await??;
    }

    remove_dir(path)
}

#[inline]
fn spawn_remove_dir_all_recursive(path: PathBuf) -> JoinHandle<io::Result<()>> {
    tokio::task::spawn_blocking(move || {
        futures::executor::block_on(remove_dir_all_recursive(path))
    })
}
@Darksonn what are the next steps? Is this something that could go into tokio (assuming an alternative to |
I would be ok with adding a link to the docs, however the implementation you have posted here is incorrect. Imagine the following situation: We are deleting a directory with 1000 sub-directories. Each sub-directory has another empty sub-directory. The |
As for |
I'm going to close this. We are not currently interested in the feature because a correct implementation is too difficult, and it should be prototyped in an external crate first. I would be happy to add a link to such an external crate if you write one.
Sounds good! This is still on my plate, but I probably won't get to it for a few weeks.
I will note the |
This finally beats the shitty implementation I posted here: tokio-rs/tokio#4172 (comment) Signed-off-by: Alex Saveau <[email protected]>
Finally got around to this if anyone was wondering (haven't officially released yet though): https://github.com/SUPERCILEX/fuc/tree/master/rmz
The current implementation simply forwards to the stdlib, but this is the one fs operation that can actually take advantage of concurrency. As long as files within the same directory aren't deleted in parallel, the kernel shouldn't be locking anything and you'll get a full speedup.
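To make the idea concrete, here is a minimal sketch using only the standard library rather than Tokio. It assumes a Unix-like system and is not the implementation discussed in this thread; `remove_dir_all_parallel` is a hypothetical name. Files in the top-level directory are deleted one at a time on the calling thread, and the top-level subdirectories are split across a bounded number of scoped threads, each of which removes its subtrees sequentially with `std::fs::remove_dir_all`.

```rust
use std::{fs, io, path::Path, thread};

/// Hypothetical helper: removes `path` and everything beneath it,
/// deleting the top-level subdirectories in parallel.
fn remove_dir_all_parallel(path: &Path) -> io::Result<()> {
    // Never follow a symlink: unlink it and stop.
    if fs::symlink_metadata(path)?.file_type().is_symlink() {
        return fs::remove_file(path);
    }

    // Delete this directory's files sequentially and collect its subdirectories.
    let mut subdirs = Vec::new();
    for entry in fs::read_dir(path)? {
        let entry = entry?;
        // `DirEntry::file_type` does not follow symlinks, so a symlink to a
        // directory takes the `remove_file` branch (assumes Unix semantics).
        if entry.file_type()?.is_dir() {
            subdirs.push(entry.path());
        } else {
            fs::remove_file(entry.path())?;
        }
    }

    // Bound the thread count instead of spawning one thread per subdirectory.
    let workers = thread::available_parallelism().map_or(4, |n| n.get());
    let chunk_size = subdirs.len().div_ceil(workers).max(1);

    thread::scope(|scope| {
        let handles: Vec<_> = subdirs
            .chunks(chunk_size)
            .map(|chunk| {
                // Each worker removes its share of subtrees one after another.
                scope.spawn(move || chunk.iter().try_for_each(|dir| fs::remove_dir_all(dir)))
            })
            .collect();
        // Report the first I/O error, if any, after all workers have been joined in order.
        handles
            .into_iter()
            .try_for_each(|handle| handle.join().expect("worker thread panicked"))
    })?;

    // Everything inside is gone, so the directory itself can be removed.
    fs::remove_dir(path)
}
```

This only parallelizes one level deep, so a tree whose entries sit under a single top-level subdirectory would see no speedup; it is meant to illustrate the shape of the approach, not to be a tuned implementation.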