-
Notifications
You must be signed in to change notification settings - Fork 867
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cleanup Empty Directories in LocalFilesystem #5976
Comments
My personal preference would be an option for empty directories to be automatically cleaned up, as this is closer to the semantics of object stores, and should be a relatively straightforward case of calling std::fs::remove_dir on each parent until it errors. Alternatively we could allow LocalFilesystem::delete to delete directories if that is what the path points to, I'm a little more lukewarm on this as it sort of leaks out of the abstraction. I am not a fan of a delete_prefix method that has subtely divergent behaviour from standard deletion, I suspect users will find this surprising Edit: I've edited the issue title to reflect the desired use-case, not the proposed solution |
delete_prefixed
method to ObjectStore
trait
I think we can do it automatically by changing delete code for local fs a bit: async fn delete(&self, location: &Path) -> Result<()> {
let config = Arc::clone(&self.config);
let path = self.path_to_filesystem(location)?;
maybe_spawn_blocking(move || {
if let Err(e) = std::fs::remove_file(&path) {
Err(match e.kind() {
ErrorKind::NotFound => Error::NotFound { path, source: e }.into(),
_ => Error::UnableToDeleteFile { path, source: e }.into(),
})
} else {
let root = config.root.to_file_path().unwrap();
let mut parent = path.parent();
while let Some(loc) = parent {
if loc != root && std::fs::remove_dir(loc).is_ok() {
parent = loc.parent();
} else {
break;
}
}
Ok(())
}
})
.await
} |
We should definitely make this opt-in to avoid surprises, but ^ sounds good to me |
I'm perfectly fine with doing this internal to
|
Perhaps we could quantify the performance impact of the additional syscalls? I don't disagree that the operation is relatively common, but I also think consistency is very important, if only some methods cleanup directories it makes for a confused UX IMO. Ultimately LocalFilesystem is not aiming to be the fastest filesystem implementation, for that you want to be using OS specific abstractions like io_uring or direct IO, its goal is to provide good enough performance whilst mirroring object store behaviour |
I think it's good to have such a short-cut for list + deleted as additionally to handling empty dirs it can be more efficient in some implementation |
Also, we can keep both approaches together without too much harm to the UX |
IMO the internalized implementation is more confusing UX-wise. I would not necessarily expect deleting a file to also delete its parent directory which is what would happen. And while deleting the empty directories is the concrete issue we are facing, I think the ability to to recursively delete objects under a path makes sense as discrete operation on its own terms. |
Given you clearly feel more strongly on the matter than I do, let's just add a delete_all method with a default implementation using list and delete_stream. |
Added suggested changes in a #5975 |
IMHO, we're talking about two different things here:
|
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Currently the way to delete a "directory" through the
ObjectStore
interface is to calllist
with the directory path as prefix and then pipe that todelete_stream
.This is the correct way to do this for object storage since "directories" don't really exist and the path is just a prefix to individual objects.
But for actual filesystems directories are actual inodes themselves in the filesystem which should be cleaned up.
Describe the solution you'd like
Add a new method to
ObjectStore
:async fn delete_prefix(&self, prefix: Option<&Path>) -> Result<()>
This default implementation for this method can just be calling
list(prefix)
and then piping that todelete_stream
but in the case ofLocalFilesystem
(and potentially other implementations in the future) will be specialized to also delete directoriesDescribe alternatives you've considered
This could be handled inside
LocalFilesystem
duringdelete_stream
but could become complicated. The implementation would have to:Additional context
The text was updated successfully, but these errors were encountered: