Resort to lstat for FS not supporting dirent.d_type #25
if (entry->d_type == DT_DIR) {
	iterate_directory(manifest, pathprefix, file->filename, do_hash);
} else if (entry->d_type == DT_UNKNOWN) { /* fall back to lstat() */
	struct stat sb;
populate_file_struct() calls lstat(); why do we need to call it again here?
I didn't notice populate_file_struct(). Why is it needed there, then? I presumed it's called for every file in a separate thread (see get_hash()) to make these computations parallel. Otherwise there's no need for the DT_DIR optimization in the first place.
Well it's populating the 'file' struct with the lstat details. That being the case, wouldn't it be easier to just call
if (S_ISDIR(file.st_mode)) {
and remove the new call to lstat?
By referring to get_hash(), I meant that in the current code populate_file_struct() is called twice for every file: first sequentially, and then again in the parallel threads, which is probably not what you want. AFAIU, this
if (entry->d_type == DT_DIR) {
check is an optimization that postpones the lstat() calls to the parallel threads.
So there are two solutions:
- Use if (S_ISDIR(file.st_mode)) {, remove the new call to lstat(), and remove populate_file_struct() from get_hash(). This is easier to understand, but suboptimal since lstat() is called sequentially for all files.
- Drop the first call to populate_file_struct(). In this case the second lstat() is called only as a fallback for the rare cases where d_type (and hence DT_DIR) is not supported.
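The second option can be sketched roughly as follows. This is a standalone illustration, not the patch itself: `entry_is_dir()` is a hypothetical helper, and the path is built with snprintf() only to keep the example self-contained. It trusts d_type when the filesystem fills it in and calls lstat() only when d_type is DT_UNKNOWN.

```c
#define _DEFAULT_SOURCE /* exposes d_type and the DT_* constants on glibc */
#include <assert.h>
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: returns 1 for a directory, 0 for anything else,
 * and -1 if the fallback lstat() could not be performed. */
int entry_is_dir(const char *dirpath, const struct dirent *entry)
{
	char path[PATH_MAX];
	struct stat sb;
	int n;

	assert(dirpath != NULL && entry != NULL);

	/* Fast path: the filesystem reported a type, so no extra system call. */
	if (entry->d_type != DT_UNKNOWN) {
		return entry->d_type == DT_DIR ? 1 : 0;
	}

	/* Fallback: d_type was not filled in, so ask lstat() instead. */
	n = snprintf(path, sizeof(path), "%s/%s", dirpath, entry->d_name);
	if (n < 0 || (size_t)n >= sizeof(path)) {
		return -1;
	}
	if (lstat(path, &sb) != 0) {
		return -1;
	}
	return S_ISDIR(sb.st_mode) ? 1 : 0;
}
```

On filesystems that do report d_type, the lstat() branch is never reached, so the sequential directory walk stays cheap.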
I think option 2 is indeed the best approach; please revise your patch and resubmit, and I'll +1.
According to POSIX.1, only the d_name and d_ino fields of struct dirent are standardized. d_type isn't always correctly set on file systems like XFS. In such cases it makes sense to resort to lstat(). Otherwise a user has a hard time figuring out what's wrong with her setup.

Also remove the redundant populate_file_struct() call, as it's called again in the parallel threads.

Signed-off-by: Dmitry Rozhkov <[email protected]>
Looks good to me. +1
For the record: in master, this change was reverted due to test suite failures. It caused file types to be calculated for Manifest.full, but not for bundle manifests.
@phmccarty, by performance improvements do you mean removing the extra populate_file_struct() call? Should this be tracked in a new issue?