-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FreeBSD: Fix file descriptor leak on pool import. #15630
Conversation
cc: @ixhamza |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While it looks OK, my kernel-level file operations are very rusty, and I am not sure this is the right KPI to use. I see FreeBSD kernel much more often uses vn_open() directly rather than kern_openat(), that allows completely avoid file descriptor creation/freeing and so its leak.
With my original ZFS port I didn't use kern_openat(). I was under impression that there must be a reason that somebody changed that - zfs_file_os.c does much more these days. Having said that, being able to operate on just the file structure without the need to remember descriptor number would be optimal. |
Where did we end up with this? I believe there was an open question if we were using the right KPI. |
Installing user file descriptor for kernel operation is weird. So indeed kern_openat() should not be used. I have no idea about what does zfs attempt to do there, some explanation would be useful (for me). If you start with the path as opposed to file descriptor, perhaps you should use namei() and then the vnode layer KPI. Or you could leave this machinery to userspace and just pass fd into the op. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The commit title could use a FreeBSD: prefix.
Also, does this only occur on import and not send/recv/diff too? They use these functions as well.
FYI, I'll take a look if other KPI can be used. |
zfs_file_open() is not used for send/recv. AFAIR the userland opens file descriptors and the kernel just uses what's already open. I did find another use for zfs_file_open() - file-based vdevs, but that's about it. |
fd97ceb
to
bf51e73
Compare
@kostikbel @amotin Could you guys take another look? This is a bit out of my comfort zone:) |
error = vn_open(&nd, &flags, 0, fp); | ||
if (error != 0) { | ||
falloc_abort(td, fp); | ||
return (SET_ERROR(error)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to leak buffer in nd
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to leak buffer in nd
Do you mean NDFREE_PNBUF() should be moved up? It is allocated by successful vn_open(), not by NDINIT(), so I think we should be fine.
module/os/freebsd/zfs/zfs_file_os.c
Outdated
|
||
NDINIT(&nd, LOOKUP, FOLLOW, UIO_SYSSPACE, path); | ||
flags = FFLAGS(flags); | ||
error = vn_open(&nd, &flags, 0, fp); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cmode == 0 means that the file gets no access permissions if created. Is this intended?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cmode == 0 means that the file gets no access permissions if created. Is this intended?
It wasn't intended. Fixed. Thanks.
if (error != 0) { | ||
return (error); | ||
} | ||
fp->f_flag = flags & FMASK; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the source of the flags? If the whole O_* namespace is valid for flags then some validation is required, because a lot of flags combinations are invalid and probably result in contradictory state of the file/vnode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the source of the flags? If the whole O_* namespace is valid for flags then some validation is required, because a lot of flags combinations are invalid and probably result in contradictory state of the file/vnode.
Thanks. I added some asserts and moved FFLAGS() to before this line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think that if going forward with struct file * as handle, and with using vn_open()-like open, it is much better to split kern_openat() and use that. You cleared some flags, but the other important part of openat() handles special errors and fileops.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I still do not understand why do ZFS needs to open files from the kernel side. Could somebody clarify please?
The proposed zfs_file_open() function seems to be the better part of the kern_openat() code. If this is going further, then we definitely need to split kern_openat() into two parts, where first part only provides fp, and second would install the fp into fd table.
There are two cases where ZFS opens file from the kernel:
|
bf51e73
to
0ffc915
Compare
So in both cases it is vnodes, not files (as in struct file). Using file layer IMO only adds complications.
|
Maybe, but this way there is a shared abstract layer for FreeBSD and Linux and the same KPI is also used to read/write when we do have file structures (eg. zfs send/recv will open descriptors in userland and those will be used in the kernel through this KPI). If we would operate on vnodes here, we would need to have two separate abstractions for reading and writing for both OSes. |
@pjd any news here? |
I don't plan to work on splitting kern_openat(), so either we merge the patch as-is if there are no concerns to the correctness of the patch or someone else may pick it up and work on kern_openat(). |
Descriptor leak can be easily reproduced by doing: # zpool import tank # sysctl kern.openfiles # zpool export tank; zpool import tank # sysctl kern.openfiles We were leaking four file descriptors on every import. Similar leak most likely existed when using file-based VDEVs. Signed-off-by: Pawel Jakub Dawidek <[email protected]>
0ffc915
to
46e57a3
Compare
|
@pjd we will still probably need this as well for systems where kern_openatfp(9) is not available (yet) |
Descriptor leak can be easily reproduced by doing: # zpool import tank # sysctl kern.openfiles # zpool export tank; zpool import tank # sysctl kern.openfiles We were leaking four file descriptors on every import. Similar leak most likely existed when using file-based VDEVs. External-issue: https://reviews.freebsd.org/D43529 Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Pawel Jakub Dawidek <[email protected]> Closes openzfs#15630
Descriptor leak can be easily reproduced by doing: # zpool import tank # sysctl kern.openfiles # zpool export tank; zpool import tank # sysctl kern.openfiles We were leaking four file descriptors on every import. Similar leak most likely existed when using file-based VDEVs. External-issue: https://reviews.freebsd.org/D43529 Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Pawel Jakub Dawidek <[email protected]> Closes #15630
Descriptor leak can be easily reproduced by doing: # zpool import tank # sysctl kern.openfiles # zpool export tank; zpool import tank # sysctl kern.openfiles We were leaking four file descriptors on every import. Similar leak most likely existed when using file-based VDEVs. External-issue: https://reviews.freebsd.org/D43529 Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Pawel Jakub Dawidek <[email protected]> Closes openzfs#15630
Descriptor leak can be easily reproduced by doing: # zpool import tank # sysctl kern.openfiles # zpool export tank; zpool import tank # sysctl kern.openfiles We were leaking four file descriptors on every import. Similar leak most likely existed when using file-based VDEVs. External-issue: https://reviews.freebsd.org/D43529 Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Pawel Jakub Dawidek <[email protected]> Closes openzfs#15630
While here, return the error we get from fgets() instead of pretending we succeeded.
Descriptor leak can be easily reproduced by doing:
We were leaking four file descriptors on every import.
Motivation and Context
The change corrects zfs_file_open()/zfs_file_close() functions to not leave open file descriptor on zfs_file_close().
If one does a lot of pool exports and imports, one can exhaust allowed number of open file descriptors in the system.
Description
In order to close the file descriptor properly we need to also remember its number (and not only the file structure pointer).
This is why I had to create dedicated structure that can hold both: file descriptor number and a pointer to the file structure.
How Has This Been Tested?
Running my bclone tests (which do a lot of pool imports and exports) failed, because of ENFILE (Too many open files in system). After applying the fix tests succeeded and descriptors don't leak.
Types of changes
Checklist:
Signed-off-by
.