Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FreeBSD: Fix file descriptor leak on pool import. #15630

Merged
merged 1 commit into from
Jan 23, 2024

Conversation

pjd
Copy link
Contributor

@pjd pjd commented Dec 4, 2023

While here, return the error we get from fgets() instead of pretending we succeeded.

Descriptor leak can be easily reproduced by doing:

# zpool import tank
# sysctl kern.openfiles
# zpool export tank; zpool import tank
# sysctl kern.openfiles

We were leaking four file descriptors on every import.

Motivation and Context

The change corrects zfs_file_open()/zfs_file_close() functions to not leave open file descriptor on zfs_file_close().
If one does a lot of pool exports and imports, one can exhaust allowed number of open file descriptors in the system.

Description

In order to close the file descriptor properly we need to also remember its number (and not only the file structure pointer).
This is why I had to create dedicated structure that can hold both: file descriptor number and a pointer to the file structure.

How Has This Been Tested?

Running my bclone tests (which do a lot of pool imports and exports) failed, because of ENFILE (Too many open files in system). After applying the fix tests succeeded and descriptors don't leak.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@behlendorf behlendorf requested a review from amotin December 5, 2023 19:08
@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Dec 5, 2023
@behlendorf
Copy link
Contributor

cc: @ixhamza

Copy link
Member

@amotin amotin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While it looks OK, my kernel-level file operations are very rusty, and I am not sure this is the right KPI to use. I see FreeBSD kernel much more often uses vn_open() directly rather than kern_openat(), that allows completely avoid file descriptor creation/freeing and so its leak.

@pjd
Copy link
Contributor Author

pjd commented Dec 7, 2023

While it looks OK, my kernel-level file operations are very rusty, and I am not sure this is the right KPI to use. I see FreeBSD kernel much more often uses vn_open() directly rather than kern_openat(), that allows completely avoid file descriptor creation/freeing and so its leak.

With my original ZFS port I didn't use kern_openat(). I was under impression that there must be a reason that somebody changed that - zfs_file_os.c does much more these days. Having said that, being able to operate on just the file structure without the need to remember descriptor number would be optimal.

@behlendorf
Copy link
Contributor

Where did we end up with this? I believe there was an open question if we were using the right KPI.

@kostikbel
Copy link

Installing user file descriptor for kernel operation is weird. So indeed kern_openat() should not be used. I have no idea about what does zfs attempt to do there, some explanation would be useful (for me). If you start with the path as opposed to file descriptor, perhaps you should use namei() and then the vnode layer KPI. Or you could leave this machinery to userspace and just pass fd into the op.

Copy link
Contributor

@ryao ryao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The commit title could use a FreeBSD: prefix.

Also, does this only occur on import and not send/recv/diff too? They use these functions as well.

@pjd
Copy link
Contributor Author

pjd commented Dec 17, 2023

Where did we end up with this? I believe there was an open question if we were using the right KPI.

FYI, I'll take a look if other KPI can be used.

@pjd pjd changed the title Fix file descriptor leak on pool import. FreeBSD: Fix file descriptor leak on pool import. Dec 17, 2023
@pjd
Copy link
Contributor Author

pjd commented Dec 27, 2023

The commit title could use a FreeBSD: prefix.

Also, does this only occur on import and not send/recv/diff too? They use these functions as well.

zfs_file_open() is not used for send/recv. AFAIR the userland opens file descriptors and the kernel just uses what's already open. I did find another use for zfs_file_open() - file-based vdevs, but that's about it.

@pjd pjd force-pushed the freebsd-descriptor-leak branch from fd97ceb to bf51e73 Compare December 27, 2023 09:01
@pjd
Copy link
Contributor Author

pjd commented Dec 27, 2023

Installing user file descriptor for kernel operation is weird. So indeed kern_openat() should not be used. I have no idea about what does zfs attempt to do there, some explanation would be useful (for me). If you start with the path as opposed to file descriptor, perhaps you should use namei() and then the vnode layer KPI. Or you could leave this machinery to userspace and just pass fd into the op.

@kostikbel @amotin Could you guys take another look? This is a bit out of my comfort zone:)

error = vn_open(&nd, &flags, 0, fp);
if (error != 0) {
falloc_abort(td, fp);
return (SET_ERROR(error));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to leak buffer in nd

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to leak buffer in nd

Do you mean NDFREE_PNBUF() should be moved up? It is allocated by successful vn_open(), not by NDINIT(), so I think we should be fine.


NDINIT(&nd, LOOKUP, FOLLOW, UIO_SYSSPACE, path);
flags = FFLAGS(flags);
error = vn_open(&nd, &flags, 0, fp);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cmode == 0 means that the file gets no access permissions if created. Is this intended?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cmode == 0 means that the file gets no access permissions if created. Is this intended?

It wasn't intended. Fixed. Thanks.

if (error != 0) {
return (error);
}
fp->f_flag = flags & FMASK;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the source of the flags? If the whole O_* namespace is valid for flags then some validation is required, because a lot of flags combinations are invalid and probably result in contradictory state of the file/vnode.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the source of the flags? If the whole O_* namespace is valid for flags then some validation is required, because a lot of flags combinations are invalid and probably result in contradictory state of the file/vnode.

Thanks. I added some asserts and moved FFLAGS() to before this line.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think that if going forward with struct file * as handle, and with using vn_open()-like open, it is much better to split kern_openat() and use that. You cleared some flags, but the other important part of openat() handles special errors and fileops.

Copy link

@kostikbel kostikbel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I still do not understand why do ZFS needs to open files from the kernel side. Could somebody clarify please?

The proposed zfs_file_open() function seems to be the better part of the kern_openat() code. If this is going further, then we definitely need to split kern_openat() into two parts, where first part only provides fp, and second would install the fp into fd table.

@pjd
Copy link
Contributor Author

pjd commented Dec 28, 2023

So I still do not understand why do ZFS needs to open files from the kernel side. Could somebody clarify please?

There are two cases where ZFS opens file from the kernel:

  1. When it reads /boot/zfs/zpool.cache file. This file is used to tell which pools were imported before the reboot so they can be automatically imported on boot. It might be needed even before the root file system is mounted.
  2. You can create the pool out of files, instead of GEOM providers. This is mostly used for testing.

The proposed zfs_file_open() function seems to be the better part of the kern_openat() code. If this is going further, then we definitely need to split kern_openat() into two parts, where first part only provides fp, and second would install the fp into fd table.

@pjd pjd force-pushed the freebsd-descriptor-leak branch from bf51e73 to 0ffc915 Compare December 28, 2023 02:20
@kostikbel
Copy link

So I still do not understand why do ZFS needs to open files from the kernel side. Could somebody clarify please?

There are two cases where ZFS opens file from the kernel:

1. When it reads /boot/zfs/zpool.cache file. This file is used to tell which pools were imported before the reboot so they can be automatically imported on boot. It might be needed even before the root file system is mounted.

2. You can create the pool out of files, instead of GEOM providers. This is mostly used for testing.

So in both cases it is vnodes, not files (as in struct file). Using file layer IMO only adds complications.

The proposed zfs_file_open() function seems to be the better part of the kern_openat() code. If this is going further, then we definitely need to split kern_openat() into two parts, where first part only provides fp, and second would install the fp into fd table.

@pjd
Copy link
Contributor Author

pjd commented Jan 9, 2024

There are two cases where ZFS opens file from the kernel:

1. When it reads /boot/zfs/zpool.cache file. This file is used to tell which pools were imported before the reboot so they can be automatically imported on boot. It might be needed even before the root file system is mounted.

2. You can create the pool out of files, instead of GEOM providers. This is mostly used for testing.

So in both cases it is vnodes, not files (as in struct file). Using file layer IMO only adds complications.

Maybe, but this way there is a shared abstract layer for FreeBSD and Linux and the same KPI is also used to read/write when we do have file structures (eg. zfs send/recv will open descriptors in userland and those will be used in the kernel through this KPI). If we would operate on vnodes here, we would need to have two separate abstractions for reading and writing for both OSes.

@mmatuska
Copy link
Contributor

@pjd any news here?

@pjd
Copy link
Contributor Author

pjd commented Jan 20, 2024

@pjd any news here?

I don't plan to work on splitting kern_openat(), so either we merge the patch as-is if there are no concerns to the correctness of the patch or someone else may pick it up and work on kern_openat().

Descriptor leak can be easily reproduced by doing:

	# zpool import tank
	# sysctl kern.openfiles
	# zpool export tank; zpool import tank
	# sysctl kern.openfiles

We were leaking four file descriptors on every import.

Similar leak most likely existed when using file-based VDEVs.

Signed-off-by: Pawel Jakub Dawidek <[email protected]>
@pjd pjd force-pushed the freebsd-descriptor-leak branch from 0ffc915 to 46e57a3 Compare January 20, 2024 06:44
@kostikbel
Copy link

@pjd any news here?

I don't plan to work on splitting kern_openat(), so either we merge the patch as-is if there are no concerns to the correctness of the patch or someone else may pick it up and work on kern_openat().

https://reviews.freebsd.org/D43529

@mmatuska
Copy link
Contributor

@pjd we will still probably need this as well for systems where kern_openatfp(9) is not available (yet)

@behlendorf behlendorf removed the Status: Code Review Needed Ready for review and testing label Jan 23, 2024
@behlendorf behlendorf added the Status: Accepted Ready to integrate (reviewed, tested) label Jan 23, 2024
@behlendorf behlendorf merged commit a4bf6ba into openzfs:master Jan 23, 2024
24 of 26 checks passed
mmatuska pushed a commit to mmatuska/zfs that referenced this pull request Jan 23, 2024
Descriptor leak can be easily reproduced by doing:

	# zpool import tank
	# sysctl kern.openfiles
	# zpool export tank; zpool import tank
	# sysctl kern.openfiles

We were leaking four file descriptors on every import.

Similar leak most likely existed when using file-based VDEVs.

External-issue: https://reviews.freebsd.org/D43529
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Pawel Jakub Dawidek <[email protected]>
Closes openzfs#15630
behlendorf pushed a commit that referenced this pull request Jan 26, 2024
Descriptor leak can be easily reproduced by doing:

	# zpool import tank
	# sysctl kern.openfiles
	# zpool export tank; zpool import tank
	# sysctl kern.openfiles

We were leaking four file descriptors on every import.

Similar leak most likely existed when using file-based VDEVs.

External-issue: https://reviews.freebsd.org/D43529
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Pawel Jakub Dawidek <[email protected]>
Closes #15630
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Mar 13, 2024
Descriptor leak can be easily reproduced by doing:

	# zpool import tank
	# sysctl kern.openfiles
	# zpool export tank; zpool import tank
	# sysctl kern.openfiles

We were leaking four file descriptors on every import.

Similar leak most likely existed when using file-based VDEVs.

External-issue: https://reviews.freebsd.org/D43529
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Pawel Jakub Dawidek <[email protected]>
Closes openzfs#15630
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Mar 13, 2024
Descriptor leak can be easily reproduced by doing:

	# zpool import tank
	# sysctl kern.openfiles
	# zpool export tank; zpool import tank
	# sysctl kern.openfiles

We were leaking four file descriptors on every import.

Similar leak most likely existed when using file-based VDEVs.

External-issue: https://reviews.freebsd.org/D43529
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Pawel Jakub Dawidek <[email protected]>
Closes openzfs#15630
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants