
Add is_hugetlbfs() to GuestMemoryRegion #120

Merged 1 commit on Feb 4, 2021

Contributor

@teawater teawater commented Nov 6, 2020

Virtio-balloon can release unused host memory to decrease the memory usage of the VMM.
Releasing normal pages and hugetlbfs pages requires different operations (madvise with MADV_DONTNEED and fallocate64 with FALLOC_FL_PUNCH_HOLE, respectively).

This commit adds is_hugetlbfs() to GuestMemoryRegion to help the VMM decide whether an address is backed by hugetlbfs.
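
A minimal sketch of the distinction described above, assuming the VMM already knows whether a range is backed by hugetlbfs. `ReleaseOp` and `release_op_for` are hypothetical names used only for illustration; they are not part of vm-memory. The sketch only selects the operation; issuing the actual system call is left to the VMM.

```rust
/// Host-side operation used to give guest pages back to the host.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum ReleaseOp {
    /// madvise(addr, len, MADV_DONTNEED), used for normal pages.
    MadviseDontNeed,
    /// fallocate64(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, offset, len),
    /// used for hugetlbfs pages. The kernel requires FALLOC_FL_KEEP_SIZE
    /// whenever FALLOC_FL_PUNCH_HOLE is passed.
    FallocatePunchHole,
}

/// Picks the release operation from the page type of the range.
pub fn release_op_for(is_hugetlbfs: bool) -> ReleaseOp {
    if is_hugetlbfs {
        ReleaseOp::FallocatePunchHole
    } else {
        ReleaseOp::MadviseDontNeed
    }
}
```

Transparent huge pages take the `MadviseDontNeed` path here, matching the statement in this thread that THPs can be released like normal pages.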

@@ -288,6 +288,11 @@ pub trait GuestMemoryRegion: Bytes<MemoryRegionAddress, E = Error> {
fn as_volatile_slice(&self) -> Result<volatile_memory::VolatileSlice> {
self.get_slice(MemoryRegionAddress(0), self.len() as usize)
}

/// Returns true if the region is hugepages
fn is_hugepages(&self) -> bool {
Member

Does hugepage here mean Hugetlbfs? Or it also includes Transparent Huge Pages?

Contributor Author

Yes, hugepage means Hugetlbfs.
THPs can be released as normal pages.

Member

"Hugepage" is a little ambiguous; should we clearly state that it's hugetlbfs?

@jiangliu
Member

jiangliu commented Nov 6, 2020

Maybe we lack a method for attaching label information to memory regions. For the is_hugepage case, we just store/retrieve a flag associated with a region, and the flag doesn't affect vm-memory behavior. So it would be great to introduce a new common mechanism to attach label data to regions.

@teawater
Contributor Author

teawater commented Nov 6, 2020

Maybe we lack a method for attaching label information to memory regions. For the is_hugepage case, we just store/retrieve a flag associated with a region, and the flag doesn't affect vm-memory behavior. So it would be great to introduce a new common mechanism to attach label data to regions.

What about adding a new u64 flags field to MmapRegion to store the label of a region?
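
One way to picture the u64 label idea is the sketch below. It is only an illustration: `LabeledRegion`, its methods, and `REGION_LABEL_HUGETLBFS` are hypothetical names and do not exist in vm-memory. The label bits are opaque to the region itself, which matches the point that the flag does not affect vm-memory behavior.

```rust
/// Hypothetical label bit; the name and value are illustrative only.
pub const REGION_LABEL_HUGETLBFS: u64 = 1 << 0;

/// Stand-in for a memory region that carries opaque label bits.
/// The region never interprets the bits; only the VMM does.
#[derive(Debug, Default)]
pub struct LabeledRegion {
    flags: u64,
}

impl LabeledRegion {
    /// Sets the given label bits.
    pub fn set_label(&mut self, bits: u64) {
        self.flags |= bits;
    }

    /// Clears the given label bits.
    pub fn clear_label(&mut self, bits: u64) {
        self.flags &= !bits;
    }

    /// Returns true if all of the given label bits are set.
    pub fn has_label(&self, bits: u64) -> bool {
        (self.flags & bits) == bits
    }
}
```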

@teawater teawater force-pushed the memory_huge branch 4 times, most recently from 98d0b34 to 0354dcf Compare November 9, 2020 02:36
@teawater teawater changed the title Add get_host_address_and_hugepages() to GuestMemory Add is_hugetlbfs() to GuestMemoryRegion Nov 9, 2020
@teawater
Contributor Author

teawater commented Nov 9, 2020

@jiangliu Pushed a new version according to your comments.

@alexandruag
Collaborator

Hi! Are you using something like memfds for hugepage-backed areas? (That would explain the fallocate.) Having a way to express the presence of hugepages at the vm-memory interface level (i.e. GuestMemory and/or GuestMemoryRegion) looks interesting. I'm actually not sure what the best approach is (have to think more :D).

However, if you know you're using MmapRegion/GuestRegionMmap it looks like you can inspect the flags member to determine whether MAP_HUGETLB is set (I'm not 100% sure how this interacts with file mappings; would be nice if someone readily has more info here). Also, even at a higher level, MADV_FREE can be applied to any (anonymous) mapping, FALLOC_FL_PUNCH_HOLE to file mappings (if supported by the fs), and a distinction can be made based on the presence or absence of GuestMemoryRegion::file_offset. Is any of this helpful for the time being?
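
The file_offset-based distinction could look roughly like the sketch below. It is an assumption-laden illustration: `ReclaimCall` and `reclaim_call` are hypothetical, and a plain `Option<u64>` stands in for the start offset that a region's file offset would provide.

```rust
/// Which call a VMM would issue to give a range back to the host.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq, Eq)]
pub enum ReclaimCall {
    /// Anonymous mapping: use madvise on the host address.
    Madvise,
    /// File mapping: punch a hole at this offset in the backing file.
    PunchHole { file_offset: u64 },
}

/// `region_file_start` is `Some(start)` when the region is backed by a file
/// (stand-in for the region's file offset), `None` for anonymous memory.
/// `offset_in_region` is the offset of the range inside the region.
pub fn reclaim_call(region_file_start: Option<u64>, offset_in_region: u64) -> ReclaimCall {
    match region_file_start {
        Some(start) => ReclaimCall::PunchHole {
            file_offset: start + offset_in_region,
        },
        None => ReclaimCall::Madvise,
    }
}
```

For a region mapped from file offset 0x2000, a range starting 0x1000 bytes into the region maps to a hole at file offset 0x3000.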

@teawater
Contributor Author

Hi! Are you using something like memfds for hugepage-backed areas? (That would explain the fallocate.) Having a way to express the presence of hugepages at the vm-memory interface level (i.e. GuestMemory and/or GuestMemoryRegion) looks interesting. I'm actually not sure what the best approach is (have to think more :D).

Yes, I am working on virtio-balloon free page reporting for cloud-hypervisor. Its areas without a backing file are based on memfds. The areas are set up with MFD_HUGETLB if needed.

However, if you know you're using MmapRegion/GuestRegionMmap it looks like you can inspect the flags member to determine whether MAP_HUGETLB is set (I'm not 100% sure how this interacts with file mappings; would be nice if someone readily has more info here).

Thanks! I will use it instead of "set_hugetlbfs" in the next version.

Also, even at a higher level, MADV_FREE can be applied to any (anonymous) mapping, FALLOC_FL_PUNCH_HOLE to file mappings (if supported by the fs), and a distinction can be made based on the presence or absence of GuestMemoryRegion::file_offset. Is any of this helpful for the time being?

I would like to use MADV_FREE as an option, but not as the default.
The reasons are:

  1. MADV_FREE frees pages lazily.
  2. MADV_FREE only works for anonymous pages.

Best,
Hui

@jiangliu
Member

jiangliu commented Nov 10, 2020

Hi! Are you using something like memfds for hugepage-backed areas? (That would explain the fallocate.) Having a way to express the presence of hugepages at the vm-memory interface level (i.e. GuestMemory and/or GuestMemoryRegion) looks interesting. I'm actually not sure what the best approach is (have to think more :D).

Yes, I am working on virtio-balloon free page reporting for cloud-hypervisor. Its areas without a backing file are based on memfds. The areas are set up with MFD_HUGETLB if needed.

In addition to mmap() with MAP_HUGETLB, there are other ways to use a hugetlbfs backend, so detecting that flag alone may not be sufficient.

@alexandruag
Collaborator

It's true there are multiple ways to leverage huge pages in general, but specifically within MmapRegion, flags should catch all of them, right? For example, can we mmap a file backed by hugepages without setting that flag?

@teawater, just wondering, why is lazy free worse? Also, with respect to point number 2, you'd use MADV_DONTNEED for anon only as well, since it looks like FALLOC_FL_PUNCH_HOLE has to be used for mmap-ed fds.

That being said, having a high level api to provide page size information sounds interesting. For example I'm wondering whether more information than a simple boolean flag is useful/necessary (i.e. the actual page size). Also, looks like Windows supports something similar but they're called large pages. I would def like to think about this more. @teawater is your use case blocked by the existing vm-memory implementation not supporting that information? There's also the possibility of managing an external (with respect to vm_memory) mapping of addresses/regions to huge page information. Can something like that work?

@teawater
Contributor Author

It's true there are multiple ways to leverage huge pages in general, but specifically within MmapRegion, flags should catch all of them, right? For example, can we mmap a file backed by hugepages without setting that flag?

Yes.
And memfd with MFD_HUGETLB doesn't need it either.

@teawater, just wondering, why is lazy free worse?

Lazy free is not suitable for many environments because the pages are only released under memory pressure.
But memory pressure may cause performance problems.
Many systems try to avoid getting into a memory-stressed state.
That is why I think MADV_FREE should not be the default operation.
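
The "option but not default" position can be sketched as a small policy type. Everything here is hypothetical and for illustration only: `AnonReleasePolicy` and `madvise_advice` are made-up names, and the numeric constants are the values used on x86_64 Linux (an assumption; take them from your libc bindings in real code).

```rust
/// madvise advice values as defined on x86_64 Linux (assumption: other
/// architectures may differ, so real code should use the libc constants).
pub const MADV_DONTNEED: i32 = 4;
pub const MADV_FREE: i32 = 8;

/// How anonymous guest pages are handed back to the host.
#[derive(Debug, PartialEq, Eq, Clone, Copy, Default)]
pub enum AnonReleasePolicy {
    /// MADV_DONTNEED: pages are dropped right away.
    #[default]
    Eager,
    /// MADV_FREE: pages are reclaimed lazily, only under memory pressure.
    Lazy,
}

/// Maps the policy to the advice value passed to madvise.
pub fn madvise_advice(policy: AnonReleasePolicy) -> i32 {
    match policy {
        AnonReleasePolicy::Eager => MADV_DONTNEED,
        AnonReleasePolicy::Lazy => MADV_FREE,
    }
}
```

With this shape, a VMM that does nothing special gets eager release, and lazy release has to be requested explicitly.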

Also, with respect to point number 2, you'd use MADV_DONTNEED for anon only as well, since it looks like FALLOC_FL_PUNCH_HOLE has to be used for mmap-ed fds.

Thanks for the reminder.

That being said, having a high level api to provide page size information sounds interesting. For example I'm wondering whether more information than a simple boolean flag is useful/necessary (i.e. the actual page size). Also, looks like Windows supports something similar but they're called large pages. I would def like to think about this more. @teawater is your use case blocked by the existing vm-memory implementation not supporting that information? There's also the possibility of managing an external (with respect to vm_memory) mapping of addresses/regions to huge page information. Can something like that work?

I think it is not about page size but page type.
The Linux kernel provides different APIs for releasing pages (while keeping the VMA) for hugetlb pages and for other pages.

@alexandruag
Collaborator

Thanks for the details! I'll have a better look at these things and return to the thread afterwards. Meanwhile, it would be great if other folks chime in as well.

@EmeraldShift

Maybe we lack a method for attaching label information to memory regions. For the is_hugepage case, we just store/retrieve a flag associated with a region, and the flag doesn't affect vm-memory behavior. So it would be great to introduce a new common mechanism to attach label data to regions.

This makes sense, but for this specific issue I'm not understanding the purpose behind having set_hugetlbfs, independent of the region constructor. In my mind, it would make more sense for a flag/option to be passed in on region construction that specifies the desired backing behavior (base pages, THP, memfd/hugetlbfs, etc.). That way, the flag (and is_hugetlbfs) would always be consistent with the region's behavior, and we could combine this PR's API (an indicator of the region's hugepage status) with #118 to provide a configurable hugepage API to VMMs.

@jiangliu
Member

Maybe we lack a method for attaching label information to memory regions. For the is_hugepage case, we just store/retrieve a flag associated with a region, and the flag doesn't affect vm-memory behavior. So it would be great to introduce a new common mechanism to attach label data to regions.

This makes sense, but for this specific issue I'm not understanding the purpose behind having set_hugetlbfs, independent of the region constructor. In my mind, it would make more sense for a flag/option to be passed in on region construction that specifies the desired backing behavior (base pages, THP, memfd/hugetlbfs, etc.). That way, the flag (and is_hugetlbfs) would always be consistent with the region's behavior, and we could combine this PR's API (an indicator of the region's hugepage status) with #118 to provide a configurable hugepage API to VMMs.

Our current solution is to create file descriptors by the vmm and build memory regions by using the MmapRegion::build() interface. In other words, we have moved the dirty work into vmm:)

@alexandruag
Collaborator

Hi again! Seems like the current iteration of the PR is the most straightforward thing we can do right now to provide the desired functionality. Like @EmeraldShift mentioned, the set method looks a bit out of place, but it's prob the best option until we come up with a new iteration of the interfaces.

Just to clarify the semantics here a bit, if a region is backed by transparent huge pages, the is_hugepages method as defined here should return false right? Would be nice to have a clearer description of the expected behaviour as part of the documentation. Also, does anyone know more about any potential correspondence between huge pages on Linux and large pages on Windows? Should we only present this functionality on platforms where it's relevant (i.e. via conditional compilation options)?

@teawater
Contributor Author

teawater commented Dec 4, 2020

Hi again! Seems like the current iteration of the PR is the most straightforward thing we can do right now to provide the desired functionality. Like @EmeraldShift mentioned, the set method looks a bit out of place, but it's prob the best option until we come up with a new iteration of the interfaces.

Just to clarify the semantics here a bit, if a region is backed by transparent huge pages, the is_hugepages method as defined here should return false right?

Yes.

Would be nice to have a clearer description of the expected behaviour as part of the documentation.

What about adding the description to https://github.com/rust-vmm/vm-memory/blob/master/DESIGN.md#backend-implementation-based-on-mmap ?

Also, does anyone know more about any potential correspondence between huge pages on Linux and large pages on Windows? Should we only present this functionality on platforms where it's relevant (i.e. via conditional compilation options)?

@alexandruag
Collaborator

Sounds good!

jiangliu
jiangliu previously approved these changes Dec 10, 2020
@alxiord
Member

alxiord commented Dec 15, 2020

Hi @teawater, we're having some trouble with the CI, so builds might be failing or hanging. We're sorry for the inconvenience; we'll update you here once it's back up and running.

@teawater
Contributor Author

Hi @teawater, we're having some trouble with the CI, so builds might be failing or hanging. We're sorry for the inconvenience; we'll update you here once it's back up and running.

@aghecenco OK. Thanks!

Collaborator

@alexandruag alexandruag left a comment

Hi everyone and apologies for the delay in circling back. Left two more comments that I recently thought about; would be great to hear your opinion on them. After we have a conclusion, we should also add a changelog entry for the additions and decide whether we want the is_hugetlbfs method to be conditionally compiled/included depending on the OS, and we're good to go.

DESIGN.md Outdated
@@ -122,6 +122,18 @@ let buf = &mut [0u8; 5];
let result = guest_memory_mmap.write(buf, addr);
```

One of the responsibilities of `GuestRegionMmap` is to provide an API
Collaborator

On second thought, it's probably best to move something like this description and example to the doc comments of the is_hugetlbfs method from the GuestMemory interface, where it gets better visibility.

Contributor Author

Updated.

@@ -288,6 +288,11 @@ pub trait GuestMemoryRegion: Bytes<MemoryRegionAddress, E = Error> {
fn as_volatile_slice(&self) -> Result<volatile_memory::VolatileSlice> {
self.get_slice(MemoryRegionAddress(0), self.len() as usize)
}

/// Returns true if the region is backed by hugetlbfs
fn is_hugetlbfs(&self) -> bool {
Collaborator

Hmm, would it actually make sense to return an Option<bool>, where None represents that no information is available? In our particular situation (where is_hugetlbfs has to be set via the set_hugetlbfs accessor), this would also be a natural fit since the field starts out as None, and only becomes true or false if it's explicitly set. Just wondering what other folks think about this approach.
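
From the caller's side, the three-state result could be consumed as in the sketch below. `BackingKind` and `classify` are hypothetical helpers, not part of vm-memory; the point is only that `None` is a distinct case and should not be folded silently into `false`.

```rust
/// What a VMM knows about the backing of a region.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq, Eq)]
pub enum BackingKind {
    /// Known to be hugetlbfs: release with FALLOC_FL_PUNCH_HOLE.
    Hugetlbfs,
    /// Known not to be hugetlbfs (normal pages or THP).
    NotHugetlbfs,
    /// Nobody recorded the information; the VMM has to decide what to do.
    Unknown,
}

/// Turns the `Option<bool>` returned by an `is_hugetlbfs`-style query
/// into an explicit three-way decision.
pub fn classify(is_hugetlbfs: Option<bool>) -> BackingKind {
    match is_hugetlbfs {
        Some(true) => BackingKind::Hugetlbfs,
        Some(false) => BackingKind::NotHugetlbfs,
        None => BackingKind::Unknown,
    }
}
```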

Contributor Author

Updated.

@teawater teawater force-pushed the memory_huge branch 3 times, most recently from 6ff201f to 1dd3577 Compare January 18, 2021 01:25
src/mmap_unix.rs Outdated
@@ -172,6 +173,7 @@ impl MmapRegion {
prot,
flags,
owned: true,
hugetlbfs: false,
Collaborator

Can we have the hugetlbfs field of MmapRegion as an Option<bool> which is None initially? (since there's no available information right now, until we explicitly set it with set_hugetlbfs)
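
A minimal sketch of that field layout, assuming a simplified stand-in type. `RegionInfo` is hypothetical and only mirrors the `hugetlbfs` field and the accessor names mentioned in this thread; it is not the actual MmapRegion definition.

```rust
/// Simplified stand-in for a region that tracks hugetlbfs information.
#[derive(Debug)]
pub struct RegionInfo {
    /// None until the VMM explicitly records the information.
    hugetlbfs: Option<bool>,
}

impl RegionInfo {
    /// A freshly built region has no information about its backing.
    pub fn new() -> Self {
        RegionInfo { hugetlbfs: None }
    }

    /// Records whether the region is backed by hugetlbfs.
    pub fn set_hugetlbfs(&mut self, hugetlbfs: bool) {
        self.hugetlbfs = Some(hugetlbfs);
    }

    /// Returns None if nothing was recorded, Some(value) otherwise.
    pub fn is_hugetlbfs(&self) -> Option<bool> {
        self.hugetlbfs
    }
}
```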

jiangliu
jiangliu previously approved these changes Jan 21, 2021
/// # Examples
///
/// ```
/// # #[cfg(feature = "backend-mmap")]
Member

We just merged a PR that updated all the code examples that needed features. Can you please update this example to follow that template?

Changes that would be needed:

  • specify in the header that this example is based on the mmap feature. Here is an example:
    /// # Examples (uses the `backend-mmap` feature)
  • declare the #[cfg(feature = "backend-mmap")] only once at the beginning of the example, and get rid of the test_guest_memory_mmap_is_hugetlbfs function. The code can sit directly in the example.

Contributor Author

Updated.

Member

The test_guest_memory_mmap_is_hugetlbfs function is still there. To get rid of it, and also declare the #[cfg(feature = "backend-mmap")], we would need to start a code block (and have it all conditionally compiled only for that feature).

    /// # #[cfg(feature = "backend-mmap")]
    /// # {
    /// # use vm_memory::{GuestAddress, GuestMemory, GuestMemoryMmap, GuestRegionMmap};
    ///
    /// let addr = GuestAddress(0x1000);
    /// let mem = GuestMemoryMmap::from_ranges(&[(addr, 0x1000)]).unwrap();
    /// let r = mem.find_region(addr).unwrap();
    /// assert_eq!(r.is_hugetlbfs(), None);
    /// # }

With this approach, you no longer need the function test_guest_memory_mmap_is_hugetlbfs, and all the cfg macros.

Contributor Author

Updated.

Member

cargo fmt fails because of alignment. The code needs to be aligned as in my previous comment. We can fix it in a subsequent PR.

src/mmap_windows.rs Outdated
jiangliu
jiangliu previously approved these changes Jan 22, 2021
Virtio-balloon can release unused host memory to decrease the memory
usage of the VMM.
Releasing normal pages and hugetlbfs pages requires different operations
(madvise MADV_DONTNEED and fallocate64 FALLOC_FL_PUNCH_HOLE).

This commit adds is_hugetlbfs() to GuestMemoryRegion to help the
VMM decide whether an address is backed by hugetlbfs.
A return value of None means that no information is available.

Signed-off-by: Hui Zhu <[email protected]>

#[cfg(feature = "backend-mmap")]
#[test]
fn test_guest_memory_mmap_is_hugetlbfs() {
Contributor

Duplicates the test in comment. (as pointed out before)

@elmarco
Contributor

elmarco commented Jan 27, 2021

Why not add an is_hugetlbfs() to GuestRegionMmap and query the underlying MmapRegion flags?

/// # Examples
///
/// ```
/// # #[cfg(feature = "backend-mmap")]
Member

cargo fmt fails because of alignment. The code needs to be aligned as in my previous comment. We can fix it in a subsequent PR.

/// assert_eq!(r.is_hugetlbfs(), None);
/// # }
/// ```
fn is_hugetlbfs(&self) -> Option<bool> {
Member

Sorry for the late comment, but can we actually just use conditional compilation here and make it available only on Linux, instead of having a blank implementation for Windows?
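
A rough sketch of what the conditional-compilation variant could look like, using a made-up trait rather than the real GuestMemoryRegion. `Region`, `PlainRegion`, and `size` are hypothetical; only the `#[cfg(target_os = "linux")]` gating is the point.

```rust
/// Hypothetical region trait, for illustration only.
pub trait Region {
    /// Size of the region in bytes.
    fn size(&self) -> u64;

    /// Exists only when building for Linux. On other targets the method is
    /// not part of the trait at all, so no blank implementation is needed.
    #[cfg(target_os = "linux")]
    fn is_hugetlbfs(&self) -> Option<bool> {
        None
    }
}

/// Trivial implementor that relies on the default method where it exists.
pub struct PlainRegion(pub u64);

impl Region for PlainRegion {
    fn size(&self) -> u64 {
        self.0
    }
}
```

The trade-off is that callers on other platforms must gate their own uses of `is_hugetlbfs` with the same `cfg`, instead of receiving `None`.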

@andreeaflorescu
Member

Why not add an is_hugetlbfs() to GuestRegionMmap and query the underlying MmapRegion flags?

I think it might make sense to have it as part of the interface in case this information is needed by crates that do not take a dependency on the mmap backend feature.

@elmarco
Contributor

elmarco commented Jan 27, 2021

Why not add an is_hugetlbfs() to GuestRegionMmap and query the underlying MmapRegion flags?

I think it might make sense to have it as part of the interface in case this information is needed by crates that do not take a dependency on the mmap backend feature.

I see. But can't we remove the need for set_hugetlbfs() and query the underlying memory instead? (via some trait)

@andreeaflorescu
Member

I see. But can't we remove the need for set_hugetlbfs() and query the underlying memory instead? (via some trait)

This is an interesting point that would make the implementation nicer. I do not really know how we could achieve this, so any ideas would be welcome. @bonzini @jiangliu @alexandruag do you have a suggestion?

@jiangliu
Member

I see. But can't we remove the need for set_hugetlbfs() and query the underlying memory instead? (via some trait)

This is an interesting point that would make the implementation nicer. I do not really know how we could achieve this, so any ideas would be welcome. @bonzini @jiangliu @alexandruag do you have a suggestion?

I don't know of any easy way to figure out whether an fd is backed by hugetlbfs :(
We could get the file name from /proc/self/fd/[fdnum], then check the file name. But that's too complicated.

@elmarco
Contributor

elmarco commented Jan 28, 2021

I see. But can't we remove the need for set_hugetlbfs() and query the underlying memory instead? (via some trait)

This is an interesting point that would make the implementation nicer. I do not really know how we could achieve this, so any ideas would be welcome. @bonzini @jiangliu @alexandruag do you have a suggestion?

I don't know of any easy way to figure out whether an fd is backed by hugetlbfs :(
We could get the file name from /proc/self/fd/[fdnum], then check the file name. But that's too complicated.

I was thinking of simply relying on the flags that were given to mmap().

@jiangliu
Member

I see. But can't we remove the need for set_hugetlbfs() and query the underlying memory instead? (via some trait)

This is an interesting point that would make the implementation nicer. I do not really know how we could achieve this, so any ideas would be welcome. @bonzini @jiangliu @alexandruag do you have a suggestion?

I don't know of any easy way to figure out whether an fd is backed by hugetlbfs :(
We could get the file name from /proc/self/fd/[fdnum], then check the file name. But that's too complicated.

I was thinking of simply relying on the flags that were given to mmap().

It works for anonymous memory backed by hugetlbfs by detecting the MAP_HUGETLB flag, but it doesn't work when passing a prepared hugetlbfs fd to MmapRegion::build().
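
The limitation described here can be sketched as follows. `hugetlbfs_from_mmap` is a hypothetical helper, and the `MAP_HUGETLB` value is the one used on x86_64 Linux (an assumption; real code should take it from libc). The function gives a definite answer only when the mmap flags are conclusive and falls back to `None` for fd-backed mappings, where a prepared hugetlbfs fd would be mapped without `MAP_HUGETLB`.

```rust
/// Value of MAP_HUGETLB on x86_64 Linux (assumption: use the libc
/// constant for your target in real code).
pub const MAP_HUGETLB: i32 = 0x40000;

/// Best-effort guess based only on what was passed to mmap().
///
/// * `flags`: the flags argument given to mmap().
/// * `fd_backed`: true if a file descriptor was mapped.
pub fn hugetlbfs_from_mmap(flags: i32, fd_backed: bool) -> Option<bool> {
    if (flags & MAP_HUGETLB) != 0 {
        // The mapping explicitly asked for hugetlb pages.
        Some(true)
    } else if fd_backed {
        // The fd may come from hugetlbfs (or from a memfd created with
        // MFD_HUGETLB); the mmap flags alone cannot tell.
        None
    } else {
        // Anonymous mapping without MAP_HUGETLB: normal pages or THP.
        Some(false)
    }
}
```

The `None` branch is the case that an explicit setter such as set_hugetlbfs() covers, since only the code that prepared the fd knows where it came from.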

Member

@andreeaflorescu andreeaflorescu left a comment

Since this is a long standing PR, let's just merge it, and submit a PR with the required changes afterwards.

@jiangliu jiangliu merged commit e63914e into rust-vmm:master Feb 4, 2021
7 participants