The HDD volumes can significantly affect the write performance of SSD volumes on the same server node #16518

Open
TimLand opened this issue Sep 9, 2024 · 3 comments
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)


TimLand commented Sep 9, 2024

System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | Oracle Linux 7 |
| Distribution Version | 7.9 |
| Kernel Version | 5.4.17 |
| Architecture | x86_64 |
| OpenZFS Version | 2.1.15 |

Describe the problem you're observing

HDD volumes can severely reduce the write performance of SSD volumes on the same server node. We created two RAID-Z storage pools on one node, one built entirely from mechanical drives (HDD) and one entirely from SSDs, and ran sequential writes against a volume from each pool at the same time. The SSD volume reached a sequential write speed of only 300 MB/s. Once the writes to the HDD volume were stopped, the SSD volume returned to its normal rate of 1600 MB/s.

Describe how to reproduce the problem

On the same node, create a RAID-Z (RAID 5 equivalent) storage pool named hdd_pool from HDD drives and another named ssd_pool from SSD drives. Then create a volume called hdd_volume on hdd_pool and a volume called ssd_volume on ssd_pool. When fio writes to hdd_volume and ssd_volume simultaneously, ssd_volume reaches only 300 MB/s and hdd_volume 287 MB/s. When the fio write to hdd_volume is stopped, the write speed of ssd_volume rises to 1600 MB/s.
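The reproduction steps can be sketched roughly as the shell script below. The disk names, the 200G volume size, and the `run`, `fio_write`, and `reproduce` helper names are assumptions for illustration; the report does not list the devices or volume sizes used. The fio arguments are the ones shown in the report. By default the script only prints the commands (`DRY_RUN=1`), since creating pools destroys data on the listed disks.

```shell
#!/bin/sh
# Sketch of the reproduction. Disk names and the 200G volume size are
# placeholders, not values from the report.
HDD_DISKS="${HDD_DISKS:-/dev/sdb /dev/sdc /dev/sdd}"
SSD_DISKS="${SSD_DISKS:-/dev/sde /dev/sdf /dev/sdg}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# fio arguments taken from the report; only the target device varies.
fio_write() {
    run fio --name=test --rw=write --direct=1 --numjobs=1 \
        --ioengine=libaio --iodepth=64 --bs=1M --group_reporting \
        --runtime=60 --size=100G --filename="$1"
}

reproduce() {
    # One single-parity RAID-Z pool per media type.
    run zpool create hdd_pool raidz $HDD_DISKS
    run zpool create ssd_pool raidz $SSD_DISKS
    run zfs create -V 200G hdd_pool/hdd_volume
    run zfs create -V 200G ssd_pool/ssd_volume
    # Write to both zvols at the same time, then wait for both jobs.
    fio_write /dev/zvol/hdd_pool/hdd_volume &
    fio_write /dev/zvol/ssd_pool/ssd_volume &
    wait
}
```

Calling `reproduce` prints the command sequence; running it with `DRY_RUN=0` as root on a disposable machine would execute it. Running only the second `fio_write` afterwards gives the SSD-only baseline.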

The fio log indicates the following:
fio --name=test --rw=write --direct=1 --numjobs=1 --ioengine=libaio --iodepth=64 --bs=1M --group_reporting --runtime=60 --size=100G --filename=/dev/zd0
WRITE: io=891904KB,aggrb=200113KB/s,minb=200113KB/s,maxb=200113KB/s,mint=4457msec,maxt=4457msec

fio --name=test --rw=write --direct=1 --numjobs=1 --ioengine=libaio --iodepth=64 --bs=1M --group_reporting --runtime=60 --size=100G --filename=/dev/zd0
WRITE: io=100751MB,aggrb=1678.5MB/s,minb=1678.5MB/s,maxb=1678.5MB/s,mint=60026msec,maxt=60026msec
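To compare the two summary lines in the same unit, a small helper can pull out the `aggrb=` field and normalize it to MB/s. This is a sketch that assumes the legacy fio summary format shown in the logs and treats 1 MB as 1024 KB; the function name `fio_aggrb_mbs` is made up for this example.

```shell
# Extract the aggregate bandwidth (aggrb=) from a legacy fio summary line
# read on stdin and print it in MB/s, converting from KB/s when needed.
fio_aggrb_mbs() {
    awk '
        match($0, /aggrb=[0-9.]+[KM]B\/s/) {
            # Drop the 6-character "aggrb=" prefix, e.g. leaving "200113KB/s".
            field = substr($0, RSTART + 6, RLENGTH - 6)
            value = field + 0            # awk keeps the leading numeric part
            if (field ~ /KB\/s$/) value /= 1024
            printf "%.1f\n", value
        }'
}
```

For example, `echo 'WRITE: io=891904KB,aggrb=200113KB/s' | fio_aggrb_mbs` prints `195.4`, while the second log line yields `1678.5`.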

@TimLand TimLand added the Type: Defect Incorrect behavior (e.g. crash, hang) label Sep 9, 2024
Contributor

snajpa commented Sep 9, 2024

What does the CPU usage look like during the HDD pool usage?

What's your preempt model now?

cat /sys/kernel/debug/sched/preempt

Can you try switching to full if it is not already set?
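A minimal sketch of checking the preempt model follows. It assumes the kernel was built with dynamic preemption and that debugfs is mounted; the file may not exist on older kernels such as the 5.4 one in the report. It also assumes the file lists the available models with the active one in parentheses, for example `none voluntary (full)`. The helper name `active_preempt_model` is made up for this example.

```shell
# Print the active preemption model from the content of the debugfs file,
# where the active model is assumed to be shown in parentheses.
active_preempt_model() {
    printf '%s\n' "$1" | sed -n 's/.*(\([a-z]*\)).*/\1/p'
}

PREEMPT_FILE="${PREEMPT_FILE:-/sys/kernel/debug/sched/preempt}"

if [ -r "$PREEMPT_FILE" ]; then
    echo "active model: $(active_preempt_model "$(cat "$PREEMPT_FILE")")"
    # To switch (as root): echo full > "$PREEMPT_FILE"
else
    echo "$PREEMPT_FILE not readable (debugfs not mounted, or no dynamic preemption)"
fi
```

If the file is missing, the preemption model is fixed at kernel build time and cannot be switched this way.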

It'd be really helpful if you could record fio and the ZFS threads with perf and then produce flame graphs from the recording.
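One possible way to collect such flame graphs is sketched below. It assumes `perf` is installed and that the FlameGraph scripts (`stackcollapse-perf.pl`, `flamegraph.pl`) are checked out under `FLAMEGRAPH_DIR`; that location, the 99 Hz sampling rate, the 30 second default, and the helper name `flamegraph_commands` are assumptions. The helper only prints the commands, so they can be reviewed before being run as root while the fio jobs are active.

```shell
# Assumed location of the FlameGraph scripts.
FLAMEGRAPH_DIR="${FLAMEGRAPH_DIR:-$HOME/FlameGraph}"

# Print the commands for a system-wide stack-sampling run and the flame
# graph rendering step. Nothing is executed here.
#   $1 = output name prefix, $2 = sampling duration in seconds (default 30)
flamegraph_commands() {
    name="$1"
    seconds="${2:-30}"
    # Sample all CPUs with call graphs, which also covers kernel ZFS threads.
    echo "perf record -F 99 -a -g -o $name.perf.data -- sleep $seconds"
    # Fold the recorded stacks and render them as an SVG.
    echo "perf script -i $name.perf.data | $FLAMEGRAPH_DIR/stackcollapse-perf.pl | $FLAMEGRAPH_DIR/flamegraph.pl > $name.svg"
}
```

For instance, `flamegraph_commands libaio_write 60` prints a 60 second recording command followed by the rendering pipeline for `libaio_write.svg`.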

but let's start with the preempt model - why I'm saying it could play a role: I suspect there's a possibility that processes are spinning on-cpu while data is being waited on and that this spinning eats up CPU resources for other activity...

(otherwise I have no idea on what could be happening so it's probably going to require a bit of back and forth)

Author

TimLand commented Sep 10, 2024

Thanks, these are the flame graphs we've collected:
The following graph shows the results of running fio with the libaio I/O engine:
[Image: hdd_ssd_vol_fio_with_libaio_write]

The following graph shows the results of running fio with the psync I/O engine:
[Image: hdd_ssd_vol_fio_with_psync_write]

I have observed that when fio uses the psync I/O engine, the performance of the SSD volume seems normal and is not impacted by the HDD volume. Does libaio maintain a global completion queue? When there are slow block devices, would this affect the speed at which elements are reaped from the completion queue?

Member

amotin commented Sep 10, 2024

Since ZFS does not really support async I/O, it can only execute as many simultaneous I/Os as there are threads to issue them. Those can be either kernel threads or ZFS/ZVOL threads, but neither are unlimited, so either can become a bottleneck. If ZVOL threads are the limit here, note that the threading model for ZVOLs was reworked recently in ZFS master in #15992. Meanwhile, you may try experimenting with the zvol_threads module parameter.
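A sketch of inspecting and changing `zvol_threads` follows. It assumes the parameter is exposed under `/sys/module/zfs/parameters` and is applied at module load time through a modprobe.d options line; whether it can also be changed at runtime may depend on the OpenZFS version. The value 64, the config file name, and the helper names are arbitrary examples, not recommendations from the thread.

```shell
ZFS_PARAMS_DIR="${ZFS_PARAMS_DIR:-/sys/module/zfs/parameters}"

# Print the current value of a ZFS module parameter, or "unknown" when the
# module is not loaded or the parameter does not exist.
zfs_param() {
    if [ -r "$ZFS_PARAMS_DIR/$1" ]; then
        cat "$ZFS_PARAMS_DIR/$1"
    else
        echo unknown
    fi
}

# Print a modprobe.d options line that sets zvol_threads at module load.
zvol_threads_option() {
    echo "options zfs zvol_threads=$1"
}

echo "current zvol_threads: $(zfs_param zvol_threads)"
# Example (as root), followed by a module reload or reboot:
#   zvol_threads_option 64 > /etc/modprobe.d/zfs-zvol.conf
```

Rerunning the concurrent fio test with a few different values would show whether the ZVOL thread count is the limiting factor.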
