
Rework FDT dedup log sync #17038

Open · wants to merge 3 commits into base: master
Conversation

pcd1193182 (Contributor)
This PR condenses the FDT dedup log syncing into a single sync pass. This reduces the overhead of modifying indirect blocks for the dedup table multiple times per txg. In addition, changes were made to the formula for how much to sync per txg. We now also consider the backlog we have to clear, to prevent it from growing too large, or remaining large on an idle system.

Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.

Authored-by: Don Brady [email protected]
Authored-by: Paul Dagnelie [email protected]

Motivation and Context

Currently, flushing the DDT log takes place over multiple sync passes. This means that the same indirect blocks can be updated several times during a single txg sync, which partly defeats the purpose the DDT log was meant to serve in the first place. In addition, there is no mechanism in place to reduce the size of the DDT log; we try to keep up with the ingest rate, but that's it. If the log ever does grow to a large size, we may never make progress in shrinking it, which can result in increased import times.

Description

There are two main changes included in this patch. The first is condensing all the syncing into a single sync pass. We do this by removing the code that divided the flush targets by the number of passes, and generally not doing any work beyond the first sync pass.

The second is the modification to the flush targets for each txg. The basic algorithm has changed: rather than directly targeting the ingest rate, the primary mechanism for determining how much to flush is to take the size of the backlog and divide it by a target turnover rate (measured in txgs). The idea is that this smooths out the noise in the ingest rate, and over time the flush rate will match the ingest rate. This follows from the differential equation dbacklog/dt = ingest_rate - backlog/C, which describes the change in the backlog over time. The backlog tends towards C * ingest_rate, where C is the turnover rate; the flush rate is then C * ingest_rate / C, which is just the ingest rate.

However, one potential issue with this algorithm is that the backlog size is now proportional to the ingest rate. Whenever we import the pool, we have to read through the whole DDT log to build up the in-memory state, so if a user has hard requirements on import time, a large DDT log backlog can cause problems for them. As a result, there is a separate pressure-based system to keep the backlog size from rising above a cap, when that cap is set. The pressure system works as follows: every txg, if the backlog is above the cap and increasing, pressure is added; the amount added is proportional to the backlog divided by the cap, which helps us catch up to rapid spikes. If the backlog is above the cap but not increasing, we maintain the pressure; either it was a brief spike, or we've added enough pressure to bring the size down. Finally, if the backlog is below the cap, we release some of the pressure. The amount released is based on how far below the cap we are; that way, we quickly release pressure if the increased ingest rate abates, and we return to normal behavior. Here are a few charts to help demonstrate the behavior of this cap system:

In this example, we start with an ingest rate of 2k entries per second. We have a cap of 50k set, and the target turnover rate is 20 (chosen to make the changes happen more quickly and be easier to see). At txg 10 the ingest rate increases by a factor of 3, and at txg 100 it decreases back to the baseline. As you can see, the un-capped backlog quickly grows while its flush rate slowly rises to match the new ingest rate. Meanwhile, the capped backlog's flush rate climbs quickly to bring the backlog down near the cap, and then stabilizes to keep it there. Similarly, when the ingest rate drops, the un-capped backlog quickly starts falling as its flush rate slowly drops to the new baseline, while the cap-based system briefly flushes below the cap size, then corrects, levelling off quickly near the previous baseline.
[chart: backlog size and flush rate over time, capped vs. un-capped]

Finally, in addition to these changes, I added a new test to the ZTS to verify that pacing works as expected.

How Has This Been Tested?

In addition to the zfs test suite, I ran several tests where I simulated various ingestion patterns into the DDT, and verified that the backlog behaved as expected with and without the cap set.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

Signed-off-by: Paul Dagnelie <[email protected]>
@pcd1193182 pcd1193182 mentioned this pull request Feb 12, 2025