Add TXG timestamp database #16853
base: master
Conversation
Force-pushed from 2a20b11 to 364f813
Force-pushed from 364f813 to 891c8f2
It crashes on
This reminds me we recently added
	ret = sscanf(timestr, "%4d-%2d-%2d %2d:%2d", &tm.tm_year, &tm.tm_mon,
	    &tm.tm_mday, &tm.tm_hour, &tm.tm_min);
	if (ret < 3) {
		fprintf(stderr, gettext("Failed to parse the date.\n"));
		usage(B_FALSE);
	}

	// Adjust struct
	tm.tm_year -= 1900;
	tm.tm_mon -= 1;

	return (timegm(&tm));
I wonder if strptime() or something else specialized would be better.
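For what it's worth, a minimal sketch of what a strptime()-based parser could look like; the helper name, the accepted format strings, and the date-only fallback are assumptions for the illustration, not the patch's actual code:

#define	_GNU_SOURCE	/* strptime(), timegm() on glibc */
#include <stdio.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical sscanf() replacement: strptime() validates separators and
 * field ranges itself, and timegm() interprets the result as UTC.
 * Returns -1 on parse failure.
 */
static time_t
parse_date(const char *timestr)
{
	struct tm tm;

	(void) memset(&tm, 0, sizeof (tm));
	if (strptime(timestr, "%Y-%m-%d %H:%M", &tm) == NULL &&
	    strptime(timestr, "%Y-%m-%d", &tm) == NULL)
		return (-1);

	return (timegm(&tm));
}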
	/* Load time log */
	error = spa_load_txg_log_time(spa);
	if (error != 0)
		return (spa_vdev_err(rvd, VDEV_AUX_CORRUPT_DATA, EIO));
I think we could instead delete it and start from scratch. Not a big deal.
Hmm, maybe. But I feel that when we can't load part of it, then something funky is happening and should be investigated.
If the question is between inability to access data at all vs losing this data, I choose the first. I was thinking about the same for BRT on read-only imports, but haven't got to it yet.
Force-pushed from 891c8f2 to ba5ee33
Force-pushed from 963a5a3 to 33a7c27
module/zfs/spa.c (Outdated)
	VERIFY0(spa_rrd_write(spa, tx, &spa->spa_txg_log_time.dbr_minutes,
	    DMU_POOL_TXG_LOG_TIME_MINUTES, insert));
	VERIFY0(spa_rrd_write(spa, tx, &spa->spa_txg_log_time.dbr_days,
	    DMU_POOL_TXG_LOG_TIME_DAYS, insert));
	VERIFY0(spa_rrd_write(spa, tx, &spa->spa_txg_log_time.dbr_days,
	    DMU_POOL_TXG_LOG_TIME_MONTHS, insert));
Would it be simpler to store a single unix timestamp for the log time, rather than discrete minutes/days/months?
Also, you're storing dbr_days into DMU_POOL_TXG_LOG_TIME_MONTHS. Typo?
Good catch!
The idea was to have different resolutions, so the longer the ZFS pool runs, the fewer historic TXGs we keep.
We are also limited by the size of the ZAP entry.
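To make the trade-off concrete, here is a rough sketch of the multi-resolution layout; the entry counts and the struct/field names are invented for the illustration, not the values the patch uses:

#include <stdint.h>

/*
 * Illustration only -- the capacities are made up.  Each ring is a
 * fixed-size circular buffer of (time, txg) samples: the minute ring
 * covers the recent past densely, while the day and month rings keep
 * coarser samples for much longer, so old history loses resolution
 * instead of growing without bound, and each ring stays small enough
 * to fit in a single ZAP entry.
 */
#define	RRD_ENTRIES	256	/* 256 * 16 bytes = 4 KiB per ring */

typedef struct {
	uint64_t	re_time;	/* seconds since epoch, UTC */
	uint64_t	re_txg;		/* TXG synced around re_time */
} rrd_entry_t;

typedef struct {
	rrd_entry_t	db_minutes[RRD_ENTRIES];	/* ~4.2 hours */
	rrd_entry_t	db_days[RRD_ENTRIES];		/* ~8.5 months */
	rrd_entry_t	db_months[RRD_ENTRIES];		/* ~21 years */
} dbrrd_sketch_t;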
I'm concerned you're going to run into timezone issues if you don't use Unix timestamps. If you write a timestamp at 10am EST in New York, and then later import the pool in San Francisco and write a timestamp at 10am PST, will there be two timestamps at "10am"?
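A small userland demonstration of the concern (purely illustrative, not part of the patch): the same wall-clock time yields different epochs under mktime() depending on TZ, while timegm() pins it to UTC.

#define	_GNU_SOURCE	/* timegm() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Fill in "2024-11-01 10:00" as a broken-down time. */
static void
fill_tm(struct tm *tm)
{
	(void) memset(tm, 0, sizeof (*tm));
	tm->tm_year = 2024 - 1900;
	tm->tm_mon = 11 - 1;
	tm->tm_mday = 1;
	tm->tm_hour = 10;
	tm->tm_isdst = -1;	/* let mktime() decide about DST */
}

int
main(void)
{
	struct tm tm;

	/* mktime() interprets the same wall clock in the local zone... */
	(void) setenv("TZ", "America/New_York", 1);
	tzset();
	fill_tm(&tm);
	printf("New York 10:00      -> %lld\n", (long long)mktime(&tm));

	(void) setenv("TZ", "America/Los_Angeles", 1);
	tzset();
	fill_tm(&tm);
	printf("San Francisco 10:00 -> %lld\n", (long long)mktime(&tm));

	/* ...while timegm() pins the broken-down time to UTC. */
	fill_tm(&tm);
	printf("UTC 10:00           -> %lld\n", (long long)timegm(&tm));
	return (0);
}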
Oh... You are right. My goal was to store time in UTC; the zpool scrub command converts this time to UTC.
I was sure gethrtime returned a UTC timestamp, but you are right, it doesn't.
I have to fix that.
OK, I have double-checked, and I think that the implementation is correct.
The gethrestime_sec(void) function uses ktime_get_coarse_real_ts64(), which from my understanding returns a UTC time.
Force-pushed from 33a7c27 to 7797e3f
Forgot to mention this earlier - can you add a test case to exercise
Unfortunately, I don't have an idea how to add such a test, as to test it we would need to wait for the rrd to be created. That would make for a very long test. Do you have any suggestions?
The test case could temporarily set the system clock forward to simulate the passage of time.
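A rough sketch of what that could look like as a helper; the function name is made up, and clock_settime(CLOCK_REALTIME) needs root (CAP_SYS_TIME), with the original time restored afterwards:

#include <time.h>

/*
 * Hypothetical test helper: jump the system clock forward by `seconds`
 * so the next RRD samples land in a later minute/day/month bucket
 * without actually waiting.  The caller is responsible for restoring
 * the clock (or re-enabling NTP) once the test is done.
 */
static int
jump_clock_forward(time_t seconds)
{
	struct timespec ts;

	if (clock_gettime(CLOCK_REALTIME, &ts) != 0)
		return (-1);
	ts.tv_sec += seconds;
	return (clock_settime(CLOCK_REALTIME, &ts));
}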
	if (!spa_writeable(spa)) {
		return;
	}
	if (txg == spa->spa_last_noted_txg) {
How can this be true if it is called only once per TXG sync?
} dbrrd_rounding_t;

typedef struct {
	hrtime_t rrdd_time;
I am not sure how compatible this is, but in general it should not be hrtime_t here. On Illumos and FreeBSD gethrestime_sec() returns time_t, while on Linux -- uint64_t. I think uint64_t would be better here from every aspect.
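If I read the suggestion right, the change would look roughly like this (the struct name and the second field are guesses, just to keep the sketch self-contained):

#include <stdint.h>

typedef struct {
	uint64_t	rrdd_time;	/* gethrestime_sec(), seconds UTC */
	uint64_t	rrdd_txg;	/* hypothetical: TXG for this sample */
} rrd_data_t;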
{
	rrd->rrd_head = ntohll(rrd->rrd_head);
	rrd->rrd_tail = ntohll(rrd->rrd_tail);
	rrd->rrd_length = ntohll(rrd->rrd_length);
While not incorrect, I can't recall any other place where ZFS would store things in network order. DDT does it per-record, for example see ddt_zap_decompress(). I don't insist, just propose to think wider.
Also it seems like this will be a single > 4KB ZAP entry. I hope MOS ZAP has a sufficiently big block size to store it efficiently.
	rrd_ntoh(&spa->spa_txg_log_time.dbr_minutes);
	rrd_ntoh(&spa->spa_txg_log_time.dbr_days);
	rrd_ntoh(&spa->spa_txg_log_time.dbr_months);
ZAP has native support for byte swapping if you specify that you store 8-byte integers (same as 2- and 4-byte). But instead you are storing 8 1-byte integers to create yourself a problem. Array of 1-byte integers makes sense for opaque data blobs, but integers should be stored as integers. Actually, since all fields in the RRD value are uint64_t, I think you could store it as array of 8-byte integers and ZAP would do all the byte-swapping for you. It would be much cleaner than your manual byte swapping!
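For illustration, a sketch of what that could look like (in-kernel context: sys/dmu.h, sys/zap.h); the wrapper name is made up, and it assumes every field of rrd_t is a uint64_t so the struct is a whole number of 8-byte integers:

/*
 * Sketch only: store the whole rrd_t as an array of 8-byte integers.
 * With integer_size = 8, the ZAP layer records the element width and
 * byte-swaps the array itself on foreign-endian imports, so no manual
 * ntohll()/htonll() conversion is needed.
 */
static void
rrd_zap_update(objset_t *os, uint64_t obj, const char *name,
    const rrd_t *rrd, dmu_tx_t *tx)
{
	VERIFY0(zap_update(os, obj, name, sizeof (uint64_t),
	    sizeof (*rrd) / sizeof (uint64_t), rrd, tx));
}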
Motivation and Context
This feature enables tracking of when TXGs are committed to disk, providing an estimated timestamp for each TXG.
With this information, it becomes possible to perform scrubs based on specific date ranges, improving the granularity of data management and recovery operations.
Description
To achieve this, we implemented a round-robin database that keeps track of time. We separate the tracking into minutes, days, and months. We believe this provides the best resolution for time management. This feature does not track the exact time of each transaction group (txg) but provides an estimate. The txg database can also be used in other scenarios where mapping dates to transaction groups is required.
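As a sketch of how such a mapping could be consumed (helper and type names are invented for the example, not the patch's interface): once a ring is unrolled into time order, finding the TXG for a date is a scan for the newest sample at or before the requested time.

#include <stddef.h>
#include <stdint.h>

typedef struct {
	uint64_t	ts_time;	/* seconds since epoch, UTC */
	uint64_t	ts_txg;		/* TXG recorded at ts_time */
} txg_sample_t;

/*
 * Illustration only: given samples sorted by time, return the newest
 * TXG recorded at or before `when`, or 0 if `when` predates the log.
 */
static uint64_t
txg_at_or_before(const txg_sample_t *samples, size_t count, uint64_t when)
{
	uint64_t txg = 0;

	for (size_t i = 0; i < count; i++) {
		if (samples[i].ts_time > when)
			break;
		txg = samples[i].ts_txg;
	}
	return (txg);
}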
How Has This Been Tested?
Types of changes
Checklist:
Signed-off-by.