Are the genericMode (write & verify) tests data destructive? #171

Open
stevecs opened this issue Nov 26, 2024 · 4 comments

Comments

@stevecs

stevecs commented Nov 26, 2024

The documentation needs more detail about which commands can destroy user data. I understand that openSeaChest_GenericTests --genericMode=read does not appear to be data destructive, but only the read option is documented. The write and verify options have no information about how they operate.

Basically, I was looking for the drive to read the data in a sector, write the same data /back/ to the same sector, and verify that the data is correct. That should NOT be data destructive, assuming no errors and that OSC is the only process accessing the drive/LBA in question. The goal is to locate and remap weak sectors (with --repairOnFly or at the end); I know that part would lose data when a sector is remapped if its data could not be read correctly.

But this is not mentioned anywhere. Can anyone elaborate?

@vonericsen
Contributor

Hi @stevecs,

I will expand the details a bit in the help.

For the --genericMode option it switches between read commands, write commands (using pattern of all zeroes), and the verify command during the test.
Write is the only one that is data destructive (and requires an extra option to confirm this before it runs).

The difference between read and verify is this:

  • read = issues a read command to bring data from the drive to the system. May read from cache depending on state of read-look-ahead and write-cache on the drive.
  • verify = issues the verify command (or read-verify for ATA drives) which forces the drive to read this location on the medium and return a pass/fail if it was able to be read or not. No data is transferred to the host system. If the most recent version of the data for this sector is still in write-cache, this forces the drive to first write it to the drive, then read it to verify it wrote to the medium properly and without error. One note was we implemented this as an option to assist with slower USB attached drives to be able to test more without being limited to the interface transfer speed, but it can be used on any interface.

If that is a better explanation of those options, or you can provide some other feedback about this, I will make sure that is taken into account when updating the help output.

> Basically, I was looking for the drive to read the data in a sector, write the same data /back/ to the same sector, and verify that the data is correct. That should NOT be data destructive, assuming no errors and that OSC is the only process accessing the drive/LBA in question. The goal is to locate and remap weak sectors (with --repairOnFly or at the end); I know that part would lose data when a sector is remapped if its data could not be read correctly.

There is not currently an option in openSeaChest to do this, but I do have a very similar write-read-compare test in my internal issue tracker to implement. I can look into a version that would preserve existing data which I think would meet your needs.
Are you trying to do this for individual sectors? Ranges? a full drive? Just curious as I think I can reasonably make this part of how I implement the write-read-compare.
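
As a rough illustration of the data-preserving sequence being discussed (read a sector, write the same bytes back, read again, compare), here is a minimal Python sketch. It is not an openSeaChest feature or its implementation. The function name `rewrite_and_compare`, the 512-byte sector size, and the in-memory stand-in for a drive are all assumptions made for the example. Ordinary OS file I/O like this does not bypass OS or drive caches, so the sketch shows only the logic, not a real media test.

```python
import io

SECTOR_SIZE = 512  # assumed logical sector size; many drives use 4096


def rewrite_and_compare(dev, lba, count=1, sector_size=SECTOR_SIZE):
    """Read sectors, write the same bytes back, re-read, and compare.

    `dev` is any seekable binary file-like object opened for read/write.
    Returns True when the data read back matches what was originally read.
    """
    offset = lba * sector_size
    length = count * sector_size

    dev.seek(offset)
    original = dev.read(length)
    if len(original) != length:
        raise IOError(f"short read at LBA {lba}")

    dev.seek(offset)
    dev.write(original)  # same data back to the same location
    dev.flush()

    dev.seek(offset)
    return dev.read(length) == original


if __name__ == "__main__":
    # In-memory stand-in for a drive: 8 sectors of patterned data.
    fake_drive = io.BytesIO(bytes(range(256)) * 2 * 8)
    ok = all(rewrite_and_compare(fake_drive, lba) for lba in range(8))
    print("all sectors matched:", ok)
```

The sequence only preserves data if nothing else writes to the same LBAs between the read and the write-back, which is the "only process accessing the drive" condition from the original question.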

If you are using a SATA drive, there is a feature called "Write-Read-Verify" that can be enabled on the drive as well. What this feature does is it instructs the drive to write the data, read it back, and compare it to the original from the host for every sector it writes. This is all done in drive firmware though so it's not anything more than a feature to enable from openSeaChest, but it is in openSeaChest_Configure. There are a few more details and specific modes to how this works, but it is something I know data centers want on their drives for data-integrity purposes. I can expand on this information if you would like more detail, just let me know!

@stevecs
Author

stevecs commented Nov 27, 2024

@vonericsen Thank you for the response/write-up. That helps clear up some of the confusion about what each command does. Your description of the verify command (it runs internal to the drive) makes it very useful for me, as I test batches of drives (50-100 or more at a time), so interface speeds become an issue even when spread across several HBAs. With average internal drive speeds of 100-280 MB/s (inner/outer) going through, say, a 12 Gbps (48 Gbps) connection, the transport becomes the bottleneck.
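
To put rough numbers on that bottleneck, here is a back-of-the-envelope sketch in Python. It assumes a single shared 48 Gbps link (the thread mentions several HBAs, so this is a simplification), uses decimal units, and ignores encoding and protocol overhead, so the link figure is an upper bound. The helper names are made up for the example.

```python
def aggregate_drive_mb_s(drive_count, per_drive_mb_s):
    """Total sequential throughput the drives could deliver, in MB/s."""
    return drive_count * per_drive_mb_s


def link_mb_s(gbps):
    """Raw line rate converted to MB/s (decimal units, no overhead)."""
    return gbps * 1000 / 8


if __name__ == "__main__":
    link = link_mb_s(48)  # assumed single shared 48 Gbps connection
    for drives in (50, 100):
        for speed in (100, 280):  # inner/outer track speeds from the comment
            demand = aggregate_drive_mb_s(drives, speed)
            print(
                f"{drives} drives at {speed} MB/s: "
                f"{demand} MB/s needed vs {link:.0f} MB/s link "
                f"({demand / link:.2f}x)"
            )
```

A read test has to move every byte across that link, while a verify test returns only pass/fail per command, which is why verify scales better across large batches.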

Adding such information to the man page/help output would be very useful for 'future me' and anyone else that may have similar questions.

As for your test programs/scenarios: both concepts are of interest. A write; read; compare test on a 'blank' drive, or a drive where data preservation is not needed, sounds perfect for when I get a pallet of 'new' drives or want to run through sector tests to find (and ideally remap) any bad sectors.

On drives that are in situ and some for long periods having an option that is NOT data destructive by default (i.e. read; write; compare) so basically keeping the same information in each sector, will allow testing of 'weak' sectors and move them out. In this case normally the drives are part of a higher-level data integrity system (i.e. ZFS; BTRFS; or similar). So IF a sector gets remapped incorrectly or data written back incorrectly can be 'healed' by the upper level process (assuming that errors across drives statistically not going to be in the same file for example).

In both cases I would probably run this across the entire drive as a unit. LBA ranges are useful if, say, I run a full-drive test and then find that a bunch of errors are spatially grouped. But in that case I would probably pull the drive anyway if there are enough errors to reduce the number of spares per track (on rotational media).

I was not aware of the 'Write Read Verify' on SATA drives. I've seen that or similar on SAS & SCSI devices, never really looked on SATA. Would be interested what that feature is and do some testing. Would increase latency (assuming there is a single head on the armature; so two rotations of the media would be required) but in large near-line storage where latency is not as critical that does sound useful. I come from a datacenter environment and have run into storage errors a lot over the last 40+ years (split/wild writes & reads; latent corruption; weak sectors; etc) so most of my 'home' environments follow closely to the knowledge I've picked up from larger scales. Would be interested in any more detail you can share on that.

@vonericsen
Contributor

@stevecs,

> Adding such information to the man page/help output would be very useful for 'future me' and anyone else that may have similar questions.

I will add this extra detail to the help and man pages info. I've been working on adding a Wiki for openSeaChest so I'll see if I can find a place for this kind of info on there as well.

> On drives that are in situ and some for long periods having an option that is NOT data destructive by default (i.e. read; write; compare) so basically keeping the same information in each sector, will allow testing of 'weak' sectors and move them out. In this case normally the drives are part of a higher-level data integrity system (i.e. ZFS; BTRFS; or similar). So IF a sector gets remapped incorrectly or data written back incorrectly can be 'healed' by the upper level process (assuming that errors across drives statistically not going to be in the same file for example).

Sounds good. One idea (and please let me know from your side if this is a good one or not) to help preserve data would be for openSeaChest to first unmount the drive partitions before it begins. We have code to do this already at erase time (helps prevent "ghost" files cached in RAM among other strange behaviors we've run into). I don't think we currently have code to remount, but I don't think that would be an issue to add after it runs this operation.
I know ZFS is a bit different and our unmount code does not support zfs currently (will not be unmounted and does not cause an error), but I don't think that would be an issue in that kind of situation, but I am curious from your experience if this would make sense or not. I don't have any personal experience with BTRFS so I'm not sure how that would work, but anything you think might be helpful for me to know would also be good to share.

> I was not aware of the 'Write Read Verify' on SATA drives. I've seen that or similar on SAS & SCSI devices, never really looked on SATA. Would be interested what that feature is and do some testing. Would increase latency (assuming there is a single head on the armature; so two rotations of the media would be required) but in large near-line storage where latency is not as critical that does sound useful. I come from a datacenter environment and have run into storage errors a lot over the last 40+ years (split/wild writes & reads; latent corruption; weak sectors; etc) so most of my 'home' environments follow closely to the knowledge I've picked up from larger scales. Would be interested in any more detail you can share on that.

SAS and SCSI have the Write and Verify commands which basically do this on each write that you issue with this specific command and is similar in that regard. FYI SATA drives attached to SAS/SATA HBAs will translate this to a write followed by read-verify to perform the same behavior.

The Write-Read-Verify (WRV) feature on SATA has been around since at least ATA8-ACS in 2008.
The option in openSeaChest_Configure is --wrv, which can dump information about the feature and its current state with --wrv info, or you can configure it with all, vendor, or some number of sectors to do this on.
The SATA feature has 4 different modes; the mode is calculated based on what input you give for the number of sectors (or all).
You can also disable it with this option: --wrv disable
The spec says enabling this may reduce performance and it also says that if write caching is enabled the drive may report completion before it has completed writing to the medium (this is to allow it to mitigate the performance impact of this feature). If write cache is disabled, then it will not complete the command until the write and verification have been completed (most likely 2 revs of the disk unless the drive has some way to otherwise improve performance).
It also notes that if an unrecoverable error is encountered during the write, read, or verification then it sets the device fault bit. This is different than a standard abort/unrecoverable scenario as once a device has hit the fault condition it cannot be cleared until it is power cycled (FYI ACS-6 is considering a behavior change to allow it to be cleared without power cycling the drive to make it easier to recover in data centers). So keep this in mind as well when thinking about this feature.

The spec says this feature's state is volatile, so each time the drive comes back from a power on reset or hardware reset it reverts to default settings (which is most likely this feature being disabled). So you would probably want a script to enable this on system boot. The -i output in openSeaChest should tell you if WRV is supported and whether it is enabled or not. --wrv info will give more specific details about how it is configured.
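
Because the setting does not survive a power cycle, a boot-time script along these lines could re-apply it. This is only a sketch: the device handles are hypothetical, the tool is assumed to be on PATH, and the `-d` device option is not mentioned in this thread and is assumed here as the usual openSeaChest way to select a drive. It defaults to a dry run that only prints the commands.

```python
import subprocess

TOOL = "openSeaChest_Configure"    # assumed to be on PATH
DRIVES = ["/dev/sg2", "/dev/sg3"]  # hypothetical device handles


def wrv_command(device, setting="all"):
    """Build the command line that applies a --wrv setting to one drive."""
    # "-d" is an assumption; "--wrv <setting>" is taken from the comment above.
    return [TOOL, "-d", device, "--wrv", setting]


def enable_wrv(devices, setting="all", dry_run=True):
    """Apply the setting to each drive; return {device: exit code or None}."""
    results = {}
    for dev in devices:
        cmd = wrv_command(dev, setting)
        if dry_run:
            print("would run:", " ".join(cmd))
            results[dev] = None
            continue
        # check=False so one failing drive does not stop the rest.
        results[dev] = subprocess.run(cmd, check=False).returncode
    return results


if __name__ == "__main__":
    enable_wrv(DRIVES)  # pass dry_run=False to actually run the tool
```

Running `--wrv info` afterwards, as described above, would confirm how each drive ended up configured.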

--wrv all sets it to run on all writes to the drive.
--wrv vendor sets it to run on the vendor's uniquely defined number of sectors, from the time it is enabled until it has written that many sectors. You can only reliably get the count for this once it is enabled in this mode, though (it's how the standard is written).
--wrv someNumberHere sets it to do reads after however many sectors is input up to a maximum of 261120 sectors. This will select which mode best matches your input (WRV mode 1 is 65536 sectors, mode 3 is any other value). Once this many sectors have been written and followed by a verification, then the verification will stop being performed on any following writes (feature will still be enabled).
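
The mapping from the --wrv argument to a WRV mode described above can be sketched as follows. This is not openSeaChest's code. The thread only states that mode 1 is 65536 sectors, mode 3 is any other count, and the maximum is 261120. The numbering of "all" as mode 0 and "vendor" as mode 2, the 1024-sector granularity for mode 3, and the round-up behaviour are assumptions based on my reading of the ATA specification, and `select_wrv_mode` is a hypothetical name.

```python
MAX_WRV_SECTORS = 261120  # maximum stated in the comment above (255 * 1024)
MODE1_SECTORS = 65536     # fixed sector count for mode 1, per the comment
MODE3_UNIT = 1024         # assumption: mode 3 counts in 1024-sector units


def select_wrv_mode(value):
    """Return (mode, count_units) for a --wrv style argument.

    count_units is only meaningful for mode 3; it is None otherwise.
    """
    if value == "all":
        return (0, None)  # assumed mode number for "all writes"
    if value == "vendor":
        return (2, None)  # assumed mode number for the vendor-defined count

    sectors = int(value)
    if not 1 <= sectors <= MAX_WRV_SECTORS:
        raise ValueError(f"sector count must be 1..{MAX_WRV_SECTORS}")
    if sectors == MODE1_SECTORS:
        return (1, None)

    # Round up to whole 1024-sector units (assumed rounding direction).
    units = -(-sectors // MODE3_UNIT)
    return (3, units)


if __name__ == "__main__":
    for arg in ("all", "vendor", "65536", "2048", "261120"):
        print(arg, "->", select_wrv_mode(arg))
```

Whatever the tool actually selects can be checked with --wrv info after the setting is applied.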

I'm sure there is history for why there are these different modes, but I do not know what the history would be since this is from before I started at Seagate. Generally when a customer wants this these days, they tend to want the all mode.

@stevecs
Author

stevecs commented Nov 27, 2024

@vonericsen

> Sounds good. One idea (and please let me know from your side if this is a good one or not) to help preserve data would be for openSeaChest to first unmount the drive partitions before it begins. We have code to do this already at erase time (helps prevent "ghost" files cached in RAM among other strange behaviors we've run into). I don't think we currently have code to remount, but I don't think that would be an issue to add after it runs this operation. I know ZFS is a bit different and our unmount code does not support zfs currently (will not be unmounted and does not cause an error), but I don't think that would be an issue in that kind of situation, but I am curious from your experience if this would make sense or not. I don't have any personal experience with BTRFS so I'm not sure how that would work, but anything you think might be helpful for me to know would also be good to share.

This is interesting from a technical point of view, but in practice I don't really see the benefit of unmounting the drive's partitions beforehand, and it could cause some issues. I /think/ this may come from the perspective that a partition on a drive directly relates to a mounted filesystem/partition on the host system. That scenario doesn't really happen much in any of my use cases (i.e. the drive's partitions are 'wrapped' in several other layers, going to a volume manager, then a filesystem (or combined, as in ZFS), and then to the host to be mounted). So even if something like that could be worked out, it can cause much larger issues (i.e. if you're taking out, say, 1 drive in an array of 200, the impact on the resulting pool/virtual device would be much larger). On a home system it is probably not much of an issue (where HD partition = mounted partition), though even then, if the partition in question happens to be, say, the OS partition, that is another issue.

In either case, determining whether the drive(s) are quiesced would still be up to the storage admin. I could see perhaps a warning if you were able to determine that the drive had mounted handles, but not really taking action.

Generally, in operation, what I would normally do is either stop all I/O to the pools at a minimum or, preferably, export/shut down the higher-level wrapped functions, and then run against drives in parallel when testing many of them. If testing a small subset, I would just detach those drives, then re-attach them to the underlying virtual block device and, if remaps occurred, run a scrub; if not, let the next scheduled scrub handle anything (to reduce unneeded I/O to the pool).

> The Write-Read-Verify (WRV) feature on SATA has been around since at least ATA8-ACS in 2008. The option in openSeaChest_Configure is --wrv, which can dump information about the feature and its current state with --wrv info, or you can configure it with all, vendor, or some number of sectors to do this on. The SATA feature has 4 different modes; the mode is calculated based on what input you give for the number of sectors (or all). You can also disable it with this option: --wrv disable. The spec says enabling this may reduce performance and it also says that if write caching is enabled the drive may report completion before it has completed writing to the medium (this is to allow it to mitigate the performance impact of this feature). If write cache is disabled, then it will not complete the command until the write and verification have been completed (most likely 2 revs of the disk unless the drive has some way to otherwise improve performance). It also notes that if an unrecoverable error is encountered during the write, read, or verification then it sets the device fault bit. This is different than a standard abort/unrecoverable scenario as once a device has hit the fault condition it cannot be cleared until it is power cycled (FYI ACS-6 is considering a behavior change to allow it to be cleared without power cycling the drive to make it easier to recover in data centers). So keep this in mind as well when thinking about this feature.

The performance hit from this makes sense (and is what I expected/intuited). I will still do some testing here, just out of curiosity, on some spare drives, though I don't think I will use it in any 'production' sense given the other caveats. I do see drives with write cache disabled to force an assurance that data is stored on stable media, and with cache enabled the calling application may not be able to resend the data (i.e. the app assumes the write is done and successful). So it would just create more 'cleanup' mess.

This aspect (about putting the drive into a state that requires a power cycle) would be a no-go for me or really anyone in a DC type environment. If ACS-6 is considering changing that, great, but then would still be a no-go for a long while until all drives in the deployed fleet can be assured to have the updates.

> The spec says this feature's state is volatile, so each time the drive comes back from a power on reset or hardware reset it reverts to default settings (which is most likely this feature being disabled). So you would probably want a script to enable this on system boot. The -i output in openSeaChest should tell you if WRV is supported and whether it is enabled or not. --wrv info will give more specific details about how it is configured.

Kind of figured this, as it's standard when coming back from a power cycle, and it's why there are already either scripts (set by 'DIY SAN types') or custom firmware images that change the default settings for large SANs/OEM drives.
