retry device mapper and cryptsetup errors #1721
Conversation
if err != nil {
-	return "", errors.Wrapf(err, "failed to create dm-verity target. device=%s", devPath)
+	return "", fmt.Errorf("failed to create dm-verity target for device=%s: %w", devPath, err)
Is this specific to container layer devices? If so, would it be possible to just make an error message that says something like "container layer for container [X] could not be mounted in time. Retrying may resolve this issue.". This way the person consuming the error knows a little bit more specifically what happened and how to move forward?
not necessarily, at this point GCS isn't aware whether it's a container layer being mounted or not. The host, however, is. The host side error message will look something like: failed to add LCOW layer: failed to add SCSI layer: failed to modify UVM with new SCSI mount: guest modify: guest RPC failure: failed to create dm-verity target for device=/dev/sda: device-mapper table load: no such device: unknown, so I don't think we need to word this differently.
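A minimal sketch of the wrapping change discussed in this thread (the helper and error value below are illustrative, not the PR's actual code): with fmt.Errorf and %w the original device-mapper error stays in the chain, so the host side can surface the full message while callers can still match the underlying cause with errors.Is.

package main

import (
	"errors"
	"fmt"
)

// errNoSuchDevice stands in for the real device-mapper "no such device" error.
var errNoSuchDevice = errors.New("device-mapper table load: no such device")

// createVerityTarget mimics the wrapping done in the diff above.
func createVerityTarget(devPath string) error {
	err := errNoSuchDevice // pretend target creation failed because /dev/sda arrived late
	return fmt.Errorf("failed to create dm-verity target for device=%s: %w", devPath, err)
}

func main() {
	err := createVerityTarget("/dev/sda")
	fmt.Println(err)                             // wrapped, human-readable message
	fmt.Println(errors.Is(err, errNoSuchDevice)) // true: the cause is still machine-checkable
}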
}
// check retry-able errors
for _, e := range errs {
	if errors.Is(dmErr.Err, e) {
If a non-retriable error is encountered, does it make sense to fail fast?
that's what it does: we loop through the possible retriable errors, and if none of them match (i.e. the error is non-retriable), the loop finishes and the error is returned on L255
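A minimal sketch of the check described in this thread, assuming a dmError type with an Err field as the snippet above suggests; the list of retriable errno values and all names here are illustrative, not the PR's identifiers. Only a match against the retriable list triggers a retry; anything else falls through and is returned unchanged.

package main

import (
	"errors"
	"fmt"

	"golang.org/x/sys/unix"
)

// dmError is a stand-in for the device-mapper error type implied by the snippet above.
type dmError struct {
	Op  string
	Err error
}

func (e *dmError) Error() string { return e.Op + ": " + e.Err.Error() }

// retriableErrors lists errno values that usually mean the device simply isn't ready yet.
var retriableErrors = []error{unix.ENXIO, unix.ENODEV, unix.EBUSY}

// shouldRetry reports whether the wrapped error matches one of the retriable errors.
func shouldRetry(dmErr *dmError) bool {
	for _, e := range retriableErrors {
		if errors.Is(dmErr.Err, e) {
			return true
		}
	}
	return false // non-retriable: the caller returns the error without retrying
}

func main() {
	fmt.Println(shouldRetry(&dmError{Op: "table load", Err: unix.ENXIO}))  // true
	fmt.Println(shouldRetry(&dmError{Op: "table load", Err: unix.EINVAL})) // false
}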
2e176ac to 1d8d651
1d8d651 to 9a4a019
9a4a019 to ee8f293
Occasionally /dev/sd* devices arrive late and are not available at the time when verity or dm-crypt targets are created. This commit introduces a `CreateDevice` wrapper which can retry the operation on specific errors and always retries cryptsetup once, but with a large retry timeout. Signed-off-by: Maksim An <[email protected]>
ee8f293 to 89e4738
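A hedged sketch of the kind of wrapper the commit message above describes; the function name, retry interval, timeout value, and retriable error list below are assumptions, not the PR's actual API. The creation callback is retried only while the error looks like the device node has not appeared yet, and it gives up once the deadline passes.

package main

import (
	"errors"
	"fmt"
	"time"

	"golang.org/x/sys/unix"
)

// retriableErrors lists errors that typically mean a /dev/sd* node arrived late.
var retriableErrors = []error{unix.ENXIO, unix.ENODEV}

// createDeviceWithRetry retries the create callback on retriable errors until the timeout expires.
func createDeviceWithRetry(timeout time.Duration, create func() error) error {
	deadline := time.Now().Add(timeout)
	for {
		err := create()
		if err == nil {
			return nil
		}
		retriable := false
		for _, e := range retriableErrors {
			if errors.Is(err, e) {
				retriable = true
				break
			}
		}
		if !retriable || time.Now().After(deadline) {
			return err
		}
		time.Sleep(100 * time.Millisecond) // give udev a moment to create the node
	}
}

func main() {
	attempts := 0
	err := createDeviceWithRetry(5*time.Second, func() error {
		attempts++
		if attempts < 3 {
			return fmt.Errorf("table load: %w", unix.ENXIO) // device not there yet
		}
		return nil // device finally showed up
	})
	fmt.Println(err, attempts) // <nil> 3
}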