Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

retry device mapper and cryptsetup errors #1721

Merged
merged 1 commit into from
Aug 7, 2023

Conversation

anmaxvl
Copy link
Contributor

@anmaxvl anmaxvl commented Apr 7, 2023

Occasionally /dev/sd* devices arrive late and not available at the time when verity or dm-crypt targets are created. This commit introduces a CreateDevice wrapper which can retry the operation on specific errors and always retries cryptsetup once, but with a large retry timeout.

if err != nil {
return "", errors.Wrapf(err, "failed to create dm-verity target. device=%s", devPath)
return "", fmt.Errorf("frailed to create dm-verity target for device=%s: %w", devPath, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this specific to container layer devices? If so, would it be possible to just make an error message that says something like "container layer for container [X] could not be mounted in time. Retrying may resolve this issue.". This way the person consuming the error knows a little bit more specifically what happened and how to move forward?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not necessarily, at this point GCS isn't aware if it's a container layer being mounted or not. The host, however, is. The host side error message will have something like: failed to add LCOW layer: failed to add SCSI layer: failed to modify UVM with new SCSI mount: guest modify: guest RPC failure: frailed to create dm-verity target for device=/dev/sda: device-mapper table load: no such device: unknown, so I don't we need to word this differently.

}
// check retry-able errors
for _, e := range errs {
if errors.Is(dmErr.Err, e) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a non-retriable error is encountered, does it make sense to fail fast?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's what it does, we loop through possible errors and in case we hit a non-retriable error, we finish the loop and return error on L255

@anmaxvl anmaxvl force-pushed the retry-dev-mapper-create-device branch 4 times, most recently from 2e176ac to 1d8d651 Compare April 10, 2023 23:51
@anmaxvl anmaxvl marked this pull request as ready for review April 11, 2023 04:09
@anmaxvl anmaxvl requested a review from a team as a code owner April 11, 2023 04:09
@anmaxvl anmaxvl force-pushed the retry-dev-mapper-create-device branch from 1d8d651 to 9a4a019 Compare May 5, 2023 01:15
@anmaxvl anmaxvl force-pushed the retry-dev-mapper-create-device branch from 9a4a019 to ee8f293 Compare June 6, 2023 00:45
Occasionally /dev/sd* devices arrive late and not available at the
time when verity or dm-crypt targets are created. This commit
introduces a `CreateDevice` wrapper which can retry the operation
on specific errors and always retries cryptsetup once, but with
a large retry timeout.

Signed-off-by: Maksim An <[email protected]>
@anmaxvl anmaxvl force-pushed the retry-dev-mapper-create-device branch from ee8f293 to 89e4738 Compare August 7, 2023 17:05
@anmaxvl anmaxvl merged commit 0423eec into microsoft:main Aug 7, 2023
@anmaxvl anmaxvl deleted the retry-dev-mapper-create-device branch August 7, 2023 17:44
princepereira pushed a commit to princepereira/hcsshim that referenced this pull request Aug 29, 2024
Occasionally /dev/sd* devices arrive late and not available at the
time when verity or dm-crypt targets are created. This commit
introduces a `CreateDevice` wrapper which can retry the operation
on specific errors and always retries cryptsetup once, but with
a large retry timeout.

Signed-off-by: Maksim An <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants