Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

for downloading object, be pessimistic and check how many bytes got c… #3790

Closed

Conversation

su47flying
Copy link

i got the same issue with [#2805]. Compact
i find some download object's length was not same with it's real length(on object storage).
this PR will check the copy lenght, if got the length was not equal to the real length, thanos will do retry.

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

Verification

in my env, after we apply this code, it works well.
i can get some retry log for this code.

@@ -194,9 +194,17 @@ func DownloadFile(ctx context.Context, logger log.Logger, bkt BucketReader, src,
}()
defer runutil.CloseWithLogOnErr(logger, f, "download block's output file")

if _, err = io.Copy(f, rc); err != nil {
atts, err := bkt.Attributes(ctx, src)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will double the number of API calls required to download a file. I don't think we should do it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if don't do it. sometimes return value of io.Copy was less than the legnth of on the object storage.
the file was not complete, but was not err be return.
so ,the thanos compact will change state to [halt]. halt meas ....
if do it, thanos compact will retry.
i don't know some graceful approach to fix this issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if don't do it. sometimes return value of io.Copy was less than the legnth of on the object storage.

Looks an issue with a specific backend storage that shouldn't be fixed here. Which backend storage are you using?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

S3

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I confirm I believe this is not the right place to address it, for a couple of reasons:

  1. We shouldn't introduce additional API calls
  2. If the reader returned by S3 Get() can return partial response without erroring out, then the problem is widespread across the whole codebase (any usage of Get()) so fixing it here would just cover a small amount of use cases

Checking the logic to have a better proposal

Copy link
Contributor

@pracucci pracucci Feb 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See: #2805 (comment)

Let's continue the discussion on the issue.

@su47flying su47flying force-pushed the check-download-what-was-copied branch from 8932e32 to 3a19a59 Compare February 12, 2021 10:39
Base automatically changed from master to main February 26, 2021 16:31
@stale
Copy link

stale bot commented Jun 2, 2021

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the stale label Jun 2, 2021
@stale stale bot closed this Jun 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants