Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Long time delay associated with head requests to Amazon S3 #2461

Open
bartosz-michalak opened this issue Mar 23, 2020 · 9 comments · May be fixed by #2669
Open

Long time delay associated with head requests to Amazon S3 #2461

bartosz-michalak opened this issue Mar 23, 2020 · 9 comments · May be fixed by #2669

Comments

@bartosz-michalak
Copy link

bartosz-michalak commented Mar 23, 2020

Hello. We use carrierwave to handle attachments stored by Amazon S3. Files in bucket are private. Our API sometimes has to return data about many attachments (100 for instance). In this endpoints we're observing a big delay in response (the more attachments, the longer response time). We've used New Relic to investigate what affects this time, and turns out that HEAD requests are performing one by one (not simultaneously):

image

Moreover, this causes high CPU utilization (sometimes over 50%) - the more S3 requests, the more CPU utilization.
Has anyone encountered a similar problem?

@bartosz-michalak bartosz-michalak changed the title Head requests to Amazon S3 Big delay related to head requests to Amazon S3 Mar 23, 2020
@bartosz-michalak bartosz-michalak changed the title Big delay related to head requests to Amazon S3 Long time delay associated with head requests to Amazon S3 Mar 23, 2020
@alexshk
Copy link

alexshk commented May 26, 2020

Hi. Yes, I have similar issue with carrierwave and fog-aws.

@alexshk
Copy link

alexshk commented May 26, 2020

Solved issue with replacing fog-was with carrierwave-aws gem.

@felipefava
Copy link

Had the same issue. I used carrierwave-aws as @alexshk mentioned, and is faster now. Thanks

@marlonjf
Copy link

@alexshk @felipefava do you still have the HEAD requests with carrierwave-aws?

For me it's too slow with fog-aws or carrierwave-aws, both have the requests

@mshibuya
Copy link
Member

Anyone still having the issue?
I looked at the code but both fog-aws and carrierwave-aws are implemented to avoid head requests. So my guess is something in the user code is triggering the head request...
If you can make the head request intentionally fail and get the stack trace, it will greatly help tracking down what triggered the request.

@rajyan rajyan linked a pull request Jul 4, 2023 that will close this issue
@rajyan
Copy link
Contributor

rajyan commented Jul 4, 2023

@mshibuya

Hi, we had a similar problem and looked into this problem deeper with stack profiling.
What we found was, conditional processing is always evaluated even we are not using the specific conditional version. For condition that needs the file content, it caused unexpected requests like this issue.

This is the minimum reproducible example
using the examples from
https://github.com/carrierwaveuploader/carrierwave#getting-started
https://github.com/carrierwaveuploader/carrierwave/wiki/How-to:-Do-conditional-processing

class AvatarUploader < CarrierWave::Uploader::Base
  version :jpg_version, if: :webp? do
    process convert: :jpg
  end

  protected

  def webp?(new_file)
    new_file.content_type == 'image/webp'
  end
end

class User < ApplicationRecord
  mount_uploader :avatar, AvatarUploader
end

# This evaluates the condition of :jpg_version even :jpg_version is not used
# which requests to S3 to get the content_type if storage is remote
User.first.avatar.url

I think that the condition should not be evaluated until the actual version is called on retrieving, but should always be called on storing, ended up with this fix #2669

@mshibuya
Copy link
Member

Thank you for the report. It's very helpful to understand the actual use-case that causes this issue.

But before proceeding I need to confirm one thing. Why do you need to check if the file is webp or not? Isn't it simpler to convert every image file into jpg, regardless of the source format?
You know, checking content type is costly as it will invoke the head request. I'm wondering if doing so is worth enough.

@rajyan
Copy link
Contributor

rajyan commented Jul 15, 2023

Thank you for looking into this issue.
For our very specific use case, we needed to check the content type because

  1. The API we are referring mostly returns jpg images
  2. It sometimes returns webp content with .jpg extension...

We were using conditional processing, because we don't need to convert the image in most cases, and we thought that it would be much more cost efficient (don't need to process nor uploading duplicated jpg files).

The example in above might be a specific usage, but I think that lot of people are doing similar conditional processings, for example, this wiki example needs remote request too https://github.com/carrierwaveuploader/carrierwave/wiki/How-to:-Do-conditional-processing.

Also, we, and I guess most users that are reporting similar issues too, were not aware that the all versions' conditions are checked every time we called the mounted column (#2132). Much more, I believe it is difficult to assume that the HEAD request is called even when we are not using the actual version.

@rajyan
Copy link
Contributor

rajyan commented Aug 2, 2023

@mshibuya
Do you have any thoughts about the above comment?
I still think there should be a way to avoid HEAD requests, especially when only reading and not accessing the version.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants