DownloadTo function very slow #6205

Open
stasasekulic opened this issue Nov 10, 2024 · 3 comments
Labels
customer-reported Issues that are reported by GitHub users external to the Azure organization. pillar-performance The issue is related to performance, one of our core engineering pillars. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that Storage Storage Service (Queues, Blobs, Files)

Comments

stasasekulic commented Nov 10, 2024

Hi!

I've integrated the latest Azure SDK for C++ into my application and noticed that the DownloadTo function is very slow.
Comparing it with the old SDK, I noticed that the old SDK used the Boost library and its download could run asynchronously.
I also noticed that the new SDK has a Concurrency parameter in the transfer options, but changing it did not affect the speed at all.

Is there a way to speed this function up? It is ~5x slower than the old SDK.

Update: When downloading the whole blob at once it is fast, but when it has to be downloaded partially in ~10-100 MB chunks, it is much slower.

Thanks in advance!

@github-actions github-actions bot added customer-reported Issues that are reported by GitHub users external to the Azure organization. needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels Nov 10, 2024
ahsonkhan (Member) commented

What are the versions of the old and new SDKs you are comparing, where you noticed a performance difference? Are you installing azure-storage-blobs-cpp from vcpkg?

Could you share some more detail about what you are observing:

  • rough size/quantity of the blobs
  • what OS/platform you are running on
  • whether you are using a custom HTTP transport or relying on the built-in one that comes by default
  • and sample code snippet showcasing how you constructed the client and SDK method calls you are making?

We don't depend on/use the boost library in our track 2 storage SDKs (the packages shipping out of this repo). Maybe you are referring to the older/track 1 SDKs based on cpprestsdk?
https://github.com/Azure/azure-sdk-for-cpp/blob/b74d9c36be7f1e3b39de4767b2c26e06490a3d1c/sdk/storage/MigrationGuide.md#migration-benefits

https://learn.microsoft.com/en-us/azure/storage/blobs/quickstart-blobs-c-plus-plus?tabs=managed-identity%2Croles-azure-portal

@ahsonkhan ahsonkhan added pillar-performance The issue is related to performance, one of our core engineering pillars. Storage Storage Service (Queues, Blobs, Files) labels Nov 12, 2024
@github-actions github-actions bot removed the needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. label Nov 12, 2024
stasasekulic commented Nov 12, 2024

  • Blob size : ~1-10GB
  • OS : Linux
  • Relying on the built-in one that comes by default
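size_t BlobFile::Read(uint8_t* buf, size_t length)
{
    // Nothing left to read: a range starting at or past the end would be rejected.
    if (is_EOF || current_position >= blob_size) {
        is_EOF = true;
        return 0;
    }

    // Request only the bytes [current_position, current_position + length).
    // Assigning a whole HttpRange avoids calling Value() on an unset Range.
    Azure::Core::Http::HttpRange range;
    range.Offset = static_cast<int64_t>(current_position);
    range.Length = static_cast<int64_t>(length);
    options.Range = range;

    // Each call issues at least one HTTP request for this range.
    auto downloadResponse = m_blob_client.DownloadTo(buf, length, options);

    // ContentRange.Length is the number of bytes the service actually returned.
    auto read_bytes = static_cast<size_t>(downloadResponse.Value.ContentRange.Length.Value());

    if (read_bytes > 0) {
        current_position += read_bytes;
        if (current_position == blob_size) {
            is_EOF = true;
        }
    }
    else {
        is_EOF = true;
    }

    return read_bytes;
}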

In the init I set the Concurrency option to 80 and get the blob size; nothing special.
For auth I use an OAuth ClientSecretCredential, which I create once before the start and reuse afterwards.

I'm aware that you are not using CPPREST or Boost; I'm using the latest Azure SDK for C++.
I build the SDK manually and then use it.

After some investigation, it looks like "the problem" is that I download the blob piece by piece and pass each piece to another layer. I neither download the blob in one go nor download it to a file. Also, the old SDK had an OPEN_READ option that likewise let the blob be downloaded in pieces, but it was faster.
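
To illustrate why reading piece by piece can be slow, here is a rough back-of-the-envelope sketch. It assumes every Read call turns into one ranged HTTP request with a fixed per-request overhead; that cost model and the names RequestCount and EstimateSeconds are hypothetical and not part of the SDK.

```cpp
#include <cstdint>

// Hypothetical helper: number of ranged requests needed to read `blobSize`
// bytes in pieces of `chunkSize` bytes. Precondition: chunkSize > 0.
std::uint64_t RequestCount(std::uint64_t blobSize, std::uint64_t chunkSize)
{
    return (blobSize + chunkSize - 1) / chunkSize; // ceiling division
}

// Hypothetical cost model (an assumption, not a measurement):
// a fixed overhead per request plus the raw transfer time.
double EstimateSeconds(
    std::uint64_t blobSize,
    std::uint64_t chunkSize,
    double perRequestSeconds,
    double bytesPerSecond)
{
    const double overhead
        = static_cast<double>(RequestCount(blobSize, chunkSize)) * perRequestSeconds;
    const double transfer = static_cast<double>(blobSize) / bytesPerSecond;
    return overhead + transfer;
}
```

Under this model, reading a 1 GiB blob in 64 KiB pieces takes 16384 requests, while 4 MiB pieces take 256, so the fixed per-request cost is paid 64 times less often. The actual overhead per request has to be measured in your environment.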

stasasekulic (Author) commented

A solution that seems to work for now is to download at least 4 MB into an internal buffer and then read from that buffer. Once the buffer is empty, I download a new chunk.
Splitting the download into chunks that are too small slowed things down.
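
A minimal sketch of that buffering approach, for anyone hitting the same problem. It is not the code from my application: ReadAheadBuffer and FetchRange are hypothetical names, and FetchRange stands in for whatever performs the ranged download (for example a wrapper around DownloadTo). The 4 MB default comes from the description above.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical callback: writes up to `length` bytes of the blob, starting at
// `offset`, into `dest` and returns the number of bytes written.
using FetchRange
    = std::function<std::size_t(std::uint64_t offset, std::uint8_t* dest, std::size_t length)>;

class ReadAheadBuffer {
public:
    ReadAheadBuffer(FetchRange fetch, std::uint64_t blobSize,
                    std::size_t chunkSize = 4 * 1024 * 1024)
        : m_fetch(std::move(fetch)), m_blobSize(blobSize), m_buffer(chunkSize)
    {
    }

    // Copies up to `length` bytes into `out`; returns 0 once the blob is exhausted.
    std::size_t Read(std::uint8_t* out, std::size_t length)
    {
        std::size_t copied = 0;
        while (copied < length) {
            // Only go to the network when the internal buffer is drained.
            if (m_bufferPos == m_bufferLen && !Refill()) {
                break; // end of blob, or the fetch returned nothing
            }
            const std::size_t n = std::min(length - copied, m_bufferLen - m_bufferPos);
            std::memcpy(out + copied, m_buffer.data() + m_bufferPos, n);
            m_bufferPos += n;
            copied += n;
        }
        return copied;
    }

private:
    // Downloads the next chunk (at most the buffer size) into the internal buffer.
    bool Refill()
    {
        if (m_nextOffset >= m_blobSize) {
            return false;
        }
        const std::uint64_t remaining = m_blobSize - m_nextOffset;
        const std::size_t want = static_cast<std::size_t>(
            std::min<std::uint64_t>(remaining, m_buffer.size()));
        m_bufferLen = m_fetch(m_nextOffset, m_buffer.data(), want);
        m_bufferPos = 0;
        m_nextOffset += m_bufferLen;
        return m_bufferLen > 0;
    }

    FetchRange m_fetch;
    std::uint64_t m_blobSize;
    std::vector<std::uint8_t> m_buffer;
    std::size_t m_bufferPos = 0;     // next unread byte in m_buffer
    std::size_t m_bufferLen = 0;     // valid bytes currently in m_buffer
    std::uint64_t m_nextOffset = 0;  // blob offset of the next chunk to fetch
};
```

With this, the upper layer can keep asking for small pieces while the number of ranged requests depends only on the chunk size, not on how small the individual reads are.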
