transfer_manager sliced download is slow on cloud run #1093

Closed
gdhananjay opened this issue Jul 25, 2023 · 4 comments


gdhananjay commented Jul 25, 2023

My aim is to achieve a higher download speed for a 1 GB file on Cloud Run (serverless) with 4 CPUs and 8 GB of memory.

The first option is the Python client library, since my application is in Python, but the download never goes beyond 67 MB/s. I tried many combinations of worker counts and chunk sizes with the transfer manager; the stats are attached. My basic doubt: with a worker count of 48, it should at least consume more CPU, but it doesn't seem to be consuming more CPU.

Source code:

from google.cloud.storage import Client, transfer_manager
from datetime import datetime, timezone
import os

storage_client = Client()

bucket = storage_client.bucket('myBucket')
# Chunk sizes in bytes (32 MiB, 50 MiB, ~75 MiB, 100 MiB) and worker counts to sweep.
chunk_list = [33554432, 52428800, 78905344, 104857600]
work_list = [4, 8, 16, 22, 32, 48]
for chunk in chunk_list:
    for worker in work_list:
        blob = bucket.blob('data_sets/my1GbFile')
        print('download started: ', 'worker:', worker, 'chunk_size: ', chunk)
        start_time = datetime.now(timezone.utc)
        # Sliced download: each worker fetches one chunk-sized byte range.
        transfer_manager.download_chunks_concurrently(
            blob, '/tmp/myTmpFile_' + str(worker) + '_' + str(chunk),
            chunk_size=chunk, max_workers=worker)
        delta_time = datetime.now(timezone.utc) - start_time
        execution_time_ms = round(delta_time.total_seconds() * 1000)
        print('download completed: ', worker, chunk, execution_time_ms)
        os.remove('/tmp/myTmpFile_' + str(worker) + '_' + str(chunk))
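
For reference, the MB/s figures quoted above follow from the printed milliseconds. A minimal sketch, assuming the test object is exactly 1 GiB (adjust FILE_SIZE_BYTES to the real size):

# Assumption: the test object is exactly 1 GiB.
FILE_SIZE_BYTES = 1024 ** 3

def throughput_mb_per_s(elapsed_ms):
    """MB/s for a FILE_SIZE_BYTES transfer that took elapsed_ms milliseconds."""
    return (FILE_SIZE_BYTES / 1024 ** 2) / (elapsed_ms / 1000)

print(round(throughput_mb_per_s(15625), 1))  # e.g. 15625 ms -> 65.5 MB/s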

Output stats on Cloud Shell:
output_cloudshell.txt

Output stats on Cloud Run:

gen 2- stats.csv

Is it possible to verify the download speed, and whether all CPU cores are really being used, on Cloud Run? The fact is that this works as expected on Cloud Shell.
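
One way to check per-core utilization around a sliced download is something like the following minimal sketch (it assumes psutil is installed in the image; system-wide sampling also captures any worker subprocesses):

import time
import psutil  # assumption: installed via `pip install psutil`
from google.cloud.storage import Client, transfer_manager

blob = Client().bucket('myBucket').blob('data_sets/my1GbFile')

psutil.cpu_percent(percpu=True)  # prime the counters; the first reading is meaningless
start = time.monotonic()
transfer_manager.download_chunks_concurrently(
    blob, '/tmp/cpu_probe', chunk_size=33554432, max_workers=16)
elapsed = time.monotonic() - start
per_core = psutil.cpu_percent(percpu=True)  # average per-core CPU % since the priming call
print('elapsed s:', round(elapsed, 2))
print('per-core CPU %:', per_core)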

Could you guide me on how to reach Cloud Run support? I already raised this in the forum:
https://www.googlecloudcommunity.com/gc/Serverless/cloud-bucket-blob-download-is-very-slow-in-cloud-run/m-p/614852/highlight/true#M1926

Note: I tried Cloud Run with 8 CPUs and 16 GB of RAM; still no change in the stats.

Attached is my Dockerfile for Cloud Run:

Dockerfile.txt

product-auto-label bot added the api: storage label (Issues related to the googleapis/python-storage API) on Jul 25, 2023
andrewsg (Contributor) commented

Can you verify that your Cloud Run instance can achieve higher download speeds from sources other than Cloud Storage? I'm unclear on what baseline network performance to expect from Cloud Run instances.
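
For example, a baseline test along these lines (a minimal sketch; the URL is a placeholder for any large file hosted outside Cloud Storage, and requests is assumed to be available in the image):

import time
import requests  # assumption: available in the container image

URL = 'https://example.com/large-test-file'  # placeholder; use any large non-GCS file

start = time.monotonic()
bytes_read = 0
with requests.get(URL, stream=True, timeout=300) as resp:
    resp.raise_for_status()
    for part in resp.iter_content(chunk_size=1024 * 1024):  # read in 1 MiB pieces
        bytes_read += len(part)
elapsed = time.monotonic() - start
print('MB downloaded:', round(bytes_read / 1024 ** 2, 1))
print('MB/s:', round(bytes_read / 1024 ** 2 / elapsed, 1))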

gdhananjay (Author) commented

I will try downloading from somewhere other than GCS. I also don't see any specific statement about baseline performance in the docs; I have been searching for the last 4-5 days. Is it possible to check from your side?
This is actually very important, as it could change the direction of my project. If that's the case, it would also be good to document it in all the GCS client libraries.

andrewsg (Contributor) commented

No, I'm afraid I don't have any information on Cloud Run performance. I would recommend performing realistic tests from your actual application that download data from other sources, and seeing whether any of them are substantially faster than your downloads from GCS.

gdhananjay (Author) commented

I tried the same container on GKE, on a local machine, and on Cloud Run. It seems Cloud Run is slow in download/upload speed.
