missing methods on cads_api_client.processing.Results object when downloading from cds-beta #105

Open
WeatherGod opened this issue Aug 1, 2024 · 11 comments
Labels
bug Something isn't working

Comments

@WeatherGod

What happened?

I have existing code that I'm trying out on cds-beta. The code is based on the example given at https://github.com/ecmwf/cdsapi/blob/master/examples/example-era5-update.py in this repository, which does non-blocking downloads of requests. This works fine when pointed at the legacy CDS URL. However, it fails when pointed at a cds-beta URL because the method update() doesn't exist, and even if it did exist, the next line would fail because the result object doesn't have a reply attribute.

Old URL:

(Pdb) type(r)
<class 'cdsapi.api.Result'>
(Pdb) dir(r)
['__class__', '__del__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__',
'__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', 
'__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 
'_deleted', '_download', '_url', 'check', 'cleanup', 'content_length', 'content_type', 'debug', 'delete', 'download', 'error', 'info', 
'location', 'progress', 'reply', 'retry_max', 'robust', 'session', 'sleep_max', 'timeout', 'toJSON', 'update', 'verify', 'warning']

New URL:

(Pdb) type(r)
<class 'cads_api_client.processing.Results'>
(Pdb) dir(r)
['__annotations__', '__attrs_attrs__', '__attrs_own_setattr__', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', 
'__eq__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', 
'__le__', '__lt__', '__match_args__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', 
'__setattr__', '__setstate__', '__sizeof__', '__slots__', '__str__', '__subclasshook__', '__weakref__', 'download', 
'from_rel_href', 'from_request', 'get_link_href', 'get_links', 'get_result_href', 'get_result_size', 'headers', 'json', 'location', 
'log_messages', 'reason', 'response', 'session', 'status_code']

What are the steps to reproduce the bug?

Run the code in the examples directory against a cds-beta url.

Version

v0.7.0

Platform (OS and architecture)

linux

Relevant log output

No response

Accompanying data

No response

Organisation

No response

@WeatherGod WeatherGod added the bug Something isn't working label Aug 1, 2024
@ferguskeatinge

Has there been any update on this? It really needs to be addressed ASAP.

@WeatherGod
Author

What I have realized is that the new client basically does the waiting for you, sort of. It isn't perfect, but you can put the wait logic under an if isinstance(r, cdsapi.api.Result): check, and then call r.download(filename) after that if-block.

This isn't perfect because one of the things I liked with the original approach was that I could loop over a list of available result objects and check which one was ready to download. Usually, though, they are ready in the order that the requests came in anyway.
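A minimal sketch of that workaround, for illustration only. It swaps the isinstance check for a hasattr (duck-typing) check so it runs without cdsapi installed, and it assumes the state names used in the legacy example ("queued", "running", "completed"). The function name wait_then_download and its parameters are hypothetical, not part of either library.

```python
import time


def wait_then_download(result, filename, poll_interval=30.0, sleep=time.sleep):
    """Download `result` to `filename`, polling first only for legacy-style objects."""
    # Legacy cdsapi results expose update() and a `reply` dict. The cds-beta
    # Results object has neither, so only poll when both are present.
    if hasattr(result, "update") and hasattr(result, "reply"):
        while True:
            result.update()
            state = result.reply["state"]
            if state == "completed":
                break
            if state in ("queued", "running"):
                sleep(poll_interval)
                continue
            # Any other state (for example "failed") is treated as an error.
            raise RuntimeError(f"request ended in state {state!r}")
    # Both result types provide download(), per the dir() listings above.
    return result.download(filename)
```

With the new client the polling branch is skipped entirely and the call goes straight to download().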

@WeatherGod
Author

I'll also warn that the .delete() method is missing in Results. I think the new API auto-deletes the requests now, but I'm not sure.

@ALopAlos

ALopAlos commented Aug 9, 2024

Thanks. This comment has been noticed and included on our CDS-Beta backlog to be addressed.

@siggemannen

I'm also curious what the idea behind the new API is. The downloads aren't instant: for example, I placed a request for a single variable (neutral wind) at a single datetime and a single measure, and it took 13 minutes to become ready.
Is one supposed to hang around waiting for every variable, which in Python is not always easy?

The old version had support for just returning the current status and one could check it from time to time and move on in life.

@ferguskeatinge

> I'm also curious what the idea behind the new API is. The downloads aren't instant: for example, I placed a request for a single variable (neutral wind) at a single datetime and a single measure, and it took 13 minutes to become ready. Is one supposed to hang around waiting for every variable, which in Python is not always easy?
>
> The old version had support for just returning the current status and one could check it from time to time and move on in life.

Switch to cads_api_client

@siggemannen

@ferguskeatinge you mean https://github.com/ecmwf-projects/cads-api-client? Yeah, I was just trying to figure out how to use it directly, but there's no mention of it in the official docs!

@ferguskeatinge

It took me a while to figure out: create the client using your key and the new URL, get the collection, and make a request with collection.submit. From there you can get the request_uid, save it to a .csv, then come back later and use client.get_request(request_uid) to download.
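The "save it to a .csv" step could look roughly like the sketch below. It uses only the standard library. The helper names save_request_uid and load_request_uids and the two-column file layout (a label of your choosing, then the request_uid) are assumptions for illustration, not anything defined by cads_api_client.

```python
import csv
from pathlib import Path


def save_request_uid(csv_path, label, request_uid):
    """Append one (label, request_uid) row to the bookkeeping file."""
    with open(csv_path, "a", newline="") as fh:
        csv.writer(fh).writerow([label, request_uid])


def load_request_uids(csv_path):
    """Return {label: request_uid}; later rows override earlier ones."""
    path = Path(csv_path)
    if not path.exists():
        return {}
    with open(path, newline="") as fh:
        # Skip blank or malformed rows that have fewer than two columns.
        return {row[0]: row[1] for row in csv.reader(fh) if len(row) >= 2}
```

After submitting, you would pass the request_uid from the submitted request to save_request_uid, and in a later session feed each loaded uid to client.get_request(request_uid) as described above.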

@siggemannen

siggemannen commented Aug 23, 2024

Yes. I have made it work. For reference, here's a "simple" version of my function:

"""
dataset is cams-global-atmospheric-composition-forecasts or whatever catalogue name your data has
request_data - the request to send to api, with date, variable, time etc...
request_id - this is the internal file id which is used to map to the API request_uid
reqs - this is a map {"request_id": "request_uid"} mapping which we use to keep track of internal and external IDs
maxReqs - this limits how many outstanding requests we can have
queue_wait - how much to sleep between retries
url / token - these are the login info; if not set, they are resolved the standard cdsapi way (e.g. from the ~/.cdsapirc file)
"""
import time
import traceback

def __download_from_cda_using_beta__(dataset, request_data, request_id, reqs, maxReqs = 100, queue_wait = 1, url = None, token = None):
    try:
        if url is None:
            from cdsapi.api import get_url_key_verify #fetch key the standard way
            url, token, _ = get_url_key_verify(None, None, None)
        import cads_api_client
        #create client
        client = cads_api_client.ApiClient(token, url)
        #this loads the dataset manager
        collection = client.collection(dataset)
        
        r = None
        #Check value from the queue...
        eraRequest = None
        retry = 0
        if request_id in reqs:
            eraRequest = reqs[request_id]
        if eraRequest is None:
            if len(reqs) > maxReqs: #we need a buffer so we don't have too many outstanding requests
                return "enqueued"
            r = collection.submit(**request_data)
            reqs[request_id] = r.request_uid
            retry = 3
        else:
            r = client.get_request(eraRequest).make_remote() #create a remote object that can manage the request ID
            retry = 1
        #Now, we can ask our result to get data...
        
        for cnt in range(retry):
            
            reply = r.status
            #print(reply)
            if reply == 'completed' or reply == 'successful':
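                #NOTE: file_path is not defined in this snippet; it must be set elsewhere (e.g. derived from request_id)
                r.download(file_path)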
                del reqs[request_id] #remove since we're done with it
                #r.delete() #this is done automatically it seems
                return "Ok"
            elif reply == 'running' or reply == "accepted":
                if retry > 1:
                    time.sleep(2) #sleep a bit longer cause it's probably finished soon
            elif reply == 'queued':
                if retry > 1:
                    time.sleep(queue_wait)
            elif reply == 'failed' or reply == 'unavailable':
                del reqs[request_id] #restart request
                #r.delete() #not sure if this ever occurs
                break
            else:
                print(reply)
            #otherwise continue
        return reply
    except Exception as e:
        traceback.print_exc()     

I haven't figured out how to delete requests from the web nor how to disable logging, but maybe it's not possible yet?

@ti-sanjid

> It took me a while to figure out: create the client using your key and the new URL, get the collection, and make a request with collection.submit. From there you can get the request_uid, save it to a .csv, then come back later and use client.get_request(request_uid) to download.

@ferguskeatinge can you elaborate on how you did that? Maybe share the code?

@ti-sanjid

ti-sanjid commented Aug 30, 2024

thanks @siggemannen !!!

here's a simpler version.

import cads_api_client
url='https://cds-beta.climate.copernicus.eu/example' # insert your url here
key='106f4c03-example-419e-b662-03c7f2example7'# insert your key here

client=cads_api_client.ApiClient(key ,url)
request_id='531cc190-b799-example-885c-20b992966881' # insert the request id of your file
r=client.get_request(request_id).make_remote()
reply=r.status
if reply == 'completed' or reply == 'successful':
    r.download('/your/directory/location/ERA5 Data/test.nc') # insert your directory.
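The same check can be applied to several saved request ids in one pass, which gets close to the "loop over the results and see which one is ready" pattern from the legacy client. This is only a sketch: download_ready and fetch_remote are hypothetical names, and the status strings are the ones reported earlier in this thread rather than an official list.

```python
DONE = {"completed", "successful"}
FAILED = {"failed", "unavailable"}


def download_ready(requests, fetch_remote):
    """Download finished requests; return (pending, failed) dicts.

    requests: {request_uid: target_path}
    fetch_remote: callable mapping a request_uid to an object that has a
        `.status` attribute and a `.download(path)` method.
    """
    pending, failed = {}, {}
    for uid, target in requests.items():
        remote = fetch_remote(uid)
        status = remote.status
        if status in DONE:
            remote.download(target)
        elif status in FAILED:
            failed[uid] = target
        else:
            # Anything else (e.g. 'accepted', 'queued', 'running') is retried later.
            pending[uid] = target
    return pending, failed
```

With the client created as above, fetch_remote could be lambda uid: client.get_request(uid).make_remote(). Call download_ready again later with the returned pending dict until it is empty.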
