API discussion: keeping duplicated methods `xyz`/`xyz_async` vs having an async kwarg? #2598

bsipocz · 2022-11-23T22:25:34Z

While I was reviewing #2597, I run into this quasi-dilemma, but I raise it as a separate question for @keflavich and @andamian, as well as @ceb8.

We have a lot of historical baggage around our API, and I would like to revisit some of it to see whether they still make sense, or we should slowly deprecate them out. At some point we need to switch versioning anyway, so one big breath and breaking API doesn't feel off the table.

So, the question for async and sync jobs: we traditionally have method_async and method, auto-generated with async_to_sync. But some modules don't follow this, and some modules manually have duplicated methods. So I wonder, now as we move on towards using proper VO tools, and pyvo, whether we want to revisit our API preferences, and would rather go with this kwarg-driven approach? Or the method duplication is better? (And frankly, it's baked into the API too much already that if it's the same, we could just stick with it).

I very much interested in the take from Adrian, whether it made any sense to you while redoing e.g. alma, or it was something to go along with?

The text was updated successfully, but these errors were encountered:

andamian · 2022-11-24T03:58:48Z

alma has only sync access but cadc has both. The magic of async_to_sync is more trouble than it's worth it. I just realized some mistakes I've made in cadc now that I'm specifically looking at this.

The sync case is simple so I'm not going to discuss it. For the async one, the main difference is that the control returns to the client while the request is being processed but if followed back (and successful) it can return same/similar result. To me, this looks very similar to the stream attribute in requests functions: by default query_xyz should return the results (whether behind the scene is sync or async) but if the user wants more control and service supports async requests, it could return from the method before the request is completed and let user take over (poll the service and access results when done).

With async requests, there's also the question of the request ID. In IVOA, services return a job with an ID. With that job ID, the user can check the status of the job and access the results when job is completed. astroquery might need to recommend a function that takes a job ID and returns job details data associated with it that can be executed any time. In cadc, we expose the pyvo.Job class which works with VO services but probably not with others.

To summarize, if backwards compatibility is not an issue I would get rid of async_to_sync and add the optional async parameter to methods that use async services. Mind you, my experience is mainly with VO services (which BTW could have a default implementation to achieve that.) and I don't know if it works with others.

keflavich · 2022-11-24T08:19:10Z

Using a kwarg instead of the async-to-sync magic makes sense to me, so if we want to start overhauling services, we can do that.

My vague sense is that the _async methods have not been widely used outside of development and debugging.

We should do a little research on this.

https://github.com/search?q=query_region_async&type=Code turns up almost all hits to astroquery tests in various forks.

query_region is harder to search for, but
https://github.com/search?q=simbad.query_region&type=Code
turns up several user use-cases, while https://github.com/search?p=2&q=simbad.query_region_async&type=Code is just astroquery forks.

So if we want to make a change, let's spell out the proposed API here and discuss further. I'm hearing: remove _async methods, and instead add async=True/False. For most services, the return will still be a requests response object. Is that correct?

andamian · 2022-11-25T23:43:10Z

Yes. Missing async or async=False will still return a request.Response. The question is what async=True should return?

For VO services, as I've said, it could return a UWS job. The pattern is well documented.

Do we have any idea/examples of async non-VO services? I just assume that they return a URL where the data is being staged. The URL is inaccessible until data becomes available. At least this is the pattern that ALMA used to have if I remember correctly. If that's the case, the simplest is to assume that this is what the return should be - a response with a redirect URL that the client needs to check for when the results become available. We'll need to figure out if that's possible to achieve with VO services too. With these services we might want to return more details about the jobs for clients that could make use of that info.

keflavich · 2022-11-26T08:44:52Z

No, if we're talking about a unified method, async=False returns the parsed table, not a response object.

The original idea for async was to enable asynchronous/parallel downloads from normal HTTP web services, not from VO. We only later encountered services that actually had separate query & staging processes, and those were not handled through the async mechanism. besancon is one example, iirc.

ceb8 · 2022-11-28T10:24:30Z

@keflavich My experience (primarily with MAST, but I've been thinking about the construction while working on the new NEOCC service too) matches with yours in that the async versions have been primarily useful for debugging for modules that don't have a truly asynchronous option.

bsipocz · 2023-09-27T21:01:17Z

@andamian:

Yet another occasion when I'm running into the async vs sync question:

at least in the TAP cases, shouldn't we default to run an async search instead, and maybe expose sync to the users? Currently we default to pyvo's default, which looks to be sync.

andamian · 2023-09-27T23:32:04Z

Not an easy choice, especially with TAP. sync operations are those that return fast (2min is probably a good rule of thumb). More than that and the connection might time out. The async mode with that in mind.

For TAP, the duration of the query is not deterministic, so async might be a safer bet. Besides increased the increased complexity of dealing with jobs, there are other drawbacks: the response time (results are only ready when the job completes and data is staged) and possible restrictions on the size of the staged data.

The client can also implement a hybrid mode, where the API is sync, but underneath uses the service async endpoint to start jobs and periodically check their status until they complete. In this mode they do not have to deal with the jobs.

Does this help at all?

bsipocz · 2023-09-27T23:47:52Z

OK, gotcha, this helps a lot, thank you.
So, based on the archive preference, we may have different defaults whether we default to sync or async (e.g. we may want to do async for irsa). However, I think there is a fundamental agreement that we should move away from duplicating methods (let alone when they are not even real async ones), in favor of a kwarg.

bsipocz added the needs-discussion label Nov 23, 2022

bsipocz mentioned this issue Nov 23, 2022

eHST queries and data using TAP and minor fixes in ESA modules #2597

Merged

weaverba137 mentioned this issue Feb 10, 2023

async_to_sync should find and replace sections anywhere in the docstring #2664

Open

bsipocz mentioned this issue Apr 27, 2023

DOC: make signatures more explicit in MAST docstrings #2665

Open

This was referenced Sep 6, 2023

ENH: ipac.irsa backend refactoring to use TAP rather than Gator #2823

Merged

WIP - SIMBAD refactor: remove query_X #2245

Closed

bsipocz mentioned this issue Jun 1, 2024

WIP: NRAO Archive Query replacement #3015

Draft

3 tasks

bsipocz mentioned this issue Oct 18, 2024

Refactor astroquery.heasarc to use VO protocols #2997

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

API discussion: keeping duplicated methods `xyz`/`xyz_async` vs having an async kwarg? #2598

API discussion: keeping duplicated methods `xyz`/`xyz_async` vs having an async kwarg? #2598

bsipocz commented Nov 23, 2022

andamian commented Nov 24, 2022

keflavich commented Nov 24, 2022

andamian commented Nov 25, 2022 •

edited

Loading

keflavich commented Nov 26, 2022

ceb8 commented Nov 28, 2022

bsipocz commented Sep 27, 2023

andamian commented Sep 27, 2023

bsipocz commented Sep 27, 2023

API discussion: keeping duplicated methods xyz/xyz_async vs having an async kwarg? #2598

API discussion: keeping duplicated methods xyz/xyz_async vs having an async kwarg? #2598

Comments

bsipocz commented Nov 23, 2022

andamian commented Nov 24, 2022

keflavich commented Nov 24, 2022

andamian commented Nov 25, 2022 • edited Loading

keflavich commented Nov 26, 2022

ceb8 commented Nov 28, 2022

bsipocz commented Sep 27, 2023

andamian commented Sep 27, 2023

bsipocz commented Sep 27, 2023

API discussion: keeping duplicated methods `xyz`/`xyz_async` vs having an async kwarg? #2598

API discussion: keeping duplicated methods `xyz`/`xyz_async` vs having an async kwarg? #2598

andamian commented Nov 25, 2022 •

edited

Loading