Improve speed of DBS3Reader APIs #11098

vkuznet · 2022-04-15T12:11:32Z

Impact of the new feature
The current implementation of the DBS3Reader APIs relies on sequential access to data in DBS. Even though it works fine for heavily populated datasets/blocks, it can be very slow, especially when we need to fetch information from many blocks, e.g. parentage information. This ticket should address the speed of the DBS3Reader APIs by refactoring the codebase to take advantage of concurrency in API calls.

Is your feature request related to a problem? Please describe.
For heavily populated datasets (with lots of blocks) I found that certain APIs, e.g. listDatasetFileDetails, are very slow; see dmwm/dbs2go#5

Describe the solution you'd like
Refactor code to take advantage of concurrent (parallel) execution of APIs

Additional context
See this ticket: dmwm/dbs2go#5

vkuznet · 2022-04-18T11:52:29Z

The approach proposed in PR #11099 can be applied to the following APIs:

listFilesInBlockWithParents
listFileBlockLocation
getParentFilesGivenParentDataset
getParentFilesByLumi
findAndInsertMissingParentage
fixMissingParentageDatasets

All of them make sequential calls to DBS APIs in a for loop. These calls can be parallelized, which will significantly reduce the time spent in a given API. I suggest providing an individual PR for each of the listed APIs.
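To illustrate the idea of replacing a sequential for loop with concurrent calls, here is a minimal sketch using Python's standard `concurrent.futures` thread pool. It is not necessarily the mechanism used in PR #11099. The names `fetch_blocks_sequential`, `fetch_blocks_concurrent`, and `fake_list_files` are hypothetical, and `fake_list_files` only simulates a per-block DBS call with a short sleep.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_blocks_sequential(fetch, blocks):
    """Baseline: one call per block, each waiting for the previous one."""
    return {block: fetch(block) for block in blocks}


def fetch_blocks_concurrent(fetch, blocks, max_workers=10):
    """Issue the per-block calls from a thread pool and key results by block."""
    blocks = list(blocks)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with blocks.
        results = list(pool.map(fetch, blocks))
    return dict(zip(blocks, results))


def fake_list_files(block):
    """Hypothetical stand-in for a per-block DBS call (assumption, not a real API)."""
    time.sleep(0.05)  # simulate network latency
    return [f"{block}/file{i}.root" for i in range(2)]


if __name__ == "__main__":
    blocks = [f"/a/b/c#{i}" for i in range(20)]
    for func in (fetch_blocks_sequential, fetch_blocks_concurrent):
        start = time.perf_counter()
        func(fake_list_files, blocks)
        print(func.__name__, round(time.perf_counter() - start, 2), "s")
```

Threads fit here because the per-block work is I/O bound (waiting on HTTP responses). The pool size is a tuning knob and should respect whatever load the DBS server can sustain.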

vkuznet · 2022-04-19T11:16:55Z

@amaltaro I think this issue should stay open until we merge #11099, or even longer, until I provide other PRs for the remaining APIs

vkuznet · 2022-04-21T18:37:37Z

PR #11099 addresses the first three APIs:

listFilesInBlockWithParents
listFileBlockLocation
getParentFilesGivenParentDataset

getParentFilesByLumi is partially covered by parallel execution, and the last two APIs, findAndInsertMissingParentage and fixMissingParentageDatasets, will require more significant effort to make their code execute concurrently.

amaltaro · 2022-04-25T22:17:06Z

Reopening it because the right PR to fix it is being discussed in #11099

vkuznet added New Feature latency improvement Enhancement DBS scalability labels Apr 15, 2022

vkuznet self-assigned this Apr 15, 2022

vkuznet mentioned this issue Apr 15, 2022

speed up listDatasetFileDetails API #11099

Merged

vkuznet mentioned this issue Apr 18, 2022

move ckey/cert functions to Utils.CertTools #11101

Merged

amaltaro closed this as completed in #11101 Apr 19, 2022

amaltaro reopened this Apr 25, 2022

amaltaro closed this as completed in #11099 May 7, 2022
