Impact of the new feature
The current implementation of the DBS3Reader API relies on sequential access to data in DBS. Even though it works fine for heavily populated datasets/blocks, it can be very slow, especially when we need to fetch information from many blocks, e.g. parentage information. This ticket should address the speed of the DBS3Reader APIs by refactoring the codebase to take advantage of concurrency in API calls.
Is your feature request related to a problem? Please describe.
For heavily populated datasets (with lots of blocks) I found that certain APIs, e.g. listDatasetFileDetails, are very slow, see dmwm/dbs2go#5
Describe the solution you'd like
Refactor the code to take advantage of concurrent (parallel) execution of API calls.
Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.
The proposed approach in PR#11099 can be applied to the following APIs:
listFilesInBlockWithParents
listFileBlockLocation
getParentFilesGivenParentDataset
getParentFilesByLumi
findAndInsertMissingParentage
fixMissingParentageDatasets
All of them make sequential calls to DBS APIs in a for loop. These calls can be parallelized, which should significantly reduce the time spent in a given API. I suggest providing an individual PR for each of the listed APIs.
getParentFilesByLumi is already partially covered by parallel execution, and the last two APIs, findAndInsertMissingParentage and fixMissingParentageDatasets, will require more significant effort to make their code execute concurrently.
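To illustrate the general idea of replacing a per-block for loop with concurrent calls, here is a minimal sketch using Python's standard `concurrent.futures` thread pool. It is not necessarily the mechanism used in PR#11099; the helper names (`fetch_sequential`, `fetch_concurrent`, `fake_dbs_call`), the block names, and the 50 ms delay are all assumptions made up for this example, and `fake_dbs_call` merely stands in for a real DBS request.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_sequential(blocks, fetch_one):
    """Baseline: one call per block, executed one after another."""
    return {block: fetch_one(block) for block in blocks}


def fetch_concurrent(blocks, fetch_one, max_workers=10):
    """Same result as fetch_sequential, but the calls overlap in a thread pool."""
    blocks = list(blocks)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each block with its result
        return dict(zip(blocks, pool.map(fetch_one, blocks)))


def fake_dbs_call(block):
    """Hypothetical stand-in for a DBS request; NOT a real DBS API."""
    time.sleep(0.05)  # simulated network latency (assumed value)
    return ["%s-file-%d" % (block, i) for i in range(2)]


if __name__ == "__main__":
    blocks = ["/a/b/c#%d" % i for i in range(20)]  # made-up block names

    start = time.time()
    seq = fetch_sequential(blocks, fake_dbs_call)
    print("sequential: %.2fs" % (time.time() - start))

    start = time.time()
    par = fetch_concurrent(blocks, fake_dbs_call)
    print("concurrent: %.2fs" % (time.time() - start))

    assert seq == par
```

Because each call spends most of its time waiting on I/O, threads are sufficient here despite the GIL. The actual speed-up depends on how many concurrent requests the DBS server can handle, so `max_workers` would need tuning against the real service.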
Additional context
See this ticket: dmwm/dbs2go#5