speed up listDatasetFileDetails API #11099

vkuznet · 2022-04-15T12:20:42Z

Status

ready

Description

Use concurrent execution to speed up the listDatasetFileDetails API. In my local setup I achieve a speed-up by a factor of 5 or more for a specific dataset. Here are benchmark numbers for the /VBF1Parked/Run2012D-v1/RAW dataset, which contains 594 blocks. The current codebase takes approximately 190 seconds to fetch its parents and file lumis. With the proposed solution I get the following numbers:

37 seconds using 50 concurrent tasks
41 seconds using 100 concurrent tasks
90 seconds using 10 concurrent tasks

Internally, I used the Python requests library instead of pycurl_manager.py. The latter is not suitable for concurrent execution because it is not thread safe: it holds a global curl object. As a result, curl options are set by the first task and cannot be changed by other tasks (since the code sets them up) until the first task completes. The requests library depends neither on a global curl object nor on the curl library itself, so it is suitable for concurrent execution of multiple URL calls.
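
A minimal sketch of the fan-out pattern described above, assuming a thread pool from Python's concurrent.futures. The names fetch_all and fake_fetch are hypothetical and are not taken from this PR; fake_fetch stands in for a real HTTP call (for example one made with requests) so that the sketch runs without network access.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_all(urls, fetch, max_workers=50):
    """Call fetch(url) for every URL concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the order of `urls`, not completion order
        return list(pool.map(fetch, urls))


def fake_fetch(url):
    """Hypothetical stand-in for an HTTP call to an upstream server."""
    time.sleep(0.05)  # simulate network latency
    return {"url": url, "status": 200}


if __name__ == "__main__":
    # Placeholder URLs, not real DBS endpoints
    urls = ["https://example.invalid/block/%d" % i for i in range(20)]
    start = time.time()
    results = fetch_all(urls, fake_fetch, max_workers=10)
    print(len(results), "responses in %.2f s" % (time.time() - start))
```

Each call receives its own arguments and shares no global client object, which is the property the description says pycurl_manager.py lacks. The max_workers value plays the role of the "concurrent tasks" number in the benchmark above.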

This PR only addresses the speed-up of a single DBS3Reader API, listDatasetFileDetails, but other APIs that make multiple (sequential) calls to DBS can benefit from this approach.

Is it backward compatible (if not, which system it affects?)

YES

Related PRs

dmwm/dbs2go#5

External dependencies / deployment changes

relies on Python requests library

vkuznet · 2022-04-15T12:24:32Z

@amaltaro , @todor-ivanov , @klannon this PR speeds up a single (most expensive) DBS3Reader API by a factor of 5 or more. The code shows that concurrent execution of calls to an upstream server can provide a significant improvement. I suggest that you study this pattern, which can be applied to different parts of the WMCore codebase where multiple calls to an upstream server (DBS, Rucio, etc.) are required. Even though it is not in the Q2 plan, I decided to put this forward because the improvements I obtained are significant: the entire system can benefit from them and, more importantly, they will allow us to work with large datasets and avoid timeouts on CMSWEB.

cmsdmwmbot · 2022-04-15T12:32:24Z

Jenkins results:

Python3 Unit tests: succeeded
- 7 tests added
- 1 changes in unstable tests
Python3 Pylint check: failed
- 27 warnings and errors that must be fixed
- 2 warnings
- 23 comments to review
Pylint py3k check: succeeded
Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13024/artifact/artifacts/PullRequestReport.html

cmsdmwmbot · 2022-04-15T12:54:23Z

Jenkins results:

Python3 Unit tests: failed
- 1 new failures
- 7 tests added
- 2 changes in unstable tests
Python3 Pylint check: failed
- 25 warnings and errors that must be fixed
- 2 warnings
- 22 comments to review
Pylint py3k check: succeeded
Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13026/artifact/artifacts/PullRequestReport.html

vkuznet · 2022-04-15T13:03:09Z

@amaltaro , on my second commit I got a weird unit test failure which I'm 100% sure is not related to my changes, but I want to point you to it since it may show how this unit test became unstable. Please see https://cmssdt.cern.ch/dmwm-jenkins/job/DMWM-WMCore-PR-test/13026/testReport/junit/WMCore_t.Services_t.UUID_t/UUIDTest/testUUID/ which shows:

'bcb3' == 'bcb3' : Second component of UUID the same bcb3 != bcb3

I think it is a random error, since two UUIDs may come out close enough to share a component during random generation. Moreover, the message of the test is wrong. I suggest that we address this unit test issue separately. I'll trigger the test again.
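
To put a rough number on how often such a collision can happen by chance, here is a small sketch. It assumes the UUIDs are random (version 4) and that the test compares the second dash-separated group; how WMCore actually generates its UUIDs is not shown in this thread, so treat this as an illustration only. The helper name second_component is hypothetical.

```python
import uuid


def second_component(value):
    """Return the second dash-separated group of a UUID (4 hex digits)."""
    return str(value).split("-")[1]


# Four hex digits carry 16 bits, so two independent random values
# agree with probability 1 / 16**4 = 1 / 65536.
COLLISION_PROBABILITY = 1 / 16 ** 4

if __name__ == "__main__":
    first, second = uuid.uuid4(), uuid.uuid4()
    print(second_component(first), second_component(second))
    print("chance of equal second components: %.6f" % COLLISION_PROBABILITY)
```

Under these assumptions a single comparison fails about once in 65536 runs, which is rare but not impossible for a test that runs on every Jenkins build.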

vkuznet · 2022-04-15T13:03:19Z

test this please

amaltaro

Valentin, even though this looks like a great performance boost, I do not think this is the right implementation.

Why don't we use the pycurl_manager module for this? We already have some DBS concurrent APIs implemented in this module (inherited from your work porting Unified features into WMCore):
https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/MicroService/Tools/Common.py

and I think the best way forward would be to actually create a new module under:
WMCore/Services/DBS/youNameIt.py

and put this code in there, if possible relying on pycurl_manager only instead of requests.
If you see a need to use the straight forward requests library, then I think we could have a chat first before putting all this effort in.

Please let me know your thoughts (BTW, it's a holiday here, so you might hear back from me only on Monday).

cmsdmwmbot · 2022-04-15T13:13:37Z

Jenkins results:

Python3 Unit tests: succeeded
- 7 tests added
- 1 changes in unstable tests
Python3 Pylint check: failed
- 25 warnings and errors that must be fixed
- 2 warnings
- 22 comments to review
Pylint py3k check: succeeded
Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13027/artifact/artifacts/PullRequestReport.html

vkuznet · 2022-04-15T13:18:27Z

@amaltaro I explained in the description that pycurl_manager.py can't be used with concurrent futures since it locks the curl object. The only way to use pycurl_manager is through multi_getdata, and that is on my to-do list.

vkuznet · 2022-04-15T14:00:21Z

OK, I refactored the code using multi_getdata from pycurl_manager.py and the results are even more impressive: I achieve a 10x or better improvement for this API. Here is the new structure:

WMCore/Services/DBS/DBSUtils.py contains the new common code
WMCore/Services/DBS/DBS3Reader.py can now use either approach; tested with both the requests and pycurl_manager.py libraries
I borrowed code from WMCore/MicroService/Tools/Common.py, which seems to be common between these two codebases; therefore I suggest refactoring again to move this common code elsewhere. Please see the corresponding comment in WMCore/Services/DBS/DBSUtils.py

Bottom line: I would like to keep the concurrent.futures code around since it demonstrates a new paradigm for how concurrent programming should be done. This concept can be applied to different parts of WMCore which may benefit from it. The requests library used in the original implementation is just an example and I suggest keeping it around, but it is no longer required after my second implementation with multi_getdata from pycurl_manager.py

vkuznet · 2022-04-15T14:01:47Z

The new best value I achieved with multi_getdata is 16 seconds, which is almost a factor of 12 better than the current implementation based on the sequential programming pattern.
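
The speed-up factors quoted in this thread follow directly from the reported timings. The short sketch below recomputes them from the approximate 190-second baseline; the labels are descriptive only and the helper name speedup is hypothetical.

```python
BASELINE_SECONDS = 190  # sequential implementation, approximate value from the description

# Timings reported in this thread, in seconds
timings = {
    "10 concurrent tasks (requests)": 90,
    "50 concurrent tasks (requests)": 37,
    "100 concurrent tasks (requests)": 41,
    "multi_getdata (pycurl_manager)": 16,
}


def speedup(baseline, seconds):
    """Ratio of the sequential run time to the concurrent run time."""
    return baseline / seconds


if __name__ == "__main__":
    for label, seconds in timings.items():
        print("%-32s %3d s  x%.1f" % (label, seconds, speedup(BASELINE_SECONDS, seconds)))
```

Going from 50 to 100 concurrent tasks did not help in the reported runs (37 s versus 41 s), so more concurrency is not automatically faster.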

cmsdmwmbot · 2022-04-15T14:03:52Z

Jenkins results:

Python3 Unit tests: succeeded
- 7 tests added
- 2 changes in unstable tests
Python3 Pylint check: failed
- 24 warnings and errors that must be fixed
- 2 warnings
- 25 comments to review
Pylint py3k check: succeeded
Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13029/artifact/artifacts/PullRequestReport.html

cmsdmwmbot · 2022-04-15T17:52:27Z

Jenkins results:

Python3 Unit tests: succeeded
- 7 tests added
- 2 changes in unstable tests
Python3 Pylint check: failed
- 24 warnings and errors that must be fixed
- 2 warnings
- 20 comments to review
Pylint py3k check: succeeded
Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13033/artifact/artifacts/PullRequestReport.html

cmsdmwmbot · 2022-04-18T12:51:32Z

Jenkins results:

Python3 Unit tests: succeeded
- 7 tests added
- 2 changes in unstable tests
Python3 Pylint check: failed
- 24 warnings and errors that must be fixed
- 2 warnings
- 20 comments to review
Pylint py3k check: succeeded
Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13036/artifact/artifacts/PullRequestReport.html

vkuznet · 2022-04-18T13:23:49Z

I refactored the code and generalized it. The new API, dbsParallelApi, can be used by different APIs in DBS3Reader (see the full list in #11098). I added a new attribute, parallel, to the constructor of DBS3Reader so that the class can execute either linear (sequential) or concurrent (parallel) DBS API workflows. The code can now be expanded further to speed up other APIs.
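
A minimal sketch of how a constructor flag can switch between the two workflows. ReaderSketch and list_details are hypothetical names and this is not the actual DBS3Reader code; the sketch assumes that any truthy value of parallel enables the concurrent path, and it uses a thread pool purely for illustration rather than multi_getdata.

```python
from concurrent.futures import ThreadPoolExecutor


class ReaderSketch:
    """Hypothetical reader illustrating a sequential/concurrent switch."""

    def __init__(self, fetch, parallel=None):
        self.fetch = fetch        # callable(block_name) -> data for that block
        self.parallel = parallel  # assumption: truthy enables concurrent mode

    def list_details(self, blocks):
        """Return {block: data} using the workflow selected at construction."""
        if self.parallel:
            with ThreadPoolExecutor(max_workers=10) as pool:
                results = list(pool.map(self.fetch, blocks))
        else:
            results = [self.fetch(block) for block in blocks]
        return dict(zip(blocks, results))


if __name__ == "__main__":
    blocks = ["block-a", "block-b", "block-c"]
    sequential = ReaderSketch(len).list_details(blocks)
    concurrent = ReaderSketch(len, parallel=True).list_details(blocks)
    print(sequential == concurrent)
```

Keeping both paths behind one method means callers get the same result shape either way, so the concurrent path can be enabled per instance without touching call sites.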

cmsdmwmbot · 2022-04-18T13:32:58Z

Jenkins results:

Python3 Unit tests: succeeded
- 7 tests added
- 1 changes in unstable tests
Python3 Pylint check: failed
- 25 warnings and errors that must be fixed
- 2 warnings
- 21 comments to review
Pylint py3k check: succeeded
Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13038/artifact/artifacts/PullRequestReport.html

vkuznet · 2022-05-05T11:53:42Z

Alan, let me answer your questions in the order they appear. For clarity I'll repeat the list:

maybe we should rename the DBSUtil module to something more meaningful, e.g.: DBSParallel, or DBSPycurl, or DBSConc? Just to give a clear distinction that its underlying library is different than DBS3Reader
- VK: Please be specific. I chose the name based on my own view; if you want a different name, please say which one. I should not have to guess which of your suggestions is most appropriate.
logger parameter in those DBSUtil functions is not consistent. Sometimes it's mandatory, others it's optional. From a different angle, maybe it could be resolved by converting those functions into methods of a class, defining the logger object only once when it gets instantiated. Just an idea though...
- VK: indeed, the logger parameter is inconsistent; I only added it based on the existing DBS3Reader code. After your review I decided to remove it completely, as it is not used by the parallel code at all.
the DBSUtil functions are not consistent as well in the way they got developed. For instance, dbsListFileParents and other 2 only return data, while dbsParentFilesGivenParentDataset replicates the whole logic from DBS3Reader. I would stick to one model only, so either we (re-)implement everything in the DBSUtil function, or we only retrieve data and send it not processed back to DBS3Reader
- VK: this is a side effect of the code review. Originally I placed the code within DBS3Reader, and you then requested that it be moved into a new module. As a result the code was copied to the new module and inherited the original naming convention used in DBS3Reader.
these changes are not compliant with the guidelines. New modules should not have any pylint/pep8/pycodestyle issues, unless there is no way around it (as mentioned here)
- VK: fair enough, will improve.
there is a mix of variable names as well (included in this PR, what was there already does not need to be touched), e.g. block_parents.
- VK: I made my changes consistent with the names used by DBS3Reader. I don't know which is better: consistency with the code surrounding my changes, or strict adherence to the guidelines for new code. In the latter case, the new code will be inconsistent with the existing code. I am open to suggestions here. If you insist on writing new code according to the guidelines, I'll adjust the new variables, but in the end we'll end up with a mix of naming conventions. Even your proposal to adjust the code violates the naming conventions. Bottom line: guidelines are not a panacea for all cases.

vkuznet · 2022-05-05T11:55:25Z

test this please

cmsdmwmbot · 2022-05-05T11:58:33Z

Jenkins results:

Python3 Unit tests: succeeded
- 8 tests added
Python3 Pylint check: failed
- 6 warnings and errors that must be fixed
- 1 warnings
- 75 comments to review
Pylint py3k check: succeeded
Pycodestyle check: succeeded
- 29 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13143/artifact/artifacts/PullRequestReport.html

cmsdmwmbot · 2022-05-05T12:03:22Z

Jenkins results:

Python3 Unit tests: succeeded
- 8 tests added
Python3 Pylint check: failed
- 6 warnings and errors that must be fixed
- 1 warnings
- 75 comments to review
Pylint py3k check: succeeded
Pycodestyle check: succeeded
- 29 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13144/artifact/artifacts/PullRequestReport.html

cmsdmwmbot · 2022-05-05T12:28:40Z

Jenkins results:

Python3 Unit tests: succeeded
- 8 tests added
- 2 changes in unstable tests
Python3 Pylint check: succeeded
- 1 warnings
- 71 comments to review
Pylint py3k check: succeeded
Pycodestyle check: succeeded
- 30 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13145/artifact/artifacts/PullRequestReport.html

cmsdmwmbot · 2022-05-05T12:30:09Z

Jenkins results:

Python3 Unit tests: succeeded
- 8 tests added
- 1 changes in unstable tests
Python3 Pylint check: succeeded
- 1 warnings
- 71 comments to review
Pylint py3k check: succeeded
Pycodestyle check: succeeded
- 30 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13146/artifact/artifacts/PullRequestReport.html

cmsdmwmbot · 2022-05-05T13:01:30Z

Jenkins results:

Python3 Unit tests: succeeded
- 8 tests added
- 1 changes in unstable tests
Python3 Pylint check: succeeded
- 1 warnings
- 71 comments to review
Pylint py3k check: succeeded
Pycodestyle check: succeeded
- 4 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13149/artifact/artifacts/PullRequestReport.html

cmsdmwmbot · 2022-05-05T13:31:43Z

Jenkins results:

Python3 Unit tests: succeeded
- 8 tests added
Python3 Pylint check: succeeded
- 1 warnings
- 71 comments to review
Pylint py3k check: succeeded
Pycodestyle check: succeeded
- 4 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13150/artifact/artifacts/PullRequestReport.html

cmsdmwmbot · 2022-05-05T13:35:01Z

Jenkins results:

Python3 Unit tests: succeeded
- 8 tests added
- 1 changes in unstable tests
Python3 Pylint check: succeeded
- 1 warnings
- 71 comments to review
Pylint py3k check: succeeded
Pycodestyle check: succeeded
- 4 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13151/artifact/artifacts/PullRequestReport.html

cmsdmwmbot · 2022-05-05T16:42:14Z

Jenkins results:

Python3 Unit tests: succeeded
- 8 tests added
Python3 Pylint check: succeeded
- 1 warnings
- 73 comments to review
Pylint py3k check: succeeded
Pycodestyle check: succeeded
- 4 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13154/artifact/artifacts/PullRequestReport.html

vkuznet · 2022-05-05T17:01:01Z

@amaltaro , I made the necessary changes and replied to all your comments (some of them were unclear, so I left my questions). Meanwhile, I ran the code through autopep8 and verified that Jenkins reports a score of 10 for the new module. I also complemented the code with integration tests. Please review again.

amaltaro

Valentin, this looks good to me. I'd suggest to add the docstring I mentioned in the code though.

In addition to that, please:

next time, please keep in mind that test/* changes should NOT go together with the source code changes, i.e., they must be provided in different commits;
please squash these commits in a single one (if test was separated, we should squash them in 2 commits instead).

Thanks


cmsdmwmbot · 2022-05-06T15:24:14Z

Jenkins results:

Python3 Unit tests: succeeded
- 8 tests added
- 1 changes in unstable tests
Python3 Pylint check: succeeded
- 1 warnings
- 73 comments to review
Pylint py3k check: succeeded
Pycodestyle check: succeeded
- 4 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13163/artifact/artifacts/PullRequestReport.html

vkuznet · 2022-05-06T20:47:52Z

@amaltaro the commits are now squashed and rebased.

cmsdmwmbot · 2022-05-06T20:58:22Z

Jenkins results:

Python3 Unit tests: failed
- 1 new failures
- 8 tests added
- 2 changes in unstable tests
Python3 Pylint check: succeeded
- 1 warnings
- 73 comments to review
Pylint py3k check: succeeded
Pycodestyle check: succeeded
- 4 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13166/artifact/artifacts/PullRequestReport.html

amaltaro · 2022-05-07T00:37:02Z

test this please

cmsdmwmbot · 2022-05-07T00:48:07Z

Jenkins results:

Python3 Unit tests: succeeded
- 8 tests added
- 1 changes in unstable tests
Python3 Pylint check: succeeded
- 1 warnings
- 73 comments to review
Pylint py3k check: succeeded
Pycodestyle check: succeeded
- 4 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13167/artifact/artifacts/PullRequestReport.html

amaltaro · 2022-05-06T02:14:21Z

src/python/WMCore/Services/DBS/DBS3Reader.py

@@ -72,17 +74,18 @@ class DBS3Reader(object):
    General API for reading data from DBS
    """

-    def __init__(self, url, logger=None, **contact):
+    def __init__(self, url, logger=None, parallel=None, **contact):


Can you please create a docstring for this class and specify what parallel is for?
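
One possible shape for such a docstring, as a hedged sketch. The class name DBS3ReaderSketch is hypothetical, the parameter descriptions are assumptions drawn from this conversation (sequential versus concurrent DBS calls), and the wording is not the text that was eventually merged.

```python
class DBS3ReaderSketch(object):
    """
    General API for reading data from DBS

    :param url: base URL of the DBS server
    :param logger: optional logger object
    :param parallel: assumption for this sketch: when left as None the DBS
        calls are executed sequentially, and when set to a truthy value the
        calls are executed concurrently
    :param contact: extra keyword arguments kept for the underlying client
    """

    def __init__(self, url, logger=None, parallel=None, **contact):
        self.url = url
        self.logger = logger
        self.parallel = parallel
        self.contact = contact


if __name__ == "__main__":
    print(DBS3ReaderSketch.__doc__)
```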

todor-ivanov

looks good to me

vkuznet self-assigned this Apr 15, 2022

vkuznet added latency improvement Enhancement DBS scalability labels Apr 15, 2022

vkuznet requested review from todor-ivanov and amaltaro April 15, 2022 12:21

vkuznet force-pushed the slow-dataset-files branch from 2a48633 to 5a2ac3c Compare April 15, 2022 12:45

amaltaro reviewed Apr 15, 2022


vkuznet added the PR: Work in progress label Apr 15, 2022

vkuznet force-pushed the slow-dataset-files branch from 3cf4f31 to 15d0454 Compare April 15, 2022 17:34

vkuznet mentioned this pull request Apr 18, 2022

Improve speed of DBS3Reader APIs #11098

Closed

vkuznet force-pushed the slow-dataset-files branch from 15d0454 to 7ebeaf3 Compare April 18, 2022 12:41

vkuznet force-pushed the slow-dataset-files branch from 7ebeaf3 to f8970cb Compare April 18, 2022 13:20

vkuznet mentioned this pull request Apr 18, 2022

move ckey/cert functions to Utils.CertTools #11101

Merged

vkuznet force-pushed the slow-dataset-files branch from f8970cb to b337ce1 Compare April 19, 2022 14:56

vkuznet force-pushed the slow-dataset-files branch from ca701ec to 745a33d Compare May 5, 2022 12:18

vkuznet force-pushed the slow-dataset-files branch from 61f6787 to f07765c Compare May 5, 2022 12:50

vkuznet force-pushed the slow-dataset-files branch from 4b071e9 to 13a1e92 Compare May 5, 2022 13:22

vkuznet force-pushed the slow-dataset-files branch from ffcc61e to d318762 Compare May 5, 2022 16:26

vkuznet requested a review from amaltaro May 5, 2022 16:59

amaltaro approved these changes May 6, 2022



amaltaro added the PR: squashing needed label May 6, 2022

vkuznet force-pushed the slow-dataset-files branch from e1fdc83 to e2bfb55 Compare May 6, 2022 15:13

speed up listDatasetFileDetails API

dfc5272

vkuznet force-pushed the slow-dataset-files branch from c987a77 to dfc5272 Compare May 6, 2022 20:46

vkuznet removed the PR: squashing needed label May 6, 2022

amaltaro approved these changes May 7, 2022


amaltaro merged commit 3882e1f into dmwm:master May 7, 2022

todor-ivanov reviewed May 10, 2022

