speed up listDatasetFileDetails API #11099

Merged
merged 1 commit into from
May 7, 2022

Conversation

vkuznet
Contributor

@vkuznet vkuznet commented Apr 15, 2022

Fixes #11098

Status

ready

Description

Use concurrent.futures to speed up the listDatasetFileDetails API. In my local setup I achieve a speed-up by a factor of 5 or more for a specific dataset. Here are benchmark numbers for the /VBF1Parked/Run2012D-v1/RAW dataset, which contains 594 blocks. The current codebase takes approximately 190 seconds to fetch its parents and file lumis. With the proposed solution I get the following numbers:

  • 37 seconds using 50 concurrent tasks
  • 41 seconds using 100 concurrent tasks
  • 90 seconds using 10 concurrent tasks
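
For reference, the speed-up factors implied by these timings can be worked out directly. This is a small sketch that uses only the numbers quoted above; the variable and function names are illustrative and not part of the PR.

```python
# Timings quoted in the description above, in seconds.
BASELINE_SECONDS = 190  # current sequential implementation (approximate)

# Number of concurrent tasks -> measured wall-clock seconds.
concurrent_timings = {10: 90, 50: 37, 100: 41}


def speedup(baseline, measured):
    """Return how many times faster `measured` is compared to `baseline`."""
    return baseline / measured


factors = {
    tasks: round(speedup(BASELINE_SECONDS, seconds), 1)
    for tasks, seconds in concurrent_timings.items()
}

for tasks in sorted(factors):
    print(f"{tasks:>3} concurrent tasks: about {factors[tasks]}x faster")
```

Going from 50 to 100 tasks did not help in this measurement, so the best pool size is likely something to tune per deployment rather than a fixed value.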

Internally, I used the requests Python library instead of pycurl_manager.py. The latter is not suitable for concurrent execution because it is not thread safe: it holds a global curl object. As a result, curl options are set by the first task and cannot be changed by other tasks (since the code sets them up) until the first task completes. The requests library depends neither on a global curl object nor on the curl library itself, so it is suitable for concurrent execution of multiple URL calls.
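
The pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `fetch_all` and `fetch_json` are hypothetical names, and a real DBS call would also need the appropriate certificate and header setup, which is omitted here.

```python
import concurrent.futures


def fetch_all(urls, fetch, max_workers=50):
    """Call fetch(url) for every URL concurrently and return {url: result}.

    `fetch` is any callable that takes a URL and returns data. Each call must
    manage its own connection state, i.e. no shared global handle.
    """
    results = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the URL it was submitted for.
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in concurrent.futures.as_completed(futures):
            # result() re-raises any exception raised inside fetch.
            results[futures[future]] = future.result()
    return results


def fetch_json(url, timeout=60):
    """Hypothetical fetcher: GET one URL with requests and decode its JSON body."""
    import requests  # third-party; imported here so fetch_all stays stdlib-only

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.json()
```

A caller would then build one URL per block and run something like `fetch_all(block_urls, fetch_json, max_workers=50)`. Passing the fetcher in as an argument also makes it easy to swap in a stub for unit tests.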

This PR only addresses the speed-up of a single DBS3Reader API, listDatasetFileDetails, but other APIs that make multiple (sequential) calls to DBS can benefit from this approach.

Is it backward compatible (if not, which system it affects?)

YES

Related PRs

dmwm/dbs2go#5

External dependencies / deployment changes

relies on Python requests library

@vkuznet
Contributor Author

vkuznet commented Apr 15, 2022

@amaltaro , @todor-ivanov , @klannon this PR speeds up a single (the most expensive) DBS3Reader API by a factor of 5 or more. The code shows that concurrent execution of calls to an upstream server can provide a significant improvement. I suggest that you study this pattern, which can be applied to other parts of the WMCore codebase where multiple calls to an upstream server (DBS, Rucio, etc.) are required. Even though it is not in the Q2 plan, I decided to put this forward because the improvements are significant, the entire system can benefit from them, and (more importantly) they will allow us to work with large datasets and avoid timeouts on CMSWEB.

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 7 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 27 warnings and errors that must be fixed
    • 2 warnings
    • 23 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13024/artifact/artifacts/PullRequestReport.html

@vkuznet vkuznet force-pushed the slow-dataset-files branch from 2a48633 to 5a2ac3c Compare April 15, 2022 12:45
@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 7 tests added
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 25 warnings and errors that must be fixed
    • 2 warnings
    • 22 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13026/artifact/artifacts/PullRequestReport.html

@vkuznet
Contributor Author

vkuznet commented Apr 15, 2022

@amaltaro , on my second commit I got a weird unit test failure which I'm 100% sure is not related to my changes, but I want to point you to it since it may show how this unit test became unstable. Please see https://cmssdt.cern.ch/dmwm-jenkins/job/DMWM-WMCore-PR-test/13026/testReport/junit/WMCore_t.Services_t.UUID_t/UUIDTest/testUUID/ which shows:

'bcb3' == 'bcb3' : Second component of UUID the same bcb3 != bcb3

I think it is a random error, since two randomly generated UUIDs may end up close enough to share a component. Moreover, the test's message is wrong. I suggest that we address this unit test issue separately. I'll trigger the tests again.

@vkuznet
Contributor Author

vkuznet commented Apr 15, 2022

test this please

Contributor

@amaltaro amaltaro left a comment

Valentin, even though this looks like a great performance boost, I do not think this is the right implementation.

Why don't we use the pycurl_manager module for this? We already have some DBS concurrent APIs implemented in this module (inherited from your work porting Unified features into WMCore):
https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/MicroService/Tools/Common.py

and I think the best way forward would be to actually create a new module under:
WMCore/Services/DBS/youNameIt.py

and put this code in there, if possible relying on pycurl_manager only instead of requests.
If you see a need to use the straight forward requests library, then I think we could have a chat first before putting all this effort in.

Please let me know your thoughts (BTW, it's a holiday here, so you might hear back from me only on Monday).

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 7 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 25 warnings and errors that must be fixed
    • 2 warnings
    • 22 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13027/artifact/artifacts/PullRequestReport.html

@vkuznet
Contributor Author

vkuznet commented Apr 15, 2022

@amaltaro I explained in the description that pycurl_manager.py can't be used with concurrent.futures since it locks the curl object. The only way to use pycurl_manager is through multi_getdata, and that is on my todo list.

@vkuznet
Contributor Author

vkuznet commented Apr 15, 2022

OK, I refactored the code using multi_getdata from pycurl_manager.py and the results are even more impressive: I achieve an improvement by a factor of 10 or more for this API. Here is the new structure:

  • WMCore/Services/DBS/DBSUtils.py contains new common code
  • WMCore/Services/DBS/DBS3Reader.py can now use either approach; it was tested with both the requests and pycurl_manager.py libraries
  • I borrowed code from WMCore/MicroService/Tools/Common.py which seems to be common between these two codebases; therefore I suggest refactoring the code again to put this common code elsewhere. Please see the corresponding comment in WMCore/Services/DBS/DBSUtils.py

Bottom line, I would like to keep the concurrent.futures code around since it demonstrates a new paradigm for how concurrent programming should be done. This concept can be applied to other parts of WMCore which may benefit from it. The requests library used in the original implementation is just an example and I suggest keeping it around, but it is no longer required after my second implementation with multi_getdata from pycurl_manager.py.

@vkuznet
Contributor Author

vkuznet commented Apr 15, 2022

The new best value I achieved with multi_getdata is 16 seconds, which is almost a factor of 12 better than the current implementation based on the sequential programming pattern.

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 7 tests added
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 24 warnings and errors that must be fixed
    • 2 warnings
    • 25 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13029/artifact/artifacts/PullRequestReport.html

@vkuznet vkuznet force-pushed the slow-dataset-files branch from 3cf4f31 to 15d0454 Compare April 15, 2022 17:34
@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 7 tests added
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 24 warnings and errors that must be fixed
    • 2 warnings
    • 20 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13033/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 7 tests added
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 24 warnings and errors that must be fixed
    • 2 warnings
    • 20 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13036/artifact/artifacts/PullRequestReport.html

@vkuznet vkuznet force-pushed the slow-dataset-files branch from 7ebeaf3 to f8970cb Compare April 18, 2022 13:20
@vkuznet
Contributor Author

vkuznet commented Apr 18, 2022

I refactored the code and generalized it. The new API dbsParallelApi can be used by different APIs in DBS3Reader (see the full list in #11098). I added a new attribute, parallel, to the constructor of DBS3Reader so that the class can execute either linear (sequential) or concurrent (parallel) DBS API workflows. The code can now be further expanded to speed up other APIs.
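
The idea of a constructor flag that switches between the two execution modes can be sketched as below. This is only an illustration under stated assumptions: `BlockReader`, `file_details`, and the `fetch` callable are hypothetical names, not the real DBS3Reader or dbsParallelApi code.

```python
from concurrent.futures import ThreadPoolExecutor


class BlockReader:
    """Hypothetical reader showing a sequential/parallel switch.

    `fetch` is any callable that returns the data for one block name.
    """

    def __init__(self, fetch, parallel=None, max_workers=50):
        self.fetch = fetch
        self.parallel = bool(parallel)  # None or False selects sequential mode
        self.max_workers = max_workers

    def file_details(self, blocks):
        """Return {block: data}, fetched sequentially or concurrently."""
        blocks = list(blocks)  # allow generators; they are iterated twice below
        if self.parallel:
            with ThreadPoolExecutor(max_workers=self.max_workers) as pool:
                # Executor.map yields results in the same order as the inputs.
                results = list(pool.map(self.fetch, blocks))
        else:
            results = [self.fetch(block) for block in blocks]
        return dict(zip(blocks, results))
```

Because both branches return the same structure, callers do not need to know which mode is active, and a test can check that the two modes agree on the same input.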

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 7 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 25 warnings and errors that must be fixed
    • 2 warnings
    • 21 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13038/artifact/artifacts/PullRequestReport.html

@vkuznet
Contributor Author

vkuznet commented May 5, 2022

Alan, let me answer your questions in the order they appear. For clarity I'll repeat the list:

  • maybe we should rename the DBSUtil module to something more meaningful, e.g.: DBSParallel, or DBSPycurl, or DBSConc? Just to give a clear distinction that its underlying library is different than DBS3Reader
    • VK: Please be specific. I chose the name from my point of view; if you want to name it differently, please specify how. I should not have to guess which of your suggested names is most appropriate
  • logger parameter in those DBSUtil functions is not consistent. Sometimes it's mandatory, others it's optional. From a different angle, maybe it could be resolved by converting those functions into methods of a class, defining the logger object only once when it gets instantiated. Just an idea though...
    • VK: indeed, the logger parameter seems inconsistent; I only added it based on the existing DBS3Reader code. But after your review I decided to remove it completely, as it is not used by the parallel code at all
  • the DBSUtil functions are not consistent as well in the way they got developed. For instance, dbsListFileParents and other 2 only return data, while dbsParentFilesGivenParentDataset replicates the whole logic from DBS3Reader. I would stick to one model only, so either we (re-)implement everything in the DBSUtil function, or we only retrieve data and send it not processed back to DBS3Reader
    • VK: this is a side effect of the code review: originally I placed the code within DBS3Reader, and you then requested that it be put into a new module. As such, the code was copied to the new module and inherited the original naming convention used in DBS3Reader
  • these changes are not compliant with the guidelines. New modules should not have any pylint/pep8/pycodestyle issues, unless there is no way around of it (as mentioned here)
    • VK: fair enough, will improve
  • there is a mix of variable names as well (included in this PR, what was there already does not need to be touched), e.g. block_parents.
    • VK: I made the changes consistent with the names used by DBS3Reader. I don't know which is better: consistency with the code surrounding my changes, or strict adherence to the guidelines for new code. In the latter case, the new code will be inconsistent with the existing code. I am open to suggestions here. If you insist on writing new code according to the guidelines I'll adjust the new variables, but we'll end up with a mix of naming conventions. Even your proposal to adjust the code violates the naming conventions. Bottom line, guidelines are not a panacea for all cases.
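
For context, the class-based alternative raised in the logger item above (define the logger once at instantiation instead of passing it to every function) might look like the sketch below. It is purely illustrative: `DBSQueryHelper`, its method names, and the `fetch` callable are assumptions, and the merged code took the other route of removing the logger parameter.

```python
import logging


class DBSQueryHelper:
    """Hypothetical helper that holds one logger for all of its methods."""

    def __init__(self, fetch, logger=None):
        # `fetch(api, block)` is an assumed callable that performs the request.
        self.fetch = fetch
        # Fall back to a named standard-library logger when none is supplied.
        self.logger = logger or logging.getLogger(self.__class__.__name__)

    def file_parents(self, block):
        """Return parent information for one block."""
        self.logger.debug("fetching file parents for %s", block)
        return self.fetch("fileparents", block)

    def file_lumis(self, block):
        """Return lumi information for one block."""
        self.logger.debug("fetching file lumis for %s", block)
        return self.fetch("filelumis", block)
```

With this shape the logger is optional in exactly one place, the constructor, which avoids the mandatory-in-some-functions, optional-in-others inconsistency.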

@vkuznet
Contributor Author

vkuznet commented May 5, 2022

test this please

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 8 tests added
  • Python3 Pylint check: failed
    • 6 warnings and errors that must be fixed
    • 1 warnings
    • 75 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 29 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13143/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 8 tests added
  • Python3 Pylint check: failed
    • 6 warnings and errors that must be fixed
    • 1 warnings
    • 75 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 29 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13144/artifact/artifacts/PullRequestReport.html

@vkuznet vkuznet force-pushed the slow-dataset-files branch from ca701ec to 745a33d Compare May 5, 2022 12:18
@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 8 tests added
    • 2 changes in unstable tests
  • Python3 Pylint check: succeeded
    • 1 warnings
    • 71 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 30 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13145/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 8 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: succeeded
    • 1 warnings
    • 71 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 30 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13146/artifact/artifacts/PullRequestReport.html

@vkuznet vkuznet force-pushed the slow-dataset-files branch from 61f6787 to f07765c Compare May 5, 2022 12:50
@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 8 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: succeeded
    • 1 warnings
    • 71 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 4 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13149/artifact/artifacts/PullRequestReport.html

@vkuznet vkuznet force-pushed the slow-dataset-files branch from 4b071e9 to 13a1e92 Compare May 5, 2022 13:22
@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 8 tests added
  • Python3 Pylint check: succeeded
    • 1 warnings
    • 71 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 4 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13150/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 8 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: succeeded
    • 1 warnings
    • 71 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 4 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13151/artifact/artifacts/PullRequestReport.html

@vkuznet vkuznet force-pushed the slow-dataset-files branch from ffcc61e to d318762 Compare May 5, 2022 16:26
@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 8 tests added
  • Python3 Pylint check: succeeded
    • 1 warnings
    • 73 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 4 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13154/artifact/artifacts/PullRequestReport.html

@vkuznet vkuznet requested a review from amaltaro May 5, 2022 16:59
@vkuznet
Contributor Author

vkuznet commented May 5, 2022

@amaltaro , I made the necessary changes and replied to all your comments (some of them were unclear, so I left my questions). Meanwhile, I ran the code through autopep8 and verified that Jenkins reports a score of 10 for the new module. I also complemented the code with integration tests. Please review again.

Contributor

@amaltaro amaltaro left a comment

Valentin, this looks good to me. I'd suggest to add the docstring I mentioned in the code though.

In addition to that, please:

  1. next time, please keep in mind that test/* changes should NOT go together with the source code changes, i.e., they must be provided in different commits;
  2. please squash these commits in a single one (if test was separated, we should squash them in 2 commits instead).

Thanks

src/python/WMCore/Services/DBS/DBS3Reader.py (resolved)
@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 8 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: succeeded
    • 1 warnings
    • 73 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 4 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13163/artifact/artifacts/PullRequestReport.html

@vkuznet
Contributor Author

vkuznet commented May 6, 2022

@amaltaro now you have it squashed and rebased.

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 8 tests added
    • 2 changes in unstable tests
  • Python3 Pylint check: succeeded
    • 1 warnings
    • 73 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 4 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13166/artifact/artifacts/PullRequestReport.html

@amaltaro
Contributor

amaltaro commented May 7, 2022

test this please

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 8 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: succeeded
    • 1 warnings
    • 73 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 4 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13167/artifact/artifacts/PullRequestReport.html

@@ -72,17 +74,18 @@ class DBS3Reader(object):
     General API for reading data from DBS
     """

-    def __init__(self, url, logger=None, **contact):
+    def __init__(self, url, logger=None, parallel=None, **contact):
Contributor

Can you please create a docstring for this class and specify what parallel is for?

@amaltaro amaltaro merged commit 3882e1f into dmwm:master May 7, 2022
Contributor

@todor-ivanov todor-ivanov left a comment

looks good to me

Successfully merging this pull request may close these issues.

Improve speed of DBS3Reader APIs