
Characterization and propagation of task type based on cmsDriver steps #11680

Merged
merged 2 commits into dmwm:master from the 10604 branch on Sep 26, 2023


khurtado
Contributor

@khurtado khurtado commented Jul 28, 2023

Fixes #11712

Status

Ready for review

Description

Parse the job PSet configuration and extract relevant fields of the cmsDriver command to map them to a given physics type.

A detailed view of the changes provided in this PR is:

  • JobCreator now also dumps the physicsTaskType attribute, which is later used by JobSubmitter and SimpleCondorPlugin
  • a new CMS_extendedJobType condor classad is provided, containing a comma-separated list of physics task types
  • StdSpecs: change how the configCache is loaded and how output modules are determined
  • provide physics type getter/setter methods for both the WMTask and CMSSW (WMStep) modules
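As a rough illustration of the classad item above, the snippet below shows one way a submitter could turn a job's physics task types into the comma-separated CMS_extendedJobType value. It is a sketch under assumptions, not the SimpleCondorPlugin code: the job is assumed to be a plain dict with a 'physicsTaskType' entry (a list or a comma-separated string), the helper names are hypothetical, and the UNKNOWN fallback mirrors the reviewer's note that workloads created before this change report UNKNOWN.

```python
# Hypothetical sketch: build the CMS_extendedJobType classad from a job dict.
# Assumption: the job carries 'physicsTaskType' as a list or a comma-separated string.
DEFAULT_PHYSICS_TYPE = "UNKNOWN"


def extended_job_type(job):
    """Return the comma-separated physics task types for one job."""
    value = job.get("physicsTaskType")
    if not value:
        # Workloads created before this feature carry no physics type at all.
        return DEFAULT_PHYSICS_TYPE
    parts = value.split(",") if isinstance(value, str) else list(value)
    # Drop empty entries and surrounding whitespace left by stray commas.
    parts = [part.strip() for part in parts if part.strip()]
    return ",".join(parts) or DEFAULT_PHYSICS_TYPE


def classad_entry(job):
    """Return the attribute as a classad string literal (double-quoted)."""
    return {"CMS_extendedJobType": '"%s"' % extended_job_type(job)}
```

For example, classad_entry({"physicsTaskType": ["GENSIM", "DIGI"]}) returns {'CMS_extendedJobType': '"GENSIM,DIGI"'}, while a job without the attribute yields '"UNKNOWN"'.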

Is it backward compatible (if not, which system it affects?)

NO

Related PRs

https://github.com/dmwm/cms-htcondor-es/pull/206/files

External dependencies / deployment changes

None

@khurtado
Contributor Author

@amaltaro This is still very green, but is this what you were referring to? I.e.: get the --steps parameters from the steps' configFile files and map those to the physics task types (we still need to come up with that mapping). Then, in WMTask, use that information: for taskChain, there is only one configFile; for stepChain, we join the task types from all steps into a single string.

Then what? Do we create a field, self.data.CMStaskType?
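The --steps extraction described in the comment above could look roughly like this. It assumes the cmsDriver options are already available as a single command-line string, and it accepts -s, --step and --steps because the exact spelling found in the PSets is not confirmed here; driver_steps is a hypothetical helper, not WMCore code. Sequence qualifiers after a colon (for example HLT:@relval2022) are dropped so that only the step names remain.

```python
import shlex

# Assumption: any of these spellings may carry the cmsDriver step list.
STEP_FLAGS = ("-s", "--step", "--steps")


def driver_steps(command_line):
    """Return the step names passed to cmsDriver, e.g. ['GEN', 'SIM']."""
    tokens = shlex.split(command_line)
    raw = None
    for i, token in enumerate(tokens):
        if token in STEP_FLAGS and i + 1 < len(tokens):
            # Space-separated form: "-s GEN,SIM"
            raw = tokens[i + 1]
        elif "=" in token and token.split("=", 1)[0] in STEP_FLAGS[1:]:
            # Long option with '=': "--steps=GEN,SIM"
            raw = token.split("=", 1)[1]
    if raw is None:
        return []
    # Keep only the step name, dropping qualifiers such as ':pdigi_valid'.
    return [entry.split(":", 1)[0] for entry in raw.split(",") if entry]
```

For a StepChain, the per-step results would then be mapped and joined, whereas a TaskChain task only has the one config to inspect.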

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 127 new failures
    • 429 tests deleted
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 4 warnings and errors that must be fixed
    • 77 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 23 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14384/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 127 new failures
    • 429 tests deleted
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 4 warnings and errors that must be fixed
    • 77 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 23 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14385/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 3 warnings and errors that must be fixed
    • 2 warnings
    • 92 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 21 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14386/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 144 new failures
    • 11 changes in unstable tests
  • Python3 Pylint check: failed
    • 13 warnings and errors that must be fixed
    • 4 warnings
    • 183 comments to review
  • Pylint py3k check: failed
    • 1 warnings
  • Pycodestyle check: succeeded
    • 21 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14387/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 69 new failures
    • 9 changes in unstable tests
  • Python3 Pylint check: failed
    • 20 warnings and errors that must be fixed
    • 12 warnings
    • 249 comments to review
  • Pylint py3k check: failed
    • 1 warnings
  • Pycodestyle check: succeeded
    • 36 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14388/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 69 new failures
    • 9 changes in unstable tests
  • Python3 Pylint check: failed
    • 19 warnings and errors that must be fixed
    • 12 warnings
    • 249 comments to review
  • Pylint py3k check: failed
    • 1 warnings
  • Pycodestyle check: succeeded
    • 36 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14389/artifact/artifacts/PullRequestReport.html

@khurtado
Contributor Author

khurtado commented Jul 28, 2023

@amaltaro Here is the current logic I came up with:

  1. I'm getting all the parameters of the --steps arguments from all configFiles
  2. There are some criteria (still to be validated) for mapping those arguments to the task types we want: GENSIM, DIGI, RECO, MINIAOD, NANOAOD. I could not find any DIGIRECO type in the ES monitoring, so I guess we don't need that anymore?
  3. In WMTask, we create a new field, self.data.CMSTaskType, which is just the task type for anything other than Production or Processing. For Processing, we return DataProcessing; for Production, we put all the CMSTaskTypes gathered from the WMSteps, separated by commas. So taskChains will only have one of those per task, but stepChains will have many.
  4. As part of StdBase, we call the function procTask.setCMSTaskType() after procTask.setTaskType(taskType), so that we persist this information
  5. In the JobCreatorPoller, we save this information in the job pickle as job['CMSTaskType']
  6. In the condor plugin, we store this as a new classad

Does this make sense to you? Anything I'm missing? We would still need to validate the criteria for going from the --steps arguments to the physics CMS task types, and we would need input from Bugra and Sunil for that.
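The mapping criteria in point 2 are explicitly still to be validated, so the rule table below is only a hypothetical placeholder showing the shape such a rule set could take: each cmsDriver step list is matched against marker step names, checking the most downstream tier first, and anything unmatched falls back to UNKNOWN. The marker sets (for instance PAT for MINIAOD and NANO for NANOAOD) are assumptions, not the agreed criteria.

```python
# Hypothetical rules; the real --steps to physics type criteria are not yet agreed.
# Checked from the most downstream tier to the most upstream one.
PHYSICS_RULES = (
    ("NANOAOD", {"NANO"}),
    ("MINIAOD", {"PAT"}),
    ("RECO", {"RECO"}),
    ("DIGI", {"DIGI", "DIGI2RAW"}),
    ("GENSIM", {"GEN", "SIM"}),
)


def physics_type(steps):
    """Map one config's cmsDriver step names to a single physics type."""
    names = set(steps)
    for physics, markers in PHYSICS_RULES:
        # Exact name matching, so e.g. 'RAW2DIGI' does not count as 'DIGI'.
        if names & markers:
            return physics
    return "UNKNOWN"
```

Ordering the rules from downstream to upstream is one possible choice for configs that run several steps at once; whichever precedence is adopted would need the same validation as the markers themselves.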

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 70 new failures
    • 9 changes in unstable tests
  • Python3 Pylint check: failed
    • 19 warnings and errors that must be fixed
    • 12 warnings
    • 249 comments to review
  • Pylint py3k check: failed
    • 1 warnings
  • Pycodestyle check: succeeded
    • 36 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14390/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 69 new failures
    • 9 changes in unstable tests
  • Python3 Pylint check: failed
    • 20 warnings and errors that must be fixed
    • 12 warnings
    • 249 comments to review
  • Pylint py3k check: failed
    • 1 warnings
  • Pycodestyle check: succeeded
    • 36 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14391/artifact/artifacts/PullRequestReport.html

Contributor

@amaltaro amaltaro left a comment

@khurtado Kenyi, I think you are on the right track here to discover and publish the step physics type, which involves:

  1. discovering the physics type in ReqMgr2. I am still unsure how to do this most efficiently, but ideally it should be done at the point where we load the PSet for other purposes.
  2. provide getters/setters at the WMTask layer (or maybe, better, at a WMStep layer, to cover multi-step jobs).
  3. provide this new attribute at job creation (in the arguments that get pickled)
  4. load the job pickle file and pass it over as a job classad
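Point 2 of the list above (getters/setters, with a step layer covering multi-step jobs) can be pictured with the minimal classes below. StepSketch, TaskSketch and their method names are hypothetical stand-ins rather than the WMTask/WMStep API; the task-level rule follows the logic described earlier in this thread: Processing maps to DataProcessing, Production joins the per-step types with commas, and any other task type is reported as-is.

```python
class StepSketch:
    """Hypothetical stand-in for a CMSSW (WMStep) helper holding a physics type."""

    def __init__(self, name):
        self.name = name
        self._physicsType = None

    def setPhysicsType(self, physicsType):
        self._physicsType = physicsType

    def getPhysicsType(self):
        return self._physicsType


class TaskSketch:
    """Hypothetical stand-in for a WMTask that derives its physics task type."""

    def __init__(self, taskType, steps):
        self.taskType = taskType
        self.steps = list(steps)

    def getPhysicsTaskType(self):
        if self.taskType == "Processing":
            return "DataProcessing"
        if self.taskType == "Production":
            # One entry per CMSSW step; steps without a type report UNKNOWN.
            return ",".join(step.getPhysicsType() or "UNKNOWN" for step in self.steps)
        # Merge, Cleanup, LogCollect, ... keep their own task type.
        return self.taskType
```

Keeping the value on each step and deriving the task string on demand means a StepChain task and a single-step TaskChain task go through the same code path.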

I know this is a work in progress, but regarding the naming convention, I think our best option right now would be to call it physicsStepType or physicsTaskType in the source code.

Just an observation: some of the PSets come filled with secondary LFNs, making them fairly heavy to load and parse in memory. Given that the cmsDriver command always comes at the beginning of the file, I wonder if we could parse only the head of the file.
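Reading only the head of the PSet, as suggested above, can be done by scanning a bounded number of lines and stopping at the cmsDriver options line. This sketch assumes that line starts with "# with command line options:" (the header comment cmsDriver-generated configurations usually carry) and that 50 lines are enough; the marker, the limit and both helper names are assumptions to be checked against real PSets.

```python
import itertools

# Assumed header marker written by cmsDriver into generated configurations.
DRIVER_MARKER = "# with command line options:"


def driver_options_from_lines(lines, max_lines=50):
    """Return the cmsDriver options found in the first max_lines lines, or None."""
    for line in itertools.islice(lines, max_lines):
        if line.startswith(DRIVER_MARKER):
            return line[len(DRIVER_MARKER):].strip()
    return None


def read_driver_options(path, max_lines=50):
    """Open a PSet file and scan only its head, never loading the whole file."""
    with open(path, encoding="utf-8") as handle:
        return driver_options_from_lines(handle, max_lines)
```

Because the file object is consumed lazily by itertools.islice, the secondary LFN lists further down the file are never read.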

Another thought that just occurred to me: why can't we have the cmsDriver command provide us with a commented-out line mapping the --steps to an actual physics type? That way we are no longer responsible for this logic, which, honestly speaking, is a hard spot to be in, especially because we are not the origin of this information. What do you think?

@khurtado
Contributor Author

khurtado commented Aug 2, 2023

@amaltaro Thank you! I like the physicsStepType and physicsTaskType names; I will change it to those. I will also look into reading only the first few lines of the PSet instead of loading the whole file.

I agree that if we could have the actual physics type as a commented-out line in the PSet, we would not need to guess it from the --steps command line, and this would be the best option for us. Do you know whom we should request this feature from?

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 69 new failures
    • 9 changes in unstable tests
  • Python3 Pylint check: failed
    • 19 warnings and errors that must be fixed
    • 12 warnings
    • 249 comments to review
  • Pylint py3k check: failed
    • 1 warnings
  • Pycodestyle check: succeeded
    • 36 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14393/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 67 new failures
    • 8 changes in unstable tests
  • Python3 Pylint check: failed
    • 19 warnings and errors that must be fixed
    • 12 warnings
    • 249 comments to review
  • Pylint py3k check: failed
    • 1 warnings
  • Pycodestyle check: succeeded
    • 36 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14394/artifact/artifacts/PullRequestReport.html

@khurtado
Contributor Author

khurtado commented Aug 3, 2023

please test this


@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 4 new failures
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 57 warnings and errors that must be fixed
    • 20 warnings
    • 619 comments to review
  • Pylint py3k check: failed
    • 1 warnings
  • Pycodestyle check: succeeded
    • 142 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14398/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 44 warnings and errors that must be fixed
    • 18 warnings
    • 330 comments to review
  • Pylint py3k check: failed
    • 1 warnings
  • Pycodestyle check: succeeded
    • 75 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14477/artifact/artifacts/PullRequestReport.html

@amaltaro
Contributor

amaltaro commented Sep 8, 2023

Yes. However, a physics type for CMSSW steps seems like a property worth having, even if it is only used for monitoring at present, especially since the work is already done and doesn't substantially increase the load on the WM system. I could remove it and implement the command-string approach at the Task-only level if you think that would be more beneficial. I'm all in on doing it that way for the campaign names, though, since it will mean less work :).

Yes, I fail to see any use case at the moment. But, as you say, we might find other uses for this physicsType at the CMSSW level in the near future. And given that it's all already implemented, let us stick with it (unless unit tests prove too hard to write ;))

@khurtado
Contributor Author

@amaltaro Please, check this PR when you get a chance.

@amaltaro
Contributor

@khurtado Kenyi, it looks like #11710 created conflicts with this PR. Could you please carefully rebase it? Once the rebase has succeeded, you might want to squash the commits as well (or, if you prefer, we can first go through a PR review before you squash the commits).

@khurtado khurtado force-pushed the 10604 branch 2 times, most recently from 4215dfa to 4f667e2 on September 25, 2023 16:26
@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 8 tests deleted
    • 1 tests no longer failing
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 96 warnings and errors that must be fixed
    • 29 warnings
    • 621 comments to review
  • Pylint py3k check: failed
    • 1 warnings
  • Pycodestyle check: succeeded
    • 214 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14504/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 8 tests deleted
    • 1 tests no longer failing
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 96 warnings and errors that must be fixed
    • 29 warnings
    • 621 comments to review
  • Pylint py3k check: failed
    • 1 warnings
  • Pycodestyle check: succeeded
    • 214 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14505/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 1 tests no longer failing
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 44 warnings and errors that must be fixed
    • 18 warnings
    • 331 comments to review
  • Pylint py3k check: failed
    • 1 warnings
  • Pycodestyle check: succeeded
    • 76 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14506/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 1 tests no longer failing
  • Python3 Pylint check: failed
    • 44 warnings and errors that must be fixed
    • 18 warnings
    • 331 comments to review
  • Pylint py3k check: failed
    • 1 warnings
  • Pycodestyle check: succeeded
    • 76 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14507/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 1 tests no longer failing
  • Python3 Pylint check: failed
    • 44 warnings and errors that must be fixed
    • 18 warnings
    • 331 comments to review
  • Pylint py3k check: failed
    • 1 warnings
  • Pycodestyle check: succeeded
    • 76 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14508/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 2 new failures
    • 1 tests no longer failing
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 44 warnings and errors that must be fixed
    • 18 warnings
    • 331 comments to review
  • Pylint py3k check: failed
    • 1 warnings
  • Pycodestyle check: succeeded
    • 76 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14509/artifact/artifacts/PullRequestReport.html

@khurtado
Contributor Author

khurtado commented Sep 25, 2023

@amaltaro I fixed the merge conflicts, squashed the commits and added unit tests. This is ready for another PR review.

@khurtado khurtado requested a review from amaltaro September 25, 2023 18:10
@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 1 tests no longer failing
    • 4 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 95 warnings and errors that must be fixed
    • 24 warnings
    • 502 comments to review
  • Pylint py3k check: failed
    • 1 warnings
  • Pycodestyle check: succeeded
    • 190 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14510/artifact/artifacts/PullRequestReport.html

@khurtado khurtado force-pushed the 10604 branch 2 times, most recently from 68acb7f to 1a0e788 on September 25, 2023 18:22
@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 2 new failures
    • 4 tests added
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 95 warnings and errors that must be fixed
    • 24 warnings
    • 502 comments to review
  • Pylint py3k check: failed
    • 1 warnings
  • Pycodestyle check: succeeded
    • 190 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14511/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 4 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 95 warnings and errors that must be fixed
    • 24 warnings
    • 502 comments to review
  • Pylint py3k check: failed
    • 1 warnings
  • Pycodestyle check: succeeded
    • 190 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14512/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 4 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 95 warnings and errors that must be fixed
    • 24 warnings
    • 502 comments to review
  • Pylint py3k check: failed
    • 1 warnings
  • Pycodestyle check: succeeded
    • 190 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14513/artifact/artifacts/PullRequestReport.html

@amaltaro
Contributor

@khurtado these changes look good to me. From what I can tell, all these changes should be backward compatible (UNKNOWN will be reported for workloads created before ReqMgr2 started running this code). Nonetheless, we need to be careful with this validation and validate both new and old workflows. FYI @todor-ivanov

Kenyi, I updated the initial description to the best of my knowledge. However, please pay more attention to PR and issue descriptions and make sure they are up to date. If you feel the templates need improvement, I am happy to make those changes.

@amaltaro amaltaro merged commit 7fc0c7b into dmwm:master Sep 26, 2023

Successfully merging this pull request may close these issues.

Implement a physics task type for all workflow types in the system
3 participants