Need to pull logs multiple ways and compare them #305
Start by looking at workflow:
|
OK attempting to pull from WMArchive
mac-122185:~ jen_a$ ssh vocms0130.cern.ch
-bash-4.1$ export PYTHONPATH=$PWD/WMArchive/src/python:$PWD/WMCore/src/python:/afs/cern.ch/user/v/valya/public/spark:/usr/lib/spark/python
Valid starting Expires Service principal
OK where am I supposed to find the right area to do this?
|
I keep getting timeouts:
-bash-4.2$ export PYTHONPATH=$PWD/WMArchive/src/python:$PWD/WMCore/src/python:/afs/cern.ch/user/v/valya/public/spark:/usr/lib/spark/python
Load LogFinder
{"logCollect": ["root://castorcms.cern.ch//castor/cern.ch/cms/store/logs/prod/2017/01/WMAgent/pdmvserv_task_HIG-RunIISummer16DR80Premix-01090__v1_T_161206_005801_5743/pdmvserv_task_HIG-RunIISummer16DR80Premix-01090__v1_T_161206_005801_5743-LogCollectForHIG-RunIISummer16DR80Premix-01090_0-cmswn2179-393-logs.tar"]}
-bash-4.2$ xrdcp root://castorcms.cern.ch//castor/cern.ch/cms/store/logs/prod/2017/01/WMAgent/pdmvserv_task_HIG-RunIISummer16DR80Premix-01090__v1_T_161206_005801_5743/pdmvserv_task_HIG-RunIISummer16DR80Premix-01090__v1_T_161206_005801_5743-LogCollectForHIG-RunIISummer16DR80Premix-01090_0-cmswn2179-393-logs.tar .
|
Jen, please try xrdcp from lxplus, and if it times out, post your question to the dmDevelopment forum. xrdcp is outside the scope of the WMArchive project; it is a generic tool for fetching files from remote locations. |
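For anyone scripting this step, here is a minimal sketch of turning the LogFinder result into `xrdcp` invocations. It assumes the JSON part of the output has already been separated from the `Load LogFinder` line, and that the plain `xrdcp <source> <destination>` form used earlier in this thread is what you want. The function name `build_xrdcp_commands` and the sample URL are made up for illustration; nothing here is part of WMArchive.

```python
import json


def build_xrdcp_commands(logfinder_output, dest="."):
    """Return one xrdcp argv list per logCollect URL in the LogFinder JSON."""
    result = json.loads(logfinder_output)
    # An empty list means no logCollect tarball is known (yet).
    urls = result.get("logCollect", [])
    return [["xrdcp", url, dest] for url in urls]


# Shortened, made-up URL for illustration only (not a real tarball location).
sample = '{"logCollect": ["root://castorcms.cern.ch//castor/example-logs.tar"]}'
commands = build_xrdcp_commands(sample)
for argv in commands:
    # Print the command instead of running it, so this works without grid access.
    print(" ".join(argv))
```

Each argv list could be handed to `subprocess.run(argv, timeout=...)` on a node where xrdcp is available (for example lxplus, as suggested above); the sketch only prints the commands.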
still got the timeout: |
And the reply: ---Carl On 02/03/2017 02:36 PM, Jennifer K Adelman-Mccarthy wrote:
So how do I get them off then? https://github.com/dmwm/WMArchive/wiki/Given-a-LFN,-return-a-logArchive-or-logCollect-tarball-location just says to do xrdcp |
OK Dirk had the right solution for pulling the logs: OK how do I know which of the tar files to look into? |
The starting point is an unmerged file, right? As in the wiki page, it returns both the logArchive files and the corresponding output files as a list.
|
Seangchan,
I don't recall exactly, but since there are multiple steps involved, the queries
represent the list of these steps: find files (append to list), find logs (append
to list). Since everything runs on Spark, which is distributed by nature,
I would not rely on a specific order.
Valentin.
ticoann wrote:
The starting point is an unmerged file, right?
As in the wiki page
https://github.com/dmwm/WMArchive/wiki/Given-a-LFN,-return-a-logArchive-or-logCollect-tarball-location
it returns both the logArchive files and the corresponding output files as a list.
I am not sure how the list is organized, but I am assuming the order of the unmerged output files matches the order of the logArchive files, doesn't it? @vkuznet?
```
"queries": ["/store/unmerged/RunIISummer16MiniAODv2/DMV_NNPDF30_Vector_Mphi-500_Mchi-200_gSM-0p25_gDM-1p0_v2_13TeV-powheg/MINIAODSIM/PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/130000/EEDD6A05-01AD-E611-B4C7-001E675A6AA9.root",
"/store/unmerged/RunIISummer16MiniAODv2/DMV_NNPDF30_Vector_Mphi-500_Mchi-200_gSM-0p25_gDM-1p0_v2_13TeV-powheg/MINIAODSIM/PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/130000/58098F3B-8FAC-E611-BC87-001E67DFF7CB.root",
"/store/unmerged/RunIISummer16MiniAODv2/DMV_NNPDF30_Vector_Mphi-500_Mchi-200_gSM-0p25_gDM-1p0_v2_13TeV-powheg/MINIAODSIM/PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/130000/46B86B27-01AD-E611-9A98-001E67E69E32.root",
"/store/unmerged/RunIISummer16MiniAODv2/DMV_NNPDF30_Vector_Mphi-500_Mchi-200_gSM-0p25_gDM-1p0_v2_13TeV-powheg/MINIAODSIM/PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/130000/DCFBE0FB-00AD-E611-BD13-001E67DFFF5F.root",
"/store/unmerged/RunIISummer16MiniAODv2/DMV_NNPDF30_Vector_Mphi-500_Mchi-200_gSM-0p25_gDM-1p0_v2_13TeV-powheg/MINIAODSIM/PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/130000/689521FA-00AD-E611-A6AB-001E67A404B5.root",
"/store/unmerged/RunIISummer16MiniAODv2/DMV_NNPDF30_Vector_Mphi-500_Mchi-200_gSM-0p25_gDM-1p0_v2_13TeV-powheg/MINIAODSIM/PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/130000/549A3202-01AD-E611-A745-001E67A3FE66.root",
"/store/unmerged/RunIISummer16MiniAODv2/DMV_NNPDF30_Vector_Mphi-500_Mchi-200_gSM-0p25_gDM-1p0_v2_13TeV-powheg/MINIAODSIM/PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/130000/3CB5DEF8-00AD-E611-A256-001E67A3FC1D.root",
"/store/unmerged/RunIISummer16MiniAODv2/DMV_NNPDF30_Vector_Mphi-500_Mchi-200_gSM-0p25_gDM-1p0_v2_13TeV-powheg/MINIAODSIM/PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/130000/969189F9-00AD-E611-82B1-001E67A3F8A8.root",
"/store/unmerged/RunIISummer16MiniAODv2/DMV_NNPDF30_Vector_Mphi-500_Mchi-200_gSM-0p25_gDM-1p0_v2_13TeV-powheg/MINIAODSIM/PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/130000/F63804FA-00AD-E611-95A3-001E67A3AEB8.root",
"/store/unmerged/RunIISummer16MiniAODv2/DMV_NNPDF30_Vector_Mphi-500_Mchi-200_gSM-0p25_gDM-1p0_v2_13TeV-powheg/MINIAODSIM/PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/130000/C84C5B0D-01AD-E611-877B-001E67A42161.root",
"/store/unmerged/RunIISummer16MiniAODv2/DMV_NNPDF30_Vector_Mphi-500_Mchi-200_gSM-0p25_gDM-1p0_v2_13TeV-powheg/MINIAODSIM/PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/130000/52C56AFB-00AD-E611-ACEC-001E67DFFB4F.root",
"/store/unmerged/RunIISummer16MiniAODv2/DMV_NNPDF30_Vector_Mphi-500_Mchi-200_gSM-0p25_gDM-1p0_v2_13TeV-powheg/MINIAODSIM/PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/130000/8C5F11F9-00AD-E611-B3C5-001E675A6653.root",
"/store/unmerged/RunIISummer16MiniAODv2/DMV_NNPDF30_Vector_Mphi-500_Mchi-200_gSM-0p25_gDM-1p0_v2_13TeV-powheg/MINIAODSIM/PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/130000/6CD954F7-00AD-E611-83C6-001E67DDCC81.root",
"/store/unmerged/logs/prod/2016/11/17/pdmvserv_EXO-RunIISummer16MiniAODv2-00065_00033_v0__161115_160653_2105/StepOneProc/0000/0/40227668-ac52-11e6-9b63-02163e017c3c-0-0-logArchive.tar.gz",
"/store/unmerged/logs/prod/2016/11/17/pdmvserv_EXO-RunIISummer16MiniAODv2-00065_00033_v0__161115_160653_2105/StepOneProc/0000/0/931d89dc-ac4f-11e6-9b63-02163e017c3c-4-0-logArchive.tar.gz",
"/store/unmerged/logs/prod/2016/11/17/pdmvserv_EXO-RunIISummer16MiniAODv2-00065_00033_v0__161115_160653_2105/StepOneProc/0000/0/40227668-ac52-11e6-9b63-02163e017c3c-3-0-logArchive.tar.gz",
"/store/unmerged/logs/prod/2016/11/17/pdmvserv_EXO-RunIISummer16MiniAODv2-00065_00033_v0__161115_160653_2105/StepOneProc/0000/0/1fc3af04-ac52-11e6-9b63-02163e017c3c-5-0-logArchive.tar.gz",
"/store/unmerged/logs/prod/2016/11/17/pdmvserv_EXO-RunIISummer16MiniAODv2-00065_00033_v0__161115_160653_2105/StepOneProc/0000/0/931d89dc-ac4f-11e6-9b63-02163e017c3c-2-0-logArchive.tar.gz",
"/store/unmerged/logs/prod/2016/11/17/pdmvserv_EXO-RunIISummer16MiniAODv2-00065_00033_v0__161115_160653_2105/StepOneProc/0000/0/40227668-ac52-11e6-9b63-02163e017c3c-2-0-logArchive.tar.gz",
"/store/unmerged/logs/prod/2016/11/17/pdmvserv_EXO-RunIISummer16MiniAODv2-00065_00033_v0__161115_160653_2105/StepOneProc/0000/0/931d89dc-ac4f-11e6-9b63-02163e017c3c-0-0-logArchive.tar.gz",
"/store/unmerged/logs/prod/2016/11/17/pdmvserv_EXO-RunIISummer16MiniAODv2-00065_00033_v0__161115_160653_2105/StepOneProc/0000/0/931d89dc-ac4f-11e6-9b63-02163e017c3c-1-0-logArchive.tar.gz",
"/store/unmerged/logs/prod/2016/11/17/pdmvserv_EXO-RunIISummer16MiniAODv2-00065_00033_v0__161115_160653_2105/StepOneProc/0000/0/931d89dc-ac4f-11e6-9b63-02163e017c3c-3-0-logArchive.tar.gz",
"/store/unmerged/logs/prod/2016/11/17/pdmvserv_EXO-RunIISummer16MiniAODv2-00065_00033_v0__161115_160653_2105/StepOneProc/0000/0/40227668-ac52-11e6-9b63-02163e017c3c-1-0-logArchive.tar.gz",
"/store/unmerged/logs/prod/2016/11/17/pdmvserv_EXO-RunIISummer16MiniAODv2-00065_00033_v0__161115_160653_2105/StepOneProc/0000/0/1fc3af04-ac52-11e6-9b63-02163e017c3c-0-0-logArchive.tar.gz",
"/store/unmerged/logs/prod/2016/11/17/pdmvserv_EXO-RunIISummer16MiniAODv2-00065_00033_v0__161115_160653_2105/StepOneProc/0000/0/1fc3af04-ac52-11e6-9b63-02163e017c3c-1-0-logArchive.tar.gz",
"/store/unmerged/logs/prod/2016/11/17/pdmvserv_EXO-RunIISummer16MiniAODv2-00065_00033_v0__161115_160653_2105/StepOneProc/0000/0/1fc3af04-ac52-11e6-9b63-02163e017c3c-4-0-logArchive.tar.gz"]}
```
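Since the order of the list should not be relied on, one option is to separate the entries by name rather than by position. The following is a minimal sketch that assumes the result is available as a Python dict with a `queries` key, and that log tarballs can be recognised by the `logArchive.tar.gz` suffix seen in the listing above. The function name `split_queries` and the shortened paths are hypothetical. Note that this only partitions the list; it does not tell you which logArchive belongs to which output file.

```python
def split_queries(result):
    """Partition a result's "queries" list into output files and logArchive tarballs."""
    outputs, logs = [], []
    for lfn in result.get("queries", []):
        # Classify by file name, not by position in the list.
        if lfn.endswith("logArchive.tar.gz"):
            logs.append(lfn)
        else:
            outputs.append(lfn)
    return outputs, logs


# Shortened, made-up LFNs for illustration only.
example = {
    "queries": [
        "/store/unmerged/example/AAAA.root",
        "/store/unmerged/logs/prod/example/aaaa-0-0-logArchive.tar.gz",
        "/store/unmerged/example/BBBB.root",
    ]
}
outputs, logs = split_queries(example)
print(len(outputs), "output files,", len(logs), "logArchive tarballs")
```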
|
Can somebody help me find this log file? When I look in WMStats I am looking for:
The "L" in WMStats tells me it ran on vocms0126.cern.ch
WMBS job id: 3875037
I go onto vocms0126.cern.ch and go look there first, since the job is still in cooloff...
So I try WMArchive:
7/02/10 18:53:37 INFO EventLoggingListener: Logging events to hdfs:///user/spark/applicationHistory/local-1486749212555
Load LogFinder
{"logCollect": []}
And it doesn't appear to be there either. Help!
Jen
|
Hi Jen, the logCollect job probably hasn't finished (or started) yet. That is probably why you can't find the logCollect location in WMArchive. The log file is not archived in JobArchiver yet. I am not sure whether this is a bug or it is just not ready yet. You can find the log files here: /data/srv/wmagent/current/install/wmagent/JobCreator/JobCache/pdmvserv_task_TSG-PhaseIFall16GS-00005__v1_T_170205_161623_9267/TSG-PhaseIFall16GS-00005_0/TSG-PhaseIFall16GS-00005_0MergeRAWSIMoutput/TSG-PhaseIFall16DR-00003_0/JobCollection_384033_0/job_3875037 I am not sure how long the files will be preserved there, but I think they will move to JobArchiver when they get deleted.
|
I was looking in JobArchiver.. wonder if they are still in JobCreator.. let me go look OK so there they are..
if you still can't find them... go to the bar.... |
Bar sounds good. :-) But before you hit the bar, you can check WMArchive as well. The logCollect job may finish while the workflow is still running. But even if the workflow is finished, the logCollect job still might not be done, since we don't count it toward completed status. Anyway, searching for the logCollect job in WMArchive is not ideal for currently running workflows. |
Yep, but the deal is we need to be able to find them and get them posted in GGUS tickets for sites if we suspect site issues. So the fact that the logs "move" while workflows are in flight makes it tricky to find them. BUT if we know all the landing places in the middle, at least we know where to look, and in what order, when we need to. |
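The "where to look, in what order" idea could be sketched as an ordered list of candidate locations that is checked one by one. Everything below is an assumption for illustration: the helper name `first_existing`, the placeholder paths, and the order itself (JobCreator's JobCache first, then JobArchiver), which only reflects what was guessed earlier in this thread. WMArchive would still need a separate LogFinder lookup, since it is not a local path.

```python
import os


def first_existing(candidates):
    """Return (label, path) for the first candidate whose path exists, else None."""
    for label, path in candidates:
        if os.path.exists(path):
            return label, path
    return None


# Placeholder paths; substitute the real job directories on the agent.
candidates = [
    ("JobCreator JobCache", "/hypothetical/JobCreator/JobCache/job_0000000"),
    ("JobArchiver", "/hypothetical/JobArchiver/job_0000000"),
]
hit = first_existing(candidates)
print(hit if hit else "not on the agent; try WMArchive next")
```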
Last week at the WMArchive meeting it was requested that we pull logs "the old way" by pulling them off the agent:
https://twiki.cern.ch/twiki/bin/view/CMSPublic/CompOpsWorkflowTeamLeaderResponsibilities#Retrieving_log_files_from_failed
and via the instructions at:
https://github.com/dmwm/WMArchive/wiki/Given-a-LFN,-return-a-logArchive-or-logCollect-tarball-location
and looking at what Unified pulls:
to see if we are getting the same information all 3 ways, and if that information is what the sites need to debug workflows:
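One way to check whether the three routes yield the same information is to collect the log file names from each and compare them as sets. This is only a sketch: the function name `compare_sources`, the source labels, and the file names are invented for illustration, and gathering the actual names from the agent, WMArchive, and Unified is left to the procedures linked above.

```python
def compare_sources(sources):
    """Given {source_name: iterable of log names}, return (common, missing_per_source)."""
    sets = {name: set(items) for name, items in sources.items()}
    everything = set().union(*sets.values()) if sets else set()
    common = set.intersection(*sets.values()) if sets else set()
    # For each source, list what the other sources have that it lacks.
    missing = {name: sorted(everything - found) for name, found in sets.items()}
    return sorted(common), missing


# Made-up file names for illustration only.
sources = {
    "agent": ["a.log", "b.log"],
    "wmarchive": ["a.log"],
    "unified": ["a.log", "c.log"],
}
common, missing = compare_sources(sources)
print("in all three:", common)
for name, names in missing.items():
    print(name, "is missing:", names)
```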