
Combines block records in single dict that includes all associated wfs #11245

Merged: 4 commits merged into dmwm:master from fixDeleteBlocks on Aug 17, 2022

Conversation

@germanfgv (Contributor) commented Aug 5, 2022

Fixes #11244

Status

Ready

Description

Turns the list of blocks into a dictionary, avoiding duplicate records for a single block. Turns 'workflowname' into a set of workflows that generated the block (a sketch is given below, after this template).

Is it backward compatible (if not, which system it affects?)

YES

Related PRs

None

External dependencies / deployment changes

None
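
As a rough illustration of the change described above, here is a minimal sketch (not the actual DAO code) of how the per-workflow, per-site rows returned by the query could be folded into a single dictionary entry per block. The record keys 'blockname', 'name' and 'site' are taken from the code fragments quoted in the review below; the remaining keys ('pnn', 'path', 'create_time') are assumed from the SQL column names and may differ in the real code.

def formatCompletedBlocks(listResults):
    """Sketch only: collapse per-workflow/per-site rows into one record per block."""
    dictResults = {}
    for record in listResults:
        blockname = record['blockname']
        if blockname not in dictResults:
            # First occurrence of this block: keep the block-level columns once
            dictResults[blockname] = {
                'blockname': blockname,
                'location': record['pnn'],                 # assumed key, from the SQL column
                'dataset': record['path'],                 # assumed key, from the SQL column
                'blockCreateTime': record['create_time'],  # assumed key, from the SQL column
                'workflowNames': set(),
                'sites': set(),
            }
        # Accumulate the only fields that differ between rows of the same block
        dictResults[blockname]['workflowNames'].add(record['name'])
        dictResults[blockname]['sites'].add(record['site'])
    return dictResults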

@germanfgv force-pushed the fixDeleteBlocks branch 3 times, most recently from a3b0d59 to 354acab on August 6, 2022 00:01
@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 2 tests no longer failing
    • 3 changes in unstable tests
  • Python3 Pylint check: succeeded
    • 6 warnings
    • 25 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 1 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13487/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 1 tests no longer failing
    • 2 changes in unstable tests
  • Python3 Pylint check: succeeded
    • 6 warnings
    • 25 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 1 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13486/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 2 tests no longer failing
    • 2 changes in unstable tests
  • Python3 Pylint check: succeeded
    • 6 warnings
    • 25 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 1 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13488/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 1 changes in unstable tests
  • Python3 Pylint check: succeeded
    • 6 warnings
    • 25 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 1 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13489/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 1 tests no longer failing
    • 3 changes in unstable tests
  • Python3 Pylint check: succeeded
    • 6 warnings
    • 25 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 1 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13490/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 2 changes in unstable tests
  • Python3 Pylint check: succeeded
    • 6 warnings
    • 25 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 1 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13491/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 1 tests no longer failing
  • Python3 Pylint check: succeeded
    • 6 warnings
    • 25 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13504/artifact/artifacts/PullRequestReport.html

@amaltaro (Contributor) commented Aug 8, 2022

@germanfgv I am not sure I fully understand the problem you reported. However, looking into this line that you implemented:

if block['workflowNames'].issubset(deletableWfsDict)

it makes me think that, regardless of whether we have your changes in place or not, block['workflowNames'] will be a subset of deletableWfsDict whether it has only one workflow or multiple workflows in the set.

Unless completedWorkflows is always fully built before deletableWorkflows; in that case, yes, it would only evaluate to True once all the completedWorkflows are also deletableWorkflows.

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 2 tests no longer failing
    • 1 changes in unstable tests
  • Python3 Pylint check: succeeded
    • 6 warnings
    • 25 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13505/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 1 tests no longer failing
    • 2 changes in unstable tests
  • Python3 Pylint check: succeeded
    • 6 warnings
    • 25 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13506/artifact/artifacts/PullRequestReport.html

@germanfgv (Contributor Author) commented Aug 12, 2022

@germanfgv I am not sure I fully understand the problem you reported. However, looking into this line that you implemented:

if block['workflowNames'].issubset(deletableWfsDict)

it makes me think that, regardless of whether we have your changes in place or not, block['workflowNames'] will be a subset of deletableWfsDict whether it has only one workflow or multiple workflows in the set.

Unless completedWorkflows is always fully built before deletableWorkflows; in that case, yes, it would only evaluate to True once all the completedWorkflows are also deletableWorkflows.

Maybe this concern is addressed with my second commit. Before, we were disregarding incomplete workflows in the query due to HAVING COUNT(*) = SUM(dbsbuffer_workflow.completed).

Because of this, you are right that if block['workflowNames'].issubset(deletableWfsDict) would almost always evaluate to true (almost, because deletable and completed are not the same: e.g. a completed Repack workflow cannot be deleted while its output is being used by a PromptReco workflow, so not every completed workflow is deletable).

By removing the completion check from the query in 0bf4285, we can now properly avoid deleting blocks that belong to incomplete workflows.
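
For context, the deletion decision being discussed could look roughly like the following sketch (not the exact RucioInjector code; deleteBlock is a hypothetical helper, and deletableWfsDict is the name used in the quoted line above). Because the block records now carry every workflow that wrote into the block, including incomplete ones, the subset check only passes once all of them are deletable.

# Sketch of the check discussed above, not the actual component code.
for blockName, block in completedBlocks.items():
    if block['workflowNames'].issubset(deletableWfsDict):
        # Every workflow that wrote into this block is deletable,
        # so the block itself can be safely deleted.
        deleteBlock(blockName)  # hypothetical helper
    else:
        # At least one associated workflow is incomplete or otherwise
        # not deletable yet, so keep the block for now.
        continue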

@todor-ivanov (Contributor) left a comment


Hi @germanfgv, thanks for the suggested change.
I still have some concerns about this extra aggregation that is needed in the Python code. It is a sign that we are ignoring the uniqueness of the records as they come from the GROUP BY SQL statement, which seems dangerous to me.
In parallel to the existing changes, which as far as I understood are already running on one of the T0 machines in production, I'd like to test a few of those SQL queries and DAO outcomes myself on one of the machines affected by this bug, if you do not mind.

@@ -49,7 +49,6 @@ class GetCompletedBlocks(DBFormatter):
dbsbuffer_dataset_subscription.site,
dbsbuffer_workflow.name,
dbsbuffer_block.create_time
HAVING COUNT(*) = SUM(dbsbuffer_workflow.completed)
Contributor

Why are you removing this protection here? IIRC this is meant to ensure that all the workflows associated with a block are in status completed.

Contributor Author

That was useful in GetDeletableBlocks.py because that query was not grouping by workflow name. Here, if you exclude records with workflows that are not complete, you will still have other records with other workflows associated with the block, so the block will still end up in the results, even though it has uncompleted associated workflows.

Even worse, as we are excluding the uncompleted workflows from the query, later we are not taking those uncompleted workflows into account when we compare against the deletableWfsDict list.

@todor-ivanov (Contributor) commented Aug 12, 2022

because that query was not grouping by workflow name

That is correct.
I have double-checked now that introducing dbsbuffer_workflow.name in the SELECT and GROUP BY clauses splits the grouping further, creating multiple groups per block, one for every associated workflow. It still counts the completed workflows, but rows enter the result every time a workflow completes: if we have one block with three associated PromptReco requests of which only two are completed, that block will be returned twice in the output. Once the third PromptReco completes, the same block enters the output three times.

This makes the rest of the logic quite hard to follow indeed.

Contributor

Since this is quite a subtle detail, and difficult to understand and remember, let me give some concrete examples:

Here is the output from the current SQL query with the counts per group included in the output (filtered for a single block):

SQL> SELECT
  2     count(*),
  3     dbsbuffer_block.blockname,
  4     dbsbuffer_location.pnn,
  5     dbsbuffer_dataset.path,
  6     dbsbuffer_dataset_subscription.site,
  7     dbsbuffer_workflow.name,
  8     dbsbuffer_block.create_time
  9  FROM dbsbuffer_dataset_subscription
 10  INNER JOIN dbsbuffer_dataset ON
 11     dbsbuffer_dataset.id = dbsbuffer_dataset_subscription.dataset_id
 12  INNER JOIN dbsbuffer_block ON
 13     dbsbuffer_block.dataset_id = dbsbuffer_dataset_subscription.dataset_id
 14  INNER JOIN dbsbuffer_file ON
 15     dbsbuffer_file.block_id = dbsbuffer_block.id
 16  INNER JOIN dbsbuffer_workflow ON
 17     dbsbuffer_workflow.id = dbsbuffer_file.workflow
 18  INNER JOIN dbsbuffer_location ON
 19     dbsbuffer_location.id = dbsbuffer_block.location
 20  WHERE dbsbuffer_dataset_subscription.delete_blocks = 1
 21  AND dbsbuffer_dataset_subscription.subscribed = 1
 22  AND dbsbuffer_block.status = 'Closed'
 23  AND dbsbuffer_block.deleted = 0
 24  AND dbsbuffer_block.blockname = '/ZeroBias/Run2022C-PromptReco-v1/AOD#d54f99e8-9d28-4567-921f-edce2a207f00'
 25   GROUP BY dbsbuffer_block.blockname,
 26     dbsbuffer_location.pnn,
 27     dbsbuffer_dataset.path,
 28     dbsbuffer_dataset_subscription.site,
 29     dbsbuffer_workflow.name,
 30     dbsbuffer_block.create_time
 31  HAVING COUNT(*) = SUM(dbsbuffer_workflow.completed)
 32  ORDER BY dbsbuffer_workflow.name;

  COUNT(*) BLOCKNAME														    PNN 	    PATH				     SITE		NAME					 CREATE_TIME
---------- ------------------------------------------------------------------------------------------------------------------------ --------------- ---------------------------------------- ------------------ ---------------------------------------- -----------
	82 /ZeroBias/Run2022C-PromptReco-v1/AOD#d54f99e8-9d28-4567-921f-edce2a207f00						    T0_CH_CERN_Disk /ZeroBias/Run2022C-PromptReco-v1/AOD     T1_US_FNAL_Disk	PromptReco_Run357080_ZeroBias		  1660249466
	82 /ZeroBias/Run2022C-PromptReco-v1/AOD#d54f99e8-9d28-4567-921f-edce2a207f00						    T0_CH_CERN_Disk /ZeroBias/Run2022C-PromptReco-v1/AOD     T1_US_FNAL_MSS	PromptReco_Run357080_ZeroBias		  1660249466

Now let's drop the HAVING COUNT(*) = SUM(dbsbuffer_workflow.completed) constraint:

SQL> SELECT
  2     count(*),
  3     dbsbuffer_block.blockname,
  4     dbsbuffer_location.pnn,
  5     dbsbuffer_dataset.path,
  6     dbsbuffer_dataset_subscription.site,
  7     dbsbuffer_workflow.name,
  8     dbsbuffer_block.create_time
  9  FROM dbsbuffer_dataset_subscription
 10  INNER JOIN dbsbuffer_dataset ON
 11     dbsbuffer_dataset.id = dbsbuffer_dataset_subscription.dataset_id
 12  INNER JOIN dbsbuffer_block ON
 13     dbsbuffer_block.dataset_id = dbsbuffer_dataset_subscription.dataset_id
 14  INNER JOIN dbsbuffer_file ON
 15     dbsbuffer_file.block_id = dbsbuffer_block.id
 16  INNER JOIN dbsbuffer_workflow ON
 17     dbsbuffer_workflow.id = dbsbuffer_file.workflow
 18  INNER JOIN dbsbuffer_location ON
 19     dbsbuffer_location.id = dbsbuffer_block.location
 20  WHERE dbsbuffer_dataset_subscription.delete_blocks = 1
 21  AND dbsbuffer_dataset_subscription.subscribed = 1
 22  AND dbsbuffer_block.status = 'Closed'
 23  AND dbsbuffer_block.deleted = 0
 24  AND dbsbuffer_block.blockname = '/ZeroBias/Run2022C-PromptReco-v1/AOD#d54f99e8-9d28-4567-921f-edce2a207f00'
 25   GROUP BY dbsbuffer_block.blockname,
 26     dbsbuffer_location.pnn,
 27     dbsbuffer_dataset.path,
 28     dbsbuffer_dataset_subscription.site,
 29     dbsbuffer_workflow.name,
 30     dbsbuffer_block.create_time
 31  ORDER BY dbsbuffer_workflow.name;

  COUNT(*) BLOCKNAME														    PNN 	    PATH				     SITE		NAME					 CREATE_TIME
---------- ------------------------------------------------------------------------------------------------------------------------ --------------- ---------------------------------------- ------------------ ---------------------------------------- -----------
	82 /ZeroBias/Run2022C-PromptReco-v1/AOD#d54f99e8-9d28-4567-921f-edce2a207f00						    T0_CH_CERN_Disk /ZeroBias/Run2022C-PromptReco-v1/AOD     T1_US_FNAL_Disk	PromptReco_Run357080_ZeroBias		  1660249466
	82 /ZeroBias/Run2022C-PromptReco-v1/AOD#d54f99e8-9d28-4567-921f-edce2a207f00						    T0_CH_CERN_Disk /ZeroBias/Run2022C-PromptReco-v1/AOD     T1_US_FNAL_MSS	PromptReco_Run357080_ZeroBias		  1660249466
	77 /ZeroBias/Run2022C-PromptReco-v1/AOD#d54f99e8-9d28-4567-921f-edce2a207f00						    T0_CH_CERN_Disk /ZeroBias/Run2022C-PromptReco-v1/AOD     T1_US_FNAL_Disk	PromptReco_Run357081_ZeroBias		  1660249466
	77 /ZeroBias/Run2022C-PromptReco-v1/AOD#d54f99e8-9d28-4567-921f-edce2a207f00						    T0_CH_CERN_Disk /ZeroBias/Run2022C-PromptReco-v1/AOD     T1_US_FNAL_MSS	PromptReco_Run357081_ZeroBias		  1660249466
	 2 /ZeroBias/Run2022C-PromptReco-v1/AOD#d54f99e8-9d28-4567-921f-edce2a207f00						    T0_CH_CERN_Disk /ZeroBias/Run2022C-PromptReco-v1/AOD     T1_US_FNAL_Disk	PromptReco_Run357096_ZeroBias		  1660249466
	 2 /ZeroBias/Run2022C-PromptReco-v1/AOD#d54f99e8-9d28-4567-921f-edce2a207f00						    T0_CH_CERN_Disk /ZeroBias/Run2022C-PromptReco-v1/AOD     T1_US_FNAL_MSS	PromptReco_Run357096_ZeroBias		  1660249466
	 8 /ZeroBias/Run2022C-PromptReco-v1/AOD#d54f99e8-9d28-4567-921f-edce2a207f00						    T0_CH_CERN_Disk /ZeroBias/Run2022C-PromptReco-v1/AOD     T1_US_FNAL_Disk	PromptReco_Run357098_ZeroBias		  1660249466
	 8 /ZeroBias/Run2022C-PromptReco-v1/AOD#d54f99e8-9d28-4567-921f-edce2a207f00						    T0_CH_CERN_Disk /ZeroBias/Run2022C-PromptReco-v1/AOD     T1_US_FNAL_MSS	PromptReco_Run357098_ZeroBias		  1660249466
	 7 /ZeroBias/Run2022C-PromptReco-v1/AOD#d54f99e8-9d28-4567-921f-edce2a207f00						    T0_CH_CERN_Disk /ZeroBias/Run2022C-PromptReco-v1/AOD     T1_US_FNAL_Disk	PromptReco_Run357100_ZeroBias		  1660249466
	 7 /ZeroBias/Run2022C-PromptReco-v1/AOD#d54f99e8-9d28-4567-921f-edce2a207f00						    T0_CH_CERN_Disk /ZeroBias/Run2022C-PromptReco-v1/AOD     T1_US_FNAL_MSS	PromptReco_Run357100_ZeroBias		  1660249466
	13 /ZeroBias/Run2022C-PromptReco-v1/AOD#d54f99e8-9d28-4567-921f-edce2a207f00						    T0_CH_CERN_Disk /ZeroBias/Run2022C-PromptReco-v1/AOD     T1_US_FNAL_Disk	PromptReco_Run357101_ZeroBias		  1660249466

	13 /ZeroBias/Run2022C-PromptReco-v1/AOD#d54f99e8-9d28-4567-921f-edce2a207f00						    T0_CH_CERN_Disk /ZeroBias/Run2022C-PromptReco-v1/AOD     T1_US_FNAL_MSS	PromptReco_Run357101_ZeroBias		  1660249466
	 9 /ZeroBias/Run2022C-PromptReco-v1/AOD#d54f99e8-9d28-4567-921f-edce2a207f00						    T0_CH_CERN_Disk /ZeroBias/Run2022C-PromptReco-v1/AOD     T1_US_FNAL_Disk	PromptReco_Run357102_ZeroBias		  1660249466
	 9 /ZeroBias/Run2022C-PromptReco-v1/AOD#d54f99e8-9d28-4567-921f-edce2a207f00						    T0_CH_CERN_Disk /ZeroBias/Run2022C-PromptReco-v1/AOD     T1_US_FNAL_MSS	PromptReco_Run357102_ZeroBias		  1660249466
	 1 /ZeroBias/Run2022C-PromptReco-v1/AOD#d54f99e8-9d28-4567-921f-edce2a207f00						    T0_CH_CERN_Disk /ZeroBias/Run2022C-PromptReco-v1/AOD     T1_US_FNAL_Disk	PromptReco_Run357106_ZeroBias		  1660249466
	 1 /ZeroBias/Run2022C-PromptReco-v1/AOD#d54f99e8-9d28-4567-921f-edce2a207f00						    T0_CH_CERN_Disk /ZeroBias/Run2022C-PromptReco-v1/AOD     T1_US_FNAL_MSS	PromptReco_Run357106_ZeroBias		  1660249466
	 2 /ZeroBias/Run2022C-PromptReco-v1/AOD#d54f99e8-9d28-4567-921f-edce2a207f00						    T0_CH_CERN_Disk /ZeroBias/Run2022C-PromptReco-v1/AOD     T1_US_FNAL_Disk	PromptReco_Run357112_ZeroBias		  1660249466
	 2 /ZeroBias/Run2022C-PromptReco-v1/AOD#d54f99e8-9d28-4567-921f-edce2a207f00						    T0_CH_CERN_Disk /ZeroBias/Run2022C-PromptReco-v1/AOD     T1_US_FNAL_MSS	PromptReco_Run357112_ZeroBias		  1660249466

18 rows selected.

It is obvious that this block will keep entering the output of the original query every time a PromptReco completes, until the original query finally returns exactly the 18 records shown above.

And indeed the expected behavior is achieved once we remove the dbsbuffer_workflow.name field, which was splitting the groups further.

Here is the query without the extra splitting and without the original constraint that all the associated workflows be completed:

SQL> SELECT
  2     count(*),
  3     dbsbuffer_block.blockname,
  4     dbsbuffer_location.pnn,
  5     dbsbuffer_dataset.path,
  6     dbsbuffer_dataset_subscription.site,
  7     dbsbuffer_block.create_time
  8  FROM dbsbuffer_dataset_subscription
  9  INNER JOIN dbsbuffer_dataset ON
 10     dbsbuffer_dataset.id = dbsbuffer_dataset_subscription.dataset_id
 11  INNER JOIN dbsbuffer_block ON
 12     dbsbuffer_block.dataset_id = dbsbuffer_dataset_subscription.dataset_id
 13  INNER JOIN dbsbuffer_file ON
 14     dbsbuffer_file.block_id = dbsbuffer_block.id
 15  INNER JOIN dbsbuffer_workflow ON
 16     dbsbuffer_workflow.id = dbsbuffer_file.workflow
 17  INNER JOIN dbsbuffer_location ON
 18     dbsbuffer_location.id = dbsbuffer_block.location
 19  WHERE dbsbuffer_dataset_subscription.delete_blocks = 1
 20  AND dbsbuffer_dataset_subscription.subscribed = 1
 21  AND dbsbuffer_block.status = 'Closed'
 22  AND dbsbuffer_block.deleted = 0
 23  AND dbsbuffer_block.blockname = '/ZeroBias/Run2022C-PromptReco-v1/AOD#d54f99e8-9d28-4567-921f-edce2a207f00'
 24   GROUP BY dbsbuffer_block.blockname,
 25     dbsbuffer_location.pnn,
 26     dbsbuffer_dataset.path,
 27     dbsbuffer_dataset_subscription.site,
 28     dbsbuffer_block.create_time ;

  COUNT(*) BLOCKNAME														    PNN 	    PATH				     SITE		CREATE_TIME
---------- ------------------------------------------------------------------------------------------------------------------------ --------------- ---------------------------------------- ------------------ -----------
       201 /ZeroBias/Run2022C-PromptReco-v1/AOD#d54f99e8-9d28-4567-921f-edce2a207f00						    T0_CH_CERN_Disk /ZeroBias/Run2022C-PromptReco-v1/AOD     T1_US_FNAL_Disk	 1660249466
       201 /ZeroBias/Run2022C-PromptReco-v1/AOD#d54f99e8-9d28-4567-921f-edce2a207f00						    T0_CH_CERN_Disk /ZeroBias/Run2022C-PromptReco-v1/AOD     T1_US_FNAL_MSS	 1660249466

And here is the result if we apply the constraint for workflow completion.

SQL> SELECT
  2     count(*),
  3     dbsbuffer_block.blockname,
  4     dbsbuffer_location.pnn,
  5     dbsbuffer_dataset.path,
  6     dbsbuffer_dataset_subscription.site,
  7     dbsbuffer_block.create_time
  8  FROM dbsbuffer_dataset_subscription
  9  INNER JOIN dbsbuffer_dataset ON
 10     dbsbuffer_dataset.id = dbsbuffer_dataset_subscription.dataset_id
 11  INNER JOIN dbsbuffer_block ON
 12     dbsbuffer_block.dataset_id = dbsbuffer_dataset_subscription.dataset_id
 13  INNER JOIN dbsbuffer_file ON
 14     dbsbuffer_file.block_id = dbsbuffer_block.id
 15  INNER JOIN dbsbuffer_workflow ON
 16     dbsbuffer_workflow.id = dbsbuffer_file.workflow
 17  INNER JOIN dbsbuffer_location ON
 18     dbsbuffer_location.id = dbsbuffer_block.location
 19  WHERE dbsbuffer_dataset_subscription.delete_blocks = 1
 20  AND dbsbuffer_dataset_subscription.subscribed = 1
 21  AND dbsbuffer_block.status = 'Closed'
 22  AND dbsbuffer_block.deleted = 0
 23  AND dbsbuffer_block.blockname = '/ZeroBias/Run2022C-PromptReco-v1/AOD#d54f99e8-9d28-4567-921f-edce2a207f00'
 24   GROUP BY dbsbuffer_block.blockname,
 25     dbsbuffer_location.pnn,
 26     dbsbuffer_dataset.path,
 27     dbsbuffer_dataset_subscription.site,
 28     dbsbuffer_block.create_time
 29  HAVING COUNT(*) = SUM(dbsbuffer_workflow.completed) ;

no rows selected

It returns no rows as expected, because there are still workflows associated with this block which are not completed.

Contributor

I am afraid the same faulty logic applies to the extra granularity added by the multiple sites, as well as by the multiple associated workflows.

dictResults = {}
for record in listResults:
    # Populates results dict and adds all workflows of the same block to a single record
    blockname = record['blockname']
Contributor

Please use camelCase for the Python variables.

for record in listResults:
    # Populates results dict and adds all workflows of the same block to a single record
    blockname = record['blockname']
    if blockname in dictResults:
Contributor

In general the DAO must return the list of results as it comes from the database, and no extra data reformatting should be needed. It seems to me that if this extra aggregation by block name is needed here in the Python code, then something is not quite working as we expect in the GROUP BY clause of the SQL query.

Contributor Author

We can do this aggregation outside of the DAO. Also, we may be able to modify the query so it ignores blocks with workflows that are not completed, but that would probably be redundant with the deletableWorkflows query.

Contributor

Thanks @germanfgv! We discussed the problem at length and tested it multiple times with you, and we now know in detail what is going on. I'd suggest we continue as it is. Also, since this DAO is used in a single place in the WMCore code, my previous remark that we should avoid the aggregation in the DAO and leave it to the component would only make the code in the component less readable. So let's proceed with what we already have. Just add the comments, as we discussed during our meeting, mentioning the background conditions we rely on in order for that logic to work, so that we remember those small but extremely important details. Thanks!

blockname = record['blockname']
if blockname in dictResults:
    dictResults[blockname]['workflowNames'].add(record['name'])
    dictResults[blockname]['sites'].add(record['site'])
Contributor

IIUC this line here would just add the site to the set of sites of an already existing block name, skipping the values of the rest of the columns in the record. I am not sure this is safe. Are we not losing information here?

Contributor Author

The other columns are block name, location, dataset, and creation time. All of those are the same for a given block, regardless of the subscription in the record.

Contributor

As above, please put this little detail in a comment in the code.
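
Such an in-code note could look roughly like the following (a sketch of the kind of comment requested, not the final wording):

# NOTE: this aggregation relies on the fact that, for a given block name,
# the block-level columns returned by the query (location/pnn, dataset path
# and block create_time) are identical in every row; rows for the same block
# only differ by workflow name and subscription site. It is therefore safe
# to keep those columns from the first row seen and only accumulate the
# workflow names and sites for the subsequent rows.
dictResults[blockname]['workflowNames'].add(record['name'])
dictResults[blockname]['sites'].add(record['site'])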

# Populates results dict and adds all workflows of the same block to a single record
blockname = record['blockname']
if blockname in dictResults:
    dictResults[blockname]['workflowNames'].add(record['name'])
Contributor

Same comment as below.

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 1 changes in unstable tests
  • Python3 Pylint check: succeeded
    • 6 warnings
    • 29 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 4 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13507/artifact/artifacts/PullRequestReport.html

@todor-ivanov (Contributor) left a comment

Thanks @germanfgv, please squash those commits into a single one so we can merge it.

@todor-ivanov merged commit d813d61 into dmwm:master on Aug 17, 2022
@amaltaro (Contributor)

@germanfgv @todor-ivanov thanks for following this up to the end. In the future, please keep in mind that:

  • the label saying it's "work in progress" or "do not merge yet" should be removed by the developer, to flag that it's ready to go
  • commits must be squashed so as not to pollute the file history with unneeded commits.

Thanks!

@todor-ivanov (Contributor)

Hi @amaltaro,

This must have been my fault. We agreed with @germanfgv that for this PR he should skip the squash step, because no unittest-related commits were needed; instead I was supposed to push the Squash and Merge button and let GitHub do the job for us. Well, IIRC I did so, but obviously something did not go as expected.

@amaltaro (Contributor)

Oh, looking at the master commit history, it indeed did the job and there is only 1 commit (I had looked only at the PR itself).

@todor-ivanov (Contributor)

This actually is a good lesson not to touch this button any more (I will put some red tape on my screen right where it is :) ).
I was expecting to see the squash && merge action reflected in the PR as well. I do consider having a single commit in the PR to be more readable for future reference. Next time we should squash the old-fashioned way.

Development

Successfully merging this pull request may close these issues.

RucioInjector deleting blocks prematurely due to multiple workflows writing on them