Enforce dataset_lifetime column to be integer and not null #11115

Merged (1 commit) on May 2, 2022

Conversation

amaltaro
Contributor

@amaltaro amaltaro commented Apr 25, 2022

Fixes #11108
Alternative to #11111

Status

ready

Description

This change will ensure that:

  • the database won't accept None as input for the dataset_lifetime column
  • None is properly converted to 0, whenever needed, before a record is created in the database.
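A minimal sketch of the second point, assuming a hypothetical helper named `normalize_lifetime` (not a name taken from WMCore). It only illustrates mapping None to 0 before the value is placed in a bind dictionary; the `DatasetLifetime` key follows the code excerpt quoted later in the review.

```python
def normalize_lifetime(value):
    """Return an integer lifetime, mapping None to 0 (hypothetical helper)."""
    if value is None:
        return 0
    return int(value)


# Illustrative input only; a real subscriptionInfo carries more keys.
subscriptionInfo = {"DatasetLifetime": None}

# The bind value is always an integer, so a NOT NULL column accepts it.
bind = {"dataset_lifetime": normalize_lifetime(subscriptionInfo["DatasetLifetime"])}
print(bind)
```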

Is it backward compatible (if not, which system it affects?)

NO (database schema change)

Related PRs

If this proposal is acceptable and correct, we might want to merge this into master only, then backport and merge #11111 into the 2.0.2_wmagent branch and create a new patch release.

External dependencies / deployment changes

None

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 7 tests added
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 1 warnings and errors that must be fixed
    • 4 warnings
    • 9 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13077/artifact/artifacts/PullRequestReport.html

@jhonatanamado
Contributor

jhonatanamado commented Apr 26, 2022

Hi @amaltaro. Alan, please consider the following changes in this PR to fix an issue in the deployment of a T0 agent during the creation of the Oracle tables, as can be seen below:

DEBUG:root:Problem creating database table 

CREATE TABLE dbsbuffer_dataset_subscription (
                 id                 INTEGER      NOT NULL,
                 dataset_id         INTEGER      NOT NULL,
                 site               VARCHAR(100) NOT NULL,
                 custodial          INTEGER      DEFAULT 0,
                 auto_approve       INTEGER      DEFAULT 0,
                 move               INTEGER      DEFAULT 0,
                 priority           VARCHAR(10)  DEFAULT 'Low',
                 subscribed         INTEGER      DEFAULT 0,
                 phedex_group       VARCHAR(100),
                 delete_blocks      INTEGER,
                 dataset_lifetime   INTEGER      NOT NULL DEFAULT 0,
                 PRIMARY KEY (id),
                 CONSTRAINT uq_dbs_dat_sub UNIQUE (dataset_id, site, custodial, auto_approve, move, priority)
               )

(cx_Oracle.DatabaseError) ORA-00907: missing right parenthesis
[SQL: CREATE TABLE dbsbuffer_dataset_subscription (
                 id                 INTEGER      NOT NULL,
                 dataset_id         INTEGER      NOT NULL,
                 site               VARCHAR(100) NOT NULL,
                 custodial          INTEGER      DEFAULT 0,
                 auto_approve       INTEGER      DEFAULT 0,
                 move               INTEGER      DEFAULT 0,
                 priority           VARCHAR(10)  DEFAULT 'Low',
                 subscribed         INTEGER      DEFAULT 0,
                 phedex_group       VARCHAR(100),
                 delete_blocks      INTEGER,
                 dataset_lifetime   INTEGER      NOT NULL DEFAULT 0,
                 PRIMARY KEY (id),
                 CONSTRAINT uq_dbs_dat_sub UNIQUE (dataset_id, site, custodial, auto_approve, move, priority)
               )]

I deployed a replay with WMCore 2.0.2.patch2 + PR11115 + Commit63d99e and the replay finished without issues. I checked the container rules created by T0 for subscriptions to disk and the rules have the proper lifetime.

'phedex_group': phedex_group,
'delete_blocks': delete_blocks,
'dataset_lifetime': subscriptionInfo['DatasetLifetime']})
bind = {'id': datasetID,
Contributor

I think it would be safer to explicitly cast the data type of each binding value, e.g.

bind = {'id': int(datasetID), 'site': str(site), ...}

Doing it this way ensures that the proper data type is passed to the bind dictionary, which you later pass to the Oracle DB.

Contributor

I think this is a good suggestion indeed.

Contributor Author

I'm updating this code with a few more safety checks for None. But these suggestions might actually cause more problems than they solve. For instance, casting None to an integer will crash, and casting a byte string to a string will make things even worse.

In addition, dataset_id is already protected by the database schema, and this code has been in use for many years, unlike dataset_lifetime, which is a new feature.
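The two failure modes mentioned above can be reproduced in plain Python. This is a small illustration, not code from the PR; the site string is an arbitrary example value.

```python
# int(None) raises TypeError, so a blanket int() cast needs a None guard first.
try:
    int(None)
    cast_failed = False
except TypeError:
    cast_failed = True

# str() on a bytes object returns its repr, not the decoded text,
# so the stored value would carry the b'...' wrapper.
as_text = str(b"T2_CH_CERN")

# Decoding is what actually turns bytes into the intended string.
decoded = b"T2_CH_CERN".decode("utf-8")

print(cast_failed, as_text, decoded)
```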

Contributor

The way you implemented those additional checks looks sound to me. Thanks @amaltaro

Contributor

@todor-ivanov todor-ivanov left a comment

Thanks for fixing it the proper way, @amaltaro. As for the current implementation, I basically second the previous two comments. But I read the PR status as not-tested, so I believe you would have fixed those during testing anyway.

@@ -49,7 +49,7 @@ def __init__(self, logger = None, dbi = None, params = None):
 subscribed INTEGER DEFAULT 0,
 phedex_group VARCHAR(100),
 delete_blocks INTEGER,
-dataset_lifetime INTEGER DEFAULT 0,
+dataset_lifetime INTEGER NOT NULL DEFAULT 0,
Contributor

@amaltaro I think @jhonatanamado's comment needs to be addressed here too.

@amaltaro
Contributor Author

@jhonatanamado thank you very much for testing this in a replay! I fixed the order of those constraints in my second commit. It also adds further data type checks, as requested by Valentin and Todor. Please have another look at it.

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 7 tests added
    • 3 changes in unstable tests
  • Python3 Pylint check: failed
    • 1 warnings and errors that must be fixed
    • 4 warnings
    • 9 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13083/artifact/artifacts/PullRequestReport.html

@amaltaro
Contributor Author

I confirm that Oracle-based Jenkins tests are happier now, while the previous order of constraints was causing 92 new tests to fail.
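The corrected DDL is not shown in this conversation. As a sketch under that caveat: Oracle's column definition syntax places the DEFAULT clause before inline constraints such as NOT NULL, which is consistent with the ORA-00907 error reported earlier, so the example below assumes the fixed clause reads `DEFAULT 0 NOT NULL`. SQLite stands in for Oracle only so that the example runs without a database server (SQLite accepts either order), and the table is abbreviated to a few columns.

```python
import sqlite3

# Abbreviated table: DEFAULT comes before NOT NULL, the order Oracle expects.
DDL = """
CREATE TABLE dbsbuffer_dataset_subscription (
    id               INTEGER      NOT NULL,
    dataset_id       INTEGER      NOT NULL,
    site             VARCHAR(100) NOT NULL,
    dataset_lifetime INTEGER      DEFAULT 0 NOT NULL,
    PRIMARY KEY (id)
)
"""

conn = sqlite3.connect(":memory:")
conn.execute(DDL)

# Omitting dataset_lifetime falls back to the column default.
conn.execute(
    "INSERT INTO dbsbuffer_dataset_subscription (id, dataset_id, site) "
    "VALUES (1, 10, 'T2_CH_CERN')"
)
default_value = conn.execute(
    "SELECT dataset_lifetime FROM dbsbuffer_dataset_subscription WHERE id = 1"
).fetchone()[0]

# An explicit NULL is rejected by the NOT NULL constraint.
try:
    conn.execute(
        "INSERT INTO dbsbuffer_dataset_subscription VALUES (2, 11, 'T2_CH_CERN', NULL)"
    )
    null_rejected = False
except sqlite3.IntegrityError:
    null_rejected = True

print(default_value, null_rejected)
```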

Contributor

@todor-ivanov todor-ivanov left a comment

Thanks for the follow-up, @amaltaro!
The code looks good.

Contributor

@vkuznet vkuznet left a comment

If you want to fix it right, please add validation for all passed values. I provided concrete examples for each individual column.

'delete_blocks': delete_blocks,
'dataset_lifetime': subscriptionInfo['DatasetLifetime']})
bind = {'id': datasetID,
'site': site,
Contributor

Can you protect site so that it satisfies the correct regexp? The schema only requires it to be not null, so someone can pass "ABC" and it will be accepted, but that is not what our site names look like. Please add protection to validate this input value.

Contributor Author

This is already validated upstream, at the request spec level, e.g.:
https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/WMSpec/StdSpecs/StdBase.py#L1135

'dataset_lifetime': subscriptionInfo['DatasetLifetime']})
bind = {'id': datasetID,
'site': site,
'custodial': 0 if custodialFlag is None else custodialFlag,
Contributor

custodialFlag can be passed here as -1. It is still an integer, but I doubt it is a correct value. Please validate it so that it falls within a specific range, e.g. only 1 or 0.

Contributor Author

It looks like custodialFlag is actually either True or False, as defined in this very same code. Unsure how it was/is getting cast to integer, though...
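One possible explanation, offered as an assumption rather than something confirmed for this code path: in Python, bool is a subclass of int, so True and False already behave as 1 and 0 wherever an integer is expected. How the database driver binds them is not shown in this conversation. The variable name below mirrors the excerpt; the values are illustrative.

```python
custodialFlag = True

# bool is a subclass of int, so an isinstance check against int passes.
is_int = isinstance(custodialFlag, int)

# Explicit conversion gives 1 for True and 0 for False.
as_int = int(custodialFlag)

# The None guard from the excerpt leaves a boolean untouched.
bind = {"custodial": 0 if custodialFlag is None else custodialFlag}

print(is_int, as_int, bind["custodial"] == 1)
```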

'site': site,
'custodial': 0 if custodialFlag is None else custodialFlag,
'auto_approve': 1 if site in subscriptionInfo['AutoApproveSites'] else 0,
'move': 0 if isMove is None else isMove,
Contributor

isMove requires validation too; someone can pass -1 and the code will not spot it.

Contributor Author

I'm actually removing this check on isMove because it's already validated a few lines above.

'custodial': 0 if custodialFlag is None else custodialFlag,
'auto_approve': 1 if site in subscriptionInfo['AutoApproveSites'] else 0,
'move': 0 if isMove is None else isMove,
'priority': subscriptionInfo['Priority'],
Contributor

Can you set a proper default here, e.g. subscriptionInfo.get('Priority', "some-default-value")? The value should also be validated against the allowed values.

Contributor Author

The request priority default comes from the upstream spec file, here (the data type is also validated there):
https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/WMSpec/StdSpecs/StdBase.py#L1025

Having default values defined in multiple places is just a source of future errors.

'auto_approve': 1 if site in subscriptionInfo['AutoApproveSites'] else 0,
'move': 0 if isMove is None else isMove,
'priority': subscriptionInfo['Priority'],
'phedex_group': phedex_group,
Contributor

I don't know if you really need it anymore.

Contributor Author

You are right, it's no longer relevant. Actually, a good chunk of this is no longer needed and we need to re-purpose it according to the Rucio convention. To be addressed here: #9639

'move': 0 if isMove is None else isMove,
'priority': subscriptionInfo['Priority'],
'phedex_group': phedex_group,
'delete_blocks': delete_blocks}
Contributor

requires validation too.

Contributor Author

Already covered a few lines above and in accordance with the database schema. I could be wrong, but AFAIK it's more performant to have a column with None/null than with an integer 0. So I'm in favor of keeping this as 1 or None.

@amaltaro
Contributor Author

@vkuznet in addition to the follow-up comments I made inline in the code, I also pushed another commit with further changes. I will squash it once the review is over. Please have another look at your convenience, but keep in mind that much of this actually comes from upstream, at the request level, and most of it is already validated there, with the data type checked as well.

I'd like to avoid adding multiple layers of data validation whenever possible.

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 7 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 1 warnings and errors that must be fixed
    • 4 warnings
    • 9 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13091/artifact/artifacts/PullRequestReport.html

Contributor

@vkuznet vkuznet left a comment

Since you mentioned several times that some values are validated upstream, please add an appropriate comment in the code about these checks. It would be useful to know why you check some values and not others.

@amaltaro
Contributor Author

Since you mentioned several times that some values are validated upstream, please add an appropriate comment in the code about these checks. It would be useful to know why you check some values and not others.

I understand your request, but bear in mind that EVERY single workflow parameter is actually validated by the WMSpec/StdSpecs specs. If we add such comments here, then we should do it for the hundreds of files that use or set workflow-related parameters.

Instead, just give it some time and you will get to know what those parameters are. In short, everything is defined in this superclass:
https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/WMSpec/StdSpecs/StdBase.py#L968

@amaltaro
Contributor Author

BTW, these latest changes have also been tested with oracle jenkins.

ensure move and custodial are not None; fix order of db constraints

address some of Valentins concerns

add docstring and extra comment for the param/validation
@amaltaro
Contributor Author

amaltaro commented May 2, 2022

@vkuznet Valentin, I just added a docstring and further comments on the parameters/validation. However, as stated above, all workflow-related information that goes to the relational database is validated beforehand, and IMO adding such comments to 1 DAO out of hundreds won't be of much help. Please have another look at your convenience.

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 7 tests added
  • Python3 Pylint check: failed
    • 1 warnings and errors that must be fixed
    • 4 warnings
    • 8 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13116/artifact/artifacts/PullRequestReport.html

@amaltaro
Contributor Author

amaltaro commented May 2, 2022

Thank you Valentin and Todor. Given that it doesn't change anything in the central services, I'm merging it now.

@amaltaro amaltaro merged commit daea165 into dmwm:master May 2, 2022
Successfully merging this pull request may close these issues.

WorkQueue fails to create subscription due to ORACLE unique constraint violation
5 participants