Adopt ReqMgr2 authorization based on the request status #11223

Merged
merged 2 commits into from
Jul 27, 2022

Conversation

amaltaro
Contributor

@amaltaro amaltaro commented Jul 19, 2022

Fixes #6072

Status

ready

Description

This PR:

  • removes the outdated authorization logic based on the request type (PERMISSION_BY_REQUEST_TYPE data structure)
  • no longer relies on the "PERMISSION_BY_REQUEST_TYPE" document, which might be available in CouchDB
  • deprecates the permissions ReqMgr2 REST endpoint (we no longer store any auth/authz related data in CouchDB)

It provides:

  • support for an authorization map defined in the ReqMgr2 service configuration
  • such a map needs to define 3 different levels of privileges:
    • ppd: can perform a small set of write actions (involving request status transition or not)
    • ops: can perform everything that ppd can, and a few extra actions (like workflow assignment)
    • admin: can perform any write action (sort of a ReqMgr2 admin). Meant to be used by machines/services
  • a new class, AuthzByStatus, to parse the configuration map, validate it, and decide which roles/groups are allowed for a given write operation
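
To make the three privilege levels more concrete, here is a minimal sketch of how a status-based lookup could work. It is not the actual AuthzByStatus code: the helper name `allowed_for_status`, the assignment of statuses to levels, and the nested roles/groups layout are assumptions made for illustration. The role and group names are the ones quoted in the review discussion.

```python
# Hypothetical authorization map: the minimum privilege level per target status.
AUTHZ_BY_STATUS = [
    {"permission": "admin", "statuses": ["acquired"]},
    {"permission": "ops", "statuses": ["assigned", "staging", "staged"]},
    {"permission": "ppd", "statuses": ["new", "rejected", "aborted"]},
]

# Hypothetical roles/groups attached to each privilege level.
AUTHZ_ROLES_GROUPS = {
    "admin": {"role": ["admin", "developer", "web-service"],
              "group": ["reqmgr", "facops"]},
    "ops": {"role": ["production-operator"], "group": ["dataops"]},
    "ppd": {"role": ["data-manager"], "group": ["reqmgr"]},
}

# Privilege levels in increasing order: ops can do what ppd can, admin anything.
LEVELS = ["ppd", "ops", "admin"]


def allowed_for_status(status):
    """Return the roles and groups allowed to move a request to `status`."""
    for item in AUTHZ_BY_STATUS:
        if status in item["statuses"]:
            min_level = LEVELS.index(item["permission"])
            roles, groups = set(), set()
            # Every level at or above the minimum one is allowed.
            for level in LEVELS[min_level:]:
                roles.update(AUTHZ_ROLES_GROUPS[level]["role"])
                groups.update(AUTHZ_ROLES_GROUPS[level]["group"])
            return {"role": sorted(roles), "group": sorted(groups)}
    raise KeyError(f"no permission defined for status {status!r}")
```

With this layout, a status guarded by `ops` is automatically open to `admin` as well, which matches the "ops can perform everything that ppd can" wording in the description.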

Is it backward compatible (if not, which system it affects?)

YES (this authorization was implemented in the couchapps)

Related PRs

None

External dependencies / deployment changes

It requires these deployment changes: dmwm/deployment#1175

For our own education, the previous document was uploaded to CouchDB through a PUT call to:

/reqmgr2/data/app_config/DEFAULT

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 3 tests deleted
    • 4 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 10 warnings and errors that must be fixed
    • 19 warnings
    • 153 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 63 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13413/artifact/artifacts/PullRequestReport.html

@amaltaro
Contributor Author

Looking into testbed ReqMgr2 logs, I see a bunch of:

reqmgr2-20220719-reqmgr2-77bffcb54d-v4jwl.log:Getting from Cache due to: CouchNotFoundError - reason: Object Not Found, data: {} result: b'{"error":"not_found","reason":"missing"}\n'

which actually comes from this code:
https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/ReqMgr/DataStructs/ReqMgrConfigDataCache.py#L40

Thus, it falls back to an in-memory data structure. This explains question 1 from the initial description.
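
The fallback behaviour seen in the log can be pictured with the sketch below. It is only an illustration: `DocumentNotFound`, `load_config`, and the default content are hypothetical names, not what ReqMgrConfigDataCache.py actually defines.

```python
class DocumentNotFound(Exception):
    """Hypothetical stand-in for the CouchDB 'not found' error."""


# Hypothetical in-memory defaults, keyed by document name.
DEFAULT_CONFIG = {
    "PERMISSION_BY_REQUEST_TYPE": {"DEFAULT": "in-memory default"},
}


def load_config(doc_name, fetch):
    """Try the remote document first; fall back to the in-memory default."""
    try:
        return fetch(doc_name)
    except DocumentNotFound as exc:
        print(f"Getting from Cache due to: {exc}")
        return DEFAULT_CONFIG[doc_name]


def missing_fetch(doc_name):
    """Simulate a CouchDB lookup for a document that was never uploaded."""
    raise DocumentNotFound(f"{doc_name} is missing")
```

Calling `load_config("PERMISSION_BY_REQUEST_TYPE", missing_fetch)` prints the "Getting from Cache" message and returns the default, so the service keeps working even though the document is absent.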

@amaltaro
Contributor Author

@haozturk Hasan, I think your input would be valuable in this PR. Could you please have a look at the initial description and at what was documented in the PERMISSION_BY_STATUS.py module? Do you think the current implementation is a reasonable commitment?

@haozturk

Hi @amaltaro thanks a lot for providing these changes. Here are my comments:

  1. I'm thinking of the problems that we had due to the lack of flexibility in setting permissions. I think the permission for "rejecting/aborting" requests should be separated, because it requires additional actions that OPS is aware of and performs, such as rejecting the relevant ACDCs and invalidating the corresponding output datasets. We have seen cases where people neglected these additional actions and we ended up with orphan ACDCs and datasets. Should PDMV have this privilege? They should, until we provide them with an interface to reject requests properly. We already have this for MC requests: they just tell us the workflow names over an API and Unified does everything necessary. However, we don't have a similar interface for ReReco and Relvals, which requires development from both parties. In short, I think we should separate the reject/abort privilege and give it to PDMV for now.
  2. "staging": ADMIN_PERMISSION,
     "staged": ADMIN_PERMISSION,

     OPS needs to perform these transitions in exceptional cases. Can we make these OPS_PERMISSION?
  3. I wouldn't call PDMV OFFLINE_GROUP. Essentially, we are the offline group. They are PPD.

Please let me know if there is anything else that I should comment on.

@amaltaro
Contributor Author

@haozturk thanks for your reply. Let me make sure I understand:

  1. for rejected/aborted, those are already separated in this new permission model. Do you think that PPD should have privileges to abort workflows as well (meaning, a running workflow)? For rejecting workflows, I think a very valid use case is when they inject a new batch/campaign, and notice a problem with such requests before they even get started. Thus rejecting all of those before they are handed to Unified.
  2. Can you please clarify what would be the use cases for Ops having to force a status transition to staging/staged? We might find out that we can actually resolve something in WMCore and allow these transitions to be performed only by WM systems.
  3. I like to call us Computing (offline), while PPD is definitely Offline. I am happy to rename it to PPD though. Will do so in one of the future changes to this PR.
    Thanks Hasan!

@haozturk

  1. Yes, they should have the privilege to abort as well until we provide a better interface. We still have the possibility of messing up by skipping ACDCs etc., but I complained multiple times about this and I haven't seen such issues for a while. So, I think it's okay to grant this permission until we have a better interface. We had cases as you described, i.e. they realize an issue after submitting the requests and they need to abort/reject.
  2. We do it when we want to run a workflow that is stuck in staging and there is no hope from the DM team: https://its.cern.ch/jira/browse/CMSCOMPPR-21824 We don't want to solve it w/ partial_copy, because we only want to release stuck workflows from a given campaign, not everything.
  3. I said this because it was a bit confusing for me. I thought it's better to name them w/ their "official" names. The last call is up to you, ofc

For the time being, allow rejected and aborted as well.
"""

### FIXME TODO: remove these comments
Contributor Author

Note to myself: all these comments need to be removed. I will do so before getting it merged.

@amaltaro
Contributor Author

Thank you very much for your input, Hasan. I updated this PR with your suggestions.

Regarding status transition to staging and staged, do you think you would need them both? Or you just need to exceptionally force a workflow once its input data rules have been created (thus, once it's sitting in status staging)? Current changes allow it for both cases, but if I understood it right, you would only need the staged one.

Todor, Valentin, I still need to test this, but it would be great if you could have a first look into it. Thanks

@amaltaro amaltaro requested review from todor-ivanov and vkuznet July 22, 2022 03:21
@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 2 new failures
    • 1 tests no longer failing
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 10 warnings and errors that must be fixed
    • 19 warnings
    • 152 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 63 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13431/artifact/artifacts/PullRequestReport.html

@amaltaro
Contributor Author

test this please

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 1 tests no longer failing
  • Python3 Pylint check: failed
    • 10 warnings and errors that must be fixed
    • 19 warnings
    • 152 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 63 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13432/artifact/artifacts/PullRequestReport.html

@haozturk

Thanks a lot for your changes Alan.

Regarding status transition to staging and staged, do you think you would need them both? Or you just need to exceptionally force a workflow once its input data rules have been created (thus, once it's sitting in status staging)? Current changes allow it for both cases, but if I understood it right, you would only need the staged one.

We need both. Recently we needed to do assigned to staging as well, because we had a request whose campaign enforces 2 copies for pileup for 2 locations (CERN & FNAL). There was already a wmcore_transferor rule for one of these sites, however MSTransferor wasn't advancing the request to staging, because it wasn't able to create the rule for the other location due to lack of quota. The requests were urgent and we couldn't sort out the quota issue w/ DM team in a reasonable amount of time. Therefore, we advanced the requests and enforced them to run w/ the single copy of the pileup. This was discussed here on Slack if you're interested to learn more. What a mess ha :)

Contributor

@todor-ivanov todor-ivanov left a comment

Thanks @amaltaro

I looked at it and could not spot anything regarding coding style and similar details that may need a change. All looks good in this regard. I do have one general comment though. Since we now tie the authorization to status transitions, which is a more dynamic process (even in long-term system maintenance) than just the workflow type, I'd say we may need more flexibility in this approach than we had before. This is just an idea to consider, not necessarily a good one, but we may think of splitting this PERMISSION_BY_STATUS file in two and exposing part of it (especially the set of transitions) in a config file.

Other than that all looks good to me.

Contributor

@vkuznet vkuznet left a comment

I do not understand why ReqMgr2 authz is here, since we have the same info in CRIC. In other words, how do the CRIC rules differ from the ReqMgr2 ones, and why do we duplicate this information here? Please note that we adopted CRIC both in the Apache FE and in APS; as such, this PR implies two levels of authorization. First, the user will be checked in the FE, then over here. What worries me is that the rules may differ and cause conflicts.

@amaltaro
Contributor Author

Thank you for these comments, let me try to answer each of them here.

@todor-ivanov yes, I do agree that there is room for improvements and we can expand this authorization to make it more flexible in the future with other use cases (one of them would be to support status transition, instead of target status). Note that the current changes are configurable, provided that we write that document to central CouchDB. If we do not, then it loads from memory.

@vkuznet CRIC is the database of groups/roles (resources, etc.); it does not perform any authz algorithm, as you know. The CMSWEB frontends only perform the authentication part, meaning who can access CMS computing resources or not. They then add headers to the HTTP requests that go to the backend services, and those backend services are responsible for performing the authz logic, if any. That means, in our application (ReqMgr2 here), we need to define which CRIC roles/groups can do what, hence this implementation.
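
As a rough sketch of the backend side of that split, the function below decides whether a user may act, given the roles/groups already extracted from the frontend headers. The input shape (role name mapped to the set of groups in which the user holds that role) and the name `is_authorized` are assumptions for this example, not the WMCore API.

```python
def is_authorized(user_roles, allowed_roles, allowed_groups):
    """Return True if the user holds an allowed role within an allowed group.

    user_roles: dict mapping a role name to the set of groups in which the
    user holds that role (assumed shape of the header-derived user info).
    """
    allowed_groups = set(allowed_groups)
    for role, groups in user_roles.items():
        if role in allowed_roles and groups & allowed_groups:
            return True
    return False
```

For instance, `is_authorized({"production-operator": {"dataops"}}, ["production-operator"], ["dataops"])` returns True, while a user holding only `data-manager` in `reqmgr` would be rejected for the same operation.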

I am still working on the unit tests that apparently need to be updated. Once things are looking better to me, I will clean a couple of things from this PR and request a new review ;)

Please let me know if you have any other concerns and/or questions.

Contributor

@vkuznet vkuznet left a comment

Alan, I am still very uncomfortable with your arguments. Let me explain why. The code defines hard-coded values which you rely on. First, any hard-coded value is a source of trouble. Second, I don't see any rules here. For instance, you define the following:

# Admin group/roles: facops/web-service, reqmgr/admin, reqmgr/developer
ADMIN_GROUP = ['reqmgr', 'facops']
ADMIN_ROLES = ['admin', 'developer', 'web-service']
# Ops group/roles: dataops/production-operator
OPS_GROUP = ['dataops']
OPS_ROLES = ['production-operator']
# Offline group/roles: reqmgr/data-manager
PPD_GROUP = ['reqmgr']
PPD_ROLES = ['data-manager']

Let me ask a few questions:

  • why is PPD_GROUP assigned to reqmgr? In my view those are two different things.
  • why does ADMIN_ROLES include web-service? Again, in my view those are two different things: the former is related to the DMWM role, while the latter is related to the CMSWEB scope.
  • etc.

I would rather you work with the CRIC team to define the necessary (persistent) roles in the CRIC DB rather than in the ReqMgr codebase. This has a few benefits:

  • persistency: whatever rules, groups, and roles you define will stay in the DB and be visible to any service. I can foresee these rules being used by different services, e.g. ReqMgr, MS services, etc.
  • they will be defined in one place, which avoids potential divergences between CRIC and others
  • finally, we should rely on e-groups to define independent rules/groups/roles. For instance, you may define a ppd_group e-group that includes a different set of people. Moreover, such a group can be dynamic, i.e. people can be added or removed. What you eventually need is an API that resolves an e-group into a list of user names which can be used for authorization.

In the end, I do not like the current proposal, since I see (so far) many flaws in it which can lead to a different set of problems. I suggest re-evaluating it and instead defining a set of APIs which we may fill out to define authorization workflows. Once we have the APIs, we can discuss their implementation.

@amaltaro
Contributor Author

I think it will be better if I explain these over a Zoom chat, please ping me on slack if you are available in the coming hours. Otherwise we can chat in the next week.

However, just to comment on a few things:

  • About the "hard-coded" values: those are default values, which of course means they need to be hard-coded. That document lives in CouchDB though, so we can update it at any time with HTTP requests
  • the PPD group creates workflows in ReqMgr2 (and takes a few other actions as well), hence they need to have some permissions to do so.
  • the "web-services" role is a catch-all for CMSWEB inter-service communication. It's a legacy group/role that has been used since my arrival in CMS and I do not think this is the right moment to change it
  • given that we are discussing and working on token auth/authz, I don't think redesigning things with the CRIC team is productive work at the moment.

Regarding CRIC, sorry, but I do not understand what you are saying. It looks like that is going out-of-scope to be honest.

@vkuznet
Contributor

vkuznet commented Jul 22, 2022

Alan, before the Zoom meeting, I want to share an alternative proposal with you:

  • define a list of APIs to perform authorization based on request status, e.g.
    • you can either define a single API, authzUser(request), which takes the request name and returns true or false for the given user, e.g. authzUser("aborted"), or
    • use a set of APIs, one for each workflow, e.g. authzAborted(), authzRejected(), etc.
    • These APIs will do the following:
      • get the user info from the HTTP header
      • obtain the user groups from the HTTP header
      • compare the user groups with the allowed set of groups for the given workflow
      • check the user's presence in the authorization DB for that workflow. Here I refer to a DB which you should clearly define, and I doubt CouchDB is the proper place; I would rather have an independent ORACLE DB for that.
  • you may need to either define a service to manage the authorization DB, with APIs used to query it, or properly define an administrative procedure and tools to communicate with that DB
    • the authDB service can have the following APIs
      • /authdb/user?name=alan will return the user's authdb details, such as groups/roles, for the given name
      • /authdb/user?dn=bla will return the user's details, such as groups/roles, for the given DN
      • /authdb/users?group=dataops will return the list of user details for the given group
      • /authdb/users?role=data-manager will return the list of user details for the given role
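
The proposal above could be prototyped along these lines. This is a minimal in-memory sketch, not an existing service: the `AuthDB` class, the `authz_user` function, the allowed-groups table, and the sample users are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class UserRecord:
    name: str
    groups: set = field(default_factory=set)
    roles: set = field(default_factory=set)


class AuthDB:
    """Hypothetical in-memory stand-in for the proposed authDB service."""

    def __init__(self, records):
        self._by_name = {rec.name: rec for rec in records}

    def user(self, name):
        """Roughly what /authdb/user?name=<name> would return."""
        return self._by_name.get(name)

    def users(self, group=None, role=None):
        """Roughly what /authdb/users?group=... or ?role=... would return."""
        return [rec for rec in self._by_name.values()
                if (group is None or group in rec.groups)
                and (role is None or role in rec.roles)]


# Hypothetical table of groups allowed to set a given request status.
ALLOWED_GROUPS = {"aborted": {"dataops", "reqmgr"}, "staged": {"dataops"}}


def authz_user(db, name, status):
    """Single-API variant: True if `name` may move a request to `status`."""
    record = db.user(name)
    if record is None:
        return False
    return bool(record.groups & ALLOWED_GROUPS.get(status, set()))


# Made-up sample data, only to exercise the sketch.
db = AuthDB([
    UserRecord("alan", {"reqmgr"}, {"developer"}),
    UserRecord("bob", {"dataops"}, {"production-operator"}),
])
```

Keeping the lookup behind `user()`/`users()` means the storage backend could change without touching `authz_user`, which is the decoupling the proposal argues for.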

Of course, you will eventually need to develop and maintain the tools or the authDB service, but I see this as a good investment, since with such an architecture you will achieve:

  • an independent and technology-transparent authorization DB
  • you will manage users regardless of WMCore services
  • you may start using the authDB service in different WMCore services
  • you will develop one set of APIs which can be used across services
  • you will be able to easily define, change, and expand any authorization request for existing or new workflows, e.g. if we come up with yet another transition workflow (hypothetically: running-mcm or running-campaign) you can easily change/manage the authz behavior for them
  • no hard-coded values will be defined in the code, and all data will reside and be managed in the authorization database (once again, please reconsider using CouchDB for that purpose; it is not designed or used for that).

@klannon

klannon commented Jul 25, 2022

For this PR, as I understand it, the goal is to restore some functionality (which is needed for operations) that was inadvertently lost. @amaltaro and @haozturk, can you confirm? I think creating a new authorization service is out of scope for this specific issue. I would suggest this PR be merged (unless there are in-scope comments that still need to be resolved). The suggested new authorization service can be created as a new issue that can be evaluated as part of the standard WMCore issue prioritization discussion.

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 1 tests no longer failing
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 10 warnings and errors that must be fixed
    • 19 warnings
    • 149 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 63 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13436/artifact/artifacts/PullRequestReport.html

@amaltaro
Contributor Author

After having a chat with Valentin last Friday and clarifying all the comments/questions that were pending here, we agreed that:

  • having roles/groups configurable via service configuration; or
  • having a default data structure in the WMCore release, plus the ability to update permissions through HTTP requests (thus having it persisted in CouchDB).

are basically equivalent in terms of service flexibility, and neither of them would require a new release to be created in a rush.

However, I did mention that I would look into having it in the service configuration only, thus not depending on CouchDB at all. My last 2 commits provide the necessary changes for that, in addition to requiring these deployment changes:
dmwm/deployment#1175

I will get some basic tests running tomorrow, and once I feel more comfortable with these changes, request for another review.

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 14 new failures
    • 2 tests deleted
    • 2 tests no longer failing
    • 5 tests added
    • 5 changes in unstable tests
  • Python3 Pylint check: failed
    • 15 warnings and errors that must be fixed
    • 26 warnings
    • 273 comments to review
  • Pylint py3k check: failed
    • 2 warnings
  • Pycodestyle check: succeeded
    • 46 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13437/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 1 tests deleted
    • 2 tests no longer failing
    • 5 tests added
    • 3 changes in unstable tests
  • Python3 Pylint check: failed
    • 14 warnings and errors that must be fixed
    • 26 warnings
    • 274 comments to review
  • Pylint py3k check: failed
    • 2 warnings
  • Pycodestyle check: succeeded
    • 17 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13438/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 1 tests no longer failing
    • 6 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 12 warnings and errors that must be fixed
    • 26 warnings
    • 276 comments to review
  • Pylint py3k check: failed
    • 2 warnings
  • Pycodestyle check: succeeded
    • 14 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13439/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 1 tests no longer failing
    • 6 tests added
  • Python3 Pylint check: failed
    • 6 warnings and errors that must be fixed
    • 26 warnings
    • 276 comments to review
  • Pylint py3k check: failed
    • 2 warnings
  • Pycodestyle check: succeeded
    • 14 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13440/artifact/artifacts/PullRequestReport.html

@amaltaro
Contributor Author

@vkuznet I updated this PR such that it no longer defines authentication-related information in CouchDB; instead, it all comes from the ReqMgr2 configuration now (see link to the deployment PR).
That authz structure is validated and there is a getter method to find which roles/groups are required for such action (see new module AuthzByStatus). I also removed the permissions REST endpoint and its related code.

I still need to run real tests in my VM, but I would appreciate any feedback that you might have meanwhile (note that I have a couple of placeholders in the code, meant to be removed before I get it merged). I will also update the PR description once I hear back from you. Thanks

@amaltaro amaltaro requested a review from vkuznet July 26, 2022 17:03
Contributor

@vkuznet vkuznet left a comment

Alan, I think the code is in good shape. The only thing I think is missing is a full description of the data format used throughout the code. It is a common issue with Python-based code that the format passed across functions and classes is not clear. I suggested in the code how to fill this gap, or you can put a description at the top of the module.

a permission group and a list of allowed statuses.
:param authzRolesGroups: a nested dictionary with CRIC roles and groups
permissions for each permissions group
:return: None
Contributor

It would be extremely useful if you provided a data-format description of the authzByStatus input. The code below already relies on specific keys (e.g. permission) and values (e.g. NO_STATUS). I think it will enhance the understanding of the code below.
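
One way to pin down such a data format is a small validator that doubles as documentation. The sketch below assumes the two-argument shape used elsewhere in this PR (a list of `{"permission", "statuses"}` dictionaries plus a roles/groups dictionary keyed by permission level); the function name `validate_authz` and the exact checks are illustrative and not the merged implementation.

```python
VALID_PERMISSIONS = {"admin", "ops", "ppd"}


def validate_authz(authz_by_status, authz_roles_groups):
    """Raise ValueError on malformed input; return None otherwise.

    authz_by_status: list of dicts, each with exactly two keys:
        "permission": one of "admin", "ops", "ppd"
        "statuses": list of request status names (strings)
    authz_roles_groups: dict with one entry per permission level
    """
    if not isinstance(authz_by_status, list):
        raise ValueError("authz_by_status must be a list of dictionaries")
    for item in authz_by_status:
        if not isinstance(item, dict) or set(item) != {"permission", "statuses"}:
            raise ValueError(f"unexpected entry: {item!r}")
        if item["permission"] not in VALID_PERMISSIONS:
            raise ValueError(f"unknown permission: {item['permission']!r}")
        statuses = item["statuses"]
        if not isinstance(statuses, list) or not all(isinstance(s, str) for s in statuses):
            raise ValueError(f"statuses must be a list of strings: {statuses!r}")
    missing = VALID_PERMISSIONS - set(authz_roles_groups)
    if missing:
        raise ValueError(f"no roles/groups defined for: {sorted(missing)}")
```

A real validator would presumably also check each status name against the list of known request statuses; that list is not reproduced here.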

perform action, otherwise return None
"""
# FIXME TODO: remove the line below
print(f"Checking user permissions for request args: {requestArgs}")
Contributor Author

Alan: remove this line before merging it.

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 1 tests no longer failing
    • 6 tests added
  • Python3 Pylint check: failed
    • 6 warnings and errors that must be fixed
    • 26 warnings
    • 276 comments to review
  • Pylint py3k check: failed
    • 2 warnings
  • Pycodestyle check: succeeded
    • 14 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13441/artifact/artifacts/PullRequestReport.html

@amaltaro
Contributor Author

amaltaro commented Jul 26, 2022

@vkuznet thanks for the prompt feedback. I updated the code according to your suggestion. In addition to that, I replaced everyone by ppd, since it's not really everyone that was able to execute such operations. Feel free to review it, I still need to run some final tests though.
UPDATE: initial PR description has been updated as well.

@amaltaro amaltaro requested a review from vkuznet July 26, 2022 19:48
AuthzByStatus([{"permission": "admin", "statuses": ["new", "assigned"]},
               {"permission": "ops", "statuses": ["staging", "staged"]},
               {"permission": "ppd", "statuses": ["acquired", "Alan"]}],
              {"admin": "a", "ops": "o", "ppd": "e"})
Contributor

This line in git is not aligned with the previous lines (it is shifted to the left by one character).

Contributor Author

This is not part of the list object, but instead it's a different parameter passed to the class. So the column alignment needs to match the [ above, which seems correct to me (reason why my IDE doesn't auto-reformat it).

@amaltaro
Contributor Author

I had to push in a third commit to correct the error sent in the response object back to the client. Before that commit, here is an example of the HTTPResponse object status and reason:

2022-07-27 15:33:42,195:INFO:reqmgr2: AMR resp status: 400
2022-07-27 15:33:42,196:INFO:reqmgr2: AMR resp reason: Bad Request

with these headers:

Server: CherryPy/17.4.0
Content-Type: text/html;charset=utf-8
X-Rest-Status: 1102
X-Error-Http: 400
X-Error-Id: d46b9c0d9c6104c71549a08160a6025a
X-Error-Detail: Invalid spec parameter value: (403, 'You are not allowed to access this resource.')
X-Rest-Time: 2932.072 us
Content-Length: 793
Vary: Accept-Encoding
CMS-Server-Time: D=8083 t=1658928822184853
Connection: close

While with the new commit, it becomes:

2022-07-27 15:48:39,703:INFO:reqmgr2: AMR resp status: 403
2022-07-27 15:48:39,704:INFO:reqmgr2: AMR resp reason: Forbidden

with these headers:

Server: CherryPy/17.4.0
Content-Type: text/html;charset=utf-8
X-Rest-Status: 200
X-Error-Http: 403
X-Error-Id: 95f450492396a3ff15dbd61986780bc4
X-Error-Detail: You are not allowed to access this resource.
X-Rest-Time: 4236.460 us
Content-Length: 750
Vary: Accept-Encoding
CMS-Server-Time: D=9631 t=1658929719691580

So now the client gets the correct status/reason. However, I have two observations:

  • X-Rest-Status has a different code now. The reason is that we do not create a RESTError exception; we simply re-raise the HTTPError. For the RESTError, see implementation here
  • Connection: the new headers don't report whether the connection has been closed or not. I fail to see where it comes from, but I think CherryPy should be closing it...

@vkuznet would you have any thoughts here?

@vkuznet
Contributor

vkuznet commented Jul 27, 2022

@amaltaro, I think the issue here is that the Request.py code does not explicitly set the cherrypy error code, like cherrypy.response.status=400 (or whatever the code should be). Moreover, I don't see any code which sets the application error in RESTError, while it is the case for cherrypy.HTTPError. But I do not know where in the code chain this code is set (if at all).

@amaltaro
Contributor Author

Note that response status is correct and it matches the HTTPError status code (with the last commit).

The only thing strange to me is the header X-Rest-Status, which is apparently used in WMCore to report an application status code(?). When we have an exception inheriting from RESTError, that exception gets an app_code, but not in the case of a plain HTTPError.
Looking at the module I previously pointed out, here is where we define X-Rest-Status=200 for HTTPError exception type:
https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/REST/Error.py#L296

So I think there is nothing to be changed and the current behavior looks good, even though I don't understand what was the reason to set X-Rest-Status to 200 in this case...
If this is Okay with you, I will squash these commits and put it in a new release.
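
To summarise the header behaviour discussed here, the sketch below mimics how the two exception kinds end up with different X-Rest-Status values. The classes are simplified stand-ins written for this example (they are not cherrypy.HTTPError or the classes in WMCore/REST/Error.py), and `error_headers` is a hypothetical helper; the values mirror the two responses pasted above.

```python
class HTTPError(Exception):
    """Simplified stand-in for a plain HTTP error (status + message)."""

    def __init__(self, status, message):
        super().__init__(message)
        self.status = status
        self.message = message


class RESTError(Exception):
    """Simplified stand-in for an application error carrying an app code."""

    def __init__(self, app_code, http_code, detail):
        super().__init__(detail)
        self.app_code = app_code
        self.http_code = http_code
        self.detail = detail


def error_headers(exc):
    """Build the error headers the way the thread describes them."""
    if isinstance(exc, RESTError):
        # Application errors report their own app code.
        return {"X-Rest-Status": str(exc.app_code),
                "X-Error-Http": str(exc.http_code),
                "X-Error-Detail": exc.detail}
    if isinstance(exc, HTTPError):
        # A plain HTTPError has no app code, so X-Rest-Status is set to 200.
        return {"X-Rest-Status": "200",
                "X-Error-Http": str(exc.status),
                "X-Error-Detail": exc.message}
    raise TypeError(f"unsupported exception type: {type(exc).__name__}")
```

Re-raising the 403 as an `HTTPError` instead of wrapping it in an "invalid spec" `RESTError` is what moves the client-visible status from 400 to 403, at the cost of the less informative X-Rest-Status value.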

@vkuznet
Contributor

vkuznet commented Jul 27, 2022

Ahh, that's exactly the part I was looking for. So, if you set X-REST-Status to 200 in an exception block, I think it depends on what it is meant to reflect. For instance, if the REST call was successful, then 200 makes sense to me, but here the REST call was not even made due to the authorization check, and I think 200 is wrong. I would expect that, if you have an exception, you should not set this header to 200; a 500 (Internal Server Error) is a much better value in this case.

Therefore, I suggest that you change this code within your PR or open a new issue for that. The changes in this PR look good to me.

amaltaro added 2 commits July 27, 2022 15:25
Replace ReqMgr2 permission data structure, from request type to request status

Apply Hasans suggestions

Remake authorization to depend solely on service configuration

pylint fixes for AuthzByStatus

Update everyone to ppd in the src files

Return correct error code and message to the client

remove my debugging lines
update unit tests

improve unit tests

remove no longer needed unit tests

add fake permissions to test reqmgr2 config

more unit test fixes and pylint corrections

further unit tests pylint

update unit tests replacing everyone by ppd
@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 1 tests no longer failing
    • 6 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 5 warnings and errors that must be fixed
    • 26 warnings
    • 276 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 14 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13454/artifact/artifacts/PullRequestReport.html

@amaltaro amaltaro merged commit 759ea31 into dmwm:master Jul 27, 2022
Successfully merging this pull request may close these issues.

Create a fine-grained permissions to ReqMgr2
6 participants