-
Notifications
You must be signed in to change notification settings - Fork 108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adopt ReqMgr2 authorization based on the request status #11223
Conversation
Jenkins results:
|
Looking into testbed ReqMgr2 logs, I see a bunch of:
which actually comes from this code: thus, fallback'ing to an in-memory data structure. This explains the question 1. from the initial description. |
@haozturk Hasan, I think your input would be valuable in this PR. Could you please have a look at the initial description and at what was documented in the PERMISSION_BY_STATUS.py module? Do you think the current implementation is a reasonable commitment? |
Hi @amaltaro thanks a lot for providing these changes. Here are my comments:
OPS needs to perform these transitions in exceptional cases. Can we make these Please let me know if there is anything else that I should comment on. |
@haozturk thanks for your reply. Let me make sure I understand:
|
|
For the time being, allow rejected and aborted as well. | ||
""" | ||
|
||
### FIXME TODO: remove these comments |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to myself: all these comments need to be removed. I will do so before getting it merged.
Thank you very much for your input, Hasan. I updated this PR with your suggestions. Regarding status transition to staging and staged, do you think you would need them both? Or you just need to exceptionally force a workflow once its input data rules have been created (thus, once it's sitting in status Todor, Valentin, I still need to test this, but it would be great if you could have a first look into it. Thanks |
Jenkins results:
|
test this please |
Jenkins results:
|
Thanks a lot for your changes Alan.
We need both. Recently we needed to do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @amaltaro
I looked at it and could not spot anything in regards to the details of coding style etc that may need a change. All looks good in this regards. I do have one general comment though. Since now we tie the process to a status transition process, which is a more dynamic process (even in long term system maintenance) than just a type of a workflow, I'd say we may need more flexibility in this approach rather than it was before. Just an idea to consider here, no need to be a good one though. But we may think of splitting this PERMISSION_BY_STATUS
file in two and expose some (especially the part with the set of transitions) in a config file.
Other than that all looks good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do nor understand why ReqMgr2 authz is here since we have the same info in CRIC? In other words how CRIC rules are different from ReqMgr2 and why we duplicate this information here? Please note, we adopted CRIC both in apache FE and APS, as such this PR implies two level of authorization. First, user will be checked in FE, then over here. And, what worries me is that rules may differ and can cause conflicts.
Thank you for these comments, let me try to answer each of them here. @todor-ivanov yes, I do agree that there is room for improvements and we can expand this authorization to make it more flexible in the future with other use cases (one of them would be to support status transition, instead of target status). Note that the current changes are configurable, provided that we write that document to central CouchDB. If we do not, then it loads from memory. @vkuznet CRIC is the database of groups/roles (resources and etc), it does not perform any authz algorithm, as you know. The CMSWEB frontends only perform the authentication part, meaning who can access CMS computing resources or not. It then adds headers to the HTTP requests that go to the backend services, and those backend services are responsible for performing the authz logic, if any. That means, in our application (ReqMgr2 here), we need to define which CRIC roles/groups can do what, hence this implementation. I am still working on the unit tests that apparently need to be updated. Once things are looking better to me, I will clean a couple of things from this PR and request a new review ;) Please let me know if you have any other concerns and/or questions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alan, I still very uncomfortable with your arguments. Let me explain why. The code defines hard-coded values which you rely on. First, any hard-coded values is source of troubles. Second, I don't see any rules here. For instance, you define the following:
# Admin group/roles: facops/web-service, reqmgr/admin, reqmgr/developer
ADMIN_GROUP = ['reqmgr', 'facops']
ADMIN_ROLES = ['admin', 'developer', 'web-service']
# Ops group/roles: dataops/production-operator
OPS_GROUP = ['dataops']
OPS_ROLES = ['production-operator']
# Offline group/roles: reqmgr/data-manager
PPD_GROUP = ['reqmgr']
PPD_ROLES = ['data-manager']
Let me ask few questions:
- why PPD_GROUP is assigned to
reqmgr
? In my view those are two different things. - why ADMIN_ROLES include web-services, again in my view those are two different things, the former is related to DMWM role, while the later is related to CMSWEB scope.
- etc.
I rather prefer that you'll work with CRIC team to define necessary (persistent) roles in CRIC DB rather in ReqMgr codebase. This has few benefits:
- persistency, it implies that whatever rules, group, roles you'll define will stay in DB and will be visible to any service. I can foresee that these rules can be used among different services, e.g. ReqMgr, MS services, etc.
- they will be define in one place and therefore it will avoid potential divergencies between CRIC and others
- finally, we should rely on e-groups to define independent rules/groups/roles. For instance, you may define ppd_group e-group where you can include different set of people. Moreover, such group can have dynamic nature, i.e. people can be added or removed. What you eventually need an API which will resolve e-group into list of user names which can be used for authorization.
At the end, I do not like the current proposal since I see (so far) many flaws in it which can lead to different set of problems. I suggest to re-evaluate it and rather define set of APIs which we may fill out to define authorization workflows. Once we'll have the APIs we can discuss their implementation.
I think it will be better if I explain these over a Zoom chat, please ping me on slack if you are available in the coming hours. Otherwise we can chat in the next week. However, just to comment on a few things:
Regarding CRIC, sorry, but I do not understand what you are saying. It looks like that is going out-of-scope to be honest. |
Alan, before zoom meeting here I want to share with you an alternative proposal:
Of course, you will eventually need to develop and maintain tools or authDB service, but I see this as a good investment since with such architecture you will achieve:
|
For this PR, as I understand it, the goal is to restore some functionailty (which is needed for operations) that was inadvertently lost. @amaltaro and @haozturk, can you confirm? I think creating a new authorization service is out of scope for this specific issue. I would suggest this PR be merged (unless there are in-scope comments that still need to be resolved). The suggested new authorization service can be created as a new issue that can be evaluated as part of the standard WMCore issue prioritization discussion. |
Jenkins results:
|
After having a chat with Valentin last Friday and clarifying all the comments/questions that were pending here, we agreed that:
are basically equivalent in terms of service flexibility, and none of them would require a new release to be created in a rush. However, I did mention that I would look into having it in the service configuration only, thus not depending on CouchDB at all. My last 2 commits provide the necessary changes for that, in addition to requiring these deployment changes: I will get some basic tests running tomorrow, and once I feel more comfortable with these changes, request for another review. |
Jenkins results:
|
Jenkins results:
|
Jenkins results:
|
Jenkins results:
|
@vkuznet I updated this PR such that it no longer defines authentication-related information in CouchDB, instead it all comes from the ReqMgr2 configuration now (see link to the deployment PR). I still need to run real tests in my VM, but I would appreciate any feedback that you might have meanwhile (note that I have a couple of placeholders in the code, meant to be removed before I get it merged). I will also update the PR description once I hear back from you. Thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alan, I think the code is in good shape, the only thing I think is missing is full description of data-format used through the code. It is common issue with python based code where the format passed across function and classes is not clear. I suggested in a code to fill this gap or you can put description to the top of the module.
a permission group and a list of allowed statuses. | ||
:param authzRolesGroups: a nested dictionary with CRIC roles and groups | ||
permissions for each permissions group | ||
:return: None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it will be extremely useful if you'll provide a data-format description of authzByStatus input. Below the code already relies on specific keys (e.g. permission
) and values (e.g. NO_STATUS
). I think it will enhance the understanding of code below.
perform action, otherwise return None | ||
""" | ||
# FIXME TODO: remove the line below | ||
print(f"Checking user permissions for request args: {requestArgs}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alan: remove this line before merging it.
Jenkins results:
|
@vkuznet thanks for the prompt feedback. I updated the code according to your suggestion. In addition to that, I replaced |
AuthzByStatus([{"permission": "admin", "statuses": ["new", "assigned"]}, | ||
{"permission": "ops", "statuses": ["staging", "staged"]}, | ||
{"permission": "ppd", "statuses": ["acquired", "Alan"]}], | ||
{"admin": "a", "ops": "o", "ppd": "e"}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this line in git is not align with previous lines (it is shifted to the left by one character).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not part of the list object, but instead it's a different parameter passed to the class. So the column alignment needs to match the [
above, which seems correct to me (reason why my IDE doesn't auto-reformat it).
I had to push in a third commit to correct the error sent in the response object back to the client. Before that commit, here is an example of the HTTPResponse object status and reason:
with these headers:
While with the new commit, it becomes:
with these headers:
So now the client gets the correct status/reason. However, I have two observations:
@vkuznet would you have any thoughts here? |
@amaltaro , I think the issue here is that Request.py code does not explicitly set cherrypy error code, like |
Note that response status is correct and it matches the HTTPError status code (with the last commit). The only thing strange to me is the header So I think there is nothing to be changed and the current behavior looks good, even though I don't understand what was the reason to set X-Rest-Status to 200 in this case... |
ahh, that's exactly the part I was looking for. So, if you set Therefore, I suggest that you change this code within your PR or open new issue for that. The changes in this PR are good to me. |
Replace ReqMgr2 permission data structure, from request type to request status Apply Hasans suggestions Remake authorization to depend solely on service configuration pylint fixes for AuthzByStatus Update everyone to ppd in the src files Return correct error code and message to the client remove my debugging lines
update unit tests improve unit tests remove no longer needed unit tests add fake permissions to test reqmgr2 config more unit test fixes and pylint corrections further unit tests pylint update unit tests replacing everyone by ppd
Jenkins results:
|
Fixes #6072
Status
ready
Description
This PR removes:
PERMISSION_BY_REQUEST_TYPE
data structure)permissions
ReqMgr2 REST endpoint (we no longer store any auth/authz related data in CouchDB)It provides:
ppd
: can perform a small set of write actions (involving request status transition or not)ops
: can perform everything thatppd
can, and a few extra actions (like workflow assignment)admin
: can perform any write action (sort of ReqMgr2 admin). Meant to be used by machines/servicesAuthzByStatus
to parse the configuration map, validate it and to make a decision of which roles/groups are allowed for a given write operationIs it backward compatible (if not, which system it affects?)
YES (this authorization was implemented in the couchapps)
Related PRs
None
External dependencies / deployment changes
It requires these deployment changes: dmwm/deployment#1175
For our own education, the previous document was uploaded to CouchDB through a PUT call to: