-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(policies): Allow policies to be applied to resources based on tags #9684
Conversation
@@ -26,7 +27,8 @@ public DefaultEntitySpecResolver(Authentication systemAuthentication, EntityClie | |||
new DomainFieldResolverProvider(entityClient, systemAuthentication), | |||
new OwnerFieldResolverProvider(entityClient, systemAuthentication), | |||
new DataPlatformInstanceFieldResolverProvider(entityClient, systemAuthentication), | |||
new GroupMembershipFieldResolverProvider(entityClient, systemAuthentication)); | |||
new GroupMembershipFieldResolverProvider(entityClient, systemAuthentication), | |||
new TagFieldResolverProvider(entityClient, systemAuthentication)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an inefficient design. The urns and their aspect reads should be batched and not fetching each aspect one by one for each urn. I know this isn't part of the scope for this PR, but authentication causes latency on practically every api call so it has a high impact. Even if we don't batch urns, we should definitely batch the aspects There should be a batched resolver for domain, owner, data platform, group membership, and tags. All these aspects should be fetched in batch using the entity client's same getV2 method just using the collection as something other then a single item Collections.singleton(
This is a perfect opportunity to address the tech debt here by adding the extra aspect and improve performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After reading the adjacent code, I don't think we are sequentially performing each of these resolvers for each request. Rather in getFieldResolvers
(in this file) we are creating a dictionary of what query to run based on the EntityFieldType that an incoming request has in their filter.
See
datahub/metadata-auth/auth-api/src/main/java/com/datahub/authorization/ResolvedEntitySpec.java
Line 25 in 943bb57
return fieldResolvers.get(entityFieldType).getFieldValuesFuture().join().getValues(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That line is executing a request per field meaning separate requests for each aspect. It is using futures, but not batching.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fieldResolvers.get(entityFieldType)
is only getting 1 resolver and generating 1 future, which depends on the filter being analysed.
I think this is the line responsible for multiple possible requests:
datahub/metadata-service/auth-impl/src/main/java/com/datahub/authorization/PolicyEngine.java
Line 230 in 943bb57
return filter.getCriteria().stream().allMatch(criterion -> checkCriterion(criterion, resource)); |
I think any performance improvement here would come from policies that have multiple filter conditions like:
contains tag X AND is part of Domain Y AND is OwnedBy z
, here 3 calls would happen sequentially (one of the them, domain, is out of scope). The remaining filterResolvers are either not used or make no call to the entity service.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, if a single policy operates on 3 fields it would benefit from fetching all three at the same time. If there are multiple policies which depend on 3 different fields, it would also benefit from fetching all three at the same time. The general idea is to fetch the required aspects in 1 call for the policies being evaluated rather then 3 subsequent calls. I am assuming the same resource is being passed to each policy and that we would not be fetching the same aspect for each policy already today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this point in the code, only a single Policy’s filter gets evaluated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going down the route of evaluating all policies to check whether they apply for a given resource request is a large refactor of the policy engine. I would suggest a design document before doing anything else.
@RyanHolstien @david-leifker do we have metrics on how much the PolicyEngine is a performance bottleneck?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens in this case if no policies are authored with any tag predicates on them? Does the code performance stay the same as today or degrade?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code performs the same as today. The field resolver for tags is not invoked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From a tech debt perspective, spending the extra time to improve a high impact area of the code is justified. See comment in the PR, thanks!
Details
globalTags
aspect.TagFieldResolverProvider
which implements checking FieldResolvers forglobalTags
aspect.Tests
TagFieldResolverProvider
class.Checklist