Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(policies): Allow policies to be applied to resources based on tags #9684

Merged
merged 2 commits into from
Jan 23, 2024

Conversation

pedro93
Copy link
Collaborator

@pedro93 pedro93 commented Jan 22, 2024

Details

  • Adds support for tags in policy resource filters. This will apply to all entities which have a globalTags aspect.
  • Adds new TagFieldResolverProvider which implements checking FieldResolvers for globalTags aspect.

Tests

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@github-actions github-actions bot added the devops PR or Issue related to DataHub backend & deployment label Jan 22, 2024
@anshbansal anshbansal added the merge-pending-ci A PR that has passed review and should be merged once CI is green. label Jan 22, 2024
@anshbansal anshbansal removed the merge-pending-ci A PR that has passed review and should be merged once CI is green. label Jan 22, 2024
@@ -26,7 +27,8 @@ public DefaultEntitySpecResolver(Authentication systemAuthentication, EntityClie
new DomainFieldResolverProvider(entityClient, systemAuthentication),
new OwnerFieldResolverProvider(entityClient, systemAuthentication),
new DataPlatformInstanceFieldResolverProvider(entityClient, systemAuthentication),
new GroupMembershipFieldResolverProvider(entityClient, systemAuthentication));
new GroupMembershipFieldResolverProvider(entityClient, systemAuthentication),
new TagFieldResolverProvider(entityClient, systemAuthentication));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an inefficient design. The urns and their aspect reads should be batched and not fetching each aspect one by one for each urn. I know this isn't part of the scope for this PR, but authentication causes latency on practically every api call so it has a high impact. Even if we don't batch urns, we should definitely batch the aspects There should be a batched resolver for domain, owner, data platform, group membership, and tags. All these aspects should be fetched in batch using the entity client's same getV2 method just using the collection as something other then a single item Collections.singleton(

This is a perfect opportunity to address the tech debt here by adding the extra aspect and improve performance.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After reading the adjacent code, I don't think we are sequentially performing each of these resolvers for each request. Rather in getFieldResolvers (in this file) we are creating a dictionary of what query to run based on the EntityFieldType that an incoming request has in their filter.

See

return fieldResolvers.get(entityFieldType).getFieldValuesFuture().join().getValues();

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That line is executing a request per field meaning separate requests for each aspect. It is using futures, but not batching.

Copy link
Collaborator Author

@pedro93 pedro93 Jan 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fieldResolvers.get(entityFieldType) is only getting 1 resolver and generating 1 future, which depends on the filter being analysed.

I think this is the line responsible for multiple possible requests:

return filter.getCriteria().stream().allMatch(criterion -> checkCriterion(criterion, resource));

I think any performance improvement here would come from policies that have multiple filter conditions like:
contains tag X AND is part of Domain Y AND is OwnedBy z, here 3 calls would happen sequentially (one of the them, domain, is out of scope). The remaining filterResolvers are either not used or make no call to the entity service.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, if a single policy operates on 3 fields it would benefit from fetching all three at the same time. If there are multiple policies which depend on 3 different fields, it would also benefit from fetching all three at the same time. The general idea is to fetch the required aspects in 1 call for the policies being evaluated rather then 3 subsequent calls. I am assuming the same resource is being passed to each policy and that we would not be fetching the same aspect for each policy already today.

Copy link
Collaborator Author

@pedro93 pedro93 Jan 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point in the code, only a single Policy’s filter gets evaluated.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going down the route of evaluating all policies to check whether they apply for a given resource request is a large refactor of the policy engine. I would suggest a design document before doing anything else.

@RyanHolstien @david-leifker do we have metrics on how much the PolicyEngine is a performance bottleneck?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens in this case if no policies are authored with any tag predicates on them? Does the code performance stay the same as today or degrade?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code performs the same as today. The field resolver for tags is not invoked.

Copy link
Collaborator

@david-leifker david-leifker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a tech debt perspective, spending the extra time to improve a high impact area of the code is justified. See comment in the PR, thanks!

@anshbansal anshbansal dismissed david-leifker’s stale review January 23, 2024 14:06

Discussed in slack

@anshbansal anshbansal merged commit f0a48b6 into master Jan 23, 2024
36 checks passed
@anshbansal anshbansal deleted the feature/cus-1378 branch January 23, 2024 14:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
devops PR or Issue related to DataHub backend & deployment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants