Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-2578] [Feature] Hierarchical access config in dbt_project.yml #7619

Closed
3 tasks done
Tracked by #7979
sp-tkerlavage opened this issue May 12, 2023 · 10 comments
Closed
3 tasks done
Tracked by #7979
Assignees
Labels
enhancement New feature or request model_groups_access Issues related to groups multi_project

Comments

@sp-tkerlavage
Copy link

sp-tkerlavage commented May 12, 2023

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

The new groups and access features have been defined as properties
https://docs.getdbt.com/reference/resource-properties/access

Is it possible to instead define them as configs so that they can be specified in the dbt_project.yml hierarchically?
dbt_project.yml:

models:
  <resource-path>:
    +group: path1
    +access: private
  <resource-path>:
    +group: path2
    +access: public

Then certain model configurations can override these hierarchical definitions?
path1.yml

groups:
  - name: path1
    owner:
      name: Customer Success Team
      email: [email protected]

# Then, add 'group' + 'access' modifier to specific models
models:
  # This is a public model -- it's a stable & mature interface for other teams/projects
  - name: dim_customers
    group: path1
    access: public

Describe alternatives you've considered

Obviously we can define the access individually for each model as shown in the documentation

Who will this benefit?

Anyone that wants to hierarchically apply groups/access to large swaths of models

Are you interested in contributing this feature?

Most likely not familiar enough with the inner workings of dbt-core to contribute

Anything else?

No response

@sp-tkerlavage sp-tkerlavage added enhancement New feature or request triage labels May 12, 2023
@github-actions github-actions bot changed the title [Feature] Hierarchical group/access tags in project.yml [CT-2578] [Feature] Hierarchical group/access tags in project.yml May 12, 2023
@sp-tkerlavage sp-tkerlavage changed the title [CT-2578] [Feature] Hierarchical group/access tags in project.yml [CT-2578] [Feature] Hierarchical group/access properties in project.yml May 12, 2023
@sp-tkerlavage sp-tkerlavage changed the title [CT-2578] [Feature] Hierarchical group/access properties in project.yml [CT-2578] [Feature] Hierarchical group/access configs in project.yml May 12, 2023
@sp-tkerlavage sp-tkerlavage changed the title [CT-2578] [Feature] Hierarchical group/access configs in project.yml [CT-2578] [Feature] Hierarchical group/access configs in dbt_project.yml May 12, 2023
@dbeatty10
Copy link
Contributor

Thanks for reaching out @sp-tkerlavage !

The group config can be defined in dbt_project.yml, and it can already be defined hierarchically as you describe (see example below).

Are you requesting that the access property be a converted from a property to a config as well?

Example

Suppose you have the following directory tree:

models/
├── _models.yml
├── resource_path_1
│   └── stg_customers.sql
└── resource_path_2
    └── dim_customers.sql

And this models/_models.yml:

groups:
  - name: path1
    owner:
      name: Customer Success Team
      email: [email protected]
  - name: path2
    owner:
      name: Developer Experience Team
      email: [email protected]

And this dbt_project.yml:

name: "my_project"
version: "1.0.0"
config-version: 2
profile: "sandcastle"


models:
  my_project:
    resource_path_1:
      +group: path1
    resource_path_2:
      +group: path2

Then we can see that the group config is taken into account (and you can override it in a _properties.yml or the .sql or .py model file):

dbt \
    --quiet \
    ls \
    --output json \
    --output-keys name original_file_path version access group contract latest_version \
    -s '*.dim_customers*' \
| python -m json.tool

Output:

{
    "name": "dim_customers",
    "original_file_path": "models/resource_path_2/dim_customers.sql",
    "group": "path2",
    "contract": {
        "enforced": false,
        "checksum": null
    },
    "access": "protected",
    "version": null,
    "latest_version": null
}

@sp-tkerlavage
Copy link
Author

Yes thats what I meant, sorry for the confusion. I meant it would be nice to have the access property as a config so that it can be assigned hierarchically.

@dbeatty10
Copy link
Contributor

dbeatty10 commented May 16, 2023

No prob @sp-tkerlavage. I did some research, and this was an intentional decision as described here:

This should be a model-level attribute, not a configuration set & inherited for many models at once in dbt_project.yml. Every public model must be consciously and individually marked as such. This adds a teensy bit of friction, with the aim of ensuring intentionality.

and affirmed again here:

explicitly prohibit setting certain attributes in dbt_project.yml, for multiple models at once — e.g. access

Given that this was an intentional decision, my default inclination would be to close this as "not planned" unless new information has come to light that wasn't considered originally. What do you think @jtcohen6?

@sp-tkerlavage
Copy link
Author

sp-tkerlavage commented May 17, 2023

Actually one last bit of input, if the goal is to make sure that model access is set to public consciously, could access be a hierarchical configuration that is only configurable to either protected or the more restrictive private?

Or another suggestion is for access to be a property of the group, where again it can either be left as the default protected, or made more restrictive by explicitly setting it to private

groups:
  - name: path1
    access: private
    owner:
      name: Customer Success Team
      email: [email protected]
  - name: path2
    # this will just use the default of access: protected
    owner:
      name: Developer Experience Team
      email: [email protected]

In my use case I want everything to be private.With the current implementation, there's a risk of someone creating a model and forgetting to assign it an access level. If someone does forget to make the access policy more restrictive, models from outside the group can reference it later on in the pipeline it becomes difficult to find and/or debug

@dbeatty10 dbeatty10 changed the title [CT-2578] [Feature] Hierarchical group/access configs in dbt_project.yml [CT-2578] [Feature] Hierarchical access config in dbt_project.yml May 17, 2023
@dbeatty10 dbeatty10 removed the triage label May 17, 2023
@jtcohen6
Copy link
Contributor

Sorry for the delayed follow-up!

@dbeatty10 @joellabes and I have been chatting about this. I think there are two options here:

  1. While we should disallow setting access: public to many models at once (in dbt_project.yml), we should absolutely allow setting access: private for a whole subfolder at once. This goes along nicely with a correspondence between groups and subfolders. I'm not sure of exactly how we'd implement that—we don't have any patterns exactly like it today—but I bet we could find a way.
  2. The current approach to setting model access is too strict. We would be better off opening this up, and allowing users to set access wherever they want, to whatever they want, like any other configurations?

This is an opinion I've held strongly, but I can be overruled! My concern is that the full implications of setting a model to access: public are not clear right when you build it (unlike, say, with materialized: table). As soon as a model is public, it's going to be quite hard to take that back. I wonder if it's appropriate to wait until the implications of marking a model public are clearer (through the rest of this year): Other teams can discovery your public model. They can take a dependency on it, perhaps even without first talking to you.

The right answer here could still be process, rather than technical. It could be part of the PR description / review process: "Which new public models am I creating / existing models am I designating as public for the first time"?

@graciegoheen
Copy link
Contributor

graciegoheen commented Jul 17, 2023

The current approach to setting model access is too strict. We would be better off opening this up, and allowing users to set access wherever they want, to whatever they want, like any other configurations?

I think yes, if and only if we're able to alert dbt Cloud users that a given code change is "adding X new public models, upgrading Y existing models from protected -> public, etc.". Perhaps this alert could be handled via a dbt Cloud CI job. That way we would have some guardrails to prevent someone from accidentally exposing some sensitive models as public, while still allowing them to configure access by resource path (which I think is a valid way to configure all forms of access).

We could handle setting access the same way (rather than allowing setting access: private and access: protected in the dbt_project.yml but not access: public), but this may be a lift to produce information on why a given model is marked as "modified" in order to make this type of alerting possible.

@jtcohen6
Copy link
Contributor

jtcohen6 commented Jul 21, 2023

Thanks for writing that up @graciegoheen! I really liked where we landed in the conversation on Monday. We should make access a configuration for consistency, and allow users to set any value of it wherever — so long as we also have/support mechanisms for notifying when a significant change is being made, instances where many models are being promoted to access: public all at once. A clear "FYI" in dbt Cloud CI could be one such mechanism!

Within dbt-core, it would make sense to continue promoting access to being a top-level node attribute, at this point both for backward compatibility and because it's such an important input to metadata use cases. It doesn't feel great to keep doing this on a case-by-case basis, but that's the motivation behind this refactor:

@jtcohen6
Copy link
Contributor

jtcohen6 commented Aug 7, 2023

A clear "FYI" in dbt Cloud CI could be one such mechanism!

Update: I am talking with @thorn14 about what ^this could look like :)

Not a hard blocker to pursuing this work in the meantime. At this point, we should probably open a dedicated implementation ticket for:

  • Turning access into a config that users can set anywhere
  • (Back compat) Promoting access to be a node-level attribute in manifest.json (similar to meta or tags). IMO It doesn't need to be in both places—we could pop it out of config, and into the top-level attributes (à la our thinking from a few months ago for: [CT-2296] [Spike] Remove the distinction between configs and non-configs #7157)

Update: Implementation ticket opened: #8383

@jonasberg
Copy link

Would love to see this feature - being able to set access to private on a directory level would be very helpful for governance of what models can be built upon!

@graciegoheen
Copy link
Contributor

I'm closing this as a duplicate, since we are tracking the implementation on #8383

Thanks for all of your input :)

@graciegoheen graciegoheen closed this as not planned Won't fix, can't repro, duplicate, stale Sep 8, 2023
@graciegoheen graciegoheen removed the Refinement Maintainer input needed label Sep 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request model_groups_access Issues related to groups multi_project
Projects
None yet
Development

No branches or pull requests

5 participants