Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Gh-1032] Feature flags for topics and confidentiality and custom confidentiality list #1049

Merged
merged 12 commits into from
Feb 19, 2024

Conversation

TejasRGitHub
Copy link
Contributor

@TejasRGitHub TejasRGitHub commented Feb 9, 2024

Feature or Bugfix

  • Feature

Detail

This features adds feature flags to enable / disable topics and confidentiality fields while creating a dataset. Moreover, it adds ways to provide custom confidentiality fields to show on the UI.
Please note - For the custom confidentiality fields you will have to provide the mapping to the existing confidentiality levels ( i.e. Unclassified, Secret, Official ) in data.all . Please refer to this section about setting data.all for more details - https://data-dot-all.github.io/dataall/deploy-aws/

Testing

  1. Tested on local dev setup ( All Config Switches , Custom confidentiality values and standard confidentiality values )
  2. Tested on AWS account ( All Config Switches and custom confidentiality / standard confidentiality values )
  3. Unit test

Things to do

  • Readme document to let user know how to use this feature of custom_confidentiality_mapping ( TODO When PR is APPROVED ) [EDIT] - Readme for GH-1032  #1071

How to use this feature.

The config.json is already updated with the new configs , if you want to use the custom mapping , please add the following section in the config.json under the modules \ datasets \ features

"custom_confidentiality_mapping" : {
                    "Public" : "Unclassified",
                    "Custom Confidentiality" : "Official",
                    "Custom Confidential" : "Secret",
                    "Another Confidentiality" : "Official"
 }

Relates

Security

Please answer the questions below briefly where applicable, or write N/A. Based on
OWASP 10.

  • Does this PR introduce or modify any input fields or queries - this includes
    fetching data from storage outside the application (e.g. a database, an S3 bucket)? No
    • Is the input sanitized?
    • What precautions are you taking before deserializing the data you consume?
    • Is injection prevented by parametrizing queries?
    • Have you ensured no eval or similar functions are used?
  • Does this PR introduce any functionality or component that requires authorization? No
    • How have you ensured it respects the existing AuthN/AuthZ mechanisms?
    • Are you logging failed auth attempts?
  • Are you using or adding any cryptographic features? No
    • Do you use a standard proven implementations?
    • Are the used keys controlled by the customer? Where are they stored? No
  • Are you introducing any new policies/roles/users?
    • Have you used the least-privilege principle? How?

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@TejasRGitHub TejasRGitHub changed the title [Gh-1032] feature flags for topics and confidentiality and custom confidentiality list [Gh-1032] Feature flags for topics and confidentiality and custom confidentiality list Feb 9, 2024
@noah-paige
Copy link
Contributor

noah-paige commented Feb 12, 2024

Thanks for opening @TejasRGitHub - I did an initial review of the PR and left some comments to address - I'll test this out in a deployment tomorrow

@TejasRGitHub
Copy link
Contributor Author

TejasRGitHub commented Feb 13, 2024

Updated PR with the changes requested.

Testing

  • Unit tests
  • Testing on AWS account ( Sanity tested the Dataset creation and Editing )

@noah-paige
Copy link
Contributor

noah-paige commented Feb 14, 2024

Testing on AWS:

With Dropdowns enabled (Topics and confidentiality) & Custom Confidentiality Mapping As:

                  "Public" : "Unclassified",
                  "Custom Official" : "Official",
                  "Custom Secret" : "Secret",
                  "Super Official" : "Official"
  • CICD Succeeds
  • Create Dataset --> Create with a Custom Confidentiality
  • Catalog --> Filter By Custom Confidentiality Classes
  • Dataset Overview --> Dataset Governance Shows Custom Confidentiality Classes

With Dropdowns Disabled (Topics and Confidentiality):

  • CICD Succeeds
  • Import Dataset --> Does Not Show Topics or Confidentiality
    • ERROR: Form looks good and doesnt show topics or confidentiality but creating the dataset stack receive CREATE_FAILED (See Comment below in Notes as to FIx)
  • Catalog --> Does Not Show Filters By Topic or Confidentiality
  • Dataset Overview --> Dataset Governance Does Not Show Topic or Confidentiality

With Dropdowns Enabled (No Custom Mapping):

  • CICD Succeeds
  • Edit Dataset --> Shows Default Confidentiality
  • Catalog --> Show Filters By Topic or Default Confidentiality
  • Dataset Overview --> Dataset Governance Show Topic or Default Confidentiality

Notes:

  • If updating to custom confidentiality and for pre-existing datasets using the original confidentiality option Unclassified —> They will all now be treated as Official / Secret since get_confidentiality_level() will return None

  • Also if disabling confidentiality_dropdown then all datasets will be treated as Official / Secret

  • If your custom confidentiality has a whitespace the Catalog Filter treats it as 2 separate filter items (i.e. for "Super Official" dataset confidentiality classification - catalog filter will have options "super" and "official" and both would return the dataset)

  • NEEDS FIXING: When need to not execute the following line when creating dataset_stack and confidentiality_drop_down is false:
    Tags.of(self).add('Classification', dataset.confidentiality) (L535 of dataset_stack.py)
    Can likely just add an if statement (if dataset.confidentiality:)

@noah-paige
Copy link
Contributor

noah-paige commented Feb 14, 2024

I finished testing and called out some notes in the above comment - I think the only point that needs fixing before PR approval is the dataset_stack creation Error due to missing tag value on dataset.confidentiality

Otherwise, I think the behaviors of how we are treating pre-existing datasets when switching to custom confidentiality or disabling altogether is fine (it opts to the more restritive option rather than Unclassified behavior which may be best) but would like some second opinions on it @TejasRGitHub @dlpzx

One final suggestion is to rename topics_dropdown and confidentiality_dropdown to topics_metadata and confidentiality_metadata (or classification, classes, similar...) A name the illustrates we are not just enabling/disabling a dropdown but changing the additional info we attribute to data.all datasets

@TejasRGitHub
Copy link
Contributor Author

Hi @noah-paige , Thanks for catching that. I must have missed that during testing.

For the notes, about behaviours ( when a drop down is not present / changing the behaviour from custom to standard) I agree that the behaviour is restrictive and I don't have any issue with it as such.

Regarding your suggestion, I understand why you made this suggestion, but the configs i.e. confidentiality_dropdown and topics_dropdown are truly for switching enabling/disabling those field. Adding metadata ( like custom mapping for confidentiality ) is a configuration addition. From a user perspective, when a user sets the configs in config.json they can enable and disable the configs as suggested by the config names. For extra metadata, I think the user will be informed about this in the readme / webpage to setup the AWS deployment, etc

Apart from that, your notes made me realize that the catalog search would be rendered useless if the custom mapping names have a space. For that I suggest we remove the spaced part in the name and make it whole while indexing it at dataall/modules/datasets/indexers/dataset_indexer.py ( Similar to what is been done with the tags )

I will fix the error pointed out by you and push the updated code. Before that please let me know if you agree with the above as well

@noah-paige
Copy link
Contributor

The above sounds good to me - please go ahead making those final changes

@@ -34,7 +36,7 @@ def upsert(cls, session, dataset_uri: str):
'source': dataset.S3BucketName,
'resourceKind': 'dataset',
'description': dataset.description,
'classification': dataset.confidentiality,
'classification': re.sub('[^A-Za-z0-9]+', '', dataset.confidentiality),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding this to update the confidentiality to remove any special characters/spaces

@TejasRGitHub
Copy link
Contributor Author

TejasRGitHub commented Feb 15, 2024

Hi @noah-paige , Updated the code.

Testing

Deployed data.all with Confidentiality is disabled and Topics is enabled.

Able to create Dataset ✅
Able to edit the dataset ✅

Deployed data.all with Confidentiality is Enabled and Topics is Enabled

Able to edit the same dataset ✅
Able to import a dataset ✅

Deployed data.all with Confidentiality is Enabled and Topics is Enabled and using custom mapping which contain spaces and '-' and some special characters

Able to create Dataset . And able to see the custom confidentiality string ✅
Catalog show the custom confidentiality with spaces and special characters now showing without spaces and special characters and contains the full string ✅
Able to edit the dataset and change the mapping ✅

Copy link
Contributor

@noah-paige noah-paige left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for making the iterative changes @TejasRGitHub - this PR looks good to me!

@TejasRGitHub
Copy link
Contributor Author

TejasRGitHub commented Feb 16, 2024

Hi @noah-paige , Thanks for approving. Merging the PR (squashing and merging ). I will raise another PR to provide readme instrunctions on the features

@noah-paige noah-paige merged commit b6449d1 into data-dot-all:main Feb 19, 2024
8 checks passed
@noah-paige noah-paige added this to the v2.3.0 milestone Feb 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature flags for topics and confidentiality and custom text list for confidentiality
2 participants