[Gh-1032] Feature flags for topics and confidentiality and custom confidentiality list #1049

TejasRGitHub · 2024-02-09T21:43:48Z

Feature or Bugfix

Feature

Detail

This features adds feature flags to enable / disable topics and confidentiality fields while creating a dataset. Moreover, it adds ways to provide custom confidentiality fields to show on the UI.
Please note - For the custom confidentiality fields you will have to provide the mapping to the existing confidentiality levels ( i.e. Unclassified, Secret, Official ) in data.all . Please refer to this section about setting data.all for more details - https://data-dot-all.github.io/dataall/deploy-aws/

Testing

Tested on local dev setup ( All Config Switches , Custom confidentiality values and standard confidentiality values )
Tested on AWS account ( All Config Switches and custom confidentiality / standard confidentiality values )
Unit test

Things to do

Readme document to let user know how to use this feature of custom_confidentiality_mapping ( TODO When PR is APPROVED ) [EDIT] - Readme for GH-1032 #1071

How to use this feature.

The config.json is already updated with the new configs , if you want to use the custom mapping , please add the following section in the config.json under the modules \ datasets \ features

"custom_confidentiality_mapping" : {
                    "Public" : "Unclassified",
                    "Custom Confidentiality" : "Official",
                    "Custom Confidential" : "Secret",
                    "Another Confidentiality" : "Official"
 }

Relates

Feature flags for topics and confidentiality and custom text list for confidentiality #1032

Security

Please answer the questions below briefly where applicable, or write N/A. Based on
OWASP 10.

Does this PR introduce or modify any input fields or queries - this includes
fetching data from storage outside the application (e.g. a database, an S3 bucket)? No
- Is the input sanitized?
- What precautions are you taking before deserializing the data you consume?
- Is injection prevented by parametrizing queries?
- Have you ensured no eval or similar functions are used?
Does this PR introduce any functionality or component that requires authorization? No
- How have you ensured it respects the existing AuthN/AuthZ mechanisms?
- Are you logging failed auth attempts?
Are you using or adding any cryptographic features? No
- Do you use a standard proven implementations?
- Are the used keys controlled by the customer? Where are they stored? No
Are you introducing any new policies/roles/users?
- Have you used the least-privilege principle? How?

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

…dentiality list

backend/dataall/modules/datasets/api/dataset/resolvers.py

frontend/src/modules/Datasets/components/DatasetGovernance.js

noah-paige · 2024-02-12T22:56:03Z

Thanks for opening @TejasRGitHub - I did an initial review of the PR and left some comments to address - I'll test this out in a deployment tomorrow

TejasRGitHub · 2024-02-13T17:34:23Z

Updated PR with the changes requested.

Testing

Unit tests
Testing on AWS account ( Sanity tested the Dataset creation and Editing )

noah-paige · 2024-02-14T17:21:19Z

Testing on AWS:

With Dropdowns enabled (Topics and confidentiality) & Custom Confidentiality Mapping As:

                  "Public" : "Unclassified",
                  "Custom Official" : "Official",
                  "Custom Secret" : "Secret",
                  "Super Official" : "Official"

CICD Succeeds
Create Dataset --> Create with a Custom Confidentiality
Catalog --> Filter By Custom Confidentiality Classes
Dataset Overview --> Dataset Governance Shows Custom Confidentiality Classes

With Dropdowns Disabled (Topics and Confidentiality):

CICD Succeeds
Import Dataset --> Does Not Show Topics or Confidentiality
- ERROR: Form looks good and doesnt show topics or confidentiality but creating the dataset stack receive CREATE_FAILED (See Comment below in Notes as to FIx)
Catalog --> Does Not Show Filters By Topic or Confidentiality
Dataset Overview --> Dataset Governance Does Not Show Topic or Confidentiality

With Dropdowns Enabled (No Custom Mapping):

CICD Succeeds
Edit Dataset --> Shows Default Confidentiality
Catalog --> Show Filters By Topic or Default Confidentiality
Dataset Overview --> Dataset Governance Show Topic or Default Confidentiality

Notes:

If updating to custom confidentiality and for pre-existing datasets using the original confidentiality option Unclassified —> They will all now be treated as Official / Secret since get_confidentiality_level() will return None
Also if disabling confidentiality_dropdown then all datasets will be treated as Official / Secret
If your custom confidentiality has a whitespace the Catalog Filter treats it as 2 separate filter items (i.e. for "Super Official" dataset confidentiality classification - catalog filter will have options "super" and "official" and both would return the dataset)
NEEDS FIXING: When need to not execute the following line when creating dataset_stack and confidentiality_drop_down is false:
Tags.of(self).add('Classification', dataset.confidentiality) (L535 of dataset_stack.py)
Can likely just add an if statement (if dataset.confidentiality:)

noah-paige · 2024-02-14T21:37:20Z

I finished testing and called out some notes in the above comment - I think the only point that needs fixing before PR approval is the dataset_stack creation Error due to missing tag value on dataset.confidentiality

Otherwise, I think the behaviors of how we are treating pre-existing datasets when switching to custom confidentiality or disabling altogether is fine (it opts to the more restritive option rather than Unclassified behavior which may be best) but would like some second opinions on it @TejasRGitHub @dlpzx

One final suggestion is to rename topics_dropdown and confidentiality_dropdown to topics_metadata and confidentiality_metadata (or classification, classes, similar...) A name the illustrates we are not just enabling/disabling a dropdown but changing the additional info we attribute to data.all datasets

TejasRGitHub · 2024-02-15T00:04:59Z

Hi @noah-paige , Thanks for catching that. I must have missed that during testing.

For the notes, about behaviours ( when a drop down is not present / changing the behaviour from custom to standard) I agree that the behaviour is restrictive and I don't have any issue with it as such.

Regarding your suggestion, I understand why you made this suggestion, but the configs i.e. confidentiality_dropdown and topics_dropdown are truly for switching enabling/disabling those field. Adding metadata ( like custom mapping for confidentiality ) is a configuration addition. From a user perspective, when a user sets the configs in config.json they can enable and disable the configs as suggested by the config names. For extra metadata, I think the user will be informed about this in the readme / webpage to setup the AWS deployment, etc

Apart from that, your notes made me realize that the catalog search would be rendered useless if the custom mapping names have a space. For that I suggest we remove the spaced part in the name and make it whole while indexing it at dataall/modules/datasets/indexers/dataset_indexer.py ( Similar to what is been done with the tags )

I will fix the error pointed out by you and push the updated code. Before that please let me know if you agree with the above as well

noah-paige · 2024-02-15T14:14:49Z

The above sounds good to me - please go ahead making those final changes

…ialiy if OFF

TejasRGitHub · 2024-02-15T18:54:55Z

backend/dataall/modules/datasets/indexers/dataset_indexer.py

@@ -34,7 +36,7 @@ def upsert(cls, session, dataset_uri: str):
                    'source': dataset.S3BucketName,
                    'resourceKind': 'dataset',
                    'description': dataset.description,
-                    'classification': dataset.confidentiality,
+                    'classification': re.sub('[^A-Za-z0-9]+', '', dataset.confidentiality),


Adding this to update the confidentiality to remove any special characters/spaces

TejasRGitHub · 2024-02-15T19:04:08Z

Hi @noah-paige , Updated the code.

Testing

Deployed data.all with Confidentiality is disabled and Topics is enabled.

Able to create Dataset ✅
Able to edit the dataset ✅

Deployed data.all with Confidentiality is Enabled and Topics is Enabled

Able to edit the same dataset ✅
Able to import a dataset ✅

Deployed data.all with Confidentiality is Enabled and Topics is Enabled and using custom mapping which contain spaces and '-' and some special characters

Able to create Dataset . And able to see the custom confidentiality string ✅
Catalog show the custom confidentiality with spaces and special characters now showing without spaces and special characters and contains the full string ✅
Able to edit the dataset and change the mapping ✅

noah-paige

thanks for making the iterative changes @TejasRGitHub - this PR looks good to me!

TejasRGitHub · 2024-02-16T20:59:20Z

Hi @noah-paige , Thanks for approving. ~~Merging the PR (squashing and merging )~~. I will raise another PR to provide readme instrunctions on the features

TejasRGitHub added 2 commits February 9, 2024 12:09

data-dot-allGH-1032 - Code Changes for Feature Flags and custom confi…

5aafbdf

…dentiality list

data-dot-allGH-1032 - Using {} when config not present

8d59e23

TejasRGitHub changed the title ~~[Gh-1032] feature flags for topics and confidentiality and custom confidentiality list~~ [Gh-1032] Feature flags for topics and confidentiality and custom confidentiality list Feb 9, 2024

Python lint correction

1b2b43f

TejasRGitHub commented Feb 9, 2024

View reviewed changes

backend/dataall/modules/datasets/api/dataset/resolvers.py Outdated Show resolved Hide resolved

TejasRGitHub added 2 commits February 12, 2024 11:04

Refactoring Unit test for more simplicity and removing code smell

0272a30

Fixing Lint

b423c66

noah-paige reviewed Feb 12, 2024

View reviewed changes

backend/dataall/modules/datasets/api/dataset/resolvers.py Outdated Show resolved Hide resolved

noah-paige reviewed Feb 12, 2024

View reviewed changes

backend/dataall/modules/datasets/api/dataset/resolvers.py Outdated Show resolved Hide resolved

noah-paige reviewed Feb 12, 2024

View reviewed changes

frontend/src/modules/Datasets/components/DatasetGovernance.js Show resolved Hide resolved

trajopadhye and others added 3 commits February 13, 2024 09:40

Merge branch 'main' into data-dot-allgh-1032-feature-flags

d531126

Fixes for tests

3155575

Correcting DatasetGovernance.js with auto approval

6df3c7e

TejasRGitHub requested a review from noah-paige February 13, 2024 17:34

TejasRGitHub and others added 4 commits February 15, 2024 11:30

Resolving comments raised on PR - Fixing indexing, Bug when confident…

6c8bf32

…ialiy if OFF

Merge branch 'main' into data-dot-allgh-1032-feature-flags

a5f16f3

Removing unneccessary comments

3fffbea

Using Regex for removing any special characters

87721d6

TejasRGitHub commented Feb 15, 2024

View reviewed changes

noah-paige approved these changes Feb 16, 2024

View reviewed changes

noah-paige merged commit b6449d1 into data-dot-all:main Feb 19, 2024
8 checks passed

noah-paige added this to the v2.3.0 milestone Feb 19, 2024

noah-paige linked an issue Feb 19, 2024 that may be closed by this pull request

Feature flags for topics and confidentiality and custom text list for confidentiality #1032

Closed

noah-paige mentioned this pull request Feb 19, 2024

Feature flags for topics and confidentiality and custom text list for confidentiality #1032

Closed

TejasRGitHub mentioned this pull request Feb 26, 2024

Automatic migration for custom features like custom confidentiality mapping , etc #1077

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Gh-1032] Feature flags for topics and confidentiality and custom confidentiality list #1049

[Gh-1032] Feature flags for topics and confidentiality and custom confidentiality list #1049

TejasRGitHub commented Feb 9, 2024 •

edited

Loading

noah-paige commented Feb 12, 2024 •

edited

Loading

TejasRGitHub commented Feb 13, 2024 •

edited

Loading

noah-paige commented Feb 14, 2024 •

edited

Loading

noah-paige commented Feb 14, 2024 •

edited

Loading

TejasRGitHub commented Feb 15, 2024

noah-paige commented Feb 15, 2024

TejasRGitHub Feb 15, 2024

TejasRGitHub commented Feb 15, 2024 •

edited

Loading

noah-paige left a comment

TejasRGitHub commented Feb 16, 2024 •

edited

Loading

[Gh-1032] Feature flags for topics and confidentiality and custom confidentiality list #1049

[Gh-1032] Feature flags for topics and confidentiality and custom confidentiality list #1049

Conversation

TejasRGitHub commented Feb 9, 2024 • edited Loading

Feature or Bugfix

Detail

Testing

Things to do

How to use this feature.

Relates

Security

noah-paige commented Feb 12, 2024 • edited Loading

TejasRGitHub commented Feb 13, 2024 • edited Loading

Testing

noah-paige commented Feb 14, 2024 • edited Loading

noah-paige commented Feb 14, 2024 • edited Loading

TejasRGitHub commented Feb 15, 2024

noah-paige commented Feb 15, 2024

TejasRGitHub Feb 15, 2024

Choose a reason for hiding this comment

TejasRGitHub commented Feb 15, 2024 • edited Loading

Testing

Deployed data.all with Confidentiality is disabled and Topics is enabled.

Deployed data.all with Confidentiality is Enabled and Topics is Enabled

Deployed data.all with Confidentiality is Enabled and Topics is Enabled and using custom mapping which contain spaces and '-' and some special characters

noah-paige left a comment

Choose a reason for hiding this comment

TejasRGitHub commented Feb 16, 2024 • edited Loading

TejasRGitHub commented Feb 9, 2024 •

edited

Loading

noah-paige commented Feb 12, 2024 •

edited

Loading

TejasRGitHub commented Feb 13, 2024 •

edited

Loading

noah-paige commented Feb 14, 2024 •

edited

Loading

noah-paige commented Feb 14, 2024 •

edited

Loading

TejasRGitHub commented Feb 15, 2024 •

edited

Loading

TejasRGitHub commented Feb 16, 2024 •

edited

Loading