-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Redshift Data Sharing #955
Comments
DesignThis design is up to date with the latest implementation changes Assumptions
HLD and User experienceInitial design with Lake Formation ---> NOT USED ANYMOREDuring implementation we realized that datashares with LakeFormation do not bring much value when not being used to actually share further in Lake Formation (it just plots metadata in Glue). It might be useful in the future if we integrate with IAM Identity Center as the integration Redshift-LF works way better with IAM IC. But for the moment we won't be using it. If in the future we want to revert the changes the code is in commits: bf476fc, ef9662b and 17970d7. Initial data sharing design with data.all Redshift consumption roles ---> NOT USED ANYMOREThere are 3 reasons why this design has been further improved:
Current designWe add more guardrails on the onboarding of clusters by setting as requirement that there must be a pivot role connection created for each cluster used. This is a pre-req for creating other types of connections and for opening share requests. Following the numeration above:
User experienceRedshift connectionCreate/Delete and list If any parameter in the connection form is invalid it will throw an error. If the Team does not have permissions to create a connection in the environment or does not have tenant permissions for redshift it will also throw an error. Screen.Recording.2024-07-25.at.08.59.54.movRedshift datasetImport Form Screen.Recording.2024-07-25.at.09.17.29.movList Datasets view ** List Datasets in Environment** Screen.Recording.2024-07-25.at.10.43.36.movDataset edit form, Tables tab and schema modal Table view, columns tab and Table edit form Delete Table, Dataset Screen.Recording.2024-07-25.at.11.06.02.movThey get deleted and removed from the catalog **Catalog indexing ** Screen.Recording.2024-07-25.at.10.06.50.mov** Feed, Votes** Screen.Recording.2024-07-25.at.10.15.38.movGlossary Redshift permissions controls Screen.Recording.2024-07-25.at.08.57.10.movPermissionsIAM permissionsIAM permissions are granted solely to the pivot role. List and describe permissions are granted to all resources if needed and write operations on Redshift workgroups, namespaces and clusters are restricted to those resources that have been onboarded to data.all in the form of Connections. Every time a connection is added to the environment, the pivot role gets updated (the environment stack gets deployed) data.all application permissionsHere we are referring to permissions guarding API calls. These are not IAM permissions but data.all specific permissions that can be of the type: tenant-level, environment-level, or group-level permissions. For more info, check the Permission model section in the docs. To avoid complex permissions backfilling migrations or the risk of too open migrations, in this case I am considering dataset sharing and future extensions to decide which permissions to include. Redshift Connection permissionsAll API-facing methods of
Connections are not editable at the moment, so there are no permissions to UPDATE_CONNECTIONS. Redshift Dataset permissions
Sharing with RedshiftWe will share Redshift tables. We could have decided to implement full dataset sharing, but sharing with more granularity is more aligned with least-privilege principles. Datashares only work for Encrypted clusters. Therefore we should add guardrails preventing shares for non-encrypted clusters. Or directly disable the onboarding of clusters that are not encrypted. In the connection we should include the encryption type of the cluster. Alternative 1: datashare per share requestWhen a share request is approved: Alternative 2: datashare per datasetWhen a share request is approved: When revoking tables: Alternative 3: datashare per dataset-requester namespaceSame steps as alternative 2 but in this case we create a different datashare for each target namespace. Comparison
----> decision: Alternative 3 offers the nicest, most secure experience for users LimitationsAll alternatives take into account Redshift service quotas. In principle the max number of dbs in a cluster is 60 (provisioned cluster) and 100 (serverless), but this excludes databases created from datashares, so we are safe. As for datasharing limitations we should add it in the docs: https://docs.aws.amazon.com/redshift/latest/dg/considerations.html. As for sharing between a Redshift provisioned cluster and a serverless cluster, the documentation states that it is possible: https://docs.aws.amazon.com/redshift/latest/mgmt/serverless-datasharing.html More constraints: You can create only one consumer database for one datashare on a consumer cluster. You can't create multiple consumer databases referring to the same datashare. --> accounted for in #1467 |
Implementation planPre-requisitesTo implement the design I will open multiple pull requests (list might vary)
Redshift datasets
Redshift data sharing
Related tasks needed for release
Documentation (also needed for release)
Integration testing -----> tracked in #1510Wait for #1409
Redshift next steps ---> tracked in #1509
NOT Redshift tasks out of scope
|
@dlpzx I've read through the design and watched your video as well (it was very helpful as it answered some of my questions). Overall I don't see any big problems but I do have some concerns.
I would also want to make sure that there's a consistent user experience when registering consumer roles or redshift consumer connections. Even today I find it weird that we register consumer roles in "Teams" tab under environments. I don't think that's intuitive. Perhaps with the addition of redshift connections we can instead add a new tab on the environment "Consumer Connections" or smth similar where you can manage your consumer IAM roles and redshift consumer connections etc.. Also I don't really feel that this new type "Warehouses" is actually going to be reusable for anything else other than Redshift so I think it's misleading. I would like to hear your arguments why you think it would be much better to put this as a new UI on the left main bar vs making it a new tab on the environment.
Thank you! |
I really like how descriptive the design is. Answered most of my questions too!
|
Thanks @zsaltys and @anushka-singh for the input, you went straight to the tricky points.
DESIGN UPDATED WITH THE FEEDBACK! |
…NAL DELETE DATASETS_BASE (#1242) ### Feature or Bugfix - Refactoring ### Detail After all the previous PRs are merged, there should be no circular dependencies between `datasets` and `datasets_sharing`. We can now proceed to: - move `datasets_base` models, repositories, permissions and enums to `datasets` - adjust the `__init__` files to establish the `datasets_sharing` depends on `datasets` - adjust the Module interfaces to ensure that all necessary dataset models... are imported in the interface for sharing Next steps: - share_notifications paramter to dataset_sharing in config.json ### Relates #955 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
…me datasets as s3_datasets) (#1250) ### Feature or Bugfix - Refactoring ### Detail - Rename `datasets` module to `s3_datasets` module This PR is the first step to extract a generic datasets_base module that implements the undifferentiated concepts of Dataset in data.all. s3_datasets will use this base module to implement the specific implementation for S3 datatasets. ### Relates - #1123 - #955 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
…te datasets_base and move enums) (#1257) ### Feature or Bugfix⚠️ This PR should be merged after #1250. - Refactoring ### Detail As explained in the design for #1123 we are trying to implement a generic `datasets_base` module that can be used by any type of datasets in a generic way. This PR: - Creates the skeleton of the `datasets_base` module consisting of 3 packages: `db`, `api`, `services`. And adds the `__init__` file. - Adds the dependency of `s3_datasets` to `datasets_base` in the `__init__` file of the `s3_datasets` module - Moves datasets_enums to datasets_base ### Relates - #1123 - #955 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
…te DatasetBase db model and S3Dataset model) (#1258) ### Feature or Bugfix⚠️ This PR should be merged after #1257. - Feature - Refactoring ### Detail As explained in the design for #1123 we are trying to implement a generic `datasets_base` module that can be used by any type of datasets in a generic way. **This PR does**: - Adds a generic `DatasetBase` model in datasets_base.db that is used in s3_datasets.db to build the `S3Dataset` model using joined table inheritance in [sqlalchemy](https://docs.sqlalchemy.org/en/20/orm/inheritance.html) - Rename all usages of Dataset to S3Dataset (in the future some will be returned to DatasetBase, but for the moment we will keep them as S3Dataset) - Add migration script that backfills `datasets` table and renames `s3_datasets` --->⚠️ In the process of migrating we are doing some "scary" operations on the dataset table, if for any reason the migration encounters any issue it could result in catastrophic loss of information --> for this reason this [PR](#1267) implements RDS snapshots on migrations. **This PR does not**: - Feed registration stays as: `FeedRegistry.register(FeedDefinition('Dataset', S3Dataset))` using `Dataset` with the `S3Dataset` resource type. It is out of the scope of this PR to migrate the Feed definition. - Exactly the same for the GlossaryRegistry registration. We keep `object_type='Dataset'` to avoid backwards compatibility issues. - It does not change the resourceType for permissions. We keep using a generic `Dataset` as target for S3 permissions. If we are to split permissions into DatasetBase permissions and S3Dataset permissions we would do it on a different PR #### Remarks Inserting new items of S3Dataset does not require any changes. SQL Alchemy joined inheritance automatically inserts data in the parent table and then another one to the child table as explained in this stackoverflow [link](https://stackoverflow.com/questions/39926937/sqlalchemy-how-to-insert-a-joined-table-inherited-class-instance-when-the-pare) (I was not able to find it in the official docs) ### Relates - #1123 - #955 - #1267 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
…te DatasetBaseRepository and move DatasetLock) (#1276) ### Feature or Bugfix⚠️ merge after #1258 - Refactoring ### Detail As explained in the design for #1123 we are trying to implement a generic `datasets_base` module that can be used by any type of datasets in a generic way. In this small PR: - we move the generic DatasetLock model to datasets_base - move the DatasetLock db operations to databasets_base DatasetBaseRepository - move activity to DatasetBaseRepository ### Relates - #1123 - #955 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
…e DatasetServiceInterface to datasets_base, add property, create first list API for datasets_base) (#1281) ### Feature or Bugfix - Feature - Refactoring ### Detail As explained in the design for #1123 we are trying to implement a generic `datasets_base` module that can be used by any type of datasets in a generic way. In this PR we: - Move DatasetServiceInterface to datasets_base. This interface is used by datasets_sharing to "inject" logic in s3_datasets - add property dataset_type to the DatasetServiceInterface interface to distinguish which type of dataset this interface applies to. - create first list API for datasets_base. 👀 This is the most important part. When having multiple types of datasets users will still list all datasets together in several places in the UI (e.g. in listDatasets in DatasetList view, in listDatasetsEnvironment in Environment view) This API calls are not specific to s3_datasets, but generic to any type of dataset. Thus, they should be part of datasets_base. This PR introduces the datasets_list_service, datasetListRepository and includes only one example of API that moves to dataset_base. In next PRs we will move the rest of APIs ### Relates - #1123 - #955 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
…ve list queries to dataset_base or rename them) (#1282) ### Feature or Bugfix - Refactoring ### Detail As explained in the design for #1123 we are trying to implement a generic `datasets_base` module that can be used by any type of datasets in a generic way. In this PR we: - Restructure listDatasetsOwnedByEnvGroup as listS3DatasetsOwnedByEnvGroup and move it into Worksheets in FE: the reason why it is moved to Worksheets is that it is the only place where it is used in the FE. One could argue that in the BE listS3DatasetsOwnedByEnvGroup is part of the S3_Dataset module. The way I see it, FE and BE are independent and their modularization strategies fit the type of programming, what makes sense in FE might not make it in BE. In BE queries belong to the module whose services/models they are performing actions on, in this case s3_datasets. In FE queries belong to the module where they are used and if a query is used by more than one module then it can be placed in the generic `services` directory. What is important is that we define the dependencies. In this case it is important to make Worksheets dependent of S3_Datasets (as we do in the index in `frontend/src/modules/Worksheets/index.js` and in `backend/dataall/modules/worksheets/__init__.py` - Move listDatasetsCreatedInEnvironment to datasets_base ### Relates - #1123 - #955 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
…art 1 (renaming, enums and permissions) (#1284) ### Feature or Bugfix - Feature - Refactoring ### Detail As explained in the design for #1123 and #1283 we are trying to implement generic `datasets_base` and `shares_base` modules that can be used by any type of datasets and by any type of shareable object in a generic way. In this PR: - Rename `dataset_sharing` as `s3_dataset_shares` - Create `shares_base` and introduce dependency (`s3_dataset_shares` depends on `shares_base`) - Move generic enums to shares_base - Move generic permissions to shares_base ### Relates - #1283 - #1123 - #955 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
### Feature or Bugfix - Feature ### Detail Complete design in #955. This particular PR is focused on small FE enhancements to adapt the share views to Redshift shares: Add RedshiftTable as type to plot in shareView -> list Items, edit (add items), verify items ![Screenshot 2024-08-12 at 13 29 18](https://github.com/user-attachments/assets/0c48ca8f-5ce4-41c5-aca9-62928c4345d0) Solve issue with redirect in the ShareView header (it redirected to s3-datasets/dataset/uri) Add principal resolver that resolves as principal the Redshift role (also removed unused fields for principal in backend) ![Screenshot 2024-08-12 at 13 31 07](https://github.com/user-attachments/assets/60be4e6d-fb0c-4a23-9e04-3775f9d0d4f8) Replace IAM role references with a generic role and added icons ![Screenshot 2024-08-12 at 13 31 51](https://github.com/user-attachments/assets/1798a902-3398-4cbc-8aef-96797298c91a) Finally, added shares tab in the Redshift Dataset View: ![image](https://github.com/user-attachments/assets/e321304c-8dfa-460f-bca0-ef24f4fcb594) ### Relates - #955 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
…res (#1467) ### Feature or Bugfix - Feature ### Detail Complete design in #955. This particular PR is the CORE of the redshift dataset sharing implementation. - Implement sharing logic in ECS task for approve, revoke and verify. ❗check the "Sharing with Redshift" section of the design with the key decisions on the sharing workflow - Add the necessary redshift data API calls in the redshift_data client - Move share alarm utils to shares_base so that they can be re-used in redshift sharing. It would be good to rename the file but it can wait. - Includes tests for the processor functions: approve, revoke, verify In contrast to the design in the Glue or S3 sharing mechanisms, in this case I decided to keep it simple and use the AWS client directly from the processor without a manager. ❗ I did not find a way to check permissions granted to redshift roles in Redshift. For this reason in the verification task we are not checking the last 2 steps of the share. In Redshift it is possible to check user permissions to tables (with [has_table_permissions](https://docs.aws.amazon.com/redshift/latest/dg/r_HAS_TABLE_PRIVILEGE.html)) and role permissions to datashares, databases and schemas woth some of the [info tables and views](https://docs.aws.amazon.com/redshift/latest/dg/cm_chap_system-tables.html); but when it comes to tables there is not a table to look up or a system function. For the moment I have not included this step, but I'll be meeting more Redshift experts for guidance. The "good" thing is that it is the last step for a share to succeed, so it will be a matter of users trying and getting an "Access denied for insufficient permissions", which can be troubleshooted There are still a number of issues to be fixed on subsequent PRs: - add guardrails to share creation - polish FE (e.g. principal id, resource type) - avoid IAM checks and dataset and IAM locks for Redshift ### Relates - #955 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
commit 22a6f6ef Author: Noah Paige <[email protected]> Date: Mon Jul 08 2024 11:28:07 GMT-0400 (Eastern Daylight Time) Add integ tests commit 4fb7d653 Author: Noah Paige <[email protected]> Date: Mon Jul 08 2024 11:26:36 GMT-0400 (Eastern Daylight Time) Merge env test changes commit 4cf42e8 Author: Petros Kalos <[email protected]> Date: Fri Jul 05 2024 08:19:34 GMT-0400 (Eastern Daylight Time) improve docs commit 65f930a Author: Petros Kalos <[email protected]> Date: Fri Jul 05 2024 08:10:56 GMT-0400 (Eastern Daylight Time) fix failures commit 170b7ce Author: Petros Kalos <[email protected]> Date: Wed Jul 03 2024 10:52:20 GMT-0400 (Eastern Daylight Time) add group/consumption_role invite/remove tests commit ba77d69 Author: dlpzx <[email protected]> Date: Wed Jul 03 2024 06:51:47 GMT-0400 (Eastern Daylight Time) Rename alias for env_vars kms key in cognito lambdas FE and BE (#1385) ### Feature or Bugfix - Bugfix ### Detail For the case in which we deploy FE and BE in us-east-1 the new lambda env_key alias is the same one for TriggerFunctionCognitoUrlsConfig in FE and for TriggerFunctionCognitoConfig in BE, which results in a failure of the CICD in the FE stack because the alias already exists. This PR changes the name of both aliases to avoid this conflict. It also adds envname to avoid issues with other deployment environments/tooling account in the future ### Relates - <URL or Ticket> ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. commit e5923a9 Author: dlpzx <[email protected]> Date: Wed Jul 03 2024 04:27:11 GMT-0400 (Eastern Daylight Time) Fix lambda_env_key out of scope for vpc-facing cognito setup (#1384) ### Feature or Bugfix - Bugfix ### Detail The KMS key for the Lambda environment variables in the Cognito IdP stack was defined inside an if-clause for internet facing frontend. Outside of that if, for vpc-facing architecture the kms key does not exist and the CICD pipeline fails. This PRs move the creation of the KMS key outside of the if. ### Relates ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. commit 3ccacfc Author: Noah Paige <[email protected]> Date: Mon Jul 01 2024 13:56:58 GMT-0400 (Eastern Daylight Time) Add delete docs not found when re indexing in catalog task (#1365) ### Feature or Bugfix <!-- please choose --> - Feature ### Detail - Add logic to Catalog Indexer Task to Delete Docs No Longer in RDS - TODO: Add Ability to Re-index Catalog Items via Dataall Admin UI ### Relates - #1078 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. commit e2817a1 Author: Noah Paige <[email protected]> Date: Mon Jul 01 2024 05:14:07 GMT-0400 (Eastern Daylight Time) Fix/glossary status (#1373) ### Feature or Bugfix <!-- please choose --> - Bugfix ### Detail - Add back `status` to Glossary GQL Object for GQL Operations (getGlossary, listGlossaries) - Fix `listOrganizationGroupPermissions` enforce non null on FE ### Relates ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. commit c3c58bd Author: Petros Kalos <[email protected]> Date: Fri Jun 28 2024 06:55:42 GMT-0400 (Eastern Daylight Time) add environment tests (#1371) ### Feature or Bugfix Feature ### Detail * add list_environment tests * add test for updating an environment (via update_stack) * generalise the polling functions for stacks ### Relates #1220 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. commit e913d48 Author: dlpzx <[email protected]> Date: Fri Jun 28 2024 04:15:49 GMT-0400 (Eastern Daylight Time) Add search (Autocomplete) in miscellaneous dropdowns (#1367) ### Feature or Bugfix - Feature ### Detail Autocomplete for environments and teams in the following frontend views as requested in #1012. In this case the views required custom dropdowns. ❗ I used `noOptionsText` whenever it was necessary instead of checking groupOptions lenght >0 - [x] DatasetEditForm.js -> ❗ I kept the stewards field as `freesolo` - what that means is that users CAN specify options that are not on the list. I would like the reviewer to confirm this is what we want. At the end stewardship is a delegation of permissions, it makes sense that delegation happens to other teams. Also changed DatasetCreateForm - [X] RequestDashboardAccessModal.js - already implemented, minor changes - [X] EnvironmentTeamInviteForm.js - already implemented, minor changes. -> Kept `freesolo` because invited teams might not be the user teams. Same reason why there is no check for groupOptions == 0, if there are no options there is still the free text option. - [X] EnvironmentRoleAddForm.js - [X] NetworkCreateModal.js ### Relates - #1012 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. commit ee71d7b Author: Tejas Rajopadhye <[email protected]> Date: Thu Jun 27 2024 14:08:27 GMT-0400 (Eastern Daylight Time) [Gh 1301] Enhancement Feature - Bulk share reapply on dataset (#1363) ### Feature or Bugfix - Feature ### Detail - Adds feature to reapply shares in bulk for a dataset. - Also contains bugfix for AWS worker lambda errors ### Relates - #1301 - #1364 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? N/A - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? N/A - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? N/A - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? N/A - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. --------- Co-authored-by: trajopadhye <[email protected]> commit 27f1ad7 Author: Noah Paige <[email protected]> Date: Thu Jun 27 2024 13:18:32 GMT-0400 (Eastern Daylight Time) Convert Dataset Lock Mechanism to Generic Resource Lock (#1338) ### Feature or Bugfix <!-- please choose --> - Feature - Bugfix - Refactoring ### Detail - Convert Dataset Lock Mechanism to Generic Resource Lock - Extend locking to Share principals (i.e. EnvironmentGroup and Consumption Roles) - Making locking a generic component not tied to datasets ### Relates - #1093 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. --------- Co-authored-by: dlpzx <[email protected]> commit e3b8658 Author: Petros Kalos <[email protected]> Date: Thu Jun 27 2024 12:50:59 GMT-0400 (Eastern Daylight Time) ignore ruff change in blame (#1372) ### Feature or Bugfix <!-- please choose --> - Feature - Bugfix - Refactoring ### Detail - <feature1 or bug1> - <feature2 or bug2> ### Relates - <URL or Ticket> ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. commit 2e80de4 Author: dlpzx <[email protected]> Date: Thu Jun 27 2024 10:59:18 GMT-0400 (Eastern Daylight Time) Generic shares_base module and specific s3_datasets_shares module - part 11 (renaming and cleaning up s3_shares) (#1359) ### Feature or Bugfix - Refactoring ### Detail As explained in the design for #1123 and #1283 we are trying to implement generic `datasets_base` and `shares_base` modules that can be used by any type of datasets and by any type of shareable object in a generic way. This is one of the last PRs focused on renaming files and cleaning-up the s3_datasets_shares module. The first step is a consolidation of the file and classes names in the services to clearly refer to s3_shares: - `services.managed_share_policy_service.SharePolicyService` ---> `services.s3_share_managed_policy_service.S3SharePolicyService` - `services.dataset_sharing_alarm_service.DatasetSharingAlarmService` --> `services.s3_share_alarm_service.S3ShareAlarmService` - `services.managed_share_policy_service.SharePolicyService` --> `services.s3_share_managed_policy_service.S3SharePolicyService` 👀 The main refactoring happens in what is used to be `services.dataset_sharing_service`. - The part that implements the `DatasetServiceInterface` has been moved to `services/s3_share_dataset_service.py` as the `S3ShareDatasetService` - The part used in the resolvers and by other methods has been renamed as `services.s3_share_service.py` and the methods for the folder/table permissions are also added to the S3ShareService (from share_item_service) Lastly, there is one method previously in share_item_service that has been moved to the GlueClient directly as `get_glue_database_from_catalog`. ### Relates - #1283 - #1123 - #955 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. commit 1c09015 Author: Noah Paige <[email protected]> Date: Thu Jun 27 2024 04:16:14 GMT-0400 (Eastern Daylight Time) fix listOrganizationGroupPermissions (#1369) ### Feature or Bugfix <!-- please choose --> - Bugfix ### Detail - Fix listOrganizationGroupPermissions ### Relates - <URL or Ticket> ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. commit 976ec6b Author: dlpzx <[email protected]> Date: Thu Jun 27 2024 04:13:14 GMT-0400 (Eastern Daylight Time) Add search (Autocomplete) in create pipelines (#1368) ### Feature or Bugfix - Feature ### Detail Autocomplete for environments and teams in the following frontend views as requested in #1012. This PR implements it for createPipelines ### Relates - #1012 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. commit 6c909a3 Author: Noah Paige <[email protected]> Date: Wed Jun 26 2024 11:18:04 GMT-0400 (Eastern Daylight Time) fix migration to not rely on OrganizationService or RequestContext (#1361) ### Feature or Bugfix <!-- please choose --> - Bugfix ### Detail - Ensure migration script does not need RequestContext - otherwise fails in migration trigger lambda as context info not set / available ### Relates - #1306 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. commit 90835fb Author: Anushka Singh <[email protected]> Date: Wed Jun 26 2024 11:17:22 GMT-0400 (Eastern Daylight Time) Issue1248: Persistent Email Reminders (#1354) ### Feature or Bugfix - Feature ### Detail - When a share request is initiated and remains pending for an extended period, dataset producers will receive automated email reminders at predefined intervals. These reminders will prompt producers to either approve or extend the share request, thereby preventing delays in accessing datasets. Attaching screenshots for emails: <img width="1336" alt="Screenshot 2024-06-20 at 5 34 31 PM" src="https://github.com/data-dot-all/dataall/assets/26413731/d7be28c3-5c98-4146-92b1-295e136137a3"> <img width="1322" alt="Screenshot 2024-06-20 at 5 34 52 PM" src="https://github.com/data-dot-all/dataall/assets/26413731/047556e8-59ee-4ebf-b8a7-c0a6684e2a63"> - Email will be sent every Monday at 9am UTC. Schedule can be changed in cron expression in container.py ### Relates - #1248 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. --------- Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: Anushka Singh <[email protected]> Co-authored-by: trajopadhye <[email protected]> Co-authored-by: Mohit Arora <[email protected]> Co-authored-by: rbernota <[email protected]> Co-authored-by: Rick Bernotas <[email protected]> Co-authored-by: Raj Chopde <[email protected]> Co-authored-by: Noah Paige <[email protected]> Co-authored-by: dlpzx <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: jaidisido <[email protected]> Co-authored-by: dlpzx <[email protected]> Co-authored-by: mourya-33 <[email protected]> Co-authored-by: nikpodsh <[email protected]> Co-authored-by: MK <[email protected]> Co-authored-by: Manjula <[email protected]> Co-authored-by: Zilvinas Saltys <[email protected]> Co-authored-by: Zilvinas Saltys <[email protected]> Co-authored-by: Daniel Lorch <[email protected]> Co-authored-by: Tejas Rajopadhye <[email protected]> Co-authored-by: Zilvinas Saltys <[email protected]> Co-authored-by: Sofia Sazonova <[email protected]> Co-authored-by: Sofia Sazonova <[email protected]> commit e477bdf Author: Noah Paige <[email protected]> Date: Wed Jun 26 2024 10:39:09 GMT-0400 (Eastern Daylight Time) Enforce non null on GQL query string if non null defined (#1362) ### Feature or Bugfix <!-- please choose --> - Bugfix ### Detail - Add `String!` to ensure non null input argument on FE if defined as such on backend GQL operation for `listS3DatasetsSharedWithEnvGroup` ### Relates ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. commit d6b59b3 Author: Noah Paige <[email protected]> Date: Wed Jun 26 2024 08:48:52 GMT-0400 (Eastern Daylight Time) Fix Init Share Base (#1360) ### Feature or Bugfix <!-- please choose --> - Bugfix ### Detail - Need to register processors in init for s3 dataset shares API module ### Relates ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. commit bd3698c Author: Petros Kalos <[email protected]> Date: Wed Jun 26 2024 05:19:14 GMT-0400 (Eastern Daylight Time) split cognito urls setup and cognito user creation (#1366) ### Feature or Bugfix - Bugfix ### Details For more details about the issue read #1353 In this PR we are solving the problem by splitting the configuration of Cognito in 2. * First part (cognito_users_config.py) is setting up the required groups and users and runs after UserPool deployment * Second part (cognito_urls_config.py) is setting up Cognito's callback/logout urls and runs after the CloudFront deployment We chose to split the functionality because we need to have the users/groups setup for the integration tests which are run after the backend deployment. The other althernative is to keep the config functionality as one but make the integ tests run after CloudFront stage. ### Relates - Solves #1353 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
commit 4425e756 Author: Noah Paige <[email protected]> Date: Mon Jul 08 2024 11:57:31 GMT-0400 (Eastern Daylight Time) Fix commit 4cd2bf77 Author: Noah Paige <[email protected]> Date: Mon Jul 08 2024 11:56:38 GMT-0400 (Eastern Daylight Time) Fix commit 22a6f6ef Author: Noah Paige <[email protected]> Date: Mon Jul 08 2024 11:28:07 GMT-0400 (Eastern Daylight Time) Add integ tests commit 4fb7d653 Author: Noah Paige <[email protected]> Date: Mon Jul 08 2024 11:26:36 GMT-0400 (Eastern Daylight Time) Merge env test changes commit 4cf42e8 Author: Petros Kalos <[email protected]> Date: Fri Jul 05 2024 08:19:34 GMT-0400 (Eastern Daylight Time) improve docs commit 65f930a Author: Petros Kalos <[email protected]> Date: Fri Jul 05 2024 08:10:56 GMT-0400 (Eastern Daylight Time) fix failures commit 170b7ce Author: Petros Kalos <[email protected]> Date: Wed Jul 03 2024 10:52:20 GMT-0400 (Eastern Daylight Time) add group/consumption_role invite/remove tests commit ba77d69 Author: dlpzx <[email protected]> Date: Wed Jul 03 2024 06:51:47 GMT-0400 (Eastern Daylight Time) Rename alias for env_vars kms key in cognito lambdas FE and BE (#1385) ### Feature or Bugfix - Bugfix ### Detail For the case in which we deploy FE and BE in us-east-1 the new lambda env_key alias is the same one for TriggerFunctionCognitoUrlsConfig in FE and for TriggerFunctionCognitoConfig in BE, which results in a failure of the CICD in the FE stack because the alias already exists. This PR changes the name of both aliases to avoid this conflict. It also adds envname to avoid issues with other deployment environments/tooling account in the future ### Relates - <URL or Ticket> ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. commit e5923a9 Author: dlpzx <[email protected]> Date: Wed Jul 03 2024 04:27:11 GMT-0400 (Eastern Daylight Time) Fix lambda_env_key out of scope for vpc-facing cognito setup (#1384) ### Feature or Bugfix - Bugfix ### Detail The KMS key for the Lambda environment variables in the Cognito IdP stack was defined inside an if-clause for internet facing frontend. Outside of that if, for vpc-facing architecture the kms key does not exist and the CICD pipeline fails. This PRs move the creation of the KMS key outside of the if. ### Relates ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. commit 3ccacfc Author: Noah Paige <[email protected]> Date: Mon Jul 01 2024 13:56:58 GMT-0400 (Eastern Daylight Time) Add delete docs not found when re indexing in catalog task (#1365) ### Feature or Bugfix <!-- please choose --> - Feature ### Detail - Add logic to Catalog Indexer Task to Delete Docs No Longer in RDS - TODO: Add Ability to Re-index Catalog Items via Dataall Admin UI ### Relates - #1078 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. commit e2817a1 Author: Noah Paige <[email protected]> Date: Mon Jul 01 2024 05:14:07 GMT-0400 (Eastern Daylight Time) Fix/glossary status (#1373) ### Feature or Bugfix <!-- please choose --> - Bugfix ### Detail - Add back `status` to Glossary GQL Object for GQL Operations (getGlossary, listGlossaries) - Fix `listOrganizationGroupPermissions` enforce non null on FE ### Relates ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. commit c3c58bd Author: Petros Kalos <[email protected]> Date: Fri Jun 28 2024 06:55:42 GMT-0400 (Eastern Daylight Time) add environment tests (#1371) ### Feature or Bugfix Feature ### Detail * add list_environment tests * add test for updating an environment (via update_stack) * generalise the polling functions for stacks ### Relates #1220 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. commit e913d48 Author: dlpzx <[email protected]> Date: Fri Jun 28 2024 04:15:49 GMT-0400 (Eastern Daylight Time) Add search (Autocomplete) in miscellaneous dropdowns (#1367) ### Feature or Bugfix - Feature ### Detail Autocomplete for environments and teams in the following frontend views as requested in #1012. In this case the views required custom dropdowns. ❗ I used `noOptionsText` whenever it was necessary instead of checking groupOptions lenght >0 - [x] DatasetEditForm.js -> ❗ I kept the stewards field as `freesolo` - what that means is that users CAN specify options that are not on the list. I would like the reviewer to confirm this is what we want. At the end stewardship is a delegation of permissions, it makes sense that delegation happens to other teams. Also changed DatasetCreateForm - [X] RequestDashboardAccessModal.js - already implemented, minor changes - [X] EnvironmentTeamInviteForm.js - already implemented, minor changes. -> Kept `freesolo` because invited teams might not be the user teams. Same reason why there is no check for groupOptions == 0, if there are no options there is still the free text option. - [X] EnvironmentRoleAddForm.js - [X] NetworkCreateModal.js ### Relates - #1012 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. commit ee71d7b Author: Tejas Rajopadhye <[email protected]> Date: Thu Jun 27 2024 14:08:27 GMT-0400 (Eastern Daylight Time) [Gh 1301] Enhancement Feature - Bulk share reapply on dataset (#1363) ### Feature or Bugfix - Feature ### Detail - Adds feature to reapply shares in bulk for a dataset. - Also contains bugfix for AWS worker lambda errors ### Relates - #1301 - #1364 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? N/A - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? N/A - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? N/A - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? N/A - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. --------- Co-authored-by: trajopadhye <[email protected]> commit 27f1ad7 Author: Noah Paige <[email protected]> Date: Thu Jun 27 2024 13:18:32 GMT-0400 (Eastern Daylight Time) Convert Dataset Lock Mechanism to Generic Resource Lock (#1338) ### Feature or Bugfix <!-- please choose --> - Feature - Bugfix - Refactoring ### Detail - Convert Dataset Lock Mechanism to Generic Resource Lock - Extend locking to Share principals (i.e. EnvironmentGroup and Consumption Roles) - Making locking a generic component not tied to datasets ### Relates - #1093 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. --------- Co-authored-by: dlpzx <[email protected]> commit e3b8658 Author: Petros Kalos <[email protected]> Date: Thu Jun 27 2024 12:50:59 GMT-0400 (Eastern Daylight Time) ignore ruff change in blame (#1372) ### Feature or Bugfix <!-- please choose --> - Feature - Bugfix - Refactoring ### Detail - <feature1 or bug1> - <feature2 or bug2> ### Relates - <URL or Ticket> ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. commit 2e80de4 Author: dlpzx <[email protected]> Date: Thu Jun 27 2024 10:59:18 GMT-0400 (Eastern Daylight Time) Generic shares_base module and specific s3_datasets_shares module - part 11 (renaming and cleaning up s3_shares) (#1359) ### Feature or Bugfix - Refactoring ### Detail As explained in the design for #1123 and #1283 we are trying to implement generic `datasets_base` and `shares_base` modules that can be used by any type of datasets and by any type of shareable object in a generic way. This is one of the last PRs focused on renaming files and cleaning-up the s3_datasets_shares module. The first step is a consolidation of the file and classes names in the services to clearly refer to s3_shares: - `services.managed_share_policy_service.SharePolicyService` ---> `services.s3_share_managed_policy_service.S3SharePolicyService` - `services.dataset_sharing_alarm_service.DatasetSharingAlarmService` --> `services.s3_share_alarm_service.S3ShareAlarmService` - `services.managed_share_policy_service.SharePolicyService` --> `services.s3_share_managed_policy_service.S3SharePolicyService` 👀 The main refactoring happens in what is used to be `services.dataset_sharing_service`. - The part that implements the `DatasetServiceInterface` has been moved to `services/s3_share_dataset_service.py` as the `S3ShareDatasetService` - The part used in the resolvers and by other methods has been renamed as `services.s3_share_service.py` and the methods for the folder/table permissions are also added to the S3ShareService (from share_item_service) Lastly, there is one method previously in share_item_service that has been moved to the GlueClient directly as `get_glue_database_from_catalog`. ### Relates - #1283 - #1123 - #955 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. commit 1c09015 Author: Noah Paige <[email protected]> Date: Thu Jun 27 2024 04:16:14 GMT-0400 (Eastern Daylight Time) fix listOrganizationGroupPermissions (#1369) ### Feature or Bugfix <!-- please choose --> - Bugfix ### Detail - Fix listOrganizationGroupPermissions ### Relates - <URL or Ticket> ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. commit 976ec6b Author: dlpzx <[email protected]> Date: Thu Jun 27 2024 04:13:14 GMT-0400 (Eastern Daylight Time) Add search (Autocomplete) in create pipelines (#1368) ### Feature or Bugfix - Feature ### Detail Autocomplete for environments and teams in the following frontend views as requested in #1012. This PR implements it for createPipelines ### Relates - #1012 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. commit 6c909a3 Author: Noah Paige <[email protected]> Date: Wed Jun 26 2024 11:18:04 GMT-0400 (Eastern Daylight Time) fix migration to not rely on OrganizationService or RequestContext (#1361) ### Feature or Bugfix <!-- please choose --> - Bugfix ### Detail - Ensure migration script does not need RequestContext - otherwise fails in migration trigger lambda as context info not set / available ### Relates - #1306 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. commit 90835fb Author: Anushka Singh <[email protected]> Date: Wed Jun 26 2024 11:17:22 GMT-0400 (Eastern Daylight Time) Issue1248: Persistent Email Reminders (#1354) ### Feature or Bugfix - Feature ### Detail - When a share request is initiated and remains pending for an extended period, dataset producers will receive automated email reminders at predefined intervals. These reminders will prompt producers to either approve or extend the share request, thereby preventing delays in accessing datasets. Attaching screenshots for emails: <img width="1336" alt="Screenshot 2024-06-20 at 5 34 31 PM" src="https://github.com/data-dot-all/dataall/assets/26413731/d7be28c3-5c98-4146-92b1-295e136137a3"> <img width="1322" alt="Screenshot 2024-06-20 at 5 34 52 PM" src="https://github.com/data-dot-all/dataall/assets/26413731/047556e8-59ee-4ebf-b8a7-c0a6684e2a63"> - Email will be sent every Monday at 9am UTC. Schedule can be changed in cron expression in container.py ### Relates - #1248 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. --------- Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: Anushka Singh <[email protected]> Co-authored-by: trajopadhye <[email protected]> Co-authored-by: Mohit Arora <[email protected]> Co-authored-by: rbernota <[email protected]> Co-authored-by: Rick Bernotas <[email protected]> Co-authored-by: Raj Chopde <[email protected]> Co-authored-by: Noah Paige <[email protected]> Co-authored-by: dlpzx <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: jaidisido <[email protected]> Co-authored-by: dlpzx <[email protected]> Co-authored-by: mourya-33 <[email protected]> Co-authored-by: nikpodsh <[email protected]> Co-authored-by: MK <[email protected]> Co-authored-by: Manjula <[email protected]> Co-authored-by: Zilvinas Saltys <[email protected]> Co-authored-by: Zilvinas Saltys <[email protected]> Co-authored-by: Daniel Lorch <[email protected]> Co-authored-by: Tejas Rajopadhye <[email protected]> Co-authored-by: Zilvinas Saltys <[email protected]> Co-authored-by: Sofia Sazonova <[email protected]> Co-authored-by: Sofia Sazonova <[email protected]> commit e477bdf Author: Noah Paige <[email protected]> Date: Wed Jun 26 2024 10:39:09 GMT-0400 (Eastern Daylight Time) Enforce non null on GQL query string if non null defined (#1362) ### Feature or Bugfix <!-- please choose --> - Bugfix ### Detail - Add `String!` to ensure non null input argument on FE if defined as such on backend GQL operation for `listS3DatasetsSharedWithEnvGroup` ### Relates ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. commit d6b59b3 Author: Noah Paige <[email protected]> Date: Wed Jun 26 2024 08:48:52 GMT-0400 (Eastern Daylight Time) Fix Init Share Base (#1360) ### Feature or Bugfix <!-- please choose --> - Bugfix ### Detail - Need to register processors in init for s3 dataset shares API module ### Relates ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. commit bd3698c Author: Petros Kalos <[email protected]> Date: Wed Jun 26 2024 05:19:14 GMT-0400 (Eastern Daylight Time) split cognito urls setup and cognito user creation (#1366) ### Feature or Bugfix - Bugfix ### Details For more details about the issue read #1353 In this PR we are solving the problem by splitting the configuration of Cognito in 2. * First part (cognito_users_config.py) is setting up the required groups and users and runs after UserPool deployment * Second part (cognito_urls_config.py) is setting up Cognito's callback/logout urls and runs after the CloudFront deployment We chose to split the functionality because we need to have the users/groups setup for the integration tests which are run after the backend deployment. The other althernative is to keep the config functionality as one but make the integ tests run after CloudFront stage. ### Relates - Solves #1353 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
…ift guardrails (#1484) ### Feature or Bugfix - Feature ### Detail Complete design in #955. This particular PR is focused on adding validation checks when a share request is created - Remove IAM role checks from generic sharing_Service to specific share processors - Add interface to execute checks on approve, submit and revoke API calls - Moved S3 checks to new S3Validator - Implemented Redshift checks in RedshiftValidator - Added tests for Redshift validator ### Relates - #955 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
…atasets (check_on_delete, list_shared_datasets...) (#1511) ### Feature or Bugfix - Feature ### Detail Complete design in #955. This particular PR is focused on adding missing functionalities in redshift_datasets that need to be implemented inside redshift_datasets. For example, when we delete a redshift dataset we would want to first check if there are any share requests shared for that dataset. To avoid circular dependencies it is required to use an interface in the same way it was implemented for S3. In this PR: - Add `RedshiftShareDatasetService(DatasetServiceInterface)` class and implement required abstract methods (check_on_delete, resolve_user_shared_datasets..... - Use this class in redshift_Datasets module in resolvers, on dataset deletion... - Some of the code was very similar to the db queries implemented by S3 datasets; for this reason in this PR some of the queries are moved to the generic ShareObjectRepository to be reused by both types of dataset ### Relates - #955 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
Closing this issue, remaining tasks will be tracked in the corresponding documentation pull requests and follow-up github issues |
…tasets (#1512) ### Feature or Bugfix Documentation ### Detail Added userguide documentation for #955 - Redshift Connections - Redshift Dataset import and table management - Changes in S3 Datasets to clearly differentiate both types ### Relates - #955 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
### Feature or Bugfix - Documentation ### Detail Added userguide documentation for #955 - list down all shareable items with small definition - Add technical details for each type of shareable item (including Redshift) - Add data consumption section for Redshift ### Relates - #955 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
…nections for import Redshift Dataset (#1565) ### Feature or Bugfix - Feature: enhancement ### Detail This feature is an enhancement suggested by Redshift experts on #955, which is well explained in #1562. This PR: - adds more info and tooltips that explain details about Redshift Connections on the UI - restricts the type of connection that can be used to import a dataset: ONLY DATA_USER CONNECTIONS CAN BE USED TO IMPORT DATASETS. It implements this logic both in the frontend and backend FIRST VERSION: <img width="1126" alt="image" src="https://github.com/user-attachments/assets/14bb5a85-9868-4e8d-b7aa-1c84feb2a681"> UPDATED: ![image](https://github.com/user-attachments/assets/1b199dba-d6ee-471f-9cd7-d74e70b8dd4b) ### Relates #1562 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
### Feature or Bugfix - Bugfix ### Detail We were validating if the Redshift role for a Redshift share request existed in the dataset account, while we should be validating if it exists in the target account (share.environmentUri) ### Relates - #955 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
### Feature or Bugfix - Bugfix ### Detail The share verify task for Redshift shares was returning a `list index out of range` error when verifying the health of a share given that the datashare was desauthorized in the source. Tested in AWS: ![Screenshot 2024-10-17 at 14 44 31](https://github.com/user-attachments/assets/fa008a2a-4b99-46eb-bb6d-635d518159a3) ### Relates - #955 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
### Feature or Bugfix - Bugfix ### Detail Redshift sharing is implemented for the READ-only datashares. Write datashares is in preview in multiple regions, but in those regions where it is still not in preview, using the AllowWrites parameter in the API call of authorize_data_share results in an error of the type `An error occurred (InvalidParameterValue) when calling the AuthorizeDataShare operation: DATA_SHARING_WRITES support is not yet available.` This PR removes the usage of that parameter, which was in either case already using the default value (allowWrites=False) ### Relates - #955 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
Description:
Enable seamless data integration with Redshift as a new data source in ‘data.all’. This feature enhances collaboration by allowing users to easily publish, discover and share Redshift data within the data.all platform. Users can securely configure Redshift instance, streamlining the process of making Redshift datasets accessible.
Details:
Adding Redshift Instance and Publishing Tables
Tables Available for Discovery
Self-service Share Process for Redshift Data Sharing
Benefits:
@dlpzx
The text was updated successfully, but these errors were encountered: