
Redshift Data Sharing #955

Closed
anmolsgandhi opened this issue Jan 9, 2024 · 6 comments

@anmolsgandhi
Contributor

Description:

Enable seamless data integration with Redshift as a new data source in ‘data.all’. This feature enhances collaboration by allowing users to easily publish, discover, and share Redshift data within the data.all platform. Users can securely configure a Redshift instance, streamlining the process of making Redshift datasets accessible.

Details:

Adding Redshift Instance and Publishing Tables

  • Users initiate the process by selecting “Create Dataset” and choosing Redshift from the dropdown menu.
  • The interface guides users through a secure credential input, ensuring a streamlined and secure configuration process.
  • Once configured, the dataset owners can select specific tables to publish to the ‘data.all’ catalog, ensuring a controlled inclusion of Redshift data.

Tables Available for Discovery

  • Cataloged Redshift tables automatically become part of the ‘data.all’ catalog, visible to users exploring datasets within the platform.
  • The catalog provides detailed metadata for each table, facilitating a comprehensive understanding of available data.
  • Users can navigate the ‘data.all’ UI to effortlessly discover and explore Redshift tables.
  • Dataset owners can edit metadata for each table, such as description and tags.

Self-service Share Process for Redshift Data Sharing

  • Consumers interested in specific Redshift tables initiate the share process by selecting the desired dataset.
  • Owners of the shared Redshift tables within data.all Datasets receive access requests, with an easy-to-use interface for managing permissions and approvals.
  • Upon approval, the shared Redshift data becomes dynamically accessible to consumers, maintaining a consistent and user-friendly experience.

Benefits:

  • Additional Data Source Integration: The added capability of Redshift as a new data source enhances flexibility, enabling users to integrate diverse data sources beyond S3, expanding the platform’s utility.
  • User-Friendly Configuration: A guided process ensures Redshift instances are connected with secure credentials.
  • Efficient Discovery: Automated cataloging promotes effortless exploration of Redshift tables within ‘data.all’ catalog.
  • Streamlined Sharing Workflow: The self-service share process maintains simplicity and consistency across different types of data, allowing users to request and access Redshift data seamlessly as they do with S3 data.

@dlpzx

@dlpzx
Contributor

dlpzx commented Mar 13, 2024

Design

This design is up to date with the latest implementation changes

Assumptions

  • Redshift clusters/namespaces are created and maintained by DevOps teams outside of data.all
  • Database admin teams manage users in their clusters/namespaces outside of data.all
  • Data producers and consumers can access their clusters/namespaces with the access provided by the database admin teams.
  • Data producers create tables in Redshift outside of data.all
  • Data.all requires a Redshift user of type IAM:user, or a database user with credentials stored in AWS Secrets Manager, for the data producers that are going to publish data (in the diagram this is the basis for Authorization 1). Data.all needs permissions to use the IAM role or to access the Secret. This user needs permissions to create datashares.
  • Data.all requires a Redshift user of type IAM:user for the data.all PivotRole in all accounts with a Redshift cluster. This user needs permissions to create datashares. In the diagram this is the basis for Authorizations 2 and 3.
  • data.all Share request principal will be REDSHIFT ROLE
  • Data Consumers register their Redshift roles as Redshift Consumption Roles. Database admins control which roles are created in Redshift and which roles are attached to which user/group. To isolate data.all access grants from other access grants, we recommend that database admins create dedicated Redshift roles. For example, for projectXYZ a group of Redshift users needs permissions to data in another cluster. The database admin should create a Redshift role DAProjectXYZ and attach it to the relevant users/groups in Redshift. Data consumers should register the role in data.all and request access to the data they need.
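The dedicated-role recommendation above can be sketched as the SQL a database admin would run outside of data.all. This is only an illustration of standard Redshift RBAC DDL; the role and user names (`da_project_xyz`, `analyst_1`, `analyst_2`) are hypothetical placeholders, not data.all conventions:

```python
# Sketch: SQL a database admin might run outside of data.all to create a
# dedicated Redshift role for data.all-managed grants and attach it to users.
# All identifiers are hypothetical placeholders.

def build_dedicated_role_sql(role_name: str, grantees: list[str]) -> list[str]:
    """Return DDL creating a dedicated role and granting it to each user."""
    statements = [f'CREATE ROLE "{role_name}";']
    statements += [f'GRANT ROLE "{role_name}" TO "{g}";' for g in grantees]
    return statements

if __name__ == "__main__":
    for stmt in build_dedicated_role_sql("da_project_xyz", ["analyst_1", "analyst_2"]):
        print(stmt)
```

Registering the role in data.all then only references this pre-existing role; data.all never creates or attaches Redshift roles itself.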

HLD and User experience

Initial design with Lake Formation ---> NOT USED ANYMORE

During implementation we realized that datashares with Lake Formation do not bring much value when they are not used to actually share further in Lake Formation (it just puts metadata in Glue). It might be useful in the future if we integrate with IAM Identity Center, as the Redshift-LF integration works much better with IAM IC, but for the moment we won't be using it. If in the future we want to revert the changes, the code is in commits bf476fc, ef9662b and 17970d7.

(diagram: initial Lake Formation design)

Initial data sharing design with data.all Redshift consumption roles ---> NOT USED ANYMORE

There are 3 reasons why this design has been further improved:

  • In this design we assume that users take the necessary actions outside of data.all on the pivot role so that it can process datashares in the source and in the target clusters. It connects to the cluster in a different way from how we were doing it for dataset publishing, which adds more code, more IAM policies, and more Redshift features that we use without really needing them.

  • In addition, we are creating a data.all abstraction, Redshift consumption roles, which is yet another layer of complexity for users to interact with.

  • Finally, data.all does not ensure that the user has taken the necessary preliminary actions before opening a share request. There is no visibility into whether the namespace used for the share request can be accessed by data.all, which can lead to errors in the sharing (when the actual error is in the onboarding of the Redshift cluster). We should separate cluster onboarding steps from sharing steps as much as possible.

(diagram: Redshift-data-all-without-warehouses-with-consumptionRoles_UPDATED.drawio)

Current design

We add more guardrails on the onboarding of clusters by requiring that a pivot role connection be created for each cluster used. This is a prerequisite for creating other types of connections and for opening share requests.

(diagram: Redshift-data-all-without-warehouses-with-consumptionRoles_UPDATED_2.drawio)

Following the numbering above:

  1. Outside of data.all, Database Admin Teams manage Redshift cluster users.
    1. For data producers - They create a regular Redshift user and optionally (mandatory for Redshift serverless) store the credentials in Secrets Manager, ⚠️ [NOT IMPLEMENTED YET] or they create a Redshift user of the type (IAM:user) that allows IAM federation
    2. For data consumers - They create Redshift roles and attach them to users
  2. Outside of data.all, Database Admin Teams in the data producer and in the data consumer clusters create a user in Redshift for the data.all IAM pivot role and optionally (mandatory for Redshift serverless) store the credentials in Secrets Manager, ⚠️ [NOT IMPLEMENTED YET] or they create a Redshift user of the type (IAM:user) that allows IAM federation
  3. Outside of data.all, Data producers work in Redshift and create tables
  4. In data.all UI, Data producers create a data.all pivot role Connection. Without a valid pivot role connection, no other connection can be created!
  5. In data.all UI, Data producers create a data.all Connection.
    1. When creating a connection, users need to introduce:
      1. The IAM role (for IAM:user Redshift users) or the Secret ARN created by their db admins
      2. Environment where the cluster is
      3. Namespace/cluster id
      4. Database
      5. A data.all Team that owns the connection. Only members of the Team can use it. (similar to consumption IAM roles)
    2. Connections are going to be used to AUTHORIZE the import of data and maybe in next steps to open Redshift QueryEditorV2. There are different types of Redshift users:
      1. ⚠️ [NOT IMPLEMENTED YET] Federated users (the IAM role is stored). The role created has permissions to be used as federated user in Redshift by data.all.
      2. [IMPLEMENTED] AWS Secrets Manager (the secretArn is stored). Customers will need to tag the secret in order for data.all to be able to access it.
      3. NEXT STEPS - IAM Identity Center - it cannot be used at the moment for the publication of data.
      4. NEVER - username and password. In data.all we want to avoid handling passwords in transit.
  6. In data.all UI, Data producers import a Redshift dataset in data.all specifying:
    1. Select the Environment and the Connection to use for import
    2. The Team that owns the Connection also will own the Dataset
    3. Redshift schema and selection of tables to be imported from that schema
  7. Under-the-hood, when a dataset is imported, using the authorization of the Connection the metadata for the imported schema and tables is stored. The dataset and tables are indexed in the data catalog.
  8. In data.all UI, Data producers can fetch the schema of the imported tables in the dataset's Data tab, as we do with S3/Glue datasets. Tables appear in data.all. Users can ListDatasets, which lists both S3 and Redshift datasets.
  9. Under-the-hood, when the data producer opens the schema of a table, data.all uses the Redshift Data API to read the table details from Redshift.
  10. In data.all UI, data consumers can discover RS tables and datasets in Catalog
  11. In data.all UI, Database consumers create a data.all pivot role Connection for the target cluster. Without a pivot role connection that is valid, the share request cannot be created.
  12. In data.all UI, data consumers can create a share request by selecting the dataset or tables. They submit the request
    1. In the share request they select the target environment and target group
    2. A dropdown lists the namespaces with pivot role connections in the environment.
    3. Data.all checks that the target group has permissions to use Redshift in the environment
    4. Optionally, users manually input the Redshift role that is the recipient of the request. If the Redshift role is specified, the share request is granted to the role; if not, to the namespace.
  13. In data.all UI, data producers approve the request
  14. Under-the-hood, data.all creates a datashare in the data producer's cluster/namespace
  15. Under-the-hood, data.all associates the datashare to the data consumer's cluster and grants permissions to the Redshift role (if specified)
  16. Data consumers will access the data through:
  • BI tools: Quicksight, Tableau, Power BI, Qlik (JDBC/ODBC connections)
  • SQL clients: DBeaver, SQL Workbench (JDBC/ODBC connections)
  • ETL workloads in Redshift
  • Ad-hoc queries in Redshift Query Editor
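Steps 14–15 above (producer-side datashare creation, consumer-side association and grants) can be sketched as the SQL sequences data.all would issue through the Redshift Data API. This is an illustration of standard Redshift datashare DDL, not the actual data.all implementation; all identifiers are placeholders:

```python
# Sketch of the datashare DDL behind share approval (steps 14-15).
# Identifiers are placeholders; in data.all these statements would run via
# the Redshift Data API using the pivot role connections.

def producer_side_sql(datashare, schema, tables, consumer_namespace):
    """Statements run in the data producer's cluster (step 14)."""
    stmts = [
        f"CREATE DATASHARE {datashare};",
        f"ALTER DATASHARE {datashare} ADD SCHEMA {schema};",
    ]
    stmts += [f"ALTER DATASHARE {datashare} ADD TABLE {schema}.{t};" for t in tables]
    stmts.append(f"GRANT USAGE ON DATASHARE {datashare} TO NAMESPACE '{consumer_namespace}';")
    return stmts

def consumer_side_sql(datashare, producer_namespace, local_db, ext_schema, schema, redshift_role=None):
    """Statements run in the data consumer's cluster (step 15)."""
    stmts = [
        f"CREATE DATABASE {local_db} FROM DATASHARE {datashare} "
        f"OF NAMESPACE '{producer_namespace}' WITH PERMISSIONS;",
        f"CREATE EXTERNAL SCHEMA {ext_schema} FROM REDSHIFT DATABASE '{local_db}' SCHEMA '{schema}';",
    ]
    if redshift_role:  # grant to the role only if one was specified in the request
        stmts.append(f"GRANT USAGE ON DATABASE {local_db} TO ROLE {redshift_role};")
        stmts.append(f"GRANT USAGE ON SCHEMA {ext_schema} TO ROLE {redshift_role};")
    return stmts
```

If no Redshift role is specified in the request, the grant stops at the namespace level, matching step 12.4 above.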

User experience

Redshift connection

Create/Delete and list
During creation we check that the connection is valid by listing the databases in Redshift and making sure the selected database is part of the cluster/workgroup.
Serverless clusters do not accept a db user for federation (see example API call). At the moment db users are disabled for serverless; in the future we can consider assuming an IAM role and then doing federation.

If any parameter in the connection form is invalid, an error is thrown. The same happens if the Team does not have permissions to create a connection in the environment or lacks tenant permissions for Redshift.
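The database check described above can be sketched as a small validation step: given the databases listed from the cluster/workgroup, the selected database must be among them. The exception name below is illustrative, not data.all's actual error type:

```python
# Sketch of the connection validation: the selected database must be among
# those listed in the cluster/workgroup. Exception name is illustrative.

class ConnectionValidationError(Exception):
    pass

def validate_connection_database(selected_database: str, cluster_databases: list[str]) -> None:
    """Raise if the selected database is not part of the cluster/workgroup."""
    if selected_database not in cluster_databases:
        raise ConnectionValidationError(
            f"Database '{selected_database}' not found among cluster databases {cluster_databases}"
        )

# Passes silently when the database exists, raises otherwise.
validate_connection_database("dev", ["dev", "sample_data_dev"])
```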

Screen.Recording.2024-07-25.at.08.59.54.mov

Screenshot 2024-07-15 at 12 37 23

Redshift dataset

Import Form

Screen.Recording.2024-07-25.at.09.17.29.mov

List Datasets view
With icons for S3 and Redshift
(screenshot)

**List Datasets in Environment**

Screen.Recording.2024-07-25.at.10.43.36.mov

Dataset view
Screenshot 2024-07-17 at 14 33 24

Dataset edit form, Tables tab and schema modal
https://github.com/user-attachments/assets/89c69669-59bf-4ff0-a889-523581d87d25

Table view, columns tab and Table edit form
https://github.com/user-attachments/assets/1a8437d0-4b51-4330-833a-e3c0b495e591

Delete Table, Dataset

Screen.Recording.2024-07-25.at.11.06.02.mov

They get deleted and removed from the catalog

Screenshot 2024-07-25 at 10 21 31

**Catalog indexing**

Screen.Recording.2024-07-25.at.10.06.50.mov

**Feed, Votes**

Screen.Recording.2024-07-25.at.10.15.38.mov

Glossary

Screenshot 2024-07-25 at 10 16 29

Redshift permissions controls
In the admin settings and in the environment team invitation form we can define redshift permissions applied to teams

Screen.Recording.2024-07-25.at.08.57.10.mov

Permissions

IAM permissions

IAM permissions are granted solely to the pivot role. List and describe permissions are granted on all resources where needed, while write operations on Redshift workgroups, namespaces and clusters are restricted to the resources that have been onboarded to data.all in the form of Connections. Every time a connection is added to the environment, the pivot role gets updated (the environment stack gets redeployed).
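The read/write split described above might look like the following policy sketch. The action names and the ARN are illustrative assumptions, not the exact policy that data.all generates for the pivot role:

```python
# Illustrative sketch of the pivot role policy split: list/describe allowed
# on all resources, write-style actions restricted to onboarded resources.
# Action lists and ARNs are assumptions, not data.all's generated policy.

def pivot_role_redshift_statements(onboarded_arns: list[str]) -> list[dict]:
    """Return IAM policy statements scoping writes to onboarded Connections."""
    return [
        {
            "Sid": "RedshiftRead",
            "Effect": "Allow",
            "Action": ["redshift:Describe*", "redshift-serverless:List*"],
            "Resource": ["*"],
        },
        {
            "Sid": "RedshiftWrite",
            "Effect": "Allow",
            "Action": ["redshift-data:ExecuteStatement"],
            "Resource": onboarded_arns,  # only clusters/workgroups with Connections
        },
    ]
```

Because the write statement references the onboarded ARNs, adding a Connection requires regenerating the policy, which is why the environment stack is redeployed.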

data.all application permissions

Here we are referring to permissions guarding API calls. These are not IAM permissions but data.all specific permissions that can be of the type: tenant-level, environment-level, or group-level permissions. For more info, check the Permission model section in the docs.

To avoid complex permission-backfilling migrations, or the risk of overly permissive migrations, I am considering dataset sharing and future extensions when deciding which permissions to include.

Redshift Connection permissions

All API-facing methods of RedshiftConnectionService are protected by the permission decorators.

  • Tenant permissions
    • MANAGE_REDSHIFT_CONNECTION - 👀 ⚠️ Initially this permission was not defined, on the assumption that connections were controlled as part of MANAGE_REDSHIFT_DATASETS. However, actions on Redshift connections are sensitive enough to deserve their own restriction by the data.all admin, so it was added back. This permission is applied to create/delete connections. Users without it can still import Redshift datasets using connections (get/list operations). Second warning ⚠️ At the moment this permission does little, since a connection has an admin group that creates and then uses it; but if in the future we want to share a connection, it will be useful.
  • Environment permissions - granted when inviting a team to an environment
    • CREATE_REDSHIFT_CONNECTION - to limit which groups in an environment are allowed to create connections in the environment. Applied to create_redshift_connection
    • LIST_ENVIRONMENT_REDSHIFT_CONNECTIONS - to prevent users outside of an Environment from fetching another environment's connections.
  • Group permissions
    • GET_REDSHIFT_CONNECTION - to prevent unauthorized users (not belonging to the connection owner team) from getting the details of the connection. Applied to multiple operations that get info from the connection and granted to the dataset admin team. 👀 In the future extensible to non-admin groups that could use the connection without being its admins.
    • DELETE_REDSHIFT_CONNECTION - to prevent unauthorized users from deleting a connection. Applied to delete_redshift_connection and granted ONLY to the connection admin team.

Connections are not editable at the moment, so there are no permissions to UPDATE_CONNECTIONS.
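The permission decorators mentioned above can be sketched as follows. This is a simplified illustration: the real data.all decorators resolve the caller's groups and resource permissions from the database, whereas here the permission map is passed in explicitly:

```python
import functools

# Simplified sketch of a permission decorator guarding a service method.
# Real data.all decorators resolve permissions from the database; here the
# permission map is passed in explicitly for illustration.

class UnauthorizedOperation(Exception):
    pass

def has_resource_permission(permission_name: str):
    """Check the caller holds `permission_name` before running the method."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(username, user_permissions, *args, **kwargs):
            if permission_name not in user_permissions.get(username, set()):
                raise UnauthorizedOperation(f"User {username} is missing {permission_name}")
            return func(username, user_permissions, *args, **kwargs)
        return wrapper
    return decorator

@has_resource_permission("DELETE_REDSHIFT_CONNECTION")
def delete_redshift_connection(username, user_permissions, connection_uri):
    # Placeholder body; the real service deletes the connection record.
    return f"deleted {connection_uri}"

perms = {"alice": {"DELETE_REDSHIFT_CONNECTION"}}
print(delete_redshift_connection("alice", perms, "conn-1"))  # deleted conn-1
```

A call by a user without the permission raises UnauthorizedOperation before the service body runs, which is the behavior the tenant/environment/group permissions above rely on.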

Redshift Dataset permissions

  • Tenant permissions
    • MANAGE_REDSHIFT_DATASETS to limit at the application level which teams can work with Redshift datasets. Applied to all methods of RedshiftDatasetService. If the tenant says no, then it is a no.
  • Environment permissions - granted when inviting a team to an environment
    • IMPORT_REDSHIFT_DATASET to limit which groups in an environment are allowed to import a redshift dataset in the environment. Applied to import_redshift_dataset
  • Group permissions
    • UPDATE_REDSHIFT_DATASET and DELETE_REDSHIFT_DATASET - to prevent unauthorized users from updating/deleting a dataset. Applied to update_redshift_dataset and delete_redshift_dataset and granted ONLY to the dataset admin team.
    • ADD_TABLES_REDSHIFT_DATASET - to limit the users that can add tables to a dataset. ⚠️ it could be considered as part of update_dataset, but better to be specific as each is a different action in nature.
    • GET_REDSHIFT_DATASET - limits get dataset details. Applied to any method that fetches data for the Dataset
    • GET_REDSHIFT_DATASET_TABLE - limits get table details. Applied to any method that fetches data for the table. Needed when we share redshift tables
    • DELETE_REDSHIFT_DATASET_TABLE - to prevent unauthorized users from deleting a table. Applied to delete_redshift_table and granted ONLY to the dataset admin team (the ones that added the table)
    • UPDATE_REDSHIFT_DATASET_TABLE - to prevent unauthorized users from updating a table. Applied to update_redshift_table and granted ONLY to the dataset admin team (the ones that added the table)

Sharing with Redshift

We will share Redshift tables. We could have decided to implement full dataset sharing, but sharing with more granularity is more aligned with least-privilege principles.

Datashares only work for encrypted clusters. Therefore we should add guardrails preventing shares for non-encrypted clusters, or directly disable the onboarding of clusters that are not encrypted. The connection should include the encryption type of the cluster.
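The guardrail suggested above can be sketched as a simple check on the cluster properties before any datashare operation (or at connection onboarding). The `Encrypted` key follows the shape of the DescribeClusters response; treating a missing key as unencrypted is an assumption of this sketch:

```python
# Sketch of the encryption guardrail: refuse datashare operations (or
# connection onboarding) for non-encrypted clusters. The "Encrypted" key
# follows the DescribeClusters response shape; missing key is treated as
# unencrypted, which is an assumption of this sketch.

def assert_datashare_supported(cluster_properties: dict) -> None:
    if not cluster_properties.get("Encrypted", False):
        raise ValueError("Datashares require an encrypted Redshift cluster")

assert_datashare_supported({"Encrypted": True})  # passes silently
```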

Alternative 1: datashare per share request

When a share request is approved:
1) Create datashare (in source account)
2) Add schema to the datashare (in source account)
3) Add share requested tables to the datashare (in source account)
4) Grant access to the consumer cluster to the datashare (in source account)
5) Create local database from datashare (in target account) - WITH PERMISSIONS OPTIONAL
6) Create external schema in local database (in target account)
7) Grant usage access to the redshift role to the local database and schema (in target account)
When revoking tables:
1) Remove table from datashare
2) if no more tables in share request -> clean-up: delete external schema, local db, revoke access to datashare (if needed) and delete datashare

Alternative 2: datashare per dataset

When a share request is approved:
1) Create datashare (in source account) if it does not exist already
2) Add schema to the datashare (in source account) if not done already
3) Add tables to the datashare (in source account) if not already added
4) Grant access to the consumer cluster to the datashare (in source account) if not done already
5) Create local database from datashare (in target account) if not done already - WITH PERMISSIONS is needed
6) Create external schema in local database (in target account) if not done already
7) Grant granular usage access to the redshift role to the local database schema and share requested tables (in target account) ALWAYS

When revoking tables:
1) Revoke permissions (revert step 7)
2) if table not shared in any share request - clean-up table: remove from datashare
3) if no more tables in datashare - clean-up datashare: delete external schema, local db, revoke access to datashare (if needed) and delete datashare

Alternative 3: datashare per dataset-requester namespace

Same steps as alternative 2 but in this case we create a different datashare for each target namespace.
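Alternative 3 hinges on deriving a distinct datashare per (dataset, target namespace) pair, so grants to one consumer namespace never expose tables shared with another. A naming sketch (the convention below is hypothetical, not the actual data.all implementation):

```python
# Sketch of Alternative 3: one datashare per (dataset, target namespace)
# pair. The naming convention is hypothetical, not data.all's actual one.

def datashare_name(dataset_uri: str, target_namespace_id: str) -> str:
    """Derive a deterministic, Redshift-safe datashare name per pair."""
    safe_ns = target_namespace_id.replace("-", "_")
    return f"dataall_{dataset_uri}_{safe_ns}"

# Two target namespaces for the same dataset get two distinct datashares.
print(datashare_name("dsx1", "ns-01"))
print(datashare_name("dsx1", "ns-02"))
```

Because the name is deterministic, the sharing code can check whether the datashare for a given pair already exists before running the creation steps of Alternative 2.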

Comparison

  • Simplicity of implementation: they are pretty similar. Alternative 1 has more steps but each share is isolated. Alternative 2 is very fast for additional shares but has a more complex revoke. Alternative 3 is slightly more difficult than 2.
  • User experience: this is the main difference ❗ End users will query from the external schema (e.g. SELECT * FROM "dev"."serv_db_public"."customer";). Having many external schemas involves complex names with IDs, which might not be straightforward to use. It can also be confusing for the database admins. So alternatives 2 and 3 are definitely more user friendly.
  • Security/Data Governance: with the WITH PERMISSIONS clause we can restrict access on the consumer side, so there should be no downside to sharing the same datashare across multiple end-consumers (Redshift roles) - I verified that we can grant permissions to a single table in the datashare and that the end-user does not have permissions to other tables in the datashare. They can list and describe them but cannot select them. Between alternative 2 and alternative 3, the latter offers more security. In the end, the db admins of multiple target namespaces would have access to all the items of the datashare, which might include permissions to tables that are not granted explicitly to a particular namespace.

----> decision: Alternative 3 offers the nicest, most secure experience for users

Limitations

All alternatives take into account Redshift service quotas. In principle the max number of dbs in a cluster is 60 (provisioned) and 100 (serverless), but this excludes databases created from datashares, so we are safe. As for datasharing limitations, we should add them in the docs: https://docs.aws.amazon.com/redshift/latest/dg/considerations.html.

As for sharing between a Redshift provisioned cluster and a serverless cluster, the documentation states that it is possible: https://docs.aws.amazon.com/redshift/latest/mgmt/serverless-datasharing.html

More constraints: You can create only one consumer database for one datashare on a consumer cluster. You can't create multiple consumer databases referring to the same datashare. --> accounted for in #1467

@dlpzx
Contributor

dlpzx commented Mar 18, 2024

Implementation plan


Pre-requisites

To implement the design I will open multiple pull requests (list might vary)


Redshift datasets

  • Done New Redshift Dataset module using Base datasets + publish to catalog logic. Introduce Redshift Connections (Add Redshift datasets module #1424)
    • Redshift Connections + checks
    • Redshift Dataset import
    • Redshift Tables view
    • Add and remove tables
    • Delete and edit Dataset
    • Add catalog indexer for Dataset and tables
    • Add logic for glossary, feed
    • Polish IAM permissions
    • Polish data.all permissions
    • Polish frontend views
    • Migrations and backfilling
    • Add unit testing for connections - check comments below
    • Add unit testing for datasets - 94% coverage (leaving out glossary and votes which should be tested in their own modules)

Redshift data sharing


Related tasks needed for release


Documentation (also needed for release)


Integration testing -----> tracked in #1510

Wait for #1409

  • redshift- datasets
  • redshift-datasets-shares

Redshift next steps ---> tracked in #1509

  • Add Connections of IAM Federation type - next steps!
  • Use getEnums API call to return clusterTypes with utils implemented in Feat: API call to query Enum values #1435
  • Extract more common dataset_base code from redshift datasets and s3 datasets
    • Common FE elements in import/create S3 dataset and import Redshift dataset
    • Common FE elements in edit datasets
    • Common resolvers (resolve_dataset_environment, resolve_dataset_owners_group, resolve_dataset_stewards_group)
    • Common updateDataset API call
    • Common ModifyRedshiftDatasetInput
  • Following the pattern set by @SofiaSazonova at Feat: API call to query Enum values #1435, I think we should start thinking about how to detangle the UI from config.json. Here we could have a query that returns all the enabled modules. Originally posted by @petrkalos in Add Redshift datasets module #1424 (comment)

NOT Redshift tasks out of scope

  • Move glossary, feed, indexer targets to enums in their respective modules
  • Rename S3 permission descriptions in the team invite permission toggle list to clearly specify they are S3/Glue datasets
  • create a unit-test directory and migrate the current tests to unit tests - check this commit. I started it but reverted the changes as it was getting too complex to be added in the initial PR
  • Generic search filter and input in input_types API calls
  • Common styled DataGrid component with cell borders for dark theme

@fourtyplustwo
Contributor

@dlpzx I've read through the design and watched your video as well (it was very helpful as it answered some of my questions).

Overall I don't see any big problems but I do have some concerns.

  1. Addition of a new UI "Warehouses" to manage Redshift connections. I find this UI a bit awkward. My first instinct is that this should be a TAB under an environment and not a separate UI outside an environment, especially because you cannot have a connection that is not part of an environment. I think this would also simplify creating connections, because then the environment is already pre-defined and you can also make the connection be owned by the same team that is creating it.

I would also want to make sure that there's a consistent user experience when registering consumer roles or redshift consumer connections. Even today I find it weird that we register consumer roles in "Teams" tab under environments. I don't think that's intuitive. Perhaps with the addition of redshift connections we can instead add a new tab on the environment "Consumer Connections" or smth similar where you can manage your consumer IAM roles and redshift consumer connections etc..

Also I don't really feel that this new type "Warehouses" is actually going to be reusable for anything else other than Redshift so I think it's misleading.

I would like to hear your arguments why you think it would be much better to put this as a new UI on the left main bar vs making it a new tab on the environment.

  2. For sure make Redshift modular so that it can be fully disabled as for example we don't use redshift at all and don't want our users to be confused.

  3. We need to check security. Absolutely make sure to scan all infrastructure with checkov and that the permissions are as tight as possible.

  4. I'd really like to see part 2 of your video to understand better how Redshift consumer connections should work.

Thank you!

@anushka-singh
Contributor

I really like how descriptive the design is. Answered most of my questions too!
I have a few pending though:

  1. Will a dataset be able to have s3, glue and redshift data? Will I be able to create such a dataset?
  2. Will the share UI be the same as the one being used today?
  3. Will all the other modules like QS, Sagemaker, Worksheets be available to use for Redshift too?
  4. Why are we calling it "Warehouses"? How is it any different from a data store like Glue or S3?
  5. Can you provide more information on how data consumers will interact with Redshift data using BI tools and SQL clients? Will consumers have to set up anything extra on their end to be able to use these tools?

@dlpzx
Contributor

dlpzx commented Apr 8, 2024

Thanks @zsaltys and @anushka-singh for the input, you went straight to the tricky points.

  • @zsaltys Regarding point 1, initially I placed it inside environments, but then I questioned whether we even needed to place a warehouse inside an environment - let's say you are using Snowflake and it is not linked to an AWS account. What we can do is place it inside environments, because I agree that the user experience is nicer that way. But then if we need to link other Warehouses with non-AWS links, we can work on creating non-AWS data.all Environments (something that opens the door to multi-cloud...). In short, happy to change it. 2 - absolutely, 3 - let's prioritize for 2.5, 4 - I have not recorded it yet, I have been focusing on Create generic dataset_base and s3_dataset modules from current datasets #1123 the last week. Please have a look
  • @anushka-singh thanks for the questions! I think you need to have a look at Create generic dataset_base and s3_dataset modules from current datasets #1123 for the questions 1 and 2. The idea is to have a generic Dataset model and specific Dataset classes that inherit this model. Instead of adding functionalities to the existing Dataset module, we have opted to make it extensible. For question 2 - yes, very similar, but we need to check the details
  • For question 3, we would need to check case-by-case what the integration is: for Quicksight, how the data sharing works; for SageMaker, if there is any library to connect with a Redshift user or with IAM:role federation, then they can access the data. Worksheets depends on the Athena connectors; in this last case we would need to see whether it is worth it or whether we can open the RS Query Editor
  • I called it Warehouses with the idea of making it abstract to other warehousing technologies (also outside AWS)
  • For 5, most probably. I will add more details

DESIGN UPDATED WITH THE FEEDBACK!

dlpzx added a commit that referenced this issue May 3, 2024
…NAL DELETE DATASETS_BASE (#1242)

### Feature or Bugfix
- Refactoring

### Detail
After all the previous PRs are merged, there should be no circular
dependencies between `datasets` and `datasets_sharing`. We can now
proceed to:
- move `datasets_base` models, repositories, permissions and enums to
`datasets`
- adjust the `__init__` files to establish the `datasets_sharing`
depends on `datasets`
- adjust the Module interfaces to ensure that all necessary dataset
models... are imported in the interface for sharing


Next steps:
- share_notifications parameter to dataset_sharing in config.json

### Relates
#955 

### Security
Please answer the questions below briefly where applicable, or write
`N/A`. Based on
[OWASP 10](https://owasp.org/Top10/en/).

- Does this PR introduce or modify any input fields or queries - this
includes
fetching data from storage outside the application (e.g. a database, an
S3 bucket)?
  - Is the input sanitized?
- What precautions are you taking before deserializing the data you
consume?
  - Is injection prevented by parametrizing queries?
  - Have you ensured no `eval` or similar functions are used?
- Does this PR introduce any functionality or component that requires
authorization?
- How have you ensured it respects the existing AuthN/AuthZ mechanisms?
  - Are you logging failed auth attempts?
- Are you using or adding any cryptographic features?
  - Do you use standard, proven implementations?
  - Are the used keys controlled by the customer? Where are they stored?
- Are you introducing any new policies/roles/users?
  - Have you used the least-privilege principle? How?


By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
dlpzx added a commit that referenced this issue May 7, 2024
…me datasets as s3_datasets) (#1250)

### Feature or Bugfix
- Refactoring

### Detail
- Rename `datasets` module to `s3_datasets` module

This PR is the first step to extract a generic datasets_base module that
implements the undifferentiated concepts of Dataset in data.all.
s3_datasets will use this base module to implement the specific
implementation for S3 datasets.

### Relates
- #1123 
- #955 

dlpzx added a commit that referenced this issue May 15, 2024
…te datasets_base and move enums) (#1257)

### Feature or Bugfix
⚠️ This PR should be merged after #1250. 
- Refactoring

### Detail
As explained in the design for #1123, we are trying to implement a
generic `datasets_base` module that can be reused by any dataset type.

This PR:
- Creates the skeleton of the `datasets_base` module consisting of 3
packages (`db`, `api`, `services`) and adds the `__init__` file.
- Adds the dependency of `s3_datasets` to `datasets_base` in the
`__init__` file of the `s3_datasets` module
- Moves datasets_enums to datasets_base

### Relates
- #1123 
- #955 


By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
dlpzx added a commit that referenced this issue May 17, 2024
…te DatasetBase db model and S3Dataset model) (#1258)

### Feature or Bugfix
⚠️ This PR should be merged after #1257.
- Feature
- Refactoring

### Detail
As explained in the design for #1123, we are trying to implement a
generic `datasets_base` module that can be reused by any dataset type.

**This PR does**:
- Adds a generic `DatasetBase` model in datasets_base.db that is used in
s3_datasets.db to build the `S3Dataset` model using joined table
inheritance in
[sqlalchemy](https://docs.sqlalchemy.org/en/20/orm/inheritance.html)
- Rename all usages of Dataset to S3Dataset (in the future some will
revert to DatasetBase, but for the moment we will keep them as
S3Dataset)
- Add a migration script that backfills the `datasets` table and renames
`s3_datasets` ---> ⚠️ The migration performs some "scary" operations on
the dataset table; if for any reason it encounters an issue, it could
result in catastrophic loss of information --> for this reason this
[PR](#1267) implements RDS
snapshots on migrations.

**This PR does not**:
- Feed registration stays as:
`FeedRegistry.register(FeedDefinition('Dataset', S3Dataset))` using
`Dataset` with the `S3Dataset` resource type. It is out of the scope of
this PR to migrate the Feed definition.
- Exactly the same for the GlossaryRegistry registration. We keep
`object_type='Dataset'` to avoid backwards compatibility issues.
- It does not change the resourceType for permissions. We keep using a
generic `Dataset` as target for S3 permissions. If we are to split
permissions into DatasetBase permissions and S3Dataset permissions, we
will do it in a different PR

#### Remarks
Inserting new S3Dataset items does not require any changes. SQLAlchemy
joined table inheritance automatically inserts a row into the parent
table and then another one into the child table, as explained in this
stackoverflow
[link](https://stackoverflow.com/questions/39926937/sqlalchemy-how-to-insert-a-joined-table-inherited-class-instance-when-the-pare)
(I was not able to find it in the official docs)
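The joined table inheritance setup described above can be sketched as follows; the models and columns are illustrative, not the actual data.all schema:

```python
from sqlalchemy import Column, ForeignKey, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class DatasetBase(Base):
    # Parent table with the generic attributes shared by all dataset types.
    __tablename__ = 'datasets'
    datasetUri = Column(String, primary_key=True)
    label = Column(String)
    datasetType = Column(String)
    __mapper_args__ = {
        'polymorphic_identity': 'DatasetBase',
        'polymorphic_on': datasetType,
    }


class S3Dataset(DatasetBase):
    # Child table with S3-specific attributes; its primary key is also a
    # foreign key to the parent table (joined table inheritance).
    __tablename__ = 's3_datasets'
    datasetUri = Column(String, ForeignKey('datasets.datasetUri'), primary_key=True)
    S3BucketName = Column(String)
    __mapper_args__ = {'polymorphic_identity': 'S3Dataset'}


engine = create_engine('sqlite://')
Base.metadata.create_all(engine)

with Session(engine) as session:
    # A single add() writes two rows: one in `datasets`, one in `s3_datasets`.
    session.add(S3Dataset(datasetUri='uri-1', label='sales', S3BucketName='sales-bucket'))
    session.commit()
    # Querying the parent class returns polymorphic child instances.
    ds = session.query(DatasetBase).one()
    kind, label, bucket = type(ds).__name__, ds.label, ds.S3BucketName

print(kind, label, bucket)
```

This is also why the migration backfill matters: existing `s3_datasets` rows need matching parent rows in `datasets` before the joined mapping can load them.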


### Relates
- #1123 
- #955 
- #1267


By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
dlpzx added a commit that referenced this issue May 21, 2024
…te DatasetBaseRepository and move DatasetLock) (#1276)

### Feature or Bugfix
⚠️ merge after #1258 
- Refactoring

### Detail
As explained in the design for #1123, we are trying to implement a
generic `datasets_base` module that can be reused by any dataset type.

In this small PR:
- we move the generic DatasetLock model to datasets_base
- move the DatasetLock db operations to the datasets_base
DatasetBaseRepository
- move activity to DatasetBaseRepository

### Relates
- #1123 
- #955 


By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
dlpzx added a commit that referenced this issue May 21, 2024
…e DatasetServiceInterface to datasets_base, add property, create first list API for datasets_base) (#1281)

### Feature or Bugfix
- Feature
- Refactoring

### Detail
As explained in the design for #1123, we are trying to implement a
generic `datasets_base` module that can be reused by any dataset type.

In this PR we:
- Move DatasetServiceInterface to datasets_base. This interface is used
by datasets_sharing to "inject" logic into s3_datasets
- add a dataset_type property to the DatasetServiceInterface to
distinguish which dataset type the interface applies to.
- create the first list API for datasets_base. 👀 This is the most
important part. With multiple dataset types, users will still list all
datasets together in several places in the UI (e.g. in listDatasets in
the DatasetList view, in listDatasetsEnvironment in the Environment
view). These API calls are not specific to s3_datasets but generic to
any dataset type; thus, they should be part of datasets_base. This PR
introduces the datasets_list_service and datasetListRepository and
includes only one example of an API that moves to datasets_base. In
subsequent PRs we will move the
rest of APIs
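A minimal sketch of the interface pattern described in the first two bullets; the method names and the sharing implementation are hypothetical, chosen only to show how the `dataset_type` property lets a service dispatch by dataset type:

```python
from abc import ABC, abstractmethod


class DatasetServiceInterface(ABC):
    """Hook interface that other modules implement to inject logic into dataset services."""

    @property
    @abstractmethod
    def dataset_type(self) -> str:
        """The dataset type this interface applies to (e.g. 'S3')."""

    @abstractmethod
    def check_before_delete(self, session, uri: str) -> bool:
        """Return False to veto deletion of the dataset."""


class SharingDatasetServiceInterface(DatasetServiceInterface):
    """Hypothetical sharing-module implementation for S3 datasets."""

    @property
    def dataset_type(self) -> str:
        return 'S3'

    def check_before_delete(self, session, uri: str) -> bool:
        # A real implementation would veto while active share objects exist.
        return True


# A dataset service can then filter registered interfaces by type:
interfaces = [SharingDatasetServiceInterface()]
applicable = [i for i in interfaces if i.dataset_type == 'S3']
print([type(i).__name__ for i in applicable])
```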

### Relates
- #1123 
- #955 


By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
dlpzx added a commit that referenced this issue May 21, 2024
…ve list queries to dataset_base or rename them) (#1282)

### Feature or Bugfix
- Refactoring

### Detail
As explained in the design for #1123, we are trying to implement a
generic `datasets_base` module that can be reused by any dataset type.

In this PR we:
- Restructure listDatasetsOwnedByEnvGroup as
listS3DatasetsOwnedByEnvGroup and move it into Worksheets in the FE. It
is moved to Worksheets because that is the only place where it is used
in the FE. One could argue that in the BE listS3DatasetsOwnedByEnvGroup
is part of the s3_datasets module. The way I see it, FE and BE are
independent and their modularization strategies fit the type of
programming; what makes sense in the FE might not in the BE. In the BE,
queries belong to the module whose services/models they act on, in this
case s3_datasets. In the FE, queries belong to the module where they are
used; if a query is used by more than one module, it can be placed in
the generic `services` directory. What is important is that we define
the dependencies. In this case we make Worksheets depend on S3_Datasets
(as we do in the index in `frontend/src/modules/Worksheets/index.js` and
in `backend/dataall/modules/worksheets/__init__.py`)
- Move listDatasetsCreatedInEnvironment to datasets_base

### Relates
- #1123 
- #955 


By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
dlpzx added a commit that referenced this issue May 22, 2024
…art 1 (renaming, enums and permissions) (#1284)

### Feature or Bugfix
- Feature
- Refactoring

### Detail
As explained in the designs for #1123 and #1283, we are trying to
implement generic `datasets_base` and `shares_base` modules that can be
reused by any dataset type and any type of shareable object.

In this PR:
- Rename `dataset_sharing` as `s3_dataset_shares`
- Create `shares_base` and introduce dependency (`s3_dataset_shares`
depends on `shares_base`)
- Move generic enums to shares_base
- Move generic permissions to shares_base


### Relates
- #1283 
- #1123 
- #955 


By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
@dlpzx dlpzx moved this to Roadmap in Data.all Roadmap Aug 12, 2024
dlpzx added a commit that referenced this issue Aug 13, 2024
### Feature or Bugfix
- Feature

### Detail
Complete design in #955.
This particular PR is focused on small FE enhancements to adapt the
share views to Redshift shares:

Add RedshiftTable as a type to display in the share view -> list items,
edit (add items), verify items
![Screenshot 2024-08-12 at 13 29
18](https://github.com/user-attachments/assets/0c48ca8f-5ce4-41c5-aca9-62928c4345d0)

Solve issue with redirect in the ShareView header (it redirected to
s3-datasets/dataset/uri)

Add principal resolver that resolves as principal the Redshift role
(also removed unused fields for principal in backend)
![Screenshot 2024-08-12 at 13 31
07](https://github.com/user-attachments/assets/60be4e6d-fb0c-4a23-9e04-3775f9d0d4f8)

Replace IAM role references with a generic role and add icons
![Screenshot 2024-08-12 at 13 31
51](https://github.com/user-attachments/assets/1798a902-3398-4cbc-8aef-96797298c91a)

Finally, add a shares tab in the Redshift Dataset View:

![image](https://github.com/user-attachments/assets/e321304c-8dfa-460f-bca0-ef24f4fcb594)

### Relates
- #955 


By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
dlpzx added a commit that referenced this issue Aug 14, 2024
…res (#1467)

### Feature or Bugfix
- Feature

### Detail
Complete design in #955.
This particular PR is the core of the Redshift dataset sharing
implementation.

- Implement the sharing logic in the ECS task for approve, revoke and
verify. ❗ Check the "Sharing with Redshift" section of the design for
the key decisions on the sharing workflow
- Add the necessary Redshift Data API calls in the redshift_data client
- Move share alarm utils to shares_base so that they can be re-used in
Redshift sharing. It would be good to rename the file but it can wait.
- Includes tests for the processor functions: approve, revoke, verify

In contrast to the design in the Glue or S3 sharing mechanisms, in this
case I decided to keep it simple and use the AWS client directly from
the processor without a manager.

❗ I did not find a way to check permissions granted to Redshift roles in
Redshift. For this reason the verification task does not check the last
2 steps of the share. In Redshift it is possible to check user
permissions on tables (with
[has_table_privilege](https://docs.aws.amazon.com/redshift/latest/dg/r_HAS_TABLE_PRIVILEGE.html))
and role permissions on datashares, databases and schemas with some of
the [info tables and
views](https://docs.aws.amazon.com/redshift/latest/dg/cm_chap_system-tables.html);
but when it comes to tables there is no table to look up or system
function. For the moment I have not included this step, but I'll be
meeting more Redshift experts for guidance. The "good" thing is that it
is the last step for a share to succeed, so users will simply get an
"Access denied for insufficient permissions" error, which can be
troubleshot.
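As a hedged illustration of the verification gap above: user-level table permissions can at least be queried with `HAS_TABLE_PRIVILEGE`. The helper below only builds the SQL text that could then be submitted through the Redshift Data API; the function name and quoting approach are mine, not data.all's:

```python
def build_table_privilege_check_sql(user: str, schema: str, table: str,
                                    privilege: str = 'SELECT') -> str:
    """Build a HAS_TABLE_PRIVILEGE check for a *user* on a table.

    Redshift exposes HAS_TABLE_PRIVILEGE for users, but there is no
    equivalent lookup for table privileges granted to roles, which is why
    the verification task skips the last two steps of the share.
    """
    def quote(value: str) -> str:
        # Double single quotes so the value cannot break out of the literal.
        return value.replace("'", "''")

    return (
        f"SELECT HAS_TABLE_PRIVILEGE('{quote(user)}', "
        f"'{quote(schema)}.{quote(table)}', '{quote(privilege)}')"
    )


sql = build_table_privilege_check_sql('analyst', 'public', 'customer')
print(sql)  # → SELECT HAS_TABLE_PRIVILEGE('analyst', 'public.customer', 'SELECT')
```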

There are still a number of issues to be fixed in subsequent PRs:
- add guardrails to share creation
- polish FE (e.g. principal id, resource type)
- avoid IAM checks and dataset and IAM locks for Redshift

### Relates
- #955 


By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
noah-paige added a commit that referenced this issue Aug 30, 2024
commit 22a6f6ef 
Author: Noah Paige <[email protected]> 
Date: Mon Jul 08 2024 11:28:07 GMT-0400 (Eastern Daylight Time) 

    Add integ tests


commit 4fb7d653 
Author: Noah Paige <[email protected]> 
Date: Mon Jul 08 2024 11:26:36 GMT-0400 (Eastern Daylight Time) 

    Merge env test changes


commit 4cf42e8 
Author: Petros Kalos <[email protected]> 
Date: Fri Jul 05 2024 08:19:34 GMT-0400 (Eastern Daylight Time) 

    improve docs


commit 65f930a 
Author: Petros Kalos <[email protected]> 
Date: Fri Jul 05 2024 08:10:56 GMT-0400 (Eastern Daylight Time) 

    fix failures


commit 170b7ce 
Author: Petros Kalos <[email protected]> 
Date: Wed Jul 03 2024 10:52:20 GMT-0400 (Eastern Daylight Time) 

    add group/consumption_role invite/remove tests


commit ba77d69 
Author: dlpzx <[email protected]> 
Date: Wed Jul 03 2024 06:51:47 GMT-0400 (Eastern Daylight Time) 

    Rename alias for env_vars kms key in cognito lambdas FE and BE (#1385)

### Feature or Bugfix
- Bugfix

### Detail
For the case in which we deploy FE and BE in us-east-1 the new lambda
env_key alias is the same one for TriggerFunctionCognitoUrlsConfig in FE
and for TriggerFunctionCognitoConfig in BE, which results in a failure
of the CICD in the FE stack because the alias already exists.

This PR changes the name of both aliases to avoid this conflict. It also
adds envname to avoid issues with other deployment environments/tooling
accounts in the future

### Relates
- <URL or Ticket>


By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.

commit e5923a9 
Author: dlpzx <[email protected]> 
Date: Wed Jul 03 2024 04:27:11 GMT-0400 (Eastern Daylight Time) 

    Fix lambda_env_key out of scope for vpc-facing cognito setup (#1384)

### Feature or Bugfix
- Bugfix

### Detail
The KMS key for the Lambda environment variables in the Cognito IdP
stack was defined inside an if-clause for the internet-facing frontend.
Outside of that if-clause, for the vpc-facing architecture, the KMS key
does not exist and the CICD pipeline fails. This PR moves the creation
of the KMS key outside of the if-clause.

### Relates


By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.

commit 3ccacfc 
Author: Noah Paige <[email protected]> 
Date: Mon Jul 01 2024 13:56:58 GMT-0400 (Eastern Daylight Time) 

    Add delete docs not found when re indexing in catalog task (#1365)

### Feature or Bugfix
- Feature

### Detail
- Add logic to the Catalog Indexer Task to delete docs no longer in RDS
- TODO: Add ability to re-index catalog items via the data.all Admin UI
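The delete-stale-docs logic amounts to a set difference between the indexed doc ids and the uris still present in RDS; a minimal sketch (the function name is illustrative, not the task's actual code):

```python
def find_stale_catalog_docs(catalog_doc_ids, rds_uris):
    """Return catalog document ids whose backing record no longer exists in RDS."""
    return sorted(set(catalog_doc_ids) - set(rds_uris))


# Two datasets were deleted from RDS but are still indexed in the catalog:
catalog = ['uri-1', 'uri-2', 'uri-3', 'uri-4']
rds = ['uri-2', 'uri-4']
stale = find_stale_catalog_docs(catalog, rds)
print(stale)  # → ['uri-1', 'uri-3']
```

The indexer task would then issue a delete request to the search index for each id in `stale`.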

### Relates
- #1078


By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.

commit e2817a1 
Author: Noah Paige <[email protected]> 
Date: Mon Jul 01 2024 05:14:07 GMT-0400 (Eastern Daylight Time) 

    Fix/glossary status (#1373)

### Feature or Bugfix
- Bugfix


### Detail
- Add back `status` to the Glossary GQL object for GQL operations
(getGlossary, listGlossaries)
- Fix `listOrganizationGroupPermissions` to enforce non-null on the FE


### Relates



By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.

commit c3c58bd 
Author: Petros Kalos <[email protected]> 
Date: Fri Jun 28 2024 06:55:42 GMT-0400 (Eastern Daylight Time) 

    add environment tests (#1371)

### Feature or Bugfix
Feature

### Detail
* add list_environment tests
* add test for updating an environment (via update_stack)
* generalise the polling functions for stacks

### Relates
#1220 


By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.

commit e913d48 
Author: dlpzx <[email protected]> 
Date: Fri Jun 28 2024 04:15:49 GMT-0400 (Eastern Daylight Time) 

    Add search (Autocomplete) in miscellaneous dropdowns (#1367)

### Feature or Bugfix
- Feature

### Detail
Autocomplete for environments and teams in the following frontend views
as requested in #1012. In this case the views required custom dropdowns.

❗ I used `noOptionsText` whenever it was necessary instead of checking
that groupOptions length > 0
- [x] DatasetEditForm.js -> ❗ I kept the stewards field as `freeSolo` -
what that means is that users CAN specify options that are not on the
list. I would like the reviewer to confirm this is what we want. In the
end stewardship is a delegation of permissions, so it makes sense that
delegation can happen to other teams. Also changed DatasetCreateForm
- [X] RequestDashboardAccessModal.js - already implemented, minor
changes
- [X] EnvironmentTeamInviteForm.js - already implemented, minor changes.
-> Kept `freeSolo` because invited teams might not be among the user's
teams. Same reason why there is no check for groupOptions == 0: if there
are no options there is still the free-text option.
- [X] EnvironmentRoleAddForm.js
- [X] NetworkCreateModal.js 

### Relates
- #1012 


By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.

commit ee71d7b 
Author: Tejas Rajopadhye <[email protected]> 
Date: Thu Jun 27 2024 14:08:27 GMT-0400 (Eastern Daylight Time) 

    [Gh 1301] Enhancement Feature - Bulk share reapply on dataset  (#1363)

### Feature or Bugfix
- Feature


### Detail

- Adds feature to reapply shares in bulk for a dataset. 
- Also contains bugfix for AWS worker lambda errors 

### Relates
- #1301
- #1364

### Security
Based on [OWASP 10](https://owasp.org/Top10/en/):

- Does this PR introduce or modify any input fields or queries, including fetching data from storage outside the application (e.g. a database, an S3 bucket)? N/A
- Does this PR introduce any functionality or component that requires authorization? N/A
- Are you using or adding any cryptographic features? N/A
- Are you introducing any new policies/roles/users? N/A

---------

Co-authored-by: trajopadhye <[email protected]>

commit 27f1ad7 
Author: Noah Paige <[email protected]> 
Date: Thu Jun 27 2024 13:18:32 GMT-0400 (Eastern Daylight Time) 

    Convert Dataset Lock Mechanism to Generic Resource Lock (#1338)

### Feature or Bugfix
<!-- please choose -->
- Feature
- Bugfix
- Refactoring

### Detail
- Convert Dataset Lock Mechanism to Generic Resource Lock
- Extend locking to Share principals (i.e. EnvironmentGroup and
Consumption Roles)

- Making locking a generic component not tied to datasets
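The generic lock described above can be sketched in a few lines. This is a minimal in-memory illustration, assuming a repository keyed by (resource_uri, resource_type); in data.all the lock table lives in the database, and the class and method names here are hypothetical, not the actual API.

```python
from threading import Lock

class ResourceLockRepository:
    """Illustrative generic resource lock: any resource type (dataset,
    environment group, consumption role) is locked through the same
    (uri, resource_type) key instead of a dataset-specific mechanism."""

    def __init__(self):
        self._locks = {}      # (uri, resource_type) -> owner
        self._guard = Lock()  # protects the lock table itself

    def acquire(self, uri, resource_type, owner):
        key = (uri, resource_type)
        with self._guard:
            if key in self._locks:
                return False  # already held by another owner
            self._locks[key] = owner
            return True

    def release(self, uri, resource_type, owner):
        key = (uri, resource_type)
        with self._guard:
            if self._locks.get(key) == owner:
                del self._locks[key]
                return True
            return False      # not held, or held by someone else
```

Because the key carries the resource type, the same repository can lock a dataset and an environment group with the same URI independently.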


### Relates
- #1093 


---------

Co-authored-by: dlpzx <[email protected]>

commit e3b8658 
Author: Petros Kalos <[email protected]> 
Date: Thu Jun 27 2024 12:50:59 GMT-0400 (Eastern Daylight Time) 

    ignore ruff change in blame (#1372)


commit 2e80de4 
Author: dlpzx <[email protected]> 
Date: Thu Jun 27 2024 10:59:18 GMT-0400 (Eastern Daylight Time) 

    Generic shares_base module and specific s3_datasets_shares module - part 11 (renaming and cleaning up s3_shares) (#1359)

### Feature or Bugfix
- Refactoring

### Detail
As explained in the design for #1123 and #1283, we are implementing generic `datasets_base` and `shares_base` modules that can be used by any type of dataset and any type of shareable object.

This is one of the last PRs focused on renaming files and cleaning up the s3_datasets_shares module. The first step is a consolidation of file and class names in the services to clearly refer to s3_shares:
- `services.managed_share_policy_service.SharePolicyService` -->
`services.s3_share_managed_policy_service.S3SharePolicyService`
- `services.dataset_sharing_alarm_service.DatasetSharingAlarmService`
--> `services.s3_share_alarm_service.S3ShareAlarmService`

👀 The main refactoring happens in what used to be
`services.dataset_sharing_service`:
- The part that implements the `DatasetServiceInterface` has moved
to `services/s3_share_dataset_service.py` as `S3ShareDatasetService`
- The part used in the resolvers and by other methods has been renamed
to `services.s3_share_service.py`, and the folder/table permission
methods have also been added to `S3ShareService` (from
share_item_service)

Lastly, one method previously in share_item_service has been moved to
the GlueClient directly as `get_glue_database_from_catalog`.


### Relates
- #1283 
- #1123 
- #955 


commit 1c09015 
Author: Noah Paige <[email protected]> 
Date: Thu Jun 27 2024 04:16:14 GMT-0400 (Eastern Daylight Time) 

    fix listOrganizationGroupPermissions (#1369)

### Feature or Bugfix
<!-- please choose -->
- Bugfix


### Detail
- Fix listOrganizationGroupPermissions


### Relates
- <URL or Ticket>


commit 976ec6b 
Author: dlpzx <[email protected]> 
Date: Thu Jun 27 2024 04:13:14 GMT-0400 (Eastern Daylight Time) 

    Add search (Autocomplete) in create pipelines (#1368)

### Feature or Bugfix
- Feature

### Detail
Autocomplete for environments and teams in frontend views, as requested
in #1012. This PR implements it for createPipelines.

### Relates
- #1012 


commit 6c909a3 
Author: Noah Paige <[email protected]> 
Date: Wed Jun 26 2024 11:18:04 GMT-0400 (Eastern Daylight Time) 

    fix migration to not rely on OrganizationService or RequestContext (#1361)

### Feature or Bugfix
<!-- please choose -->
- Bugfix

### Detail
- Ensure the migration script does not need RequestContext - otherwise
it fails in the migration trigger Lambda, as context info is not set/available


### Relates
- #1306


commit 90835fb 
Author: Anushka Singh <[email protected]> 
Date: Wed Jun 26 2024 11:17:22 GMT-0400 (Eastern Daylight Time) 

    Issue1248: Persistent Email Reminders (#1354)

### Feature or Bugfix
- Feature


### Detail
- When a share request is initiated and remains pending for an extended
period, dataset producers will receive automated email reminders at
predefined intervals. These reminders will prompt producers to either
approve or extend the share request, thereby preventing delays in
accessing datasets.

Attaching screenshots for emails:

<img width="1336" alt="Screenshot 2024-06-20 at 5 34 31 PM"
src="https://github.com/data-dot-all/dataall/assets/26413731/d7be28c3-5c98-4146-92b1-295e136137a3">

<img width="1322" alt="Screenshot 2024-06-20 at 5 34 52 PM"
src="https://github.com/data-dot-all/dataall/assets/26413731/047556e8-59ee-4ebf-b8a7-c0a6684e2a63">


- Emails are sent every Monday at 9am UTC. The schedule can be changed
via the cron expression in container.py
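The weekly cadence above can be mirrored in plain Python. `is_reminder_due` is a hypothetical helper for illustration only; the real schedule is the cron expression configured in container.py.

```python
from datetime import datetime, timedelta, timezone

def is_reminder_due(now: datetime) -> bool:
    """Return True when 'now' falls in the Monday 9am UTC reminder window."""
    now = now.astimezone(timezone.utc)       # normalize to UTC first
    return now.weekday() == 0 and now.hour == 9  # Monday == 0
```

A timestamp in any timezone is normalized before the check, so a 5am Monday event in UTC-4 counts as the 9am UTC slot.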

### Relates
- #1248


---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Anushka Singh <[email protected]>
Co-authored-by: trajopadhye <[email protected]>
Co-authored-by: Mohit Arora <[email protected]>
Co-authored-by: rbernota <[email protected]>
Co-authored-by: Rick Bernotas <[email protected]>
Co-authored-by: Raj Chopde <[email protected]>
Co-authored-by: Noah Paige <[email protected]>
Co-authored-by: dlpzx <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: jaidisido <[email protected]>
Co-authored-by: dlpzx <[email protected]>
Co-authored-by: mourya-33 <[email protected]>
Co-authored-by: nikpodsh <[email protected]>
Co-authored-by: MK <[email protected]>
Co-authored-by: Manjula <[email protected]>
Co-authored-by: Zilvinas Saltys <[email protected]>
Co-authored-by: Zilvinas Saltys <[email protected]>
Co-authored-by: Daniel Lorch <[email protected]>
Co-authored-by: Tejas Rajopadhye <[email protected]>
Co-authored-by: Zilvinas Saltys <[email protected]>
Co-authored-by: Sofia Sazonova <[email protected]>
Co-authored-by: Sofia Sazonova <[email protected]>

commit e477bdf 
Author: Noah Paige <[email protected]> 
Date: Wed Jun 26 2024 10:39:09 GMT-0400 (Eastern Daylight Time) 

    Enforce non null on GQL query string if non null defined (#1362)

### Feature or Bugfix
<!-- please choose -->
- Bugfix


### Detail
- Add `String!` to ensure a non-null input argument on the FE if it is
defined as such on the backend GQL operation for `listS3DatasetsSharedWithEnvGroup`
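As an illustration, a frontend query string matching a non-nullable backend argument declares `String!` as well. The argument and field names below are assumptions for the sketch, not the actual data.all schema.

```python
# Hypothetical FE query string: the '!' must match the backend's non-null
# definition, otherwise a null argument slips through to the resolver.
LIST_S3_DATASETS_SHARED_WITH_ENV_GROUP = """
query listS3DatasetsSharedWithEnvGroup($environmentUri: String!, $groupUri: String!) {
  listS3DatasetsSharedWithEnvGroup(environmentUri: $environmentUri, groupUri: $groupUri) {
    datasetUri
    label
  }
}
"""
```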


### Relates


commit d6b59b3 
Author: Noah Paige <[email protected]> 
Date: Wed Jun 26 2024 08:48:52 GMT-0400 (Eastern Daylight Time) 

    Fix Init Share Base (#1360)

### Feature or Bugfix
<!-- please choose -->
- Bugfix

### Detail
- Need to register processors in init for s3 dataset shares API module


### Relates


commit bd3698c 
Author: Petros Kalos <[email protected]> 
Date: Wed Jun 26 2024 05:19:14 GMT-0400 (Eastern Daylight Time) 

    split cognito urls setup and cognito user creation (#1366)

### Feature or Bugfix
- Bugfix
### Detail
For more details about the issue, read #1353.
In this PR we solve the problem by splitting the Cognito configuration in two:
* The first part (cognito_users_config.py) sets up the required groups
and users and runs after the UserPool deployment
* The second part (cognito_urls_config.py) sets up Cognito's
callback/logout URLs and runs after the CloudFront deployment

We chose to split the functionality because the users/groups must be
set up for the integration tests, which run after the backend
deployment.

The alternative was to keep the configuration in a single step but run
the integration tests after the CloudFront stage.

### Relates
- Solves #1353 

noah-paige added a commit that referenced this issue Aug 30, 2024
commit 4425e756 
Author: Noah Paige <[email protected]> 
Date: Mon Jul 08 2024 11:57:31 GMT-0400 (Eastern Daylight Time) 

    Fix


commit 4cd2bf77 
Author: Noah Paige <[email protected]> 
Date: Mon Jul 08 2024 11:56:38 GMT-0400 (Eastern Daylight Time) 

    Fix


commit 22a6f6ef 
Author: Noah Paige <[email protected]> 
Date: Mon Jul 08 2024 11:28:07 GMT-0400 (Eastern Daylight Time) 

    Add integ tests


commit 4fb7d653 
Author: Noah Paige <[email protected]> 
Date: Mon Jul 08 2024 11:26:36 GMT-0400 (Eastern Daylight Time) 

    Merge env test changes


commit 4cf42e8 
Author: Petros Kalos <[email protected]> 
Date: Fri Jul 05 2024 08:19:34 GMT-0400 (Eastern Daylight Time) 

    improve docs


commit 65f930a 
Author: Petros Kalos <[email protected]> 
Date: Fri Jul 05 2024 08:10:56 GMT-0400 (Eastern Daylight Time) 

    fix failures


commit 170b7ce 
Author: Petros Kalos <[email protected]> 
Date: Wed Jul 03 2024 10:52:20 GMT-0400 (Eastern Daylight Time) 

    add group/consumption_role invite/remove tests


commit ba77d69 
Author: dlpzx <[email protected]> 
Date: Wed Jul 03 2024 06:51:47 GMT-0400 (Eastern Daylight Time) 

    Rename alias for env_vars kms key in cognito lambdas FE and BE (#1385)

### Feature or Bugfix
- Bugfix

### Detail
When FE and BE are deployed in us-east-1, the new Lambda env_key alias
is the same for TriggerFunctionCognitoUrlsConfig in the FE and
TriggerFunctionCognitoConfig in the BE, which makes the CICD fail in
the FE stack because the alias already exists.

This PR changes both alias names to avoid the conflict. It also adds
envname to avoid issues with other deployment environments/tooling
accounts in the future.

### Relates
- <URL or Ticket>


commit e5923a9 
Author: dlpzx <[email protected]> 
Date: Wed Jul 03 2024 04:27:11 GMT-0400 (Eastern Daylight Time) 

    Fix lambda_env_key out of scope for vpc-facing cognito setup (#1384)

### Feature or Bugfix
- Bugfix

### Detail
The KMS key for the Lambda environment variables in the Cognito IdP
stack was defined inside an if-clause for the internet-facing frontend.
Outside of that if, for the VPC-facing architecture, the KMS key does
not exist and the CICD pipeline fails. This PR moves the creation of
the KMS key outside of the if.

### Relates


commit 3ccacfc 
Author: Noah Paige <[email protected]> 
Date: Mon Jul 01 2024 13:56:58 GMT-0400 (Eastern Daylight Time) 

    Add delete docs not found when re indexing in catalog task (#1365)

### Feature or Bugfix
<!-- please choose -->
- Feature

### Detail
- Add logic to Catalog Indexer Task to Delete Docs No Longer in RDS
- TODO: Add Ability to Re-index Catalog Items via Dataall Admin UI
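The cleanup step described above boils down to a set difference between what the search index holds and what RDS still contains. The function below is an illustrative sketch, not the actual indexer task code.

```python
def docs_to_delete(indexed_ids, rds_ids):
    """Return catalog document ids that are indexed but no longer exist in RDS."""
    return sorted(set(indexed_ids) - set(rds_ids))
```

During re-indexing, the task would delete each returned id from the search index so the catalog never shows items already removed from the database.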

### Relates
- #1078


commit e2817a1 
Author: Noah Paige <[email protected]> 
Date: Mon Jul 01 2024 05:14:07 GMT-0400 (Eastern Daylight Time) 

    Fix/glossary status (#1373)

### Feature or Bugfix
<!-- please choose -->
- Bugfix


### Detail
- Add back `status` to Glossary GQL Object for GQL Operations
(getGlossary, listGlossaries)
- Fix  `listOrganizationGroupPermissions` enforce non null on FE


### Relates



commit c3c58bd 
Author: Petros Kalos <[email protected]> 
Date: Fri Jun 28 2024 06:55:42 GMT-0400 (Eastern Daylight Time) 

    add environment tests (#1371)

### Feature or Bugfix
Feature

### Detail
* add list_environment tests
* add test for updating an environment (via update_stack)
* generalise the polling functions for stacks
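A generalised stack-polling helper might look like the sketch below; the name `poll_until` and its defaults are assumptions, not the actual test utility.

```python
import time

def poll_until(check, timeout=60, interval=1, clock=time.monotonic, sleep=time.sleep):
    """Call 'check' until it returns a truthy value or 'timeout' seconds pass.

    'clock' and 'sleep' are injectable so tests can run without real waiting.
    """
    deadline = clock() + timeout
    while True:
        result = check()
        if result:
            return result
        if clock() >= deadline:
            raise TimeoutError('condition not met within timeout')
        sleep(interval)
```

The same helper can then poll any stack status (environment create, update, delete) instead of duplicating per-resource wait loops.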

### Relates
#1220 


commit e913d48 
Author: dlpzx <[email protected]> 
Date: Fri Jun 28 2024 04:15:49 GMT-0400 (Eastern Daylight Time) 

    Add search (Autocomplete) in miscellaneous dropdowns (#1367)

### Feature or Bugfix
- Feature

### Detail
Autocomplete for environments and teams in the following frontend views
as requested in #1012. In this case the views required custom dropdowns.

❗ I used `noOptionsText` whenever necessary instead of checking that
groupOptions length > 0.
- [x] DatasetEditForm.js -> ❗ I kept the stewards field as `freeSolo`,
which means users CAN specify options that are not on the list. I would
like the reviewer to confirm this is what we want. In the end,
stewardship is a delegation of permissions, so it makes sense that
delegation can happen to other teams. Also changed DatasetCreateForm.
- [X] RequestDashboardAccessModal.js - already implemented, minor
changes
- [X] EnvironmentTeamInviteForm.js - already implemented, minor changes.
-> Kept `freeSolo` because invited teams might not be the user's teams.
Same reason why there is no check for groupOptions == 0: if there are
no options, there is still the free-text option.
- [X] EnvironmentRoleAddForm.js
- [X] NetworkCreateModal.js 

### Relates
- #1012 


commit ee71d7b 
Author: Tejas Rajopadhye <[email protected]> 
Date: Thu Jun 27 2024 14:08:27 GMT-0400 (Eastern Daylight Time) 

    [Gh 1301] Enhancement Feature - Bulk share reapply on dataset  (#1363)

### Feature or Bugfix
- Feature


### Detail

- Adds feature to reapply shares in bulk for a dataset. 
- Also contains bugfix for AWS worker lambda errors 

### Relates
- #1301
- #1364


---------

Co-authored-by: trajopadhye <[email protected]>

commit 27f1ad7 
Author: Noah Paige <[email protected]> 
Date: Thu Jun 27 2024 13:18:32 GMT-0400 (Eastern Daylight Time) 

    Convert Dataset Lock Mechanism to Generic Resource Lock (#1338)

### Feature or Bugfix
- Feature
- Bugfix
- Refactoring

### Detail
- Convert Dataset Lock Mechanism to Generic Resource Lock
- Extend locking to Share principals (i.e. EnvironmentGroup and
Consumption Roles)

- Making locking a generic component not tied to datasets

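The generic lock described above can be sketched as a small repository keyed by (resource_uri, resource_type), so that environment groups and consumption roles can be locked with the same mechanism as datasets. The names below (`ResourceLockRepository`, `acquire`, `release`) are illustrative, not the actual data.all implementation:

```python
# Illustrative sketch of a generic resource lock (hypothetical names,
# not the actual data.all code, which persists locks in the database).
from dataclasses import dataclass, field


@dataclass
class ResourceLockRepository:
    """Locks any resource type by (uri, type), not just datasets."""
    _locks: dict = field(default_factory=dict)

    def acquire(self, resource_uri: str, resource_type: str, holder: str) -> bool:
        key = (resource_uri, resource_type)
        if key in self._locks:
            return False  # already locked by another task
        self._locks[key] = holder
        return True

    def release(self, resource_uri: str, resource_type: str, holder: str) -> bool:
        key = (resource_uri, resource_type)
        if self._locks.get(key) != holder:
            return False  # only the holder may release
        del self._locks[key]
        return True


repo = ResourceLockRepository()
assert repo.acquire("uri-1", "dataset", "share-task-A")
assert not repo.acquire("uri-1", "dataset", "share-task-B")  # blocked
assert repo.acquire("uri-1", "environment_group", "share-task-B")  # other type
assert repo.release("uri-1", "dataset", "share-task-A")
```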

### Relates
- #1093 


---------

Co-authored-by: dlpzx <[email protected]>

commit e3b8658 
Author: Petros Kalos <[email protected]> 
Date: Thu Jun 27 2024 12:50:59 GMT-0400 (Eastern Daylight Time) 

    ignore ruff change in blame (#1372)


commit 2e80de4 
Author: dlpzx <[email protected]> 
Date: Thu Jun 27 2024 10:59:18 GMT-0400 (Eastern Daylight Time) 

    Generic shares_base module and specific s3_datasets_shares module - part 11 (renaming and cleaning up s3_shares) (#1359)

### Feature or Bugfix
- Refactoring

### Detail
As explained in the design for #1123 and #1283, we are implementing
generic `datasets_base` and `shares_base` modules that can be used by
any type of dataset and any type of shareable object.

This is one of the last PRs focused on renaming files and cleaning up
the s3_datasets_shares module. The first step is a consolidation of the
file and class names in the services to clearly refer to s3_shares:
- `services.managed_share_policy_service.SharePolicyService` -->
`services.s3_share_managed_policy_service.S3SharePolicyService`
- `services.dataset_sharing_alarm_service.DatasetSharingAlarmService`
--> `services.s3_share_alarm_service.S3ShareAlarmService`

👀 The main refactoring happens in what used to be
`services.dataset_sharing_service`.
- The part that implements the `DatasetServiceInterface` has been moved
to `services/s3_share_dataset_service.py` as the `S3ShareDatasetService`
- The part used in the resolvers and by other methods has been renamed
as `services.s3_share_service.py` and the methods for the folder/table
permissions are also added to the S3ShareService (from
share_item_service)

Lastly, there is one method previously in share_item_service that has
been moved to the GlueClient directly as
`get_glue_database_from_catalog`.


### Relates
- #1283 
- #1123 
- #955 


commit 1c09015 
Author: Noah Paige <[email protected]> 
Date: Thu Jun 27 2024 04:16:14 GMT-0400 (Eastern Daylight Time) 

    fix listOrganizationGroupPermissions (#1369)

### Feature or Bugfix
- Bugfix


### Detail
- Fix listOrganizationGroupPermissions



commit 976ec6b 
Author: dlpzx <[email protected]> 
Date: Thu Jun 27 2024 04:13:14 GMT-0400 (Eastern Daylight Time) 

    Add search (Autocomplete) in create pipelines (#1368)

### Feature or Bugfix
- Feature

### Detail
Autocomplete for environments and teams in frontend views, as requested
in #1012. This PR implements it for createPipelines.

### Relates
- #1012 


commit 6c909a3 
Author: Noah Paige <[email protected]> 
Date: Wed Jun 26 2024 11:18:04 GMT-0400 (Eastern Daylight Time) 

    fix migration to not rely on OrganizationService or RequestContext (#1361)

### Feature or Bugfix
- Bugfix

### Detail
- Ensure the migration script does not need RequestContext; otherwise it
fails in the migration trigger Lambda, where context info is not set/available


### Relates
- #1306


commit 90835fb 
Author: Anushka Singh <[email protected]> 
Date: Wed Jun 26 2024 11:17:22 GMT-0400 (Eastern Daylight Time) 

    Issue1248: Persistent Email Reminders (#1354)

### Feature or Bugfix
- Feature


### Detail
- When a share request is initiated and remains pending for an extended
period, dataset producers will receive automated email reminders at
predefined intervals. These reminders will prompt producers to either
approve or extend the share request, thereby preventing delays in
accessing datasets.

Attaching screenshots for emails:

<img width="1336" alt="Screenshot 2024-06-20 at 5 34 31 PM"
src="https://github.com/data-dot-all/dataall/assets/26413731/d7be28c3-5c98-4146-92b1-295e136137a3">

<img width="1322" alt="Screenshot 2024-06-20 at 5 34 52 PM"
src="https://github.com/data-dot-all/dataall/assets/26413731/047556e8-59ee-4ebf-b8a7-c0a6684e2a63">


- Emails are sent every Monday at 9am UTC. The schedule can be changed
via the cron expression in container.py

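The Monday-9am-UTC schedule above corresponds to the standard cron expression `0 9 * * 1` (the exact expression and scheduler syntax in container.py may differ). A stdlib sketch of computing the next such run, purely for illustration:

```python
# Illustrative helper (not data.all code): next run for a cron of
# "0 9 * * 1", i.e. every Monday at 09:00 UTC.
from datetime import datetime, timedelta, timezone


def next_monday_9am_utc(now: datetime) -> datetime:
    candidate = now.replace(hour=9, minute=0, second=0, microsecond=0)
    # Monday is weekday() == 0
    days_ahead = (0 - candidate.weekday()) % 7
    candidate += timedelta(days=days_ahead)
    if candidate <= now:  # already past this week's slot
        candidate += timedelta(days=7)
    return candidate


now = datetime(2024, 6, 26, 12, 0, tzinfo=timezone.utc)  # a Wednesday
nxt = next_monday_9am_utc(now)
assert nxt.weekday() == 0 and nxt.hour == 9  # lands on a Monday, 09:00
```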
### Relates
- #1248


---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Anushka Singh <[email protected]>
Co-authored-by: trajopadhye <[email protected]>
Co-authored-by: Mohit Arora <[email protected]>
Co-authored-by: rbernota <[email protected]>
Co-authored-by: Rick Bernotas <[email protected]>
Co-authored-by: Raj Chopde <[email protected]>
Co-authored-by: Noah Paige <[email protected]>
Co-authored-by: dlpzx <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: jaidisido <[email protected]>
Co-authored-by: dlpzx <[email protected]>
Co-authored-by: mourya-33 <[email protected]>
Co-authored-by: nikpodsh <[email protected]>
Co-authored-by: MK <[email protected]>
Co-authored-by: Manjula <[email protected]>
Co-authored-by: Zilvinas Saltys <[email protected]>
Co-authored-by: Zilvinas Saltys <[email protected]>
Co-authored-by: Daniel Lorch <[email protected]>
Co-authored-by: Tejas Rajopadhye <[email protected]>
Co-authored-by: Zilvinas Saltys <[email protected]>
Co-authored-by: Sofia Sazonova <[email protected]>
Co-authored-by: Sofia Sazonova <[email protected]>

commit e477bdf 
Author: Noah Paige <[email protected]> 
Date: Wed Jun 26 2024 10:39:09 GMT-0400 (Eastern Daylight Time) 

    Enforce non null on GQL query string if non null defined (#1362)

### Feature or Bugfix
- Bugfix


### Detail
- Add `String!` to enforce a non-null input argument on the frontend
when it is defined as non-null in the backend GQL operation
`listS3DatasetsSharedWithEnvGroup`

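The mismatch is that the backend schema declares the argument as non-null (`String!`) while the frontend query string declared it as nullable (`String`), so a null could slip through client-side validation. A schematic sketch (the query shape and helper below are illustrative, not the actual frontend code):

```python
# Schematic frontend GQL query string before and after the fix
# (illustrative variable names, not the real query text).
import re

BEFORE = """
query listS3DatasetsSharedWithEnvGroup($environmentUri: String, $groupUri: String) {
  listS3DatasetsSharedWithEnvGroup(environmentUri: $environmentUri, groupUri: $groupUri) {
    datasetUri
  }
}
"""

# The fix adds '!' so the client enforces non-null, matching the backend.
AFTER = BEFORE.replace("String,", "String!,").replace("String)", "String!)")


def declares_non_null(query: str, var: str) -> bool:
    """Illustrative check that a variable is declared with a trailing '!'."""
    m = re.search(rf"\${var}:\s*(\w+!?)", query)
    return bool(m) and m.group(1).endswith("!")


assert not declares_non_null(BEFORE, "environmentUri")
assert declares_non_null(AFTER, "environmentUri")
```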


commit d6b59b3 
Author: Noah Paige <[email protected]> 
Date: Wed Jun 26 2024 08:48:52 GMT-0400 (Eastern Daylight Time) 

    Fix Init Share Base (#1360)

### Feature or Bugfix
- Bugfix

### Detail
- Need to register processors in init for s3 dataset shares API module



commit bd3698c 
Author: Petros Kalos <[email protected]> 
Date: Wed Jun 26 2024 05:19:14 GMT-0400 (Eastern Daylight Time) 

    split cognito urls setup and cognito user creation (#1366)

### Feature or Bugfix
- Bugfix
### Detail
For more details about the issue, read #1353.
In this PR we solve the problem by splitting the Cognito configuration
in two:
* First part (cognito_users_config.py) is setting up the required groups
and users and runs after UserPool deployment
* Second part (cognito_urls_config.py) is setting up Cognito's
callback/logout urls and runs after the CloudFront deployment

We chose to split the functionality because we need the users/groups set
up for the integration tests, which run after the backend deployment.

The alternative would be to keep the configuration as a single step but
run the integration tests after the CloudFront stage.

### Relates
- Solves #1353 

dlpzx added a commit that referenced this issue Sep 3, 2024
…ift guardrails (#1484)

### Feature or Bugfix
- Feature

### Detail
Complete design in #955.
This particular PR is focused on adding validation checks when a share
request is created
- Move IAM role checks from the generic sharing service to the specific
share processors
- Add interface to execute checks on approve, submit and revoke API
calls
- Moved S3 checks to new S3Validator
- Implemented Redshift checks in RedshiftValidator
- Added tests for Redshift validator

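The check interface described above might look roughly like this; the class and method names are illustrative, inferred from the PR description (per-processor validators such as RedshiftValidator, with hooks on create/submit/approve):

```python
# Illustrative sketch of per-processor share validators (hypothetical
# API, not the actual data.all interface).
from abc import ABC, abstractmethod


class SharesValidator(ABC):
    """Each share processor plugs in its own lifecycle checks."""

    @abstractmethod
    def validate_share_object_create(self, share: dict) -> None: ...

    @abstractmethod
    def validate_share_object_submit(self, share: dict) -> None: ...

    @abstractmethod
    def validate_share_object_approve(self, share: dict) -> None: ...


class RedshiftValidator(SharesValidator):
    def validate_share_object_create(self, share: dict) -> None:
        # e.g. a Redshift share request must name a Redshift role
        if not share.get("redshift_role"):
            raise ValueError("Redshift role is required for a Redshift share")

    def validate_share_object_submit(self, share: dict) -> None:
        self.validate_share_object_create(share)

    def validate_share_object_approve(self, share: dict) -> None:
        self.validate_share_object_create(share)


v = RedshiftValidator()
v.validate_share_object_create({"redshift_role": "analytics_reader"})  # passes
```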
### Relates
- #955 

dlpzx added a commit that referenced this issue Sep 4, 2024
…atasets (check_on_delete, list_shared_datasets...) (#1511)

### Feature or Bugfix
- Feature

### Detail
Complete design in #955.
This particular PR is focused on adding missing share-related
functionality inside the redshift_datasets module.

For example, when we delete a Redshift dataset we first want to check
whether there are any share requests for that dataset. To avoid circular
dependencies, an interface is used in the same way it was implemented
for S3.

In this PR:
- Add a `RedshiftShareDatasetService(DatasetServiceInterface)` class and
implement the required abstract methods (check_on_delete,
resolve_user_shared_datasets, ...)
- Use this class in the redshift_datasets module in resolvers, on
dataset deletion, etc.
- Some of the code was very similar to the db queries implemented for S3
datasets, so in this PR some of those queries are moved to the generic
ShareObjectRepository to be reused by both dataset types

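The check_on_delete behaviour described above can be sketched as a guard that blocks dataset deletion while share requests exist (names hypothetical; the real code queries the share repository through the interface):

```python
# Illustrative check_on_delete guard (hypothetical names, not the
# actual RedshiftShareDatasetService implementation).

def check_on_delete(dataset_uri: str, existing_shares: list) -> None:
    """Raise if any share request still references the dataset."""
    blocking = [s for s in existing_shares if s["dataset_uri"] == dataset_uri]
    if blocking:
        raise RuntimeError(
            f"Dataset {dataset_uri} has {len(blocking)} share request(s); "
            "revoke or delete them before deleting the dataset"
        )


shares = [{"dataset_uri": "rs-1", "status": "Approved"}]
check_on_delete("rs-2", shares)  # no shares reference rs-2 -> allowed
```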
### Relates
- #955 

@dlpzx

dlpzx commented Sep 5, 2024

Closing this issue, remaining tasks will be tracked in the corresponding documentation pull requests and follow-up github issues

@dlpzx dlpzx closed this as completed Sep 5, 2024
dlpzx added a commit that referenced this issue Sep 10, 2024
…tasets (#1512)

### Feature or Bugfix
Documentation

### Detail
Added userguide documentation for #955 
- Redshift Connections
- Redshift Dataset import and table management
- Changes in S3 Datasets to clearly differentiate both types

### Relates
- #955 

dlpzx added a commit that referenced this issue Sep 10, 2024
### Feature or Bugfix
- Documentation

### Detail
Added userguide documentation for
#955
- List all shareable items with a short definition
- Add technical details for each type of shareable item (including
Redshift)
- Add data consumption section for Redshift

### Relates
- #955 

dlpzx added a commit that referenced this issue Sep 23, 2024
…nections for import Redshift Dataset (#1565)

### Feature or Bugfix
- Feature: enhancement

### Detail
This feature is an enhancement suggested by Redshift experts on #955,
which is well explained in #1562.
This PR:
- adds more info and tooltips that explain details about Redshift
Connections on the UI
- restricts the type of connection that can be used to import a dataset:
**only DATA_USER connections can be used to import datasets**. This
logic is implemented in both the frontend and the backend

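A minimal sketch of the backend side of that restriction (the field names mirror the PR description but are otherwise hypothetical):

```python
# Illustrative backend guard: only DATA_USER connections may be used to
# import Redshift datasets (hypothetical names, not actual data.all code).

def validate_connection_for_import(connection: dict) -> None:
    if connection.get("connectionType") != "DATA_USER":
        raise PermissionError(
            "Only DATA_USER connections can be used to import Redshift datasets"
        )


validate_connection_for_import({"connectionType": "DATA_USER"})  # allowed
```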
FIRST VERSION:
<img width="1126" alt="image"
src="https://github.com/user-attachments/assets/14bb5a85-9868-4e8d-b7aa-1c84feb2a681">

UPDATED:

![image](https://github.com/user-attachments/assets/1b199dba-d6ee-471f-9cd7-d74e70b8dd4b)


### Relates
#1562 

dlpzx added a commit that referenced this issue Sep 27, 2024
### Feature or Bugfix
- Bugfix

### Detail
We were validating whether the Redshift role for a Redshift share
request existed in the dataset account, while we should validate that it
exists in the target account (share.environmentUri).

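The essence of the fix can be sketched as picking the account from the share's target environment rather than from the dataset's source environment (data shapes hypothetical):

```python
# Illustrative sketch of the fix: the Redshift role must exist where the
# consumer lives, i.e. in the account of share["environmentUri"], not in
# the dataset's source account (hypothetical data shapes).

def account_for_role_validation(share: dict, environments: dict) -> str:
    return environments[share["environmentUri"]]["awsAccountId"]


envs = {
    "env-source": {"awsAccountId": "111111111111"},
    "env-target": {"awsAccountId": "222222222222"},
}
share = {"environmentUri": "env-target", "datasetUri": "ds-1"}
assert account_for_role_validation(share, envs) == "222222222222"
```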
### Relates
- #955 

dlpzx added a commit that referenced this issue Oct 18, 2024
### Feature or Bugfix
- Bugfix

### Detail
The share verify task for Redshift shares was returning a `list index
out of range` error when verifying the health of a share whose datashare
had been deauthorized at the source.
Tested in AWS:

![Screenshot 2024-10-17 at 14 44
31](https://github.com/user-attachments/assets/fa008a2a-4b99-46eb-bb6d-635d518159a3)

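The defensive pattern behind the fix can be sketched as checking the lookup result before indexing, and reporting the share as unhealthy instead of crashing (helper name and status strings hypothetical):

```python
# Illustrative defensive check (hypothetical helper, not the actual
# verify task): report unhealthy instead of indexing an empty list.

def verify_datashare_status(datashares: list) -> str:
    if not datashares:  # datashare was deauthorized/removed at the source
        return "Unhealthy: datashare not found in source cluster"
    return f"Healthy: status {datashares[0]['status']}"


assert verify_datashare_status([]).startswith("Unhealthy")
assert verify_datashare_status([{"status": "AUTHORIZED"}]).startswith("Healthy")
```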
### Relates
- #955 

dlpzx added a commit that referenced this issue Oct 25, 2024
### Feature or Bugfix
- Bugfix

### Detail
Redshift sharing is implemented for read-only datashares. Write
datashares are in preview in several regions, but in regions where the
preview is not available, using the AllowWrites parameter in the
authorize_data_share API call results in an error of the type `An error
occurred (InvalidParameterValue) when calling the AuthorizeDataShare
operation: DATA_SHARING_WRITES support is not yet available.`

This PR removes the usage of that parameter, which was in any case set
to the default value (allowWrites=False)

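The shape of the change can be sketched as building the API kwargs without `AllowWrites`; the parameter names match the Redshift `AuthorizeDataShare` API, but the helper itself and the placeholder ARN are illustrative:

```python
# Illustrative sketch (not actual data.all code): build the
# authorize_data_share kwargs without AllowWrites so regions lacking
# write-datashare support accept the call. The real code would pass
# these to a boto3 Redshift client: client.authorize_data_share(**kwargs)

def authorize_kwargs(datashare_arn: str, consumer: str) -> dict:
    kwargs = {"DataShareArn": datashare_arn, "ConsumerIdentifier": consumer}
    # Previously: kwargs["AllowWrites"] = False -> InvalidParameterValue
    # in regions where DATA_SHARING_WRITES is not yet available.
    # Omitting the parameter keeps the same read-only default behaviour.
    return kwargs


kw = authorize_kwargs("arn:aws:redshift:region:acct:datashare/example", "123456789012")
assert "AllowWrites" not in kw
```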
### Relates
- #955 
