
Redshift Data Sharing #955

Closed
anmolsgandhi opened this issue Jan 9, 2024 · 6 comments

@anmolsgandhi
Contributor

Description:

Enable seamless data integration with Redshift as a new data source in ‘data.all’. This feature enhances collaboration by allowing users to easily publish, discover, and share Redshift data within the data.all platform. Users can securely configure a Redshift instance, streamlining the process of making Redshift datasets accessible.

Details:

Adding Redshift Instance and Publishing Tables

  • Users initiate the process by selecting “Create Dataset” and choosing Redshift from the dropdown menu.
  • The interface guides users through a secure credential input, ensuring a streamlined and secure configuration process.
  • Once configured, the dataset owners can select specific tables to publish to the ‘data.all’ catalog, ensuring a controlled inclusion of Redshift data.

Tables Available for Discovery

  • Cataloged Redshift tables automatically become part of the ‘data.all’ catalog, visible to users exploring datasets within the platform.
  • The catalog provides detailed metadata for each table, facilitating a comprehensive understanding of available data.
  • Users can navigate the ‘data.all’ UI to effortlessly discover and explore Redshift tables.
  • Dataset owners can edit metadata for each table, such as description and tags.

Self-service Share Process for Redshift Data Sharing

  • Consumers interested in specific Redshift tables initiate the share process by selecting the desired dataset.
  • Owners of the shared Redshift tables within data.all Datasets receive access requests, with an easy-to-use interface for managing permissions and approvals.
  • Upon approval, the shared Redshift data becomes dynamically accessible to consumers, maintaining a consistent and user-friendly experience.

Benefits:

  • Additional Data Source Integration: The added capability of Redshift as a new data source enhances flexibility, enabling users to integrate diverse data sources beyond S3, expanding the platform’s utility.
  • User-Friendly Configuration: A guided process ensures Redshift instances are connected with secure credentials.
  • Efficient Discovery: Automated cataloging promotes effortless exploration of Redshift tables within ‘data.all’ catalog.
  • Streamlined Sharing Workflow: The self-service share process maintains simplicity and consistency across different types of data, allowing users to request and access Redshift data seamlessly as they do with S3 data.

@dlpzx

@dlpzx
Contributor

dlpzx commented Mar 13, 2024

Design

This design is up to date with the latest implementation changes

Assumptions

  • Redshift clusters/namespaces are created and maintained by DevOps teams outside of data.all
  • Database admin teams manage users in their clusters/namespaces outside of data.all
  • Data producers and consumers can access their clusters/namespaces with the access provided by the database admin teams.
  • Data producers create tables in Redshift outside of data.all
  • Data.all requires a Redshift user of type IAM:user, or a database user with credentials stored in AWS Secrets Manager, for the data producers that are going to publish data (in the diagram this is the basis for Authorization 1). Data.all needs permissions to use the IAM role or to access the Secret. This user needs permissions to create datashares.
  • Data.all requires a Redshift user of type IAM:user for the data.all PivotRole in all accounts with a Redshift cluster. This user needs permissions to create datashares. In the diagram this is the basis for Authorizations 2 and 3.
  • data.all Share request principal will be REDSHIFT ROLE
  • Data Consumers register their Redshift roles as Redshift Consumption Roles. Database admins control which roles are created in Redshift and which roles are attached to which user/group. To isolate data.all access grants from other access grants, we recommend that database admins create dedicated Redshift roles. For example, for projectXYZ a group of Redshift users needs permissions to data in another cluster. The database admin should create a Redshift role DAProjectXYZ and attach it to the relevant users/groups in Redshift. Data consumers should register the role in data.all and request access to the data they need.
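The dedicated-role recommendation above can be sketched as the SQL a database admin would run outside of data.all. This is only an illustration of standard Redshift RBAC DDL; the role and user names (`da_project_xyz`, `analyst_1`, `analyst_2`) are hypothetical placeholders, not data.all conventions:

```python
# Sketch: SQL a database admin might run outside of data.all to create a
# dedicated Redshift role for data.all-managed grants and attach it to users.
# All identifiers are hypothetical placeholders.

def build_dedicated_role_sql(role_name: str, grantees: list[str]) -> list[str]:
    """Return DDL creating a dedicated role and granting it to each user."""
    statements = [f'CREATE ROLE "{role_name}";']
    statements += [f'GRANT ROLE "{role_name}" TO "{g}";' for g in grantees]
    return statements

if __name__ == "__main__":
    for stmt in build_dedicated_role_sql("da_project_xyz", ["analyst_1", "analyst_2"]):
        print(stmt)
```

Registering the role in data.all then only references this pre-existing role; data.all never creates or attaches Redshift roles itself.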

HLD and User experience

Initial design with Lake Formation ---> NOT USED ANYMORE

During implementation we realized that datashares with Lake Formation do not bring much value when they are not used to actually share further in Lake Formation (it just puts metadata in Glue). It might be useful in the future if we integrate with IAM Identity Center, as the Redshift-LF integration works much better with IAM IC, but for the moment we won't be using it. If in the future we want to revert the changes, the code is in commits bf476fc, ef9662b and 17970d7.

(diagram: initial Lake Formation design)

Initial data sharing design with data.all Redshift consumption roles ---> NOT USED ANYMORE

There are 3 reasons why this design has been further improved:

  • In this design we assume that users take the necessary actions outside of data.all on the pivot role so that it can process datashares in the source and in the target clusters. It connects to the cluster in a different way from how we were doing it for dataset publishing, which adds more code, more IAM policies, and more Redshift features that we use without really needing them.

  • In addition, we are creating a data.all abstraction, Redshift consumption roles, which is yet another layer of complexity for users to interact with.

  • Finally, data.all does not ensure that the user has taken the necessary preliminary actions before opening a share request. There is no visibility into whether the namespace used for the share request can be accessed by data.all, which can lead to errors in the sharing (when the actual error is in the onboarding of the Redshift cluster). We should separate cluster onboarding steps from sharing steps as much as possible.

(diagram: Redshift-data-all-without-warehouses-with-consumptionRoles_UPDATED.drawio)

Current design

We add more guardrails on the onboarding of clusters by requiring that a pivot role connection be created for each cluster used. This is a prerequisite for creating other types of connections and for opening share requests.

(diagram: Redshift-data-all-without-warehouses-with-consumptionRoles_UPDATED_2.drawio)

Following the numbering above:

  1. Outside of data.all, Database Admin Teams manage Redshift cluster users.
    1. For data producers - They create a regular Redshift user and optionally (mandatory for Redshift serverless) store the credentials in Secrets Manager, ⚠️ [NOT IMPLEMENTED YET] or they create a Redshift user of the type (IAM:user) that allows IAM federation
    2. For data consumers - They create Redshift roles and attach them to users
  2. Outside of data.all, Database Admin Teams in the data producer and in the data consumer clusters create a user in Redshift for the data.all IAM pivot role and optionally (mandatory for Redshift serverless) store the credentials in Secrets Manager, ⚠️ [NOT IMPLEMENTED YET] or they create a Redshift user of the type (IAM:user) that allows IAM federation
  3. Outside of data.all, Data producers work in Redshift and create tables
  4. In data.all UI, Data producers create a data.all pivot role Connection. Without a valid pivot role connection, no other connection can be created!
  5. In data.all UI, Data producers create a data.all Connection.
    1. When creating a connection, users need to introduce:
      1. The IAM role (for IAM:user Redshift users) or the Secret ARN created by their db admins
      2. Environment where the cluster is
      3. Namespace/cluster id
      4. Database
      5. A data.all Team that owns the connection. Only members of the Team can use it. (similar to consumption IAM roles)
    2. Connections are going to be used to AUTHORIZE the import of data and maybe in next steps to open Redshift QueryEditorV2. There are different types of Redshift users:
      1. ⚠️ [NOT IMPLEMENTED YET] Federated users (the IAM role is stored). The role created has permissions to be used as federated user in Redshift by data.all.
      2. [IMPLEMENTED] AWS Secrets Manager (the secretArn is stored). Customers will need to tag the secret in order for data.all to be able to access it.
      3. NEXT STEPS - IAM Identity Center - it cannot be used at the moment for the publication of data.
      4. NEVER - username and password. In data.all we want to avoid handling passwords in transit.
  6. In data.all UI, Data producers import a Redshift dataset in data.all specifying:
    1. Select the Environment and the Connection to use for import
    2. The Team that owns the Connection also will own the Dataset
    3. Redshift schema and selection of tables to be imported from that schema
  7. Under-the-hood, when a dataset is imported, using the authorization of the Connection the metadata for the imported schema and tables is stored. The dataset and tables are indexed in the data catalog.
  8. In data.all UI, Data producers can fetch the schema of the imported tables in the dataset's Data tab, as we do with S3/Glue datasets. Tables appear in data.all. Users can ListDatasets, which lists both S3 and Redshift datasets.
  9. Under-the-hood, when the data producer opens the schema of a table, data.all uses the Redshift Data API to read the table details from Redshift.
  10. In data.all UI, data consumers can discover RS tables and datasets in Catalog
  11. In data.all UI, Database consumers create a data.all pivot role Connection for the target cluster. Without a pivot role connection that is valid, the share request cannot be created.
  12. In data.all UI, data consumers can create a share request by selecting the dataset or tables. They submit the request
    1. In the share request they select the target environment and target group
    2. A dropdown lists the namespaces with pivot role connections in the environment.
    3. Data.all checks that the target group has permissions to use Redshift in the environment
    4. Optionally, users manually input the Redshift role that is the recipient of the request. If the Redshift role is specified, the share request is granted to the role; if not, to the namespace.
  13. In data.all UI, data producers approve the request
  14. Under-the-hood, data.all creates a datashare in the data producer's cluster/namespace
  15. Under-the-hood, data.all associates the datashare to the data consumer's cluster and grants permissions to the Redshift role (if specified)
  16. Data consumers will access the data through:
  • BI tools: Quicksight, Tableau, Power BI, Qlik (JDBC/ODBC connections)
  • SQL clients: DBeaver, SQL Workbench (JDBC/ODBC connections)
  • ETL workloads in Redshift
  • Ad-hoc queries in Redshift Query Editor
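Steps 14–15 above (producer-side datashare creation, consumer-side association and grants) can be sketched as the SQL sequences data.all would issue through the Redshift Data API. This is an illustration of standard Redshift datashare DDL, not the actual data.all implementation; all identifiers are placeholders:

```python
# Sketch of the datashare DDL behind share approval (steps 14-15).
# Identifiers are placeholders; in data.all these statements would run via
# the Redshift Data API using the pivot role connections.

def producer_side_sql(datashare, schema, tables, consumer_namespace):
    """Statements run in the data producer's cluster (step 14)."""
    stmts = [
        f"CREATE DATASHARE {datashare};",
        f"ALTER DATASHARE {datashare} ADD SCHEMA {schema};",
    ]
    stmts += [f"ALTER DATASHARE {datashare} ADD TABLE {schema}.{t};" for t in tables]
    stmts.append(f"GRANT USAGE ON DATASHARE {datashare} TO NAMESPACE '{consumer_namespace}';")
    return stmts

def consumer_side_sql(datashare, producer_namespace, local_db, ext_schema, schema, redshift_role=None):
    """Statements run in the data consumer's cluster (step 15)."""
    stmts = [
        f"CREATE DATABASE {local_db} FROM DATASHARE {datashare} "
        f"OF NAMESPACE '{producer_namespace}' WITH PERMISSIONS;",
        f"CREATE EXTERNAL SCHEMA {ext_schema} FROM REDSHIFT DATABASE '{local_db}' SCHEMA '{schema}';",
    ]
    if redshift_role:  # grant to the role only if one was specified in the request
        stmts.append(f"GRANT USAGE ON DATABASE {local_db} TO ROLE {redshift_role};")
        stmts.append(f"GRANT USAGE ON SCHEMA {ext_schema} TO ROLE {redshift_role};")
    return stmts
```

If no Redshift role is specified in the request, the grant stops at the namespace level, matching step 12.4 above.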

User experience

Redshift connection

Create/Delete and list
During creation we check that the connection is valid by listing the databases in Redshift and making sure the selected database is part of the cluster/workgroup.
Serverless clusters do not accept a db user for federation (see example API call). At the moment db users are disabled for serverless; in the future we can consider assuming an IAM role and then doing federation.

If any parameter in the connection form is invalid, an error is thrown. The same happens if the Team does not have permissions to create a connection in the environment or lacks tenant permissions for Redshift.
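The database check described above can be sketched as a small validation step: given the databases listed from the cluster/workgroup, the selected database must be among them. The exception name below is illustrative, not data.all's actual error type:

```python
# Sketch of the connection validation: the selected database must be among
# those listed in the cluster/workgroup. Exception name is illustrative.

class ConnectionValidationError(Exception):
    pass

def validate_connection_database(selected_database: str, cluster_databases: list[str]) -> None:
    """Raise if the selected database is not part of the cluster/workgroup."""
    if selected_database not in cluster_databases:
        raise ConnectionValidationError(
            f"Database '{selected_database}' not found among cluster databases {cluster_databases}"
        )

# Passes silently when the database exists, raises otherwise.
validate_connection_database("dev", ["dev", "sample_data_dev"])
```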

Screen.Recording.2024-07-25.at.08.59.54.mov

Screenshot 2024-07-15 at 12 37 23

Redshift dataset

Import Form

Screen.Recording.2024-07-25.at.09.17.29.mov

List Datasets view
With icons for S3 and Redshift
(screenshot)

**List Datasets in Environment**

Screen.Recording.2024-07-25.at.10.43.36.mov

Dataset view
Screenshot 2024-07-17 at 14 33 24

Dataset edit form, Tables tab and schema modal
https://github.com/user-attachments/assets/89c69669-59bf-4ff0-a889-523581d87d25

Table view, columns tab and Table edit form
https://github.com/user-attachments/assets/1a8437d0-4b51-4330-833a-e3c0b495e591

Delete Table, Dataset

Screen.Recording.2024-07-25.at.11.06.02.mov

They get deleted and removed from the catalog

Screenshot 2024-07-25 at 10 21 31

**Catalog indexing**

Screen.Recording.2024-07-25.at.10.06.50.mov

**Feed, Votes**

Screen.Recording.2024-07-25.at.10.15.38.mov

Glossary

Screenshot 2024-07-25 at 10 16 29

Redshift permissions controls
In the admin settings and in the environment team invitation form we can define redshift permissions applied to teams

Screen.Recording.2024-07-25.at.08.57.10.mov

Permissions

IAM permissions

IAM permissions are granted solely to the pivot role. List and describe permissions are granted on all resources where needed, while write operations on Redshift workgroups, namespaces and clusters are restricted to the resources that have been onboarded to data.all in the form of Connections. Every time a connection is added to the environment, the pivot role gets updated (the environment stack gets redeployed).
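The read/write split described above might look like the following policy sketch. The action names and the ARN are illustrative assumptions, not the exact policy that data.all generates for the pivot role:

```python
# Illustrative sketch of the pivot role policy split: list/describe allowed
# on all resources, write-style actions restricted to onboarded resources.
# Action lists and ARNs are assumptions, not data.all's generated policy.

def pivot_role_redshift_statements(onboarded_arns: list[str]) -> list[dict]:
    """Return IAM policy statements scoping writes to onboarded Connections."""
    return [
        {
            "Sid": "RedshiftRead",
            "Effect": "Allow",
            "Action": ["redshift:Describe*", "redshift-serverless:List*"],
            "Resource": ["*"],
        },
        {
            "Sid": "RedshiftWrite",
            "Effect": "Allow",
            "Action": ["redshift-data:ExecuteStatement"],
            "Resource": onboarded_arns,  # only clusters/workgroups with Connections
        },
    ]
```

Because the write statement references the onboarded ARNs, adding a Connection requires regenerating the policy, which is why the environment stack is redeployed.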

data.all application permissions

Here we are referring to permissions guarding API calls. These are not IAM permissions but data.all specific permissions that can be of the type: tenant-level, environment-level, or group-level permissions. For more info, check the Permission model section in the docs.

To avoid complex permission-backfilling migrations, or the risk of overly permissive migrations, I am considering dataset sharing and future extensions when deciding which permissions to include.

Redshift Connection permissions

All API-facing methods of RedshiftConnectionService are protected by the permission decorators.

  • Tenant permissions
    • MANAGE_REDSHIFT_CONNECTION - 👀 ⚠️ Initially this permission was not defined, on the assumption that connections were controlled as part of MANAGE_REDSHIFT_DATASETS. However, actions on Redshift connections are sensitive enough to deserve their own restriction by the data.all admin, so it was added back. This permission is applied to create/delete connections. Users without it can still import Redshift datasets using connections (get/list operations). Second warning ⚠️ At the moment this permission does little, since a connection has an admin group that creates and then uses it; but if in the future we want to share a connection, it will be useful.
  • Environment permissions - granted when inviting a team to an environment
    • CREATE_REDSHIFT_CONNECTION - to limit which groups in an environment are allowed to create connections in the environment. Applied to create_redshift_connection
    • LIST_ENVIRONMENT_REDSHIFT_CONNECTIONS - to prevent users outside of an Environment from fetching another environment's connections.
  • Group permissions
    • GET_REDSHIFT_CONNECTION - to prevent unauthorized users (not belonging to the connection owner team) from getting the details of the connection. Applied to multiple operations that get info from the connection and granted to the dataset admin team. 👀 In the future extensible to non-admin groups that could use the connection without being its admins.
    • DELETE_REDSHIFT_CONNECTION - to prevent unauthorized users from deleting a connection. Applied to delete_redshift_connection and granted ONLY to the connection admin team.

Connections are not editable at the moment, so there are no permissions to UPDATE_CONNECTIONS.
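The permission decorators mentioned above can be sketched as follows. This is a simplified illustration: the real data.all decorators resolve the caller's groups and resource permissions from the database, whereas here the permission map is passed in explicitly:

```python
import functools

# Simplified sketch of a permission decorator guarding a service method.
# Real data.all decorators resolve permissions from the database; here the
# permission map is passed in explicitly for illustration.

class UnauthorizedOperation(Exception):
    pass

def has_resource_permission(permission_name: str):
    """Check the caller holds `permission_name` before running the method."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(username, user_permissions, *args, **kwargs):
            if permission_name not in user_permissions.get(username, set()):
                raise UnauthorizedOperation(f"User {username} is missing {permission_name}")
            return func(username, user_permissions, *args, **kwargs)
        return wrapper
    return decorator

@has_resource_permission("DELETE_REDSHIFT_CONNECTION")
def delete_redshift_connection(username, user_permissions, connection_uri):
    # Placeholder body; the real service deletes the connection record.
    return f"deleted {connection_uri}"

perms = {"alice": {"DELETE_REDSHIFT_CONNECTION"}}
print(delete_redshift_connection("alice", perms, "conn-1"))  # deleted conn-1
```

A call by a user without the permission raises UnauthorizedOperation before the service body runs, which is the behavior the tenant/environment/group permissions above rely on.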

Redshift Dataset permissions

  • Tenant permissions
    • MANAGE_REDSHIFT_DATASETS to limit at the application level which teams can work with Redshift datasets. Applied to all methods of RedshiftDatasetService. If the tenant says no, then it is a no.
  • Environment permissions - granted when inviting a team to an environment
    • IMPORT_REDSHIFT_DATASET to limit which groups in an environment are allowed to import a redshift dataset in the environment. Applied to import_redshift_dataset
  • Group permissions
    • UPDATE_REDSHIFT_DATASET and DELETE_REDSHIFT_DATASET - to prevent unauthorized users from updating/deleting a dataset. Applied to update_redshift_dataset and delete_redshift_dataset and granted ONLY to the dataset admin team.
    • ADD_TABLES_REDSHIFT_DATASET - to limit the users that can add tables to a dataset. ⚠️ it could be considered as part of update_dataset, but better to be specific as each is a different action in nature.
    • GET_REDSHIFT_DATASET - limits get dataset details. Applied to any method that fetches data for the Dataset
    • GET_REDSHIFT_DATASET_TABLE - limits get table details. Applied to any method that fetches data for the table. Needed when we share redshift tables
    • DELETE_REDSHIFT_DATASET_TABLE - to prevent unauthorized users from deleting a table. Applied to delete_redshift_table and granted ONLY to the dataset admin team (the ones that added the table)
    • UPDATE_REDSHIFT_DATASET_TABLE - to prevent unauthorized users from updating a table. Applied to update_redshift_table and granted ONLY to the dataset admin team (the ones that added the table)

Sharing with Redshift

We will share Redshift tables. We could have decided to implement full dataset sharing, but sharing with more granularity is more aligned with least-privilege principles.

Datashares only work for encrypted clusters. Therefore we should add guardrails preventing shares for non-encrypted clusters, or directly disable the onboarding of clusters that are not encrypted. The connection should include the encryption type of the cluster.
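The guardrail suggested above can be sketched as a simple check on the cluster properties before any datashare operation (or at connection onboarding). The `Encrypted` key follows the shape of the DescribeClusters response; treating a missing key as unencrypted is an assumption of this sketch:

```python
# Sketch of the encryption guardrail: refuse datashare operations (or
# connection onboarding) for non-encrypted clusters. The "Encrypted" key
# follows the DescribeClusters response shape; missing key is treated as
# unencrypted, which is an assumption of this sketch.

def assert_datashare_supported(cluster_properties: dict) -> None:
    if not cluster_properties.get("Encrypted", False):
        raise ValueError("Datashares require an encrypted Redshift cluster")

assert_datashare_supported({"Encrypted": True})  # passes silently
```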

Alternative 1: datashare per share request

When a share request is approved:
1) Create datashare (in source account)
2) Add schema to the datashare (in source account)
3) Add share requested tables to the datashare (in source account)
4) Grant access to the consumer cluster to the datashare (in source account)
5) Create local database from datashare (in target account) - WITH PERMISSIONS OPTIONAL
6) Create external schema in local database (in target account)
7) Grant usage access to the redshift role to the local database and schema (in target account)
When revoking tables:
1) Remove table from datashare
2) if no more tables in share request -> clean-up: delete external schema, local db, revoke access to datashare (if needed) and delete datashare

Alternative 2: datashare per dataset

When a share request is approved:
1) Create datashare (in source account) if it does not exist already
2) Add schema to the datashare (in source account) if not done already
3) Add tables to the datashare (in source account) if not already added
4) Grant access to the consumer cluster to the datashare (in source account) if not done already
5) Create local database from datashare (in target account) if not done already - WITH PERMISSIONS is needed
6) Create external schema in local database (in target account) if not done already
7) Grant granular usage access to the redshift role to the local database schema and share requested tables (in target account) ALWAYS

When revoking tables:
1) Revoke permissions (revert step 7)
2) if table not shared in any share request - clean-up table: remove from datashare
3) if no more tables in datashare - clean-up datashare: delete external schema, local db, revoke access to datashare (if needed) and delete datashare

Alternative 3: datashare per dataset-requester namespace

Same steps as alternative 2 but in this case we create a different datashare for each target namespace.
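Alternative 3 hinges on deriving a distinct datashare per (dataset, target namespace) pair, so grants to one consumer namespace never expose tables shared with another. A naming sketch (the convention below is hypothetical, not the actual data.all implementation):

```python
# Sketch of Alternative 3: one datashare per (dataset, target namespace)
# pair. The naming convention is hypothetical, not data.all's actual one.

def datashare_name(dataset_uri: str, target_namespace_id: str) -> str:
    """Derive a deterministic, Redshift-safe datashare name per pair."""
    safe_ns = target_namespace_id.replace("-", "_")
    return f"dataall_{dataset_uri}_{safe_ns}"

# Two target namespaces for the same dataset get two distinct datashares.
print(datashare_name("dsx1", "ns-01"))
print(datashare_name("dsx1", "ns-02"))
```

Because the name is deterministic, the sharing code can check whether the datashare for a given pair already exists before running the creation steps of Alternative 2.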

Comparison

  • Simplicity of implementation: they are pretty similar. Alternative 1 has more steps but each share is isolated. Alternative 2 is very fast for additional shares but has a more complex revoke. Alternative 3 is slightly more difficult than 2.
  • User experience: this is the main difference ❗ End users will query from the external schema (e.g. SELECT * FROM "dev"."serv_db_public"."customer";). Having many external schemas involves complex names with IDs, which might not be straightforward to use. It can also be confusing for the database admins. So alternatives 2 and 3 are definitely more user friendly.
  • Security/Data Governance: with the WITH PERMISSIONS clause we can restrict access on the consumer side, so there should be no downside to sharing the same datashare across multiple end-consumers (Redshift roles) - I verified that we can grant permissions to a single table in the datashare and that the end-user does not have permissions to other tables in the datashare. They can list and describe them but cannot select them. Between alternative 2 and alternative 3, the latter offers more security. In the end, the db admins of multiple target namespaces would have access to all the items of the datashare, which might include permissions to tables that are not granted explicitly to a particular namespace.

----> decision: Alternative 3 offers the nicest, most secure experience for users

Limitations

All alternatives take into account Redshift service quotas. In principle the max number of dbs in a cluster is 60 (provisioned) and 100 (serverless), but this excludes databases created from datashares, so we are safe. As for datasharing limitations, we should add them in the docs: https://docs.aws.amazon.com/redshift/latest/dg/considerations.html.

As for sharing between a Redshift provisioned cluster and a serverless cluster, the documentation states that it is possible: https://docs.aws.amazon.com/redshift/latest/mgmt/serverless-datasharing.html

More constraints: You can create only one consumer database for one datashare on a consumer cluster. You can't create multiple consumer databases referring to the same datashare. --> accounted for in #1467

@dlpzx
Contributor

dlpzx commented Mar 18, 2024

Implementation plan


Pre-requisites

To implement the design I will open multiple pull requests (list might vary)


Redshift datasets

  • Done New Redshift Dataset module using Base datasets + publish to catalog logic. Introduce Redshift Connections (Add Redshift datasets module #1424)
    • Redshift Connections + checks
    • Redshift Dataset import
    • Redshift Tables view
    • Add and remove tables
    • Delete and edit Dataset
    • Add catalog indexer for Dataset and tables
    • Add logic for glossary, feed
    • Polish IAM permissions
    • Polish data.all permissions
    • Polish frontend views
    • Migrations and backfilling
    • Add unit testing for connections - check comments below
    • Add unit testing for datasets - 94% coverage (leaving out glossary and votes which should be tested in their own modules)

Redshift data sharing


Related tasks needed for release


Documentation (also needed for release)


Integration testing -----> tracked in #1510

Wait for #1409

  • redshift- datasets
  • redshift-datasets-shares

Redshift next steps ---> tracked in #1509

  • Add Connections of IAM Federation type - next steps!
  • Use getEnums API call to return clusterTypes with utils implemented in Feat: API call to query Enum values #1435
  • Extract more common dataset_base code from redshift datasets and s3 datasets
    • Common FE elements in import/create S3 dataset and import Redshift dataset
    • Common FE elements in edit datasets
    • Common resolvers (resolve_dataset_environment, resolve_dataset_owners_group, resolve_dataset_stewards_group)
    • Common updateDataset API call
    • Common ModifyRedshiftDatasetInput
  • Following the pattern set by @SofiaSazonova at Feat: API call to query Enum values #1435, I think we should start thinking about how to detangle the UI from config.json. Here we could have a query that returns all the enabled modules. Originally posted by @petrkalos in Add Redshift datasets module #1424 (comment)

NOT Redshift tasks out of scope

  • Move glossary, feed, indexer targets to enums in their respective modules
  • Rename S3 permission descriptions in the team invite permission toggle list to clearly specify they are S3/Glue datasets
  • create a unit-test directory and migrate the current tests to unit tests - check this commit. I started it but reverted the changes as it was getting too complex to be added in the initial PR
  • Generic search filter and input in input_types API calls
  • Common styled DataGrid component with cell borders for dark theme

@fourtyplustwo
Contributor

@dlpzx I've read through the design and watched your video as well (it was very helpful as it answered some of my questions).

Overall I don't see any big problems but I do have some concerns.

  1. Addition of a new UI "Warehouses" to manage Redshift connections. I find this UI a bit awkward. My first instinct is that this should be a TAB under an environment and not a separate UI outside an environment, especially because you cannot have a connection that is not part of an environment. I think this would also simplify creating connections, because then the environment is already pre-defined and you can also make the connection be owned by the same team that is creating it.

I would also want to make sure that there's a consistent user experience when registering consumer roles or redshift consumer connections. Even today I find it weird that we register consumer roles in "Teams" tab under environments. I don't think that's intuitive. Perhaps with the addition of redshift connections we can instead add a new tab on the environment "Consumer Connections" or smth similar where you can manage your consumer IAM roles and redshift consumer connections etc..

Also I don't really feel that this new type "Warehouses" is actually going to be reusable for anything else other than Redshift so I think it's misleading.

I would like to hear your arguments why you think it would be much better to put this as a new UI on the left main bar vs making it a new tab on the environment.

  2. For sure make Redshift modular so that it can be fully disabled as for example we don't use redshift at all and don't want our users to be confused.

  3. We need to check security. Absolutely make sure to scan all infrastructure with checkov and that the permissions are as tight as possible.

  4. I'd really like to see part 2 of your video to understand better how Redshift consumer connections should work.

Thank you!

@anushka-singh
Contributor

I really like how descriptive the design is. Answered most of my questions too!
I have a few pending though:

  1. Will a dataset be able to have s3, glue and redshift data? Will I be able to create such a dataset?
  2. Will the share UI be the same as the one being used today?
  3. Will all the other modules like QS, Sagemaker, Worksheets be available to use for Redshift too?
  4. Why are we calling it "Warehouses"? How is it any different from a data store like Glue or S3?
  5. Can you provide more information on how data consumers will interact with Redshift data using BI tools and SQL clients? Will consumers have to set up anything extra on their end to be able to use these tools?

@dlpzx
Contributor

dlpzx commented Apr 8, 2024

Thanks @zsaltys and @anushka-singh for the input, you went straight to the tricky points.

  • @zsaltys Regarding point 1, initially I placed it inside environments, but then I questioned whether we even needed to place a warehouse inside an environment - let's say you are using Snowflake and it is not linked to an AWS account. What we can do is place it inside environments, because I agree that the user experience is nicer that way. But then if we need to link other Warehouses with non-AWS links, we can work on creating non-AWS data.all Environments (something that opens the door to multi-cloud...). In short, happy to change it. 2 - absolutely, 3 - let's prioritize for 2.5, 4 - I have not recorded it yet, I have been focusing on Create generic dataset_base and s3_dataset modules from current datasets #1123 the last week. Please have a look
  • @anushka-singh thanks for the questions! I think you need to have a look at Create generic dataset_base and s3_dataset modules from current datasets #1123 for the questions 1 and 2. The idea is to have a generic Dataset model and specific Dataset classes that inherit this model. Instead of adding functionalities to the existing Dataset module, we have opted to make it extensible. For question 2 - yes, very similar, but we need to check the details
  • For question 3, we would need to check case-by-case what the integration is: for Quicksight, how the data sharing works; for SageMaker, if there is any library to connect with a Redshift user or with IAM:role federation, then they can access the data. Worksheets depends on the Athena connectors; in this last case we would need to see whether it is worth it or whether we can open the RS Query Editor
  • I called it Warehouses with the idea of making it abstract to other warehousing technologies (also outside AWS)
  • For 5, most probably. I will add more details

DESIGN UPDATED WITH THE FEEDBACK!

dlpzx added a commit that referenced this issue May 3, 2024
…NAL DELETE DATASETS_BASE (#1242)

### Feature or Bugfix
- Refactoring

### Detail
After all the previous PRs are merged, there should be no circular
dependencies between `datasets` and `datasets_sharing`. We can now
proceed to:
- move `datasets_base` models, repositories, permissions and enums to
`datasets`
- adjust the `__init__` files to establish the `datasets_sharing`
depends on `datasets`
- adjust the Module interfaces to ensure that all necessary dataset
models... are imported in the interface for sharing


Next steps:
- share_notifications parameter to dataset_sharing in config.json

### Relates
#955 

### Security
Please answer the questions below briefly where applicable, or write
`N/A`. Based on
[OWASP 10](https://owasp.org/Top10/en/).

- Does this PR introduce or modify any input fields or queries - this
includes
fetching data from storage outside the application (e.g. a database, an
S3 bucket)?
  - Is the input sanitized?
- What precautions are you taking before deserializing the data you
consume?
  - Is injection prevented by parametrizing queries?
  - Have you ensured no `eval` or similar functions are used?
- Does this PR introduce any functionality or component that requires
authorization?
- How have you ensured it respects the existing AuthN/AuthZ mechanisms?
  - Are you logging failed auth attempts?
- Are you using or adding any cryptographic features?
  - Do you use standard, proven implementations?
  - Are the used keys controlled by the customer? Where are they stored?
- Are you introducing any new policies/roles/users?
  - Have you used the least-privilege principle? How?


By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
dlpzx added a commit that referenced this issue May 7, 2024
…me datasets as s3_datasets) (#1250)

### Feature or Bugfix
- Refactoring

### Detail
- Rename `datasets` module to `s3_datasets` module

This PR is the first step to extract a generic datasets_base module that
implements the undifferentiated concepts of Dataset in data.all.
s3_datasets will use this base module to implement the specific
implementation for S3 datasets.

### Relates
- #1123 
- #955 

dlpzx added a commit that referenced this issue May 15, 2024
…te datasets_base and move enums) (#1257)

### Feature or Bugfix
⚠️ This PR should be merged after #1250. 
- Refactoring

### Detail
As explained in the design for #1123, we are trying to implement a
generic `datasets_base` module that can be reused by any dataset type.

This PR:
- Creates the skeleton of the `datasets_base` module consisting of 3
packages (`db`, `api`, `services`) and adds the `__init__` file.
- Adds the dependency of `s3_datasets` to `datasets_base` in the
`__init__` file of the `s3_datasets` module
- Moves datasets_enums to datasets_base

### Relates
- #1123 
- #955 


By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
dlpzx added a commit that referenced this issue May 17, 2024
…te DatasetBase db model and S3Dataset model) (#1258)

### Feature or Bugfix
⚠️ This PR should be merged after #1257.
- Feature
- Refactoring

### Detail
As explained in the design for #1123, we are trying to implement a
generic `datasets_base` module that can be reused by any dataset type.

**This PR does**:
- Adds a generic `DatasetBase` model in datasets_base.db that is used in
s3_datasets.db to build the `S3Dataset` model using joined table
inheritance in
[sqlalchemy](https://docs.sqlalchemy.org/en/20/orm/inheritance.html)
- Rename all usages of Dataset to S3Dataset (in the future some will
revert to DatasetBase, but for the moment we will keep them as
S3Dataset)
- Add a migration script that backfills the `datasets` table and renames
`s3_datasets` ---> ⚠️ The migration performs some "scary" operations on
the dataset table; if for any reason it encounters an issue, it could
result in catastrophic loss of information --> for this reason this
[PR](#1267) implements RDS
snapshots on migrations.

**This PR does not**:
- Feed registration stays as:
`FeedRegistry.register(FeedDefinition('Dataset', S3Dataset))` using
`Dataset` with the `S3Dataset` resource type. It is out of the scope of
this PR to migrate the Feed definition.
- Exactly the same for the GlossaryRegistry registration. We keep
`object_type='Dataset'` to avoid backwards compatibility issues.
- It does not change the resourceType for permissions. We keep using a
generic `Dataset` as target for S3 permissions. If we are to split
permissions into DatasetBase permissions and S3Dataset permissions, we
will do it in a different PR

#### Remarks
Inserting new S3Dataset items does not require any changes. SQLAlchemy
joined table inheritance automatically inserts a row into the parent
table and then another one into the child table, as explained in this
stackoverflow
[link](https://stackoverflow.com/questions/39926937/sqlalchemy-how-to-insert-a-joined-table-inherited-class-instance-when-the-pare)
(I was not able to find it in the official docs)
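The joined table inheritance setup described above can be sketched as follows; the models and columns are illustrative, not the actual data.all schema:

```python
from sqlalchemy import Column, ForeignKey, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class DatasetBase(Base):
    # Parent table with the generic attributes shared by all dataset types.
    __tablename__ = 'datasets'
    datasetUri = Column(String, primary_key=True)
    label = Column(String)
    datasetType = Column(String)
    __mapper_args__ = {
        'polymorphic_identity': 'DatasetBase',
        'polymorphic_on': datasetType,
    }


class S3Dataset(DatasetBase):
    # Child table with S3-specific attributes; its primary key is also a
    # foreign key to the parent table (joined table inheritance).
    __tablename__ = 's3_datasets'
    datasetUri = Column(String, ForeignKey('datasets.datasetUri'), primary_key=True)
    S3BucketName = Column(String)
    __mapper_args__ = {'polymorphic_identity': 'S3Dataset'}


engine = create_engine('sqlite://')
Base.metadata.create_all(engine)

with Session(engine) as session:
    # A single add() writes two rows: one in `datasets`, one in `s3_datasets`.
    session.add(S3Dataset(datasetUri='uri-1', label='sales', S3BucketName='sales-bucket'))
    session.commit()
    # Querying the parent class returns polymorphic child instances.
    ds = session.query(DatasetBase).one()
    kind, label, bucket = type(ds).__name__, ds.label, ds.S3BucketName

print(kind, label, bucket)
```

This is also why the migration backfill matters: existing `s3_datasets` rows need matching parent rows in `datasets` before the joined mapping can load them.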


### Relates
- #1123 
- #955 
- #1267


By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
dlpzx added a commit that referenced this issue May 21, 2024
…te DatasetBaseRepository and move DatasetLock) (#1276)

### Feature or Bugfix
⚠️ merge after #1258 
- Refactoring

### Detail
As explained in the design for #1123, we are trying to implement a
generic `datasets_base` module that can be reused by any dataset type.

In this small PR:
- we move the generic DatasetLock model to datasets_base
- move the DatasetLock db operations to the datasets_base
DatasetBaseRepository
- move activity to DatasetBaseRepository

### Relates
- #1123 
- #955 


By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
dlpzx added a commit that referenced this issue May 21, 2024
…e DatasetServiceInterface to datasets_base, add property, create first list API for datasets_base) (#1281)

### Feature or Bugfix
- Feature
- Refactoring

### Detail
As explained in the design for #1123, we are trying to implement a
generic `datasets_base` module that can be reused by any dataset type.

In this PR we:
- Move DatasetServiceInterface to datasets_base. This interface is used
by datasets_sharing to "inject" logic into s3_datasets
- add a dataset_type property to the DatasetServiceInterface to
distinguish which dataset type the interface applies to.
- create the first list API for datasets_base. 👀 This is the most
important part. With multiple dataset types, users will still list all
datasets together in several places in the UI (e.g. in listDatasets in
the DatasetList view, in listDatasetsEnvironment in the Environment
view). These API calls are not specific to s3_datasets but generic to
any dataset type; thus, they should be part of datasets_base. This PR
introduces the datasets_list_service and datasetListRepository and
includes only one example of an API that moves to datasets_base. In
subsequent PRs we will move the
rest of APIs
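A minimal sketch of the interface pattern described in the first two bullets; the method names and the sharing implementation are hypothetical, chosen only to show how the `dataset_type` property lets a service dispatch by dataset type:

```python
from abc import ABC, abstractmethod


class DatasetServiceInterface(ABC):
    """Hook interface that other modules implement to inject logic into dataset services."""

    @property
    @abstractmethod
    def dataset_type(self) -> str:
        """The dataset type this interface applies to (e.g. 'S3')."""

    @abstractmethod
    def check_before_delete(self, session, uri: str) -> bool:
        """Return False to veto deletion of the dataset."""


class SharingDatasetServiceInterface(DatasetServiceInterface):
    """Hypothetical sharing-module implementation for S3 datasets."""

    @property
    def dataset_type(self) -> str:
        return 'S3'

    def check_before_delete(self, session, uri: str) -> bool:
        # A real implementation would veto while active share objects exist.
        return True


# A dataset service can then filter registered interfaces by type:
interfaces = [SharingDatasetServiceInterface()]
applicable = [i for i in interfaces if i.dataset_type == 'S3']
print([type(i).__name__ for i in applicable])
```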

### Relates
- #1123 
- #955 


By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
dlpzx added a commit that referenced this issue May 21, 2024
…ve list queries to dataset_base or rename them) (#1282)

### Feature or Bugfix
- Refactoring

### Detail
As explained in the design for #1123, we are trying to implement a
generic `datasets_base` module that can be reused by any dataset type.

In this PR we:
- Restructure listDatasetsOwnedByEnvGroup as
listS3DatasetsOwnedByEnvGroup and move it into Worksheets in the FE. It
is moved to Worksheets because that is the only place where it is used
in the FE. One could argue that in the BE listS3DatasetsOwnedByEnvGroup
is part of the s3_datasets module. The way I see it, FE and BE are
independent and their modularization strategies fit the type of
programming; what makes sense in the FE might not in the BE. In the BE,
queries belong to the module whose services/models they act on, in this
case s3_datasets. In the FE, queries belong to the module where they are
used; if a query is used by more than one module, it can be placed in
the generic `services` directory. What is important is that we define
the dependencies. In this case we make Worksheets depend on S3_Datasets
(as we do in the index in `frontend/src/modules/Worksheets/index.js` and
in `backend/dataall/modules/worksheets/__init__.py`)
- Move listDatasetsCreatedInEnvironment to datasets_base

### Relates
- #1123 
- #955 


By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
dlpzx added a commit that referenced this issue May 22, 2024
…art 1 (renaming, enums and permissions) (#1284)

### Feature or Bugfix
- Feature
- Refactoring

### Detail
As explained in the designs for #1123 and #1283, we are trying to
implement generic `datasets_base` and `shares_base` modules that can be
reused by any dataset type and any type of shareable object.

In this PR:
- Rename `dataset_sharing` as `s3_dataset_shares`
- Create `shares_base` and introduce dependency (`s3_dataset_shares`
depends on `shares_base`)
- Move generic enums to shares_base
- Move generic permissions to shares_base


### Relates
- #1283 
- #1123 
- #955 


By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
@dlpzx dlpzx moved this to Roadmap in Data.all Roadmap Aug 12, 2024
dlpzx added a commit that referenced this issue Aug 13, 2024
### Feature or Bugfix
- Feature

### Detail
Complete design in #955.
This particular PR is focused on small FE enhancements to adapt the
share views to Redshift shares:

Add RedshiftTable as a type to display in the share view -> list items,
edit (add items), verify items
![Screenshot 2024-08-12 at 13 29
18](https://github.com/user-attachments/assets/0c48ca8f-5ce4-41c5-aca9-62928c4345d0)

Solve issue with redirect in the ShareView header (it redirected to
s3-datasets/dataset/uri)

Add principal resolver that resolves as principal the Redshift role
(also removed unused fields for principal in backend)
![Screenshot 2024-08-12 at 13 31
07](https://github.com/user-attachments/assets/60be4e6d-fb0c-4a23-9e04-3775f9d0d4f8)

Replace IAM role references with a generic role and add icons
![Screenshot 2024-08-12 at 13 31
51](https://github.com/user-attachments/assets/1798a902-3398-4cbc-8aef-96797298c91a)

Finally, add a shares tab in the Redshift Dataset View:

![image](https://github.com/user-attachments/assets/e321304c-8dfa-460f-bca0-ef24f4fcb594)

### Relates
- #955 


By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
dlpzx added a commit that referenced this issue Aug 14, 2024
…res (#1467)

### Feature or Bugfix
- Feature

### Detail
Complete design in #955.
This particular PR is the core of the Redshift dataset sharing
implementation.

- Implement the sharing logic in the ECS task for approve, revoke and
verify. ❗ Check the "Sharing with Redshift" section of the design for
the key decisions on the sharing workflow
- Add the necessary Redshift Data API calls in the redshift_data client
- Move share alarm utils to shares_base so that they can be re-used in
Redshift sharing. It would be good to rename the file but it can wait.
- Includes tests for the processor functions: approve, revoke, verify

In contrast to the design in the Glue or S3 sharing mechanisms, in this
case I decided to keep it simple and use the AWS client directly from
the processor without a manager.

❗ I did not find a way to check permissions granted to Redshift roles in
Redshift. For this reason the verification task does not check the last
2 steps of the share. In Redshift it is possible to check user
permissions on tables (with
[has_table_privilege](https://docs.aws.amazon.com/redshift/latest/dg/r_HAS_TABLE_PRIVILEGE.html))
and role permissions on datashares, databases and schemas with some of
the [info tables and
views](https://docs.aws.amazon.com/redshift/latest/dg/cm_chap_system-tables.html);
but when it comes to tables there is no table to look up or system
function. For the moment I have not included this step, but I'll be
meeting more Redshift experts for guidance. The "good" thing is that it
is the last step for a share to succeed, so users will simply get an
"Access denied for insufficient permissions" error, which can be
troubleshot.
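As a hedged illustration of the verification gap above: user-level table permissions can at least be queried with `HAS_TABLE_PRIVILEGE`. The helper below only builds the SQL text that could then be submitted through the Redshift Data API; the function name and quoting approach are mine, not data.all's:

```python
def build_table_privilege_check_sql(user: str, schema: str, table: str,
                                    privilege: str = 'SELECT') -> str:
    """Build a HAS_TABLE_PRIVILEGE check for a *user* on a table.

    Redshift exposes HAS_TABLE_PRIVILEGE for users, but there is no
    equivalent lookup for table privileges granted to roles, which is why
    the verification task skips the last two steps of the share.
    """
    def quote(value: str) -> str:
        # Double single quotes so the value cannot break out of the literal.
        return value.replace("'", "''")

    return (
        f"SELECT HAS_TABLE_PRIVILEGE('{quote(user)}', "
        f"'{quote(schema)}.{quote(table)}', '{quote(privilege)}')"
    )


sql = build_table_privilege_check_sql('analyst', 'public', 'customer')
print(sql)  # → SELECT HAS_TABLE_PRIVILEGE('analyst', 'public.customer', 'SELECT')
```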

There are still a number of issues to be fixed in subsequent PRs:
- add guardrails to share creation
- polish FE (e.g. principal id, resource type)
- avoid IAM checks and dataset and IAM locks for Redshift

### Relates
- #955 


By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
noah-paige added a commit that referenced this issue Aug 30, 2024
commit 22a6f6ef 
Author: Noah Paige <[email protected]> 
Date: Mon Jul 08 2024 11:28:07 GMT-0400 (Eastern Daylight Time) 

    Add integ tests


commit 4fb7d653 
Author: Noah Paige <[email protected]> 
Date: Mon Jul 08 2024 11:26:36 GMT-0400 (Eastern Daylight Time) 

    Merge env test changes


commit 4cf42e8 
Author: Petros Kalos <[email protected]> 
Date: Fri Jul 05 2024 08:19:34 GMT-0400 (Eastern Daylight Time) 

    improve docs


commit 65f930a 
Author: Petros Kalos <[email protected]> 
Date: Fri Jul 05 2024 08:10:56 GMT-0400 (Eastern Daylight Time) 

    fix failures


commit 170b7ce 
Author: Petros Kalos <[email protected]> 
Date: Wed Jul 03 2024 10:52:20 GMT-0400 (Eastern Daylight Time) 

    add group/consumption_role invite/remove tests


commit ba77d69 
Author: dlpzx <[email protected]> 
Date: Wed Jul 03 2024 06:51:47 GMT-0400 (Eastern Daylight Time) 

    Rename alias for env_vars kms key in cognito lambdas FE and BE (#1385)

### Feature or Bugfix
- Bugfix

### Detail
For the case in which we deploy FE and BE in us-east-1 the new lambda
env_key alias is the same one for TriggerFunctionCognitoUrlsConfig in FE
and for TriggerFunctionCognitoConfig in BE, which results in a failure
of the CICD in the FE stack because the alias already exists.

This PR changes the name of both aliases to avoid this conflict. It also
adds envname to avoid issues with other deployment environments/tooling
accounts in the future

### Relates
- <URL or Ticket>


By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.

commit e5923a9 
Author: dlpzx <[email protected]> 
Date: Wed Jul 03 2024 04:27:11 GMT-0400 (Eastern Daylight Time) 

    Fix lambda_env_key out of scope for vpc-facing cognito setup (#1384)

### Feature or Bugfix
- Bugfix

### Detail
The KMS key for the Lambda environment variables in the Cognito IdP
stack was defined inside an if-clause for the internet-facing frontend.
Outside of that if-clause, for the vpc-facing architecture, the KMS key
does not exist and the CICD pipeline fails. This PR moves the creation
of the KMS key outside of the if-clause.

### Relates


By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.

commit 3ccacfc 
Author: Noah Paige <[email protected]> 
Date: Mon Jul 01 2024 13:56:58 GMT-0400 (Eastern Daylight Time) 

    Add delete docs not found when re indexing in catalog task (#1365)

### Feature or Bugfix
- Feature

### Detail
- Add logic to the Catalog Indexer Task to delete docs no longer in RDS
- TODO: Add ability to re-index catalog items via the data.all Admin UI
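The delete-stale-docs logic amounts to a set difference between the indexed doc ids and the uris still present in RDS; a minimal sketch (the function name is illustrative, not the task's actual code):

```python
def find_stale_catalog_docs(catalog_doc_ids, rds_uris):
    """Return catalog document ids whose backing record no longer exists in RDS."""
    return sorted(set(catalog_doc_ids) - set(rds_uris))


# Two datasets were deleted from RDS but are still indexed in the catalog:
catalog = ['uri-1', 'uri-2', 'uri-3', 'uri-4']
rds = ['uri-2', 'uri-4']
stale = find_stale_catalog_docs(catalog, rds)
print(stale)  # → ['uri-1', 'uri-3']
```

The indexer task would then issue a delete request to the search index for each id in `stale`.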

### Relates
- #1078


By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.

commit e2817a1 
Author: Noah Paige <[email protected]> 
Date: Mon Jul 01 2024 05:14:07 GMT-0400 (Eastern Daylight Time) 

    Fix/glossary status (#1373)

### Feature or Bugfix
- Bugfix


### Detail
- Add back `status` to the Glossary GQL object for GQL operations
(getGlossary, listGlossaries)
- Fix `listOrganizationGroupPermissions` to enforce non-null on the FE


### Relates



By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.

commit c3c58bd 
Author: Petros Kalos <[email protected]> 
Date: Fri Jun 28 2024 06:55:42 GMT-0400 (Eastern Daylight Time) 

    add environment tests (#1371)

### Feature or Bugfix
Feature

### Detail
* add list_environment tests
* add test for updating an environment (via update_stack)
* generalise the polling functions for stacks

### Relates
#1220 


By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.

commit e913d48 
Author: dlpzx <[email protected]> 
Date: Fri Jun 28 2024 04:15:49 GMT-0400 (Eastern Daylight Time) 

    Add search (Autocomplete) in miscellaneous dropdowns (#1367)

### Feature or Bugfix
- Feature

### Detail
Autocomplete for environments and teams in the following frontend views
as requested in #1012. In this case the views required custom dropdowns.

❗ I used `noOptionsText` whenever it was necessary instead of checking
that groupOptions length > 0
- [x] DatasetEditForm.js -> ❗ I kept the stewards field as `freeSolo` -
what that means is that users CAN specify options that are not on the
list. I would like the reviewer to confirm this is what we want. In the
end stewardship is a delegation of permissions, so it makes sense that
delegation can happen to other teams. Also changed DatasetCreateForm
- [X] RequestDashboardAccessModal.js - already implemented, minor
changes
- [X] EnvironmentTeamInviteForm.js - already implemented, minor changes.
-> Kept `freeSolo` because invited teams might not be among the user's
teams. Same reason why there is no check for groupOptions == 0: if there
are no options there is still the free-text option.
- [X] EnvironmentRoleAddForm.js
- [X] NetworkCreateModal.js 

### Relates
- #1012 


By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.

commit ee71d7b 
Author: Tejas Rajopadhye <[email protected]> 
Date: Thu Jun 27 2024 14:08:27 GMT-0400 (Eastern Daylight Time) 

    [Gh 1301] Enhancement Feature - Bulk share reapply on dataset  (#1363)

### Feature or Bugfix
- Feature


### Detail

- Adds feature to reapply shares in bulk for a dataset. 
- Also contains bugfix for AWS worker lambda errors 

### Relates
- #1301
- #1364

### Security
Based on [OWASP 10](https://owasp.org/Top10/en/):

- Does this PR introduce or modify any input fields or queries, including fetching data from storage outside the application (e.g. a database, an S3 bucket)? N/A
- Does this PR introduce any functionality or component that requires authorization? N/A
- Are you using or adding any cryptographic features? N/A
- Are you introducing any new policies/roles/users? N/A

---------

Co-authored-by: trajopadhye <[email protected]>

commit 27f1ad7 
Author: Noah Paige <[email protected]> 
Date: Thu Jun 27 2024 13:18:32 GMT-0400 (Eastern Daylight Time) 

    Convert Dataset Lock Mechanism to Generic Resource Lock (#1338)

### Feature or Bugfix
<!-- please choose -->
- Feature
- Bugfix
- Refactoring

### Detail
- Convert Dataset Lock Mechanism to Generic Resource Lock
- Extend locking to Share principals (i.e. EnvironmentGroup and
Consumption Roles)

- Making locking a generic component not tied to datasets
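The generic lock described above can be sketched in a few lines. This is a minimal in-memory illustration, assuming a repository keyed by (resource_uri, resource_type); in data.all the lock table lives in the database, and the class and method names here are hypothetical, not the actual API.

```python
from threading import Lock

class ResourceLockRepository:
    """Illustrative generic resource lock: any resource type (dataset,
    environment group, consumption role) is locked through the same
    (uri, resource_type) key instead of a dataset-specific mechanism."""

    def __init__(self):
        self._locks = {}      # (uri, resource_type) -> owner
        self._guard = Lock()  # protects the lock table itself

    def acquire(self, uri, resource_type, owner):
        key = (uri, resource_type)
        with self._guard:
            if key in self._locks:
                return False  # already held by another owner
            self._locks[key] = owner
            return True

    def release(self, uri, resource_type, owner):
        key = (uri, resource_type)
        with self._guard:
            if self._locks.get(key) == owner:
                del self._locks[key]
                return True
            return False      # not held, or held by someone else
```

Because the key carries the resource type, the same repository can lock a dataset and an environment group with the same URI independently.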


### Relates
- #1093 


---------

Co-authored-by: dlpzx <[email protected]>

commit e3b8658 
Author: Petros Kalos <[email protected]> 
Date: Thu Jun 27 2024 12:50:59 GMT-0400 (Eastern Daylight Time) 

    ignore ruff change in blame (#1372)


commit 2e80de4 
Author: dlpzx <[email protected]> 
Date: Thu Jun 27 2024 10:59:18 GMT-0400 (Eastern Daylight Time) 

    Generic shares_base module and specific s3_datasets_shares module - part 11 (renaming and cleaning up s3_shares) (#1359)

### Feature or Bugfix
- Refactoring

### Detail
As explained in the design for #1123 and #1283, we are implementing generic `datasets_base` and `shares_base` modules that can be used by any type of dataset and any type of shareable object.

This is one of the last PRs focused on renaming files and cleaning up the s3_datasets_shares module. The first step is a consolidation of file and class names in the services to clearly refer to s3_shares:
- `services.managed_share_policy_service.SharePolicyService` -->
`services.s3_share_managed_policy_service.S3SharePolicyService`
- `services.dataset_sharing_alarm_service.DatasetSharingAlarmService`
--> `services.s3_share_alarm_service.S3ShareAlarmService`

👀 The main refactoring happens in what used to be
`services.dataset_sharing_service`:
- The part that implements the `DatasetServiceInterface` has moved
to `services/s3_share_dataset_service.py` as `S3ShareDatasetService`
- The part used in the resolvers and by other methods has been renamed
to `services.s3_share_service.py`, and the folder/table permission
methods have also been added to `S3ShareService` (from
share_item_service)

Lastly, one method previously in share_item_service has been moved to
the GlueClient directly as `get_glue_database_from_catalog`.


### Relates
- #1283 
- #1123 
- #955 


commit 1c09015 
Author: Noah Paige <[email protected]> 
Date: Thu Jun 27 2024 04:16:14 GMT-0400 (Eastern Daylight Time) 

    fix listOrganizationGroupPermissions (#1369)

### Feature or Bugfix
<!-- please choose -->
- Bugfix


### Detail
- Fix listOrganizationGroupPermissions


### Relates
- <URL or Ticket>


commit 976ec6b 
Author: dlpzx <[email protected]> 
Date: Thu Jun 27 2024 04:13:14 GMT-0400 (Eastern Daylight Time) 

    Add search (Autocomplete) in create pipelines (#1368)

### Feature or Bugfix
- Feature

### Detail
Autocomplete for environments and teams in frontend views, as requested
in #1012. This PR implements it for createPipelines.

### Relates
- #1012 


commit 6c909a3 
Author: Noah Paige <[email protected]> 
Date: Wed Jun 26 2024 11:18:04 GMT-0400 (Eastern Daylight Time) 

    fix migration to not rely on OrganizationService or RequestContext (#1361)

### Feature or Bugfix
<!-- please choose -->
- Bugfix

### Detail
- Ensure the migration script does not need RequestContext - otherwise
it fails in the migration trigger Lambda, as context info is not set/available


### Relates
- #1306


commit 90835fb 
Author: Anushka Singh <[email protected]> 
Date: Wed Jun 26 2024 11:17:22 GMT-0400 (Eastern Daylight Time) 

    Issue1248: Persistent Email Reminders (#1354)

### Feature or Bugfix
- Feature


### Detail
- When a share request is initiated and remains pending for an extended
period, dataset producers will receive automated email reminders at
predefined intervals. These reminders will prompt producers to either
approve or extend the share request, thereby preventing delays in
accessing datasets.

Attaching screenshots for emails:

<img width="1336" alt="Screenshot 2024-06-20 at 5 34 31 PM"
src="https://github.com/data-dot-all/dataall/assets/26413731/d7be28c3-5c98-4146-92b1-295e136137a3">

<img width="1322" alt="Screenshot 2024-06-20 at 5 34 52 PM"
src="https://github.com/data-dot-all/dataall/assets/26413731/047556e8-59ee-4ebf-b8a7-c0a6684e2a63">


- Emails are sent every Monday at 9am UTC. The schedule can be changed
via the cron expression in container.py
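The weekly cadence above can be mirrored in plain Python. `is_reminder_due` is a hypothetical helper for illustration only; the real schedule is the cron expression configured in container.py.

```python
from datetime import datetime, timedelta, timezone

def is_reminder_due(now: datetime) -> bool:
    """Return True when 'now' falls in the Monday 9am UTC reminder window."""
    now = now.astimezone(timezone.utc)       # normalize to UTC first
    return now.weekday() == 0 and now.hour == 9  # Monday == 0
```

A timestamp in any timezone is normalized before the check, so a 5am Monday event in UTC-4 counts as the 9am UTC slot.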

### Relates
- #1248


---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Anushka Singh <[email protected]>
Co-authored-by: trajopadhye <[email protected]>
Co-authored-by: Mohit Arora <[email protected]>
Co-authored-by: rbernota <[email protected]>
Co-authored-by: Rick Bernotas <[email protected]>
Co-authored-by: Raj Chopde <[email protected]>
Co-authored-by: Noah Paige <[email protected]>
Co-authored-by: dlpzx <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: jaidisido <[email protected]>
Co-authored-by: dlpzx <[email protected]>
Co-authored-by: mourya-33 <[email protected]>
Co-authored-by: nikpodsh <[email protected]>
Co-authored-by: MK <[email protected]>
Co-authored-by: Manjula <[email protected]>
Co-authored-by: Zilvinas Saltys <[email protected]>
Co-authored-by: Zilvinas Saltys <[email protected]>
Co-authored-by: Daniel Lorch <[email protected]>
Co-authored-by: Tejas Rajopadhye <[email protected]>
Co-authored-by: Zilvinas Saltys <[email protected]>
Co-authored-by: Sofia Sazonova <[email protected]>
Co-authored-by: Sofia Sazonova <[email protected]>

commit e477bdf 
Author: Noah Paige <[email protected]> 
Date: Wed Jun 26 2024 10:39:09 GMT-0400 (Eastern Daylight Time) 

    Enforce non null on GQL query string if non null defined (#1362)

### Feature or Bugfix
<!-- please choose -->
- Bugfix


### Detail
- Add `String!` to ensure a non-null input argument on the FE if it is
defined as such on the backend GQL operation for `listS3DatasetsSharedWithEnvGroup`
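As an illustration, a frontend query string matching a non-nullable backend argument declares `String!` as well. The argument and field names below are assumptions for the sketch, not the actual data.all schema.

```python
# Hypothetical FE query string: the '!' must match the backend's non-null
# definition, otherwise a null argument slips through to the resolver.
LIST_S3_DATASETS_SHARED_WITH_ENV_GROUP = """
query listS3DatasetsSharedWithEnvGroup($environmentUri: String!, $groupUri: String!) {
  listS3DatasetsSharedWithEnvGroup(environmentUri: $environmentUri, groupUri: $groupUri) {
    datasetUri
    label
  }
}
"""
```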


### Relates


commit d6b59b3 
Author: Noah Paige <[email protected]> 
Date: Wed Jun 26 2024 08:48:52 GMT-0400 (Eastern Daylight Time) 

    Fix Init Share Base (#1360)

### Feature or Bugfix
<!-- please choose -->
- Bugfix

### Detail
- Need to register processors in init for s3 dataset shares API module


### Relates


commit bd3698c 
Author: Petros Kalos <[email protected]> 
Date: Wed Jun 26 2024 05:19:14 GMT-0400 (Eastern Daylight Time) 

    split cognito urls setup and cognito user creation (#1366)

### Feature or Bugfix
- Bugfix
### Detail
For more details about the issue, read #1353.
In this PR we solve the problem by splitting the Cognito configuration in two:
* The first part (cognito_users_config.py) sets up the required groups
and users and runs after the UserPool deployment
* The second part (cognito_urls_config.py) sets up Cognito's
callback/logout URLs and runs after the CloudFront deployment

We chose to split the functionality because the users/groups must be
set up for the integration tests, which run after the backend
deployment.

The alternative was to keep the configuration in a single step but run
the integration tests after the CloudFront stage.

### Relates
- Solves #1353 

noah-paige added a commit that referenced this issue Aug 30, 2024
commit 4425e756 
Author: Noah Paige <[email protected]> 
Date: Mon Jul 08 2024 11:57:31 GMT-0400 (Eastern Daylight Time) 

    Fix


commit 4cd2bf77 
Author: Noah Paige <[email protected]> 
Date: Mon Jul 08 2024 11:56:38 GMT-0400 (Eastern Daylight Time) 

    Fix


commit 22a6f6ef 
Author: Noah Paige <[email protected]> 
Date: Mon Jul 08 2024 11:28:07 GMT-0400 (Eastern Daylight Time) 

    Add integ tests


commit 4fb7d653 
Author: Noah Paige <[email protected]> 
Date: Mon Jul 08 2024 11:26:36 GMT-0400 (Eastern Daylight Time) 

    Merge env test changes


commit 4cf42e8 
Author: Petros Kalos <[email protected]> 
Date: Fri Jul 05 2024 08:19:34 GMT-0400 (Eastern Daylight Time) 

    improve docs


commit 65f930a 
Author: Petros Kalos <[email protected]> 
Date: Fri Jul 05 2024 08:10:56 GMT-0400 (Eastern Daylight Time) 

    fix failures


commit 170b7ce 
Author: Petros Kalos <[email protected]> 
Date: Wed Jul 03 2024 10:52:20 GMT-0400 (Eastern Daylight Time) 

    add group/consumption_role invite/remove tests


commit ba77d69 
Author: dlpzx <[email protected]> 
Date: Wed Jul 03 2024 06:51:47 GMT-0400 (Eastern Daylight Time) 

    Rename alias for env_vars kms key in cognito lambdas FE and BE (#1385)

### Feature or Bugfix
- Bugfix

### Detail
When FE and BE are deployed in us-east-1, the new Lambda env_key alias
is the same for TriggerFunctionCognitoUrlsConfig in the FE and
TriggerFunctionCognitoConfig in the BE, which makes the CICD fail in
the FE stack because the alias already exists.

This PR changes both alias names to avoid the conflict. It also adds
envname to avoid issues with other deployment environments/tooling
accounts in the future.

### Relates
- <URL or Ticket>


commit e5923a9 
Author: dlpzx <[email protected]> 
Date: Wed Jul 03 2024 04:27:11 GMT-0400 (Eastern Daylight Time) 

    Fix lambda_env_key out of scope for vpc-facing cognito setup (#1384)

### Feature or Bugfix
- Bugfix

### Detail
The KMS key for the Lambda environment variables in the Cognito IdP
stack was defined inside an if-clause for the internet-facing frontend.
Outside of that if, for the VPC-facing architecture, the KMS key does
not exist and the CICD pipeline fails. This PR moves the creation of
the KMS key outside of the if.

### Relates


commit 3ccacfc 
Author: Noah Paige <[email protected]> 
Date: Mon Jul 01 2024 13:56:58 GMT-0400 (Eastern Daylight Time) 

    Add delete docs not found when re indexing in catalog task (#1365)

### Feature or Bugfix
<!-- please choose -->
- Feature

### Detail
- Add logic to Catalog Indexer Task to Delete Docs No Longer in RDS
- TODO: Add Ability to Re-index Catalog Items via Dataall Admin UI
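The cleanup step described above boils down to a set difference between what the search index holds and what RDS still contains. The function below is an illustrative sketch, not the actual indexer task code.

```python
def docs_to_delete(indexed_ids, rds_ids):
    """Return catalog document ids that are indexed but no longer exist in RDS."""
    return sorted(set(indexed_ids) - set(rds_ids))
```

During re-indexing, the task would delete each returned id from the search index so the catalog never shows items already removed from the database.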

### Relates
- #1078


commit e2817a1 
Author: Noah Paige <[email protected]> 
Date: Mon Jul 01 2024 05:14:07 GMT-0400 (Eastern Daylight Time) 

    Fix/glossary status (#1373)

### Feature or Bugfix
<!-- please choose -->
- Bugfix


### Detail
- Add back `status` to Glossary GQL Object for GQL Operations
(getGlossary, listGlossaries)
- Fix  `listOrganizationGroupPermissions` enforce non null on FE


### Relates



commit c3c58bd 
Author: Petros Kalos <[email protected]> 
Date: Fri Jun 28 2024 06:55:42 GMT-0400 (Eastern Daylight Time) 

    add environment tests (#1371)

### Feature or Bugfix
Feature

### Detail
* add list_environment tests
* add test for updating an environment (via update_stack)
* generalise the polling functions for stacks
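A generalised stack-polling helper might look like the sketch below; the name `poll_until` and its defaults are assumptions, not the actual test utility.

```python
import time

def poll_until(check, timeout=60, interval=1, clock=time.monotonic, sleep=time.sleep):
    """Call 'check' until it returns a truthy value or 'timeout' seconds pass.

    'clock' and 'sleep' are injectable so tests can run without real waiting.
    """
    deadline = clock() + timeout
    while True:
        result = check()
        if result:
            return result
        if clock() >= deadline:
            raise TimeoutError('condition not met within timeout')
        sleep(interval)
```

The same helper can then poll any stack status (environment create, update, delete) instead of duplicating per-resource wait loops.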

### Relates
#1220 


commit e913d48 
Author: dlpzx <[email protected]> 
Date: Fri Jun 28 2024 04:15:49 GMT-0400 (Eastern Daylight Time) 

    Add search (Autocomplete) in miscellaneous dropdowns (#1367)

### Feature or Bugfix
- Feature

### Detail
Autocomplete for environments and teams in the following frontend views
as requested in #1012. In this case the views required custom dropdowns.

❗ I used `noOptionsText` whenever necessary instead of checking that
groupOptions length > 0.
- [x] DatasetEditForm.js -> ❗ I kept the stewards field as `freeSolo`,
which means users CAN specify options that are not on the list. I would
like the reviewer to confirm this is what we want. In the end,
stewardship is a delegation of permissions, so it makes sense that
delegation can happen to other teams. Also changed DatasetCreateForm.
- [X] RequestDashboardAccessModal.js - already implemented, minor
changes
- [X] EnvironmentTeamInviteForm.js - already implemented, minor changes.
-> Kept `freeSolo` because invited teams might not be the user's teams.
Same reason why there is no check for groupOptions == 0: if there are
no options, there is still the free-text option.
- [X] EnvironmentRoleAddForm.js
- [X] NetworkCreateModal.js 

### Relates
- #1012 


commit ee71d7b 
Author: Tejas Rajopadhye <[email protected]> 
Date: Thu Jun 27 2024 14:08:27 GMT-0400 (Eastern Daylight Time) 

    [Gh 1301] Enhancement Feature - Bulk share reapply on dataset  (#1363)

### Feature or Bugfix
- Feature


### Detail

- Adds feature to reapply shares in bulk for a dataset. 
- Also contains bugfix for AWS worker lambda errors 

### Relates
- #1301
- #1364


---------

Co-authored-by: trajopadhye <[email protected]>

commit 27f1ad7 
Author: Noah Paige <[email protected]> 
Date: Thu Jun 27 2024 13:18:32 GMT-0400 (Eastern Daylight Time) 

    Convert Dataset Lock Mechanism to Generic Resource Lock (#1338)

### Feature or Bugfix
- Feature
- Bugfix
- Refactoring

### Detail
- Convert Dataset Lock Mechanism to Generic Resource Lock
- Extend locking to Share principals (i.e. EnvironmentGroup and
Consumption Roles)

- Making locking a generic component not tied to datasets

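The generic lock described above can be sketched as a small repository keyed by (resource_uri, resource_type), so that environment groups and consumption roles can be locked with the same mechanism as datasets. The names below (`ResourceLockRepository`, `acquire`, `release`) are illustrative, not the actual data.all implementation:

```python
# Illustrative sketch of a generic resource lock (hypothetical names,
# not the actual data.all code, which persists locks in the database).
from dataclasses import dataclass, field


@dataclass
class ResourceLockRepository:
    """Locks any resource type by (uri, type), not just datasets."""
    _locks: dict = field(default_factory=dict)

    def acquire(self, resource_uri: str, resource_type: str, holder: str) -> bool:
        key = (resource_uri, resource_type)
        if key in self._locks:
            return False  # already locked by another task
        self._locks[key] = holder
        return True

    def release(self, resource_uri: str, resource_type: str, holder: str) -> bool:
        key = (resource_uri, resource_type)
        if self._locks.get(key) != holder:
            return False  # only the holder may release
        del self._locks[key]
        return True


repo = ResourceLockRepository()
assert repo.acquire("uri-1", "dataset", "share-task-A")
assert not repo.acquire("uri-1", "dataset", "share-task-B")  # blocked
assert repo.acquire("uri-1", "environment_group", "share-task-B")  # other type
assert repo.release("uri-1", "dataset", "share-task-A")
```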

### Relates
- #1093 


---------

Co-authored-by: dlpzx <[email protected]>

commit e3b8658 
Author: Petros Kalos <[email protected]> 
Date: Thu Jun 27 2024 12:50:59 GMT-0400 (Eastern Daylight Time) 

    ignore ruff change in blame (#1372)


commit 2e80de4 
Author: dlpzx <[email protected]> 
Date: Thu Jun 27 2024 10:59:18 GMT-0400 (Eastern Daylight Time) 

    Generic shares_base module and specific s3_datasets_shares module - part 11 (renaming and cleaning up s3_shares) (#1359)

### Feature or Bugfix
- Refactoring

### Detail
As explained in the design for #1123 and #1283, we are implementing
generic `datasets_base` and `shares_base` modules that can be used by
any type of dataset and any type of shareable object.

This is one of the last PRs focused on renaming files and cleaning up
the s3_datasets_shares module. The first step is a consolidation of the
file and class names in the services to clearly refer to s3_shares:
- `services.managed_share_policy_service.SharePolicyService` -->
`services.s3_share_managed_policy_service.S3SharePolicyService`
- `services.dataset_sharing_alarm_service.DatasetSharingAlarmService`
--> `services.s3_share_alarm_service.S3ShareAlarmService`

👀 The main refactoring happens in what used to be
`services.dataset_sharing_service`.
- The part that implements the `DatasetServiceInterface` has been moved
to `services/s3_share_dataset_service.py` as the `S3ShareDatasetService`
- The part used in the resolvers and by other methods has been renamed
as `services.s3_share_service.py` and the methods for the folder/table
permissions are also added to the S3ShareService (from
share_item_service)

Lastly, there is one method previously in share_item_service that has
been moved to the GlueClient directly as
`get_glue_database_from_catalog`.


### Relates
- #1283 
- #1123 
- #955 


commit 1c09015 
Author: Noah Paige <[email protected]> 
Date: Thu Jun 27 2024 04:16:14 GMT-0400 (Eastern Daylight Time) 

    fix listOrganizationGroupPermissions (#1369)

### Feature or Bugfix
- Bugfix


### Detail
- Fix listOrganizationGroupPermissions



commit 976ec6b 
Author: dlpzx <[email protected]> 
Date: Thu Jun 27 2024 04:13:14 GMT-0400 (Eastern Daylight Time) 

    Add search (Autocomplete) in create pipelines (#1368)

### Feature or Bugfix
- Feature

### Detail
Autocomplete for environments and teams in frontend views, as requested
in #1012. This PR implements it for createPipelines.

### Relates
- #1012 


commit 6c909a3 
Author: Noah Paige <[email protected]> 
Date: Wed Jun 26 2024 11:18:04 GMT-0400 (Eastern Daylight Time) 

    fix migration to not rely on OrganizationService or RequestContext (#1361)

### Feature or Bugfix
- Bugfix

### Detail
- Ensure the migration script does not need RequestContext; otherwise it
fails in the migration trigger Lambda, where context info is not set/available


### Relates
- #1306


commit 90835fb 
Author: Anushka Singh <[email protected]> 
Date: Wed Jun 26 2024 11:17:22 GMT-0400 (Eastern Daylight Time) 

    Issue1248: Persistent Email Reminders (#1354)

### Feature or Bugfix
- Feature


### Detail
- When a share request is initiated and remains pending for an extended
period, dataset producers will receive automated email reminders at
predefined intervals. These reminders will prompt producers to either
approve or extend the share request, thereby preventing delays in
accessing datasets.

Attaching screenshots for emails:

<img width="1336" alt="Screenshot 2024-06-20 at 5 34 31 PM"
src="https://github.com/data-dot-all/dataall/assets/26413731/d7be28c3-5c98-4146-92b1-295e136137a3">

<img width="1322" alt="Screenshot 2024-06-20 at 5 34 52 PM"
src="https://github.com/data-dot-all/dataall/assets/26413731/047556e8-59ee-4ebf-b8a7-c0a6684e2a63">


- Emails are sent every Monday at 9am UTC. The schedule can be changed
via the cron expression in container.py

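The Monday-9am-UTC schedule above corresponds to the standard cron expression `0 9 * * 1` (the exact expression and scheduler syntax in container.py may differ). A stdlib sketch of computing the next such run, purely for illustration:

```python
# Illustrative helper (not data.all code): next run for a cron of
# "0 9 * * 1", i.e. every Monday at 09:00 UTC.
from datetime import datetime, timedelta, timezone


def next_monday_9am_utc(now: datetime) -> datetime:
    candidate = now.replace(hour=9, minute=0, second=0, microsecond=0)
    # Monday is weekday() == 0
    days_ahead = (0 - candidate.weekday()) % 7
    candidate += timedelta(days=days_ahead)
    if candidate <= now:  # already past this week's slot
        candidate += timedelta(days=7)
    return candidate


now = datetime(2024, 6, 26, 12, 0, tzinfo=timezone.utc)  # a Wednesday
nxt = next_monday_9am_utc(now)
assert nxt.weekday() == 0 and nxt.hour == 9  # lands on a Monday, 09:00
```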
### Relates
- #1248


---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Anushka Singh <[email protected]>
Co-authored-by: trajopadhye <[email protected]>
Co-authored-by: Mohit Arora <[email protected]>
Co-authored-by: rbernota <[email protected]>
Co-authored-by: Rick Bernotas <[email protected]>
Co-authored-by: Raj Chopde <[email protected]>
Co-authored-by: Noah Paige <[email protected]>
Co-authored-by: dlpzx <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: jaidisido <[email protected]>
Co-authored-by: dlpzx <[email protected]>
Co-authored-by: mourya-33 <[email protected]>
Co-authored-by: nikpodsh <[email protected]>
Co-authored-by: MK <[email protected]>
Co-authored-by: Manjula <[email protected]>
Co-authored-by: Zilvinas Saltys <[email protected]>
Co-authored-by: Zilvinas Saltys <[email protected]>
Co-authored-by: Daniel Lorch <[email protected]>
Co-authored-by: Tejas Rajopadhye <[email protected]>
Co-authored-by: Zilvinas Saltys <[email protected]>
Co-authored-by: Sofia Sazonova <[email protected]>
Co-authored-by: Sofia Sazonova <[email protected]>

commit e477bdf 
Author: Noah Paige <[email protected]> 
Date: Wed Jun 26 2024 10:39:09 GMT-0400 (Eastern Daylight Time) 

    Enforce non null on GQL query string if non null defined (#1362)

### Feature or Bugfix
- Bugfix


### Detail
- Add `String!` to enforce a non-null input argument on the frontend
when it is defined as non-null in the backend GQL operation
`listS3DatasetsSharedWithEnvGroup`

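The mismatch is that the backend schema declares the argument as non-null (`String!`) while the frontend query string declared it as nullable (`String`), so a null could slip through client-side validation. A schematic sketch (the query shape and helper below are illustrative, not the actual frontend code):

```python
# Schematic frontend GQL query string before and after the fix
# (illustrative variable names, not the real query text).
import re

BEFORE = """
query listS3DatasetsSharedWithEnvGroup($environmentUri: String, $groupUri: String) {
  listS3DatasetsSharedWithEnvGroup(environmentUri: $environmentUri, groupUri: $groupUri) {
    datasetUri
  }
}
"""

# The fix adds '!' so the client enforces non-null, matching the backend.
AFTER = BEFORE.replace("String,", "String!,").replace("String)", "String!)")


def declares_non_null(query: str, var: str) -> bool:
    """Illustrative check that a variable is declared with a trailing '!'."""
    m = re.search(rf"\${var}:\s*(\w+!?)", query)
    return bool(m) and m.group(1).endswith("!")


assert not declares_non_null(BEFORE, "environmentUri")
assert declares_non_null(AFTER, "environmentUri")
```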


commit d6b59b3 
Author: Noah Paige <[email protected]> 
Date: Wed Jun 26 2024 08:48:52 GMT-0400 (Eastern Daylight Time) 

    Fix Init Share Base (#1360)

### Feature or Bugfix
- Bugfix

### Detail
- Need to register processors in init for s3 dataset shares API module



commit bd3698c 
Author: Petros Kalos <[email protected]> 
Date: Wed Jun 26 2024 05:19:14 GMT-0400 (Eastern Daylight Time) 

    split cognito urls setup and cognito user creation (#1366)

### Feature or Bugfix
- Bugfix
### Detail
For more details about the issue, read #1353.
In this PR we solve the problem by splitting the Cognito configuration
in two:
* First part (cognito_users_config.py) is setting up the required groups
and users and runs after UserPool deployment
* Second part (cognito_urls_config.py) is setting up Cognito's
callback/logout urls and runs after the CloudFront deployment

We chose to split the functionality because we need the users/groups set
up for the integration tests, which run after the backend deployment.

The alternative would be to keep the configuration as a single step but
run the integration tests after the CloudFront stage.

### Relates
- Solves #1353 

dlpzx added a commit that referenced this issue Sep 3, 2024
…ift guardrails (#1484)

### Feature or Bugfix
- Feature

### Detail
Complete design in #955.
This particular PR is focused on adding validation checks when a share
request is created
- Move IAM role checks from the generic sharing service to the specific
share processors
- Add interface to execute checks on approve, submit and revoke API
calls
- Moved S3 checks to new S3Validator
- Implemented Redshift checks in RedshiftValidator
- Added tests for Redshift validator

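The check interface described above might look roughly like this; the class and method names are illustrative, inferred from the PR description (per-processor validators such as RedshiftValidator, with hooks on create/submit/approve):

```python
# Illustrative sketch of per-processor share validators (hypothetical
# API, not the actual data.all interface).
from abc import ABC, abstractmethod


class SharesValidator(ABC):
    """Each share processor plugs in its own lifecycle checks."""

    @abstractmethod
    def validate_share_object_create(self, share: dict) -> None: ...

    @abstractmethod
    def validate_share_object_submit(self, share: dict) -> None: ...

    @abstractmethod
    def validate_share_object_approve(self, share: dict) -> None: ...


class RedshiftValidator(SharesValidator):
    def validate_share_object_create(self, share: dict) -> None:
        # e.g. a Redshift share request must name a Redshift role
        if not share.get("redshift_role"):
            raise ValueError("Redshift role is required for a Redshift share")

    def validate_share_object_submit(self, share: dict) -> None:
        self.validate_share_object_create(share)

    def validate_share_object_approve(self, share: dict) -> None:
        self.validate_share_object_create(share)


v = RedshiftValidator()
v.validate_share_object_create({"redshift_role": "analytics_reader"})  # passes
```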
### Relates
- #955 

dlpzx added a commit that referenced this issue Sep 4, 2024
…atasets (check_on_delete, list_shared_datasets...) (#1511)

### Feature or Bugfix
- Feature

### Detail
Complete design in #955.
This particular PR is focused on adding missing share-related
functionality inside the redshift_datasets module.

For example, when we delete a Redshift dataset we first want to check
whether there are any share requests for that dataset. To avoid circular
dependencies, an interface is used in the same way it was implemented
for S3.

In this PR:
- Add a `RedshiftShareDatasetService(DatasetServiceInterface)` class and
implement the required abstract methods (check_on_delete,
resolve_user_shared_datasets, ...)
- Use this class in the redshift_datasets module in resolvers, on
dataset deletion, etc.
- Some of the code was very similar to the db queries implemented for S3
datasets, so in this PR some of those queries are moved to the generic
ShareObjectRepository to be reused by both dataset types

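The check_on_delete behaviour described above can be sketched as a guard that blocks dataset deletion while share requests exist (names hypothetical; the real code queries the share repository through the interface):

```python
# Illustrative check_on_delete guard (hypothetical names, not the
# actual RedshiftShareDatasetService implementation).

def check_on_delete(dataset_uri: str, existing_shares: list) -> None:
    """Raise if any share request still references the dataset."""
    blocking = [s for s in existing_shares if s["dataset_uri"] == dataset_uri]
    if blocking:
        raise RuntimeError(
            f"Dataset {dataset_uri} has {len(blocking)} share request(s); "
            "revoke or delete them before deleting the dataset"
        )


shares = [{"dataset_uri": "rs-1", "status": "Approved"}]
check_on_delete("rs-2", shares)  # no shares reference rs-2 -> allowed
```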
### Relates
- #955 

@dlpzx

dlpzx commented Sep 5, 2024

Closing this issue, remaining tasks will be tracked in the corresponding documentation pull requests and follow-up github issues

@dlpzx dlpzx closed this as completed Sep 5, 2024
dlpzx added a commit that referenced this issue Sep 10, 2024
…tasets (#1512)

### Feature or Bugfix
Documentation

### Detail
Added userguide documentation for #955 
- Redshift Connections
- Redshift Dataset import and table management
- Changes in S3 Datasets to clearly differentiate both types

### Relates
- #955 

dlpzx added a commit that referenced this issue Sep 10, 2024
### Feature or Bugfix
- Documentation

### Detail
Added userguide documentation for
#955
- List all shareable items with a short definition
- Add technical details for each type of shareable item (including
Redshift)
- Add data consumption section for Redshift

### Relates
- #955 

dlpzx added a commit that referenced this issue Sep 23, 2024
…nections for import Redshift Dataset (#1565)

### Feature or Bugfix
- Feature: enhancement

### Detail
This feature is an enhancement suggested by Redshift experts on #955,
which is well explained in #1562.
This PR:
- adds more info and tooltips that explain details about Redshift
Connections on the UI
- restricts the type of connection that can be used to import a dataset:
**only DATA_USER connections can be used to import datasets**. This
logic is implemented in both the frontend and the backend

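A minimal sketch of the backend side of that restriction (the field names mirror the PR description but are otherwise hypothetical):

```python
# Illustrative backend guard: only DATA_USER connections may be used to
# import Redshift datasets (hypothetical names, not actual data.all code).

def validate_connection_for_import(connection: dict) -> None:
    if connection.get("connectionType") != "DATA_USER":
        raise PermissionError(
            "Only DATA_USER connections can be used to import Redshift datasets"
        )


validate_connection_for_import({"connectionType": "DATA_USER"})  # allowed
```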
FIRST VERSION:
<img width="1126" alt="image"
src="https://github.com/user-attachments/assets/14bb5a85-9868-4e8d-b7aa-1c84feb2a681">

UPDATED:

![image](https://github.com/user-attachments/assets/1b199dba-d6ee-471f-9cd7-d74e70b8dd4b)


### Relates
#1562 

dlpzx added a commit that referenced this issue Sep 27, 2024
### Feature or Bugfix
- Bugfix

### Detail
We were validating whether the Redshift role for a Redshift share
request existed in the dataset account, while we should validate that it
exists in the target account (share.environmentUri).

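The essence of the fix can be sketched as picking the account from the share's target environment rather than from the dataset's source environment (data shapes hypothetical):

```python
# Illustrative sketch of the fix: the Redshift role must exist where the
# consumer lives, i.e. in the account of share["environmentUri"], not in
# the dataset's source account (hypothetical data shapes).

def account_for_role_validation(share: dict, environments: dict) -> str:
    return environments[share["environmentUri"]]["awsAccountId"]


envs = {
    "env-source": {"awsAccountId": "111111111111"},
    "env-target": {"awsAccountId": "222222222222"},
}
share = {"environmentUri": "env-target", "datasetUri": "ds-1"}
assert account_for_role_validation(share, envs) == "222222222222"
```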
### Relates
- #955 

dlpzx added a commit that referenced this issue Oct 18, 2024
### Feature or Bugfix
- Bugfix

### Detail
The share verify task for Redshift shares was returning a `list index
out of range` error when verifying the health of a share whose datashare
had been deauthorized at the source.
Tested in AWS:

![Screenshot 2024-10-17 at 14 44
31](https://github.com/user-attachments/assets/fa008a2a-4b99-46eb-bb6d-635d518159a3)

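The defensive pattern behind the fix can be sketched as checking the lookup result before indexing, and reporting the share as unhealthy instead of crashing (helper name and status strings hypothetical):

```python
# Illustrative defensive check (hypothetical helper, not the actual
# verify task): report unhealthy instead of indexing an empty list.

def verify_datashare_status(datashares: list) -> str:
    if not datashares:  # datashare was deauthorized/removed at the source
        return "Unhealthy: datashare not found in source cluster"
    return f"Healthy: status {datashares[0]['status']}"


assert verify_datashare_status([]).startswith("Unhealthy")
assert verify_datashare_status([{"status": "AUTHORIZED"}]).startswith("Healthy")
```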
### Relates
- #955 

dlpzx added a commit that referenced this issue Oct 25, 2024
### Feature or Bugfix
- Bugfix

### Detail
Redshift sharing is implemented for read-only datashares. Write
datashares are in preview in several regions, but in regions where the
preview is not available, using the AllowWrites parameter in the
authorize_data_share API call results in an error of the type `An error
occurred (InvalidParameterValue) when calling the AuthorizeDataShare
operation: DATA_SHARING_WRITES support is not yet available.`

This PR removes the usage of that parameter, which was in any case set
to the default value (allowWrites=False)

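The shape of the change can be sketched as building the API kwargs without `AllowWrites`; the parameter names match the Redshift `AuthorizeDataShare` API, but the helper itself and the placeholder ARN are illustrative:

```python
# Illustrative sketch (not actual data.all code): build the
# authorize_data_share kwargs without AllowWrites so regions lacking
# write-datashare support accept the call. The real code would pass
# these to a boto3 Redshift client: client.authorize_data_share(**kwargs)

def authorize_kwargs(datashare_arn: str, consumer: str) -> dict:
    kwargs = {"DataShareArn": datashare_arn, "ConsumerIdentifier": consumer}
    # Previously: kwargs["AllowWrites"] = False -> InvalidParameterValue
    # in regions where DATA_SHARING_WRITES is not yet available.
    # Omitting the parameter keeps the same read-only default behaviour.
    return kwargs


kw = authorize_kwargs("arn:aws:redshift:region:acct:datashare/example", "123456789012")
assert "AllowWrites" not in kw
```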
### Relates
- #955 
