Create generic shares_base and s3_datasets_shares modules from current dataset_sharing #1283

dlpzx · 2024-05-17T14:00:02Z

Is your feature request related to a problem? Please describe.
It is difficult to add new sharing item types with the current implementation of the datasets_sharing module.

Describe the solution you'd like
If we want to add more types of sharing(e.g. Redshift dataset sharing, notebook sharing) a first step will be the creation of a generic class that defines a data.all Shares abstraction that can be used by each of the particular sharing implementations.

Describe alternatives you've considered
This are rough design considerations that might change during implementation:

⚠️ WIP
This item is a continuation of #1123 that implements the top part of the diagram, the separation between datasets_base and s3_datasets. Now we need to do the same for shares.

Backend

1. `shares_base` module

With the generic code to create share objects and share items.

api - all APIs on ShareObject and the approval/reject/revoke workflow. List generic ShareItems
aws - nothing, it should be technology agnostic
db - generic ShareObject and ShareItem models. Operations on these models
- in ShareObject: rename datasetUri as targetUri
- in ShareItem: remove Glue and S3 details from the shareItem
- repositories: all operations that do not involve details on S3/Glue
cdk - nothing, it does not involve any particular AWS IAM permissions or infra
handlers - ECS handler that handles shares tasks and executes sharing manager methods accordingly
services
- sharing_service - interfaces to implement ShareProcessors
- shares_enums - they are generic - ShareItemTypes will be used to know the items that can be shared
- shares_permissions - they are generic
- share_object_service
- share_item_service
tasks
- generic shareTask that loads all shareProcessors
- generic shareVerifier that loads all shareProcessors
- generic shareReapply that loads all shareProcessors

2. `s3_datasets_shares` module

With the specific code to share S3 dataset items.

api - TBD: if there is any API that is specific to s3 datasets it will remain here
aws: the same ones as before, lakeformation, s3, glue, kms...
db
- Specific repositories to get share items with details of Tables, Folders, Buckets
cdk
- pivot role and env role permissions
handlers: nothing
services: the specific services used to share S3 resources in the ECS task
- processors
- managers
- other utilities
tasks: no sharing tasks! the actual sharing ECS task is generic, the specific logic is imported through services

Frontend

Most views related to shares are very generic. If anything, share item tables and share details (consumption details) might be s3-specific. Since the shares view is being modified in other workstreams this is paused at the moment.

Config.json

tbd

The text was updated successfully, but these errors were encountered:

dlpzx · 2024-05-17T14:08:47Z

…art 1 (renaming, enums and permissions) (#1284) ### Feature or Bugfix - Feature - Refactoring ### Detail As explained in the design for #1123 and #1283 we are trying to implement generic `datasets_base` and `shares_base` modules that can be used by any type of datasets and by any type of shareable object in a generic way. In this PR: - Rename `dataset_sharing` as `s3_dataset_shares` - Create `shares_base` and introduce dependency (`s3_dataset_shares` depends on `shares_base`) - Move generic enums to shares_base - Move generic permissions to shares_base ### Relates - #1283 - #1123 - #955 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

…tend) (#1292) ### Feature or Bugfix - Feature ### Detail As explained in the design for #1123 we are trying to implement a generic `datasets_base` module that can be used by any type of datasets in a generic way. In this PR we: - Create DatasetsBase module in frontend. Depends on S3_Datasets module - Move DatasetsList view and DatasetListItem component to DatasetsBase - Add CreateDataset modal that allows multiple types of datasets creation - Fix routes and redirects to point at /datasets/ or any other /X-dataset/ - Move dataset_base services. In backend/datasets_base/api we define the following queries that are good candidates to become part of the DatasetsBase module - listDatasets - only used in DatasetsBase/DatasetList view - it should be in DatasetsBase/services - listOwnedDatasets - only used in Shares/SharesBoxList view - it should be in Shares/services - listDatasetsCreatedInEnvironment - only used in Environments/EnvDataset tab - it should be in Environment/services If we want to keep everything clean we could rename all "datasets" as "s3_datasets" or equivalent in the S3_Datasets module. Because it is a cosmetic change that would pollute the PR a lot I have decided not to include it. ⚠️ UPDATE: Next steps The Data Shared With You table in Environments>Datasets tab needs a remake. It contains references to each type of item and it is very coupled with s3-dataset-shares. For the moment I just made it work for the changes of s3-datasets, but when completing the work in #1283 we should fix this. Maybe in favor of DataGrid ### Relates - #1123 - #955 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

…art 2 (db objects to shares_base) (#1294) ### Feature or Bugfix - Refactoring ### Detail As explained in the design for #1123 and #1283 we are trying to implement generic `datasets_base` and `shares_base` modules that can be used by any type of datasets and by any type of shareable object in a generic way. In this PR: - Move shares state machines into its own file and move them to the `shares_base` module. - Create new ShareObjectRepository and copy some generic methods to it - Move db share objects to shares_base: - ShareObject has a field called datasetUri. For the scope of S3 and Redshift datasets we can leave it as dataseturi, but if we want to implement other kinds of sharing we should rethink it: we need to store the "approvers" of the share in some way. For the moment I am not going to go down that path until shares and s3 are uncoupled, then we can see how we would implement a complete generic sharing. - ShareObjectItem includes 3 fields related to the glue tables... we need to get rid of them in the backend code and look up the info with the itemType and itemUri. Left for part3 to keep the PR clean. ### Relates - #1283 - #1123 - #955 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@petrkalos

…art 3 (share processor and manager interfaces) (#1298) ### Feature or Bugfix - Feature - Refactoring ### Detail As explained in the design for #1123 and #1283 we are trying to implement generic `datasets_base` and `shares_base` modules that can be used by any type of datasets and by any type of shareable object in a generic way. In this PR: - Move ECS handler for sharing tasks to shares_base - Move sharing tasks to shares_base - Move DataSharingService to shares_base and rename it as SharingService. Make SharingService generic and remove any reference to specific items. In this process: - attach and delete table and folder permissions are moved into the specific share processor. delete permissions are processed item by item and moved into the shareItemService - Some methods are copied from ShareObjectRepository in s3_datasets_shares to shares_base. They have not been removed from s3_datasets_shares, instead there is a TODO marking those methods that have been copied. The migration and clean-up of shareObjectRepository will be done in a following PR - Need to introduce `DatasetBaseRepository.get_dataset_by_uri(session, share.datasetUri)` to avoid future circular dependencies: shares_base depends only on datasets_base. - Clean-up and consolidate methods: remove updates of the share items outside of state machine transactions. Only re-used methods and dataset lock handling in share_manager --> TODO: dataset_lock manager should be its own service outside of sharing, but this is out of the scope of this PR. - Add updates of share_item statuses in except clauses for more robust sharing - Introduce ShareProcessor and ShareManager interfaces and use them in the SharingService: instead of the processor inheriting the manager class, the processor uses the ShareProcessor interface and constructs a manager when needed. - Introduce new load ImportMode `SHARES_TASK` and register ShareProcessors in s3_datasets_share (`backend/dataall/modules/s3_datasets_shares/__init__.py`) ![image](https://github.com/data-dot-all/dataall/assets/71252798/af4eafc3-c990-4532-ba25-62ef950791aa) See full detail of SharingService design in #1283 ### Next steps/Open questions For failures I think we should rollback whatever actions where performed. For example, if we are sharing a table and it failed in one step, it should revert all the steps executed before. @petrkalos @SofiaSazonova @noah-paige what do you think? ### Relates - #1283 - #1123 - #955 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

…art 4 (remove s3 info from shareItem db models) (#1311) ### Feature or Bugfix - Refactoring ### Detail As explained in the design for #1123 and #1283 we are trying to implement generic `datasets_base` and `shares_base` modules that can be used by any type of datasets and by any type of shareable object in a generic way. In this PR: - Remove the fields `GlueDatabaseName`, `GlueTableName` and `S3AccessPointName` from the ShareObjectItem db model. The goal is to have a generic ShareObjectItem and access the specific item information through its itemUri and itemType. - Migration script to drop those columns. - Disclaimer, there is still work to do on the gql types; but that is out of the scope for this PR ### Relates - #1283 - #1123 - #955 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

…art 5 (move exceptions and notifications to shares_base) (#1312) ### Feature or Bugfix - Refactoring ### Detail As explained in the design for #1123 and #1283 we are trying to implement generic `datasets_base` and `shares_base` modules that can be used by any type of datasets and by any type of shareable object in a generic way. In this PR: - Move share_exceptions to shares_base - Move share_notification_service to shares_base ### Relates - #1283 - #1123 - #955 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

…art 6 (Split APIs and graphql types) (#1320) ### Feature or Bugfix - Feature - Refactoring ### Detail As explained in the design for #1123 and #1283 we are trying to implement generic `datasets_base` and `shares_base` modules that can be used by any type of datasets and by any type of shareable object in a generic way. - First, this PR splits the query used in Worksheet and Environments to list glue databases and list datasets. They are pretty different queries, the one used in Worksheets is only relevant for S3 datasets, while the one in Environment is focused on the share items in general: - Introduce new API `listS3DatasetsSharedWithEnvGroup` to list shared glue databases in Worksheets view. It is part of the s3_datasets_shares module. This new API replaces the usage of `searchEnvironmentDataItems` in Worksheets frontend. - Remove Glue-parameters from `searchEnvironmentDataItems` API, this API belongs to shares_base. It is only used in the Environment view > Datasets tab, so I moved the API in frontend to modules/Environment. - remove unused parameters (`tables`, `locations`) from statistics in `api/types/ShareObjectStatistics`. Now the statistics are only generic. - Introduce new API `getS3ConsumptionData` in s3_datasets_shares. This new API call gets the details of gluedatabase/table, s3accesspoint that were previously part of ShareObject. This way the graphql ShareObject does not contain specific S3 info. - The rest of the APIs have been split in `shares_base` and `s3_datasets_shares`. In general, all the share lifecycle (create, add items, approve...) is part of shares_base. listDatasetShareObjects, verifyDatasetShareObjects used in the S3-Dataset UI are part of s3_datasets_shares - TODO: Review tests and create new tests for get_consumption_data. Currently tests for shares and datasets are placed in the same folder. I will open a separate PR to order the tests a bit before this ### Relates - #1283 - #1123 - #955 - #1277 ---> This PR needs to be merged and then I will introduce some changes in the ShareView. ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

…art 7 (share_object_service) (#1340) ### Feature or Bugfix - Refactoring ### Detail As explained in the design for #1123 and #1283 we are trying to implement generic `datasets_base` and `shares_base` modules that can be used by any type of datasets and by any type of shareable object in a generic way. The goal of this PR is to move the `share_object_service` to `shares_base` and refactor any dependency to S3 in the service. - Move file and fix imports of ShareObjectService - Use DatasetsBase and DatasetsBaseRepository instead of the S3 equivalents - ⚠️ Avoid Dashboard check logic in `ShareObjectService.submit_share_object` see below - ⚠️ Avoid SharePolicyService logic in `ShareObjectService.create_share_object` see below - Create ShareLogsService for logs - Remove unused methods - I also copied share_item_service to shares_base (it will be used in next PR) #### Avoid Dashboard check logic in `ShareObjectService.submit_share_object` Currently, whenever a share request is submitted, we check if the REQUESTER environment has dashboards enabled and if there are shared tables we verify that the Quicksight subscription is active. Alternative: perform this check in the share processor of tables. It solves the issue, but it gives a poorer user experience as it is difficult to figure out for the requester why the share failed. This can be solved holistically as requested in #1168. Decision: move the logic to the processor and make the table share fail. #### Avoid SharePolicyService logic in `ShareObjectService.create_share_object` When a share request is first created, we perform a series of operations to ensure that an IAM policy for the share requester principal IAM role is created. Alternative 1: move this logic inside the share processor. Not sure if it is possible. It would be the ideal solution, but the SharePolicyService throws errors in the create share object API if the policy is not attached. Alternative 2: implement interface to define share_policies (similar to the dataset-actions that use share logic in `backend/dataall/modules/s3_datasets_shares/services/dataset_sharing_service.py`). Decision: we want to preserve the user experience of having the IAM policy created before the share request is processed. Plus, it is not an uncommon pattern that could get extended by other dataset types, for example redshift sharing might need additional share policies. For this reason this PR implements alternative 2 ### Relates - #1283 - #1123 - #955 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

…art 8 (sharei_item_service) (#1350) ### Feature or Bugfix - Refactoring ### Detail As explained in the design for #1123 and #1283 we are trying to implement generic `datasets_base` and `shares_base` modules that can be used by any type of datasets and by any type of shareable object in a generic way. The goal of this PR is to split the `share_item_service`logic into the a generic service in `shares_base` and an specific service in s3_datasets_shares. - `ShareItemService` in shares_base only has shareItem logic without references to S3 or Glue. - `S3ShareItemService` in s3_datasets_shares has logic for share items that are tables and folders. The files' names are a bit messy but i don't want to pollute this PR with more changes. I'll do a review of the file names in part 10. ### Relates - #1283 - #1123 - #955 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

…art 9 (share db repositories) (#1351) ### Feature or Bugfix - Refactoring ### Detail As explained in the design for #1123 and #1283 we are trying to implement generic `datasets_base` and `shares_base` modules that can be used by any type of datasets and by any type of shareable object in a generic way. This PR includes: - Split the ShareobjectRepository from s3_datasets_shares into: - `ShareobjectRepository` (shares_base) - generic db operations on share objects - no references to S3, Glue - `ShareStatusRepository` (shares_base) - db operations related to the sharing state machine states - a way to split the db operations into smaller chunks - `S3ShareobjectRepository` (s3_datasets_share) - db operations on s3 share objects - used only in s3_datasets_shares. They might contain references to DatasetTables, S3Datasets... They are used in the clean-up activities and to count resources in environment. - Adapt `S3ShareobjectRepository` to S3 objects. For some queries it was needed to add filters on the type of share items retrieved, so that if in the future anyone adds a new share type the code still makes sense. To add some more meaning, some functions are renamed to clearly point out that they are s3 functions or what they do. - Make `ShareobjectRepository` completely generic. The following queries needed extra work: - ShareObjectRepository.get_share_item - renamed as `get_share_item_details` - `list_shareable_items` - split in 2 parts `list_shareable_items_of_type` + `paginated_list_shareable_items`: the first function is invoked recursively over the list of share processors, instead of querying the DatasetTable, DatasetStorageLocation and DatasetBucket we query the shareable_type. The challenge is to get all fields from the db Resource object that all of them are built upon. In particular the field `itemName` does not match the BucketName (in bucket) or the S3Prefix (in folders). For this reason I added a migration script to backfill the DatasetBucket.name as DatasetBucket.S3BucketName. and the DatasetStorageLocation.name with DatasetStorageLocation.S3Prefix. `paginated_list_shareable_items` joins the list of subqueries, filters and paginates. - In verify_dataset_share_objects instead of using list_shareable_items, I replaced it by `ShareObjectRepository.get_all_share_items_in_share` which does not need tables, storage, avoiding the whole S3 logic and avoiding unnecessary queries - Remove S3 references from shares_base.api.resolvers. Use DatasetBase and DatasetBaseRepository instead. Remove unused `ShareableObject`. - I had some problems with circular dependencies so I created the `ShareProcessorManager` in shares_base for the registration of processors. The SharingService uses the manager to get all processors. Missing items for Part10: - Lake Formation cross region table references in shares_base/services/share_item_service.py:add_shared_item - remove table references in shares_base/services/share_item_service.py:remove_shared_item - remove s3_prefix references shares_base/services/share_notification_service:notify_new_data_available_from_owners - RENAMING! Right now the names are a bit misleading ### Relates - #1283 - #1123 - #955 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

…art 10 (other s3 references in shares_base) (#1357) ### Feature or Bugfix - Refactoring ### Detail As explained in the design for #1123 and #1283 we are trying to implement generic `datasets_base` and `shares_base` modules that can be used by any type of datasets and by any type of shareable object in a generic way. This PR: - Remove the delete_resource_policy conditional for Tables in `backend/dataall/modules/shares_base/services/share_item_service.py` --> Permissions to the Table in data.all are granted once the share has succeeded, the conditional that checks for share_failed tables should not exist. - Remove unnecessary check in share_item_service: in add_share_item we check if it is a table whether it is a cross-region share. This check is completely unnecessary because when we create a share request object we are already checking if it is cross-region - Use `get_share_item_details` in add_share_item - we want to check if the table, folder, bucket exist so we need to query those tables. - Move s3_prefix notifications to subscription task - Fix error in query in `backend/dataall/modules/shares_base/db/share_state_machines_repositories.py` ### Relates - #1283 - #1123 - #955 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

…art 11 (renaming and cleaning up s3_shares) (#1359) ### Feature or Bugfix - Refactoring ### Detail As explained in the design for #1123 and #1283 we are trying to implement generic `datasets_base` and `shares_base` modules that can be used by any type of datasets and by any type of shareable object in a generic way. This is one of the last PRs focused on renaming files and cleaning-up the s3_datasets_shares module. The first step is a consolidation of the file and classes names in the services to clearly refer to s3_shares: - `services.managed_share_policy_service.SharePolicyService` ---> `services.s3_share_managed_policy_service.S3SharePolicyService` - `services.dataset_sharing_alarm_service.DatasetSharingAlarmService` --> `services.s3_share_alarm_service.S3ShareAlarmService` - `services.managed_share_policy_service.SharePolicyService` --> `services.s3_share_managed_policy_service.S3SharePolicyService` 👀 The main refactoring happens in what is used to be `services.dataset_sharing_service`. - The part that implements the `DatasetServiceInterface` has been moved to `services/s3_share_dataset_service.py` as the `S3ShareDatasetService` - The part used in the resolvers and by other methods has been renamed as `services.s3_share_service.py` and the methods for the folder/table permissions are also added to the S3ShareService (from share_item_service) Lastly, there is one method previously in share_item_service that has been moved to the GlueClient directly as `get_glue_database_from_catalog`. ### Relates - #1283 - #1123 - #955 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

dlpzx · 2024-06-28T10:11:39Z

The core functionality has been completed. There will be follow-up enhancements related to shares tackled in separate github issues

commit 22a6f6ef Author: Noah Paige <[email protected]> Date: Mon Jul 08 2024 11:28:07 GMT-0400 (Eastern Daylight Time) Add integ tests commit 4fb7d653 Author: Noah Paige <[email protected]> Date: Mon Jul 08 2024 11:26:36 GMT-0400 (Eastern Daylight Time) Merge env test changes commit 4cf42e8 Author: Petros Kalos <[email protected]> Date: Fri Jul 05 2024 08:19:34 GMT-0400 (Eastern Daylight Time) improve docs commit 65f930a Author: Petros Kalos <[email protected]> Date: Fri Jul 05 2024 08:10:56 GMT-0400 (Eastern Daylight Time) fix failures commit 170b7ce Author: Petros Kalos <[email protected]> Date: Wed Jul 03 2024 10:52:20 GMT-0400 (Eastern Daylight Time) add group/consumption_role invite/remove tests commit ba77d69 Author: dlpzx <[email protected]> Date: Wed Jul 03 2024 06:51:47 GMT-0400 (Eastern Daylight Time) Rename alias for env_vars kms key in cognito lambdas FE and BE (#1385) ### Feature or Bugfix - Bugfix ### Detail For the case in which we deploy FE and BE in us-east-1 the new lambda env_key alias is the same one for TriggerFunctionCognitoUrlsConfig in FE and for TriggerFunctionCognitoConfig in BE, which results in a failure of the CICD in the FE stack because the alias already exists. This PR changes the name of both aliases to avoid this conflict. It also adds envname to avoid issues with other deployment environments/tooling account in the future ### Relates - <URL or Ticket> ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. commit e5923a9 Author: dlpzx <[email protected]> Date: Wed Jul 03 2024 04:27:11 GMT-0400 (Eastern Daylight Time) Fix lambda_env_key out of scope for vpc-facing cognito setup (#1384) ### Feature or Bugfix - Bugfix ### Detail The KMS key for the Lambda environment variables in the Cognito IdP stack was defined inside an if-clause for internet facing frontend. Outside of that if, for vpc-facing architecture the kms key does not exist and the CICD pipeline fails. This PRs move the creation of the KMS key outside of the if. ### Relates ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. commit 3ccacfc Author: Noah Paige <[email protected]> Date: Mon Jul 01 2024 13:56:58 GMT-0400 (Eastern Daylight Time) Add delete docs not found when re indexing in catalog task (#1365) ### Feature or Bugfix  - Feature ### Detail - Add logic to Catalog Indexer Task to Delete Docs No Longer in RDS - TODO: Add Ability to Re-index Catalog Items via Dataall Admin UI ### Relates - #1078 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. commit e2817a1 Author: Noah Paige <[email protected]> Date: Mon Jul 01 2024 05:14:07 GMT-0400 (Eastern Daylight Time) Fix/glossary status (#1373) ### Feature or Bugfix  - Bugfix ### Detail - Add back `status` to Glossary GQL Object for GQL Operations (getGlossary, listGlossaries) - Fix `listOrganizationGroupPermissions` enforce non null on FE ### Relates ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. commit c3c58bd Author: Petros Kalos <[email protected]> Date: Fri Jun 28 2024 06:55:42 GMT-0400 (Eastern Daylight Time) add environment tests (#1371) ### Feature or Bugfix Feature ### Detail * add list_environment tests * add test for updating an environment (via update_stack) * generalise the polling functions for stacks ### Relates #1220 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. commit e913d48 Author: dlpzx <[email protected]> Date: Fri Jun 28 2024 04:15:49 GMT-0400 (Eastern Daylight Time) Add search (Autocomplete) in miscellaneous dropdowns (#1367) ### Feature or Bugfix - Feature ### Detail Autocomplete for environments and teams in the following frontend views as requested in #1012. In this case the views required custom dropdowns. ❗ I used `noOptionsText` whenever it was necessary instead of checking groupOptions lenght >0 - [x] DatasetEditForm.js -> ❗ I kept the stewards field as `freesolo` - what that means is that users CAN specify options that are not on the list. I would like the reviewer to confirm this is what we want. At the end stewardship is a delegation of permissions, it makes sense that delegation happens to other teams. Also changed DatasetCreateForm - [X] RequestDashboardAccessModal.js - already implemented, minor changes - [X] EnvironmentTeamInviteForm.js - already implemented, minor changes. -> Kept `freesolo` because invited teams might not be the user teams. Same reason why there is no check for groupOptions == 0, if there are no options there is still the free text option. - [X] EnvironmentRoleAddForm.js - [X] NetworkCreateModal.js ### Relates - #1012 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. commit ee71d7b Author: Tejas Rajopadhye <[email protected]> Date: Thu Jun 27 2024 14:08:27 GMT-0400 (Eastern Daylight Time) [Gh 1301] Enhancement Feature - Bulk share reapply on dataset (#1363) ### Feature or Bugfix - Feature ### Detail - Adds feature to reapply shares in bulk for a dataset. - Also contains bugfix for AWS worker lambda errors ### Relates - #1301 - #1364 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? N/A - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? N/A - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? N/A - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? N/A - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. --------- Co-authored-by: trajopadhye <[email protected]> commit 27f1ad7 Author: Noah Paige <[email protected]> Date: Thu Jun 27 2024 13:18:32 GMT-0400 (Eastern Daylight Time) Convert Dataset Lock Mechanism to Generic Resource Lock (#1338) ### Feature or Bugfix  - Feature - Bugfix - Refactoring ### Detail - Convert Dataset Lock Mechanism to Generic Resource Lock - Extend locking to Share principals (i.e. EnvironmentGroup and Consumption Roles) - Making locking a generic component not tied to datasets ### Relates - #1093 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. --------- Co-authored-by: dlpzx <[email protected]> commit e3b8658 Author: Petros Kalos <[email protected]> Date: Thu Jun 27 2024 12:50:59 GMT-0400 (Eastern Daylight Time) ignore ruff change in blame (#1372) ### Feature or Bugfix  - Feature - Bugfix - Refactoring ### Detail - <feature1 or bug1> - <feature2 or bug2> ### Relates - <URL or Ticket> ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. commit 2e80de4 Author: dlpzx <[email protected]> Date: Thu Jun 27 2024 10:59:18 GMT-0400 (Eastern Daylight Time) Generic shares_base module and specific s3_datasets_shares module - part 11 (renaming and cleaning up s3_shares) (#1359) ### Feature or Bugfix - Refactoring ### Detail As explained in the design for #1123 and #1283 we are trying to implement generic `datasets_base` and `shares_base` modules that can be used by any type of datasets and by any type of shareable object in a generic way. This is one of the last PRs focused on renaming files and cleaning-up the s3_datasets_shares module. The first step is a consolidation of the file and classes names in the services to clearly refer to s3_shares: - `services.managed_share_policy_service.SharePolicyService` ---> `services.s3_share_managed_policy_service.S3SharePolicyService` - `services.dataset_sharing_alarm_service.DatasetSharingAlarmService` --> `services.s3_share_alarm_service.S3ShareAlarmService` - `services.managed_share_policy_service.SharePolicyService` --> `services.s3_share_managed_policy_service.S3SharePolicyService` 👀 The main refactoring happens in what is used to be `services.dataset_sharing_service`. - The part that implements the `DatasetServiceInterface` has been moved to `services/s3_share_dataset_service.py` as the `S3ShareDatasetService` - The part used in the resolvers and by other methods has been renamed as `services.s3_share_service.py` and the methods for the folder/table permissions are also added to the S3ShareService (from share_item_service) Lastly, there is one method previously in share_item_service that has been moved to the GlueClient directly as `get_glue_database_from_catalog`. ### Relates - #1283 - #1123 - #955 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. commit 1c09015 Author: Noah Paige <[email protected]> Date: Thu Jun 27 2024 04:16:14 GMT-0400 (Eastern Daylight Time) fix listOrganizationGroupPermissions (#1369) ### Feature or Bugfix  - Bugfix ### Detail - Fix listOrganizationGroupPermissions ### Relates - <URL or Ticket> ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. commit 976ec6b Author: dlpzx <[email protected]> Date: Thu Jun 27 2024 04:13:14 GMT-0400 (Eastern Daylight Time) Add search (Autocomplete) in create pipelines (#1368) ### Feature or Bugfix - Feature ### Detail Autocomplete for environments and teams in the following frontend views as requested in #1012. This PR implements it for createPipelines ### Relates - #1012 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. commit 6c909a3 Author: Noah Paige <[email protected]> Date: Wed Jun 26 2024 11:18:04 GMT-0400 (Eastern Daylight Time) fix migration to not rely on OrganizationService or RequestContext (#1361) ### Feature or Bugfix  - Bugfix ### Detail - Ensure migration script does not need RequestContext - otherwise fails in migration trigger lambda as context info not set / available ### Relates - #1306 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. commit 90835fb Author: Anushka Singh <[email protected]> Date: Wed Jun 26 2024 11:17:22 GMT-0400 (Eastern Daylight Time) Issue1248: Persistent Email Reminders (#1354) ### Feature or Bugfix - Feature ### Detail - When a share request is initiated and remains pending for an extended period, dataset producers will receive automated email reminders at predefined intervals. These reminders will prompt producers to either approve or extend the share request, thereby preventing delays in accessing datasets. Attaching screenshots for emails: <img width="1336" alt="Screenshot 2024-06-20 at 5 34 31 PM" src="https://github.com/data-dot-all/dataall/assets/26413731/d7be28c3-5c98-4146-92b1-295e136137a3"> <img width="1322" alt="Screenshot 2024-06-20 at 5 34 52 PM" src="https://github.com/data-dot-all/dataall/assets/26413731/047556e8-59ee-4ebf-b8a7-c0a6684e2a63"> - Email will be sent every Monday at 9am UTC. Schedule can be changed in cron expression in container.py ### Relates - #1248 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. --------- Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: Anushka Singh <[email protected]> Co-authored-by: trajopadhye <[email protected]> Co-authored-by: Mohit Arora <[email protected]> Co-authored-by: rbernota <[email protected]> Co-authored-by: Rick Bernotas <[email protected]> Co-authored-by: Raj Chopde <[email protected]> Co-authored-by: Noah Paige <[email protected]> Co-authored-by: dlpzx <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: jaidisido <[email protected]> Co-authored-by: dlpzx <[email protected]> Co-authored-by: mourya-33 <[email protected]> Co-authored-by: nikpodsh <[email protected]> Co-authored-by: MK <[email protected]> Co-authored-by: Manjula <[email protected]> Co-authored-by: Zilvinas Saltys <[email protected]> Co-authored-by: Zilvinas Saltys <[email protected]> Co-authored-by: Daniel Lorch <[email protected]> Co-authored-by: Tejas Rajopadhye <[email protected]> Co-authored-by: Zilvinas Saltys <[email protected]> Co-authored-by: Sofia Sazonova <[email protected]> Co-authored-by: Sofia Sazonova <[email protected]> commit e477bdf Author: Noah Paige <[email protected]> Date: Wed Jun 26 2024 10:39:09 GMT-0400 (Eastern Daylight Time) Enforce non null on GQL query string if non null defined (#1362) ### Feature or Bugfix  - Bugfix ### Detail - Add `String!` to ensure non null input argument on FE if defined as such on backend GQL operation for `listS3DatasetsSharedWithEnvGroup` ### Relates ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. commit d6b59b3 Author: Noah Paige <[email protected]> Date: Wed Jun 26 2024 08:48:52 GMT-0400 (Eastern Daylight Time) Fix Init Share Base (#1360) ### Feature or Bugfix  - Bugfix ### Detail - Need to register processors in init for s3 dataset shares API module ### Relates ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. commit bd3698c Author: Petros Kalos <[email protected]> Date: Wed Jun 26 2024 05:19:14 GMT-0400 (Eastern Daylight Time) split cognito urls setup and cognito user creation (#1366) ### Feature or Bugfix - Bugfix ### Details For more details about the issue read #1353 In this PR we are solving the problem by splitting the configuration of Cognito in 2. * First part (cognito_users_config.py) is setting up the required groups and users and runs after UserPool deployment * Second part (cognito_urls_config.py) is setting up Cognito's callback/logout urls and runs after the CloudFront deployment We chose to split the functionality because we need to have the users/groups setup for the integration tests which are run after the backend deployment. The other althernative is to keep the config functionality as one but make the integ tests run after CloudFront stage. ### Relates - Solves #1353 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

commit 4425e756 Author: Noah Paige <[email protected]> Date: Mon Jul 08 2024 11:57:31 GMT-0400 (Eastern Daylight Time) Fix commit 4cd2bf77 Author: Noah Paige <[email protected]> Date: Mon Jul 08 2024 11:56:38 GMT-0400 (Eastern Daylight Time) Fix commit 22a6f6ef Author: Noah Paige <[email protected]> Date: Mon Jul 08 2024 11:28:07 GMT-0400 (Eastern Daylight Time) Add integ tests commit 4fb7d653 Author: Noah Paige <[email protected]> Date: Mon Jul 08 2024 11:26:36 GMT-0400 (Eastern Daylight Time) Merge env test changes commit 4cf42e8 Author: Petros Kalos <[email protected]> Date: Fri Jul 05 2024 08:19:34 GMT-0400 (Eastern Daylight Time) improve docs commit 65f930a Author: Petros Kalos <[email protected]> Date: Fri Jul 05 2024 08:10:56 GMT-0400 (Eastern Daylight Time) fix failures commit 170b7ce Author: Petros Kalos <[email protected]> Date: Wed Jul 03 2024 10:52:20 GMT-0400 (Eastern Daylight Time) add group/consumption_role invite/remove tests commit ba77d69 Author: dlpzx <[email protected]> Date: Wed Jul 03 2024 06:51:47 GMT-0400 (Eastern Daylight Time) Rename alias for env_vars kms key in cognito lambdas FE and BE (#1385) ### Feature or Bugfix - Bugfix ### Detail For the case in which we deploy FE and BE in us-east-1 the new lambda env_key alias is the same one for TriggerFunctionCognitoUrlsConfig in FE and for TriggerFunctionCognitoConfig in BE, which results in a failure of the CICD in the FE stack because the alias already exists. This PR changes the name of both aliases to avoid this conflict. It also adds envname to avoid issues with other deployment environments/tooling account in the future ### Relates - <URL or Ticket> ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. commit e5923a9 Author: dlpzx <[email protected]> Date: Wed Jul 03 2024 04:27:11 GMT-0400 (Eastern Daylight Time) Fix lambda_env_key out of scope for vpc-facing cognito setup (#1384) ### Feature or Bugfix - Bugfix ### Detail The KMS key for the Lambda environment variables in the Cognito IdP stack was defined inside an if-clause for internet facing frontend. Outside of that if, for vpc-facing architecture the kms key does not exist and the CICD pipeline fails. This PRs move the creation of the KMS key outside of the if. ### Relates ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. commit 3ccacfc Author: Noah Paige <[email protected]> Date: Mon Jul 01 2024 13:56:58 GMT-0400 (Eastern Daylight Time) Add delete docs not found when re indexing in catalog task (#1365) ### Feature or Bugfix  - Feature ### Detail - Add logic to Catalog Indexer Task to Delete Docs No Longer in RDS - TODO: Add Ability to Re-index Catalog Items via Dataall Admin UI ### Relates - #1078 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. commit e2817a1 Author: Noah Paige <[email protected]> Date: Mon Jul 01 2024 05:14:07 GMT-0400 (Eastern Daylight Time) Fix/glossary status (#1373) ### Feature or Bugfix  - Bugfix ### Detail - Add back `status` to Glossary GQL Object for GQL Operations (getGlossary, listGlossaries) - Fix `listOrganizationGroupPermissions` enforce non null on FE ### Relates ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. commit c3c58bd Author: Petros Kalos <[email protected]> Date: Fri Jun 28 2024 06:55:42 GMT-0400 (Eastern Daylight Time) add environment tests (#1371) ### Feature or Bugfix Feature ### Detail * add list_environment tests * add test for updating an environment (via update_stack) * generalise the polling functions for stacks ### Relates #1220 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. commit e913d48 Author: dlpzx <[email protected]> Date: Fri Jun 28 2024 04:15:49 GMT-0400 (Eastern Daylight Time) Add search (Autocomplete) in miscellaneous dropdowns (#1367) ### Feature or Bugfix - Feature ### Detail Autocomplete for environments and teams in the following frontend views as requested in #1012. In this case the views required custom dropdowns. ❗ I used `noOptionsText` whenever it was necessary instead of checking groupOptions lenght >0 - [x] DatasetEditForm.js -> ❗ I kept the stewards field as `freesolo` - what that means is that users CAN specify options that are not on the list. I would like the reviewer to confirm this is what we want. At the end stewardship is a delegation of permissions, it makes sense that delegation happens to other teams. Also changed DatasetCreateForm - [X] RequestDashboardAccessModal.js - already implemented, minor changes - [X] EnvironmentTeamInviteForm.js - already implemented, minor changes. -> Kept `freesolo` because invited teams might not be the user teams. Same reason why there is no check for groupOptions == 0, if there are no options there is still the free text option. - [X] EnvironmentRoleAddForm.js - [X] NetworkCreateModal.js ### Relates - #1012 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. commit ee71d7b Author: Tejas Rajopadhye <[email protected]> Date: Thu Jun 27 2024 14:08:27 GMT-0400 (Eastern Daylight Time) [Gh 1301] Enhancement Feature - Bulk share reapply on dataset (#1363) ### Feature or Bugfix - Feature ### Detail - Adds feature to reapply shares in bulk for a dataset. - Also contains bugfix for AWS worker lambda errors ### Relates - #1301 - #1364 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? N/A - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? N/A - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? N/A - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? N/A - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. --------- Co-authored-by: trajopadhye <[email protected]> commit 27f1ad7 Author: Noah Paige <[email protected]> Date: Thu Jun 27 2024 13:18:32 GMT-0400 (Eastern Daylight Time) Convert Dataset Lock Mechanism to Generic Resource Lock (#1338) ### Feature or Bugfix  - Feature - Bugfix - Refactoring ### Detail - Convert Dataset Lock Mechanism to Generic Resource Lock - Extend locking to Share principals (i.e. EnvironmentGroup and Consumption Roles) - Making locking a generic component not tied to datasets ### Relates - #1093 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. --------- Co-authored-by: dlpzx <[email protected]> commit e3b8658 Author: Petros Kalos <[email protected]> Date: Thu Jun 27 2024 12:50:59 GMT-0400 (Eastern Daylight Time) ignore ruff change in blame (#1372) ### Feature or Bugfix  - Feature - Bugfix - Refactoring ### Detail - <feature1 or bug1> - <feature2 or bug2> ### Relates - <URL or Ticket> ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. commit 2e80de4 Author: dlpzx <[email protected]> Date: Thu Jun 27 2024 10:59:18 GMT-0400 (Eastern Daylight Time) Generic shares_base module and specific s3_datasets_shares module - part 11 (renaming and cleaning up s3_shares) (#1359) ### Feature or Bugfix - Refactoring ### Detail As explained in the design for #1123 and #1283 we are trying to implement generic `datasets_base` and `shares_base` modules that can be used by any type of datasets and by any type of shareable object in a generic way. This is one of the last PRs focused on renaming files and cleaning-up the s3_datasets_shares module. The first step is a consolidation of the file and classes names in the services to clearly refer to s3_shares: - `services.managed_share_policy_service.SharePolicyService` ---> `services.s3_share_managed_policy_service.S3SharePolicyService` - `services.dataset_sharing_alarm_service.DatasetSharingAlarmService` --> `services.s3_share_alarm_service.S3ShareAlarmService` - `services.managed_share_policy_service.SharePolicyService` --> `services.s3_share_managed_policy_service.S3SharePolicyService` 👀 The main refactoring happens in what is used to be `services.dataset_sharing_service`. - The part that implements the `DatasetServiceInterface` has been moved to `services/s3_share_dataset_service.py` as the `S3ShareDatasetService` - The part used in the resolvers and by other methods has been renamed as `services.s3_share_service.py` and the methods for the folder/table permissions are also added to the S3ShareService (from share_item_service) Lastly, there is one method previously in share_item_service that has been moved to the GlueClient directly as `get_glue_database_from_catalog`. ### Relates - #1283 - #1123 - #955 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. commit 1c09015 Author: Noah Paige <[email protected]> Date: Thu Jun 27 2024 04:16:14 GMT-0400 (Eastern Daylight Time) fix listOrganizationGroupPermissions (#1369) ### Feature or Bugfix  - Bugfix ### Detail - Fix listOrganizationGroupPermissions ### Relates - <URL or Ticket> ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. commit 976ec6b Author: dlpzx <[email protected]> Date: Thu Jun 27 2024 04:13:14 GMT-0400 (Eastern Daylight Time) Add search (Autocomplete) in create pipelines (#1368) ### Feature or Bugfix - Feature ### Detail Autocomplete for environments and teams in the following frontend views as requested in #1012. This PR implements it for createPipelines ### Relates - #1012 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. commit 6c909a3 Author: Noah Paige <[email protected]> Date: Wed Jun 26 2024 11:18:04 GMT-0400 (Eastern Daylight Time) fix migration to not rely on OrganizationService or RequestContext (#1361) ### Feature or Bugfix  - Bugfix ### Detail - Ensure migration script does not need RequestContext - otherwise fails in migration trigger lambda as context info not set / available ### Relates - #1306 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. commit 90835fb Author: Anushka Singh <[email protected]> Date: Wed Jun 26 2024 11:17:22 GMT-0400 (Eastern Daylight Time) Issue1248: Persistent Email Reminders (#1354) ### Feature or Bugfix - Feature ### Detail - When a share request is initiated and remains pending for an extended period, dataset producers will receive automated email reminders at predefined intervals. These reminders will prompt producers to either approve or extend the share request, thereby preventing delays in accessing datasets. Attaching screenshots for emails: <img width="1336" alt="Screenshot 2024-06-20 at 5 34 31 PM" src="https://github.com/data-dot-all/dataall/assets/26413731/d7be28c3-5c98-4146-92b1-295e136137a3"> <img width="1322" alt="Screenshot 2024-06-20 at 5 34 52 PM" src="https://github.com/data-dot-all/dataall/assets/26413731/047556e8-59ee-4ebf-b8a7-c0a6684e2a63"> - Email will be sent every Monday at 9am UTC. Schedule can be changed in cron expression in container.py ### Relates - #1248 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. --------- Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: Anushka Singh <[email protected]> Co-authored-by: trajopadhye <[email protected]> Co-authored-by: Mohit Arora <[email protected]> Co-authored-by: rbernota <[email protected]> Co-authored-by: Rick Bernotas <[email protected]> Co-authored-by: Raj Chopde <[email protected]> Co-authored-by: Noah Paige <[email protected]> Co-authored-by: dlpzx <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: jaidisido <[email protected]> Co-authored-by: dlpzx <[email protected]> Co-authored-by: mourya-33 <[email protected]> Co-authored-by: nikpodsh <[email protected]> Co-authored-by: MK <[email protected]> Co-authored-by: Manjula <[email protected]> Co-authored-by: Zilvinas Saltys <[email protected]> Co-authored-by: Zilvinas Saltys <[email protected]> Co-authored-by: Daniel Lorch <[email protected]> Co-authored-by: Tejas Rajopadhye <[email protected]> Co-authored-by: Zilvinas Saltys <[email protected]> Co-authored-by: Sofia Sazonova <[email protected]> Co-authored-by: Sofia Sazonova <[email protected]> commit e477bdf Author: Noah Paige <[email protected]> Date: Wed Jun 26 2024 10:39:09 GMT-0400 (Eastern Daylight Time) Enforce non null on GQL query string if non null defined (#1362) ### Feature or Bugfix  - Bugfix ### Detail - Add `String!` to ensure non null input argument on FE if defined as such on backend GQL operation for `listS3DatasetsSharedWithEnvGroup` ### Relates ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. commit d6b59b3 Author: Noah Paige <[email protected]> Date: Wed Jun 26 2024 08:48:52 GMT-0400 (Eastern Daylight Time) Fix Init Share Base (#1360) ### Feature or Bugfix  - Bugfix ### Detail - Need to register processors in init for s3 dataset shares API module ### Relates ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. commit bd3698c Author: Petros Kalos <[email protected]> Date: Wed Jun 26 2024 05:19:14 GMT-0400 (Eastern Daylight Time) split cognito urls setup and cognito user creation (#1366) ### Feature or Bugfix - Bugfix ### Details For more details about the issue read #1353 In this PR we are solving the problem by splitting the configuration of Cognito in 2. * First part (cognito_users_config.py) is setting up the required groups and users and runs after UserPool deployment * Second part (cognito_urls_config.py) is setting up Cognito's callback/logout urls and runs after the CloudFront deployment We chose to split the functionality because we need to have the users/groups setup for the integration tests which are run after the backend deployment. The other althernative is to keep the config functionality as one but make the integ tests run after CloudFront stage. ### Relates - Solves #1353 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

dlpzx added status: in-progress This issue has been picked and is being implemented type: newfeature New feature request priority: high effort: large labels May 17, 2024

This was referenced May 17, 2024

Create generic dataset_base and s3_dataset modules from current datasets #1123

Closed

Generic shares_base module and specific s3_datasets_shares module - part 1 (renaming, enums and permissions) #1284

Merged

dlpzx mentioned this issue May 22, 2024

Generic dataset module and specific s3_datasets module - part 6 (Frontend) #1292

Merged

This was referenced May 22, 2024

Generic shares_base module and specific s3_datasets_shares module - part 2 (db objects to shares_base) #1294

Merged

ShareView remake #1277

Merged

This was referenced May 23, 2024

Generic shares_base module and specific s3_datasets_shares module - part 3 (share processor and manager interfaces) #1298

Merged

More robust handling of exceptions inside share manager ECS task #1266

Open

dlpzx self-assigned this May 31, 2024

dlpzx mentioned this issue Jun 6, 2024

Redshift Data Sharing #955

Closed

dlpzx mentioned this issue Jun 7, 2024

Generic shares_base module and specific s3_datasets_shares module - part 6 (Split APIs and graphql types) #1320

Merged

dlpzx mentioned this issue Jun 18, 2024

Generic shares_base module and specific s3_datasets_shares module - part 7 (share_object_service) #1340

Merged

dlpzx mentioned this issue Jun 19, 2024

Generic shares_base module and specific s3_datasets_shares module - part 8 (share_item_service) #1350

Merged

dlpzx mentioned this issue Jun 20, 2024

Generic shares_base module and specific s3_datasets_shares module - part 9 (share db repositories) #1351

Merged

dlpzx mentioned this issue Jun 21, 2024

Generic shares_base module and specific s3_datasets_shares module - part 10 (other s3 references in shares_base) #1357

Merged

dlpzx mentioned this issue Jun 25, 2024

Generic shares_base module and specific s3_datasets_shares module - part 11 (renaming and cleaning up s3_shares) #1359

Merged

dlpzx closed this as completed Jun 28, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Create generic shares_base and s3_datasets_shares modules from current dataset_sharing #1283

Create generic shares_base and s3_datasets_shares modules from current dataset_sharing #1283

dlpzx commented May 17, 2024 •

edited

Loading

dlpzx commented May 17, 2024 •

edited

Loading

dlpzx commented Jun 28, 2024

Create generic shares_base and s3_datasets_shares modules from current dataset_sharing #1283

Create generic shares_base and s3_datasets_shares modules from current dataset_sharing #1283

Comments

dlpzx commented May 17, 2024 • edited Loading

Backend

1. shares_base module

2. s3_datasets_shares module

Frontend

Config.json

dlpzx commented May 17, 2024 • edited Loading

Step-by-step implementation plan

dlpzx commented Jun 28, 2024

dlpzx commented May 17, 2024 •

edited

Loading

1. `shares_base` module

2. `s3_datasets_shares` module

dlpzx commented May 17, 2024 •

edited

Loading