
Generic shares_base module and specific s3_datasets_shares module - part 3 (share processor and manager interfaces) #1298

Merged
merged 15 commits into main on Jun 4, 2024

Conversation

@dlpzx (Contributor) commented May 23, 2024

Feature or Bugfix

  • Feature
  • Refactoring

Detail

As explained in the design for #1123 and #1283 we are trying to implement generic datasets_base and shares_base modules that can be used by any type of datasets and by any type of shareable object in a generic way.

In this PR:

  • Move ECS handler for sharing tasks to shares_base
  • Move sharing tasks to shares_base
  • Move DataSharingService to shares_base and rename it as SharingService. Make SharingService generic and remove any reference to specific items. In this process:
    • Attach and delete permissions for tables and folders are moved into the specific share processor. Delete permissions are processed item by item and moved into the ShareItemService.
    • Some methods are copied from ShareObjectRepository in s3_datasets_shares to shares_base. They have not been removed from s3_datasets_shares; instead, a TODO marks the methods that have been copied. The migration and clean-up of ShareObjectRepository will be done in a follow-up PR.
    • Introduce DatasetBaseRepository.get_dataset_by_uri(session, share.datasetUri) to avoid future circular dependencies: shares_base depends only on datasets_base.
    • Clean up and consolidate methods: remove updates of share items outside of state-machine transactions. Only re-used methods and dataset-lock handling remain in share_manager. TODO: the dataset_lock manager should be its own service outside of sharing, but that is out of scope for this PR.
    • Add updates of share_item statuses in except clauses for more robust sharing.
  • Introduce ShareProcessor and ShareManager interfaces and use them in the SharingService: instead of the processor inheriting the manager class, the processor uses the ShareProcessor interface and constructs a manager when needed.
  • Introduce a new load ImportMode SHARES_TASK and register the ShareProcessors in s3_datasets_shares (backend/dataall/modules/s3_datasets_shares/__init__.py)
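The processor/manager split described above can be pictured with a small sketch. Class and method names here (SharesProcessorInterface, DemoShareProcessor, grant_access) are illustrative stand-ins, not the exact data.all API; the point is that the processor implements a generic interface and constructs a manager per item instead of inheriting from the manager class:

```python
from abc import ABC, abstractmethod


class SharesProcessorInterface(ABC):
    """Generic processor interface used by the SharingService (names illustrative)."""

    @abstractmethod
    def process_approved_shares(self) -> bool: ...


class _DemoManager:
    """Stand-in for a concrete ShareManager (e.g. the S3 access point manager)."""

    def __init__(self, item):
        self.item = item

    def grant_access(self) -> None:
        if self.item.get("broken"):
            raise RuntimeError("simulated AWS failure")


class DemoShareProcessor(SharesProcessorInterface):
    """The processor composes a manager on demand rather than inheriting it."""

    def __init__(self, items):
        self.items = items

    def process_approved_shares(self) -> bool:
        success = True
        for item in self.items:
            manager = _DemoManager(item)  # constructed only when needed
            try:
                manager.grant_access()
                item["status"] = "Share_Succeeded"
            except Exception:
                # updating the status in the except clause keeps items out of limbo
                item["status"] = "Share_Failed"
                success = False
        return success
```

A failing item marks itself Share_Failed without aborting the other items, which matches the "more robust sharing" bullet above.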


See full detail of SharingService design in #1283

Next steps/Open questions

For failures I think we should roll back whatever actions were performed. For example, if sharing a table fails at one step, we should revert all the steps executed before. @petrkalos @SofiaSazonova @noah-paige what do you think?
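The rollback idea could take the shape of compensating actions, roughly like this sketch (nothing here is implemented in the PR; names and structure are hypothetical and only illustrate the proposal):

```python
def run_with_rollback(steps):
    """Execute (do, undo) pairs; on failure, undo completed steps in reverse order.

    `steps` is a list of (do, undo) callables, e.g. (grant_glue_permissions,
    revoke_glue_permissions). Only the undos of steps that completed are run.
    """
    completed = []
    try:
        for do, undo in steps:
            do()
            completed.append(undo)
    except Exception:
        for undo in reversed(completed):
            undo()  # best-effort compensation for a half-applied share
        raise
```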

Relates

Security

Please answer the questions below briefly where applicable, or write N/A. Based on the OWASP Top 10.

  • Does this PR introduce or modify any input fields or queries - this includes
    fetching data from storage outside the application (e.g. a database, an S3 bucket)?
    • Is the input sanitized?
    • What precautions are you taking before deserializing the data you consume?
    • Is injection prevented by parametrizing queries?
    • Have you ensured no eval or similar functions are used?
  • Does this PR introduce any functionality or component that requires authorization?
    • How have you ensured it respects the existing AuthN/AuthZ mechanisms?
    • Are you logging failed auth attempts?
  • Are you using or adding any cryptographic features?
    • Do you use standard, proven implementations?
    • Are the used keys controlled by the customer? Where are they stored?
  • Are you introducing any new policies/roles/users?
    • Have you used the least-privilege principle? How?

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@dlpzx dlpzx force-pushed the feat/generic-dataset-sharing-3 branch from 2c78c13 to d92479d Compare May 28, 2024 08:56
@dlpzx dlpzx force-pushed the feat/generic-dataset-sharing-3 branch from d92479d to 5b7c1d0 Compare May 28, 2024 11:55
# Conflicts:
#	backend/dataall/modules/s3_datasets_shares/services/share_object_service.py
@dlpzx (Contributor) commented May 28, 2024

Testing locally:

Approve shares

First we will test that sharing works: after approval, the share task is correctly handled and items are processed as usual. We will test the success cases. For folders and tables, we also need to verify that the UI permissions (GET_TABLE, GET_FOLDER) are attached.

  • S3 bucket share
  • S3 access points share (1 folder) - everything completed, DA permissions attached
  • S3 access points share (multiple folders)
  • Glue table share (1 table)
  • Glue table share (multiple tables)
  • S3 bucket share + S3 access points share + Glue table all together

Then unsuccessful cases:

  • Approve S3 bucket share + S3 access points share + Glue table all together. We delete the principal IAM role right after approving the share; the items then end up in Share_Failed.

Mix of success/failures:

  • For a malformed KMS policy folders and bucket sharing failed and tables succeeded. Items do not end up in limbo states and do not make other types of sharing fail.

Then cases with the dataset_lock:

  • Lock the dataset in the database and process a new share request. Let it wait until it times out and verify that all items end in Share_Failed.
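The lock-wait behavior exercised above can be pictured with a small polling sketch. The function name, the `try_acquire` callable, and the timeout/poll values are assumptions for illustration, not the actual dataset_lock implementation:

```python
import time


def wait_for_dataset_lock(try_acquire, timeout_s=60.0, poll_s=5.0):
    """Poll until the dataset lock is acquired or the timeout expires.

    Returns True if acquired; False on timeout, in which case the caller
    marks all share items Share_Failed (or Revoke_Failed on revoke).
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if try_acquire():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```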

Revoke shares

After revoking, the share task is correctly handled and items are processed as usual. We will test the success cases. For folders and tables, we also need to verify that the UI permissions (GET_TABLE, GET_FOLDER) are removed.

  • S3 bucket share
  • S3 access points share (1 folder) - permissions deleted
  • S3 access points share (multiple folders)
  • Glue table share (1 table)
  • Glue table share (multiple tables)
  • S3 bucket share + S3 access points share + Glue table all together

Then unsuccessful cases:

  • Revoke S3 bucket share + S3 access points share + Glue table all together. We delete the principal IAM role right after revoking the share; the items then end up in Revoke_Failed. There is an error, but it is unrelated to this PR: it is impossible to delete the role fast enough, so before deletion the task already reaches the processors/managers, which do not check whether the role still exists when revoking access point shares.

Then cases with the dataset_lock:

  • Lock the dataset in the database and process a new share request. Let it wait until it times out and verify that all items end in Revoke_Failed.

Verify shares

  • Verify after sharing - all good, datetime updated
  • Delete the principal IAM role and verify - all items updated as unhealthy
  • Introduce an error in the table share and verify all items - table unhealthy, other items healthy
  • On top of the above, introduce an error in the bucket share and verify - table and bucket unhealthy
  • Introduce an error in the folder share and verify the folder only - folder unhealthy

Re-apply shares

  • From the previous state, hit re-apply - all 3 types of items are re-shared

Testing in AWS

Needed because we are changing the ECS tasks and the handlers. We just need to test that the handler triggers the ECS tasks and that the processors are loaded:

  • CICD completes and ECS tasks still exist
  • Request share (verify, approve or revoke) - all good! Checked the logs; all the different shareItems are verified

Found issues:

  • The clean-up of malformed KMS policies has some issues: I deleted an IAM role and re-created it, and the policy then cannot get out of the malformed state
  • Principal IAM role checks inside the revoke managers/processors (explained above)

@@ -537,73 +538,6 @@ def _validate_group_membership(session, share_object_group, environment_uri):
message=f'Team: {share_object_group} is not a member of the environment {environment_uri}',
)

@staticmethod

This is moved to the share_item_service because it now applies to each item, not to the whole share.


I tried having GitHub treat backend/dataall/modules/s3_datasets_shares/services/share_processors/lakeformation_process_share.py as a rename-and-edit, but it is ignoring it; the same for the other processors. The internal logic has barely been touched, just the construction.

source_env_group,
env_group,
)


get_share_data_items_by_type is new; it replaces get_share_data_items, which was called in the data_sharing_service:

(shared_tables, shared_folders, shared_buckets) = ShareObjectRepository.get_share_data_items(
    session, share_uri, ShareItemStatus.Share_Approved.value
)
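Conceptually, the per-type lookup replaces the fixed three-tuple with one generic call per registered item type. A toy sketch over plain dicts (the real method queries the database through the repository; this signature is an illustration, not the actual one):

```python
def get_share_data_items_by_type(items, item_type, status=None):
    """Return share items of a single type, optionally filtered by status.

    Called once per registered item type (DatasetTable, DatasetStorageLocation,
    bucket, ...) instead of returning a hard-coded (tables, folders, buckets)
    tuple, so shares_base stays agnostic of the concrete shareable types.
    """
    return [
        item for item in items
        if item["itemType"] == item_type
        and (status is None or item["status"] == status)
    ]
```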

def list_all_active_share_objects(session) -> [ShareObject]:
    return session.query(ShareObject).filter(ShareObject.deleted.is_(None)).all()

@staticmethod

This one has been modified: we can now send a message.

@@ -241,13 +238,6 @@ def __init__(self, state):
]
},
),

This transition is unnecessary if we make failure handling more flexible.

@@ -4,7 +4,6 @@
class ShareableType(GraphQLEnumMapper):
Table = 'DatasetTable'
StorageLocation = 'DatasetStorageLocation'
View = 'View'

Remove unused code. I know it could have waited for another PR, but it was very small.

@@ -1498,21 +1498,6 @@ def test_verify_items_share_request(db, client, user2, group2, share3_processed,
assert status == ShareItemHealthStatus.PendingVerify.value



Removed because this function does not exist anymore; we will introduce more tests once the refactoring is done.

@dlpzx dlpzx requested a review from petrkalos May 29, 2024 16:11
@dlpzx dlpzx marked this pull request as ready for review May 29, 2024 16:11
@dlpzx dlpzx force-pushed the feat/generic-dataset-sharing-3 branch from d2ffa77 to a7940f1 Compare June 4, 2024 07:35
@dlpzx dlpzx merged commit e20e750 into main Jun 4, 2024
10 checks passed
@dlpzx dlpzx deleted the feat/generic-dataset-sharing-3 branch June 6, 2024 12:20