Generic shares_base module and specific s3_datasets_shares module - part 2 (db objects to shares_base) #1294

Merged
dlpzx merged 2 commits into main from feat/generic-dataset-sharing-2-simplified
May 23, 2024

Conversation

dlpzx
Contributor

@dlpzx dlpzx commented May 22, 2024

Feature or Bugfix

  • Refactoring

Detail

As explained in the design for #1123 and #1283, we are implementing generic datasets_base and shares_base modules that can be used by any type of dataset and by any type of shareable object.

In this PR:

  • Move the share state machines into their own file in the shares_base module.
  • Create a new ShareObjectRepository and copy some generic methods to it.
  • Move the db share objects to shares_base:
    • ShareObject has a field called datasetUri. For the scope of S3 and Redshift datasets we can leave it as datasetUri, but if we want to implement other kinds of sharing we should rethink it, because we need to store the "approvers" of the share in some way. For the moment I am not going down that path; once shares and S3 are decoupled, we can see how to implement completely generic sharing.
    • ShareObjectItem includes 3 fields related to the Glue tables. We need to remove them from the backend code and look up the information using the itemType and itemUri instead. Left for part 3 to keep this PR clean.

Relates

Security

Please answer the questions below briefly where applicable, or write N/A. Based on OWASP 10.

  • Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)?
    • Is the input sanitized?
    • What precautions are you taking before deserializing the data you consume?
    • Is injection prevented by parametrizing queries?
    • Have you ensured no eval or similar functions are used?
  • Does this PR introduce any functionality or component that requires authorization?
    • How have you ensured it respects the existing AuthN/AuthZ mechanisms?
    • Are you logging failed auth attempts?
  • Are you using or adding any cryptographic features?
    • Do you use standard, proven implementations?
    • Are the used keys controlled by the customer? Where are they stored?
  • Are you introducing any new policies/roles/users?
    • Have you used the least-privilege principle? How?

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@dlpzx dlpzx force-pushed the feat/generic-dataset-sharing-2-simplified branch from ec2404e to d78d95f Compare May 22, 2024 08:09
@dlpzx dlpzx changed the title Move shares state machines to shares_base Generic shares_base module and specific s3_datasets_shares module - part 2 May 22, 2024
@dlpzx dlpzx changed the title Generic shares_base module and specific s3_datasets_shares module - part 2 Generic shares_base module and specific s3_datasets_shares module - part 2 (db objects to shares_base) May 22, 2024
@dlpzx dlpzx requested review from SofiaSazonova and petrkalos May 22, 2024 08:24
@dlpzx dlpzx marked this pull request as ready for review May 22, 2024 08:24
Contributor

@petrkalos petrkalos left a comment

some stylistic nits, feel free to disregard

def get_transition_target(self, prev_state):
    if self.validate_transition(prev_state):
        for target_state, list_prev_states in self._transitions.items():
            if prev_state in list_prev_states:
Contributor

nit: check with not in and drop the else branch

Contributor Author

I just copy/pasted the code. I was very tempted to change the implementation of the state machines, but I think we should review them in a separate PR.
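
As a rough illustration of the nit above, the restructuring could look like the sketch below. Only the method and attribute names come from the snippet under review; the class name, the body of validate_transition, and the None return values are assumptions made for illustration and may not match the real implementation.

```python
class Transition:
    """Hypothetical stand-in for the state machine transition class."""

    def __init__(self, name, transitions):
        self._name = name
        # Maps each target state to the list of states it can be reached from.
        self._transitions = transitions

    def validate_transition(self, prev_state):
        # Assumed behaviour: valid if prev_state is a source of any target state.
        return any(prev_state in sources for sources in self._transitions.values())

    def get_transition_target(self, prev_state):
        # Guard clause first, so the happy path needs no else branch.
        if not self.validate_transition(prev_state):
            return None  # assumption: the real code may raise or return something else
        for target_state, list_prev_states in self._transitions.items():
            # Skip non-matching targets with "not in" instead of if/else.
            if prev_state not in list_prev_states:
                continue
            return target_state
        return None
```

With guard clauses, each condition is handled where it is detected and the nesting depth stays at one level.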

def __init__(self, name, transitions):
    self._name = name
    self._transitions = transitions
    self._all_source_states = [*set([item for sublist in transitions.values() for item in sublist])]
Contributor

nits

  • itertools.chain.from_iterable(transitions.values()) should flatten your values
  • why convert the set to a list? I guess you want to dedup, but why convert back to a list?
  • by converting to a set you assume that the values are hashable; I can see that they are enum values so they must be, but give it a thought
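
A minimal sketch of the flattening suggested in these nits, assuming the transitions dictionary maps each target state to a list of source states. The ShareStatus enum and its members are hypothetical and exist only to make the example self-contained.

```python
from enum import Enum
from itertools import chain


class ShareStatus(Enum):  # hypothetical enum, for illustration only
    Draft = "Draft"
    Submitted = "Submitted"
    Approved = "Approved"
    Rejected = "Rejected"


# Assumed shape: target state -> list of states it can be reached from.
transitions = {
    ShareStatus.Submitted: [ShareStatus.Draft, ShareStatus.Rejected],
    ShareStatus.Approved: [ShareStatus.Submitted],
    ShareStatus.Rejected: [ShareStatus.Submitted],
}

# Flatten the lists of source states and deduplicate them in one step.
# Enum members are hashable by default, so they can be stored in a set.
all_source_states = set(chain.from_iterable(transitions.values()))

# The comprehension-based version from the PR, kept for comparison.
original = [*set([item for sublist in transitions.values() for item in sublist])]
```

Keeping the result as a set also makes later membership checks (state in all_source_states) constant time on average, which is one reason not to convert back to a list.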

    self._name = name
    self._transitions = transitions
    self._all_source_states = [*set([item for sublist in transitions.values() for item in sublist])]
    self._all_target_states = [item for item in transitions.keys()]
Contributor

nit: why is this needed? what's wrong with self._all_target_states = transitions.keys()?
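
One possible answer to this question, shown as a small sketch with made-up state names: transitions.keys() returns a live view rather than a list, so it tracks later changes to the dictionary and cannot be indexed. Whether either difference matters for the state machines is not established in this thread.

```python
# Hypothetical transitions mapping, for illustration only.
transitions = {"Approved": ["Submitted"], "Rejected": ["Submitted"]}

keys_view = transitions.keys()  # live view onto the dictionary
keys_list = list(transitions)   # independent snapshot of the keys

# Membership tests and iteration behave the same on both.
assert "Approved" in keys_view and "Approved" in keys_list

# The view reflects later mutations of the dictionary; the list does not.
transitions["Revoked"] = ["Approved"]
view_sees_new_key = "Revoked" in keys_view
list_sees_new_key = "Revoked" in keys_list

# Views do not support indexing, unlike lists.
try:
    keys_view[0]
    supports_indexing = True
except TypeError:
    supports_indexing = False
```

If the code only iterates over the target states or checks membership, the view is enough; list(transitions) is the shorter spelling when a snapshot is really needed.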

@dlpzx dlpzx merged commit a0fc4b0 into main May 23, 2024
9 checks passed
@dlpzx dlpzx deleted the feat/generic-dataset-sharing-2-simplified branch June 6, 2024 12:20