Update Worksheet database names in UI for new simplified db names #1063

Merged
merged 7 commits into main from fix/worksheet-database-names-after-simplification
Feb 27, 2024

Conversation

dlpzx
Contributor

@dlpzx dlpzx commented Feb 14, 2024

Feature or Bugfix

  • Bugfix

Detail

After implementing #1016 the names displayed for the databases in Worksheets won't contain the unique identifier.
In addition, this PR solves #805 by also removing duplicates in the FE.

Here is a screenshot of a local test:

Screenshot 2024-02-14 at 17 34 16

Update:
Because there will be a mix of old shares with Glue database names ending in _shared_URI and shares with database names suffixed by _shared only, this PR introduces a new field in the GraphQL type returned by the searchDataItems query. This field resolves the name of the shared Glue database. @noah-paige @TejasRGitHub in this commit

At first I tried implementing a separate resolver for the Worksheets, but I think we can fix this step by step: first focus on the database name, and then on the group of the Worksheet vs. the group chosen inside the Worksheet. In any case, I left the commit in for reference.
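The fallback between the two naming schemes can be sketched as follows. This is a minimal illustration and not the code in this PR: `resolve_shared_db_name` and the `old_db_exists` callback are hypothetical names, and the `_shared` suffix for new shares is assumed from the description above. Only the old-name expression mirrors the line in the diff under review.

```python
from typing import Callable

MAX_NAME_LEN = 254  # truncation length used by the old-name expression in the diff


def resolve_shared_db_name(
    glue_database_name: str,
    share_uri: str,
    old_db_exists: Callable[[str], bool],
) -> str:
    """Return the shared Glue database name for a share (hypothetical helper)."""
    # Old scheme: <db>_shared_<shareUri>, truncated as in the diff.
    old_name = (glue_database_name + '_shared_' + share_uri)[:MAX_NAME_LEN]
    if old_db_exists(old_name):
        return old_name
    # Assumption: new shares use the simplified suffix without the share URI.
    return glue_database_name + '_shared'


# Hypothetical set of databases already present in the consumer account.
existing = {'sales_shared_abc123'}
print(resolve_shared_db_name('sales', 'abc123', existing.__contains__))  # sales_shared_abc123
print(resolve_shared_db_name('sales', 'zzz999', existing.__contains__))  # sales_shared
```

How the real resolver decides whether a share uses the old name is not shown in this conversation, so the existence check is injected as a callback here.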

Relates

Security

Please answer the questions below briefly where applicable, or write N/A. Based on
OWASP 10.

  • Does this PR introduce or modify any input fields or queries - this includes
    fetching data from storage outside the application (e.g. a database, an S3 bucket)?
    • Is the input sanitized?
    • What precautions are you taking before deserializing the data you consume?
    • Is injection prevented by parametrizing queries?
    • Have you ensured no eval or similar functions are used?
  • Does this PR introduce any functionality or component that requires authorization?
    • How have you ensured it respects the existing AuthN/AuthZ mechanisms?
    • Are you logging failed auth attempts?
  • Are you using or adding any cryptographic features?
    • Do you use standard, proven implementations?
    • Are the used keys controlled by the customer? Where are they stored?
  • Are you introducing any new policies/roles/users?
    • Have you used the least-privilege principle? How?

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@dlpzx dlpzx requested a review from noah-paige February 20, 2024 08:09
@noah-paige
Contributor

The PR looks good, just 2 things to call out:

  1. For old shares using shareUri in the suffix, we will now be showing the wrong db name. If there is an easy way, should we try to fix this, maybe by doing something similar to what we do in ShareView and using the resolved consumption data to get the shared DB name ${share.consumptionData.sharedGlueDatabase}?

  2. I think we should drill down Team to be the same as the team that created the Worksheet and not have it as an optional parameter - this can be a separate PR

@TejasRGitHub
Contributor

Hi @dlpzx , @noah-paige ,

I agree with @noah-paige on both the points.

If possible, it would be great to display the database name (old or new) that is actually present in the account where it is shared.

For the second point, I agree that we should not have a dropdown to select the team and should only show the databases and tables that are accessible to the team that created the worksheet, also because the querying happens with the role of the worksheet's admin group and not the role of the team selected in the dropdown (please correct me if I am wrong).

@noah-paige
Contributor

Tested this PR locally:

  • No Duplicate DB Names
  • Shared DB Name Populated Correctly for Old Share DB Naming
  • Shared DB Name Populated Correctly for New Share DB Naming

Contributor

@noah-paige noah-paige left a comment

lgtm!

        return None
    old_shared_db_name = (source.GlueDatabaseName + '_shared_' + source.shareUri)[:254]
    with context.engine.scoped_session() as session:
        share = ShareObjectService.get_share_object_in_environment(uri=source.environmentUri, shareUri=source.shareUri)
Contributor

Why did we get the share item from this new method instead of getting it from get_share_object()? Did we just want to confirm that the user has permission to access that env?

Contributor

I believe it is exactly for that reason, because of the decorator @has_resource_permission(GET_ENVIRONMENT), but I will leave it open for @dlpzx to confirm

Contributor Author

Exactly, I wanted to keep the permissions checking of the environment

Contributor

Thanks @noah-paige , @dlpzx for the clarification
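The pattern discussed in this thread, where a service method is wrapped by a permission check on the environment, can be sketched as below. This is a toy stand-in and not data.all's implementation: the in-memory `PERMISSIONS` store, the explicit `user` argument, and the returned dictionary are all assumptions made for illustration. Only the decorator name, the `GET_ENVIRONMENT` constant, and the method name come from the conversation.

```python
from functools import wraps

GET_ENVIRONMENT = 'GET_ENVIRONMENT'

# Hypothetical in-memory permission store: (user, resource uri) -> permissions.
PERMISSIONS = {('alice', 'env-1'): {GET_ENVIRONMENT}}


def has_resource_permission(permission):
    """Toy version of the decorator named in the review (assumed behaviour)."""
    def decorator(func):
        @wraps(func)
        def wrapper(*, user, uri, **kwargs):
            # Reject the call before the wrapped function touches any data.
            if permission not in PERMISSIONS.get((user, uri), set()):
                raise PermissionError(f'{user} lacks {permission} on {uri}')
            return func(uri=uri, **kwargs)
        return wrapper
    return decorator


@has_resource_permission(GET_ENVIRONMENT)
def get_share_object_in_environment(*, uri, shareUri):
    # A real service would load the share from the database here.
    return {'environmentUri': uri, 'shareUri': shareUri}


print(get_share_object_in_environment(user='alice', uri='env-1', shareUri='s1'))
```

Calling the same function as a user without `GET_ENVIRONMENT` on `env-1` raises `PermissionError` in this sketch, which is the kind of environment-level check the author wanted to keep.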

environmentUri: d.environmentUri
}));
// Remove duplicates based on GlueDatabaseName
sharedWithDatabases = sharedWithDatabases.filter(
Contributor

Could we potentially remove this duplication in the query itself, in dataall.modules.dataset_sharing.db.share_object_repositories.ShareObjectRepository.paginate_shared_datasets?

Contributor

Not sure if we want to remove duplication of DB names on the backend, because the query searchEnvironmentDataItems is also used at frontend/src/modules/Environments/components/EnvironmentSharedDatasets.js, where we show each share item that is shared with the given environment and for which team in that environment (in that case we would want to show all shares, whether or not their DB names are duplicated)

Another note (out of scope for this PR): We currently do not handle shared items of type S3Bucket for the table mentioned above ...

Contributor

Oh okay I get it. Thanks @noah-paige .

I didn't quite get what you meant by the note and which table you are referring to

Contributor Author

@dlpzx dlpzx Feb 27, 2024

There are 2 reasons not to remove the duplicates in the query. One is that the query is used in other places, as Noah pointed out (in the Environment view > Datasets tab > Shared items).

The second reason is that it is difficult to predict which shares use the old naming method and which use the new one; that is determined in the resolver of the new field on the type. Once we deprecate the old type of share names, I think we can redesign the duplicate removal and separate displaying the shared items in the environment from listing the shared databases in Worksheets.

I tried implementing a new resolver (check the commits) to separate the searchSharedDataItems used in environments from what we want to achieve in Worksheets, but I found it a bit too complex for the purpose of this PR, which is to show the correct Glue database names
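The trade-off described here, keeping every share in the query result while deduplicating only for the Worksheets dropdown, can be sketched as follows. The PR does this in the frontend in JavaScript; the sketch uses Python only to stay consistent with the backend snippet above. The helper `unique_by_key`, the sample records, and the field name `sharedGlueDatabaseName` are assumptions for illustration.

```python
def unique_by_key(items, key):
    """Keep the first item seen for each value of `key`, preserving order."""
    seen = set()
    unique = []
    for item in items:
        value = item[key]
        if value not in seen:
            seen.add(value)
            unique.append(item)
    return unique


# Hypothetical query result: two shares expose the same database to different teams.
shared_items = [
    {'shareUri': 's1', 'sharedGlueDatabaseName': 'sales_shared', 'principalId': 'team-a'},
    {'shareUri': 's2', 'sharedGlueDatabaseName': 'sales_shared', 'principalId': 'team-b'},
    {'shareUri': 's3', 'sharedGlueDatabaseName': 'hr_shared_s3', 'principalId': 'team-a'},
]

# The environment view lists every share, so it uses the result unchanged.
environment_view = shared_items
# The Worksheets dropdown only needs each database name once.
worksheet_databases = unique_by_key(shared_items, 'sharedGlueDatabaseName')

print([d['sharedGlueDatabaseName'] for d in worksheet_databases])  # ['sales_shared', 'hr_shared_s3']
```

Deduplicating after the name has been resolved per share sidesteps the question of which shares use the old or new scheme, which is the second reason given above.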

Contributor

Yeah makes sense. Thanks @dlpzx

@dlpzx dlpzx merged commit 179fbbb into main Feb 27, 2024
8 checks passed
@dlpzx dlpzx deleted the fix/worksheet-database-names-after-simplification branch March 26, 2024 09:54
3 participants