Fix created dataset naming convention (#1002)
### Feature or Bugfix
<!-- please choose -->
- Bugfix


### Detail
- Previously, the naming convention for S3 buckets, KMS keys, etc. of
newly created datasets was applied without the `targetUri`, because we
referenced the Dataset object before it was added to the RDS table
and thus before its `datasetUri` had been created. The resource naming
now runs after the dataset has been committed.
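
The ordering problem can be illustrated with a minimal, stdlib-only sketch. It assumes the URI is generated only when the row is persisted; `FakeSession` and `build_name` are hypothetical stand-ins for illustration, not the project's session or `NamingConventionService`.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dataset:
    """Hypothetical stand-in for the ORM model; the URI is unset until persisted."""
    label: str
    datasetUri: Optional[str] = None
    S3BucketName: str = 'undefined'


class FakeSession:
    """Hypothetical session that assigns the URI on commit (assumption)."""

    def __init__(self):
        self._pending = []

    def add(self, obj):
        self._pending.append(obj)

    def commit(self):
        # Generate the URI only at persist time, then clear the queue.
        for obj in self._pending:
            if obj.datasetUri is None:
                obj.datasetUri = uuid.uuid4().hex[:8]
        self._pending.clear()


def build_name(label: str, target_uri: Optional[str]) -> str:
    # Hypothetical naming helper, not the real NamingConventionService.
    return f'dataall-{label}-{target_uri}'


# Old order: the name is built before the row exists, so the URI is still None.
early = Dataset(label='sales')
early.S3BucketName = build_name(early.label, early.datasetUri)

# New order: persist first, then derive names from the generated URI.
session = FakeSession()
late = Dataset(label='sales')
session.add(late)
session.commit()
late.S3BucketName = build_name(late.label, late.datasetUri)

print(early.S3BucketName)  # dataall-sales-None
print(late.S3BucketName)   # suffix is the generated URI, differs per run
```

In this sketch only the second dataset ends up with a name that embeds its own URI, which is the behavior the fix restores.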


### Relates
- N/A

### Security
Please answer the questions below briefly where applicable, or write
`N/A`. Based on
[OWASP 10](https://owasp.org/Top10/en/).

- Does this PR introduce or modify any input fields or queries - this
includes
fetching data from storage outside the application (e.g. a database, an
S3 bucket)?
  - Is the input sanitized?
  - What precautions are you taking before deserializing the data you
consume?
  - Is injection prevented by parametrizing queries?
  - Have you ensured no `eval` or similar functions are used?
- Does this PR introduce any functionality or component that requires
authorization?
  - How have you ensured it respects the existing AuthN/AuthZ mechanisms?
  - Are you logging failed auth attempts?
- Are you using or adding any cryptographic features?
  - Do you use standard, proven implementations?
  - Are the used keys controlled by the customer? Where are they stored?
- Are you introducing any new policies/roles/users?
  - Have you used the least-privilege principle? How?


By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
noah-paige authored Jan 26, 2024
1 parent ffb5949 commit 3e33479
Showing 2 changed files with 9 additions and 6 deletions.
@@ -113,6 +113,7 @@ def create_dataset(uri, admin_group, data: dict):
                 session=session,
                 env=environment,
                 dataset=dataset,
+                data=data
             )

             DatasetBucketRepository.create_dataset_bucket(session, dataset, data)
14 changes: 8 additions & 6 deletions backend/dataall/modules/datasets_base/db/dataset_repositories.py
@@ -33,7 +33,7 @@ def build_dataset(cls, username: str, env: Environment, data: dict) -> Dataset:
             AwsAccountId=env.AwsAccountId,
             SamlAdminGroupName=data['SamlAdminGroupName'],
             region=env.region,
-            S3BucketName='undefined',
+            S3BucketName=data.get('bucketName', 'undefined'),
             GlueDatabaseName='undefined',
             IAMDatasetAdminRoleArn='undefined',
             IAMDatasetAdminUserArn='undefined',
@@ -52,7 +52,7 @@ def build_dataset(cls, username: str, env: Environment, data: dict) -> Dataset:
             else data['SamlAdminGroupName'],
             autoApprovalEnabled=data.get('autoApprovalEnabled', False),
         )
-        cls._set_dataset_aws_resources(dataset, data, env)
+
         cls._set_import_data(dataset, data)
         return dataset

@@ -75,15 +75,16 @@ def count_resources(session, environment, group_uri) -> int:
             .count()
         )

-    @staticmethod
-    def create_dataset(session, env: Environment, dataset: Dataset):
+    @classmethod
+    def create_dataset(cls, session, env: Environment, dataset: Dataset, data: dict):
         organization = OrganizationRepository.get_organization_by_uri(
             session, env.organizationUri
         )

         session.add(dataset)
         session.commit()
-
+        cls._set_dataset_aws_resources(dataset, data, env)
+
         activity = Activity(
             action='dataset:create',
             label='dataset:create',
@@ -114,7 +115,8 @@ def _set_dataset_aws_resources(dataset: Dataset, data, environment):
         ).build_compliant_name()
         dataset.GlueDatabaseName = data.get('glueDatabaseName') or glue_db_name

-        dataset.KmsAlias = bucket_name
+        if not dataset.imported:
+            dataset.KmsAlias = bucket_name

         iam_role_name = NamingConventionService(
             target_uri=dataset.datasetUri,
