Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/1569 dataset properties def #1597

Merged
merged 50 commits into from
Dec 11, 2020
Merged

Conversation

dk1844
Copy link
Contributor

@dk1844 dk1844 commented Nov 9, 2020

Pilot implementation of #1569

  • UI implementation is not part of this PR; properties & property definitions (PDs) are now Rest API-managed only.

  • Rest API exists for PD management and for Dataset properties management.

  • properties whose PDs have putIntoInfoFile=true are added into _INFO file's metadata with the key being std_ds_ prefix in the Standardization phase, Conformance phase is not affected.

  • Pilot limitations: only String and EnumString type properties are supported now.

Dataset API

View dataset with properties:

GET /api/dataset/{datasetName} or GET /api/dataset/{datasetName}/{datasetVersion}-> e.g.

{
    "name": "MyDataset",
    "version": 6,
    ...
    "properties": {
        "requiredKey1": "valueOfRequiredKey1"
    },
    ...
}

View properties of a dataset:

GET /api/dataset/{datasetName}/properties -> e.g.

{
    "requiredKey1": "valueOfRequiredKey1"
}

Replace properties of a dataset:

PUT /api/dataset/{datasetName}/properties with payload

{
    "newKey1": "valueOfNewKey1"
}

-> e.g.

{
    "name": "MyDataset",
    "version": 7,
    ...
    "properties": {
         "newKey1": "valueOfNewKey1"
    },
    "propertiesValidation": null,
    ...
}

(201 Created with header Location:/api/dataset/MyDataset/7)

Dataset properties validation

GET /api/dataset/{datasetName}/{datasetVersion}/properties/valid-> e.g.

{
    "errors": {}
}

or

{
    "errors": {
        "keyWithoutDefinition": [
            "There is no property definition for key 'keyWithoutDefinition'."
        ],
        "enumAbc1": [
             "Value 'valueNotFromEnum' is not one of the allowed values (optionA, optionB, optionC)."
        ]
    }
}

Dataset with validated properties (added in Review)

GET /api/dataset/{datasetName}/{datasetVersion}?validateProperties=true-> e.g.

{
    "name": "MyDataset",
    "version": 6,
    ...
    "properties": {
        "requiredKey1": "valueOfRequiredKey1",
        "keyWithoutDefinition": "someValue",
        "enumAbc1": "valueNotFromEnum"
    },
    "propertiesValidation" : {
        "errors": {
            "keyWithoutDefinition": [
                "There is no property definition for key 'keyWithoutDefinition'."
            ],
            "enumAbc1": [
                 "Value 'valueNotFromEnum' is not one of the allowed values (optionA, optionB, optionC)."
            ]
        }
    }
    ...
}

(if validateProperties is false/missing/invalid, properties are not validated)

Dataset Properties Definition API

View all property definitions at-once (latest version)

GET /api/properties/datasets-> e.g.

[
    {
        "name": "enumAbc1",
        "version": 1,
        "description": null,
        "propertyType": {
            "_t": "StringEnumPropertyType",
            "allowedValues": [
                "optionA",
                "optionB",
                "optionC"
            ],
            "suggestedValue": "optionB"
        },
        "putIntoInfoFile": false,
        "essentiality": {
            "_t": "Optional"
        },
        "disabled": false,
        "dateCreated": "2020-11-09T08:33:33.129Z",
        "userCreated": "user",
        "lastUpdated": "2020-11-09T08:33:33.129Z",
        "userUpdated": "user",
        "dateDisabled": null,
        "userDisabled": null,
        "parent": null,
        "isRequired": false,
        "isOptional": true,
        "createdMessage": {
            "menasRef": {
                "collection": null,
                "name": "enumAbc1",
                "version": 1
            },
            "updatedBy": "user",
            "updated": "2020-11-09T08:33:33.129Z",
            "changes": [
                {
                    "field": "",
                    "oldValue": null,
                    "newValue": null,
                    "message": "PropertyDefinition enumAbc1 created."
                }
            ]
        }
    },
    {
        "name": "requiredKey1",
        "version": 1,
        "description": null,
        "propertyType": {
            "_t": "StringPropertyType",
            "suggestedValue": ""
        },
        "putIntoInfoFile": true,
        "essentiality": {
            "_t": "Mandatory"
        },
        ...
        }
    }
]

View specific property

GET /api/properties/datasets/{pdName} (latest-version) or GET /api/properties/datasets/{pdName}/{pdVerison} -> e.g.

{
    "name": "requiredKey1",
    "version": 1,
    "description": null,
    "propertyType": {
        "_t": "StringPropertyType",
        "suggestedValue": ""
    },
    ...
}

Create a new Property Definition

POST /api/properties/datasets with payload

{
    "name": "myPdKey2",
    "description": "myDesc",
    "propertyType": {
        "_t": "StringPropertyType",
        "suggestedValue": "sortOfDefault"
    }
}

-> e.g.

    "name": "myPdKey2",
    "version": 1,
    "description": "myDesc",
    "propertyType": {
        "_t": "StringPropertyType",
        "suggestedValue": "sortOfDefault"
    },
    "putIntoInfoFile": false,
    "essentiality": {
        "_t": "Optional"
    },
   ...
}

(201 Created with header Location:/api/properties/datasets/myPdKey2/1)

Test-runs:

Spark-job run is guarded with having valid properties:

(tested with std job run):

Required property missing:

Exception in thread "main" java.lang.IllegalStateException: Dataset validation failed, errors found in fields:
 - 'requiredKey1': List(Dataset property 'requiredKey1' is mandatory, but does not exist!)

Property value conformance fails:

Exception in thread "main" java.lang.IllegalStateException: Dataset validation failed, errors found in fields:
 - 'enumAbc1': List(Value bogusEnumValue of key 'enumAbc1' does not conform to the property type of StringEnumPropertyType(Set(optionA, optionB, optionC),optionB).)

Undefined property is present:

 Exception in thread "main" java.lang.IllegalStateException: Dataset validation failed, errors found in fields:
 - 'unwantedExtraProperty': List(There is no property definition for key 'unwantedExtraProperty'.)

A property key/value is written into the _INFO file:

  • This very behavior is covered with an integration test, too.
    A property definition with putIntoInfoFile = true exists:
    GET http://localhost:8080/menas_exploded/api/properties/datasets/requiredKey1 ->
{
    "name": "requiredKey1",
    "version": 1,
    "description": null,
    "propertyType": {
        "_t": "StringPropertyType",
        "suggestedValue": ""
    },
    "putIntoInfoFile": true,
    "essentiality": {
        "_t": "Mandatory"
    },
    "disabled": false,
    "dateCreated": "2020-11-06T15:25:30.153Z",
    "userCreated": "user",
    "lastUpdated": "2020-11-06T15:25:30.153Z",
    "userUpdated": "user",
    "dateDisabled": null,
    "userDisabled": null,
    "parent": null,
    "isRequired": true,
    "isOptional": false,
    "createdMessage": {
        "menasRef": {
            "collection": null,
            "name": "requiredKey1",
            "version": 1
        },
        "updatedBy": "user",
        "updated": "2020-11-06T15:25:30.153Z",
        "changes": [
            {
                "field": "",
                "oldValue": null,
                "newValue": null,
                "message": "PropertyDefinition requiredKey1 created."
            }
        ]
    }
}

The dataset has such a field defined:
GET http://localhost:8080/menas_exploded/api/dataset/Cobol1/properties ->

{
    "requiredKey1": "valueOfRequiredKey1"
}

A key being written into the _INFO file:

$ cat _INFO | jq "."
{
  "metadata": {
   ...
    "version": 1,
    "informationDate": "25-02-2019",
    "additionalInfo": {
      ...
      "std_records_failed": "0",
      "std_ds_requiredKey": "valueOfRequiredKey1",
      ...

Known issues

  • listing dataset properties via /api/dataset/{name}/{version}/properties will include properties that do not have an existing property definition in the system (because PD are not retrieved at this time ("orphaned properties"). (Should they be?)). When filtering is used, e.g. /api/datase/{name}/properties?putIntoInfoFile=false, such orphaned properties are dropped. However, even if filtering is used with an invalid filter (/api/datase/{name}/properties?putIntoInfoFile=bogus), this fallbacks to only PD-backed properties being returned. Thus, the orphaned properties are dropped, too.

@dk1844 dk1844 added the work in progress Work on this item is not yet finished (mainly intended for PRs) label Nov 9, 2020
@dk1844 dk1844 force-pushed the feature/1569-dataset-properties-def branch from 199e4ea to 581ad2b Compare November 9, 2020 15:45
propertyDef export impl
propertyDef json export unit test
other export tests = assert expected v actual switched for better error messages
…erties/datasets (this aligns things with the base controller providing other endpoints like /list, etc.)
…ods where possible - just aliased to gain as much consistency as possible
…efinition seems to work.

Example:
```
{
    "errors": {
        "enumAbc1": [
            "Value bogusEnumValue of key 'enumAbc1' does not conform to the property type of StringEnumPropertyType(Set(optionA, optionB, optionC))"
        ],
        "testKey2": [
            "There is no property definition for key 'testKey2'."
        ],
        "requiredKey1": [
            "Dataset property requiredKey1 is mandatory, but does not exist!"
        ]
    }
}
```
…when mandatory+disabled = should not be present)
PropDef - suggestedValue is now part of propertyType, typeSpecificSettings removed.
…/properties/datasets) added to the controller, integTest covered
…efinition: latest) + integTest update

Dao: getDatasetPropertiesValidation
Validation class shared to Dao
…once at /api/dataset/{datasetName}/{datasetVersion}?validateProperties=true|false
…atasetVersion}?validateProperties=true|false, MenasDAO updated to use it + std spark-job, too.

TODO spark job test
…e common job exec fails on invalid dataset properties from DAO
… is now configurable (config key is "control.info.dataset.properties.prefix", default value: "")
Zejnilovic
Zejnilovic previously approved these changes Dec 10, 2020
@@ -48,17 +48,19 @@ package object propertyType {

isValueConforming(suggestedValue) match {
case Success(_) =>
case Failure(e) => new IllegalArgumentException(s"The suggested value $suggestedValue cannot be used.", e)
case Failure(e) => throw PropertyTypeValidationException(s"The suggested value $suggestedValue cannot be used: ${e.getMessage}", e)
Copy link
Contributor

@Zejnilovic Zejnilovic Dec 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach returns to the user. Maybe we could catch it sooner? Return Validation somewhere? We could also keep it after the REST API Audit as this is not the only one.

Typedefinitionerror: [
  simpletype,
  classza.co.absa.enceladus.model.properties.propertyType.package$EnumPropertyType
];nestedexceptioniscom.fasterxml.jackson.databind.exc.InvalidDefinitionException: Cannotconstructinstanceof`za.co.absa.enceladus.model.properties.propertyType.package$EnumPropertyType`,
problem: ThesuggestedvalueoptionDcannotbeused: Value'optionD'isnotoneoftheallowedvalues(optionA,
optionB,
optionC).at[
  Source: (PushbackInputStream);line: 7,
  column: 2
](throughreferencechain: za.co.absa.enceladus.model.properties.PropertyDefinition[
  "propertyType"
])

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have adjusted the ExceptionHandler to include the message from our PropertyTypeValidationException if found as a L2 underlying cause, otherwise, the "generic" L0 message is used. The relevant integTest has been changed to reflect this. Hope you like it 🥇

@Zejnilovic Zejnilovic added the PR:tested Only for PR - PR was tested by a tester (person) label Dec 10, 2020
…d by PropertyTypeValidationException, react with most specific errorMessage to the REST API client
@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 15 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

@dk1844 dk1844 requested a review from Zejnilovic December 10, 2020 19:12
@dk1844 dk1844 linked an issue Dec 11, 2020 that may be closed by this pull request
@dk1844 dk1844 merged commit de057dc into develop Dec 11, 2020
@dk1844 dk1844 deleted the feature/1569-dataset-properties-def branch December 11, 2020 10:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
PR:tested Only for PR - PR was tested by a tester (person)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Data Model for Dataset properties
4 participants