Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(ingestion): properly detect optional fields in avro schemas #2343

Merged
merged 3 commits into from
Apr 8, 2021
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -71,7 +71,7 @@ def avro_schema_to_mce_fields(avro_schema_string: str) -> List[SchemaField]:
type=_get_column_type(parsed_field.type),
description=parsed_field.props.get("doc", None),
recursive=False,
nullable=(parsed_field.type == "null"),
nullable=parsed_field.has_default,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fields can also be null if they are of type [ "null", "something else" ] as discussed here. This change will miss those cases, right?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For full correctness, the nullable logic might need to be a bit more complex: union types should inherit their default values and nullability from the first type in the union.

This might be ok to do in a follow-up PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, my avro knowledge is far from good. I updated the PR to a solution that more closely resembles the existing behaviour. Please let me know what you think.

)

fields.append(field)
Expand Down
29 changes: 29 additions & 0 deletions metadata-ingestion/tests/unit/test_schema_util.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
import unittest

from datahub.ingestion.extractor.schema_util import avro_schema_to_mce_fields


class SchemaUtilTest(unittest.TestCase):
shirshanka marked this conversation as resolved.
Show resolved Hide resolved
def test_avro_schema_to_mce_fields(self):
avro_schema_string = """
{
"type": "record",
"name": "SomeEventName",
"namespace": "some.event.namespace",
"fields": [
{
"name": "my_field",
"type": {
"type": "string"
},
"default": "some.default.value",
"doc": "some doc"
}
]
}
"""

fields = avro_schema_to_mce_fields(avro_schema_string)

self.assertEquals(1, len(fields))
self.assertTrue(fields[0].nullable)