Describe the bug
The DataHub Java SDK lacks schema ingestion capabilities for the Avro and JSON Schema formats, while the Python SDK has robust support for both. Although the Java SDK supports Protobuf through a standalone module, it needs equivalent capabilities for Avro and JSON Schema so that the two SDKs stay consistent.
To Reproduce
1. Create a complex nested Avro or JSON Schema with:
   - Nested record types
   - Arrays of complex types
   - Maps with complex value types
   - Union types (for Avro)
2. Attempt to generate a DataHub schema using the Java SDK.
3. Observe that no built-in conversion utilities exist, unlike the Python SDK, which supports Avro (avro_schema_to_mce()) and JSON Schema.
Expected behavior
The Java SDK should provide schema ingestion capabilities equivalent to those of the Python SDK:
- Avro and JSON Schema conversion utilities matching the Python SDK's capabilities
- Automatic handling of nested types and complex schema structures
- Parity with the Python SDK's schema handling features, while keeping the existing Protobuf support in its standalone module
- Helper methods to extract schema metadata (e.g., field descriptions, annotations, meta_mapping)
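As a rough illustration of the "automatic handling of nested types" requested above, here is a minimal, self-contained Java sketch of the recursive traversal such a converter would need. It assumes a hypothetical simplified schema model (`SchemaFlattener`, `Node`, `Field` and `field` are illustrative names, not `org.apache.avro.Schema` or any DataHub class) and emits plain dotted paths; DataHub's own field-path encoding is not reproduced.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: none of these types are DataHub or Apache Avro APIs.
// It models a simplified Avro-like schema and flattens it into dotted field paths.
public class SchemaFlattener {

    public enum Kind { PRIMITIVE, RECORD, ARRAY, MAP, UNION }

    /** Immutable schema node; containers keep their item/value/branch types in children. */
    public static final class Node {
        final Kind kind;
        final String typeName;          // primitive name ("string") or record name ("Address")
        final Map<String, Node> fields; // RECORD only, in declaration order
        final List<Node> children;      // ARRAY: items, MAP: values, UNION: branches

        private Node(Kind kind, String typeName, Map<String, Node> fields, List<Node> children) {
            this.kind = kind;
            this.typeName = typeName;
            this.fields = fields;
            this.children = children;
        }

        public static Node primitive(String name) {
            return new Node(Kind.PRIMITIVE, name, Map.of(), List.of());
        }

        @SafeVarargs
        public static Node recordType(String name, Map.Entry<String, Node>... fields) {
            Map<String, Node> ordered = new LinkedHashMap<>(); // keep declaration order
            for (Map.Entry<String, Node> f : fields) {
                ordered.put(f.getKey(), f.getValue());
            }
            return new Node(Kind.RECORD, name, ordered, List.of());
        }

        public static Node arrayOf(Node items) {
            return new Node(Kind.ARRAY, null, Map.of(), List.of(items));
        }

        public static Node mapOf(Node values) { // Avro map keys are always strings
            return new Node(Kind.MAP, null, Map.of(), List.of(values));
        }

        public static Node unionOf(Node... branches) {
            return new Node(Kind.UNION, null, Map.of(), List.of(branches));
        }
    }

    /** One flattened field: dotted path plus a readable type string. */
    public record Field(String path, String type) {}

    public static Map.Entry<String, Node> field(String name, Node type) {
        return Map.entry(name, type);
    }

    /** Flattens the top-level record's fields, recursing into nested records. */
    public static List<Field> flatten(Node root) {
        List<Field> out = new ArrayList<>();
        root.fields.forEach((name, child) -> walk(name, child, out));
        return out;
    }

    private static void walk(String path, Node node, List<Field> out) {
        out.add(new Field(path, describe(node)));
        expand(path, node, out);
    }

    // Emits the fields of every record reachable from `node`, looking through
    // arrays, maps and unions without adding a path segment for the container.
    private static void expand(String path, Node node, List<Field> out) {
        switch (node.kind) {
            case PRIMITIVE -> { }
            case RECORD -> node.fields.forEach((name, child) -> walk(path + "." + name, child, out));
            case ARRAY, MAP, UNION -> node.children.forEach(child -> expand(path, child, out));
        }
    }

    // Renders a type as a readable string, e.g. "array<Address>" or "union<null,string>".
    static String describe(Node node) {
        return switch (node.kind) {
            case PRIMITIVE, RECORD -> node.typeName;
            case ARRAY -> "array<" + describe(node.children.get(0)) + ">";
            case MAP -> "map<string," + describe(node.children.get(0)) + ">";
            case UNION -> node.children.stream()
                    .map(SchemaFlattener::describe)
                    .collect(Collectors.joining(",", "union<", ">"));
        };
    }
}
```

For example, a `User` record with a `long` field `id`, an array field `addresses` of `Address` records (with `street` and a nullable `zip`), and a string map `tags` flattens to the paths `id`, `addresses`, `addresses.street`, `addresses.zip` and `tags`. A real converter would also need to disambiguate fields coming from different union branches and guard against recursive record references, which this sketch does not attempt.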
Additional context
- The Python SDK currently handles both Avro and JSON Schema through datahub.ingestion.extractor.schema_util and datahub.ingestion.extractor.json_schema_util
- The Java SDK has Protobuf support, but only in a standalone module
- Neither SDK currently supports Thrift schema ingestion
- Manual schema mapping for Avro and JSON Schema in Java is error-prone and time-consuming
- This feature would provide a consistent experience across both SDKs
- Consider implementing schema inference capabilities similar to those in the Python SDK's existing schema utilities
- Many organizations use a mix of Python and Java, so consistent support across both SDKs is crucial