Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] Add Support for querying Avro/Protobuf Data in athena-msk Connector #2075

Merged
merged 1 commit into from
Aug 7, 2024

Conversation

Jithendar12
Copy link
Contributor

Support for querying Avro and Protobuf data from Amazon MSK.

Prerequisite: Define the Avro/Protobuf Schema in AWS Glue Schema Registry, select Data Format as Avro/Protocol Buffers while defining schema.
Note: Provide Glue Schema Name as MSK Topic Name with same case.
Example avro/protobuf schema in glue :

Avro:

{
  "type": "record",
  "namespace": "ABC_Organization",
  "name": "Employee",
  "fields": [
    {
      "name": "Name",
      "type": "string"
    },
    {
      "name": "Age",
      "type": "int"
    },
    {
      "name": "IsActive",
      "type": "boolean"
    },
    {
      "name": "Balance",
      "type": "double"
    },
    {
      "name": "Rating",
      "type": "float"
    }
  ]
}

Protobuf:

message protobuftest {
  optional string name = 1;
  optional int64 calories = 2;
  optional string colour = 3;
  optional bool isOrganic = 4;
  optional double price = 5;
  optional float weight = 6;
  optional int32 quantity = 7;
}

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@AbdulR3hman AbdulR3hman force-pushed the feature/msk-avro-protobuf branch from 03dffba to e7ed6ad Compare July 26, 2024 15:06
String dataFormat = registryReader.getGlueSchemaType(glueRegistryName, glueSchemaName);
if (dataFormat.equalsIgnoreCase(AVRO_DATA_FORMAT)) {
AvroTopicSchema avroTopicSchema = registryReader.getGlueSchema(glueRegistryName, glueSchemaName, AvroTopicSchema.class);
// Creating ArrowType for each fields in the topic schema.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit; why copy the comments exactly as is?

There is a difference between the first if condition and the else at the end. maybe it would serve the reader if the comments are clearer;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @AbdulR3hman , removed.

@Jithendar12 Jithendar12 requested a review from AbdulR3hman August 5, 2024 04:56
@AbdulR3hman
Copy link
Contributor

@Jithendar12 it seems we can conflicts; if you can rebase and push again it would great; thank you

@Jithendar12 Jithendar12 force-pushed the feature/msk-avro-protobuf branch 3 times, most recently from 87f8d77 to 1d65f61 Compare August 7, 2024 05:58
@Jithendar12
Copy link
Contributor Author

Hi @AbdulR3hman , rebased and pushed. Thank you

@AbdulR3hman AbdulR3hman force-pushed the feature/msk-avro-protobuf branch from 1d65f61 to c90c900 Compare August 7, 2024 18:29
@AbdulR3hman AbdulR3hman merged commit 5752309 into awslabs:master Aug 7, 2024
6 checks passed
github-actions bot pushed a commit that referenced this pull request Aug 16, 2024
  - Added Data Source Hints (#2148)
  - [Fix] Add views to the Table list in athena-sqlserver connector (#2170)
  - Fix for cloudwatch-metrics pagination bug (#2169)
  - Added QPT Support for TPCDS Connector (#2168)
  - build(deps): bump org.elasticsearch.client:elasticsearch-rest-client from 8.14.3 to 8.15.0 (#2159)
  - build(deps): bump org.elasticsearch.client:elasticsearch-rest-client
  - build(deps): bump software.amazon.awssdk:bom from 2.26.29 to 2.27.2 (#2164)
  - build(deps): bump software.amazon.awssdk:bom from 2.26.29 to 2.27.2
  - build(deps): bump software.amazon.msk:aws-msk-iam-auth from 2.1.1 to 2.2.0 (#2167)
  - build(deps): bump software.amazon.msk:aws-msk-iam-auth
  - build(deps): bump io.confluent:kafka-protobuf-provider from 7.6.0 to 7.7.0 (#2157)
  - build(deps): bump io.confluent:kafka-protobuf-provider
  - build(deps): bump org.apache.directory.api:api-ldap-model from 2.1.6 to 2.1.7 (#2163)
  - build(deps): bump org.apache.directory.api:api-ldap-model
  - build(deps): bump com.google.protobuf:protobuf-java from 3.19.6 to 3.25.3 (#2155)
  - build(deps): bump com.google.protobuf:protobuf-java
  - build(deps): bump org.apache.commons:commons-lang3 from 3.15.0 to 3.16.0 (#2161)
  - build(deps): bump org.apache.commons:commons-lang3 from 3.15.0 to 3.16.0
  - build(deps): bump slf4j-log4j.version from 2.0.13 to 2.0.16 (#2154)
  - build(deps): bump slf4j-log4j.version from 2.0.13 to 2.0.16
  - build(deps): bump org.apache.avro:avro from 1.11.3 to 1.12.0 (#2153)
  - build(deps): bump org.apache.avro:avro from 1.11.3 to 1.12.0
  - Remove duplicate dependencies from athena-msk pom.xml (#2146)
  - build(deps): bump com.google.protobuf:protobuf-java from 3.19.4 to 3.19.6 in /athena-msk (#2145)
  - build(deps): bump com.google.protobuf:protobuf-java in /athena-msk
  - [FEATURE] Add Support for querying Avro/Protobuf Data in athena-msk Connector (#2075)
  - DocDB select query with _id issue fix (#2105)
  - build(deps-dev): bump org.hamcrest:hamcrest from 2.2 to 3.0 (#2137)
  - build(deps-dev): bump org.hamcrest:hamcrest from 2.2 to 3.0
  - build(deps): bump com.mysql:mysql-connector-j from 8.4.0 to 9.0.0 (#2074)
  - build(deps): bump com.mysql:mysql-connector-j from 8.4.0 to 9.0.0
  - build(deps): bump org.eclipse.rdf4j:rdf4j-repository-sparql from 5.0.1 to 5.0.2 (#2136)
  - build(deps): bump org.eclipse.rdf4j:rdf4j-repository-sparql
  - Feature/sqlserver issue 1540 (#2126)
  - Neptune dependency upgrade (#2143)
  - Remove s3:listAllMyBuckets permission (#2131)
  - build(deps): bump com.amazon.redshift:redshift-jdbc42 from 2.1.0.29 to 2.1.0.30 (#2141)
  - build(deps): bump com.amazon.redshift:redshift-jdbc42
  - build(deps): bump com.oracle.database.jdbc:ojdbc8 from 23.4.0.24.05 to 23.5.0.24.07 (#2140)
  - build(deps): bump com.oracle.database.jdbc:ojdbc8
  - build(deps): bump io.lettuce:lettuce-core from 6.3.2.RELEASE to 6.4.0.RELEASE (#2138)
  - build(deps): bump io.lettuce:lettuce-core
  - build(deps): bump software.amazon.awssdk:bom from 2.26.25 to 2.26.29 (#2139)
  - build(deps): bump software.amazon.awssdk:bom from 2.26.25 to 2.26.29
  - build(deps): bump software.amazon.jsii:jsii-runtime from 1.101.0 to 1.102.0 (#2142)
  - build(deps): bump software.amazon.jsii:jsii-runtime
  - build(deps): bump com.google.cloud:google-cloud-storage from 2.40.1 to 2.41.0 (#2135)
  - build(deps): bump com.google.cloud:google-cloud-storage
  - build(deps): bump com.microsoft.sqlserver:mssql-jdbc from 12.7.1.jre11-preview to 12.8.0.jre11 (#2133)
  - build(deps): bump com.microsoft.sqlserver:mssql-jdbc
  - build(deps): bump aws-sdk.version from 1.12.765 to 1.12.767 (#2134)
  - build(deps): bump aws-sdk.version from 1.12.765 to 1.12.767
  - BigQuery empty table and projectId casing issues fix (#2068)
  - Handling array values in Neptune node (#2095)
  - [Fix] Remove Trailing Whitespaces from Char Column in Postgresql/Redshift Connector (#2117)
  - build(deps): bump software.amazon.awssdk:bom from 2.26.21 to 2.26.25 (#2124)
  - build(deps): bump software.amazon.awssdk:bom from 2.26.21 to 2.26.25
  - build(deps): bump com.microsoft.azure:msal4j from 1.16.1 to 1.16.2 (#2125)
  - build(deps): bump com.microsoft.azure:msal4j from 1.16.1 to 1.16.2
  - build(deps): bump io.confluent:kafka-avro-serializer from 7.6.2 to 7.7.0 (#2122)
  - build(deps): bump io.confluent:kafka-avro-serializer from 7.6.2 to 7.7.0
  - build(deps): bump io.confluent:kafka-protobuf-serializer from 7.6.2 to 7.7.0 (#2123)
  - build(deps): bump io.confluent:kafka-protobuf-serializer
  - build(deps): bump net.snowflake:snowflake-jdbc from 3.17.0 to 3.18.0 (#2121)
  - build(deps): bump net.snowflake:snowflake-jdbc from 3.17.0 to 3.18.0
  - build(deps): bump org.apache.kafka:kafka-clients from 3.7.1 to 3.8.0 (#2120)
  - build(deps): bump org.apache.kafka:kafka-clients from 3.7.1 to 3.8.0
  - build(deps): bump aws-sdk.version from 1.12.762 to 1.12.765 (#2119)
  - build(deps): bump aws-sdk.version from 1.12.762 to 1.12.765
  - build(deps): bump io.confluent:kafka-avro-serializer from 7.6.1 to 7.6.2 (#2116)
  - build(deps): bump io.confluent:kafka-avro-serializer from 7.6.1 to 7.6.2
  - build(deps): bump org.apache.maven.plugins:maven-javadoc-plugin from 3.7.0 to 3.8.0 (#2112)
  - build(deps): bump org.apache.maven.plugins:maven-javadoc-plugin
  - build(deps): bump com.clickhouse:clickhouse-jdbc from 0.6.2 to 0.6.3 (#2111)
  - build(deps): bump com.clickhouse:clickhouse-jdbc from 0.6.2 to 0.6.3
  - build(deps): bump org.apache.commons:commons-lang3 from 3.14.0 to 3.15.0 (#2113)
  - build(deps): bump org.apache.commons:commons-lang3 from 3.14.0 to 3.15.0
  - build(deps): bump software.amazon.msk:aws-msk-iam-auth from 2.1.1 to 2.2.0 (#2114)
  - build(deps): bump software.amazon.msk:aws-msk-iam-auth
  - build(deps): bump software.amazon.awssdk:bom from 2.26.20 to 2.26.21 (#2115)
  - build(deps): bump software.amazon.awssdk:bom from 2.26.20 to 2.26.21
  - build(deps): bump io.confluent:kafka-protobuf-serializer from 7.6.1 to 7.6.2 (#2110)
  - build(deps): bump io.confluent:kafka-protobuf-serializer
  - build(deps): bump aws-sdk.version from 1.12.761 to 1.12.762 (#2109)
  - build(deps): bump aws-sdk.version from 1.12.761 to 1.12.762
  - build(deps): bump org.assertj:assertj-core from 3.26.0 to 3.26.3 (#2099)
  - build(deps): bump org.assertj:assertj-core from 3.26.0 to 3.26.3
  - build(deps): bump software.amazon.awssdk:bom from 2.26.16 to 2.26.20 (#2103)
  - build(deps): bump software.amazon.awssdk:bom from 2.26.16 to 2.26.20
  - build(deps): bump net.snowflake:snowflake-jdbc from 3.16.1 to 3.17.0 (#2104)
  - build(deps): bump net.snowflake:snowflake-jdbc from 3.16.1 to 3.17.0
  - build(deps): bump surefire.failsafe.version from 3.3.0 to 3.3.1 (#2102)
  - build(deps): bump surefire.failsafe.version from 3.3.0 to 3.3.1
  - build(deps): bump com.microsoft.azure:msal4j from 1.16.0 to 1.16.1 (#2101)
  - build(deps): bump com.microsoft.azure:msal4j from 1.16.0 to 1.16.1
  - build(deps): bump com.microsoft.sqlserver:mssql-jdbc from 12.7.0.jre11-preview to 12.7.1.jre11-preview (#2097)
  - build(deps): bump com.microsoft.sqlserver:mssql-jdbc
  - build(deps): bump org.elasticsearch.client:elasticsearch-rest-client from 8.14.2 to 8.14.3 (#2098)
  - build(deps): bump org.elasticsearch.client:elasticsearch-rest-client
  - build(deps): bump aws-sdk.version from 1.12.753 to 1.12.761 (#2096)
  - build(deps): bump aws-sdk.version from 1.12.753 to 1.12.761
  - build(deps): bump org.apache.kafka:kafka-clients from 3.7.0 to 3.7.1 (#2066)
  - build(deps): bump org.apache.kafka:kafka-clients from 3.7.0 to 3.7.1
  - build(deps): bump software.amazon.awssdk:bom from 2.26.9 to 2.26.16 (#2073)
  - build(deps): bump software.amazon.awssdk:bom from 2.26.9 to 2.26.16
  - build(deps): bump org.junit:junit-bom from 5.10.2 to 5.10.3 (#2065)
  - build(deps): bump org.junit:junit-bom from 5.10.2 to 5.10.3
  - build(deps): bump org.elasticsearch.client:elasticsearch-rest-client from 8.14.1 to 8.14.2 (#2072)
  - build(deps): bump org.elasticsearch.client:elasticsearch-rest-client
  - build(deps): bump com.microsoft.azure:msal4j from 1.15.1 to 1.16.0 (#2063)
  - build(deps): bump com.microsoft.azure:msal4j from 1.15.1 to 1.16.0
  - build(deps): bump com.google.cloud:google-cloud-storage from 2.40.0 to 2.40.1 (#2061)
  - build(deps): bump com.google.cloud:google-cloud-storage
  - build(deps): bump net.jqwik:jqwik from 1.8.5 to 1.9.0 (#2059)
  - build(deps): bump net.jqwik:jqwik from 1.8.5 to 1.9.0
  - build(deps): bump com.clickhouse:clickhouse-jdbc from 0.6.1 to 0.6.2 (#2064)
  - build(deps): bump com.clickhouse:clickhouse-jdbc from 0.6.1 to 0.6.2
  - build(deps): bump com.sap.cloud.db.jdbc:ngdbc from 2.21.10 to 2.21.11 (#2062)
  - build(deps): bump com.sap.cloud.db.jdbc:ngdbc from 2.21.10 to 2.21.11
  - build(deps): bump fasterxml.jackson.version from 2.17.1 to 2.17.2 (#2070)
  - build(deps): bump fasterxml.jackson.version from 2.17.1 to 2.17.2
  - build(deps): bump aws-sdk.version from 1.12.750 to 1.12.753 (#2058)
  - build(deps): bump aws-sdk.version from 1.12.750 to 1.12.753
  - build(deps): bump software.amazon.jsii:jsii-runtime from 1.100.0 to 1.101.0 (#2049)
  - build(deps): bump software.amazon.jsii:jsii-runtime
  - build(deps): bump io.confluent:kafka-protobuf-serializer from 7.6.0 to 7.6.1 (#2050)
  - build(deps): bump io.confluent:kafka-protobuf-serializer
  - build(deps): bump com.clickhouse:clickhouse-jdbc from 0.6.0 to 0.6.1 (#2048)
  - build(deps): bump com.clickhouse:clickhouse-jdbc from 0.6.0 to 0.6.1
  - build(deps): bump software.amazon.awssdk:bom from 2.26.7 to 2.26.9 (#2047)
  - build(deps): bump software.amazon.awssdk:bom from 2.26.7 to 2.26.9
  - build(deps): bump aws-sdk.version from 1.12.748 to 1.12.750 (#2046)
  - build(deps): bump aws-sdk.version from 1.12.748 to 1.12.750
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants