Add Avro to JSON data format conversion #865

Merged (1 commit), Oct 31, 2019

jarrodconnolly
Contributor

Operation to convert Avro (with schema) to JSON format.

Avro files are essentially sequences of records, so most tools convert Avro to a JSON-like record format that is not a single valid JSON document. Each record is output as a valid JSON object, but the objects are not wrapped in an array; they are usually just newline-separated.

For example, using the official Apache Avro command-line tools:

java -jar avro-tools-1.9.1.jar tojson human-10.avro

{"name":"Human Person One","age":25}
{"name":"Human Person Two","age":30}
{"name":"Human Person Three","age":35}

I have added an option, Force Valid JSON, to this operation. It wraps the records in a valid JSON array so that they can be processed further by other JSON tools if needed.
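
To make the difference concrete, here is a minimal Python sketch of the idea behind Force Valid JSON: it takes newline-separated records like the avro-tools output above and wraps them in a single JSON array. This is only an illustration, not the code in this PR; the function name records_to_json_array is hypothetical.

```python
import json


def records_to_json_array(ndjson_text: str) -> str:
    """Wrap newline-separated JSON records in one valid JSON array.

    Hypothetical helper for illustration only; not the operation's actual code.
    """
    # Parse each non-empty line as its own JSON object.
    records = [json.loads(line) for line in ndjson_text.splitlines() if line.strip()]
    # Serialise the list so the result is a single valid JSON document.
    return json.dumps(records)


if __name__ == "__main__":
    sample = (
        '{"name":"Human Person One","age":25}\n'
        '{"name":"Human Person Two","age":30}\n'
        '{"name":"Human Person Three","age":35}\n'
    )
    print(records_to_json_array(sample))
```

Without the wrapping step, a strict JSON parser would reject the three-line output as a whole, even though each individual line parses on its own.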

Closes #289

@n1474335 n1474335 merged commit 2d12a16 into gchq:master Oct 31, 2019
@n1474335
Member

Thanks very much for this. I've made a couple of small changes:

  • I've created a new module called 'Serialise' and put the Avro and BSON operations into it so that we don't have too many tiny modules floating around.
  • I'm not a fan of dynamically changing the output and present types of an operation. There are a number of places where we make assumptions about an operation based on OperationConfig.json, which is generated from the values set in the operation constructor. If these values can't be relied upon, it could cause issues that would be hard to detect in automated testing. I've set the output type for this op to string, as this casts easily to JSON anyway. Any subsequent operations that require their input to be JSON should work perfectly well regardless of whether this operation returns genuine JSON or stringified JSON.
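
As a rough illustration of why a fixed string output type is harmless for downstream JSON consumers, the Python sketch below accepts either stringified JSON or already-parsed data and normalises both to the same value. It is a sketch under assumptions, not a description of how the project's type casting actually works; ensure_json is a hypothetical name.

```python
import json
from typing import Any


def ensure_json(value: Any) -> Any:
    """Return parsed JSON data from either a JSON string or parsed data.

    Hypothetical helper for illustration; the real casting logic may differ.
    """
    if isinstance(value, str):
        # Stringified JSON: parse it into native data structures.
        return json.loads(value)
    # Already-parsed data: pass it through unchanged.
    return value


if __name__ == "__main__":
    as_string = '[{"name": "Human Person One", "age": 25}]'
    as_data = [{"name": "Human Person One", "age": 25}]
    # Both forms normalise to the same value.
    print(ensure_json(as_string) == ensure_json(as_data))
```

Because the declared output type never changes, anything generated from the constructor values stays accurate, and the consumer handles the cast.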

@jarrodconnolly
Contributor Author

Thanks for the changes! Much appreciated.

I could tell that switching the output type was not really the cleanest way to handle things, but I was not sure whether outputting a string would play well with the other JSON operations. Glad to hear that it will work.

Love the project, happy to contribute.

Development

Successfully merging this pull request may close these issues.

Convert Avro files to regular JSON
2 participants