Implementation

This implementation uses <github.com/apache/arrow-go> to write Parquet files.

The implementation reads the JSON input line by line twice. The first pass infers the Parquet schema: the number of columns in the data, their names, and their types. The second pass writes the data using that schema.
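A minimal sketch of the two-pass idea, using only the Go standard library. It is not this repository's code: the names kindOf, forEachRow, twoPass and twoPassString are hypothetical, the int64-to-float64 widening rule is an assumption, and the second pass only counts rows where a real converter would write column values.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// kindOf returns a coarse kind for one decoded JSON value, or "" if unsupported.
func kindOf(v any) string {
	switch x := v.(type) {
	case bool:
		return "boolean"
	case json.Number:
		if _, err := x.Int64(); err == nil {
			return "int64"
		}
		return "float64"
	case string:
		return "string"
	case []any:
		return "list"
	}
	return "" // nested objects and anything else
}

// forEachRow decodes one JSON object per non-empty line and passes it to fn.
func forEachRow(r io.Reader, fn func(map[string]any) error) error {
	sc := bufio.NewScanner(r)
	sc.Buffer(make([]byte, 0, 64*1024), 16*1024*1024) // allow long lines
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		dec := json.NewDecoder(strings.NewReader(line))
		dec.UseNumber() // keep integers distinguishable from floats
		var row map[string]any
		if err := dec.Decode(&row); err != nil {
			return err
		}
		if err := fn(row); err != nil {
			return err
		}
	}
	return sc.Err()
}

// twoPass infers column kinds in pass one, rewinds, and counts rows in pass two.
func twoPass(rs io.ReadSeeker) (map[string]string, int, error) {
	schema := map[string]string{}
	err := forEachRow(rs, func(row map[string]any) error {
		for name, v := range row {
			if v == nil {
				continue // a null carries no type information
			}
			k := kindOf(v)
			prev, seen := schema[name]
			switch {
			case k == "":
				return fmt.Errorf("column %q: unsupported value", name)
			case !seen || prev == k:
				schema[name] = k
			case (prev == "int64" && k == "float64") || (prev == "float64" && k == "int64"):
				schema[name] = "float64" // assumed widening rule
			default:
				return fmt.Errorf("column %q: %s conflicts with %s", name, k, prev)
			}
		}
		return nil
	})
	if err != nil {
		return nil, 0, err
	}
	if _, err := rs.Seek(0, io.SeekStart); err != nil {
		return nil, 0, err
	}
	rows := 0
	err = forEachRow(rs, func(_ map[string]any) error {
		rows++ // a real converter would append the row's values to column writers here
		return nil
	})
	return schema, rows, err
}

// twoPassString is a convenience wrapper for in-memory input.
func twoPassString(ndjson string) (map[string]string, int, error) {
	return twoPass(strings.NewReader(ndjson))
}

func main() {
	schema, rows, err := twoPassString("{\"id\": 1, \"pop\": 2}\n{\"id\": 2, \"pop\": 2.5}\n")
	fmt.Println(schema, rows, err)
}
```

Reading the input twice keeps memory use independent of file size, at the cost of requiring a seekable input such as a file rather than a pipe.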

Supported JSON types and the Parquet type deduced for each:

JSON Type	Parquet Type
boolean	boolean
integer	int64
floating point number	float64
base64 encoded string	byte array
string	byte array (with string logical type)
RFC3339 date string	byte array (with custom RFC3339 type)
array of booleans	list of repeated booleans
array of integers	list of repeated int64s
array of floating point numbers	list of float64s
array of base64 encoded strings	list of byte arrays
array of strings	list of byte arrays (with string logical type)
array of RFC3339 strings	list of byte arrays (with custom RFC3339 type)

Nested objects are not supported.
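The mapping above could be sketched roughly as follows. This is an illustration, not the repository's deduction logic: scalarType and deduceType are hypothetical names, the returned strings only label the table rows, and the order of the string checks (RFC3339, then base64, then plain string) is an assumption. Note that short plain words such as "town" are also valid base64, so a real converter needs some rule to resolve that ambiguity; this sketch does not attempt one.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
	"strings"
	"time"
)

// scalarType names the table entry for one scalar JSON value.
func scalarType(v any) (string, error) {
	switch x := v.(type) {
	case bool:
		return "boolean", nil
	case json.Number:
		if _, err := x.Int64(); err == nil {
			return "int64", nil
		}
		return "float64", nil
	case string:
		// Assumed order: RFC3339 first, then base64, then plain string.
		if _, err := time.Parse(time.RFC3339, x); err == nil {
			return "byte array (RFC3339)", nil
		}
		if x != "" {
			if _, err := base64.StdEncoding.DecodeString(x); err == nil {
				return "byte array", nil
			}
		}
		return "byte array (string)", nil
	case map[string]any:
		return "", errors.New("nested objects are not supported")
	default:
		return "", fmt.Errorf("unsupported value %v", v)
	}
}

// deduceType parses one JSON value and returns its type name. Arrays must be
// non-empty and hold elements of a single scalar type.
func deduceType(raw string) (string, error) {
	dec := json.NewDecoder(strings.NewReader(raw))
	dec.UseNumber() // keep 1 and 1.5 distinguishable
	var v any
	if err := dec.Decode(&v); err != nil {
		return "", err
	}
	arr, isArray := v.([]any)
	if !isArray {
		return scalarType(v)
	}
	if len(arr) == 0 {
		return "", errors.New("cannot deduce the element type of an empty array")
	}
	elem, err := scalarType(arr[0])
	if err != nil {
		return "", err
	}
	for _, item := range arr[1:] {
		if t, err := scalarType(item); err != nil || t != elem {
			return "", errors.New("array elements have mixed types")
		}
	}
	return "list of " + elem, nil
}

func main() {
	samples := []string{`true`, `42`, `4.2`, `"aGVsbG8="`, `"hello world"`,
		`"2024-01-02T03:04:05Z"`, `[1, 2, 3]`, `{"a": 1}`}
	for _, raw := range samples {
		t, err := deduceType(raw)
		fmt.Println(raw, "->", t, err)
	}
}
```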

Build and run

go build .
./json2parquet

Build and run the Docker image:

docker build --target service -t json2parquet .

docker run -d --name json2parquet json2parquet ${ARGS}

Run the Docker image on the test data

Extract data/test-data.zip into the data folder. The data folder should then contain the files place-city.ndjson, place-hamlet.ndjson, place-town.ndjson and place-village.ndjson.

Then run:

docker run --rm -v "$(pwd)/data":/data --name json2parquet json2parquet -o /data/place-hamlet.parquet /data/place-hamlet.ndjson

Run linters

golangci-lint run --timeout 5m

Run tests

go test -race -v -cover -coverpkg=./... -covermode=atomic ./...
