query output duplicates 'entities' header for each batch #36

Open
ththvseo opened this issue Feb 15, 2018 · 1 comment

ththvseo commented Feb 15, 2018

One would expect `query` to produce an output file that `upsert` can read back.
This does not work, because `query` writes the `entities:` key again for each batch, which produces a corrupt YAML file.
(This is for YAML, but I guess other formats are similarly affected. JSON probably has a similar issue, though I have not tried it. CSV likely is not affected, because it has no such header?)

Even worse, `upsert` accepts such a YAML file as input but apparently uses only the last batch (perhaps because the parser internally overwrites repeated keys?).
That part is probably not a bug in dsio itself, though, because the YAML parser comes from a library and is not part of dsio.

Owner

nshmura commented Feb 17, 2018

Thanks!
This is a bug; I will fix this issue.
