CSV sniffer needs more data #119

Open
dbro opened this issue Oct 6, 2023 · 2 comments

Comments

@dbro
dbro commented Oct 6, 2023

Hello, I have a TSV file whose lines have the following character counts (the first line is the header):

155
130
656
416
707
950
526
753
186
731
...

csv-reconcile init gives me the following error:

$ poetry run csv-reconcile init test7.tsv col1_name col2_name

...
File "/home/me/src/csv-reconcile/csv_reconcile/initdb.py", line 88, in init_db
searchidx = header.index(searchcol)
^^^^^^^^^^^^^^^^^^^^^^^
ValueError: 'col2_name' is not in list

The error goes away if I increase the amount of data fed to the sniffer on this line:
dialect = csv.Sniffer().sniff(csvfile.read(10240))

Here I changed the previous value of 1024 to 10240.
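
For reference, going by the counts above, the first three lines total 155 + 130 + 656 = 941 characters, so a 1024-character sample ends partway through the fourth line and the sniffer sees only about three complete rows. That may be why the delimiter was guessed wrong. Below is a minimal sketch of a workaround, not the project's actual code: the function name `sniff_dialect` and its parameters `sample_size` and `delimiters` are hypothetical. It makes the sample size adjustable, drops a partial trailing line, and limits the candidate delimiters using the standard `delimiters` argument of `csv.Sniffer.sniff`.

```python
import csv


def sniff_dialect(path, sample_size=10240, delimiters="\t,;|"):
    """Guess the dialect of the file at `path` from its first characters.

    Hypothetical helper for illustration; csv-reconcile may do this differently.
    """
    with open(path, newline="") as csvfile:
        sample = csvfile.read(sample_size)
        # If anything is left to read, the sample probably ends mid-line.
        truncated = csvfile.read(1) != ""

    if truncated and "\n" in sample:
        # Keep only complete lines so the last row does not skew the counts.
        sample = sample[: sample.rfind("\n") + 1]

    # Restricting the candidates stops the sniffer from picking a character
    # that merely happens to occur regularly in the sampled text.
    return csv.Sniffer().sniff(sample, delimiters=delimiters)
```

With something like this, the sample size could come from a command-line option or config value, and `csv.Sniffer` can still raise `csv.Error` when no candidate fits, so a caller may want to fall back to an explicitly supplied delimiter.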

@gitonthescene
Owner

Thanks. I’ll have a look at making this configurable. Would it be possible to upload that file so I can test it on your data? Definitely not necessary but it would be nice to be able to confirm.

@dbro
Author

dbro commented Oct 7, 2023

Here is a file that fails with a sniffer sample size of 1024 and succeeds with 10240:
test-not-working3.tsv.gz

$ poetry run csv-reconcile init --scorer=dice test-not-working3.tsv public_identifier match_string
