read_table() can wrongly read numbers (silently) #518

aphalo · 2016-09-12T08:41:04Z

read_table() misreads numbers in the first column when enough lines at the top of the file start with white space.

Of the attached files, the shorter one, readr-read-table.txt, is read correctly, but the longer one, readr-read-table-2.txt, is read incorrectly: 1000 is interpreted as 0, 1001 as 1, etc.

 readr::read_table("readr-read-table.txt", col_types = "dd")
 readr::read_table("readr-read-table-2.txt", col_types = "dd")

readr-read-table.txt
readr-read-table-2.txt

I noticed a problem with some of my files some months ago, but I did not find the cause of the problem until yesterday. utils::read.table() has no problems with either of these files.

readr 1.0.0.9000, installed from this repository minutes ago.
R 3.3.1 Windows 10 x64.


yeedle · 2016-10-28T16:15:36Z

Here's another similar complaint citing the same issue. I'm not sure it's a bug though. By design, the tokenizer reads only the first 100 lines to determine the structure of the file, probably for efficiency, though it wouldn't hurt to mention this in the documentation.
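To make the failure mode concrete, here is a toy base-R model of guessing where the first column starts from only the first few lines. It is a simplified sketch, not readr's actual tokenizer; `guess_first_field_start()`, `read_index()` and the sample data are hypothetical and exist only for illustration.

```r
# Toy model (an assumption, not readr's real implementation): the first
# field is taken to start right after the blank columns that are shared
# by every sampled line.
guess_first_field_start <- function(lines, n = 100L) {
  sampled <- head(lines, n)
  leading_blanks <- nchar(sampled) - nchar(sub("^ +", "", sampled))
  min(leading_blanks) + 1L
}

# Twelve rows with a right-justified index in a 4-character field.
lines <- sprintf("%4d %5.1f", 1:12, seq(0.5, 6, by = 0.5))

# Hypothetical helper: read the index column from the guessed start
# position up to the known end of the field (character 4).
read_index <- function(lines, n) {
  start <- guess_first_field_start(lines, n)
  as.numeric(trimws(substr(lines, start, 4L)))
}

few_rows <- read_index(lines, n = 5L)   # samples only single-digit rows
all_rows <- read_index(lines, n = 12L)  # samples every row
```

When only the first five rows are sampled, the guessed start sits at the last character of the field, so the later two-digit values lose their leading digit (10 comes back as 0). Sampling every row gives the correct start and the full values.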

aphalo · 2016-10-28T18:29:47Z

An option to remove white space at the start of lines before parsing could solve the problem, I think. In many data sets where the first column is an ordered index quantity like wavelength it is common that the numbers are right justified. This may happen even in files where the columns are not all aligned, which prevents the use of a fixed format.

yeedle · 2016-10-28T20:09:28Z

The same issue occurs with both right-aligned and left-aligned columns. See the SO question I linked to above, where the issue occurred with a left-justified column.

hadley · 2016-12-22T19:28:51Z

Yes, read_table() is potentially unreliable because it is magic. If you favour correctness over ease-of-use, you should use read_fwf().

That said, read_table() should print out the spec that it uses (a la col_types) so you can tweak afterwards.
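A minimal sketch of the read_fwf() route, assuming a file with a 4-character index column followed by a 6-character value column. The widths, column names and sample data are assumptions for illustration and are not taken from the attached files.

```r
library(readr)

# Write a small right-justified sample file (made-up data).
path <- tempfile(fileext = ".txt")
writeLines(sprintf("%4d %5.1f", 998:1003, seq(0.5, 3, by = 0.5)), path)

# State the column widths explicitly instead of letting them be guessed.
spec <- fwf_widths(c(4, 6), col_names = c("index", "value"))
dat <- read_fwf(path, col_positions = spec, col_types = "dd")
```

Because the positions are given up front, rows that switch from three to four digits part-way through the file do not change where the first column starts.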

Fixes #518

hadley added the feature (a feature request or enhancement) and read 📖 labels Dec 22, 2016

hadley mentioned this issue Dec 22, 2016

Unexpected behavior in parsing TXT file #516

Closed

yeedle mentioned this issue Dec 27, 2016

add ability to tweak number of lines read for read_table #567

Merged

hadley closed this as completed in 8d52d61 Jan 17, 2017

jimhester pushed a commit that referenced this issue Jan 23, 2017

Add ability to tweak number of lines read for read_table (#567)

ceba68e

Fixes #518

jimhester pushed a commit that referenced this issue Jan 23, 2017

Add ability to tweak number of lines read for read_table (#567)

68a7575

Fixes #518

lock bot locked and limited conversation to collaborators Sep 24, 2018
