read_table() can wrongly read numbers (silently) #518
Comments
Here's another similar complaint citing the same issue. I'm not sure it's a bug, though. By design, the tokenizer reads only the first 100 lines to determine the structure of the file. That is probably for efficiency, though it wouldn't hurt to mention it in the documentation.
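To make the mechanism concrete, here is a minimal simulation of it in Python. It is not readr's tokenizer: the helper names `infer_field_starts` and `split_fixed` and the generated data are made up for illustration, and the only thing taken from the thread is the idea that column positions are guessed from the first 100 lines.

```python
def infer_field_starts(lines, n_sample=100):
    """Guess field start offsets by looking only at the first n_sample lines."""
    sample = lines[:n_sample]
    width = max(len(line) for line in sample)
    # A position counts as blank if it is a space (or past the end) in every sampled line.
    blank = [
        all(i >= len(line) or line[i] == " " for line in sample)
        for i in range(width)
    ]
    # A field starts at each non-blank position that follows a blank one (or at offset 0).
    return [i for i in range(width) if not blank[i] and (i == 0 or blank[i - 1])]


def split_fixed(line, starts):
    """Cut a line at the inferred offsets; text before the first offset is dropped."""
    ends = starts[1:] + [None]
    return [line[s:e].strip() for s, e in zip(starts, ends)]


# Hypothetical data: the first column is right-justified in 4 characters, so the
# first 100 rows (900..999) begin with a space and later rows (1000..) do not.
lines = [f"{i:>4} {i * 2:>5}" for i in range(900, 1050)]

starts = infer_field_starts(lines)                            # sampled guess
starts_full = infer_field_starts(lines, n_sample=len(lines))  # guess from every line

parsed = [int(split_fixed(line, starts)[0]) for line in lines]
parsed_full = [int(split_fixed(line, starts_full)[0]) for line in lines]
```

With the sampled guess, the first field is assumed to start at offset 1, so the leading digit of 1000 is cut off and the remainder parses as 0. Guessing from every line puts the start at offset 0 and the values come back intact.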
An option to remove white space at the start of lines before parsing could solve the problem, I think. In many data sets where the first column is an ordered index quantity like wavelength, it is common for the numbers to be right-justified. This may happen even in files where the columns are not all aligned, which prevents the use of a fixed-width format.
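A rough sketch of what that option would amount to, again in Python rather than readr: strip leading white space from each line, then split on runs of white space instead of relying on fixed offsets. The function name `parse_whitespace_table` and the sample rows are hypothetical.

```python
def parse_whitespace_table(lines):
    """Split each line on runs of white space after dropping leading white space."""
    rows = []
    for line in lines:
        stripped = line.lstrip()   # the proposed pre-processing step
        if not stripped.strip():
            continue               # skip empty lines
        rows.append(stripped.split())
    return rows


# Hypothetical rows: right-justified first column, later columns not aligned.
sample = [
    " 999 0.5 a",
    "1000 0.25 bb",
    "1001 12.125 c",
]

rows = parse_whitespace_table(sample)
first_column = [int(row[0]) for row in rows]
```

Because nothing depends on character offsets, the result is the same whether or not the early lines start with a space. The trade-off is that this approach cannot represent empty fields or values containing spaces.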
The same issue occurs with both right-aligned and left-aligned columns. See the SO question I linked to above. There, the issue occurred with a left-justified column.
Yes, That said, |
read_table() misreads numbers in the first column when there is white space at the start of enough lines at the top of the file.

Of the attached files, the shorter one, readr-read-table.txt, is read correctly, but the second and longer file, readr-read-table-2.txt, is read incorrectly, with 1000 interpreted as 0, 1001 as 1, etc.

readr-read-table.txt
readr-read-table-2.txt
I noticed a problem with some of my files some months ago, but I did not find the cause of the problem until yesterday.
utils::read.table() has no problems with either of these files.

readr 1.0.0.9000, installed from this repository minutes ago.
R 3.3.1, Windows 10 x64.