html_table: check column names? #116
Comments
Reproducible example:
library(dplyr)
library(rvest)
HTML <- read_html('<table><tr><th>Var1</th><th> </th></tr><tr><td>1</td><td>2</td></table>')
HTML %>%
html_node('table') %>%
html_table() %>%
tbl_df()
## Error: All columns must be named
HTML %>%
html_node('table') %>%
html_table(check.names=TRUE) %>%
tbl_df()
## Source: local data frame [1 x 2]
## Var1 X
## (int) (int)
## 1 1 2
I think it's a bad idea to munge column names because they can contain important information.
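To illustrate that concern, here is a small sketch using base R's make.names(). The two header strings are made up for the example and do not come from any table in this thread.

```r
# Two made-up headers that differ only in punctuation (assumption).
headers <- c("Price ($)", "Price (%)")

# Every character that is not valid in an R name is replaced by ".",
# so the symbol that distinguished the two columns is lost.
munged <- make.names(headers)
identical(munged[1], munged[2])

# unique = TRUE keeps the names distinct, but only by appending a suffix.
make.names(headers, unique = TRUE)
```

After munging, nothing in the names says which column held dollars and which held percentages.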
I agree that the default behavior should be to not munge things. However, many HTML tables, with or without good column names, do not adhere to R naming conventions :-) The default should be to leave names unchanged, but when known-challenging tables are scraped, it would be better to offer an explicit "out" instead of the unfortunate error about invalid column names. (This specific example may be masked by your new column-spanning feature, though I'm not certain that should always be the right option.) Is there a better and/or more elegant solution for HTML tables that fail like this one does? Doesn't keeping the default
Your explicit out is to
Ahh, I see, perhaps something in
Or maybe dplyr should have an explicit
I don't think the can should be kicked down the road to
Hadley,
Thanks for the tools!
In html_table(), I find with some sites that using the defined header names can produce bad column names (including empty strings). What are your thoughts about including an option to run the column names through make.names?
Currently having to do:
Perhaps:
Something like:
I found no previous discussion of make.names, my apologies if this is a repeat.
Currently using R 3.2.2, rvest_0.3.0, win10_64.
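For readers following along, here is a minimal sketch of the kind of opt-in cleanup being requested. It is not the code that was elided above and it is not part of rvest: the helper fix_table_names() and its check.names argument are hypothetical, and the data frame is a hand-built stand-in for what html_table() might return for a table with an empty header cell.

```r
# Hypothetical helper, not part of rvest: leaves names untouched by default
# and only runs them through base R's make.names() when asked to.
fix_table_names <- function(df, check.names = FALSE) {
  if (check.names) {
    # unique = TRUE also disambiguates repeated names
    names(df) <- make.names(names(df), unique = TRUE)
  }
  df
}

# Stand-in for a scraped table whose second header is empty (assumption).
scraped <- data.frame(a = 1L, b = 2L)
names(scraped) <- c("Var1", "")

names(fix_table_names(scraped))                      # names left as scraped
names(fix_table_names(scraped, check.names = TRUE))  # syntactically valid names
```

Keeping the cleanup behind an argument that defaults to FALSE matches the position taken in the comments: names are preserved unless the caller explicitly asks for them to be changed.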