html_table() error #204

Closed
jubjubbc opened this issue Dec 2, 2017 · 4 comments · Fixed by #293
Labels
bug an unexpected problem or unintended behavior table 🏓

Comments

@jubjubbc

jubjubbc commented Dec 2, 2017

I'm somewhat new to scraping with R, and I'm getting an error message that I can't make sense of. A commenter on Stack Overflow said that html_table() isn't handling the colspan logic correctly and suggested I file a bug report.

My code:

library(rvest)

url <- "https://en.wikipedia.org/wiki/California_State_Legislature,_2017%E2%80%9318_session"

leg <- read_html(url)

# Select the sixth table on the page and parse it
testdata <- leg %>% 
  html_nodes('table') %>% 
  .[6] %>% 
  html_table()

To which I get the response:

Error in out[j + k, ] : subscript out of bounds

@lifesabirch

lifesabirch commented Nov 29, 2018

This error comes from assuming that the HTML is well formed and that a rowspan never extends past the last row of the table. A minimal example that triggers it:

read_html("<table><tr><td rowspan='7'>foo</td></tr></table>") %>% html_table()

If you add a check for the end of the table, it works:

for (k in seq_len(rowspan - 1)) {
  # Added line: stop when the rowspan runs past the last row of the table
  if (j + k > nrow(out)) break
  l <- utils::head(out[j + k, ], i - 1)
  r <- utils::tail(out[j + k, ], maxp - i + 1)
  out[j + k, ] <- utils::head(c(l, out[j, i], r), maxp)
}
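
To see why the guard matters without touching rvest internals, the loop above can be wrapped in a small base R function. This is only a sketch: `fill_rowspan()` and its `guard` argument are hypothetical names made up for illustration, and `out` is assumed to be the character matrix of cells that the loop operates on.

```r
# Hypothetical stand-in for the rowspan loop quoted above (not rvest's API).
# `out` is a character matrix of cells; cell (j, i) has a rowspan attribute.
fill_rowspan <- function(out, j, i, rowspan, guard = TRUE) {
  maxp <- ncol(out)
  for (k in seq_len(rowspan - 1)) {
    # Stop once the span would run past the last row of the table.
    if (guard && j + k > nrow(out)) break
    l <- utils::head(out[j + k, ], i - 1)
    r <- utils::tail(out[j + k, ], maxp - i + 1)
    # Push the spanning cell's value into row j + k, keeping maxp columns.
    out[j + k, ] <- utils::head(c(l, out[j, i], r), maxp)
  }
  out
}

# Two-row table where the first cell claims rowspan = 7.
out <- matrix(c("foo", "a", "b", NA), nrow = 2, byrow = TRUE)

guarded <- fill_rowspan(out, j = 1, i = 1, rowspan = 7)

# Without the guard, the loop indexes row 3 of a 2-row matrix and fails.
unguarded <- tryCatch(
  fill_rowspan(out, j = 1, i = 1, rowspan = 7, guard = FALSE),
  error = function(e) e
)
```

With the guard, the second row is filled from the spanning cell and the loop stops at the table's last row; without it, `unguarded` holds the error condition instead of a matrix.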

@hadley hadley added bug an unexpected problem or unintended behavior table 🏓 labels Mar 17, 2019
@sreineri

sreineri commented May 5, 2020

Hi, I have the same problem, but I couldn't fix it with the help above.

library('rvest')

url <- 'http://www2.bmf.com.br/pages/portal/bmfbovespa/lumis/lum-taxas-referenciais-bmf-ptBR.asp'

# Read the HTML from the given URL
site <- read_html(url)
lista_tabela <- site %>%
  html_nodes("table") %>%
  html_table(fill = TRUE)

Error in out[j + k, ] : subscript out of bounds

Can you please help me?
Thanks

@lifesabirch

Create your own version of html_table.xml_node and add the check as the first line of this loop:

for (k in seq_len(rowspan - 1)) {
  # Added line: stop when the rowspan runs past the last row of the table
  if (j + k > nrow(out)) break
  l <- utils::head(out[j + k, ], i - 1)
  r <- utils::tail(out[j + k, ], maxp - i + 1)
  out[j + k, ] <- utils::head(c(l, out[j, i], r), maxp)
}

@sreineri

sreineri commented May 6, 2020

I am sorry, I need more details. What is this rowspan? Can you please send me the whole code that you have tested and worked?

Thank you very much

hadley added a commit that referenced this issue Dec 19, 2020
And make it return a tibble.

Fixes #63. Fixes #204. Fixes #215. Fixes #199.
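
Since the referenced commit closes this issue, updating rvest is probably simpler than patching html_table.xml_node by hand. The sketch below checks the installed version before retrying the minimal example from this thread. The version number 1.0.0 is an assumption (the thread does not say which release contains the commit), and `has_fix()` is a hypothetical helper.

```r
# Assumption: the fix first shipped in rvest 1.0.0 (not stated in this thread).
fixed_in <- package_version("1.0.0")

# Hypothetical helper: TRUE if the given version string is at least `fixed_in`.
has_fix <- function(installed) package_version(installed) >= fixed_in

if (requireNamespace("rvest", quietly = TRUE) &&
    has_fix(as.character(utils::packageVersion("rvest")))) {
  # Minimal example from the thread: a rowspan that runs past the table's end.
  html <- "<table><tr><td rowspan='7'>foo</td></tr></table>"
  print(rvest::html_table(rvest::read_html(html)))
} else {
  message("Install or update rvest, e.g. install.packages('rvest')")
}
```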

4 participants