Pivoting a very wide lazy table throws a c-stack error. #1217

Closed
abalter opened this issue Mar 19, 2023 · 8 comments

@abalter

abalter commented Mar 19, 2023

I created a wide table with more than 100 columns. As a tibble, I can pivot it to three columns. As a memdb table, an in-memory SQLite table, or an arrow-->duckdb table, I get a C stack error:

Error: C stack usage 7972212 is too close to the limit

When I reduce the number of columns to, say, 80, I don't get the error.

I created a reprex, but for some reason it won't display that error. However, I am including the code below:

library(tidyverse)
library(dbplyr)
library(RSQLite)
library(arrow)

Nids = 10
Nyears = 10
Ndates = 12*Nyears
start_year = 2010

"*******  Create Very Wide Table  *********"
tb_wide =
  crossing(
    id = str_glue("id_{1:Nids}"),
    year = start_year:(start_year+Nyears-1),
    month = month.abb %>% tolower()
  ) %>%
  tibble() %>%
  mutate(value = rnorm(Nids*Ndates)) %>%
  unite(month, year, col="date", sep="_") %>%
  pivot_wider(names_from=date, values_from = value)

tb_wide %>% nrow()
tb_wide %>% ncol()
colnames(tb_wide)

"******  Create lazy arrow table  *****"
readr::write_csv(tb_wide, "tb_wide.csv")
tb_arrow = read_csv_arrow("tb_wide.csv", as_data_frame = FALSE)
tb_arrow %>% nrow()
tb_arrow %>% ncol()

"******  Create memdb table  ******"
tb_mdb = memdb_frame(tb_wide)
tb_mdb %>% nrow()  # NA for a lazy table: the row count is unknown until the query runs
tb_mdb %>% ncol()

"******  Create in-memory sqlite table  ******"
con = dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(con, tb_wide, "tb_wide")
dbListTables(con)
tb_db = tbl(con, "tb_wide")
tb_db %>% nrow()
tb_db %>% ncol()


"******  Try Pivoting  ******"
tb_long =
  tb_wide %>%
  pivot_longer(cols=-id, names_to="date", values_to = "value")

dim(tb_long)
colnames(tb_long)

tb_arrow %>%
  to_duckdb() %>%
  pivot_longer(cols=-id, names_to="date", values_to = "value")

tb_mdb %>%
  pivot_longer(cols=-id, names_to="date", values_to = "value")

tb_db %>%
  pivot_longer(cols=-id, names_to="date", values_to = "value")

print(nonexistent_variable)

sessionInfo()
@mgirlich
Collaborator

I can't reproduce this, but I think it is caused by the many union_all() calls that the pivot makes in the background. The underlying data structure has to change for this to work correctly.
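
For context, here is a minimal sketch of how to inspect the SQL that dbplyr generates for pivot_longer() on a lazy table. The table and column names are made up for illustration, and the exact SQL text depends on the dbplyr version and backend. The point is only that each pivoted column contributes its own SELECT and that the SELECTs are combined with UNION ALL, so a table with more than 100 columns produces a very long chain.

```r
library(dplyr)
library(dbplyr)
library(tidyr)

# Hypothetical three-column stand-in for the wide table in the reprex.
small_lazy <- memdb_frame(
  id = c("id_1", "id_2"),
  jan_2010 = c(0.1, 0.2),
  feb_2010 = c(0.3, 0.4),
  mar_2010 = c(0.5, 0.6)
)

long_lazy <- small_lazy %>%
  pivot_longer(cols = -id, names_to = "date", values_to = "value")

# Print the generated SQL; look for the UNION ALL between per-column SELECTs.
show_query(long_lazy)

# The same query as a character string, for programmatic inspection.
sql_text <- as.character(remote_query(long_lazy))
```

Running the same inspection on a wider table shows how the query grows with the number of pivoted columns.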

@abalter
Author

abalter commented Mar 24, 2023

Can't reproduce meaning you don't get an error? Can you demonstrate?

@mgirlich
Collaborator

Yes, I don't get an error, not even with Nyears = 24. You can try using the dev version of dbplyr. Otherwise, this may also be related to the system you're using.

@rkb965

rkb965 commented Apr 4, 2023

I also get this C stack error. My personal use case was with duckdb and pivot_longer, 500 columns. The query ran smoothly as a tibble but throws a C stack error with duckdb (works well with 10 columns, haven't yet explored the limit).

I also get the C stack error with the above reprex. Any idea what system details might be relevant to this?

Thank you!

> sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /spack/2206/apps/linux-centos7-x86_64_v3/gcc-11.3.0/openblas-0.3.20-n62j5my/lib/libopenblasp-r0.3.20.so

(all package versions are new as of last week. can attach full sessionInfo() if helpful)

@mgirlich
Collaborator

You could try the dev version of dbplyr. Otherwise, this has to wait until a series of union() or union_all() calls is handled differently. It is on the list of things I want to tackle, but I don't know yet when I will have time for it.
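
In the meantime, one possible workaround is sketched below. It is not something proposed in this thread, and it assumes the wide table is small enough to fit in memory: collect() the lazy table first, then pivot the resulting tibble locally, which the reports above say works. The table and column names are hypothetical.

```r
library(dplyr)
library(dbplyr)
library(tidyr)

# Hypothetical small stand-in for the wide lazy table in the reprex.
wide_lazy <- memdb_frame(
  id = c("id_1", "id_2"),
  jan_2010 = c(0.1, 0.2),
  feb_2010 = c(0.3, 0.4)
)

long_local <- wide_lazy %>%
  collect() %>%  # pull the rows into a local tibble before pivoting
  pivot_longer(cols = -id, names_to = "date", values_to = "value")

long_local
```

The trade-off is that the pivot no longer runs in the database, so this only helps when the data can be pulled into R.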

@mgirlich
Collaborator

The dev version now handles a sequence of union() better. Can you install it via devtools::install_github("tidyverse/dbplyr") and give feedback whether this solves your issues?

@abalter
Author

abalter commented Apr 28, 2023

I can confirm that this dev version was able to handle the table in the reprex.

Great work!

> installed.packages()['dbplyr', c('LibPath', 'Version', 'Built')]
                                                   LibPath
"/home/users/balter/micromamba/envs/bigwide/lib/R/library"
                                                   Version
                                              "2.3.2.9000"
                                                     Built
                                                   "4.2.3"

If you would like me to stress-test a bit I would be happy to do that.

@mgirlich
Collaborator

mgirlich commented May 2, 2023

Thanks for the feedback.
Closed by #1270.

@mgirlich mgirlich closed this as completed May 2, 2023