Pivoting a very wide lazy table throws a c-stack error. #1217

abalter · 2023-03-19T19:07:48Z

I created a wide table: >100 columns. As a tibble I can pivot it to three columns. As a memdb, SQLite in-memory table, or arrow-->duckdb table I get a c-stack error:

Error: C stack usage 7972212 is too close to the limit

When I reduce the number of columns to, say, 80, I don't get the error.

I created a reprex, but for some reason it won't display that error. However, I am including the code below:

library(tidyverse)
library(dbplyr)
library(RSQLite)
library(arrow)

Nids = 10
Nyears = 10
Ndates = 12*Nyears
start_year = 2010

"*******  Create Very Wide Table  *********"
tb_wide =
  crossing(
    id = str_glue("id_{1:Nids}"),
    year = start_year:(start_year+Nyears-1),
    month = month.abb %>% tolower()
  ) %>%
  tibble() %>%
  mutate(value = rnorm(Nids*Ndates)) %>%
  unite(month, year, col="date", sep="_") %>%
  pivot_wider(names_from=date, values_from = value)

tb_wide %>% nrow()
tb_wide %>% ncol()
colnames(tb_wide)

"******  Create lazy arrow table  *****"
readr::write_csv(tb_wide, "tb_wide.csv")
tb_arrow = read_csv_arrow("tb_wide.csv", as_data_frame = FALSE)
tb_arrow %>% nrow()
tb_arrow %>% ncol()

"******  Create memdb table  ******"
tb_mdb = memdb_frame(tb_wide)
# Check the memdb table (row count is NA for lazy tables until collected)
tb_mdb %>% nrow()
tb_mdb %>% ncol()

"******  Create in-memory sqlite table  ******"
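# ":memory:" (with the trailing colon) requests an in-memory database;
# ":memory" would create an on-disk file with that name instead
con = dbConnect(RSQLite::SQLite(), ":memory:")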
copy_to(con, tb_wide, "tb_wide")
dbListTables(con)
tb_db = tbl(con, "tb_wide")
# Check the SQLite table (row count is NA for lazy tables until collected)
tb_db %>% nrow()
tb_db %>% ncol()


"******  Try Pivoting  ******"
tb_long =
  tb_wide %>%
  pivot_longer(cols=-id, names_to="date", values_to = "value")

dim(tb_long)
colnames(tb_long)

tb_arrow %>%
  to_duckdb() %>%
  pivot_longer(cols=-id, names_to="date", values_to = "value")

tb_mdb %>%
  pivot_longer(cols=-id, names_to="date", values_to = "value")

tb_db %>%
  pivot_longer(cols=-id, names_to="date", values_to = "value")

print(nonexistent_variable)

sessionInfo()

mgirlich · 2023-03-24T10:44:43Z

I can't reproduce this, but I think it is related to the many union_all() calls that pivot_longer() makes in the background. The data structure used for this has to change before it can work correctly.
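To make the union_all() point concrete, here is a minimal sketch, assuming dplyr, dbplyr, tidyr and RSQLite are installed. It pivots a tiny lazy table and counts the UNION ALL clauses in the generated SQL. The table and column names are made up for illustration, and the exact SQL text may differ between dbplyr versions.

```r
library(dplyr)
library(dbplyr)
library(tidyr)

# Tiny lazy table with three value columns (illustrative data only)
wide <- memdb_frame(id = 1:2, jan = c(1, 2), feb = c(3, 4), mar = c(5, 6))

# Build the pivot lazily; nothing is executed on the database yet
long <- wide %>%
  pivot_longer(cols = -id, names_to = "month", values_to = "value")

# Render the SQL that dbplyr would send and count the UNION ALL clauses
sql_text <- as.character(remote_query(long))
n_unions <- lengths(regmatches(sql_text, gregexpr("UNION ALL", sql_text, fixed = TRUE)))

cat(sql_text, "\n")
cat("UNION ALL clauses:", n_unions, "\n")
```

If the clause count grows with the number of pivoted columns, a table with more than 100 columns would produce a correspondingly long chain of unions, which is consistent with the explanation above.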

abalter · 2023-03-24T15:24:02Z

Can't reproduce meaning you don't get an error? Can you demonstrate?

mgirlich · 2023-03-27T06:10:54Z

Yes, I don't get an error, not even with Nyears = 24. You can try using the dev version of dbplyr. Otherwise, this could also be related to the system you're using.

rkb965 · 2023-04-04T23:27:14Z

I also get this C stack error. My personal use case was with duckdb and pivot_longer, 500 columns. The query ran smoothly as a tibble but throws a C stack error with duckdb (works well with 10 columns, haven't yet explored the limit).

I also get the C stack error with the above reprex. Any idea what system details might be relevant to this?

Thank you!

> sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /spack/2206/apps/linux-centos7-x86_64_v3/gcc-11.3.0/openblas-0.3.20-n62j5my/lib/libopenblasp-r0.3.20.so

(all package versions are new as of last week. can attach full sessionInfo() if helpful)
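On the question of which system details matter: one candidate is the C stack limit of the R process. The sketch below uses base R's Cstack_info() to show it and compares the usage from the error message with 8 MiB. Treating 8 MiB as the limit is an assumption (it is a common default on Linux, but it has not been confirmed for the systems in this thread).

```r
# Stack limit and current usage of this R session, in bytes
# (size may be NA on platforms where it is unknown)
info <- Cstack_info()
print(info)

# Usage reported in the error message from the original post
reported_usage <- 7972212

# ASSUMPTION: an 8 MiB stack limit, a common Linux default; not confirmed here
eight_mib <- 8 * 1024^2

# Fraction of the assumed limit that was in use when the error was raised
fraction <- reported_usage / eight_mib
cat("Reported usage as a fraction of 8 MiB:", round(fraction, 3), "\n")
```

Comparing the size value from Cstack_info() across machines may help explain why the reprex fails on some systems and not on others.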

mgirlich · 2023-04-11T07:09:00Z

You could try the dev version of dbplyr. Otherwise, this has to wait until a series of union() or union_all() calls is handled differently. It is on the list of things I want to tackle, but I don't know yet when I will have time for it.

mgirlich · 2023-04-28T12:25:01Z

The dev version now handles a sequence of union() better. Can you install it via devtools::install_github("tidyverse/dbplyr") and give feedback whether this solves your issues?
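A small sketch for checking which dbplyr version a session actually picks up after installing. The threshold "2.3.2.9000" is the version reported to work later in this thread. A development version number does not identify a specific commit, so treat the check as a rough indication only.

```r
# Install step from the comment above, left commented out so that the
# check itself has no side effects:
# devtools::install_github("tidyverse/dbplyr")

# Version reported to handle the reprex later in this thread
min_dev_version <- "2.3.2.9000"

# TRUE if dbplyr is installed and at least at that version
has_fix <- requireNamespace("dbplyr", quietly = TRUE) &&
  utils::packageVersion("dbplyr") >= min_dev_version

cat("dbplyr at or above", min_dev_version, ":", has_fix, "\n")
```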

abalter · 2023-04-28T16:27:18Z

I can confirm that this dev version was able to handle the table in the reprex.

Great work!

> installed.packages()['dbplyr', c('LibPath', 'Version', 'Built')]
                                                   LibPath
"/home/users/balter/micromamba/envs/bigwide/lib/R/library"
                                                   Version
                                              "2.3.2.9000"
                                                     Built
                                                   "4.2.3"

If you would like me to stress-test a bit I would be happy to do that.
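A possible shape for such a stress test, as a sketch only. The helper try_pivot() is hypothetical (it is not part of dbplyr), and the code assumes dplyr, dbplyr, tidyr and RSQLite are installed. It pivots lazy tables of increasing width and records whether each pivot succeeds.

```r
library(dplyr)
library(dbplyr)
library(tidyr)

# Hypothetical helper (not part of dbplyr): pivot a lazy table with `n_cols`
# value columns; return the number of rows collected, or NA on error.
try_pivot <- function(n_cols, n_rows = 2) {
  values <- as.data.frame(matrix(rnorm(n_rows * n_cols), nrow = n_rows))
  wide <- memdb_frame(data.frame(id = seq_len(n_rows), values))
  tryCatch(
    {
      long <- pivot_longer(wide, cols = -id, names_to = "name", values_to = "value")
      nrow(collect(long))
    },
    error = function(e) NA_integer_
  )
}

# Small widths keep the sketch quick; raise them to probe the limit
widths <- c(5, 20)
results <- vapply(widths, try_pivot, integer(1))
print(data.frame(n_cols = widths, n_rows_long = results))
```

A successful pivot should return n_rows * n_cols rows. An NA marks a width at which the pivot raised an error.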

mgirlich · 2023-05-02T06:46:12Z

Thanks for the feedback.
Closed by #1270.

abalter mentioned this issue Mar 20, 2023

[Bug] Dbplyr throws c stack error pivoting wide table. #1201

Closed

mgirlich closed this as completed May 2, 2023
