Hi,
Since yesterday, RStudio crashes immediately when I run the following code with more than one factor variable with multiple levels, or with multiple factor variables of one level each. This might be related to the new versions of tidymodels, tune, parsnip, and workflows, which I updated yesterday. I couldn't restore the previous versions, which, if I remember correctly, did not cause RStudio to crash (not 100% sure).
Here is the code. Try it with and without factor variables; that should reproduce the error [crossed fingers].
I am not deeply familiar with boosted regression trees (BRT) such as LightGBM. Do categorical variables need to be converted to factors to be "recognized" as such by the algorithm, or does this not matter because of how LightGBM works?
Kindly, felix
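Since the crash is suspected to be tied to a package update, it may help to record the installed versions next to the reprex. The following is only a minimal sketch: the helper name `installed_version` and the choice of packages to check are assumptions for illustration, not part of the original report.

```r
# Packages mentioned in the report, plus the engine packages used below.
pkgs <- c("tidymodels", "tune", "parsnip", "workflows", "treesnip", "lightgbm")

# Hypothetical helper: return the installed version as a string,
# or NA if the package is not installed.
installed_version <- function(pkg) {
  tryCatch(
    as.character(utils::packageVersion(pkg)),
    error = function(e) NA_character_
  )
}

versions <- vapply(pkgs, installed_version, character(1))
print(versions)
```

Comparing this output before and after an update would show which of the packages actually changed.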
library(doParallel) #include multithreading and parallelizing processes where possible
library(foreach) #Provides foreach looping construct
UseCores<- detectCores() -1
# Register CoreCluster
cl<- makeCluster(UseCores)
registerDoParallel(cl)
library(lightgbm)
pacman::p_load(
janitor, #data cleaning
recipes,
rsample,
parsnip,
workflows,
tune,
dials,
yardstick,
treesnip,
tidyverse,
tidymodels,
plyr,
dplyr,
tidyr,
readr,
stringr,
gtools,
ggplot2,
reshape2,
purrr,
data.table
)
#this works
mmmf_mixedVAR_simple_qf= structure(list(QF2020scenario= c(41.6104850769043, 68.6856307983398,
57.3654022216797, 28.7580642700195, 45.1602096557617, 47.2106628417969,
68.6856307983398, 71.7784652709961, 24.4756469726562, 40.3135414123535,
23.0632152557373, 24.1012935638428, 68.1585693359375, 63.6638069152832,
41.6104850769043, 35.8073768615723, 68.6856307983398, 54.6796913146973,
48.6273994445801, 41.6104850769043, 41.6104850769043, 89.0888595581055,
50.85595703125, 41.8495635986328, 75.2088851928711, 37.8572235107422,
76.5240020751953, 27.1052436828613, 34.5876998901367, 28.7580642700195,
42.9002914428711, 27.1052436828613, 29.2019214630127, 34.5320816040039,
40.723876953125, 57.3654022216797, 54.1622505187988, 41.5773963928223,
39.8819427490234, 32.0537185668945, 40.9108467102051, 41.5037727355957,
28.6135368347168, 47.2106628417969, 68.6856307983398, 36.1073341369629,
40.440845489502, 48.0962562561035, 74.1079177856445, 23.0632152557373,
58.8290863037109, 50.85595703125, 24.9130783081055, 87.9564056396484,
65.1510391235352, 28.7580642700195, 28.6135368347168, 48.6273994445801,
56.6535720825195, 31.4044914245605, 89.0888595581055, 42.9002914428711,
40.723876953125, 54.8065490722656, 48.5243873596191, 41.2597579956055,
22.1660785675049, 39.8819427490234, 27.2910995483398, 56.6535720825195,
40.723876953125, 41.6104850769043, 58.8290863037109, 37.8572235107422,
34.5320816040039, 79.2940444946289, 22.6065940856934, 57.3654022216797,
77.4911727905273, 26.6769046783447, 74.1079177856445, 45.1602096557617,
79.2940444946289, 36.1073341369629, 28.7580642700195, 68.1585693359375,
46.0501861572266, 27.2910995483398, 48.1491203308105, 71.7784652709961,
68.0657424926758, 54.1622505187988, 44.9057807922363, 26.5627517700195,
93.6683654785156, 23.0632152557373, 38.9476852416992, 48.6273994445801,
28.7580642700195, 40.440845489502), MMMFsize= structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L), .Label= c("624.9", "1249.8", "1874.6", "1874.7",
"2499.5", "2499.6", "3124.4", "3124.5", "3749.3", "3749.4", "4999.1",
"4999.2", "6248.9"), class="factor"), DEM= c(1378.26611328125,
974.038024902344, 1185.93322753906, 671.142700195312, 1395.73779296875,
1257.41296386719, 863.424499511719, 1140.35339355469, 1240.84362792969,
846.944030761719, 864.497497558594, 1451.91491699219, 913.910522460938,
1063.34252929688, 1784.55932617188, 1390.81921386719, 1312.2763671875,
1656.35778808594, 1273.79150390625, 1221.74926757812, 1278.97338867188,
1039.95471191406, 1548.49108886719, 1124.38513183594, 1080.16455078125,
895.399963378906, 1248.04418945312, 1637.8017578125, 1257.97521972656,
697.554016113281, 1007.21697998047, 1405.5810546875, 1640.0732421875,
680.0546875, 1188.71472167969, 1348.49426269531, 1692.85485839844,
809.52734375, 1069.23974609375, 832.667541503906, 1056.05895996094,
983.139587402344, 720.094604492188, 1692.974609375, 1284.35400390625,
1618.52624511719, 925.88134765625, 1287.73107910156, 917.59375,
740.259094238281, 1020.67596435547, 1245.60473632812, 1178.15710449219,
1194.00207519531, 1027.35168457031, 694.300720214844, 717.023986816406,
1905.86767578125, 1113.28002929688, 669.358154296875, 1170.1513671875,
1143.10107421875, 1163.02563476562, 1591.00939941406, 1369.50964355469,
924.715576171875, 1223.99816894531, 986.636840820312, 1388.10412597656,
1064.22509765625, 1019.59216308594, 876.469787597656, 1090.61096191406,
1315.27526855469, 1266.10583496094, 1157.1943359375, 1229.54321289062,
1376.40209960938, 1154.36730957031, 1365.8759765625, 840.7255859375,
947.099365234375, 1402.03540039062, 1509.984375, 711.320678710938,
864.457153320312, 1516.41235351562, 1297.19689941406, 1970.48754882812,
992.990478515625, 1284.78002929688, 1398.86865234375, 1365.95983886719,
1524.95874023438, 1135.64111328125, 1386.25695800781, 1407.48962402344,
1354.19921875, 702.702270507812, 1183.13366699219)), row.names= c(NA,
-100L), class= c("data.table", "data.frame"))
#this crashes
mmmf_mixedVAR_simple_qf= structure(list(QF2020scenario= c(41.6104850769043, 68.6856307983398,
57.3654022216797, 28.7580642700195, 45.1602096557617, 47.2106628417969,
68.6856307983398, 71.7784652709961, 24.4756469726562, 40.3135414123535,
23.0632152557373, 24.1012935638428, 68.1585693359375, 63.6638069152832,
41.6104850769043, 35.8073768615723, 68.6856307983398, 54.6796913146973,
48.6273994445801, 41.6104850769043, 41.6104850769043, 89.0888595581055,
50.85595703125, 41.8495635986328, 75.2088851928711, 37.8572235107422,
76.5240020751953, 27.1052436828613, 34.5876998901367, 28.7580642700195,
42.9002914428711, 27.1052436828613, 29.2019214630127, 34.5320816040039,
40.723876953125, 57.3654022216797, 54.1622505187988, 41.5773963928223,
39.8819427490234, 32.0537185668945, 40.9108467102051, 41.5037727355957,
28.6135368347168, 47.2106628417969, 68.6856307983398, 36.1073341369629,
40.440845489502, 48.0962562561035, 74.1079177856445, 23.0632152557373,
58.8290863037109, 50.85595703125, 24.9130783081055, 87.9564056396484,
65.1510391235352, 28.7580642700195, 28.6135368347168, 48.6273994445801,
56.6535720825195, 31.4044914245605, 89.0888595581055, 42.9002914428711,
40.723876953125, 54.8065490722656, 48.5243873596191, 41.2597579956055,
22.1660785675049, 39.8819427490234, 27.2910995483398, 56.6535720825195,
40.723876953125, 41.6104850769043, 58.8290863037109, 37.8572235107422,
34.5320816040039, 79.2940444946289, 22.6065940856934, 57.3654022216797,
77.4911727905273, 26.6769046783447, 74.1079177856445, 45.1602096557617,
79.2940444946289, 36.1073341369629, 28.7580642700195, 68.1585693359375,
46.0501861572266, 27.2910995483398, 48.1491203308105, 71.7784652709961,
68.0657424926758, 54.1622505187988, 44.9057807922363, 26.5627517700195,
93.6683654785156, 23.0632152557373, 38.9476852416992, 48.6273994445801,
28.7580642700195, 40.440845489502), QF2020baseline= c(131.406265258789,
181.709228515625, 169.186233520508, 419.544281005859, 140.61865234375,
146.94059753418, 181.709228515625, 188.410766601562, 94.6697769165039,
342.659301757812, 91.3541030883789, 93.9609985351562, 68.1585693359375,
173.479064941406, 278.51025390625, 257.252197265625, 181.709228515625,
165.460083007812, 149.839279174805, 41.6104850769043, 41.6104850769043,
89.0888595581055, 83.1039886474609, 280.182220458984, 75.2088851928711,
37.8572235107422, 200.087020874023, 244.59294128418, 116.486618041992,
252.553466796875, 134.824676513672, 27.1052436828613, 256.261657714844,
467.705383300781, 132.259353637695, 169.186233520508, 337.075317382812,
351.096618652344, 126.392868041992, 108.81022644043, 130.170455932617,
132.241760253906, 253.748397827148, 307.326568603516, 181.709228515625,
129.052230834961, 128.383041381836, 148.36784362793, 193.594207763672,
252.822570800781, 58.8290863037109, 50.85595703125, 24.9130783081055,
472.049530029297, 65.1510391235352, 252.553466796875, 28.6135368347168,
79.4798583984375, 511.481689453125, 270.176391601562, 89.0888595581055,
134.824676513672, 132.259353637695, 166.476776123047, 149.569839477539,
41.2597579956055, 87.522834777832, 126.392868041992, 104.307945251465,
56.6535720825195, 132.259353637695, 41.6104850769043, 163.606430053711,
124.32772064209, 34.5320816040039, 205.243103027344, 90.1506805419922,
169.186233520508, 434.497650146484, 102.199165344238, 193.594207763672,
140.61865234375, 438.991790771484, 129.052230834961, 252.553466796875,
183.919052124023, 144.040237426758, 27.2910995483398, 310.386657714844,
188.410766601562, 68.0657424926758, 164.270843505859, 139.579650878906,
102.060707092285, 235.561004638672, 23.0632152557373, 126.467643737793,
149.839279174805, 252.553466796875, 128.383041381836), HSG= structure(c(1L,
3L, 3L, 1L, 1L, 1L, 3L, 3L, 1L, 3L, 1L, 1L, 3L, 3L, 1L, 1L, 3L,
1L, 1L, 1L, 1L, 3L, 1L, 1L, 3L, 1L, 3L, 1L, 1L, 1L, 3L, 1L, 1L,
1L, 1L, 3L, 1L, 3L, 3L, 1L, 3L, 3L, 1L, 1L, 3L, 1L, 3L, 3L, 3L,
1L, 3L, 1L, 1L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 1L, 1L, 1L,
3L, 1L, 3L, 1L, 1L, 1L, 1L, 3L, 1L, 1L, 3L, 1L, 3L, 3L, 1L, 3L,
1L, 3L, 1L, 1L, 3L, 3L, 1L, 1L, 3L, 3L, 1L, 1L, 1L, 3L, 1L, 1L,
1L, 1L, 3L), .Label= c("3", "3.5", "4"), class="factor")), row.names= c(NA,
-100L), class= c("data.table", "data.frame"))
# set the random seed so we can reproduce any simulated results.
set.seed(1234)
# clean the column names
mmmf_mixedVAR_simple_janitor_clean=mmmf_mixedVAR_simple_qf %>% janitor::clean_names()
# split into training and testing datasets. Stratify by the outcome (qf2020scenario)
mmmf_mixedVAR_simple_janitor_clean_split<-rsample::initial_split(
mmmf_mixedVAR_simple_janitor_clean,
prop=0.8,
strata=qf2020scenario
)
# Pre processing
preprocessing_recipe<-recipes::recipe(qf2020scenario~., data= training(mmmf_mixedVAR_simple_janitor_clean_split)) %>%
#convert categorical variables to factors
recipes::step_string2factor(all_nominal()) %>%
# combine low frequency factor levels
recipes::step_other(all_nominal(), threshold=0.01) %>%
# remove no variance predictors which provide no predictive information
recipes::step_nzv(all_nominal()) %>%
prep()
# Cross validate
mmmf_mixedVAR_simple_janitor_clean_split_preproc_cv_folds<-recipes::bake(
preprocessing_recipe,
new_data= training(mmmf_mixedVAR_simple_janitor_clean_split)
) %>% rsample::vfold_cv(v=5)
# lightgbm model specification
lightgbm_model<-parsnip::boost_tree(
min_n= tune(), #min_data_in_leaf
tree_depth= tune(), #max_depth
trees= tune(), #num_iterations
learn_rate= tune(), #learning_rate
loss_reduction= tune(), #min_gain_to_split
mtry= tune()#, #feature_fraction
# sample_size = tune() #bagging_fraction
) %>% set_engine("lightgbm") %>% set_mode("regression"
) %>% set_args(
num_threads=3,
num_leaves=131072, #
# bagging_fraction = 0.1, # lets test 0.1 to 0.9 in steps of 0.1
# early_stopping_round = 5,
boosting="goss",
# bagging_freq = 5,
tree_learner="data",
extra_trees=T,
monotone_constraints_method="advanced",
feature_pre_filter=F,
pre_partition=T
)
# ///grid specification by dials package to fill in the model above
# grid specification
lightgbm_params<-dials::parameters(
min_n(),
tree_depth(),
trees(),
learn_rate(),
loss_reduction(),
mtry()#,
# sample_size = sample_prop(range = c(0.1,0.9),trans=NULL)
) %>% update(mtry= finalize(mtry(), mmmf_mixedVAR_simple_janitor_clean %>% select(-qf2020scenario)))
## mtry and sample_size need to be provided with a range of how much to sample in (sample_size) and from how many predictor to select (mtry)
# mtry will use in this annotation any of predictors without the deselected(-)
# ///and the grid to look in
# Experimental designs for computer experiments are used
# to construct parameter grids that try to cover the parameter space such that
# any portion of the space has an observed combination that is not too far from
# it.
lgbm_grid<-dials::grid_max_entropy(
lightgbm_params,
size=7
)
# To tune our model, we perform grid search over the lgbm_grid grid space
# to identify the hyperparameter values that have the lowest prediction error.
# Workflow setup
# /// (contains the work)
lgbm_wf<-workflows::workflow() %>%
add_model(lightgbm_model
) %>%
add_formula(qf2020scenario~.)
# /// so far little to no computation has been performed except for
# /// preprocessing calculations
# hyperparameter tuning
# //// this is where the machine starts to smoke!
set_dependency("boost_tree", eng="lightgbm", "lightgbm")
set_dependency("boost_tree", eng="lightgbm", "treesnip")
lgbm_tuned<-tune::tune_grid(
object=lgbm_wf,
resamples=mmmf_mixedVAR_simple_janitor_clean_split_preproc_cv_folds,
grid=lgbm_grid,
metrics=yardstick::metric_set(rmse, rsq, mae),
control=tune::control_grid(verbose=F)
)
Created on 2022-03-21 by the reprex package (v2.0.1)
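The two datasets in the reprex differ in their factor columns: `MMMFsize` declares 13 levels but every row uses only the first one, while `HSG` declares 3 levels of which 2 occur in the data. A minimal sketch for making this kind of difference visible is given below. The helper `factor_level_summary` and the toy data are hypothetical and only mimic the structure of the posted data; whether unused or multiple levels are related to the crash is not established here.

```r
# Hypothetical helper (not from the original post): for each factor column,
# report how many levels are declared and how many actually occur.
factor_level_summary <- function(df) {
  df <- as.data.frame(df)
  is_fac <- vapply(df, is.factor, logical(1))
  fac <- df[, is_fac, drop = FALSE]
  data.frame(
    column = names(fac),
    declared_levels = vapply(fac, nlevels, integer(1)),
    observed_levels = vapply(fac, function(x) nlevels(droplevels(x)), integer(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Toy stand-in data: one factor with a single observed level,
# one factor with two observed levels out of three declared.
toy <- data.frame(
  y = c(1.5, 2.5, 3.5, 4.5),
  size = factor(c("624.9", "624.9", "624.9", "624.9"),
                levels = c("624.9", "1249.8", "1874.6")),
  hsg = factor(c("3", "4", "4", "3"), levels = c("3", "3.5", "4"))
)

print(factor_level_summary(toy))
```

Running `factor_level_summary()` on each version of `mmmf_mixedVAR_simple_qf` would give a compact side-by-side description of the working and the crashing input.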
and this works, too (tidymodels/tune#476; example from tidymodels/tune#460):
Created on 2022-03-21 by the reprex package (v2.0.1)