When using train_test_split with shuffle=False and a Dask DataFrame, I notice two issues: 1) the index is actually shuffled, and 2) the train/test sizes seem incorrect. The behavior doesn't match sklearn, nor what dask_ml does when you pass a pandas DataFrame.
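For reference, the behavior I would expect from shuffle=False can be sketched in plain Python. This is only an illustration of the expected semantics, not code from either library: `ordered_split` is a hypothetical helper, and the rounding rule (test size rounded up, train gets the remainder) is my understanding of what sklearn does for a float test_size.

```python
import math


def ordered_split(rows, test_size=0.25):
    """Hypothetical helper: split `rows` into train/test without shuffling.

    Assumes the test set takes ceil(test_size * n) rows from the end and
    the train set keeps the remaining leading rows, in original order.
    """
    n_test = math.ceil(test_size * len(rows))
    n_train = len(rows) - n_test
    return rows[:n_train], rows[n_train:]


rows = list(range(10))
train, test = ordered_split(rows, test_size=0.2)
print(train)  # [0, 1, 2, 3, 4, 5, 6, 7]
print(test)   # [8, 9]
```

With an unshuffled split, the train set is a contiguous prefix of the input and the test set is the contiguous suffix, so both the row order and the sizes are fully determined by test_size.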
Minimal Complete Verifiable Example:
Setup
With sklearn.model_selection, order is maintained (i.e. no shuffle).
With dask_ml.model_selection using a pandas DataFrame, order is maintained (i.e. no shuffle).
With dask_ml.model_selection using a Dask DataFrame, order is NOT maintained and the train/test size is incorrect.
Environment: