As noted in #5929, for many algorithms, whether the input data is C- or Fortran-contiguous determines whether an expensive memory copy needs to be made. While this seems innocuous, it can have significant UX implications: most users don't understand it well and, when it rears its head, the resulting errors don't make the cause obvious.
For example, since KMeans expects C-contiguous data, any user who passes F-contiguous data to `fit` needs enough memory for usage to spike to 2x the original input data size. If they don't have enough space, they'll hit an OOM error and have no obvious idea why.
At that point, they'll either try Dask/Spark and multiple GPUs (which is more expensive and can be harder) or give up entirely (more common).
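To make the 2x spike concrete, here is a minimal sketch. It uses NumPy on the host as a stand-in for device arrays (an assumption, chosen so the snippet runs without a GPU), and the array shape is arbitrary. It only shows that changing layout allocates a second buffer of the same size; it is not how cuML performs the conversion internally.

```python
import numpy as np

# Host-side stand-in for a device array; the shape is arbitrary.
X_f = np.asfortranarray(np.random.rand(1000, 50))

# The same data can be stored column-major (F) or row-major (C).
is_c = X_f.flags["C_CONTIGUOUS"]
is_f = X_f.flags["F_CONTIGUOUS"]

# Converting to the other layout allocates a new buffer of equal size.
X_c = np.ascontiguousarray(X_f)
copied = not np.shares_memory(X_f, X_c)

# While both arrays are alive, memory use is twice the input size.
peak_bytes = X_f.nbytes + X_c.nbytes
print(f"C={is_c} F={is_f} copied={copied} peak={peak_bytes} bytes")
```

If the upstream step can produce the data in the expected layout from the start, the second buffer is never allocated.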
When we need to make a copy to change the contiguity, we should emit a warning and explain the impact to users. That way, they may be able to change their workflow upstream to avoid an unexpected OOM (that they'd associate with cuML).
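One possible shape for such a warning, as a rough sketch only. The helper name `as_layout_with_warning` and the message wording are hypothetical, and NumPy again stands in for device arrays; nothing here reflects cuML's actual input-handling code.

```python
import warnings

import numpy as np


def as_layout_with_warning(X, order="C"):
    """Hypothetical helper: return X in the requested layout.

    Emits a UserWarning only when a copy is needed to change layout.
    """
    flag = "C_CONTIGUOUS" if order == "C" else "F_CONTIGUOUS"
    if X.flags[flag]:
        # Already in the expected layout: no copy, no warning.
        return X
    warnings.warn(
        f"Input is not {order}-contiguous. Converting it requires a copy "
        f"of about {X.nbytes} bytes, temporarily doubling memory use. "
        f"Consider producing {order}-contiguous data upstream.",
        UserWarning,
        stacklevel=2,
    )
    return np.asarray(X, order=order)
```

Calling it with F-contiguous input, for example `as_layout_with_warning(np.asfortranarray(np.ones((4, 3))))`, returns a C-contiguous copy and emits the warning once; input that is already C-contiguous is returned unchanged and silently.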