[FEA] cuML estimators should warn users when a copy is being made due to C vs. Fortran contiguousness (and explain the impact) #5930

beckernick · 2024-06-13T15:13:05Z

As noted in #5929, for many algorithms, whether the input data is C- or Fortran-contiguous determines whether an expensive memory copy needs to be made. While this seems innocuous, it can have significant UX implications. Most users don't understand it well, and when it does cause a problem, the resulting errors don't make the cause obvious.

For example, since KMeans expects C-contiguous data, any user who passes F-contiguous data to `fit` will need enough memory for usage to spike to roughly 2x the size of the original input data. If they don't have enough space, they'll hit an OOM error with no obvious idea why.
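To make the 2x figure concrete, here is a minimal sketch that uses NumPy on the CPU as a stand-in for device arrays (CuPy exposes the same `flags` attributes and an `ascontiguousarray` function). It only illustrates the generic cost of a layout conversion and is not cuML's internal code path; the array shape is arbitrary.

```python
import numpy as np

# Column-major (Fortran-ordered) float32 input: 10,000 rows x 50 columns.
X = np.asfortranarray(np.random.default_rng(0).random((10_000, 50), dtype=np.float32))

print(X.flags.c_contiguous, X.flags.f_contiguous)  # False True

# Converting to row-major (C) order must allocate a second buffer of the same
# size, so peak memory is roughly 2x the input while both arrays are alive.
X_c = np.ascontiguousarray(X)
print(np.shares_memory(X, X_c))  # False: a full copy was made
print(X.nbytes, X_c.nbytes)      # 2000000 2000000
```

If the data were produced in C order to begin with, `np.ascontiguousarray` would return the input without copying, which is the upstream change a warning could nudge users toward.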

At that point, they'll either try Dask/Spark with multiple GPUs (which is more expensive and can be harder to use) or, more commonly, give up entirely.

When we need to make a copy to change the contiguousness, we should emit a warning that explains the impact to users. That way, they may be able to change their workflow upstream and avoid an unexpected OOM (which they'd associate with cuML).
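A minimal sketch of what such a warning could look like, again using NumPy as a CPU stand-in. The helper name `ensure_order`, its parameters, and the warning text are hypothetical and are not part of cuML's API; the point is only to warn when a layout conversion forces a copy and to state the memory cost.

```python
import warnings

import numpy as np


def ensure_order(X, order="C", estimator_name="Estimator"):
    """Hypothetical helper: return X in the requested memory order,
    emitting a UserWarning only when a conversion copy is required."""
    X = np.asarray(X)
    if order == "C":
        already_ok = X.flags.c_contiguous
    elif order == "F":
        already_ok = X.flags.f_contiguous
    else:
        raise ValueError("order must be 'C' or 'F'")
    if already_ok:
        return X  # no copy needed, so no warning
    warnings.warn(
        f"{estimator_name} expects {order}-contiguous input, but the data has a "
        f"different memory layout. A {X.nbytes / 1e6:.1f} MB copy is being made, "
        f"so peak memory for this input will be roughly 2x its size. Producing "
        f"{order}-ordered data upstream avoids this copy.",
        UserWarning,
        stacklevel=2,
    )
    return np.asarray(X, order=order)  # allocates a new buffer


# Usage: F-ordered input to an estimator that wants C order triggers one warning.
X_f = np.asfortranarray(np.ones((1000, 20), dtype=np.float32))
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    X_c = ensure_order(X_f, order="C", estimator_name="KMeans")
print(len(caught), X_c.flags.c_contiguous)  # 1 True
```

Warning only on the copy path keeps the common, already-contiguous case silent, so users see the message exactly when the 2x memory spike can occur.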


beckernick added feature request New feature or request Cython / Python Cython or Python issue labels Jun 13, 2024

viclafargue mentioned this issue Jul 24, 2024: Better communicate expectations of data order/contiguousness #5975 (Open)
