[FEA] cuML estimators should warn users when a copy is being made due to C vs. Fortran contiguousness (and explain the impact) #5930

Open
beckernick opened this issue Jun 13, 2024 · 0 comments
Labels
Cython / Python Cython or Python issue feature request New feature or request

Comments

@beckernick
Member

As noted in #5929, for many algorithms, whether the input data is C- or Fortran-contiguous determines whether an expensive memory copy needs to be made. This seems innocuous, but it can have significant UX implications: most users don't understand it well and, when it causes a failure, the resulting error doesn't point to the cause.

For example, since KMeans expects C-contiguous data, any user who passes F-contiguous data to fit needs enough free memory for usage to spike to 2x the original input data size. If they don't have enough space, they'll hit an OOM error and have no obvious idea why.
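
To make the 2x spike concrete, here is a minimal sketch of what a layout conversion does to memory. It uses NumPy as a stand-in so it runs without a GPU (an assumption for illustration; device arrays behave analogously), and the array shape is arbitrary.

```python
import numpy as np

# NumPy stands in for a device array here so the sketch runs anywhere.
# The shape is arbitrary and chosen only for illustration.
X_f = np.asfortranarray(np.ones((1000, 50), dtype=np.float32))

# The input is column-major (Fortran order), not row-major (C order).
is_c = X_f.flags.c_contiguous
is_f = X_f.flags.f_contiguous

# Converting to C order cannot reuse the existing buffer, so a second
# buffer of the same size is allocated while the original is still alive.
X_c = np.ascontiguousarray(X_f)
copied = not np.shares_memory(X_f, X_c)

# While both arrays exist, the input occupies twice its original size.
peak_bytes = X_f.nbytes + X_c.nbytes
print(is_c, is_f, copied, peak_bytes / X_f.nbytes)
```

If the input had already been C-contiguous, `np.ascontiguousarray` would have returned it without allocating anything, which is why the layout alone decides whether the spike happens.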

At that point, they'll either try Dask/Spark and multiple GPUs (which is more expensive and can be harder) or give up entirely (more common).

When we need to make a copy to change the contiguousness, we should emit a warning and explain the impact to users. That way, they may be able to change their workflow upstream to avoid an unexpected OOM (that they'd associate with cuML).
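
A rough sketch of what such a warning could look like is below. The helper name `ensure_c_contiguous`, its parameters, and the message wording are all hypothetical and not part of cuML's API; NumPy is again used as a stand-in so the example runs without a GPU.

```python
import warnings

import numpy as np


def ensure_c_contiguous(X, estimator_name="KMeans"):
    """Hypothetical helper (not a cuML function): return X in C order,
    warning the user if that requires a copy."""
    if X.flags.c_contiguous:
        # Already in the expected layout, so no copy and no warning.
        return X

    size_mb = X.nbytes / 1e6
    warnings.warn(
        f"{estimator_name} expects C-contiguous input but received an array "
        f"that is not C-contiguous. A copy of about {size_mb:.1f} MB will be "
        "made, so memory used by the input temporarily doubles. Converting "
        "the data to C order upstream avoids this copy.",
        UserWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return np.ascontiguousarray(X)


# Usage: an F-contiguous input triggers the warning, a C-contiguous one does not.
X_f = np.asfortranarray(np.zeros((200, 10), dtype=np.float32))
X_ready = ensure_c_contiguous(X_f)
```

Using `warnings.warn` rather than raising keeps existing workflows running, and users who have already accounted for the copy can silence it with the standard warning filters.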
