[BUG] ValueError: Attribute __setstate__ not found in <class 'cuml.dask.cluster.kmeans.KMeans'> #3140
Comments
Note that the non-Dask version has no problem being pickle-dumped and loaded.
I'm aware that some cuML transformers cannot be pickled until fit, but this one cannot be pickled even after fit. Any thoughts from the cuML team? This blocks our use of such cuML transformers.
@pseudotensor I'm looking into this, sorry for the delay.
@pseudotensor, in general, the […] You can extract the underlying single-GPU model with […]
It would be best if this were handled by the cuML class itself rather than by the user going through those steps. E.g., requiring a separate package just to handle load/save is a bit much, and requiring one just to do predict/transform is also a lot. And even given what you said, the impact is not clear. I already extract "client" out of the encoder, and one can inject it back in for transform, but this too should not be necessary. You shouldn't need to pass the client as an argument; the estimator could just get the client from the environment, as xgboost and other packages do. xgboost handles all of these things fine for its Dask classes, so it's unclear why cuML cannot do the same. Maybe @teju85 has some insights.
Also, FYI, pickling of Dask cuML algorithms is already mentioned here: https://docs.rapids.ai/api/cuml/stable/api.html#manifold
Any progress on this? I don't think there is any technical hurdle: any normal scikit-learn model/transformer can do this, and I think Dask algorithms can too with minimal effort.
This issue has been marked stale due to no recent activity in the past 30d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be marked rotten if there is no activity in the next 60d.
It sounds like this is something that isn't likely to change in the near future, outside the workflow described by cjnolet. Closing for now; if there is a specific workflow that requires the ability to directly serialize Dask cuml models we can revisit.
Describe the bug
Pickle-dumping a fitted cuml Dask instance doesn't complain, but pickle.load does.
Just fit some kmeans state:
This fails as above during loads.
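One plausible mechanism for this error, offered as an assumption rather than a reading of the cuML source: if the Dask estimator's attribute proxy (`__getattr__`) raises `ValueError` for names it cannot resolve, dumping can still succeed, but on load pickle creates an empty instance and probes it for `__setstate__`, and it only treats `AttributeError` as "not defined". The toy classes below (`BrokenProxy`, `FixedProxy`, `roundtrip`) are hypothetical and only reproduce that behaviour in plain Python.

```python
import pickle


class BrokenProxy:
    """Hypothetical estimator that proxies unknown attributes to a fitted
    internal model (a toy, not cuML's actual class)."""

    def __init__(self):
        self.internal_model = None

    def fit(self):
        # Pretend training produced some state on the internal model.
        self.internal_model = {"cluster_centers_": [0.0, 1.0]}
        return self

    def __getstate__(self):
        # Explicit state, so pickle.dumps never falls through to __getattr__.
        return dict(self.__dict__)

    def __getattr__(self, attr):
        # Only called when normal lookup fails, e.g. for "__setstate__"
        # on the empty instance pickle creates while loading.
        internal = self.__dict__.get("internal_model")
        if internal is not None and attr in internal:
            return internal[attr]
        raise self._missing(attr)

    def _missing(self, attr):
        # The bug: pickle only treats AttributeError as "attribute absent".
        return ValueError(f"Attribute {attr} not found in {type(self)}")


class FixedProxy(BrokenProxy):
    def _missing(self, attr):
        return AttributeError(f"Attribute {attr} not found in {type(self)}")


def roundtrip(model):
    return pickle.loads(pickle.dumps(model))


if __name__ == "__main__":
    blob = pickle.dumps(BrokenProxy().fit())  # dumping succeeds
    try:
        pickle.loads(blob)
    except ValueError as err:
        print("load failed:", err)

    print(roundtrip(FixedProxy().fit()).cluster_centers_)
```

With the `AttributeError` variant, `hasattr()` and `getattr(obj, name, default)` also behave normally, since they rely on the same convention as pickle.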
Without the ability to persist the state of the encoder, it is impossible to use it for inference in any separate session. It's unusual to have to always fit and predict/transform in the exact same fork, and no production use case can work without some persistence to disk. E.g., scikit-learn and xgboost both support pickle, and where pickling isn't supported, packages like TensorFlow have their own load/save mechanisms.
RAPIDS 0.14 (prebuilt conda packages), Ubuntu 18.04, RTX 2080, NVIDIA driver 440.33.01, CUDA 10.2.
Perhaps this is already solved in 0.15 or 0.16?