
[BUG] ValueError: Attribute __setstate__ not found in <class 'cuml.dask.cluster.kmeans.KMeans'> #3140

Closed
pseudotensor opened this issue Nov 13, 2020 · 9 comments
Labels
? - Needs Triage Need team to review and classify bug Something isn't working inactive-30d

Comments

@pseudotensor

pseudotensor commented Nov 13, 2020

Describe the bug

pickle.dumps of a cuml.dask estimator doesn't complain, but pickle.loads does:

ValueError: Attribute __setstate__ not found in <class 'cuml.dask.cluster.kmeans.KMeans'>

Just fit some kmeans state:

from dask.distributed import Client, wait
from dask_cuda import LocalCUDACluster

with LocalCUDACluster() as cluster:
    with Client(cluster) as client:
        from cuml.dask.cluster import KMeans
        encoder = KMeans(client=client)
        # X is a dask/dask_cudf DataFrame defined earlier
        X_pd = encoder.fit_transform(X, delayed=False).compute().to_pandas()
        encoder.client = None  # nuke unpicklable part
        import pickle
        pickle.loads(pickle.dumps(encoder))  # raises ValueError
        client.cancel(X)
        del X
        return X_pd  # this snippet runs inside a function

This fails as above during loads.
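The dump/load asymmetry is general to pickle, not specific to cuml: `__setstate__` is only invoked when an object is reconstructed, so a broken restore path surfaces at `pickle.loads`, never at `pickle.dumps`. A minimal stdlib illustration (the `Model` class and its raised error merely mimic the reported behavior; this is not cuml's actual serialization code):

```python
import pickle

class Model:
    """Stand-in whose restore hook fails, mimicking the cuml error."""
    def __setstate__(self, state):
        raise ValueError("Attribute __setstate__ not found in %r" % type(self))

m = Model()
m.centers = [0.0, 1.0]    # non-empty state so __setstate__ is invoked on load
blob = pickle.dumps(m)    # succeeds: __setstate__ is never called on dump

try:
    pickle.loads(blob)    # fails: __setstate__ runs only on load
except ValueError as err:
    print("load failed:", err)
```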

Without the ability to persist the encoder's state, it is impossible to use for any inference case in a separate session. It's unusual to have to fit and predict/transform in the exact same process, and no production case could use this without some persistence to disk. E.g., all of scikit-learn and xgboost allow pickle, and where pickle isn't supported, packages like tensorflow provide their own load/save mechanisms.

Environment: RAPIDS 0.14 (prebuilt conda packages), Ubuntu 18.04, RTX 2080, driver 440.33.01, CUDA 10.2.

Perhaps this is solved already in 0.15 or 0.16?

@pseudotensor pseudotensor added ? - Needs Triage Need team to review and classify bug Something isn't working labels Nov 13, 2020
@pseudotensor
Author

Note that the non-dask version has no problems with being pickle dumped and loaded.

@pseudotensor
Author

pseudotensor commented Nov 18, 2020

I'm aware that some CUML transformers cannot be pickled until fit, but this cannot be pickled even after fit.

Any thoughts from the cuML team? This blocks our use of these cuML transformers.

@Nanthini10
Contributor

@pseudotensor I'm looking into this, sorry for the delay.

@cjnolet
Member

cjnolet commented Nov 18, 2020

@pseudotensor, in general, the cuml.dask wrappers are not meant to be pickled because they often contain connection information for the local Dask cluster (and potentially other environment-specific information).

You can extract the underlying single-GPU model with serializable_model = encoder.get_combined_model(). The single GPU inference also works on this serializable model. You can also use ParallelPostFit from Dask-ML to load the serializable model back into a Dask cluster for distributed inference.
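A sketch of that persistence workflow. Since running cuml requires a GPU, `SingleGPUKMeans` below is a plain-Python stand-in for the object `encoder.get_combined_model()` would return; only the pickle round-trip and the "no Dask client inside" property are the point:

```python
import pickle

class SingleGPUKMeans:
    """Stand-in for the model returned by encoder.get_combined_model()."""
    def __init__(self, cluster_centers_):
        self.cluster_centers_ = cluster_centers_

    def predict(self, X):
        # Nearest-center assignment on 1-D points (toy version).
        return [min(range(len(self.cluster_centers_)),
                    key=lambda i: abs(x - self.cluster_centers_[i]))
                for x in X]

# Training session: extract the single-GPU model and persist it.
combined = SingleGPUKMeans([0.0, 10.0])  # i.e. encoder.get_combined_model()
blob = pickle.dumps(combined)            # no Dask client inside, so this round-trips

# Later inference session: load and predict without any Dask cluster.
model = pickle.loads(blob)
print(model.predict([1.0, 9.0]))         # -> [0, 1]
```

For distributed inference, the loaded model would then be handed to Dask-ML's `ParallelPostFit` wrapper as described above.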

@pseudotensor
Author

It would be best if this were handled by the cuML class itself rather than the user going through those steps. Requiring a separate package just to handle load/save is a bit much, and requiring a separate package just to do predict/transform is also a lot. Even given what you said, the impact isn't clear: I cannot be sure I would do the right things to make it behave the same as a reference scikit-learn-like model, or as if I had never left the session (i.e. never needed to pickle). It should be seamless.

I already extract "client" out of the encoder, and one can inject it back in for transform. But this too shouldn't be necessary: you shouldn't need client as an argument at all; just get the client from the environment like xgboost and other packages do.
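The manual client-stripping described above could live in the estimator itself via pickle's state hooks. A hedged sketch with a hypothetical `DaskEstimatorWrapper` class (not cuml's actual implementation); a lambda stands in for the unpicklable live `Client`:

```python
import pickle

class DaskEstimatorWrapper:
    """Hypothetical wrapper holding a live (unpicklable) Dask client."""
    def __init__(self, client=None):
        self.client = client
        self.cluster_centers_ = None

    def __getstate__(self):
        # Drop the client automatically instead of asking the user to.
        state = self.__dict__.copy()
        state["client"] = None
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        # A real implementation could re-acquire a client here, e.g. via
        # dask.distributed.default_client(), instead of leaving it None.

# A lambda is unpicklable, like a live distributed.Client connection.
est = DaskEstimatorWrapper(client=lambda: None)
est.cluster_centers_ = [0.0, 10.0]

restored = pickle.loads(pickle.dumps(est))
print(restored.client, restored.cluster_centers_)  # -> None [0.0, 10.0]
```

Without the `__getstate__` override, `pickle.dumps(est)` would fail on the lambda; with it, the round-trip succeeds and only the connection is lost.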

xgboost handles all of this fine for its dask classes, so it's unclear why cuML cannot do the same. Maybe @teju85 has some insights.

@pseudotensor
Author

Also, FYI, pickling limitations of dask cuML algos are already mentioned at https://docs.rapids.ai/api/cuml/stable/api.html#manifold: "Known issue: If a UMAP model has not yet been fit, it cannot be pickled."

@pseudotensor
Author

Any progress on this? I don't think there is any technical hurdle: any normal scikit-learn model/transformer can do this, and I think the dask algos could too with minimal effort.

@github-actions

This issue has been marked stale due to no recent activity in the past 30d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be marked rotten if there is no activity in the next 60d.

@drobison00
Contributor

It sounds like this is something that isn't likely to change in the near future, outside the workflow described by cjnolet. Closing for now; if there is a specific workflow that requires the ability to directly serialize Dask cuml models we can revisit.
