
[BUG] ValueError: Attribute __setstate__ not found in <class 'cuml.dask.cluster.kmeans.KMeans'> #3140

Closed
pseudotensor opened this issue Nov 13, 2020 · 9 comments
Labels
? - Needs Triage Need team to review and classify bug Something isn't working inactive-30d

Comments

@pseudotensor

pseudotensor commented Nov 13, 2020

Describe the bug

pickle.dumps of a cuml.dask estimator doesn't complain, but pickle.loads does:

ValueError: Attribute __setstate__ not found in <class 'cuml.dask.cluster.kmeans.KMeans'>

Just fit some kmeans state:

from dask.distributed import Client, wait
from dask_cuda import LocalCUDACluster

with LocalCUDACluster() as cluster:
    with Client(cluster) as client:
        from cuml.dask.cluster import KMeans
        encoder = KMeans(client=client)
        # X is a dask/dask_cudf DataFrame defined earlier
        X_pd = encoder.fit_transform(X, delayed=False).compute().to_pandas()
        encoder.client = None  # nuke unpicklable part
        import pickle
        pickle.loads(pickle.dumps(encoder))  # raises ValueError
        client.cancel(X)
        del X
        return X_pd  # this snippet runs inside a function

This fails as above during loads.
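The dump/load asymmetry is general to pickle, not specific to cuml: `__setstate__` is only invoked when an object is reconstructed, so a broken restore path surfaces at `pickle.loads`, never at `pickle.dumps`. A minimal stdlib illustration (the `Model` class and its raised error merely mimic the reported behavior; this is not cuml's actual serialization code):

```python
import pickle

class Model:
    """Stand-in whose restore hook fails, mimicking the cuml error."""
    def __setstate__(self, state):
        raise ValueError("Attribute __setstate__ not found in %r" % type(self))

m = Model()
m.centers = [0.0, 1.0]    # non-empty state so __setstate__ is invoked on load
blob = pickle.dumps(m)    # succeeds: __setstate__ is never called on dump

try:
    pickle.loads(blob)    # fails: __setstate__ runs only on load
except ValueError as err:
    print("load failed:", err)
```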

Without the ability to persist the encoder's state, it is impossible to use for any inference case in a separate session. It's unusual to have to fit and predict/transform in the exact same process, and no production case could use this without some persistence to disk. E.g., all of scikit-learn and xgboost allow pickle, and where pickle isn't supported, packages like tensorflow provide their own load/save mechanisms.

Environment: RAPIDS 0.14 (prebuilt conda packages), Ubuntu 18.04, RTX 2080, driver 440.33.01, CUDA 10.2.

Perhaps this is solved already in 0.15 or 0.16?

@pseudotensor pseudotensor added ? - Needs Triage Need team to review and classify bug Something isn't working labels Nov 13, 2020
@pseudotensor
Author

Note that the non-dask version has no problems with being pickle dumped and loaded.

@pseudotensor
Author

pseudotensor commented Nov 18, 2020

I'm aware that some CUML transformers cannot be pickled until fit, but this cannot be pickled even after fit.

Any thoughts from the cuML team? This blocks our use of these cuML transformers.

@Nanthini10
Contributor

@pseudotensor I'm looking into this, sorry for the delay.

@cjnolet
Member

cjnolet commented Nov 18, 2020

@pseudotensor, in general, the cuml.dask wrappers are not meant to be pickled because they often contain connection information for the local Dask cluster (and potentially other environment-specific information).

You can extract the underlying single-GPU model with serializable_model = encoder.get_combined_model(). The single GPU inference also works on this serializable model. You can also use ParallelPostFit from Dask-ML to load the serializable model back into a Dask cluster for distributed inference.
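A sketch of that persistence workflow. Since running cuml requires a GPU, `SingleGPUKMeans` below is a plain-Python stand-in for the object `encoder.get_combined_model()` would return; only the pickle round-trip and the "no Dask client inside" property are the point:

```python
import pickle

class SingleGPUKMeans:
    """Stand-in for the model returned by encoder.get_combined_model()."""
    def __init__(self, cluster_centers_):
        self.cluster_centers_ = cluster_centers_

    def predict(self, X):
        # Nearest-center assignment on 1-D points (toy version).
        return [min(range(len(self.cluster_centers_)),
                    key=lambda i: abs(x - self.cluster_centers_[i]))
                for x in X]

# Training session: extract the single-GPU model and persist it.
combined = SingleGPUKMeans([0.0, 10.0])  # i.e. encoder.get_combined_model()
blob = pickle.dumps(combined)            # no Dask client inside, so this round-trips

# Later inference session: load and predict without any Dask cluster.
model = pickle.loads(blob)
print(model.predict([1.0, 9.0]))         # -> [0, 1]
```

For distributed inference, the loaded model would then be handed to Dask-ML's `ParallelPostFit` wrapper as described above.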

@pseudotensor
Author

It would be best if this were handled by the cuML class itself rather than the user going through those steps. Requiring a separate package just to handle load/save is a bit much, and requiring a separate package just to do predict/transform is also a lot. Even given what you said, the impact isn't clear: I cannot be sure I would do the right things to make it behave the same as a reference scikit-learn-like model, or as if I had never left the session (i.e. never needed to pickle). It should be seamless.

I already extract "client" out of the encoder, and one can inject it back in for transform. But this too shouldn't be necessary: you shouldn't need client as an argument at all; just get the client from the environment like xgboost and other packages do.
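The manual client-stripping described above could live in the estimator itself via pickle's state hooks. A hedged sketch with a hypothetical `DaskEstimatorWrapper` class (not cuml's actual implementation); a lambda stands in for the unpicklable live `Client`:

```python
import pickle

class DaskEstimatorWrapper:
    """Hypothetical wrapper holding a live (unpicklable) Dask client."""
    def __init__(self, client=None):
        self.client = client
        self.cluster_centers_ = None

    def __getstate__(self):
        # Drop the client automatically instead of asking the user to.
        state = self.__dict__.copy()
        state["client"] = None
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        # A real implementation could re-acquire a client here, e.g. via
        # dask.distributed.default_client(), instead of leaving it None.

# A lambda is unpicklable, like a live distributed.Client connection.
est = DaskEstimatorWrapper(client=lambda: None)
est.cluster_centers_ = [0.0, 10.0]

restored = pickle.loads(pickle.dumps(est))
print(restored.client, restored.cluster_centers_)  # -> None [0.0, 10.0]
```

Without the `__getstate__` override, `pickle.dumps(est)` would fail on the lambda; with it, the round-trip succeeds and only the connection is lost.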

xgboost handles all of this fine for its dask classes, so it's unclear why cuML cannot do the same. Maybe @teju85 has some insights.

@pseudotensor
Author

Also, FYI, pickling limitations of dask cuML algos are already mentioned at https://docs.rapids.ai/api/cuml/stable/api.html#manifold: "Known issue: If a UMAP model has not yet been fit, it cannot be pickled."

@pseudotensor
Author

Any progress on this? I don't think there is any technical hurdle: any normal scikit-learn model/transformer can do this, and I think the dask algos could too with minimal effort.

@github-actions

This issue has been marked stale due to no recent activity in the past 30d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be marked rotten if there is no activity in the next 60d.

@drobison00
Contributor

It sounds like this is something that isn't likely to change in the near future, outside the workflow described by cjnolet. Closing for now; if there is a specific workflow that requires the ability to directly serialize Dask cuml models we can revisit.
