Generalize model verification to unsupervised learning methods, custom output fields #97

Open
KapoorHitesh opened this issue May 28, 2018 · 4 comments


KapoorHitesh commented May 28, 2018

With the KMeans algorithm, I am not getting a ModelVerification section/tag in the generated PMML file.

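from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline

# iris_df is assumed to be a pandas DataFrame holding the Iris dataset
iris_pipeline = PMMLPipeline([
	("pca", PCA(n_components = 3)),
	("clusterer", KMeans(n_clusters = 3))
])

x_columns = ["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"]
iris_pipeline.fit(iris_df[x_columns])
# Attach verification data before exporting the pipeline
iris_pipeline.verify(iris_df[x_columns])
sklearn2pmml(iris_pipeline, "kmeanstry.pmml", with_repr = True)
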
@vruusmann
Member

Model verification is most useful with supervised learning methods such as regression and classification, where there is a "clear" target field to be checked.

Clustering is an unsupervised learning method. Because there is no target field, something else needs to be checked, such as the distance of the verification data record to a (sub)set of clusters (in PMML speak, "cluster affinities").
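
To make the notion of "cluster affinities" concrete, here is a minimal sketch in plain Python, assuming Euclidean distance as the affinity measure. The function name `cluster_affinities` and the sample centers are hypothetical and only serve the illustration; in scikit-learn, `KMeans.transform(X)` returns the same kind of per-cluster distances.

```python
import math

def cluster_affinities(record, centers):
    """Return the Euclidean distance from one record to every cluster center."""
    return [math.dist(record, center) for center in centers]

# Hypothetical cluster centers and a single verification record
centers = [(0.0, 0.0), (3.0, 4.0)]
record = (0.0, 0.0)

affinities = cluster_affinities(record, centers)

# The predicted cluster is the one with the smallest distance
predicted = min(range(len(affinities)), key = affinities.__getitem__)
```

A verification record for a clustering model could then carry these distances (and possibly the predicted cluster index) in place of the target value that a supervised model would check.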

@KapoorHitesh
Author

Thank you for the quick response. Do you have any example PMML file which uses this?
Are you planning to implement this feature in sklearn2pmml?

@vruusmann
Member

> Do you have any example PMML file which uses this?

Integration tests for the KMeans estimator type use cluster affinities for "external" validation:
https://github.com/jpmml/jpmml-sklearn/blob/master/src/test/resources/main.py#L112-L118

So, instead of being saved to a CSV file, the cluster affinities should be bundled into the PKL file so that the JPMML-SkLearn library can see them.
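
For reference, a rough sketch of how cluster affinities might be computed and written out as CSV on the Python side. It is not the code of the linked integration test. It assumes a plain scikit-learn `Pipeline` (which `PMMLPipeline` extends) fitted on the Iris data from `sklearn.datasets`, and the column names `Cluster` and `affinity(i)` are made up for the illustration.

```python
import io

import pandas
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline

X = load_iris().data

pipeline = Pipeline([
	("pca", PCA(n_components = 3)),
	("clusterer", KMeans(n_clusters = 3, n_init = 10, random_state = 13))
])
pipeline.fit(X)

# Pipeline.transform() ends in KMeans.transform(), which returns
# the distance of every record to every cluster center
distances = pipeline.transform(X)

# Hypothetical column names, chosen for this sketch only
affinities = pandas.DataFrame(distances, columns = ["affinity(0)", "affinity(1)", "affinity(2)"])
affinities.insert(0, "Cluster", pipeline.predict(X))

# Write to an in-memory buffer; a real script would pass a file path instead
buffer = io.StringIO()
affinities.to_csv(buffer, index = False)
csv_text = buffer.getvalue()
```

Bundling the same values into the PKL file, as suggested above, would additionally require support in `PMMLPipeline.verify()` and in the JPMML-SkLearn converter.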

> Are you planning to implement this feature in sklearn2pmml?

Not a priority for me.

But let's keep this issue open as a means of tracking ideas and progress on this topic (and related topics).

@vruusmann changed the title from "pipeline.verify() not giving model verification tag/section in pmml for kmeans" to "Generalize model verification to unsupervised learning methods, custom output fields" on May 28, 2018

@pradeeppadmarajaiah

Hi,
Can you please post an example that explains how to build a PMML pipeline for K-means clustering using sklearn2pmml?

Thanks,
