Normalizer #92

Closed
sylvainferrec opened this issue Dec 13, 2018 · 1 comment

@sylvainferrec

Hi,

I am trying to run this code to vectorize some text and then classify it with a logistic regression:

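```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn2pmml.feature_extraction.text import Splitter
from sklearn2pmml.pipeline import PMMLPipeline

pipeline = PMMLPipeline([
    ("tfidf", TfidfVectorizer(
        norm = None,
        ngram_range = (1, 2),
        # min_df = 5,
        max_df = 0.5,
        analyzer = "word",
        max_features = 1000,
        token_pattern = None,
        tokenizer = Splitter())),
    # Final step: the logistic regression classifier
    ("classifier", LogisticRegression())
])
```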

Unfortunately, the normalization (the `norm` parameter) is not available in sklearn2pmml, and the results are not good enough without it.
So I am thinking about exporting one PMML for the TF-IDF part, normalizing its output, and then exporting another PMML for the classification part. The normalization step would be written in Java and run between our two PMMLs.
But I cannot use a PMMLPipeline that contains only a TfidfVectorizer transformer. With this code:

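```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.feature_extraction.text import Splitter
from sklearn2pmml.pipeline import PMMLPipeline

# TF-IDF step only, no final estimator
pipeline = PMMLPipeline([
    ("tfidf", TfidfVectorizer(
        norm = None,
        ngram_range = (1, 2),
        # min_df = 5,
        max_df = 0.5,
        analyzer = "word",
        max_features = 1000,
        token_pattern = None,
        tokenizer = Splitter()))
])

model = pipeline.fit(x_train)

sklearn2pmml(model, "model_text7.pmml", with_repr = True, debug = True)
```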

I got this error message:

```
Exception in thread "main" java.lang.IllegalArgumentException: Tuple contains an unsupported value (Python class sklearn.preprocessing.data.Normalizer)
	at org.jpmml.sklearn.CastFunction.apply(CastFunction.java:43)
	at com.google.common.collect.Lists$TransformingRandomAccessList$1.transform(Lists.java:616)
	at com.google.common.collect.TransformedIterator.next(TransformedIterator.java:47)
	at sklearn.pipeline.Pipeline.encodeFeatures(Pipeline.java:68)
	at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:202)
	at org.jpmml.sklearn.Main.run(Main.java:145)
	at org.jpmml.sklearn.Main.main(Main.java:94)
Caused by: java.lang.ClassCastException: Cannot cast net.razorvine.pickle.objects.ClassDict to sklearn.Transformer
	at java.lang.Class.cast(Class.java:3369)
	at org.jpmml.sklearn.CastFunction.apply(CastFunction.java:41)
	... 6 more
```

Is there a way to export a standalone TfidfVectorizer transformer to PMML? Or to implement L2 normalization in a PMML pipeline?
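For reference, here is a minimal sketch of what the external normalization step between the two PMMLs would have to compute, assuming it should match scikit-learn's row-wise `norm="l2"`: every TF-IDF weight of a document is divided by the square root of the sum of squared weights. It is written in Python for consistency with the rest of the thread (a Java port is direct), and it assumes a hypothetical `term -> weight` dict representation rather than the actual output fields of the TF-IDF PMML.

```python
import math


def l2_normalize(vector):
    """Scale a TF-IDF vector (hypothetical dict: term -> weight) to unit L2 length.

    Each weight is divided by sqrt(sum of squared weights). An all-zero
    (or empty) vector is returned unchanged, as scikit-learn does for zero rows.
    """
    norm = math.sqrt(sum(w * w for w in vector.values()))
    if norm == 0.0:
        return dict(vector)
    return {term: w / norm for term, w in vector.items()}
```

For example, `l2_normalize({"a": 3.0, "b": 4.0})` returns `{"a": 0.6, "b": 0.8}`, because the vector's L2 norm is 5.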

@vruusmann
Member

Closing as duplicate of #64

> Exception in thread "main" java.lang.IllegalArgumentException: Tuple contains an unsupported value (Python class sklearn.preprocessing.data.Normalizer)

A pipeline has to end with an estimator, not a transformer.
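As a rough illustration of that rule, the sketch below is a hypothetical pre-export check (not part of the sklearn2pmml API) that duck-types the final pipeline step: scikit-learn estimators such as `LogisticRegression` expose `predict()`, while transformers such as `TfidfVectorizer` or `Normalizer` only expose `fit()`/`transform()`. The `FakeTransformer` and `FakeClassifier` classes are stand-ins so the sketch runs without scikit-learn; the heuristic may not match every estimator type sklearn2pmml accepts.

```python
# Hypothetical pre-export check; not part of the sklearn2pmml API.
def ends_with_estimator(steps):
    """Return True if the final (name, object) step exposes a callable predict()."""
    if not steps:
        return False
    _, final_step = steps[-1]
    return callable(getattr(final_step, "predict", None))


# Minimal stand-ins so the sketch runs without scikit-learn installed.
class FakeTransformer:
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X


class FakeClassifier(FakeTransformer):
    def predict(self, X):
        return [0 for _ in X]


print(ends_with_estimator([("tfidf", FakeTransformer())]))  # False
print(ends_with_estimator([("tfidf", FakeTransformer()), ("classifier", FakeClassifier())]))  # True
```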
