Normalizer #92

sylvainferrec · 2018-12-13T14:38:00Z

Hi,

I am trying to run this code to vectorize some text and then classify it with a logistic regression:

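from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn2pmml.feature_extraction.text import Splitter
from sklearn2pmml.pipeline import PMMLPipeline

pipeline = PMMLPipeline([
    ("tfidf", TfidfVectorizer(
        norm = None,  # no row normalization is applied here
        ngram_range = (1, 2),
        # min_df = 5,
        max_df = 0.5,
        analyzer = "word",
        max_features = 1000,
        token_pattern = None,
        tokenizer = Splitter()))
])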

Unfortunately, normalization is not available in sklearn2pmml, and the results are not good enough without it.
So I am thinking about exporting one PMML for the TF-IDF part, normalizing its output, and then exporting another PMML for the classification part. The normalization step would be written in Java and run between our two PMMLs.
But I cannot use a PMMLPipeline that contains only a TfidfVectorizer transformer. With this code:

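from sklearn2pmml import sklearn2pmml

pipeline = PMMLPipeline([
    ("tfidf", TfidfVectorizer(
        norm = None,
        ngram_range = (1, 2),
        # min_df = 5,
        max_df = 0.5,
        analyzer = "word",
        max_features = 1000,
        token_pattern = None,
        tokenizer = Splitter()))
])

# Fit the transformer-only pipeline on the training texts
model = pipeline.fit(x_train)

# Attempt the PMML export; this is the call that fails
sklearn2pmml(model, "model_text7.pmml", with_repr = True, debug = True)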

I got this error message:

Exception in thread "main" java.lang.IllegalArgumentException: Tuple contains an unsupported value (Python class sklearn.preprocessing.data.Normalizer)
at org.jpmml.sklearn.CastFunction.apply(CastFunction.java:43)
at com.google.common.collect.Lists$TransformingRandomAccessList$1.transform(Lists.java:616)
at com.google.common.collect.TransformedIterator.next(TransformedIterator.java:47)
at sklearn.pipeline.Pipeline.encodeFeatures(Pipeline.java:68)
at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:202)
at org.jpmml.sklearn.Main.run(Main.java:145)
at org.jpmml.sklearn.Main.main(Main.java:94)
Caused by: java.lang.ClassCastException: Cannot cast net.razorvine.pickle.objects.ClassDict to sklearn.Transformer
at java.lang.Class.cast(Class.java:3369)
at org.jpmml.sklearn.CastFunction.apply(CastFunction.java:41)
... 6 more

Is there a way to export a TfidfVectorizer transformer on its own to PMML? Or to implement L2 normalization in a PMML pipeline?
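For reference, the L2 normalization in question rescales each TF-IDF row so that its Euclidean length is 1. Below is a minimal pure-Python sketch of that arithmetic, assuming dense rows of floats. The function name `l2_normalize_rows` and the sample values are hypothetical and are not part of sklearn2pmml or scikit-learn. Leaving all-zero rows unchanged is a choice made for this sketch.

```python
import math

def l2_normalize_rows(rows):
    """Scale each row to unit Euclidean length; all-zero rows are left unchanged."""
    normalized = []
    for row in rows:
        # Euclidean (L2) norm of the row
        norm = math.sqrt(sum(value * value for value in row))
        if norm == 0.0:
            # Avoid division by zero for documents with no known terms
            normalized.append(list(row))
        else:
            normalized.append([value / norm for value in row])
    return normalized

# Hypothetical TF-IDF rows, for illustration only
tfidf_rows = [[3.0, 4.0, 0.0], [0.0, 0.0, 0.0]]
unit_rows = l2_normalize_rows(tfidf_rows)
print(unit_rows)  # [[0.6, 0.8, 0.0], [0.0, 0.0, 0.0]]
```

The same per-row arithmetic is what a Java step placed between two PMML documents would need to reproduce.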

vruusmann · 2018-12-13T15:55:11Z

Closing as duplicate of #64

> Exception in thread "main" java.lang.IllegalArgumentException: Tuple contains an unsupported value (Python class sklearn.preprocessing.data.Normalizer)

A pipeline has to end with an estimator, not a transformer.
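As an illustration of that requirement, here is a minimal sketch using plain scikit-learn's `Pipeline`, where the final step is an estimator (it has `predict`) and every earlier step is a transformer. The toy corpus, the labels, and the step names are hypothetical. The sketch does not exercise the PMML conversion itself, so it says nothing about which transformers sklearn2pmml supports.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus and labels, for illustration only
docs = ["good movie", "great film", "bad movie", "awful film"]
labels = [1, 1, 0, 0]

pipeline = Pipeline([
    # Transformer step: turns text into TF-IDF features
    ("tfidf", TfidfVectorizer(norm=None, ngram_range=(1, 2))),
    # Final step: an estimator that can predict
    ("classifier", LogisticRegression()),
])
pipeline.fit(docs, labels)

# The last step exposes predict(); a bare vectorizer does not
final_step = pipeline.steps[-1][1]
ends_with_estimator = hasattr(final_step, "predict")
vectorizer_can_predict = hasattr(TfidfVectorizer(), "predict")

predictions = pipeline.predict(docs)
```

In the thread's setup, the same step list would presumably be passed to `PMMLPipeline` instead of `Pipeline`.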

vruusmann closed this as completed Dec 13, 2018
