(Python class sklearn.preprocessing.data.Normalizer) is not a supported Transformer #128

Closed
siriJR opened this issue Feb 24, 2020 · 1 comment

siriJR commented Feb 24, 2020

Hello, my problem is as follows:

Failed to convert
java.lang.IllegalArgumentException: The value object (Python class sklearn.preprocessing.data.Normalizer) is not a supported Transformer
	at org.jpmml.sklearn.CastFunction.apply(CastFunction.java:45)
	at com.google.common.collect.Lists$TransformingRandomAccessList$1.transform(Lists.java:612)
	at com.google.common.collect.TransformedIterator.next(TransformedIterator.java:47)
	at sklearn_pandas.DataFrameMapper.initializeFeatures(DataFrameMapper.java:72)
	at sklearn.Initializer.encodeFeatures(Initializer.java:41)
	at sklearn.Transformer.updateAndEncodeFeatures(Transformer.java:118)
	at sklearn.Composite.encodeFeatures(Composite.java:129)
	at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:208)
	at org.jpmml.sklearn.Main.run(Main.java:145)
	at org.jpmml.sklearn.Main.main(Main.java:94)
Caused by: java.lang.ClassCastException: Cannot cast net.razorvine.pickle.objects.ClassDict to sklearn.Transformer
	at java.lang.Class.cast(Class.java:3369)
	at org.jpmml.sklearn.CastFunction.apply(CastFunction.java:43)
	... 9 more

RuntimeError                              Traceback (most recent call last)
<ipython-input-18-de7aeb2a07c7> in <module>
----> 1 sklearn2pmml(pipeline, "./GBDT+LR3.pmml")

/mnt/lujiren/.pylib/3/sklearn2pmml/__init__.py in sklearn2pmml(pipeline, pmml, user_classpath, with_repr, with_jar, debug, java_encoding)
    263                                 print("Standard error is empty")
    264                 if retcode:
--> 265                         raise RuntimeError("The JPMML-SkLearn conversion application has failed. The Java executable should have printed more information about the failure into its standard output and/or standard error streams")
    266         finally:
    267                 if debug:

RuntimeError: The JPMML-SkLearn conversion application has failed. The Java executable should have printed more information about the failure into its standard output and/or standard error streams

And my code:

#%% 
from sklearn_pandas import DataFrameMapper
from sklearn.preprocessing import LabelBinarizer,StandardScaler,Normalizer,OneHotEncoder
from sklearn2pmml.decoration import CategoricalDomain, ContinuousDomain
from sklearn2pmml.ensemble import GBDTLRClassifier
from sklearn2pmml.pipeline import PMMLPipeline

def make_fit_gbdtlr(gbdt, lr, cat_columns, cont_columns, label_column, df_data):
    mapper = DataFrameMapper(
        [([cat_column], [CategoricalDomain(missing_value_replacement="DEFAULT", invalid_value_treatment="as_missing",
                                           missing_value_treatment="as_median"), OneHotEncoder()]) for cat_column in cat_columns] +

        [([cont_column], [ContinuousDomain(), Normalizer()]) for cont_column in cont_columns]
    )
    classifier = GBDTLRClassifier(gbdt, lr)
    pipeline = PMMLPipeline([ ("mapper", mapper),("classifier", classifier)])
    pipeline.fit(df_data[cat_columns + cont_columns], df_data[label_column])
    return pipeline

#%%

from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn2pmml import sklearn2pmml
gbdt = GradientBoostingClassifier(n_estimators=300, max_depth=5)
lr = LogisticRegression(max_iter=100)
pipeline = make_fit_gbdtlr(gbdt, lr, col_cate, col_num, col_label, df_train_1)

The conversion fails when I run:

sklearn2pmml(pipeline, "./GBDT+LR3.pmml")

And my package versions:

import sklearn, sklearn.externals.joblib, sklearn_pandas, sklearn2pmml
print(sklearn.__version__)
print(sklearn.externals.joblib.__version__)
print(sklearn_pandas.__version__)
print(sklearn2pmml.__version__)

--

0.20.0
0.12.5
1.8.0
0.52.1
@vruusmann
Member

Exact duplicate of #64

Workaround: there is no need to normalize continuous values when using decision tree-based learning methods, so the Normalizer step can be dropped from the mapper.
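To illustrate why rescaling does not matter for tree-based learners, here is a minimal sketch in plain Python. It is a toy decision stump, not scikit-learn's or JPMML's implementation; the helper names (`gini`, `weighted_gini`, `best_split`) and the sample data are made up for this example. It shows that the best threshold split groups the samples identically before and after a strictly increasing rescaling of the feature.

```python
# Toy decision stump (hypothetical helpers, for illustration only).
# A threshold split depends only on the ordering of the feature values,
# so a strictly increasing rescaling leaves the chosen partition unchanged.

def gini(indices, ys):
    """Gini impurity of the binary labels selected by `indices`."""
    p = sum(ys[i] for i in indices) / len(indices)
    return 2 * p * (1 - p)

def weighted_gini(left, right, ys):
    """Size-weighted impurity of a left/right partition."""
    n = len(left) + len(right)
    return (len(left) * gini(left, ys) + len(right) * gini(right, ys)) / n

def best_split(xs, ys):
    """Return (impurity, left_indices) for the best threshold on one feature."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best = None
    for k in range(1, len(order)):
        # A threshold can only fall between two distinct feature values.
        if xs[order[k - 1]] == xs[order[k]]:
            continue
        left, right = order[:k], order[k:]
        score = weighted_gini(left, right, ys)
        if best is None or score < best[0]:
            best = (score, frozenset(left))
    return best

# Made-up sample data: one continuous feature and a binary label.
xs = [1.0, 2.5, 3.0, 10.0, 12.0, 40.0]
ys = [0, 0, 0, 1, 1, 1]

# Min-max rescaling to [0, 1], one example of a strictly increasing transform.
lo, hi = min(xs), max(xs)
scaled = [(x - lo) / (hi - lo) for x in xs]

# Same impurity and same left-hand group of samples in both cases.
print(best_split(xs, ys) == best_split(scaled, ys))
```

Applied to the code in this issue, that would mean using `[ContinuousDomain()]` alone for each entry in `cont_columns`, without `Normalizer()`.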
