Handling columns with null values #44

malathit · 2018-06-26T08:36:04Z

Exception in thread "main" java.lang.IllegalArgumentException: Field a1 has valid values [b, a]
	at org.jpmml.converter.PMMLEncoder.toCategorical(PMMLEncoder.java:189)
	at org.jpmml.sparkml.feature.VectorIndexerModelConverter.encodeFeatures(VectorIndexerModelConverter.java:98)
	at org.jpmml.sparkml.FeatureConverter.registerFeatures(FeatureConverter.java:48)
	at org.jpmml.sparkml.ConverterUtil.toPMML(ConverterUtil.java:96)
	at org.jpmml.sparkml.ConverterUtil.toPMML(ConverterUtil.java:68)

I get the above exception when the column has null values. Any ideas on how to resolve this? Please comment if further details are needed.


vruusmann · 2018-06-26T08:47:22Z

> I get the above exception when the column has null values. Any ideas on how to resolve this?

Apply org.apache.spark.ml.feature.Imputer to this column first?

What is your Apache Spark version? How does Apache Spark handle columns with missing values - AFAIK it should also crash sooner or later.

malathit · 2018-06-26T09:10:48Z

Hi,

Thanks for the quick reply. AFAIK the org.apache.spark.ml.feature.Imputer class can be used only on float or double data types. The column that gives me the error is of String type.

I am using Apache Spark 2.2.0.
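Since Imputer does not accept string columns, one possible workaround is to replace nulls with an explicit placeholder category before indexing. The following is a minimal sketch, not something confirmed in this thread: it relies on `DataFrameNaFunctions.fill(value: String, cols: Seq[String])`, and the helper name `fillNullStrings`, the placeholder `"__missing__"`, and the toy data are all assumptions made for illustration. Whether this avoids the converter exception has not been verified here.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object FillNullStrings {
  // Replace nulls in the given string columns with a placeholder label.
  // "__missing__" is an arbitrary (hypothetical) choice for this sketch.
  def fillNullStrings(df: DataFrame,
                      cols: Seq[String],
                      placeholder: String = "__missing__"): DataFrame =
    df.na.fill(placeholder, cols)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("fill-null-strings")
      .getOrCreate()
    import spark.implicits._

    // Toy data standing in for the poster's DataFrame (an assumption)
    val df = Seq[(String, Double)](("a", 1.0), ("b", 2.0), (null, 3.0)).toDF("a1", "a2")

    // After this call, "a1" contains no nulls; the placeholder is an ordinary category
    val filled = fillNullStrings(df, Seq("a1"))
    filled.show()

    spark.stop()
  }
}
```

With the nulls turned into a regular label, StringIndexer would see the placeholder as one more known category during fitting, so the default `handleInvalid` setting may be sufficient for that column.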

malathit · 2018-06-26T09:11:26Z

> How does Apache Spark handle columns with missing values - AFAIK it should also crash sooner or later.

In Apache Spark, null values are handled by calling StringIndexer's setHandleInvalid method with the value "keep". Let me share simplified code that reproduces the issue.
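To make the "keep" behaviour concrete, here is a plain-Scala sketch of the indexing rule as it is documented for StringIndexer: labels are ordered by descending frequency, and invalid values go into one extra bucket whose index equals the number of labels. This is an illustration only, not Spark's implementation; `fitLabels`, `indexKeep`, and the alphabetical tie-break are assumptions of the sketch.

```scala
// Plain-Scala illustration only; this is not Spark's implementation.
object KeepIndexSketch {
  // Order the non-null labels by descending frequency.
  // Ties are broken alphabetically here, which is an assumption of this sketch.
  def fitLabels(values: Seq[Option[String]]): Vector[String] =
    values.flatten
      .groupBy(identity)
      .map { case (label, occurrences) => (label, occurrences.size) }
      .toVector
      .sortBy { case (label, count) => (-count, label) }
      .map { case (label, _) => label }

  // With "keep", a null or unseen value receives the extra index labels.size.
  def indexKeep(labels: Vector[String], value: Option[String]): Int =
    value.map(v => labels.indexOf(v)).filter(_ >= 0).getOrElse(labels.size)

  def main(args: Array[String]): Unit = {
    // A column with two known labels and one null
    val column = Seq(Some("b"), Some("b"), Some("a"), None)
    val labels = fitLabels(column)
    val indices = column.map(indexKeep(labels, _))
    println(s"labels = $labels, indices = $indices")
  }
}
```

In this toy column there are two labels but three distinct index values. One plausible reading of the exception above, not confirmed in this thread, is that the converter sees a categorical field with valid values [b, a] while a downstream stage expects one more category than that.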

malathit · 2018-06-27T09:39:16Z

@vruusmann This is the code and it gives the issue

vruusmann · 2018-06-27T09:45:53Z

@malathit90 Sorry, I don't have time to debug images.

malathit · 2018-06-27T10:03:00Z

Here is the snippet giving the error @vruusmann

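```scala
import java.io.FileOutputStream

import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.RandomForestClassifier
import org.apache.spark.ml.feature.{IndexToString, StringIndexer, VectorAssembler, VectorIndexer}
import org.jpmml.model.MetroJAXBUtil
import org.jpmml.sparkml.ConverterUtil

// `df` and `zeroFilledData` are DataFrames defined elsewhere (not shown in this snippet)

// Index the nullable string column; "keep" puts invalid values into an extra bucket
val a1Idx = new StringIndexer().setInputCol("a1").setOutputCol("a1Indexed").setHandleInvalid("keep")
val featureAssembler = new VectorAssembler().setInputCols(Array("a1Indexed", "a2")).setOutputCol("features")
// The label indexer is fitted up front so its labels can be reused by IndexToString
val labelIndexer = new StringIndexer().setInputCol("a16").setOutputCol("labelIndexed").fit(zeroFilledData)
val featureIndexer = new VectorIndexer().setInputCol("features").setOutputCol("featuresIndexed").setMaxCategories(15)
val classifier = new RandomForestClassifier().setLabelCol("labelIndexed").setFeaturesCol("featuresIndexed").setImpurity("gini").setPredictionCol("predictionIndexed")
// Map predicted indices back to the original label strings
val labelConverter = new IndexToString().setInputCol("predictionIndexed").setOutputCol("prediction").setLabels(labelIndexer.labels)

val pipeline = new Pipeline().setStages(Array(a1Idx, labelIndexer, featureAssembler, featureIndexer, classifier, labelConverter))
val model = pipeline.fit(zeroFilledData)

// The exception is thrown during the PMML conversion
MetroJAXBUtil.marshalPMML(ConverterUtil.toPMML(df.schema, model), new FileOutputStream("/tmp/out.pmml"))
```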
