Support for missing attribute #19
Comments
What is your definition of a "missing value"? A Java The JPMML-XGBoost library has been very thoroughly tested with continuous/categorical/missing/invalid data for 6+ years, without a single major issue. So, again, I must assume that the problem resides somewhere in your application code. Please prepare and share a minimal reproducible example: a CSV data file plus an Apache Spark script (Scala or PySpark) that I can run and explore locally.
This project contains an integration test that uses sparse categorical data: This test is 100% reproducible.
I've tried SparseToDenseTransformer before, and I can see that it fixes the inconsistency caused by sparse vectors.
I set the missing value to 0 in xgboost4j-spark 1.2.0. If I set missing to any other value, it fails with an XGBoost training failed error.
The How can the PMML engine make correct predictions if it is missing this critical piece of information? Take the PMML document, and insert the following
<DataField name="hour" optype="categorical" dataType="integer">
  <!-- THIS -->
  <Value property="missing" value="0"/>
</DataField>
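To illustrate why that declaration matters, here is a toy sketch in Python. It is not XGBoost or JPMML code; the function names, the threshold, and the default direction are all made up for illustration. It assumes a single tree split that sends missing values in a default direction, and shows that `hour = 0` can land on different sides depending on whether 0 is interpreted as "missing" or as an ordinary value.

```python
from typing import Optional


def route(value: Optional[float], threshold: float, default_left: bool) -> str:
    """Route one feature value through a single toy tree split."""
    if value is None:
        # Missing values follow the default direction chosen for the split.
        return "left" if default_left else "right"
    return "left" if value < threshold else "right"


def as_trained(hour: int, missing: int = 0) -> Optional[float]:
    """Training-side view (assumption): the configured missing value becomes 'no value'."""
    return None if hour == missing else float(hour)


def as_scored_without_declaration(hour: int) -> Optional[float]:
    """Scoring-side view when nothing declares 0 as missing: 0 is just a number."""
    return float(hour)


# Hypothetical split: "hour < 6.0", with missing values sent to the right.
threshold, default_left = 6.0, False

trained_side = route(as_trained(0), threshold, default_left)
scored_side = route(as_scored_without_declaration(0), threshold, default_left)
print(trained_side, scored_side)
```

Under these assumptions the two sides disagree for `hour = 0` and agree for every other hour, which is the kind of mismatch the `Value` element with `property="missing"` is meant to remove.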
It would be nice to automate the generation of extra Here are some related feature requests: jpmml/jpmml-sparkml#14 and jpmml/jpmml-sparkml#25. Newer XGBoost versions also store this information in model dumps. Here's a related Scikit-Learn issue: jpmml/jpmml-sklearn#166
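Until something like that exists, the manual edit can be scripted. The following is only a sketch using Python's standard `xml.etree.ElementTree`; `declare_missing` is a hypothetical helper, not part of any JPMML library, and the sample document is a made-up minimal PMML fragment. It assumes the missing value is passed as the string that should appear in the `value` attribute.

```python
import xml.etree.ElementTree as ET


def declare_missing(pmml_text: str, field_name: str, missing_value: str) -> str:
    """Add <Value property="missing" value="..."/> to the named DataField."""
    root = ET.fromstring(pmml_text)

    # PMML documents normally use a default namespace; reuse whatever the root has.
    ns = root.tag[: root.tag.index("}") + 1] if root.tag.startswith("{") else ""
    if ns:
        # Keep the output free of generated "ns0:" prefixes.
        ET.register_namespace("", ns[1:-1])

    for field in root.iter(ns + "DataField"):
        if field.get("name") != field_name:
            continue
        already_declared = any(
            v.get("property") == "missing" and v.get("value") == missing_value
            for v in field.findall(ns + "Value")
        )
        if not already_declared:
            ET.SubElement(
                field, ns + "Value", {"property": "missing", "value": missing_value}
            )
    return ET.tostring(root, encoding="unicode")


# Hypothetical minimal document, for illustration only.
sample = (
    '<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">'
    "<DataDictionary>"
    '<DataField name="hour" optype="categorical" dataType="integer"/>'
    "</DataDictionary>"
    "</PMML>"
)
patched = declare_missing(sample, "hour", "0")
print(patched)
```

The helper is idempotent: running it twice on the same document does not add a second `Value` element for the same missing value.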
Hi vruusmann, unfortunately, after I added the extra DataField/Value@property="missing" elements, the inconsistency is still there. I'm frustrated.
Hi vruusmann,
Sorry to disturb you again. This inconsistency has been a headache for several months. After checking the xgboost4j docs, I saw that some fixes for the missing value problem were made after version 0.9, so I upgraded xgboost4j-spark to 1.2.0 with Spark 3. But I still get inconsistent predictions.
As you can see, I have only one categorical feature, hour, and it doesn't contain missing values. If I remove the categorical feature and use only numeric features, the predictions are consistent.
Do you have any clues?