Unsupported vector type on datasource that provides it #21
Comments
Duplicate of #18 and #2 (and probably some others)
The Perhaps it will be possible to create a subclass of
You can waste computation time, or you can waste your own time. If you think that your time is more abundant than computer time, then you can try creating a synthetic dataframe schema definition, as explained here: #18 (comment)
Thanks, I'll see what I can do with the "synthetic definition" as using a
You don't need to embed and execute the The idea is to create a pair of "synthetic" Anyway, if a 10% time penalty is such a huge deal for your use case, then you should probably be avoiding the PMML approach.
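A rough sketch of what a "synthetic" definition could look like, for readers landing here. This is one reading of the suggestion, not something the thread confirms: it assumes the vector column has a known, fixed size, that one scalar double column per vector element is an acceptable stand-in, and that a VectorAssembler can be declared over those columns without ever being run on the real data. The column names (x1, x2, ...) and both helper functions are hypothetical. How the resulting pair is handed to the PMML exporter is left out, since the thread does not show it.

```python
from typing import List


def synthetic_feature_names(num_features: int, prefix: str = "x") -> List[str]:
    """Return one hypothetical scalar column name per vector element."""
    if num_features < 1:
        raise ValueError("num_features must be positive")
    return [f"{prefix}{i + 1}" for i in range(num_features)]


def build_synthetic_definition(
    num_features: int,
    label_col: str = "label",
    features_col: str = "features",
):
    """Build a scalar-only schema and a matching VectorAssembler.

    Nothing here touches the real data: the schema only describes
    scalar columns, and the assembler is declared but not executed.
    Assumption: a SparkSession is already active, which PySpark needs
    before it can instantiate ML stages such as VectorAssembler.
    """
    # Imported lazily so the naming helper above works without PySpark.
    from pyspark.ml.feature import VectorAssembler
    from pyspark.sql.types import DoubleType, StructField, StructType

    names = synthetic_feature_names(num_features)
    schema = StructType(
        [StructField(label_col, DoubleType(), nullable=False)]
        + [StructField(name, DoubleType(), nullable=False) for name in names]
    )
    # Maps the synthetic scalar columns onto the vector column the
    # already-trained model expects as its features input.
    assembler = VectorAssembler(inputCols=names, outputCol=features_col)
    return schema, assembler
```

The point of the split is that the scalar column names exist only on paper, so the real training pipeline keeps reading the vector column directly and pays no assembly cost.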
Hello,
We are using Spark with a custom datasource that directly gives a label, vector(features) dataframe, which saves us from using a VectorAssembler in the pipeline.
While this works just fine to train ML models, we can't export them to PMML using jpmml-sparkml because we receive this error:
java.lang.IllegalArgumentException: Expected string, integral, double or boolean type, got vector type
Looking around on various sites, I see that it comes from the fact that jpmml-sparkml does not know how to handle our dataframe. What metadata are we missing so that our models can be exported to PMML?
As a workaround, we can have "split" data and use a VectorAssembler, but it uses some computation time that we feel is a bit wasted.