-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Double datatype is used within switch-case statement in generated java source #21
Comments
Can you attach this problematic PMML file here? The referenced blog post deals with many CHAID model configurations, so it would be extra work to generate and test them all. Very interesting that you're going through my work so systematically.
Yes, the Right now I would say that it's more like a PMML converter problem - it should be automatically "downcasting" categorical
If it's a one-to-one mapping, then it might be a better idea to generate a plain dictionary lookup in that place. Something like The Java compiler will also be translating (smaller-) |
Please find attached the generated PMML and the java file. Also, a Decision Tree-based PMML (of which, I believe, shares the same structure, but transpiled to very different java source). Hence, I was wondering why it was "treated" differently by the translator. So, it does make sense that it might be problem of PMML converter instead. The problematic PMML was generated using following code:
I like your project very much and it should gain more attention, but to contribute on coding is way out of my league, because of the complexity, coding style and my lack of coding skill. So, testing it is the very least that I can do to contribute. |
Got it - this issue manifests itself, because the CHAID model is trained on categorical double fields. Casting to int is not an option here, because the original input values are fractional (eg. The quick workaround would be to cast categorical doubles to categorical strings - note the use of the def make_passthrough_mapper(cols):
return DataFrameMapper(
[([col], CategoricalDomain(dtype = str)) for col in cols]
) The actual fix must therefore also go in the direction of a dictionary lookup. |
@denmase I've seen you inquiring several GitHub scorecard projects about PMML support. Do you have any specific workflows in mind? If you do, then feel free to shoot me an e-mail, and maybe we can join our forces. |
Great that you found the problem straight away. I don't have any clear and specific workflows in mind yet (at least for now). However, my broad wish is to have easy and seamless pipeline in model development, implementation, and monitoring, for both traditional model (e.g., scorecard, segmentation trees, BR, etc.) and ML model. I found your project about a year ago and then found out later that |
The Java code that is generated by JPMML-Transpiler is still hard-coupled to the underlying JPMML-Evaluator library platform version. For example, the latest JPMML-Transpiler 1.3.1 assumes that it can freely use JPMML-Evaluator 1.6.4 API classes and methods. The upgradeability of Java libraries is production systems is a serious matter - falls to the category of "don't fix it unless it's (completely-) broken". Anyway, your comment just sparked a few ideas in my head. It should be possible to create an extra layer into the JPMML-Evaluator library hierarchy, which would allow referencing a specific JPMML-Evaluator JAR file as "use this JAR version for running this model". Very easy to pull it off using the builder pattern! Something like this: org.jpmml.evaluator.Evaluator evaluator = new LoadingEvaluatorBuilder()
# THIS!
.setRuntimeJar("/path/to/jpmml-evaluator-${version}")
.load("MyModel.pmml.xml")
.build(); The |
Wow.. that's cool. Please make it configurable in a way so third party software, which uses jpmml-evaluator, can expose a way to enable end user to configure which JAR file to be used, or something like that.
I know, and this is actually what happen in my case, it isn't broken, but it isn't working either, because it uses older version of jpmml-evaluator (supports up to PMML v4.3). Downgrading to v4.3 does work for some cases, but in other cases it just doesn't work, while asking for upgrade from vendor isn't an option either. So I am locked and stuck. It's a real pain in the rear for model implementer (like me). Modeller only says "Hey, I've built the model, I don't care about the implementation, that's YOUR problem.". Btw, I'll probably drop you an email or two, to brainstorm about my wish I spoke about in more detail, if it's okay to you. |
Hi @vruusmann,
While testing for your release after #20, I tried to transpile a folder which contains many pmml files, and accidentally, one of them was from Extending Scikit-Learn with CHAID model type, and it was the only file which wasn't compiled properly with following error:
Lines 188 & 293 are both a switch-case statement
I don't know whether it is changed in Java 9 or later, but at least AFAIK in Java 8, we cannot use
double
inswitch-case
statement. Consider usingif
instead?Thank you.
The text was updated successfully, but these errors were encountered: