Detect feature promotion from (high cardinality-) categorical to pseudo-numeric #4
Comments
This is a failing "sanity check": it appears that one of the GBM trees contains a split instruction where a "continuous-type" split is attempted on a "categorical-type" feature. For example, a split instruction
I have no way of verifying whether this sanity check is doing its job correctly, because I don't know how the "MAKE" feature is defined (is it a text string, a numeric string or a number?), or what instructions about this feature you have given to H2O.ai. AFAIK, there is a way to explicitly state which columns are continuous and which are categorical.
How are you interacting with H2O.ai in the first place? Directly, using its Scikit-Learn or R wrappers, or in some other way? My integration tests are developed using the Scikit-Learn wrapper, and I didn't encounter any feature typing issues there.
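To make the kind of check described above concrete, here is a minimal sketch in plain Python. It is not the converter's actual code: the `Split` record, its `kind` field, and the `find_type_mismatches` helper are hypothetical names invented for illustration. It only shows the idea of comparing the type of each split instruction against the declared type of the feature it tests.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Split:
    """Hypothetical record for one split instruction in a tree."""
    feature: str  # name of the feature the split tests
    kind: str     # "continuous" (threshold test) or "categorical" (set membership test)


def find_type_mismatches(splits: List[Split], feature_types: Dict[str, str]) -> List[Split]:
    """Return the splits whose kind differs from the declared feature type.

    Features missing from feature_types are flagged as well, because their
    declared type is unknown.
    """
    return [s for s in splits if feature_types.get(s.feature) != s.kind]


# Illustrative data only: MAKE is declared categorical but is split as continuous.
declared = {"MAKE": "categorical", "AGE": "continuous"}
tree_splits = [Split("AGE", "continuous"), Split("MAKE", "continuous")]
mismatches = find_type_mismatches(tree_splits, declared)
```

A non-empty `mismatches` list corresponds to the failing sanity check reported in this issue.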
Hi Villu,
Thanks for the speedy response and detailed feedback.
After some investigating, it appears that the problem is caused by the cardinality of the 'MAKE' field being too high (MAKE has 451 distinct levels). Looking at the POJO file produced by H2O, it seems that H2O converts high-cardinality fields to numeric, which your tool doesn't appear to allow for. I have since reduced the cardinality of this field and the conversion was successful.
In response to your question, I'm interacting with H2O via R.
Thanks for your time.
Regards,
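The cardinality check that led to this diagnosis can be sketched in plain Python before training or conversion. The `high_cardinality_columns` helper and the `max_levels` parameter are hypothetical; the thread does not state the level count at which H2O switches a categorical feature to numeric handling, only that 451 levels was enough to trigger it in this case, so the threshold below is an arbitrary illustration.

```python
from typing import Dict, Iterable, List, Mapping


def high_cardinality_columns(
    rows: Iterable[Mapping[str, object]],
    categorical: List[str],
    max_levels: int,
) -> Dict[str, int]:
    """Count distinct levels per categorical column and report those above max_levels."""
    levels = {name: set() for name in categorical}
    for row in rows:
        for name in categorical:
            levels[name].add(row[name])
    return {name: len(seen) for name, seen in levels.items() if len(seen) > max_levels}


# Illustrative data only: 451 distinct MAKE values, 2 distinct FUEL values.
rows = [
    {"MAKE": f"make_{i}", "FUEL": "petrol" if i % 2 else "diesel"}
    for i in range(451)
]
# max_levels=100 is an assumed cut-off, not a documented H2O limit.
flagged = high_cardinality_columns(rows, ["MAKE", "FUEL"], max_levels=100)
```

Columns reported in `flagged` are candidates for level reduction, which is the workaround described above.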
Interesting fact. Will have to generate a synthetic dataset that would trigger this automatic "categorical-to-pseudo numeric" conversion.
There's a native H2O integration available in the SkLearn2PMML package. One day there will be an integration available for the R2PMML package as well.
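A synthetic dataset of the kind mentioned above could be generated along these lines. This is only a sketch: the `make_synthetic_rows` helper and the `AGE` and `CLAIMS` columns are made up for illustration, and whether 451 levels is sufficient to trigger the conversion in a given H2O version and configuration is an assumption taken from this thread, not a verified fact.

```python
import csv
import io
import random
from typing import Dict, List


def make_synthetic_rows(n_rows: int, n_levels: int, seed: int = 0) -> List[Dict[str, object]]:
    """Build rows with one high-cardinality categorical column named MAKE.

    Levels are assigned round-robin, so every level occurs at least once
    whenever n_rows >= n_levels.
    """
    rng = random.Random(seed)
    rows = []
    for i in range(n_rows):
        rows.append({
            "MAKE": f"make_{i % n_levels:04d}",  # categorical, n_levels distinct values
            "AGE": rng.randint(18, 80),          # hypothetical numeric feature
            "CLAIMS": rng.randint(0, 3),         # hypothetical count-like target
        })
    return rows


def to_csv(rows: List[Dict[str, object]]) -> str:
    """Serialize the rows to CSV text so they can be loaded by any H2O client."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["MAKE", "AGE", "CLAIMS"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


synthetic = make_synthetic_rows(n_rows=1000, n_levels=451)
csv_text = to_csv(synthetic)
```

The `MAKE` values are non-numeric strings on purpose, so that a client importing the CSV has no reason to parse the column as a number.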
Thanks for the feedback! I come from an R background, which is why I've been using R to interact with H2O. In the future, I'll explore Python. It would be really great if support for H2O models were added to the R2PMML package.
Hi Villu,
Thanks for your assistance with the previous issue that I raised, it's greatly appreciated!
I have however stumbled across a new issue and was wondering whether you could perhaps take a look at it? I'm getting the following error when trying to convert a Tweedie GBM to PMML:
It seems like one of the inputs to the model (MAKE) is causing an issue. However, the same input was used for the Poisson model that I referred to you previously, and there were no issues with it once you made allowance for Poisson models in your code.
Your assistance with this would be greatly appreciated.
Thanks for your time.
Regards,
Paulo