Preparing the data #7

Open
bgreenwell opened this issue Jul 29, 2020 · 3 comments

Comments

@bgreenwell

Hi @eddelbuettel, first off, thanks for porting CORELS to R. I'm writing a book about trees and came across this while writing about rule-based models, and it seems really promising. That being said, would you be open to a few PRs, starting with one on formatting the data for users? I have some starter code I was using for a couple of examples that wouldn't be difficult to generalize, but I wanted to check first whether you had a specific design in mind. E.g., a formula method or an X/y interface:

corels(Edibility ~ ., data = mushroom, ...)   OR   corels(X, y, ...)

where X and y are both constructed and written out to a temporary (or user-specified) location, or maybe even just a simple prepareData() function? Soon after, I can submit at least one or two good examples to include that seem perfect for this type of algorithm. Happy to hear your thoughts. Obviously, numeric inputs would have to be re-encoded by the user beforehand.
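To make the proposal concrete, here is a minimal base R sketch of what such a helper might look like. The function name prepare_data(), its arguments, and the toy data frame are all hypothetical and not part of the corels package; the "{feature:level}" rule naming is an assumption based on the sample data files shipped with the CORELS command-line tool.

```r
# Hypothetical helper (NOT part of the corels package): turn a formula and a
# data frame of categorical columns into binary "rules" and binary labels.
prepare_data <- function(formula, data) {
  mf <- model.frame(formula, data = data)
  y <- as.factor(model.response(mf))
  stopifnot(nlevels(y) == 2L)  # this sketch assumes a binary outcome
  X <- mf[, -1L, drop = FALSE]

  # One 0/1 indicator per level of each categorical predictor
  rules <- list()
  for (nm in names(X)) {
    if (is.numeric(X[[nm]])) {
      stop("Numeric column '", nm, "' must be re-encoded (e.g., binned) first")
    }
    x <- as.factor(X[[nm]])
    for (lev in levels(x)) {
      rules[[paste0("{", nm, ":", lev, "}")]] <- as.integer(x == lev)
    }
  }

  # One 0/1 indicator per class of the response
  labels <- list()
  for (lev in levels(y)) {
    labels[[paste0("{", names(mf)[1L], ":", lev, "}")]] <- as.integer(y == lev)
  }

  list(rules = rules, labels = labels)
}

# Toy stand-in for the mushroom data (made-up rows, for illustration only)
toy <- data.frame(
  Edibility = c("edible", "poisonous", "edible", "poisonous"),
  odor = c("none", "foul", "almond", "foul"),
  stringsAsFactors = TRUE
)
prep <- prepare_data(Edibility ~ ., data = toy)
names(prep$rules)   # "{odor:almond}" "{odor:foul}" "{odor:none}"
names(prep$labels)  # "{Edibility:edible}" "{Edibility:poisonous}"
```

Returning plain named lists keeps the encoding step separate from any file writing, so the same result could feed either a file-based or an in-memory interface.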

@eddelbuettel
Collaborator

eddelbuettel commented Jul 29, 2020

Hi, and thanks for raising an issue. Yes, this is on our TODO list, preferably with proper data.frame conversion etc., but we have not gotten there yet. This is really a group effort with the corels org, even though so far it has just been me committing. So @nlarusstone and @fingoldin may pipe in as well.

Not sure if you have seen tidycorels by @billster45 which already adds a more R-alike interface (though by going through, IIRC, external files more akin to the corels binary).

PS And see #6 for a little bit of prior discussion on tidycorels.

@bgreenwell
Author

bgreenwell commented Jul 29, 2020

Thanks for pointing me towards @billster45's tidycorels package. It looks like it uses an approach similar to what I was doing with just writeLines(), but I'd assume there's a more organic way to handle this than coercing data frames and writing files out to a (possibly temporary) directory for consumption by corels. I'll keep an eye on the repo and post anything that might be useful.
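For readers unfamiliar with the file-based approach described above, the following is a rough sketch of writing binary rules to a temporary file with writeLines(). The helper write_rules() and the example rules are hypothetical; the one-rule-per-line layout (rule name followed by space-separated 0/1 values) is an assumption based on the sample data files shipped with the CORELS command-line tool, not something confirmed in this thread.

```r
# Hypothetical helper: write a named list of 0/1 vectors to a text file,
# one rule per line, as "<rule name> <space-separated bits>".
write_rules <- function(rules, path = tempfile(fileext = ".out")) {
  lines <- vapply(
    names(rules),
    function(nm) paste(nm, paste(rules[[nm]], collapse = " ")),
    character(1L)
  )
  writeLines(lines, con = path)
  invisible(path)  # return the file location so it can be passed along
}

# Made-up rules for illustration only
rules <- list(
  "{odor:foul}" = c(0L, 1L, 0L, 1L),
  "{odor:none}" = c(1L, 0L, 0L, 0L)
)
train_file <- write_rules(rules)
readLines(train_file)
# "{odor:foul} 0 1 0 1" "{odor:none} 1 0 0 0"
```

Defaulting path to tempfile() mirrors the "possibly temporary" directory mentioned above, while still letting a user supply their own location.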

For reference, here's the example I'm playing with: https://gist.github.com/bgreenwell/482cffb8b7a5c60103fe5526b236e0ad

@billster45

@bgreenwell yes, writing out the dataframe to text files and all the other parts of tidycorels (e.g. capturing the console output and converting it to data.table if-else code) are hacky things I was doing to compare corels to popular ML methods. Then it occurred to me that I could try building a package with them.

Good to try your corels code on the mushroom data and see everything in compact base R. With the polite nudge from @eddelbuettel I was amazed how much cleaner R code can be using only base and one package (data.table). The points made here make more sense to me now.
