
Can this be integrated with MLJ? #91

Open
PyDataBlog opened this issue May 25, 2021 · 12 comments
@PyDataBlog

It would be really neat if this idea were integrated into the MLJ ecosystem in some form, so Julia could have something like what Python has with scikit-learn: custom transformers supported by the ML ecosystem. Is this possible?

@xiaodaigh

Hope not. MLJ is pretty heavy; we need a lighter alternative.

@PyDataBlog
Author

> hope not. MLJ is pretty heavy. need a lighter alternative

Which part do you want stripped?

@xiaodaigh

With MLJ, I need to make a machine before I can do OneHotEncoding, so it's too much boilerplate for my liking.
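For readers unfamiliar with the pattern being complained about, here is a minimal sketch of the MLJ "machine" workflow (assuming a recent MLJ.jl; exact names and scitype requirements may differ across versions):

```julia
using MLJ

X = (color = ["red", "green", "red"],)   # any Tables.jl-compatible table
X = coerce(X, :color => Multiclass)      # OneHotEncoder expects Multiclass scitype

encoder = OneHotEncoder()
mach = machine(encoder, X)               # the extra binding step being discussed
fit!(mach)                               # learn the categories
Xt = MLJ.transform(mach, X)              # table with one-hot columns
```

The `machine(encoder, X)` binding is the step that feels like boilerplate for a one-off transform, compared with a plain `fit`/`transform` pair.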

@PyDataBlog
Author

> with MLJ, I need to make a machine before I can do OneHotEncoding. So It's too much boiler-platy stuff for my liking

ML is rarely done in a silo; it's mostly an end-to-end pipeline. Who would want to use OHE on its own from an ML library? So it makes sense to tie everything together, I think.

@xiaodaigh

I see, you are meant to pipeline with a machine.

@PyDataBlog
Author

> I see. you are meant to pipeline with a machine

Exactly! That design logic flows through most ML libraries now. This issue would complete MLJ as a fully fledged end-to-end ML library. Imagine shipping one serialized pipeline object to a production system, as everyone does with scikit-learn, instead of multiple objects from separate feature-engineering steps.

@juliohm

juliohm commented Oct 25, 2021

I think the issue of having pipelines here is separate from the issue of integrating with MLJ.jl. I personally find it more attractive to implement pipelines here as a standalone concept. MLJ.jl could then see if there is value in refactoring or supporting the pipelines from here.

@juliohm

juliohm commented Oct 25, 2021

Also, from a community standpoint, it is much nicer to focus efforts on transforms in a separate hub, detached from the huge MLJ.jl ecosystem, which is already hard to follow even for experienced Julia programmers. If someone wants to add a new transform here, it is easy. Try to do that in MLJ.jl and they will first have to find out which package is the appropriate one, which API should be implemented, and so on.

@glennmoy
Member

The points raised by @juliohm are spot on and largely why this package exists.

  1. MLJ is a sprawling ecosystem which our internal codebase is not set up to adopt. Refactoring our code for MLJ and getting our researchers trained to use it would take substantially more effort for far less certain benefit.
  2. We wanted to make something lightweight that users could extend and adapt to suit their own use-cases (see Support for text representation transforms? #102). This was also motivated by the need for our own internal feature-engineering packages to extend the API without much effort.
  3. We want to interface this with our other packages, like FeatureDescriptors.jl and AxisSets.jl, which are starting to comprise Invenia's "feature-engineering ecosystem", an alternative (at least for us) to the hegemony of MLJ.

That being said, because of (2), MLJ should be able to extend this API or integrate it into its packages.

@rofinn
Member

rofinn commented Dec 23, 2021

FWIW, it looks like we could support that entire ecosystem by:

  1. Depending only on MLJModelInterface.jl which is a pretty minimal package
  2. Putting the MLJModelInterface.Unsupervised wrappers in a separate submodule

I don't think we'd need to change anything else about how our package works.
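A rough sketch of what the suggested submodule might look like, using the real `MLJModelInterface.Unsupervised` fit/transform contract. Everything else here is hypothetical: the module name `MLJInterface` and the wrapped transform are placeholders, not part of the actual package:

```julia
module MLJInterface

import MLJModelInterface as MMI

# Hypothetical wrapper around a one-hot-encoding transform from this package.
mutable struct OneHotEncodingModel <: MMI.Unsupervised end

# MLJModelInterface's contract for unsupervised models:
# fit returns (fitresult, cache, report).
function MMI.fit(model::OneHotEncodingModel, verbosity::Int, X)
    fitresult = nothing   # e.g. the categories learned from X
    cache = nothing
    report = NamedTuple()
    return fitresult, cache, report
end

# transform applies the learned transform to new data.
MMI.transform(model::OneHotEncodingModel, fitresult, X) = X  # placeholder body

end
```

Keeping these wrappers in their own submodule means the core package only needs the lightweight MLJModelInterface.jl dependency, and MLJ users get the models for free.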

@PyDataBlog
Author

> FWIW, it looks like we could support that entire ecosystem by:
>
>   1. Depending only on MLJModelInterface.jl which is a pretty minimal package
>   2. Putting the MLJModelInterface.Unsupervised wrappers in a separate submodule
>
> I don't think we'd need to change anything else about how our package works.

That would be awesome and significantly improve the Julia ML ecosystem!

@juliohm

juliohm commented Dec 23, 2021

Just pointing out in case someone missed it... We addressed a couple of design issues in this package in a new package called TableTransforms.jl, which supports composable, revertible pipelines: https://github.com/JuliaML/TableTransforms.jl
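An illustrative use of TableTransforms.jl's composition and revert API (as documented around the time of this comment; check the package docs for current transform names):

```julia
using TableTransforms

table = (a = [1.0, 2.0, 3.0], b = [4.0, 5.0, 6.0])  # any Tables.jl table

pipeline = ZScore() → Quantile()              # compose transforms with →
newtable, cache = apply(pipeline, table)      # forward pass, keeping a cache
original = revert(pipeline, newtable, cache)  # undo the whole pipeline
```

The `apply`/`revert` pair with an explicit cache is what makes the pipelines revertible without a `machine`-style binding step.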
