
Support for text representation transforms? #102

Open
pietrolesci opened this issue Aug 19, 2021 · 3 comments

Comments

@pietrolesci

Hi there,

First of all, thanks for creating this amazing library - investing time and money to support it, and making it open source. I discovered it at JuliaCon 2021 and was immediately fascinated by it.

Question: Is there interest in supporting text representation (e.g., using ScikitLearn terminology: CountVectorizer, TfidfVectorizer, NgramVectorizer, etc) features in FeatureTransforms.jl?

Some of them are covered in TextAnalysis but, from the user's perspective, they are quite different from what anyone familiar with ScikitLearn would expect. In particular, they are not "pipe-able", nor do they immediately return what the user is after (i.e., they do not implement the common "fit_transform" paradigm). In other words, it is quite non-trivial to get from raw text to input representations that can be fed to a machine learning model (e.g., MLJ, Flux, etc.).
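For reference, a minimal sketch of the "fit_transform" paradigm being described, in the style of ScikitLearn's CountVectorizer: `fit` learns a vocabulary, `transform` maps documents to count vectors, and `fit_transform` chains the two. The class name and whitespace tokenization here are illustrative assumptions, not part of any existing library:

```python
class TinyCountVectorizer:
    """Illustrative count vectorizer following the fit/transform API."""

    def fit(self, docs):
        # Learn a sorted vocabulary from whitespace-tokenized documents.
        vocab = sorted({tok for doc in docs for tok in doc.lower().split()})
        self.vocabulary_ = {tok: i for i, tok in enumerate(vocab)}
        return self

    def transform(self, docs):
        # Map each document to a dense term-count vector.
        rows = []
        for doc in docs:
            row = [0] * len(self.vocabulary_)
            for tok in doc.lower().split():
                if tok in self.vocabulary_:
                    row[self.vocabulary_[tok]] += 1
            rows.append(row)
        return rows

    def fit_transform(self, docs):
        # The one-call convenience users expect from ScikitLearn.
        return self.fit(docs).transform(docs)


corpus = ["the cat sat", "the dog sat on the mat"]
vec = TinyCountVectorizer()
X = vec.fit_transform(corpus)
# vocabulary: cat, dog, mat, on, sat, the
# X == [[1, 0, 0, 0, 1, 1], [0, 1, 1, 1, 1, 2]]
```

The point is the ergonomics: one call takes raw strings to a model-ready matrix, which is exactly what is non-trivial to do with the current TextAnalysis API.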

FeatureTransforms.jl seems a good place to support these transforms and, if interesting, I'd be happy to work on adding support for them.

@glennmoy
Member

Hi, thanks for getting in touch, glad you like the package :) Sorry it hasn't been my focus since JuliaCon so I haven't checked up on it since.

Question: Is there interest in supporting text representation (e.g., using ScikitLearn terminology: CountVectorizer, TfidfVectorizer, NgramVectorizer, etc) features in FeatureTransforms.jl?

TBH not particularly, since we don't have any use cases for "textual features", so it wasn't on the original roadmap. That being said, I'd happily support adding this functionality provided it didn't introduce any heavy / non-standard dependencies. (I'd rather not have users who don't work with NLP load dependencies they don't need.)

However, if heavy dependencies were unavoidable, another option is to create a package that extends this API and adds the functionality you need. For instance, we have a private package for precisely this purpose.

@pietrolesci
Author

Hi @glennmoy,

Thanks for your answer. I went through the linked issue and I really see the point of having a lightweight alternative to MLJ.

Given the use cases you are interested in at Invenia, it seems there is no interest in supporting NLP-related transformers. On the other hand, having an independent package maintained by one person (me, in this case) seems likely to add noise to the already crowded feature-engineering sub-community within the larger Julia ecosystem.

For the time being, I will refrain from contributing NLP transformers. Happy to reconnect in the future if there is interest from Invenia's side.

Thank you very much for your attention. I wish you a very good day.

Best,
Pietro

@glennmoy
Member

No problem, thanks for getting in touch!
