OneHotEncoding should retain metadata #57

Open
rofinn opened this issue Mar 19, 2021 · 2 comments

@rofinn
Member

rofinn commented Mar 19, 2021

The current OneHotEncoding transform always seems to return a plain N×P binary matrix, even though it knows what the categories are and may be passed an input type that would allow it to retain that information.

For example, given OneHotEncoding(Hour(0):Hour(1):Hour(23)) and a DataFrame with an hour-of-day (HoD) column, I could easily see the transform returning a new DataFrame with one named column per hour. Similarly, if we took a function argument, we could do something like:

FeatureTransforms.ohe(Hour, data; dims=:time)

This would return a new KeyedArray or AxisArray that retains the category information along the p dimension. It would also use the dims argument to mean that the keys of that dimension are what gets encoded, for KeyedArrays or AxisArrays.
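To make the DataFrame case concrete, here is a minimal sketch in plain Julia of an encoding that keeps one named column per category. The helper `onehot_columns` and its `label` keyword are hypothetical and not part of FeatureTransforms.jl; the result is a NamedTuple of Bool vectors standing in for the named columns a DataFrame would have.

```julia
using Dates

# Hypothetical helper (not the FeatureTransforms.jl API): one-hot encode `values`
# against `categories`, returning one named Bool column per category.
# `label` turns a category into a column name.
function onehot_columns(values::AbstractVector, categories; label=string)
    names = Tuple(Symbol(label(c)) for c in categories)
    cols = Tuple([v == c for v in values] for c in categories)
    return NamedTuple{names}(cols)
end

# Hour-of-day values, encoded against all 24 possible hours.
hod = [Hour(0), Hour(13), Hour(23), Hour(13)]
table = onehot_columns(hod, Hour(0):Hour(1):Hour(23);
                       label = h -> "hour_$(Dates.value(h))")

table.hour_13  # the Bool column for 13:00
```

Because the column names are derived from the categories, the caller can look up a category by name instead of remembering its position in the N×P matrix.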

@glennmoy
Member

Can you explain what you had in mind for the function argument? I'm not sure I see how it translates to keeping the category information. Is it that the function determines what the category labels are?

@rofinn
Member Author

rofinn commented Mar 22, 2021

> Is it that the function determines what the category labels are?

Yes. If I have an axis of DateTimes, I might use hour to lazily generate the categories. For AxisArrays and KeyedArrays we could then return an n × p array, with n along :time (DateTime keys) and p along :category (Hour keys).
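As a rough illustration of that idea, the sketch below derives the categories by applying a function to the axis keys and returns the n × p encoding together with both sets of keys. The function name `onehot_by` and the returned NamedTuple layout are assumptions made for illustration only; a real implementation would presumably wrap the matrix in a KeyedArray or AxisArray instead.

```julia
using Dates

# Hypothetical sketch: `f` maps each axis key to its category, so the
# categories are generated from the data rather than listed up front.
function onehot_by(f, ks::AbstractVector)
    labels = f.(ks)                     # category of each key
    categories = sort(unique(labels))   # the p distinct categories
    data = [l == c for l in labels, c in categories]  # n × p Bool matrix
    return (time = ks, category = categories, data = data)
end

# Four hourly timestamps that cross midnight.
times = collect(DateTime(2021, 3, 19, 22):Hour(1):DateTime(2021, 3, 20, 1))
result = onehot_by(Hour, times)

size(result.data)  # (n, p)
result.category    # Hour keys for the p dimension
```

Keeping `time` and `category` alongside `data` is what lets the output label both dimensions, which is the information the plain binary matrix drops.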
