Hi team, Thanks for the useful library! I wonder if you'd be open to this idea:
I would like to be able to:
Set up categorizing features (let's say, for illustration, CATEGORY=[footwear, t-shirts, socks], SIZE=[S, M, L, US-Mens-8, US-Womens-6]) and define Factors on them
Generate time-series with more restricted feature combinations than the outer product (again for illustration, "t-shirt sizes for t-shirts, shoe sizes for footwear")
Today, it seems like Generator.generate() hard-codes the assumption that time-series should be generated for the product of all provided feature values.
It'd be helpful if, instead, we could have the option of customizing this join to limit down generated combinations?
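To make the gap concrete, here is a minimal sketch using only the Python standard library. It does not call the library itself; the `valid_sizes` mapping is a hypothetical illustration of "t-shirt sizes for t-shirts, shoe sizes for footwear", and the sizes assigned to socks are an assumption.

```python
from itertools import product

# Illustrative feature values from the example above
categories = ["footwear", "t-shirts", "socks"]
sizes = ["S", "M", "L", "US-Mens-8", "US-Womens-6"]

# Hypothetical mapping of which sizes make sense for each category
# (the socks entry is an assumption made for this sketch)
valid_sizes = {
    "footwear": ["US-Mens-8", "US-Womens-6"],
    "t-shirts": ["S", "M", "L"],
    "socks": ["S", "M", "L"],
}

# What a full outer product of feature values would cover
full_product = list(product(categories, sizes))

# The restricted set of combinations actually wanted
restricted = [(c, s) for c in categories for s in valid_sizes[c]]

print(len(full_product))  # 15
print(len(restricted))    # 8
```

Even in this tiny case almost half of the product is unwanted, and the ratio gets worse as more features are added.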
Some options I can think of:
Leave the library as-is: Users generate full outer product and limit down what they want in post-processing
This seems possible already, but very RAM-intensive if your desired combinations are sparse?
Accept an optional dataframe of factor combinations as parameter to the generate() method
Gives full flexibility over which combinations are kept / ignored, without assuming any particular rigid hierarchies between features
...But might need to do a bit of validation to protect against user errors? May not be super easy to use without some documented examples / functions to generate the dataframe
Some more complex API for feature configuration that accommodates specifying valid/invalid feature combinations
Might be nicer for usability, but difficult to make general: E.g. a straightforward hierarchy could be represented as a nested dict, but in practice many applications have multiple intersecting views of product category information e.g. brand, type, target segment, etc.
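As a rough sketch of what the second option might involve, the snippet below builds a dataframe of allowed combinations with pandas and validates it against the declared feature values. `validate_combinations` is a hypothetical helper, not part of the library, and nothing here assumes how `generate()` would actually consume the frame. The final lines show the first option (full cross join, then filter) for comparison; `how="cross"` needs pandas 1.2 or later.

```python
import pandas as pd


def validate_combinations(combos: pd.DataFrame, features: dict) -> pd.DataFrame:
    """Hypothetical helper: check a combinations frame against feature values."""
    missing = set(features) - set(combos.columns)
    if missing:
        raise ValueError(f"combinations frame lacks columns: {sorted(missing)}")
    for name, allowed in features.items():
        unknown = set(combos[name]) - set(allowed)
        if unknown:
            raise ValueError(f"unknown values for {name!r}: {sorted(unknown)}")
    # Keep only the feature columns and drop repeated rows
    return combos[list(features)].drop_duplicates().reset_index(drop=True)


features = {
    "CATEGORY": ["footwear", "t-shirts", "socks"],
    "SIZE": ["S", "M", "L", "US-Mens-8", "US-Womens-6"],
}

# User-supplied allowed combinations (sizes for socks are an assumption)
combos = pd.DataFrame(
    {
        "CATEGORY": ["footwear"] * 2 + ["t-shirts"] * 3 + ["socks"] * 3,
        "SIZE": ["US-Mens-8", "US-Womens-6", "S", "M", "L", "S", "M", "L"],
    }
)
valid = validate_combinations(combos, features)

# Option 1 for comparison: materialize the full product, then filter it down
full = pd.merge(
    pd.DataFrame({"CATEGORY": features["CATEGORY"]}),
    pd.DataFrame({"SIZE": features["SIZE"]}),
    how="cross",
)
filtered = full.merge(valid, on=["CATEGORY", "SIZE"], how="inner")
```

Both routes end with the same rows here, but the second never materializes the full product, which is where the memory saving would come from when the desired combinations are sparse.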
@athewsey thanks for your great feature request, and some implementation suggestions! Personally, I like option 3, but as you said, it is not an easy one to make general.
However, I have been quite busy recently and will not have time to work on it in the next few months. Feel free to improve the package if you have ideas and time.