Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stats functions for other numeric types than float #410

Closed
sebhofer opened this issue Aug 9, 2018 · 6 comments
Closed

Stats functions for other numeric types than float #410

sebhofer opened this issue Aug 9, 2018 · 6 comments

Comments

@sebhofer
Copy link

sebhofer commented Aug 9, 2018

While thinking about implementation of a describe function (cf. #398) today, I realized that all Stats functions are defined for float only. I don't see any reason why those (in principle) shouldn't exist for other numeric types. I'm guessing that one could just cast all other numeric types to floats and then use the existing functionality. As the result of those functions are floats in most cases, this would not [edited] restrict us in any way.

I'd be willing to give it a shot, but I need some pointers how to achieve this in an efficient and manageable way. Any suggestions (@tpetricek @zyzhu )?

@tpetricek
Copy link
Member

There is no good reason for why the stats functions are defined only for float. It's mainly because that is the "default numerical series" in Deedle in some other places. For example df?SomeColumn also defaults to float.

We could add support for other types. I suspect many of the functions might not make sense for integers, but it would be good to support those that do. We could add float32 but that's likely not very useful for anyone. Having decimal would probably make sense though. (Alternatively, we could see if we can actually make those functions generic using static constraints, that might be a good approach too.)

@sebhofer
Copy link
Author

sebhofer commented Aug 10, 2018

I actually think that most of the functions would make sense for integers. It’s just that the result is not an integer. Just think about mean or variance. A sequence of integers certainly does have a mean, and there should be a function to calculate it.

But I really don’t know what would be a good strategy to handle these functions for different types (yet).

Edit: Or did you mean users should just directly create frames from floats instead of ints? I guess that would make sense too. However, if it’s reasonably simple to support ints, I think it would make sense. And decimals would certainly be great too!

@zyzhu
Copy link
Contributor

zyzhu commented Aug 13, 2018

@tpetricek I've tried overload member functions but has some problems.

For instance, I overloaded mean function

  static member mean (series:Series<'K, float>) =
    let sums = initSumsSparse 1 (valuesAllOpt series)
    sums.sum / sums.nobs

  static member mean (series:Series<'K, int>) =
    let sums = initSumsSparse 1 (valuesAllOpt (series |> Series.mapValues float))
    sums.sum / sums.nobs

But I also have to overload levelMean functions

  static member levelMean (level:'K -> 'L) (series:Series<'K, float>) = 
    Series.applyLevel level (Stats.mean:Series<'K,float>->float) series

  static member levelMean (level:'K -> 'L) (series:Series<'K, int>) = 
    Series.applyLevel level (Stats.mean:Series<'K,int>->float) series

But levelMean gives me error message as 'Method with curried arguments cannot be overloaded. Consider using a method taking tupled arguments'

Any idea of how to get around this problem?

@sebhofer
Copy link
Author

By now I'm thinking: Wouldn't it probably make more sense to make the functions generic? This would require rewriting some of the helper functions, but I think it could be done and it looks like the cleaner (and possibly even more efficient) approach to me.

@tpetricek
Copy link
Member

@zyzhu Yeah, I think this is the problem - if we want to keep the nice pipe-compatible syntax like ts |> Series.levelMean fst then there is no way to have the function/member overloaded - the overload resolution only works based on the first argument and so I think defining the above overload for levelMean is not possible. What the error message is saying is that we could define it as:

static member levelMean (level:'K -> 'L, series:Series<'K, float>) = 

... but this would no longer work nicely with pipes.

I think there might be a way to make this generic with some inline and static type constraints magic. That sounds good to me, but it will probably make the code a bit more complicated.

@zyzhu
Copy link
Contributor

zyzhu commented Aug 14, 2018

close issue as it's addressed in #418

@zyzhu zyzhu closed this as completed Aug 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants