Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fill with value=f::Function isn't comfortable with entirely missing data #51

Closed
nickrobinson251 opened this issue Aug 29, 2019 · 2 comments
Milestone

Comments

@nickrobinson251
Copy link
Contributor

nickrobinson251 commented Aug 29, 2019

Here are some common functions we may want to use with fill but which (arguably) have awkward behaviour if all data is missing -- stemming from fill using drop and subsequently ending up calling the function on a empty array:

julia> using Impute, Statistics

julia> x = [missing 2.0; missing 4.0]
2×2 Array{Union{Missing, Float64},2}:
 missing  2.0
 missing  4.0

julia> Impute.fill(x, value=mean, dims=1)
2×2 Array{Union{Missing, Float64},2}:
 NaN  2.0
 NaN  4.0
julia> x = [missing 2; missing 4];

julia> Impute.fill(x, value=mean, dims=1)
ERROR: InexactError: Int64(NaN)
julia> x = [missing 2.0; missing 4.0];

julia> Impute.fill(x, value=median, dims=1)
ERROR: ArgumentError: median of an empty array is undefined, Union{Missing, Float64}[]
julia> x = [missing 2.0; missing 4.0];

julia> Impute.fill(x, value=middle, dims=1)
ERROR: ArgumentError: collection must be non-empty

(Impute v0.3, tested on Julia v1.2)

Apologies for not having a solution here... but some question we might want to think about

  • Should these all be errors?
  • If so, should they error in a "nicer" way?
  • If not, should they simply fail to impute, i.e. leave the missings, a la locf if the first element is missing?
@rofinn
Copy link
Member

rofinn commented Sep 2, 2019

I'd be okay with retaining the missing in this case. One thing I've been considering is that if we split out an Iterators interface then we'd likely want to use an OnlineStat instead. Unfortunately, this would just use that default value (e.g., value(Mean()) -> 0.0) which also might not be what we want? I guess in that case we could check nobs(value) before returning.

This was referenced Sep 25, 2020
@rofinn rofinn added this to the 0.6 milestone Oct 10, 2020
@rofinn
Copy link
Member

rofinn commented Oct 16, 2020

Alright, #69 fixed this by having impute(data::AbstractArray{Missing}, imp) just return in the input data cause most imputation methods are going to need at least some non-missing values to work properly.

@rofinn rofinn closed this as completed Oct 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants