
Unexpected behavior when using Impute.locf within groupby-combine procedure #140

Open
BeitianMa opened this issue Nov 1, 2023 · 2 comments


@BeitianMa

As the following code shows, I want to forward-fill missing values using the Impute.locf function, but only within the same :id:

using DataFramesMeta, Impute

df = DataFrame(id = repeat(1:3, 2), value = [1,missing,3,4,missing,missing])

combine(groupby(df, :id), :value => (x -> Impute.locf(x)) => :value)

Unexpectedly, it raises

ERROR: AssertionError: !(all(ismissing, data))

This is clearly because all the values under :id = 2 are missing. However, the following code

df = DataFrame(id = repeat(1:3, 2), value = [missing,missing,missing,missing,missing,missing])

transform(df, :value => (x -> Impute.locf(x)) => :value)

completes without error. It simply leaves all values missing, which is the desired result:

Row  │ id     value   
     │ Int64  Missing 
─────┼────────────────
   1 │     1  missing 
   2 │     2  missing 
   3 │     3  missing 
   4 │     1  missing 
   5 │     2  missing 
   6 │     3  missing

My questions are:

  1. Is this a bug or a feature (for reasons I'm not aware of)?
  2. How do I get the (grouped) results? Of course, the simpler the code, the better.

Thanks in advance!
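For question 2, if relying on Impute.jl is not a requirement, a grouped forward fill can also be written directly. The sketch below uses a hypothetical helper, grouped_locf (not part of Impute.jl or DataFrames.jl), that walks the rows once, remembers the last non-missing value per id, and leaves an id with no observed value missing instead of raising. Unlike combine, it keeps the original row order, so (assuming DataFrames.jl) its result could be assigned back with df.value = grouped_locf(df.id, df.value).

```julia
# Hypothetical helper (not part of Impute.jl or DataFrames.jl): forward-fill
# `values` separately within each id, keeping the original row order.
function grouped_locf(ids::AbstractVector, values::AbstractVector)
    # Output element type: the non-missing type of `values`, plus Missing.
    T = Union{Missing, nonmissingtype(eltype(values))}
    out = similar(values, T)
    # Last non-missing value seen so far for each id.
    last_seen = Dict{eltype(ids), T}()
    for i in eachindex(ids, values)
        v = values[i]
        if ismissing(v)
            # No earlier observation for this id: the row stays missing (no error).
            out[i] = get(last_seen, ids[i], missing)
        else
            last_seen[ids[i]] = v
            out[i] = v
        end
    end
    return out
end

ids = repeat(1:3, 2)
filled = grouped_locf(ids, [1, missing, 3, 4, missing, missing])
# id 2 has no observed value, so both of its rows stay missing;
# the last row of id 3 is filled from its earlier value 3.
println(filled)
```

Because the helper only ever copies values it has already seen, an all-missing input simply comes back all missing, which matches the result the second example produces.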

@nilshg

nilshg commented Nov 1, 2023

As I explained on Discourse, this has nothing to do with groupby:

julia> locf([missing])
1-element Vector{Missing}:
 missing

julia> locf(Union{Float64, Missing}[missing])
ERROR: AssertionError: !(all(ismissing, data))
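The difference between the two original examples comes down to element types, which can be checked with Base Julia alone (no Impute.jl call here). This sketch rebuilds the issue's value columns as plain vectors; selecting rows 2 and 5 with view stands in for the :id = 2 group, on the assumption that groupby passes each group to the function as a view that keeps the parent column's element type.

```julia
# Plain-vector copies of the two :value columns from the issue.
mixed = [1, missing, 3, 4, missing, missing]                        # first example
allmissing = [missing, missing, missing, missing, missing, missing] # second example

# Rows 2 and 5 form the :id == 2 group in the first example. Both values are
# missing, but a view keeps the parent's element type, which still allows Int.
group2 = view(mixed, [2, 5])

println(eltype(mixed))          # a Union of Missing and Int
println(eltype(group2))         # same Union, even though every value is missing
println(eltype(allmissing))     # Missing
println(all(ismissing, group2)) # true
```

So in the first example locf sees a Union{Missing, Int}-typed vector containing only missing values (the case shown raising the AssertionError above), while in the second it sees a Vector{Missing} (the case returned unchanged).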

@rofinn
Member

rofinn commented Nov 3, 2023

Hmm, I believe this check was introduced to keep LOCF from silently failing to impute any values. Perhaps we should support a flag or something... If you're sure you don't want the error to be raised, the easiest solution is probably something like this:

combine(groupby(df, :id), :value => (x -> Impute.locf(identity.(x))) => :value)

identity.(x) copies each group and narrows its element type to the types actually present, so an all-missing group becomes a Vector{Missing}, which locf returns unchanged instead of raising. Depending on your data, using something like ResultTypes.jl with a condition on the error case would allocate less memory, but be slightly more verbose.
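The narrowing that the identity.(x) workaround relies on is ordinary Base broadcasting behavior and can be checked without DataFrames or Impute. A minimal sketch, using small hand-made vectors in place of real groups:

```julia
# An all-missing group whose element type still allows Int (as for :id == 2).
x = Union{Int, Missing}[missing, missing]

# Broadcasting identity allocates a new array and picks its element type from
# the values actually present, so the result is a Vector{Missing}.
y = identity.(x)

# Groups that contain real values keep a type that includes those values.
z = identity.(Union{Int, Missing}[1, missing])  # still allows Missing
w = identity.(Union{Int, Missing}[1, 2])        # narrowed to Int

println(eltype(x), " -> ", eltype(y))
println(eltype(z))
println(eltype(w))
```

Since broadcasting always allocates, every group is copied, not only the all-missing ones; that is the memory cost the comment above alludes to.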
