handle interp for integers by casting and rounding (#71) #142

kpa28-git · 2024-03-05T05:27:17Z

This is a simple solution for #71.

src/imputors/interp.jl

1. We add a `r::Union{RoundingMode, Nothing}=nothing` to `Interpolate`, avoiding any breaking changes. 2. Added unit tests for unsigned integers and cases with and without the RoundingMode. 3. Increment calculation: - Still using `(next - prev) / n` in the default case - If a rounding mode is specified then `div` is used with the corresponding mode (avoids `InexactError`) - If we're given unsigned ints then we convert them to integers for the increment calculation (avoids integer overflow)

rofinn · 2024-04-09T00:34:19Z

I hope you don't mind. I opted to add the commit to your PR to retain the commit history. The force push was to trigger the CI as it was disabled. Adding a rounding mode and dispatching on the increment calculation mostly resolves #71 without introducing a breaking change. Feel free to check the new tests in case I missed something.

kpa28-git · 2024-04-09T15:54:07Z

Nope, I don't mind. Thanks! Looks good, I'll have a chance to test it later. I don't think I'll have anything to add.

kpa28-git · 2024-04-11T19:09:12Z

I found what looks to be a problem with interpolate going past a lower boundary, here are some cases:

ints = [8, fill(missing, 10)..., 2]
uints = [0x08, fill(missing, 10)..., 0x02]
Impute.impute(ints, Impute.Interpolate(r=RoundNearest);)   # [8, 7, 6, 5, 4, 3, 2, 1, 0, -1, -2, 2]
Impute.impute(uints, Impute.Interpolate(r=RoundNearest);) # InexactError: trunc(UInt8, -1)

Making the gap between the top and bottom value smaller (in this example, it was 5 or lower) makes it not go past the bottom value, but gives RoundUp instead of RoundNearest:

ints_sm = [7, fill(missing, 10)..., 2]
uints_sm = [0x07, fill(missing, 10)..., 0x02]
Impute.impute(ints_sm, Impute.Interpolate(r=RoundNearest);)     # [7, 7, 7, ..., 7, 2]
Impute.impute(uints_sm, Impute.Interpolate(r=RoundNearest);)   # [0x07, 0x07, 0x07, ..., 0x07, 0x02]

rofinn · 2024-04-12T19:49:30Z

Right, forgot rounding would violate the increment assumption from before. I've added a fix:

julia> ints = [8, fill(missing, 10)..., 2];

julia> uints = [0x08, fill(missing, 10)..., 0x02];

julia> Impute.impute(ints, Impute.Interpolate(r=RoundNearest);)
12-element Vector{Union{Missing, Int64}}:
 8
 7
 6
 5
 4
 3
 2
 2
 2
 2
 2
 2

julia> Impute.impute(uints, Impute.Interpolate(r=RoundNearest);)
12-element Vector{Union{Missing, UInt8}}:
 0x08
 0x07
 0x06
 0x05
 0x04
 0x03
 0x02
 0x02
 0x02
 0x02
 0x02
 0x02

rofinn · 2024-04-16T16:24:59Z

Alright if there aren’t any comments I’m gonna go ahead and merge/tag this.

kpa28-git · 2024-04-16T22:48:24Z

I don't mind if you merge it.

This is for future reference (maybe another issue), but my only problem is that the code doesn't handle partial increments as expected. For example:

Impute.impute([08, fill(missing, 12)..., 02], Impute.Interpolate(r=RoundNearest))

[8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 2]

round.(Int, Impute.impute([08., fill(missing, 12)..., 02.], Impute.Interpolate()), RoundNearest)

[8, 8, 7, 7, 6, 6, 5, 5, 4, 4, 3, 3, 2, 2] # what I expect

The user can always cast their data to floats, but then it would be the same as before.

For discrete types that might require a variable increment between values in gaps (e.g. integers), maybe we can dispatch _impute! or have a branch of the main function to do imputation for gaps each gap at a time. That way, each imputed value can be independent from the previous value.

This would allocate intermediate vectors but it would be better than having to cast the entire input to floats.

Any ideas or thoughts?

kpa28-git · 2024-04-17T00:11:05Z

Using a generator and setting the gap slices doesn't allocate and gives results that are more even over larger gaps
Example (I don't know if there is a built-in way to set slices over generators so I made a genset! function):

_calculate_increment(a, b, n, ::Nothing) = (b - a) / n

function interpolate(a::T, b::T, n::Integer, r::RoundingMode) where {T<:Integer}
    inc = _calculate_increment(float(a), float(b), n, nothing)
    (round(T, a + inc*i, r) for i=1:n)
end

function genset!(v::AbstractVector, st, g)
    for (i, val) in enumerate(g)
       v[i+st] = val
    end
    v
end

v = [8, fill(missing, 12)..., 2]
v2 = [0x08, fill(missing, 12)..., 0x02]
@benchmark genset!(v, 1, interpolate(8, 2, 12, RoundNearest)) # no alloc
@benchmark genset!(v2, 1, interpolate(0x08, 0x02, 12, RoundNearest)) # no alloc

rofinn · 2024-04-17T03:30:26Z

Sure. If we don't care about the increment type, then I think just rounding/clamping on the insert could make sense.

…cations + some tests

kpa28-git · 2024-04-17T03:38:50Z

Well I have something here that might work. It does the interpolation in floating point space, but uses a generator for to avoid allocation.
I haven't benchmarked it on large inputs yet, but on small veectors it has about the same alloc as the integer increment version and maybe a bit faster.
It passes all the tests, added some more in this commit too.

kpa28-git · 2024-04-17T03:49:45Z

The second commit is just to ignore RoundingMode for floats, probably don't want to have the same RoundingMode apply to different vectors in a mixed dataset. Anyway, we usually don't want to round floats.

src/imputors/interp.jl

…ger values to integers on insertion.

rofinn · 2024-04-17T05:33:08Z

Sure, makes sense to me 👍🏻 I just added a commit to minimize the changes. Rather than creating a generator with duplicated logic to insert values, we can just dispatch on two special cases:

Calculating an increment value with an Unsigned
Rounding when inserting a non-integer value into an integer array if a RoundingMode is specified.

kpa28-git · 2024-04-17T07:25:43Z

I pushed what I had now (based on your feedback + some benchmarking) in case I can't get back to you quick enough. I don't know if you're interested, but I figure sharing what I have is better.

1. We just need a kernel function barrier to avoid type inner loop type instability with integers 2. The generator caused overhead for the default floats case 3. Simplified some of the indexing logic since the kernel function can handle most of it. 4. Add some comments about why all these extra internal functions exist.

Bump patch release

handle interp for integers by casting and rounding (invenia#71)

ee9856d

rofinn reviewed Mar 6, 2024

View reviewed changes

src/imputors/interp.jl Outdated Show resolved Hide resolved

rofinn force-pushed the master branch from 7e25591 to eb15e82 Compare April 9, 2024 00:31

rofinn added 2 commits April 12, 2024 18:28

Add interp test for exceeding endpoint values.

7493e61

Fix tests by clamping the inserted value.

737bda6

interpolate integers in floating point using generators to avoid allo…

2f7fc0a

…cations + some tests

ignore RoundingMode for non-integers like floats

ddc8a9c

rofinn reviewed Apr 17, 2024

View reviewed changes

src/imputors/interp.jl Outdated Show resolved Hide resolved

src/imputors/interp.jl Outdated Show resolved Hide resolved

src/imputors/interp.jl Outdated Show resolved Hide resolved

src/imputors/interp.jl Outdated Show resolved Hide resolved

Only dispatch on Unsigned increment calculation and rounding non-inte…

26ef7e0

…ger values to integers on insertion.

refactor of interpolate integers in floating point using generators...

a2f8d07

rofinn merged commit 11658f6 into invenia:master Apr 19, 2024
5 checks passed

rofinn referenced this pull request Apr 19, 2024

Merge pull request #143 from invenia/rofinn-patch-1

2707b86

Bump patch release

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

handle interp for integers by casting and rounding (#71) #142

handle interp for integers by casting and rounding (#71) #142

kpa28-git commented Mar 5, 2024

rofinn commented Apr 9, 2024

kpa28-git commented Apr 9, 2024

kpa28-git commented Apr 11, 2024

rofinn commented Apr 12, 2024 •

edited

Loading

rofinn commented Apr 16, 2024

kpa28-git commented Apr 16, 2024

kpa28-git commented Apr 17, 2024

rofinn commented Apr 17, 2024

kpa28-git commented Apr 17, 2024

kpa28-git commented Apr 17, 2024

rofinn commented Apr 17, 2024

kpa28-git commented Apr 17, 2024

handle interp for integers by casting and rounding (#71) #142

handle interp for integers by casting and rounding (#71) #142

Conversation

kpa28-git commented Mar 5, 2024

rofinn commented Apr 9, 2024

kpa28-git commented Apr 9, 2024

kpa28-git commented Apr 11, 2024

rofinn commented Apr 12, 2024 • edited Loading

rofinn commented Apr 16, 2024

kpa28-git commented Apr 16, 2024

kpa28-git commented Apr 17, 2024

rofinn commented Apr 17, 2024

kpa28-git commented Apr 17, 2024

kpa28-git commented Apr 17, 2024

rofinn commented Apr 17, 2024

kpa28-git commented Apr 17, 2024

rofinn commented Apr 12, 2024 •

edited

Loading