Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add HoD transform #11

Closed
wants to merge 5 commits into from
Closed

Add HoD transform #11

wants to merge 5 commits into from

Conversation

nicoleepp
Copy link
Contributor

@nicoleepp nicoleepp commented Feb 3, 2021

Part of #6

TODO:

  • Support inds for AbstractArrays
  • Support Tables
  • Support cols for Tables
  • Re-add all the tests that I commeted out, clean up, remove packages that I added for local testing

src/temporal.jl Outdated Show resolved Hide resolved
src/temporal.jl Outdated Show resolved Hide resolved
src/transformers.jl Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Feb 5, 2021

Codecov Report

Merging #11 (67f1f63) into main (57cccbe) will not change coverage.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff            @@
##              main       #11   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            5         6    +1     
  Lines           45        49    +4     
=========================================
+ Hits            45        49    +4     
Impacted Files Coverage Δ
src/Transforms.jl 100.00% <ø> (ø)
src/transformers.jl 100.00% <ø> (ø)
src/temporal.jl 100.00% <100.00%> (ø)
src/power.jl 100.00% <0.00%> (ø)
src/utils.jl 100.00% <0.00%> (ø)
src/linear_combination.jl 100.00% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 57cccbe...5fcc2d0. Read the comment docs.

@nicoleepp nicoleepp changed the title RFC: Add HoD transform Add HoD transform Feb 8, 2021
src/temporal.jl Outdated Show resolved Hide resolved
test/temporal.jl Outdated Show resolved Hide resolved
test/temporal.jl Show resolved Hide resolved
src/temporal.jl Outdated Show resolved Hide resolved
src/temporal.jl Outdated Show resolved Hide resolved
src/temporal.jl Outdated

_apply(x, ::HoD) = hour.(x)

function apply(A::AbstractArray, t::HoD; dims=:, inds=:, kwargs...)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need dims at all for this transform? I think it doesn't matter because it's one-to-one.

Copy link
Contributor

@bencottier bencottier Feb 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, looks like it affects returning Matrix vs. vector of vectors. In that case I'm not sure - I don't know if there's a use case for having the result sliced up in different ways. Maybe for the sake of consistency with transforms where dims does matter. Currently Periodic doesn't usedims.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if there's a use case for having the result sliced up in different ways

I don't either, it's a consequence of eachslice not that we particularly want it that way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will note dims doesn't make a difference unless inds is passed in too for this transform with the updated implementation following this round of code review.

Also, I'm trying to make this apply method be the canonical apply method that gets used for non-mutating transforms.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I'm trying to make this apply method be the canonical apply method that gets used for non-mutating transforms.

What was the reason this last commit was needed? It seemed out of scope for me but maybe I missed something?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the reason this last commit was needed? It seemed out of scope for me but maybe I missed something?

Fair to say it's out of scope. Maybe it should be split up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to be more clear on the general case, how we want apply to work with dims and inds instead of getting lost in the details of hod. Once the general case is clear the individual transform cases become way more trivial

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair enough

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#15

Copy link
Contributor

@bencottier bencottier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a couple questions and suggestions but this looks good to me.

We've agreed in the past to return AbstractArray for non-mutating apply, for consistency between Transforms. I'm still uncertain whether that's best, but I think we should leave that to further discussion outside this PR.

return _apply(A, t; kwargs...)
else
if A isa KeyedArray
# KeyedArrays don't support indexing into them via an array of indices
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, that's inconvenient. There is the syntax of A[dimname1=inds, dimname2=inds, ...] but I'm not sure if that can be written for all dimnames(A) generically. There's also syntax of A(dimname1=[axiskey11, axiskey12], dimname2=[axiskey21, axiskey22], ...) which we could support.

I don't mind this as-is for now.

end
end

return mapslices(x -> _apply(x[inds], t; kwargs...), A, dims=dims)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused why this works for KeyedArray given

KeyedArrays don't support indexing into them via an array of indices

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mapslices x is not a KeyedArray, it's a view of the array

transformed = Transforms.apply(A, p; dims=d)
@test transformed isa AxisArray
@test transformed == expected
@test Transforms.apply(A, p; dims=d) == expected
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't figure out why this returns a plain array?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we haven't asked apply to return a KeyedArray - just an array.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for AxisArray. If I follow apply(A::AbstractArray, t::Transform; dims=:, inds=:, kwargs...) then I think it does _apply (without wrapping in an array), which I thought would preserve the type. But maybe this relates to your suggestion? i.e. it's currently calling apply! when it should be _apply! https://github.com/invenia/Transforms.jl/pull/11/files#r573720397

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for AxisArray

ah my mistake. Then this is AxisArrays being annoying and not conserving type when operating on them

julia> A = AxisArray([1 2 3; 4 5 6], foo=["a", "b"], bar=["x", "y", "z"]);

julia> A .+ 1
2×3 Array{Int64,2}:
 2  3  4
 5  6  7

function apply!(
A::AbstractArray{T}, t::Transform; dims=:, kwargs...
) where T <: Real
function apply!(A::AbstractArray, t::Transform; dims=:, kwargs...)
dims == Colon() && return _apply!(A, t; kwargs...)

for x in eachslice(A; dims=dims)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mapslices here too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works just as welll since it's mutating the array, I briefly tried using mapslices and things were breaking so I'd opt to leave as is. It does return the result we expect

return [_apply(getproperty(columntable, cname), t; kwargs...) for cname in cnames]
end

_apply(x, t::Transform; kwargs...) = apply!(_try_copy(x), t; kwargs...)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
_apply(x, t::Transform; kwargs...) = apply!(_try_copy(x), t; kwargs...)
_apply(x, t::Transform; kwargs...) = _apply!(_try_copy(x), t; kwargs...)

transformed = Transforms.apply(A, p; dims=d)
@test transformed isa AxisArray
@test transformed == expected
@test Transforms.apply(A, p; dims=d) == expected
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we haven't asked apply to return a KeyedArray - just an array.

x = ZonedDateTime(2020, 3, 7, 9, 0, tz"America/New_York"):Hour(1):ZonedDateTime(2020, 3, 8, 9, 0, tz"America/New_York")

# expected result skips the DST transition hour of 2
expected = [9:23..., 0, 1, 3:9...]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
expected = [9:23..., 0, 1, 3:9...]
expected_dst= [9:23..., 0, 1, 3:9...]

only because it could get interpreted that we've reassigned the expected used in a bunch of other tests.

end

@testset "dims = 2" begin
d = 2
expected = [[2], [10], [11]]
@test Transforms.apply(A, hod; dims=d, inds=[2]) == expected
expected = [9 10; 10 11]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is very different to the answer above - in spite of the different inds. What caused this?

Copy link
Contributor

@bencottier bencottier Feb 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you mean by "answer above", but I'll explain how I checked this.

With dims=:, inds=: the expected result would be

1  9   10
2  10  11

dims=2 slices into rows [1, 9, 10] and [2, 10, 11]. inds = [2, 3] picks the 2nd and 3rd element of each of those rows which is 9, 10 and 10, 11 respectively. (It's actually leaving the other elements untransformed, but you know what I mean).

I thought this same logic checked out for the other test cases but let me know if you think otherwise, or it's not how you expect it to work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you mean by "answer above",

Sorry, I meant in relation to the expected answer it's replacing: expected = [[2], [10], [11]]

Before this commit, for d=2, inds=[2] the result was [[2], [10], [11]] a Vector{Vector}

Now, for d=2, inds=[2, 3] the result is [9 10; 10 11] an Array{Int, 2}.

It's correct, but I'm wondering why it was so different before. I'm guessing it's the result of now using mapslices?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I was confused because I was looking at the full diff.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dims were in the opposite order before thanks to eachslice versus mapslices differences

src/transformers.jl Show resolved Hide resolved
src/transformers.jl Show resolved Hide resolved
@nicoleepp nicoleepp mentioned this pull request Feb 11, 2021
@nicoleepp
Copy link
Contributor Author

Closing in favour of #16

@nicoleepp nicoleepp closed this Feb 11, 2021
@nicoleepp nicoleepp deleted the ne/temporal branch February 11, 2021 23:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants