Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use traits to generalise apply methods #75

Closed
glennmoy opened this issue Apr 7, 2021 · 3 comments
Closed

Use traits to generalise apply methods #75

glennmoy opened this issue Apr 7, 2021 · 3 comments

Comments

@glennmoy
Copy link
Member

glennmoy commented Apr 7, 2021

Summary

We should define a Cardinality trait for Transforms and appropriate intermediate _apply method for each. We should also create dummy Transforms to verify that types supporting the transform interface are compatible with any kind of Transform.

Challenges

There are a few challenges in developing this package in it's current state:

  • LinearCombination is a special case in that it has its own apply method, rather than just a more simple _apply because of how it needs to modify the input data.
  • This makes it more difficult to add another transform like LinearCombination in the future: it will also likely need its own apply method.
  • We have been explicitly testing each Transform on all supported data types and thus over-testing the package, but we have had to do this to make sure we are catching all edge-cases.
  • If we were to support a new data type, we would have to write new tests against all Transforms for the same reason, which is extremely time-consuming and error-prone. WIP: support FeatureTransforms.jl AxisSets.jl#44
  • All of this is especially needed for LinearCombination because of its unique behaviour.

We need a better way of structuring the package so that

  • New transforms are easy to write and test:
    • No more special apply methods.
    • Only the _apply method and the behaviour of the transform should be tested, which should be independent of the data type it's being used on.
  • New data types are easy to support and test:
    • Only the generic, high-level apply/apply!/apply_append methods are required.
    • We don't want to test each transform individually: we should test some canonical transforms that represent any transform that can be passed.

Idea: A Cardinality Trait

The reason LinearCombination needs its own apply method is that it is a reduction operation, i.e. many-to-one, whereas (most of) the rest of the transforms are one-to-one.
The only exception is OneHotEncoding, which is one-to-many and comes with other challenges (like how to treat matrix inputs).

If we inspect the code, the difference is most easily recognised in the tables methods:

# Generic Transform: apply transform to each component in turn and collect the result
reduce(hcat, [_apply(getproperty(coltable, col), t; kwargs...) for col in cols])

# LinearCombination: collect all components and apply the transform
hcat(_sum_terms([getproperty(coltable, col) for col in cols], LC.coefficients))

It may be useful to define some interstitial level of abstraction between the apply method (for types) and _apply method (for Transforms) that treats the data according to the cardinality of the Transform.

The rough structure of that design would look like:

abstract type Cardinality end

struct OneToOne <: Cardinality end
struct ManyToOne <: Cardinality end
struct OneToMany <: Cardinality end
struct ManyToMany <: Cardinality end

cardinality(::Power) = OneToOne()
cardinality(::LinearCombination) = ManyToOne()
cardinality(::OneHotEncoding) = OneToMany()

_apply(::Union{OneToOne, OneToMany}, A, transform; kwargs...) = # apply to each component in turn
_apply(::Union{ManyToOne, ManyToMany}, A, transform; kwargs...) = # apply to all components 

# Example apply method calling these
function apply(A::AbstractArray, t::Transform; dims=:, inds=:, kwargs...)
    if dims === Colon()
        if inds === Colon()
            return _apply(cardinality(t), A, t; kwargs...)
        else
            return @views _apply(cardinality(t), A[:][inds], t; kwargs...)
        end
    end

    return _apply(cardinality(t), selectdim(A, dims, inds), t; kwargs...)
end

Further advantages of this approach:

  • apply! could be explicitly restricted to OneToOne transforms
  • We could define DummyOneToOneTransform, DummyManyToOneTransform, etc., against which to test new types rather than test all Transforms.

Notes

  • A POC would probably be needed to get the correct implementation. Right now it's quite academic.
  • Chance we could just be re-arranging what work is needed for new types? will we also need to extend these intermediate _apply method for each new type?
  • Any other benefits / concerns?
@nicoleepp
Copy link
Contributor

I think this is good to be thinking about but would like to see prototype of applying this to maybe Linearcombination or OneHotEncoding to prove testing is simplified without losing tests for edge cases

@bencottier
Copy link
Contributor

I think this is good to be thinking about but would like to see prototype of applying this to maybe Linearcombination or OneHotEncoding to prove testing is simplified without losing tests for edge cases

I agree. I'm on board with the points of "We need a better way of structuring the package". The proposed solution looks alright. My biggest concern is

Chance we could just be re-arranging what work is needed for new types? will we also need to extend these intermediate _apply method for each new type?

and a POC would give a better sense of that.

I think we could still solve the over-testing problem without generalising things this cleanly, but it would be nice.

@glennmoy
Copy link
Member Author

glennmoy commented Apr 9, 2021

Here is the POC for the above: #77

I'll note that after working on it, and iteratively refactoring, I realised there is no need for another level of abstraction for _apply (which might have been too confusing anyway). Rather, it implements _preformat and _postformat functions for structuring the input/result of a given Transform according to its cardinality.

Practically speaking, this really only affects LinearCombination, but you can see that there are some benefits beyond that:

  • It drastically simplifies that transform
  • It makes it easier to add new transforms like this in future.
  • Traits make it much easier to test a new type we want to support (see the example).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants