Use traits to generalise apply methods #75

glennmoy · 2021-04-07T14:23:03Z

Summary

We should define a Cardinality trait for Transforms and an appropriate intermediate _apply method for each cardinality. We should also create dummy Transforms to verify that types supporting the transform interface are compatible with any kind of Transform.

Challenges

There are a few challenges in developing this package in its current state:

LinearCombination is a special case in that it has its own apply method, rather than just a simpler _apply, because of how it needs to modify the input data.
This makes it more difficult to add another transform like LinearCombination in the future: it will also likely need its own apply method.
We have been explicitly testing each Transform on all supported data types and thus over-testing the package, but we have had to do this to make sure we are catching all edge-cases.
If we were to support a new data type, we would have to write new tests against all Transforms for the same reason, which is extremely time-consuming and error-prone (see "WIP: support FeatureTransforms.jl", AxisSets.jl#44).
All of this is especially needed for LinearCombination because of its unique behaviour.

We need a better way of structuring the package so that:

New transforms are easy to write and test:
- No more special apply methods.
- Only the _apply method and the behaviour of the transform should be tested, which should be independent of the data type it's being used on.
New data types are easy to support and test:
- Only the generic, high-level apply/apply!/apply_append methods are required.
- We don't want to test each transform individually: we should test some canonical transforms that represent any transform that can be passed.

Idea: A Cardinality Trait

The reason LinearCombination needs its own apply method is that it is a reduction operation, i.e. many-to-one, whereas (most of) the rest of the transforms are one-to-one.
The only exception is OneHotEncoding, which is one-to-many and comes with other challenges (like how to treat matrix inputs).

If we inspect the code, the difference is most easily recognised in the tables methods:

# Generic Transform: apply transform to each component in turn and collect the result
reduce(hcat, [_apply(getproperty(coltable, col), t; kwargs...) for col in cols])

# LinearCombination: collect all components and apply the transform
hcat(_sum_terms([getproperty(coltable, col) for col in cols], LC.coefficients))

It may be useful to define some interstitial level of abstraction between the apply method (for types) and _apply method (for Transforms) that treats the data according to the cardinality of the Transform.
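To make the difference between the two table methods concrete, here is a minimal, self-contained sketch. It does not use the package: the NamedTuple `coltable`, the `square` kernel and the `sum_terms` helper are hypothetical stand-ins for a column table, a one-to-one transform and LinearCombination's reduction.

```julia
# Hypothetical stand-ins: a column table as a NamedTuple of vectors.
coltable = (a = [1.0, 2.0, 3.0], b = [4.0, 5.0, 6.0])
cols = (:a, :b)

# One-to-one: transform each column separately, then glue the results together.
square(x) = x .^ 2
one_to_one = reduce(hcat, [square(getproperty(coltable, col)) for col in cols])

# Many-to-one: gather all columns first, then reduce them to a single column.
sum_terms(columns, coefficients) = sum(c .* x for (c, x) in zip(coefficients, columns))
many_to_one = hcat(sum_terms([getproperty(coltable, col) for col in cols], [1.0, -1.0]))

println(size(one_to_one))   # one output column per input column
println(size(many_to_one))  # a single output column
```

The one-to-one result keeps one output column per input column, while the many-to-one result collapses all selected columns into one. That shape difference is what the cardinality would need to encode.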

The rough structure of that design would look like:

abstract type Cardinality end

struct OneToOne <: Cardinality end
struct ManyToOne <: Cardinality end
struct OneToMany <: Cardinality end
struct ManyToMany <: Cardinality end

cardinality(::Power) = OneToOne()
cardinality(::LinearCombination) = ManyToOne()
cardinality(::OneHotEncoding) = OneToMany()

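# OneToOne / OneToMany: apply the transform to each component in turn.
function _apply(::Union{OneToOne, OneToMany}, A, transform; kwargs...) end  # body omitted

# ManyToOne / ManyToMany: apply the transform to all components at once.
function _apply(::Union{ManyToOne, ManyToMany}, A, transform; kwargs...) end  # body omitted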

# Example apply method calling these
function apply(A::AbstractArray, t::Transform; dims=:, inds=:, kwargs...)
    if dims === Colon()
        if inds === Colon()
            return _apply(cardinality(t), A, t; kwargs...)
        else
            return @views _apply(cardinality(t), A[:][inds], t; kwargs...)
        end
    end

    return _apply(cardinality(t), selectdim(A, dims, inds), t; kwargs...)
end
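The rough structure above can be exercised end to end with toy types. The following is only a sketch under stated assumptions, not the package's implementation: `Square` and `WeightedSum` are hypothetical stand-ins for Power and LinearCombination, `kernel` is a made-up name for the transform-specific code, and `apply` is reduced to matrices with column-wise components.

```julia
abstract type Transform end
abstract type Cardinality end
struct OneToOne <: Cardinality end
struct ManyToOne <: Cardinality end

# Hypothetical toy transforms standing in for Power and LinearCombination.
struct Square <: Transform end
struct WeightedSum <: Transform
    coefficients::Vector{Float64}
end

cardinality(::Square) = OneToOne()
cardinality(::WeightedSum) = ManyToOne()

# Transform-specific kernels (hypothetical name): all a new transform would supply.
kernel(x, ::Square) = x .^ 2
kernel(columns, t::WeightedSum) = sum(c .* x for (c, x) in zip(t.coefficients, columns))

# Intermediate layer: the cardinality decides how the data is handed to the kernel.
_apply(::OneToOne, A, t) = reduce(hcat, [kernel(col, t) for col in eachcol(A)])
_apply(::ManyToOne, A, t) = hcat(kernel(collect(eachcol(A)), t))

# Data-type layer: knows about matrices, but nothing about individual transforms.
apply(A::AbstractMatrix, t::Transform) = _apply(cardinality(t), A, t)

A = [1.0 4.0; 2.0 5.0; 3.0 6.0]
squared = apply(A, Square())                   # same shape as A
combined = apply(A, WeightedSum([1.0, -1.0]))  # a single column
```

Here `apply` never mentions a concrete transform, and neither kernel mentions the container type. The open question raised in the notes is whether the middle layer would have to be repeated for every new data type.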

Further advantages of this approach:

apply! could be explicitly restricted to OneToOne transforms
We could define DummyOneToOneTransform, DummyManyToOneTransform, etc., against which to test new types rather than test all Transforms.
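Both points can be sketched together. The code below is illustrative only: `FakeOneToOne` and `FakeManyToOne` are hypothetical names for the dummy transforms, and the three-argument `apply!` methods are an assumed way of restricting in-place application by trait, not the package's actual API.

```julia
abstract type Transform end
abstract type Cardinality end
struct OneToOne <: Cardinality end
struct ManyToOne <: Cardinality end

# Hypothetical test fakes: trivial behaviour, one per cardinality.
struct FakeOneToOne <: Transform end
struct FakeManyToOne <: Transform end

cardinality(::FakeOneToOne) = OneToOne()
cardinality(::FakeManyToOne) = ManyToOne()

# Entry point: dispatch on the trait of the transform.
apply!(A::AbstractArray, t::Transform) = apply!(cardinality(t), A, t)

# In-place application is only defined when the output has the input's shape.
function apply!(::OneToOne, A::AbstractArray, ::FakeOneToOne)
    A .= one(eltype(A))  # the fake overwrites every element with 1
    return A
end

# Any other cardinality is rejected with an explicit error.
apply!(c::Cardinality, A::AbstractArray, t::Transform) =
    throw(ArgumentError("apply! is not supported for $(typeof(c)) transforms"))

A = [1.0 2.0; 3.0 4.0]
apply!(A, FakeOneToOne())

# The many-to-one fake should be refused rather than silently mutate A.
rejected = try
    apply!(A, FakeManyToOne())
    false
catch err
    err isa ArgumentError
end
```

A new data type would then only need tests against one fake per cardinality, on the assumption that the fakes exercise the same code paths as the real transforms.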

Notes

A POC would probably be needed to get the correct implementation. Right now it's quite academic.
There is a chance we could just be re-arranging what work is needed for new types: will we also need to extend these intermediate _apply methods for each new type?
Any other benefits / concerns?


nicoleepp · 2021-04-07T18:20:12Z

I think this is good to be thinking about, but I would like to see a prototype applying this to, say, LinearCombination or OneHotEncoding, to prove that testing is simplified without losing tests for edge cases.

bencottier · 2021-04-09T09:52:18Z

> I think this is good to be thinking about, but I would like to see a prototype applying this to, say, LinearCombination or OneHotEncoding, to prove that testing is simplified without losing tests for edge cases.

I agree. I'm on board with the points of "We need a better way of structuring the package". The proposed solution looks alright. My biggest concern is

> There is a chance we could just be re-arranging what work is needed for new types: will we also need to extend these intermediate _apply methods for each new type?

and a POC would give a better sense of that.

I think we could still solve the over-testing problem without generalising things this cleanly, but it would be nice.

glennmoy · 2021-04-09T18:42:16Z

Here is the POC for the above: #77

I'll note that after working on it, and iteratively refactoring, I realised there is no need for another level of abstraction for _apply (which might have been too confusing anyway). Rather, the POC implements _preformat and _postformat functions for structuring the input/result of a given Transform according to its cardinality.
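As a rough illustration of the idea, the sketch below shows how a pair of formatting hooks could dispatch on cardinality. The signatures are assumptions made for this example (the POC's real methods may take different arguments), and `weighted_sum` is a hypothetical stand-in for LinearCombination's reduction.

```julia
abstract type Cardinality end
struct OneToOne <: Cardinality end
struct ManyToOne <: Cardinality end

# Hypothetical signatures; the POC's actual methods may differ.
# Default: data and result pass through untouched.
_preformat(::Cardinality, A, dims) = A
_postformat(::Cardinality, result, A) = result

# ManyToOne: hand the transform a collection of slices to reduce...
_preformat(::ManyToOne, A, dims) = eachslice(A; dims=dims)
# ...and return the reduced vector as a single column.
_postformat(::ManyToOne, result, A) = reshape(result, :, 1)

# A reduction written against slices, standing in for LinearCombination.
weighted_sum(slices, coefficients) = sum(c .* x for (c, x) in zip(coefficients, slices))

A = [1.0 4.0; 2.0 5.0; 3.0 6.0]
slices = _preformat(ManyToOne(), A, 2)
result = _postformat(ManyToOne(), weighted_sum(slices, [1.0, -1.0]), A)
untouched = _preformat(OneToOne(), A, 2)
```

With hooks like these the transform's own _apply only sees data already arranged for its cardinality, so no extra _apply layer is needed.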

Practically speaking, this really only affects LinearCombination, but you can see that there are some benefits beyond that:

It drastically simplifies that transform.
It makes it easier to add new transforms like this in the future.
Traits make it much easier to test a new type we want to support (see the example).

glennmoy mentioned this issue Apr 7, 2021

WIP: support FeatureTransforms.jl invenia/AxisSets.jl#44

Closed

glennmoy mentioned this issue Apr 9, 2021

POC: Add traits to support generalising the apply methods #77

Closed

This was referenced Apr 16, 2021

Define cardinality traits #79

Merged

Refactor apply methods to use Traits #80

Merged

Add test fakes and test utils #81

Merged

glennmoy closed this as completed Apr 16, 2021
