Commit

Update README
Glenn Moynihan committed Mar 25, 2021
1 parent a34ec3e commit 9e3fef0
Showing 1 changed file with 44 additions and 23 deletions.
67 changes: 44 additions & 23 deletions README.md
@@ -20,50 +20,71 @@ Load in the dependencies and construct some toy data.
 ```julia
 julia> using DataFrames, FeatureTransforms

-julia> df = DataFrame(:a=>[1, 2, 3, 4, 5], :b=>[5, 4, 3, 2, 1], :c=>[0, 1, 0, 1, 0])
+julia> df = DataFrame(:a=>[1, 2, 3, 4, 5], :b=>[5, 4, 3, 2, 1], :c=>[2, 1, 3, 1, 3])
 5×3 DataFrame
- Row │ a      b      c
-     │ Int64  Int64  Int64
+ Row │ a      b      c
+     │ Int64  Int64  Int64
 ─────┼─────────────────────
-   1 │     1      5      0
+   1 │     1      5      2
    2 │     2      4      1
-   3 │     3      3      0
+   3 │     3      3      3
    4 │     4      2      1
-   5 │     5      1      0
+   5 │     5      1      3
 ```

-We construct the transformations that we want to `apply` to the data, which can be non-mutating (`apply`) or mutating (`apply!`) if supported.
-Note that non-mutating transformations do not necessarily return the same type, even when applied to all the elements.
+Next, we construct the `Transform` that we want to `apply` to the data, which can either be non-mutating (`apply`) or mutating (`apply!`).
+All `Transform`s support the non-mutating `apply` method, but any `Transform` that changes the type or dimension of the input does not support mutation.
+
+In either case, the return will be the same type as the input.
+So if you provide an `Array` you get back an `Array`, and if you provide a `Table` you will get back a `Table`.
+Here we are working with a `DataFrame`, so the return will always be a `DataFrame`:
 ```julia
 julia> p = Power(3);

-julia> FeatureTransforms.apply(df, p; cols=[:a])
-1-element Array{Array{Int64,1},1}:
- [1, 8, 27, 64, 125]
+julia> FeatureTransforms.apply(df, p; cols=[:a], header=[:a3])
+5×1 DataFrame
+ Row │ a3
+     │ Int64
+─────┼───────
+   1 │     1
+   2 │     8
+   3 │    27
+   4 │    64
+   5 │   125

 julia> FeatureTransforms.apply!(df, p; cols=[:a])
 5×3 DataFrame
  Row │ a      b      c
      │ Int64  Int64  Int64
 ─────┼─────────────────────
-   1 │     1      5      0
+   1 │     1      5      2
    2 │     8      4      1
-   3 │    27      3      0
+   3 │    27      3      3
    4 │    64      2      1
-   5 │   125      1      0
+   5 │   125      1      3
 ```

-Also note that some transformations, such as those applying a reduction operation, do not support mutation.
-But users may append the output to their data if they so wish.

+`Transform`s that don't support mutation must be called using `apply` and appended.
+To help with this, you can call the `Transform` type directly:
 ```julia
+julia> ohe = OneHotEncoding(1:3);
+
 julia> lc = LinearCombination([1, -10]);

-julia> FeatureTransforms.apply(df, lc; cols=[:b, :c])
-5-element Array{Int64,1}:
-   5
-  -6
-   3
-  -8
-   1
+julia> ohe_df = ohe(df; cols=[:c], header=[:cat1, :cat2, :cat3]);
+
+julia> lc_df = lc(df; cols=[:a, :b], header=[:ab]);
+
+julia> df = hcat(df, lc_df, ohe_df)
+5×7 DataFrame
+ Row │ a      b      c      ab     cat1   cat2   cat3
+     │ Int64  Int64  Int64  Int64  Bool   Bool   Bool
+─────┼─────────────────────────────────────────────────
+   1 │     1      5      2    -49  false   true  false
+   2 │     8      4      1    -32   true  false  false
+   3 │    27      3      3     -3  false  false   true
+   4 │    64      2      1     44   true  false  false
+   5 │   125      1      3    115  false  false   true
+
 ```
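
As a cross-check on the values in the final table, the three transforms can be reproduced by hand with plain DataFrames.jl operations. This is only a sketch of what the README output appears to compute (cube `:a` in place, take `1*a - 10*b`, and one-hot encode `:c` over the categories `1:3`). It is not how FeatureTransforms implements these transforms; the variable names mirror the README, and `result` is introduced here for illustration.

```julia
using DataFrames

# Toy data from the updated README.
df = DataFrame(:a => [1, 2, 3, 4, 5], :b => [5, 4, 3, 2, 1], :c => [2, 1, 3, 1, 3])

# Equivalent of the mutating Power(3) call on column :a.
df.a .= df.a .^ 3

# Equivalent of LinearCombination([1, -10]) over columns :a and :b,
# kept as a separate one-column table named :ab.
lc_df = DataFrame(:ab => 1 .* df.a .+ (-10) .* df.b)

# Equivalent of OneHotEncoding(1:3) on column :c: one Bool column per category.
ohe_df = DataFrame(
    :cat1 => (df.c .== 1),
    :cat2 => (df.c .== 2),
    :cat3 => (df.c .== 3),
)

# Append the new columns, as the README does with hcat.
result = hcat(df, lc_df, ohe_df)
```

Here `result.ab` comes out as `[-49, -32, -3, 44, 115]`, matching the `ab` column in the README's 5×7 table.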
