
Added references
TheRoniOne committed Feb 5, 2022
1 parent 392db11 commit ac145d5
Showing 6 changed files with 34 additions and 34 deletions.
6 changes: 3 additions & 3 deletions docs/src/man/better_schema.md
@@ -22,7 +22,7 @@ julia> ["1", 2.0]
```

To solve this problem we have the `reinfer_schema`, `reinfer_schema!` and `reinfer_schema_ROT` functions that will try
To solve this problem we have the [`reinfer_schema`](@ref), [`reinfer_schema!`](@ref) and [`reinfer_schema_ROT`](@ref) functions that will try
to make each column's type a `Union` of, by default, up to 3 types, while also
internally using `Base.promote_typejoin` on numeric types to reduce the final number of numeric types.

@@ -71,8 +71,8 @@ julia> reinfer_schema(ct; max_types=2)

## Index preferred

For the cases when you might want to add a row index to your table, we have the `add_index`, `add_index!`
and `add_index_ROT` functions that will add a row index as the first column of your table.
For the cases when you might want to add a row index to your table, we have the [`add_index`](@ref), [`add_index!`](@ref)
and [`add_index_ROT`](@ref) functions that will add a row index as the first column of your table.

```jldoctest reinfer
julia> ct = CleanTable([:A, :B], [[:a, :b, :c], ["x", "y", "z"]])
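The numeric narrowing mentioned in this file relies on `Base.promote_typejoin`, whose behavior can be checked in plain Julia. This is a Base-only sketch, not code from Cleaner; how `reinfer_schema` combines these results with its `max_types` limit is not shown:

```julia
# Base-only illustration; Cleaner itself is not loaded here.

# Two concrete numeric types collapse into one abstract supertype
numeric = Base.promote_typejoin(Int64, Float64)       # Real

# `Missing` is kept as a separate Union member instead of widening the type
with_missing = Base.promote_typejoin(Int64, Missing)  # Union{Missing, Int64}

# Unrelated non-numeric types join all the way up to `Any`
mixed = Base.promote_typejoin(Int64, String)          # Any

println(numeric, " | ", with_missing, " | ", mixed)
```

The last case presumably explains why the joining is applied only to numeric types: joining a `String` with an `Int64` would widen the column straight to `Any`, so keeping them as separate `Union` members preserves more information.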
12 changes: 6 additions & 6 deletions docs/src/man/dirt_removal.md
@@ -9,8 +9,8 @@ such as `Julia`'s `missing`, `Python`'s `None`, `R`'s `NA` and a diversity of co
like `""`, `' '`, etc.

As an easy way to handle these common problems we have the `compact` functions, namely
`compact_table`, `compact_columns` and `compact_rows` with their mutating in-place and ROT variants,
e.g. `compact_table!`, `compact_table_ROT` et al.
[`compact_table`](@ref), [`compact_columns`](@ref) and [`compact_rows`](@ref) with their mutating in-place and ROT variants,
e.g. [`compact_table!`](@ref), [`compact_table_ROT`](@ref) et al.

They all receive a table as their first argument and an optional keyword argument `empty_values`
where you can pass a vector of the values you consider empty in your table.
@@ -64,8 +64,8 @@ julia> compact_table(ct; empty_values=[""])
```

You might also feel that columns filled with just a constant value are not adding any value
to your table and may prefer to remove them; for those cases we have the `delete_const_columns`,
`delete_const_columns!` and `delete_const_columns_ROT` functions.
to your table and may prefer to remove them; for those cases we have the [`delete_const_columns`](@ref),
[`delete_const_columns!`](@ref) and [`delete_const_columns_ROT`](@ref) functions.

```jldoctest removal
julia> ct = CleanTable([:A, :B, :C], [[4, 5, 6], [1, 1, 1], String["7", "8", "9"]])
@@ -94,8 +94,8 @@ julia> delete_const_columns(ct)

## One missing, remove 'em all

A more radical approach can be taken when desired by using `drop_missing`, `drop_missing!` or
`drop_missing_ROT` to remove all rows where at least one `missing` or `missing_values` has been found.
A more radical approach can be taken when desired by using [`drop_missing`](@ref), [`drop_missing!`](@ref) or
[`drop_missing_ROT`](@ref) to remove all rows where at least one `missing` or `missing_values` has been found.

```jldoctest removal
julia> ct = CleanTable([:A, :B], [[1, missing, 3], ["x", "y", "z"]])
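A minimal sketch of the in-place variant, assuming Cleaner.jl is installed and reusing the table from the `drop_missing` hunk above. The expected result follows from the description of the function (rows holding a `missing` are removed), not from a doctest in the docs:

```julia
using Cleaner

ct = CleanTable([:A, :B], [[1, missing, 3], ["x", "y", "z"]])

# Mutating variant: removes, in place, every row that contains a `missing`
drop_missing!(ct)

# Columns are accessed by index; row 2 should be gone from both of them
println(ct[1], " ", ct[2])
```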
6 changes: 3 additions & 3 deletions docs/src/man/first_steps.md
@@ -93,7 +93,7 @@ julia> df |> CleanTable |> reinfer_schema! |> DataFrame
```

By default, the `CleanTable` constructor, when called with a table as its only argument, will copy the columns
By default, the [`CleanTable`](@ref) constructor, when called with a table as its only argument, will copy the columns
instead of using the source columns directly. This behavior can be overridden by explicitly passing
the `copycols=false` keyword argument.

@@ -175,7 +175,7 @@ julia> df

## Accessing columns

If you want to access a specific column, `CleanTable` supports access by column index and
If you want to access a specific column, [`CleanTable`](@ref) supports access by column index and
column name.

```jldoctest access_cols; setup = :(using Cleaner)
@@ -207,7 +207,7 @@ julia> ct[1]
```

As the result of accessing a column in a `CleanTable` is the column itself, if you want to reassign
As the result of accessing a column in a [`CleanTable`](@ref) is the column itself, if you want to reassign
values in a column you can just modify the accessed result.

E.g.:
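Since accessing a column returns the column itself, writing through that reference changes the table. A minimal sketch, assuming Cleaner.jl is installed and using integer indexing as in the `ct[1]` doctest above:

```julia
using Cleaner

ct = CleanTable([:A, :B], [[1, 2, 3], ["x", "y", "z"]])

# `ct[1]` returns the stored vector, not a copy
col = ct[1]
col[2] = 20                  # modifies the column held by `ct`

# Broadcast assignment also writes in place into the accessed column
ct[2] .= uppercase.(ct[2])
```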
8 changes: 4 additions & 4 deletions docs/src/man/name_changing.md
@@ -6,7 +6,7 @@ Having repeated column names, names with spaces in them, names where spaces are p
end, names with inconsistent formatting, etc. can certainly become troublesome when trying to reference a certain
column during your workflow.

To tackle these problems directly, we have the functions `polish_names`, `polish_names!` and `polish_names_ROT`, used as follows:
To tackle these problems directly, we have the functions [`polish_names`](@ref), [`polish_names!`](@ref) and [`polish_names_ROT`](@ref), used as follows:

```jldoctest name_polish
julia> using Cleaner
@@ -45,7 +45,7 @@ julia> polish_names(ct; style=:camelCase)
Currently the only available styles are `:snake_case` and `:camelCase`.
The default style is `:snake_case`.

Internally `polish_names`, `polish_names!` and `polish_names_ROT` all call the `generate_polished_names` function, so if you just need
Internally [`polish_names`](@ref), [`polish_names!`](@ref) and [`polish_names_ROT`](@ref) all call the [`generate_polished_names`](@ref) function, so if you just need
to generate better names for your table, you could call it as follows and manually rename your table.

```jldoctest name_polish
@@ -79,9 +79,9 @@ julia> rename(ct, [:A, :B])
## Making a row be the column names

When working with messy data you might end up with the column names being in the second or third row of the table you have
loaded. For these cases you can use the `row_as_names`, `row_as_names!` and `row_as_names_ROT` functions.
loaded. For these cases you can use the [`row_as_names`](@ref), [`row_as_names!`](@ref) and [`row_as_names_ROT`](@ref) functions.

By default, `row_as_names`, `row_as_names!` and `row_as_names_ROT` will remove all rows above the index passed, but
By default, [`row_as_names`](@ref), [`row_as_names!`](@ref) and [`row_as_names_ROT`](@ref) will remove all rows above the index passed, but
this behavior can be overridden by passing the optional keyword argument `remove=false`.

```jldoctest promoting_rows; setup = :(using Cleaner)
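To make the naming problems listed in this file concrete, here is a small Base-only sketch of a snake_case pass. `naive_snake_case_names` is a hypothetical helper, not how `generate_polished_names` is implemented: it only trims, lowercases, joins words with `_` and suffixes repeated names:

```julia
# Hypothetical helper: NOT Cleaner's `generate_polished_names`.
function naive_snake_case_names(raw::Vector{String})
    seen = Dict{String,Int}()   # how many times each cleaned name was produced
    out = Symbol[]
    for name in raw
        # Trim surrounding spaces, lowercase, and join the words with '_'
        base = join(split(lowercase(strip(name))), "_")
        count = get(seen, base, 0) + 1
        seen[base] = count
        # Disambiguate repeated names with a numeric suffix
        push!(out, Symbol(count == 1 ? base : string(base, "_", count)))
    end
    return out
end

naive_snake_case_names(["  Aa bb", "aa  BB ", "Cc"])  # [:aa_bb, :aa_bb_2, :cc]
```

A fuller implementation also has to deal with punctuation, a camelCase output style and suffixes that collide with names already present, which this sketch ignores.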
6 changes: 3 additions & 3 deletions docs/src/man/table_exploring.md
@@ -6,7 +6,7 @@ Tables can usually have values in a column or columns that are supposed to be uni
Primary keys from a table in a database are the most common example of these cases.

When you want to find out which values (or combinations) are duplicated in your table, we have
the `get_all_repeated` function.
the [`get_all_repeated`](@ref) function.

```jldoctest explore
julia> using DataFrames: DataFrame
@@ -49,7 +49,7 @@ julia> get_all_repeated(df, [:A, :B])
When you are working with categorical data, you might want to know what percentage of the total each
category represents.

For those cases we have the `category_distribution` function.
For those cases we have the [`category_distribution`](@ref) function.

```jldoctest explore
julia> category_distribution(df, [:A])
@@ -84,7 +84,7 @@ julia> category_distribution(df, [:A, :B])
When working with multiple tables you might try to do joins and have them fail
because there were different column names or schemas between them.

To help you identify these problems we have the `compare_table_columns` function.
To help you identify these problems we have the [`compare_table_columns`](@ref) function.

```jldoctest explore
julia> df = DataFrame(:A => ["y", "x", "y"], :B => ["x", "x", "x"])
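The ideas behind `get_all_repeated` and `category_distribution` can be sketched with plain dictionaries. Both helpers below are hypothetical and work on plain vectors; the Cleaner functions themselves take a Tables.jl table plus a vector of column names:

```julia
# Hypothetical Base-only helpers; NOT Cleaner's implementations.

# Count each combination of values across the given columns and keep
# only the combinations that occur more than once.
function repeated_combinations(cols::AbstractVector...)
    counts = Dict{Tuple,Int}()
    for key in zip(cols...)
        counts[key] = get(counts, key, 0) + 1
    end
    return filter(p -> p.second > 1, counts)
end

# Percentage of the total that each category represents in one column
function category_percentages(col::AbstractVector)
    counts = Dict{eltype(col),Int}()
    for v in col
        counts[v] = get(counts, v, 0) + 1
    end
    return Dict(k => 100 * c / length(col) for (k, c) in counts)
end

A = ["y", "x", "y"]
B = ["x", "x", "x"]
repeated_combinations(A, B)  # only ("y", "x") => 2 remains
category_percentages(A)      # "y" => 66.67 (approx.), "x" => 33.33 (approx.)
```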
30 changes: 15 additions & 15 deletions docs/src/man/workflow_tips.md
@@ -86,13 +86,13 @@ if you need to keep copies of the data in order to do different transformations
functions would be a better fit, whereas if you just want to do a series of linear transformations on your data and
continue processing it after finishing the cleaning, using mutating functions would be a better option.

You could also mix and match mutating and non-mutating `Cleaner` functions to better fit your needs, as all
non-mutating `Cleaner` functions work on any [Tables.jl](https://github.com/JuliaData/Tables.jl) implementation and return a `CleanTable`, while
all mutating `Cleaner` functions work on a `CleanTable` and return a `CleanTable`, which is also a Tables.jl
You can also mix and match mutating and non-mutating `Cleaner` functions to better fit your needs, as all
non-mutating `Cleaner` functions work on any [Tables.jl](https://github.com/JuliaData/Tables.jl) implementation and return a [`CleanTable`](@ref), while
all mutating `Cleaner` functions work on a [`CleanTable`](@ref) and return a [`CleanTable`](@ref), which is also a Tables.jl
implementation.

There is also the option to build a `CleanTable` from any Tables.jl implementation to start your workflow by mutating
even the data stored in the original table, as the `CleanTable` constructor has a keyword argument `copycols` that can be
There is also the option to build a [`CleanTable`](@ref) from any Tables.jl implementation to start your workflow by mutating
even the data stored in the original table, as the [`CleanTable`](@ref) constructor has a keyword argument `copycols` that can be
set to false to use the original columns directly at your own risk.

```jldoctest start
@@ -132,7 +132,7 @@ julia> df
```

The complete opposite approach would be to use a function from the ROT (returning original type) variants (e.g. `polish_names_ROT`)
The complete opposite approach would be to use a function from the ROT (returning original type) variants (e.g. [`polish_names_ROT`](@ref))
that takes any table as input, applies its transformation to a copy of it and then returns a new table of the same type as
the source table.

@@ -150,9 +150,9 @@ julia> df |> polish_names_ROT

## Looking for performance

When trying to avoid most of the extra allocations while working with `Cleaner`, you should start by creating a `CleanTable`
specifying `copycols=false` to use the original columns directly on the new `CleanTable` instead of having a non-mutating `Cleaner`
function making copies of them to use on the `CleanTable` it builds first.
When trying to avoid most of the extra allocations while working with `Cleaner`, you should start by creating a [`CleanTable`](@ref)
specifying `copycols=false` to use the original columns directly on the new [`CleanTable`](@ref) instead of having a non-mutating `Cleaner`
function making copies of them to use on the [`CleanTable`](@ref) it builds first.

```jldoctest performance; setup = :(using Cleaner)
julia> nt = (A = [missing, missing, missing], B = [4, 'x', 6])
@@ -171,7 +171,7 @@ julia> ct = CleanTable(nt; copycols=false)
```

Now that you have a `CleanTable` you should continue by using `Cleaner` mutating functions, as they will modify the same `CleanTable`
Now that you have a [`CleanTable`](@ref) you should continue by using `Cleaner` mutating functions, as they will modify the same [`CleanTable`](@ref)
passed as input in place, avoiding having to allocate new `CleanTable`s while also avoiding copying the underlying column data.

```jldoctest performance
@@ -210,7 +210,7 @@ julia> nt
```

!!! warning
Note that when using the original columns to build a `CleanTable` and applying mutating functions to it, the changes also happen on
Note that when using the original columns to build a [`CleanTable`](@ref) and applying mutating functions to it, the changes also happen on
the source, potentially corrupting it.

If you do need to use the original source after applying mutating `Cleaner` functions, you can always just use a non-mutating
@@ -221,7 +221,7 @@ julia> nt

If you just want to apply a `Cleaner` function or two to your original table, you probably also want the result to be of
the original table type. For these cases we have the convenient ROT function variants, which keep the original columns intact
by applying the transformation to a new `CleanTable` with copied columns and returning a new table based on the result, but of
by applying the transformation to a new [`CleanTable`](@ref) with copied columns and returning a new table based on the result, but of
the original source type.

```jldoctest convenience; setup = :(using Cleaner; using DataFrames: DataFrame)
@@ -253,16 +253,16 @@ julia> df3 = row_as_names_ROT(df2, 2)
```

It's not recommended to use more than 2 ROT functions in a workflow, as they are the least performant and most allocating function variants.
Each time a ROT function is called, it first creates a `CleanTable` with copied columns to work with, then applies the
Each time a ROT function is called, it first creates a [`CleanTable`](@ref) with copied columns to work with, then applies the
desired transformation and then creates a new table of the original source type, which commonly copies columns too.

This ends up allocating a new `CleanTable`, copying columns, allocating another table of the original source type and copying columns for it
This ends up allocating a new [`CleanTable`](@ref), copying columns, allocating another table of the original source type and copying columns for it
to use as well, every time a ROT function is used, which, when working with bigger tables, can become slow and trigger the
garbage collector far more often than an alternative workflow would.

## Final touches

After using all the `CleanTable` functions you needed, you probably want to have the result be another table type to continue your workflow.
After using all the [`CleanTable`](@ref) functions you needed, you probably want to have the result be another table type to continue your workflow.
For these cases, you can try calling the constructor of your desired table type to build a new table based on the output or, if you
are not sure if your desired table type has a constructor that works with other table implementations, you can use the `materializer` function
from [Tables.jl](https://github.com/JuliaData/Tables.jl) we conveniently export for you.
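Putting the advice from `workflow_tips.md` together, a minimal sketch assuming Cleaner.jl and DataFrames.jl are installed; the messy `df` is made up for illustration. It copies the columns once when building the `CleanTable`, applies mutating functions in place and converts back with the exported `materializer`:

```julia
using Cleaner
using DataFrames: DataFrame

# Made-up messy input: names with spaces and inconsistent casing
df = DataFrame("First Name" => ["Ann", "", "Bob"], " Age " => [31, missing, 40])

# Default `copycols=true`: `df` keeps its own columns and stays untouched
ct = CleanTable(df)

# Mutating variants modify `ct` in place instead of allocating new CleanTables
polish_names!(ct)
compact_table!(ct; empty_values=[""])

# `materializer(df)` returns the constructor for df's table type (DataFrame here)
result = materializer(df)(ct)
```

Passing `copycols=false` to `CleanTable` would skip that first copy, at the cost of mutating the columns of `df` itself.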
