Fix spelling issues, update janitor.md
billdenney committed Dec 18, 2024
1 parent 42199d8 commit f9ec0eb
Showing 6 changed files with 82 additions and 93 deletions.
2 changes: 2 additions & 0 deletions DESCRIPTION
@@ -44,6 +44,7 @@ Suggests:
rmarkdown,
RSQLite,
sf,
+ spelling,
testthat (>= 3.0.0),
tibble,
tidygraph
@@ -54,3 +55,4 @@ Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.3.2
+ Language: en-US
8 changes: 4 additions & 4 deletions NEWS.md
@@ -106,7 +106,7 @@ These are all minor breaking changes resulting from enhancements and are not exp

## New features

- * The `adorn_totals()` function now accepts the special argument `fill = NA`, which will insert a class-appropriate `NA` value into each column that isn't being totaled. This preserves the class of each column; previously they were all convered to character. (thanks **@hamstr147** for implementing in #404 and **@ymer** for reporting in #298).
+ * The `adorn_totals()` function now accepts the special argument `fill = NA`, which will insert a class-appropriate `NA` value into each column that isn't being totaled. This preserves the class of each column; previously they were all converted to character. (thanks **@hamstr147** for implementing in #404 and **@ymer** for reporting in #298).

* `adorn_totals()` now takes the value of `"both"` for the `where` argument. That is, `adorn_totals("both")` is a shorter version of `adorn_totals(c("col", "row"))`. (#362, thanks to **@svgsstats** for implementing and **@sfd99** for suggesting).

@@ -130,7 +130,7 @@ These are all minor breaking changes resulting from enhancements and are not exp

* A call to make a 3-way `tabyl()` now succeeds when the first variable is of class `ordered` (#386)

- * If a totals row and/or column is present on a tabyl as a result of `adorn_totals()`, the functions `chisq.test()` and `fisher.test()` drop the totals and print a warning before proceding with the calculations (#385).
+ * If a totals row and/or column is present on a tabyl as a result of `adorn_totals()`, the functions `chisq.test()` and `fisher.test()` drop the totals and print a warning before proceeding with the calculations (#385).

# janitor 2.0.1 (2020-04-12)

@@ -276,7 +276,7 @@ This builds on the original functionality of janitor, with similar-but-improved

### A fully-overhauled `tabyl`

- `tabyl()` is now a single function that can count combinations of one, two, or three variables, ala base R's `table()`. The resulting `tabyl` data.frames can be manipulated and formatted using a family of `adorn_` functions. See the [tabyls vignette](https://sfirke.github.io/janitor/articles/tabyls.html) for more.
+ `tabyl()` is now a single function that can count combinations of one, two, or three variables, a la base R's `table()`. The resulting `tabyl` data.frames can be manipulated and formatted using a family of `adorn_` functions. See the [tabyls vignette](https://sfirke.github.io/janitor/articles/tabyls.html) for more.

The now-redundant legacy functions `crosstab()` and `adorn_crosstab()` have been deprecated, but remain in the package for now. Existing code that relies on the version of `tabyl` present in janitor versions <= 0.3.1 will break if the `sort` argument was used, as that argument no longer exists in `tabyl` (use `dplyr::arrange()` instead).

@@ -292,7 +292,7 @@ No further changes are planned to `clean_names()` and its results should be stab

## Major features

- - `clean_names()` transliterates accented letters, e.g., `çãüœ` becomes `cauoe` [(#120)](https://github.com/sfirke/janitor/issues/120). Thanks to **@fernandovmacedo**.
+ - `clean_names()` transliterates accented letters, e.g., `C'C#C<E` becomes `cauoe` [(#120)](https://github.com/sfirke/janitor/issues/120). Thanks to **@fernandovmacedo**.

- `clean_names()` offers multiple options for variable name styling. In addition to `snake_case` output you can select `smallCamelCase`, `BigCamelCase`, `ALL_CAPS` and others. [(#131)](https://github.com/sfirke/janitor/issues/131).
- Thanks to **@tazinho**, who wrote the [snakecase](https://github.com/Tazinho/snakecase/) package that janitor depends on to do this, as well as the patch to incorporate it into `clean_names()`. And thanks to **@maelle** for proposing this feature.
2 changes: 1 addition & 1 deletion R/round_half_up.R
@@ -1,4 +1,4 @@
- #' Round a numeric vector; halves will be rounded up, ala Microsoft Excel.
+ #' Round a numeric vector; halves will be rounded up, a la Microsoft Excel.
#'
#' @description
#' In base R `round()`, halves are rounded to even, e.g., 12.5 and
2 changes: 1 addition & 1 deletion vignettes/janitor.Rmd
@@ -124,7 +124,7 @@ Smaller functions for use in particular situations. More human-readable than th

### Manipulate vectors of names with `make_clean_names()`

- Like base R's `make.names()`, but with the stylings and case choice of the long-time janitor function `clean_names()`. While `clean_names()` is still offered for use in data.frame pipeline with `%>%`, `make_clean_names()` allows for more general usage, e.g., on a vector.
+ Like base R's `make.names()`, but with the styling and case choice of the long-time janitor function `clean_names()`. While `clean_names()` is still offered for use in data.frame pipeline with `%>%`, `make_clean_names()` allows for more general usage, e.g., on a vector.

It can also be used as an argument to `.name_repair` in the newest version of `tibble::as_tibble`:
```{r}
155 changes: 71 additions & 84 deletions vignettes/janitor.md
@@ -1,68 +1,45 @@
Overview of janitor functions
================
- 2023-02-02
-
- - <a href="#major-functions" id="toc-major-functions">Major functions</a>
- - <a href="#cleaning" id="toc-cleaning">Cleaning</a>
- - <a href="#clean-dataframe-names-with-clean_names"
- id="toc-clean-dataframe-names-with-clean_names">Clean data.frame names
- with <code>clean_names()</code></a>
- - <a href="#do-those-dataframes-actually-contain-the-same-columns"
- id="toc-do-those-dataframes-actually-contain-the-same-columns">Do those
- data.frames actually contain the same columns?</a>
- - <a href="#exploring" id="toc-exploring">Exploring</a>
- - <a href="#tabyl---a-better-version-of-table"
- id="toc-tabyl---a-better-version-of-table"><code>tabyl()</code> - a
- better version of <code>table()</code></a>
- - <a
- href="#explore-records-with-duplicated-values-for-specific-combinations-of-variables-with-get_dupes"
- id="toc-explore-records-with-duplicated-values-for-specific-combinations-of-variables-with-get_dupes">Explore
- records with duplicated values for specific combinations of variables
- with <code>get_dupes()</code></a>
- - <a href="#explore-relationships-between-columns-with-get_one_to_one"
- id="toc-explore-relationships-between-columns-with-get_one_to_one">Explore
- relationships between columns with <code>get_one_to_one()</code></a>
- - <a href="#minor-functions" id="toc-minor-functions">Minor functions</a>
- - <a href="#cleaning-1" id="toc-cleaning-1">Cleaning</a>
- - <a href="#manipulate-vectors-of-names-with-make_clean_names"
- id="toc-manipulate-vectors-of-names-with-make_clean_names">Manipulate
- vectors of names with <code>make_clean_names()</code></a>
- - <a href="#validate-that-a-column-has-a-single_value-per-group"
- id="toc-validate-that-a-column-has-a-single_value-per-group">Validate
- that a column has a <code>single_value()</code> per group</a>
- - <a href="#remove_empty-rows-and-columns"
- id="toc-remove_empty-rows-and-columns"><code>remove_empty()</code> rows
- and columns</a>
- - <a href="#remove_constant-columns"
- id="toc-remove_constant-columns"><code>remove_constant()</code>
- columns</a>
- - <a href="#directionally-consistent-rounding-behavior-with-round_half_up"
- id="toc-directionally-consistent-rounding-behavior-with-round_half_up">Directionally-consistent
- rounding behavior with <code>round_half_up()</code></a>
- - <a
- href="#round-decimals-to-precise-fractions-of-a-given-denominator-with-round_to_fraction"
- id="toc-round-decimals-to-precise-fractions-of-a-given-denominator-with-round_to_fraction">Round
- decimals to precise fractions of a given denominator with
- <code>round_to_fraction()</code></a>
- - <a href="#fix-dates-stored-as-serial-numbers-with-excel_numeric_to_date"
- id="toc-fix-dates-stored-as-serial-numbers-with-excel_numeric_to_date">Fix
- dates stored as serial numbers with
- <code>excel_numeric_to_date()</code></a>
- - <a href="#convert-a-mix-of-date-and-datetime-formats-to-date"
- id="toc-convert-a-mix-of-date-and-datetime-formats-to-date">Convert a
- mix of date and datetime formats to date</a>
- - <a href="#elevate-column-names-stored-in-a-dataframe-row"
- id="toc-elevate-column-names-stored-in-a-dataframe-row">Elevate column
- names stored in a data.frame row</a>
- - <a href="#find-the-header-row-buried-within-a-messy-dataframe"
- id="toc-find-the-header-row-buried-within-a-messy-dataframe">Find the
- header row buried within a messy data.frame</a>
- - <a href="#exploring-1" id="toc-exploring-1">Exploring</a>
- - <a
- href="#count-factor-levels-in-groups-of-high-medium-and-low-with-top_levels"
- id="toc-count-factor-levels-in-groups-of-high-medium-and-low-with-top_levels">Count
- factor levels in groups of high, medium, and low with
- <code>top_levels()</code></a>
+ 2024-12-18
+
+ - [Major functions](#major-functions)
+ - [Cleaning](#cleaning)
+ - [Clean data.frame names with
+ `clean_names()`](#clean-dataframe-names-with-clean_names)
+ - [Do those data.frames actually contain the same
+ columns?](#do-those-dataframes-actually-contain-the-same-columns)
+ - [Exploring](#exploring)
+ - [`tabyl()` - a better version of
+ `table()`](#tabyl---a-better-version-of-table)
+ - [Explore records with duplicated values for specific combinations
+ of variables with
+ `get_dupes()`](#explore-records-with-duplicated-values-for-specific-combinations-of-variables-with-get_dupes)
+ - [Explore relationships between columns with
+ `get_one_to_one()`](#explore-relationships-between-columns-with-get_one_to_one)
+ - [Minor functions](#minor-functions)
+ - [Cleaning](#cleaning-1)
+ - [Manipulate vectors of names with
+ `make_clean_names()`](#manipulate-vectors-of-names-with-make_clean_names)
+ - [Validate that a column has a `single_value()` per
+ group](#validate-that-a-column-has-a-single_value-per-group)
+ - [`remove_empty()` rows and
+ columns](#remove_empty-rows-and-columns)
+ - [`remove_constant()` columns](#remove_constant-columns)
+ - [Directionally-consistent rounding behavior with
+ `round_half_up()`](#directionally-consistent-rounding-behavior-with-round_half_up)
+ - [Round decimals to precise fractions of a given denominator with
+ `round_to_fraction()`](#round-decimals-to-precise-fractions-of-a-given-denominator-with-round_to_fraction)
+ - [Fix dates stored as serial numbers with
+ `excel_numeric_to_date()`](#fix-dates-stored-as-serial-numbers-with-excel_numeric_to_date)
+ - [Convert a mix of date and datetime formats to
+ date](#convert-a-mix-of-date-and-datetime-formats-to-date)
+ - [Elevate column names stored in a data.frame
+ row](#elevate-column-names-stored-in-a-dataframe-row)
+ - [Find the header row buried within a messy
+ data.frame](#find-the-header-row-buried-within-a-messy-dataframe)
+ - [Exploring](#exploring-1)
+ - [Count factor levels in groups of high, medium, and low with
+ `top_levels()`](#count-factor-levels-in-groups-of-high-medium-and-low-with-top_levels)

The janitor functions expedite the initial data exploration and cleaning
that comes with any new data set. This catalog describes the usage for
@@ -78,7 +55,7 @@ Functions for everyday use.

Call this function every time you read data.

- It works in a `%>%` pipeline, and handles problematic variable names,
+ It works in a `%>%` pipeline and handles problematic variable names,
especially those that are so well-preserved by `readxl::read_excel()`
and `readr::read_csv()`.

@@ -94,8 +71,10 @@ and `readr::read_csv()`.
``` r
# Create a data.frame with dirty names
test_df <- as.data.frame(matrix(ncol = 6))
- names(test_df) <- c("firstName", "ábc@!*", "% successful (2009)",
- "REPEAT VALUE", "REPEAT VALUE", "")
+ names(test_df) <- c(
+ "firstName", "ábc@!*", "% successful (2009)",
+ "REPEAT VALUE", "REPEAT VALUE", ""
+ )
```

Clean the variable names, returning a data.frame:
@@ -111,8 +90,8 @@ Compare to what base R produces:

``` r
make.names(names(test_df))
- #> [1] "firstName" "ábc..." "X..successful..2009." "REPEAT.VALUE" "REPEAT.VALUE"
- #> [6] "X"
+ #> [1] "firstName" "ábc..." "X..successful..2009."
+ #> [4] "REPEAT.VALUE" "REPEAT.VALUE" "X"
```

This function is powered by the underlying exported function
@@ -229,10 +208,11 @@ sets of one-to-one clusters:

``` r
library(dplyr)
- starwars[1:4,] %>%
+ starwars[1:4, ] %>%
get_one_to_one()
#> [[1]]
- #> [1] "name" "height" "mass" "skin_color" "birth_year" "films"
+ #> [1] "name" "height" "mass" "skin_color" "birth_year"
+ #> [6] "films"
#>
#> [[2]]
#> [1] "hair_color" "starships"
@@ -250,7 +230,7 @@ than the equivalent code they replace.

### Manipulate vectors of names with `make_clean_names()`

- Like base R’s `make.names()`, but with the stylings and case choice of
+ Like base R’s `make.names()`, but with the styling and case choice of
the long-time janitor function `clean_names()`. While `clean_names()` is
still offered for use in data.frame pipeline with `%>%`,
`make_clean_names()` allows for more general usage, e.g., on a vector.
@@ -273,7 +253,7 @@ tibble::as_tibble(iris, .name_repair = janitor::make_clean_names)
#> 8 5 3.4 1.5 0.2 setosa
#> 9 4.4 2.9 1.4 0.2 setosa
#> 10 4.9 3.1 1.5 0.1 setosa
- #> # … with 140 more rows
+ #> # 140 more rows
```

### Validate that a column has a `single_value()` per group
@@ -290,7 +270,8 @@ where it should not:
``` r
not_one_to_one <- data.frame(
X = rep(1:3, each = 2),
- Y = c(rep(1:2, each = 2), 1:2))
+ Y = c(rep(1:2, each = 2), 1:2)
+ )

not_one_to_one
#> X Y
@@ -303,12 +284,13 @@ not_one_to_one

# throws informative error:
try(not_one_to_one %>%
- dplyr::group_by(X) %>%
- dplyr::mutate(
- Z = single_value(Y, info = paste("Calculating Z for group X =", X)))
- )
+ dplyr::group_by(X) %>%
+ dplyr::mutate(
+ Z = single_value(Y, info = paste("Calculating Z for group X =", X))
+ ))
#> Error in dplyr::mutate(., Z = single_value(Y, info = paste("Calculating Z for group X =", :
- #> ℹ In argument: `Z = single_value(Y, info = paste("Calculating Z for group X =", X))`.
+ #> ℹ In argument: `Z = single_value(Y, info = paste("Calculating Z for
+ #> group X =", X))`.
#> ℹ In group 3: `X = 3`.
#> Caused by error in `single_value()`:
#> ! More than one (2) value found (1, 2): Calculating Z for group X = 3: Calculating Z for group X = 3
@@ -320,9 +302,11 @@ Does what it says. For cases like cleaning Excel files that contain
empty rows and columns after being read into R.

``` r
- q <- data.frame(v1 = c(1, NA, 3),
- v2 = c(NA, NA, NA),
- v3 = c("a", NA, "b"))
+ q <- data.frame(
+ v1 = c(1, NA, 3),
+ v2 = c(NA, NA, NA),
+ v3 = c("a", NA, "b")
+ )
q %>%
remove_empty(c("rows", "cols"))
#> v1 v3
@@ -419,8 +403,10 @@ names of the data.frame and optionally (by default) remove the row in
which names were stored and/or the rows above it.

``` r
- dirt <- data.frame(X_1 = c(NA, "ID", 1:3),
- X_2 = c(NA, "Value", 4:6))
+ dirt <- data.frame(
+ X_1 = c(NA, "ID", 1:3),
+ X_2 = c(NA, "Value", 4:6)
+ )

row_to_names(dirt, 2)
#> ID Value
@@ -454,7 +440,8 @@ grouped into head/middle/tail groups.

``` r
f <- factor(c("strongly agree", "agree", "neutral", "neutral", "disagree", "strongly agree"),
- levels = c("strongly agree", "agree", "neutral", "disagree", "strongly disagree"))
+ levels = c("strongly agree", "agree", "neutral", "disagree", "strongly disagree")
+ )
top_levels(f)
#> f n percent
#> strongly agree, agree 3 0.5000000
6 changes: 3 additions & 3 deletions vignettes/tabyls.md
@@ -254,7 +254,7 @@ humans %>%
function or using janitor’s `round_half_up()` to round all ties up
([thanks,
StackOverflow](https://stackoverflow.com/a/12688836/4470365)).
- - e.g., round 10.5 up to 11, consistent with Excels tie-breaking
+ - e.g., round 10.5 up to 11, consistent with Excel's tie-breaking
behavior.
- This contrasts with rounding 10.5 down to 10 as in base R’s
`round(10.5)`.
@@ -263,7 +263,7 @@ humans %>%
`adorn_pct_formatting()`; these two functions should not be called
together.
- **`adorn_ns()`**: add Ns to a tabyl. These can be drawn from the
- tabyls underlying counts, which are attached to the tabyl as
+ tabyl's underlying counts, which are attached to the tabyl as
metadata, or they can be supplied by the user.
- **`adorn_title()`**: add a title to a tabyl (or other data.frame).
Options include putting the column title in a new row on top of the
@@ -427,7 +427,7 @@ comparison %>%
#> Total 100.0% (3,000) 100.0% (3,000) 100.0% (6,000)
```

- Now we format them to insert the thousands commas. A tabyls raw Ns are
+ Now we format them to insert the thousands commas. A tabyl's raw Ns are
stored in its `"core"` attribute. Here we retrieve those with `attr()`,
then apply the base R function `format()` to all numeric columns.
Lastly, we append these Ns using `adorn_ns()`.