Fix spelling issues, update janitor.md

sfirke · Dec 18, 2024 · commit f9ec0eb
1 parent 42199d8

Showing 6 changed files with 82 additions and 93 deletions.
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -44,6 +44,7 @@ Suggests:
     rmarkdown,
     RSQLite,
     sf,
+    spelling,
     testthat (>= 3.0.0),
     tibble,
     tidygraph
@@ -54,3 +55,4 @@ Encoding: UTF-8
 LazyData: true
 Roxygen: list(markdown = TRUE)
 RoxygenNote: 7.3.2
+Language: en-US
diff --git a/NEWS.md b/NEWS.md
@@ -106,7 +106,7 @@ These are all minor breaking changes resulting from enhancements and are not exp
 
 ## New features
 
-* The `adorn_totals()` function now accepts the special argument `fill = NA`, which will insert a class-appropriate `NA` value into each column that isn't being totaled.  This preserves the class of each column; previously they were all convered to character. (thanks **@hamstr147** for implementing in #404 and **@ymer** for reporting in #298).
+* The `adorn_totals()` function now accepts the special argument `fill = NA`, which will insert a class-appropriate `NA` value into each column that isn't being totaled.  This preserves the class of each column; previously they were all converted to character. (thanks **@hamstr147** for implementing in #404 and **@ymer** for reporting in #298).
 
 * `adorn_totals()` now takes the value of `"both"` for the `where` argument.  That is, `adorn_totals("both")` is a shorter version of `adorn_totals(c("col", "row"))`.  (#362, thanks to **@svgsstats** for implementing and **@sfd99** for suggesting).
 
@@ -130,7 +130,7 @@ These are all minor breaking changes resulting from enhancements and are not exp
 
 * A call to make a 3-way `tabyl()` now succeeds when the first variable is of class `ordered` (#386)
 
-* If a totals row and/or column is present on a tabyl as a result of `adorn_totals()`, the functions `chisq.test()` and `fisher.test()` drop the totals and print a warning before proceding with the calculations (#385).
+* If a totals row and/or column is present on a tabyl as a result of `adorn_totals()`, the functions `chisq.test()` and `fisher.test()` drop the totals and print a warning before proceeding with the calculations (#385).
 
 # janitor 2.0.1 (2020-04-12)
 
@@ -276,7 +276,7 @@ This builds on the original functionality of janitor, with similar-but-improved
 
 ### A fully-overhauled `tabyl`
 
-`tabyl()` is now a single function that can count combinations of one, two, or three variables, ala base R's `table()`.  The resulting `tabyl` data.frames can be manipulated and formatted using a family of `adorn_` functions.  See the [tabyls vignette](https://sfirke.github.io/janitor/articles/tabyls.html) for more.
+`tabyl()` is now a single function that can count combinations of one, two, or three variables, a la base R's `table()`.  The resulting `tabyl` data.frames can be manipulated and formatted using a family of `adorn_` functions.  See the [tabyls vignette](https://sfirke.github.io/janitor/articles/tabyls.html) for more.
 
 The now-redundant legacy functions `crosstab()` and `adorn_crosstab()` have been deprecated, but remain in the package for now.  Existing code that relies on the version of `tabyl` present in janitor versions <= 0.3.1 will break if the `sort` argument was used, as that argument no longer exists in `tabyl` (use `dplyr::arrange()` instead).
 
@@ -292,7 +292,7 @@ No further changes are planned to `clean_names()` and its results should be stab
 
 ## Major features
 
-- `clean_names()` transliterates accented letters, e.g., `çãüœ` becomes `cauoe` [(#120)](https://github.com/sfirke/janitor/issues/120).  Thanks to **@fernandovmacedo**.
+- `clean_names()` transliterates accented letters, e.g., `C'C#C<E` becomes `cauoe` [(#120)](https://github.com/sfirke/janitor/issues/120).  Thanks to **@fernandovmacedo**.
 
 - `clean_names()` offers multiple options for variable name styling.  In addition to `snake_case` output you can select `smallCamelCase`, `BigCamelCase`, `ALL_CAPS` and others. [(#131)](https://github.com/sfirke/janitor/issues/131).
   - Thanks to **@tazinho**, who wrote the [snakecase](https://github.com/Tazinho/snakecase/) package that janitor depends on to do this, as well as the patch to incorporate it into `clean_names()`.  And thanks to **@maelle** for proposing this feature.

diff --git a/R/round_half_up.R b/R/round_half_up.R
@@ -1,4 +1,4 @@
-#' Round a numeric vector; halves will be rounded up, ala Microsoft Excel.
+#' Round a numeric vector; halves will be rounded up, a la Microsoft Excel.
 #'
 #' @description
 #' In base R `round()`, halves are rounded to even, e.g., 12.5 and

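For reference, the tie-breaking difference that the docstring above describes can be checked in base R. This is a minimal sketch: `round_half_up_sketch()` is a hypothetical name used only for illustration. It follows the common add-half-then-truncate approach and is not claimed to be janitor's actual implementation.

```r
# Base R rounds halves to the nearest even digit.
base_result <- round(c(10.5, 11.5, 12.5))

# Hypothetical helper (not part of janitor): round halves away from zero,
# as Excel's ROUND() does, by adding 0.5 and truncating.
round_half_up_sketch <- function(x, digits = 0) {
  scaled <- abs(x) * 10^digits
  # The small epsilon guards against halves that are stored slightly
  # below their true value in floating point.
  rounded <- trunc(scaled + 0.5 + sqrt(.Machine$double.eps))
  sign(x) * rounded / 10^digits
}

sketch_result <- round_half_up_sketch(c(10.5, 11.5, 12.5))

print(base_result)
#> [1] 10 12 12
print(sketch_result)
#> [1] 11 12 13
```

Working on the absolute value and restoring the sign afterwards means that negative halves also move away from zero, so `-10.5` becomes `-11` in this sketch.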
diff --git a/vignettes/janitor.Rmd b/vignettes/janitor.Rmd
@@ -124,7 +124,7 @@ Smaller functions for use in particular situations.  More human-readable than th
 
 ### Manipulate vectors of names with `make_clean_names()`
 
-Like base R's `make.names()`, but with the stylings and case choice of the long-time janitor function `clean_names()`.  While `clean_names()` is still offered for use in  data.frame pipeline with `%>%`, `make_clean_names()` allows for more general usage, e.g., on a vector.
+Like base R's `make.names()`, but with the styling and case choice of the long-time janitor function `clean_names()`.  While `clean_names()` is still offered for use in  data.frame pipeline with `%>%`, `make_clean_names()` allows for more general usage, e.g., on a vector.
 
 It can also be used as an argument to `.name_repair` in the newest version of `tibble::as_tibble`:
 ```{r}

diff --git a/vignettes/janitor.md b/vignettes/janitor.md
@@ -1,68 +1,45 @@
 Overview of janitor functions
 ================
-2023-02-02
-
-- <a href="#major-functions" id="toc-major-functions">Major functions</a>
-  - <a href="#cleaning" id="toc-cleaning">Cleaning</a>
-    - <a href="#clean-dataframe-names-with-clean_names"
-      id="toc-clean-dataframe-names-with-clean_names">Clean data.frame names
-      with <code>clean_names()</code></a>
-    - <a href="#do-those-dataframes-actually-contain-the-same-columns"
-      id="toc-do-those-dataframes-actually-contain-the-same-columns">Do those
-      data.frames actually contain the same columns?</a>
-  - <a href="#exploring" id="toc-exploring">Exploring</a>
-    - <a href="#tabyl---a-better-version-of-table"
-      id="toc-tabyl---a-better-version-of-table"><code>tabyl()</code> - a
-      better version of <code>table()</code></a>
-    - <a
-      href="#explore-records-with-duplicated-values-for-specific-combinations-of-variables-with-get_dupes"
-      id="toc-explore-records-with-duplicated-values-for-specific-combinations-of-variables-with-get_dupes">Explore
-      records with duplicated values for specific combinations of variables
-      with <code>get_dupes()</code></a>
-    - <a href="#explore-relationships-between-columns-with-get_one_to_one"
-      id="toc-explore-relationships-between-columns-with-get_one_to_one">Explore
-      relationships between columns with <code>get_one_to_one()</code></a>
-- <a href="#minor-functions" id="toc-minor-functions">Minor functions</a>
-  - <a href="#cleaning-1" id="toc-cleaning-1">Cleaning</a>
-    - <a href="#manipulate-vectors-of-names-with-make_clean_names"
-      id="toc-manipulate-vectors-of-names-with-make_clean_names">Manipulate
-      vectors of names with <code>make_clean_names()</code></a>
-    - <a href="#validate-that-a-column-has-a-single_value-per-group"
-      id="toc-validate-that-a-column-has-a-single_value-per-group">Validate
-      that a column has a <code>single_value()</code> per group</a>
-    - <a href="#remove_empty-rows-and-columns"
-      id="toc-remove_empty-rows-and-columns"><code>remove_empty()</code> rows
-      and columns</a>
-    - <a href="#remove_constant-columns"
-      id="toc-remove_constant-columns"><code>remove_constant()</code>
-      columns</a>
-    - <a href="#directionally-consistent-rounding-behavior-with-round_half_up"
-      id="toc-directionally-consistent-rounding-behavior-with-round_half_up">Directionally-consistent
-      rounding behavior with <code>round_half_up()</code></a>
-    - <a
-      href="#round-decimals-to-precise-fractions-of-a-given-denominator-with-round_to_fraction"
-      id="toc-round-decimals-to-precise-fractions-of-a-given-denominator-with-round_to_fraction">Round
-      decimals to precise fractions of a given denominator with
-      <code>round_to_fraction()</code></a>
-    - <a href="#fix-dates-stored-as-serial-numbers-with-excel_numeric_to_date"
-      id="toc-fix-dates-stored-as-serial-numbers-with-excel_numeric_to_date">Fix
-      dates stored as serial numbers with
-      <code>excel_numeric_to_date()</code></a>
-    - <a href="#convert-a-mix-of-date-and-datetime-formats-to-date"
-      id="toc-convert-a-mix-of-date-and-datetime-formats-to-date">Convert a
-      mix of date and datetime formats to date</a>
-    - <a href="#elevate-column-names-stored-in-a-dataframe-row"
-      id="toc-elevate-column-names-stored-in-a-dataframe-row">Elevate column
-      names stored in a data.frame row</a>
-    - <a href="#find-the-header-row-buried-within-a-messy-dataframe"
-      id="toc-find-the-header-row-buried-within-a-messy-dataframe">Find the
-      header row buried within a messy data.frame</a>
-  - <a href="#exploring-1" id="toc-exploring-1">Exploring</a>
-    - <a
-      href="#count-factor-levels-in-groups-of-high-medium-and-low-with-top_levels"
-      id="toc-count-factor-levels-in-groups-of-high-medium-and-low-with-top_levels">Count
-      factor levels in groups of high, medium, and low with
-      <code>top_levels()</code></a>
+2024-12-18
+
+- [Major functions](#major-functions)
+  - [Cleaning](#cleaning)
+    - [Clean data.frame names with
+      `clean_names()`](#clean-dataframe-names-with-clean_names)
+    - [Do those data.frames actually contain the same
+      columns?](#do-those-dataframes-actually-contain-the-same-columns)
+  - [Exploring](#exploring)
+    - [`tabyl()` - a better version of
+      `table()`](#tabyl---a-better-version-of-table)
+    - [Explore records with duplicated values for specific combinations
+      of variables with
+      `get_dupes()`](#explore-records-with-duplicated-values-for-specific-combinations-of-variables-with-get_dupes)
+    - [Explore relationships between columns with
+      `get_one_to_one()`](#explore-relationships-between-columns-with-get_one_to_one)
+- [Minor functions](#minor-functions)
+  - [Cleaning](#cleaning-1)
+    - [Manipulate vectors of names with
+      `make_clean_names()`](#manipulate-vectors-of-names-with-make_clean_names)
+    - [Validate that a column has a `single_value()` per
+      group](#validate-that-a-column-has-a-single_value-per-group)
+    - [`remove_empty()` rows and
+      columns](#remove_empty-rows-and-columns)
+    - [`remove_constant()` columns](#remove_constant-columns)
+    - [Directionally-consistent rounding behavior with
+      `round_half_up()`](#directionally-consistent-rounding-behavior-with-round_half_up)
+    - [Round decimals to precise fractions of a given denominator with
+      `round_to_fraction()`](#round-decimals-to-precise-fractions-of-a-given-denominator-with-round_to_fraction)
+    - [Fix dates stored as serial numbers with
+      `excel_numeric_to_date()`](#fix-dates-stored-as-serial-numbers-with-excel_numeric_to_date)
+    - [Convert a mix of date and datetime formats to
+      date](#convert-a-mix-of-date-and-datetime-formats-to-date)
+    - [Elevate column names stored in a data.frame
+      row](#elevate-column-names-stored-in-a-dataframe-row)
+    - [Find the header row buried within a messy
+      data.frame](#find-the-header-row-buried-within-a-messy-dataframe)
+  - [Exploring](#exploring-1)
+    - [Count factor levels in groups of high, medium, and low with
+      `top_levels()`](#count-factor-levels-in-groups-of-high-medium-and-low-with-top_levels)
 
 The janitor functions expedite the initial data exploration and cleaning
 that comes with any new data set. This catalog describes the usage for
@@ -78,7 +55,7 @@ Functions for everyday use.
 
 Call this function every time you read data.
 
-It works in a `%>%` pipeline, and handles problematic variable names,
+It works in a `%>%` pipeline and handles problematic variable names,
 especially those that are so well-preserved by `readxl::read_excel()`
 and `readr::read_csv()`.
 
@@ -94,8 +71,10 @@ and `readr::read_csv()`.
 ``` r
 # Create a data.frame with dirty names
 test_df <- as.data.frame(matrix(ncol = 6))
-names(test_df) <- c("firstName", "ábc@!*", "% successful (2009)",
-                    "REPEAT VALUE", "REPEAT VALUE", "")
+names(test_df) <- c(
+  "firstName", "ábc@!*", "% successful (2009)",
+  "REPEAT VALUE", "REPEAT VALUE", ""
+)
 ```
 
 Clean the variable names, returning a data.frame:
@@ -111,8 +90,8 @@ Compare to what base R produces:
 
 ``` r
 make.names(names(test_df))
-#> [1] "firstName"            "ábc..."               "X..successful..2009." "REPEAT.VALUE"         "REPEAT.VALUE"        
-#> [6] "X"
+#> [1] "firstName"            "ábc..."               "X..successful..2009."
+#> [4] "REPEAT.VALUE"         "REPEAT.VALUE"         "X"
 ```
 
 This function is powered by the underlying exported function
@@ -229,10 +208,11 @@ sets of one-to-one clusters:
 
 ``` r
 library(dplyr)
-starwars[1:4,] %>%
+starwars[1:4, ] %>%
   get_one_to_one()
 #> [[1]]
-#> [1] "name"       "height"     "mass"       "skin_color" "birth_year" "films"     
+#> [1] "name"       "height"     "mass"       "skin_color" "birth_year"
+#> [6] "films"     
 #> 
 #> [[2]]
 #> [1] "hair_color" "starships" 
@@ -250,7 +230,7 @@ than the equivalent code they replace.
 
 ### Manipulate vectors of names with `make_clean_names()`
 
-Like base R’s `make.names()`, but with the stylings and case choice of
+Like base R’s `make.names()`, but with the styling and case choice of
 the long-time janitor function `clean_names()`. While `clean_names()` is
 still offered for use in data.frame pipeline with `%>%`,
 `make_clean_names()` allows for more general usage, e.g., on a vector.
@@ -273,7 +253,7 @@ tibble::as_tibble(iris, .name_repair = janitor::make_clean_names)
 #>  8          5           3.4          1.5         0.2 setosa 
 #>  9          4.4         2.9          1.4         0.2 setosa 
 #> 10          4.9         3.1          1.5         0.1 setosa 
-#> # … with 140 more rows
+#> # ℹ 140 more rows
 ```
 
 ### Validate that a column has a `single_value()` per group
@@ -290,7 +270,8 @@ where it should not:
 ``` r
 not_one_to_one <- data.frame(
   X = rep(1:3, each = 2),
-  Y = c(rep(1:2, each = 2), 1:2))
+  Y = c(rep(1:2, each = 2), 1:2)
+)
 
 not_one_to_one
 #>   X Y
@@ -303,12 +284,13 @@ not_one_to_one
 
 # throws informative error:
 try(not_one_to_one %>%
-      dplyr::group_by(X) %>%
-      dplyr::mutate(
-        Z = single_value(Y, info = paste("Calculating Z for group X =", X)))
-      )
+  dplyr::group_by(X) %>%
+  dplyr::mutate(
+    Z = single_value(Y, info = paste("Calculating Z for group X =", X))
+  ))
 #> Error in dplyr::mutate(., Z = single_value(Y, info = paste("Calculating Z for group X =",  : 
-#>   ℹ In argument: `Z = single_value(Y, info = paste("Calculating Z for group X =", X))`.
+#>   ℹ In argument: `Z = single_value(Y, info = paste("Calculating Z for
+#>   group X =", X))`.
 #> ℹ In group 3: `X = 3`.
 #> Caused by error in `single_value()`:
 #> ! More than one (2) value found (1, 2): Calculating Z for group X = 3: Calculating Z for group X = 3
@@ -320,9 +302,11 @@ Does what it says. For cases like cleaning Excel files that contain
 empty rows and columns after being read into R.
 
 ``` r
-q <- data.frame(v1 = c(1, NA, 3),
-                v2 = c(NA, NA, NA),
-                v3 = c("a", NA, "b"))
+q <- data.frame(
+  v1 = c(1, NA, 3),
+  v2 = c(NA, NA, NA),
+  v3 = c("a", NA, "b")
+)
 q %>%
   remove_empty(c("rows", "cols"))
 #>   v1 v3
@@ -419,8 +403,10 @@ names of the data.frame and optionally (by default) remove the row in
 which names were stored and/or the rows above it.
 
 ``` r
-dirt <- data.frame(X_1 = c(NA, "ID", 1:3),
-           X_2 = c(NA, "Value", 4:6))
+dirt <- data.frame(
+  X_1 = c(NA, "ID", 1:3),
+  X_2 = c(NA, "Value", 4:6)
+)
 
 row_to_names(dirt, 2)
 #>   ID Value
@@ -454,7 +440,8 @@ grouped into head/middle/tail groups.
 
 ``` r
 f <- factor(c("strongly agree", "agree", "neutral", "neutral", "disagree", "strongly agree"),
-            levels = c("strongly agree", "agree", "neutral", "disagree", "strongly disagree"))
+  levels = c("strongly agree", "agree", "neutral", "disagree", "strongly disagree")
+)
 top_levels(f)
 #>                            f n   percent
 #>        strongly agree, agree 3 0.5000000

diff --git a/vignettes/tabyls.md b/vignettes/tabyls.md
@@ -254,7 +254,7 @@ humans %>%
   function or using janitor’s `round_half_up()` to round all ties up
   ([thanks,
   StackOverflow](https://stackoverflow.com/a/12688836/4470365)).
-  - e.g., round 10.5 up to 11, consistent with Excel’s tie-breaking
+  - e.g., round 10.5 up to 11, consistent with Excel's tie-breaking
     behavior.
     - This contrasts with rounding 10.5 down to 10 as in base R’s
       `round(10.5)`.
@@ -263,7 +263,7 @@ humans %>%
     `adorn_pct_formatting()`; these two functions should not be called
     together.
 - **`adorn_ns()`**: add Ns to a tabyl. These can be drawn from the
-  tabyl’s underlying counts, which are attached to the tabyl as
+  tabyl's underlying counts, which are attached to the tabyl as
   metadata, or they can be supplied by the user.
 - **`adorn_title()`**: add a title to a tabyl (or other data.frame).
   Options include putting the column title in a new row on top of the
@@ -427,7 +427,7 @@ comparison %>%
 #>     Total 100.0% (3,000) 100.0% (3,000) 100.0% (6,000)
 ```
 
-Now we format them to insert the thousands commas. A tabyl’s raw Ns are
+Now we format them to insert the thousands commas. A tabyl's raw Ns are
 stored in its `"core"` attribute. Here we retrieve those with `attr()`,
 then apply the base R function `format()` to all numeric columns.
 Lastly, we append these Ns using `adorn_ns()`.
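
The formatting step described in this last hunk can be sketched in base R without janitor. The data frame `ns` below is made-up illustration data standing in for a tabyl's `"core"` attribute; only the use of `format()` on numeric columns comes from the text, and `big.mark` and `trim` are standard `format()` arguments.

```r
# Made-up counts standing in for the raw Ns of a tabyl (illustration only).
ns <- data.frame(
  group = c("a", "b", "Total"),
  x = c(1200, 1800, 3000),
  y = c(2500, 500, 3000)
)

# Apply format() to every numeric column to insert thousands commas.
# trim = TRUE drops the padding that format() adds to align widths.
is_num <- vapply(ns, is.numeric, logical(1))
ns[is_num] <- lapply(ns[is_num], format, big.mark = ",", trim = TRUE)

print(ns)
```

After this step the numeric columns hold character strings such as `"3,000"`, so the formatting should happen only once no further arithmetic on the counts is needed.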