bring add_totals functions up to v1.0 as add_totals #99

Merged: 30 commits, merged Mar 30, 2017
30 commits
3fbd673
add_totals functions now work w/ non-numeric cols
sfirke Feb 2, 2017
5e90189
helpers nearly done taking non-numeric cols
sfirke Feb 3, 2017
a50f938
resolved first column should be made into character for add_totals_row
sfirke Feb 3, 2017
e9f24a3
whoops ns_to_percent does not need a fill argument
sfirke Feb 3, 2017
69a6a7e
note changes to adorn helpers in news
sfirke Feb 5, 2017
5c70a27
Merge branch 'master' into improve_adorn_helpers
sfirke Mar 19, 2017
4e8950e
eliminate check of non-numeric 1st col for add_totals functions
sfirke Mar 19, 2017
81f3e6a
note the change to adorn_crosstab arguments in NEWS
sfirke Mar 19, 2017
c8893e4
merged add_totals_row and add_totals_col into a single add_totals()
sfirke Mar 19, 2017
55f93c9
minor tweaks for add_totals function
sfirke Mar 19, 2017
c326e44
add_totals works on grouped_df
sfirke Mar 19, 2017
7059535
complete test coverage, improve error message for add_totals
sfirke Mar 19, 2017
2677369
putting add_totals_col, add_totals_row back as deprecated functions
sfirke Mar 19, 2017
7b1792f
add tests for deprecated add_totals_* functions
sfirke Mar 19, 2017
82b0aac
added test for adorn_crosstab on factor input
sfirke Mar 20, 2017
f3b72de
this test passes on my PC, breaking apart to see why failing on Travis
sfirke Mar 20, 2017
7bd5166
Merge branch 'master' into improve_adorn_helpers
sfirke Mar 22, 2017
b9d0cfa
switching from as_tibble to as_data_frame in test
sfirke Mar 22, 2017
e6852b0
tell Travis not to treat warnings as errors
sfirke Mar 22, 2017
a1325bf
removing Travis tests on OSX
sfirke Mar 22, 2017
af68304
fix agreement error typo
sfirke Mar 22, 2017
0bd3f50
adding test coverage comments to PRs
sfirke Mar 22, 2017
18053da
add_totals("col") retains input factor class in 1st col
sfirke Mar 22, 2017
3db268c
adjust adorn_crosstab description, check all columns 2:n are numeric
sfirke Mar 22, 2017
c79e6cb
fix small typo in adorn_crosstab
sfirke Mar 22, 2017
bf9b39f
ns_to_percents works on a data.frame with just one numeric col
sfirke Mar 22, 2017
83b4df5
rename add_totals() to adorn_totals()
sfirke Mar 30, 2017
c23f669
add link to dplyr select_if on bad names issue
sfirke Mar 30, 2017
d57477d
update NEWS with adorn_totals() features
sfirke Mar 30, 2017
6fd5cf7
note in NEWS that adorn_crosstab works on a 2-column df
sfirke Mar 30, 2017
7 changes: 5 additions & 2 deletions .travis.yml
@@ -6,10 +6,13 @@ r:
- release
- devel

warnings_are_errors: false


os:
- linux
- osx
# - osx

cache: packages

r_github_packages:
1 change: 1 addition & 0 deletions NAMESPACE
@@ -7,6 +7,7 @@ S3method(tabyl,default)
export(add_totals_col)
export(add_totals_row)
export(adorn_crosstab)
export(adorn_totals)
export(clean_names)
export(convert_to_NA)
export(crosstab)
9 changes: 9 additions & 0 deletions NEWS.md
@@ -3,16 +3,25 @@ NEWS

# janitor 0.2.1.9000 (in progress)

## Breaking changes
* The first argument of `adorn_crosstab()` is now "dat" instead of "crosstab" (since the function can be called on any data.frame, not just a result of `crosstab()`)
* The functions `add_totals_row` and `add_totals_col` were combined into a single function, `adorn_totals()`. [(#57)](https://github.com/sfirke/janitor/issues/57). The `add_totals_` functions are now deprecated and should not be used.

## Features


### Major


### Minor
* `adorn_totals()` and `ns_to_percents()` can now be called on data.frames that have non-numeric columns beyond the first one (they will be ignored) [(#57)](https://github.com/sfirke/janitor/issues/57)
* `adorn_totals("col")` retains factor class in 1st column if that was the input

## Bug fixes
* Long variable names with spaces no longer break `tabyl()` and `crosstab()` [(#87)](https://github.com/sfirke/janitor/issues/87)
* `clean_names()` now handles leading spaces [(#85)](https://github.com/sfirke/janitor/issues/85)
* `adorn_crosstab()` and `ns_to_percents()` work on a 2-column data.frame [(#89)](https://github.com/sfirke/janitor/issues/89)
* `adorn_totals()` now works on a grouped tibble [(#97)](https://github.com/sfirke/janitor/issues/97)

# janitor 0.2.1 (Release date: 2016-10-30)

83 changes: 69 additions & 14 deletions R/add_totals.R
@@ -1,9 +1,67 @@
#' @title Append a totals row and/or column to a data.frame.
#'
#' @description
#' This function excludes the first column of the input data.frame, assuming it's a descriptive variable not to be summed. It also excludes other non-numeric columns.
#'
#' @param dat an input data.frame with at least one numeric column.
#' @param which one of "row", "col", or \code{c("row", "col")}
#' @param fill if there are multiple non-numeric columns, what string should fill the bottom row of those columns?
#' @param na.rm should missing values (including NaN) be omitted from the calculations?
#' @return Returns a data.frame augmented with a totals row, column, or both.
#' @export
#' @examples
#' library(dplyr) # for the %>% pipe
#' mtcars %>%
#' crosstab(am, cyl) %>%
#' adorn_totals()


adorn_totals <- function(dat, which = c("row", "col"), fill = "-", na.rm = TRUE){
if("grouped_df" %in% class(dat)){ dat <- dplyr::ungroup(dat) } # grouped_df causes problems, #97

if(sum(unlist(lapply(dat, is.numeric))[-1]) == 0){stop("at least one one of columns 2:n must be of class numeric")}

if("row" %in% which){
dat[[1]] <- as.character(dat[[1]]) # for type matching when binding the word "Total" on a factor when adding Totals row
# creates the totals row to be appended
col_vec <- function(a_col, na_rm = na.rm){
if(is.numeric(a_col)){ # can't do this with if_else because it doesn't like the sum() of a character vector, even if that clause is not reached
sum(a_col, na.rm = na_rm)
} else {fill}
}

col_totals <- lapply(dat, col_vec) %>%
as.data.frame(stringsAsFactors = FALSE) %>%
stats::setNames(names(dat))

col_totals[nrow(col_totals), 1] <- "Total" # replace final row, first column with "Total"
dat <- dplyr::bind_rows(dat %>%
stats::setNames(names(dat)) %>%
dplyr::mutate_at(1, as.character), col_totals)
}

if("col" %in% which){
# Add totals col
clean_dat <- clean_names(dat) # bad names will make select_if choke; this may get fixed, see https://github.com/hadley/dplyr/issues/2243 but work around it for now w/ this line
row_totals <- clean_dat %>%
dplyr::select(-1) %>% # don't include the first column, even if numeric
dplyr::select_if(is.numeric) %>%
dplyr::transmute(Total = rowSums(., na.rm = na.rm))

dat$Total <- row_totals$Total
}

dat
}
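A minimal base-R sketch of the rules documented in the roxygen header above, useful for checking them without janitor installed. `totals_sketch()` is a hypothetical name and this is not the package's implementation: it assumes column 1 is a label column, sums numeric columns 2:n, puts `fill` into any other column of the totals row, and (like the row branch above) converts to character so that "Total" can be bound onto a factor.

```r
# Hypothetical sketch of the adorn_totals() rules (not janitor's implementation).
# Assumptions: column 1 holds labels; numeric columns 2:n are summed;
# non-numeric columns 2:n receive `fill` in the totals row.
totals_sketch <- function(dat, which = c("row", "col"), fill = "-", na.rm = TRUE) {
  dat <- as.data.frame(dat)                     # coerce to a plain data.frame
  is_num <- vapply(dat, is.numeric, logical(1))
  is_num[1] <- FALSE                            # never sum the label column
  if (!any(is_num)) stop("at least one of columns 2:n must be numeric")

  if ("row" %in% which) {
    # character columns accept "Total" and `fill`, even if they were factors
    dat[!is_num] <- lapply(dat[!is_num], as.character)
    total_row <- lapply(seq_along(dat), function(i) {
      if (is_num[i]) sum(dat[[i]], na.rm = na.rm) else fill
    })
    total_row[[1]] <- "Total"
    dat[nrow(dat) + 1, ] <- total_row           # append the totals row
  }

  if ("col" %in% which) {
    # row-wise sums over numeric columns 2:n only
    dat$Total <- unname(rowSums(dat[is_num], na.rm = na.rm))
  }
  dat
}

d <- data.frame(g = factor(c("a", "b")), x = c(1, 2), note = c("p", "q"), y = c(3, NA))
totals_sketch(d)         # row and column; the bottom-right cell is the grand total
totals_sketch(d, "col")  # column only: `g` stays a factor
```

Converting to character only inside the row branch mirrors the NEWS note that `adorn_totals("col")` keeps a factor first column.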

### Deprecated functions -----------------------------
#' @title Append a totals row to a data.frame.
#'
#' @description

Review comment (Collaborator): Should the help reference that it is deprecated?

Reply from @sfirke (Owner, Author), Mar 28, 2017: Yep, good call. I'll add that.

  • Deprecate the retired add_totals_* functions in their documentation

#' This function excludes the first column of the input data.frame, assuming that it contains a descriptive variable not to be summed.
#' This function is deprecated, use \code{adorn_totals} instead.
#'
#' @param dat an input data.frame with numeric values in all columns beyond the first.
#' @param dat an input data.frame with at least one numeric column.
#' @param fill if there are more than one non-numeric columns, what string should fill the bottom row of those columns?
#' @param na.rm should missing values (including NaN) be omitted from the calculations?
#' @return Returns a data.frame with a totals row, consisting of "Total" in the first column and column sums in the others.
#' @export
@@ -14,22 +72,20 @@
#' add_totals_row


add_totals_row <- function(dat, na.rm = TRUE){
check_all_cols_after_first_are_numeric(dat)
dat[[1]] <- as.character(dat[[1]]) # for binding to the "Total" character value of add-on row
col_totals <- data.frame(x1 = "Total", t(colSums(dat[-1], na.rm = na.rm)), stringsAsFactors = FALSE) %>%
stats::setNames(names(dat))
dplyr::bind_rows(dat, col_totals)
add_totals_row <- function(dat, fill = "-", na.rm = TRUE){
.Deprecated("adorn_totals(\"row\")")
adorn_totals(dat, which = "row", fill = fill, na.rm = na.rm)

}

#' @title Append a totals column to a data.frame.
#'
#' @description
#' This function excludes the first column of the input data.frame, assuming that it contains a descriptive variable not to be summed.
#' This function is deprecated, use \code{adorn_totals} instead.
#'
#' @param dat an input data.frame with numeric values in all columns beyond the first.
#' @param dat an input data.frame with at least one numeric column.
#' @param na.rm should missing values (including NaN) be omitted from the calculations?
#' @return Returns a data.frame with a totals column, consisting of "Total" in the first row and row sums in the others.
#' @return Returns a data.frame with a totals column containing row-wise sums.
#' @export
#' @examples
#' library(dplyr) # for the %>% pipe
@@ -38,8 +94,7 @@ add_totals_row <- function(dat, na.rm = TRUE){
#' add_totals_col

add_totals_col <- function(dat, na.rm = TRUE){
check_all_cols_after_first_are_numeric(dat)
row_totals <- data.frame(Total = rowSums(dat[-1], na.rm = na.rm))
dplyr::bind_cols(dat, row_totals)
.Deprecated("adorn_totals(\"col\")")
adorn_totals(dat, which = "col", fill = "-", na.rm = na.rm)
}
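The deprecated wrappers above follow base R's usual pattern: call `.Deprecated()` first, then forward every argument to the replacement. A small self-contained illustration of that pattern, using hypothetical names (`new_total()` standing in for `adorn_totals()`, `old_total()` for `add_totals_row()`):

```r
# Hypothetical replacement function
new_total <- function(x, na.rm = TRUE) sum(x, na.rm = na.rm)

# Hypothetical deprecated alias: warn, then forward all arguments unchanged
old_total <- function(x, na.rm = TRUE) {
  .Deprecated("new_total")
  new_total(x, na.rm = na.rm)
}

w <- tryCatch(old_total(c(1, NA, 2)), warning = function(w) w)
conditionMessage(w)                        # names 'old_total' and suggests 'new_total'
suppressWarnings(old_total(c(1, NA, 2)))   # still returns the result: 3
```

Forwarding keeps a single implementation, so the old and new names cannot drift apart; the old name only adds the warning.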

27 changes: 14 additions & 13 deletions R/adorn_crosstab.R
@@ -1,9 +1,11 @@
#' @title Add formatting to a crosstabulation table.
#' @title Add presentation formatting to a crosstabulation table.
#'
#' @description
#' Designed to run on the output of a call to \code{crosstab}, this adds formatting, percentage sign, Ns, totals row/column, and custom rounding to a table of numeric values. The result is no longer clean data, but it saves time in reporting table results.
#' Formats a data.frame containing counts of co-occurences of two variables (i.e., a contingency table or crosstab). Adds a mix of percentages, Ns, totals row/column, and custom rounding to a table of integer counts, in the style of a Microsoft Excel PivotTable. The result is no longer clean data, but is an audience-friendly way to report results.
#'
#' Designed to run on the output of a call to \code{janitor::crosstab}, but can be called on any data.frame containing a contingency table, e.g., the result of \code{dplyr::count()} followed by \code{tidyr::spread()}.
#'
#' @param crosstab a data.frame with row names in the first column and numeric values in all other columns. Usually the piped-in result of a call to \code{crosstab} that included the argument \code{percent = "none"}.
#' @param dat a data.frame with row names in the first column and numeric values in all other columns. Usually the piped-in result of a call to \code{crosstab} that included the argument \code{percent = "none"}.
#' @param denom the denominator to use for calculating percentages. One of "row", "col", or "all".
#' @param show_n should counts be displayed alongside the percentages?
#' @param digits how many digits should be displayed after the decimal point?
@@ -28,23 +30,22 @@

# take result of a crosstab() call and print a nice result
#' @export
adorn_crosstab <- function(crosstab, denom = "row", show_n = TRUE, digits = 1, show_totals = FALSE, rounding = "half to even"){
adorn_crosstab <- function(dat, denom = "row", show_n = TRUE, digits = 1, show_totals = FALSE, rounding = "half to even"){
# some input checks
if(! rounding %in% c("half to even", "half up")){stop("'rounding' must be one of 'half to even' or 'half up'")}
check_all_cols_after_first_are_numeric(crosstab)

crosstab[[1]] <- as.character(crosstab[[1]]) # for type matching when binding the word "Total" on a factor
dat[[1]] <- as.character(dat[[1]]) # for type matching when binding the word "Total" on a factor. Moved up to this line so that if only 1st col is numeric, the function errors
if(sum(!unlist(lapply(dat, is.numeric))[-1]) > 0){stop("all columns 2:n in input data.frame must be of class numeric")}

showing_col_totals <- (show_totals & denom %in% c("col", "all"))
showing_row_totals <- (show_totals & denom %in% c("row", "all"))

complete_n <- complete_n <- sum(crosstab[, -1], na.rm = TRUE) # capture for percent calcs before any totals col/row is added
complete_n <- sum(dat[, -1], na.rm = TRUE) # capture for percent calcs before any totals col/row is added

if(showing_col_totals){ crosstab <- add_totals_col(crosstab) }
if(showing_row_totals){ crosstab <- add_totals_row(crosstab) }
n_col <- ncol(crosstab)
if(showing_col_totals){ dat <- adorn_totals(dat, "col") }
if(showing_row_totals){ dat <- adorn_totals(dat, "row") }
n_col <- ncol(dat)

percs <- ns_to_percents(crosstab, denom, total_n = complete_n) # last argument only gets used in the "all" case = no harm in passing otherwise
percs <- ns_to_percents(dat, denom, total_n = complete_n) # last argument only gets used in the "all" case = no harm in passing otherwise

# round %s using specified method, add % sign
percs <- dplyr::mutate_at(percs, dplyr::vars(2:n_col), dplyr::funs(. * 100)) # since we'll be adding % sign - do this before rounding
@@ -56,7 +57,7 @@ adorn_crosstab <- function(crosstab, denom = "row", show_n = TRUE, digits = 1, s

# paste Ns if needed
if(show_n){
result <- paste_ns(percs, crosstab)
result <- paste_ns(percs, dat)
} else{ result <- percs}

as.data.frame(result) # drop back to data.frame from tibble
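To make the new description's "mix of percentages, Ns" concrete, here is a hypothetical base-R helper, `format_pct_n()`, that turns a count table (labels in column 1, counts in columns 2:n) into row-percentage strings with the count in parentheses. It only sketches the output style under that assumption; janitor's `adorn_crosstab()` also handles `denom`, totals, and the two rounding modes, and the use of `formatC()` here is this sketch's choice, not necessarily the package's.

```r
# Hypothetical helper (not janitor's code): row percentages with Ns, e.g. "15.8% (3)".
# Assumes column 1 holds labels and columns 2:n hold counts.
format_pct_n <- function(dat, digits = 1) {
  counts <- as.matrix(dat[-1])
  pcts <- counts / rowSums(counts) * 100   # row denominator
  cells <- paste0(formatC(pcts, format = "f", digits = digits), "% (", counts, ")")
  cells <- matrix(cells, nrow = nrow(counts), dimnames = list(NULL, colnames(counts)))
  data.frame(dat[1], cells, check.names = FALSE, stringsAsFactors = FALSE)
}

# am x cyl counts from mtcars, shaped like the output of crosstab(am, cyl)
tab <- table(mtcars$am, mtcars$cyl)
counts_df <- cbind(data.frame(am = rownames(tab)), as.data.frame.matrix(tab))
format_pct_n(counts_df)
```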
14 changes: 1 addition & 13 deletions R/adorn_helpers.R
@@ -44,16 +44,4 @@ fix_parens_whitespace <- function(x){
fixed = TRUE)
}

}

# check that all columns in a data.frame beyond the first one are numeric
check_all_cols_after_first_are_numeric <- function(x){
non_numeric_count <- x %>%
dplyr::select(-1) %>%
lapply(function(x) !is.numeric(x)) %>%
unlist %>%
sum
if(non_numeric_count > 0){
stop("all columns after the first one must be numeric")
}
}
}
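With `check_all_cols_after_first_are_numeric()` removed, the PR inlines two different column checks: `adorn_totals()` and `ns_to_percents()` need at least one numeric column among 2:n, while `adorn_crosstab()` still requires all of columns 2:n to be numeric. The helpers below are hypothetical (not part of janitor) and simply express the two rules side by side:

```r
# Hypothetical helpers contrasting the two validation rules used in this PR.
# Column 1 is treated as a label column and is never checked.
numeric_after_first <- function(dat) vapply(dat, is.numeric, logical(1))[-1]

# adorn_totals() / ns_to_percents(): at least one of columns 2:n is numeric
has_some_numeric <- function(dat) any(numeric_after_first(dat))

# adorn_crosstab(): every one of columns 2:n is numeric
all_numeric_after_first <- function(dat) all(numeric_after_first(dat))

mixed <- data.frame(g = c("a", "b"), n = 1:2, note = c("p", "q"))
has_some_numeric(mixed)         # TRUE: adorn_totals() would ignore `note`
all_numeric_after_first(mixed)  # FALSE: adorn_crosstab() would stop()
```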
25 changes: 13 additions & 12 deletions R/ns_to_percents.R
@@ -7,6 +7,7 @@
#' @param denom the denominator to use for calculating percentages. One of "row", "col", or "all".
#' @param na.rm should missing values (including NaN) be omitted from the calculations?
#' @param total_n an optional number to use as the denominator when calculating table-level percentages (when denom = "all"). Supply this if your input data.frame \code{dat} has values that would throw off the denominator if they were included, e.g., if there's a totals row appended to the bottom of the table.
#'
#' @return Returns a data.frame of percentages, expressed as numeric values between 0 and 1.
#' @export
#' @examples
@@ -18,32 +19,32 @@
#' # when total_n is needed
#' mtcars %>%
#' crosstab(am, cyl) %>%
#' add_totals_row() %>% # add a totals row that should not be included in the denominator
#' adorn_totals("row") %>% # add a totals row that should not be included in the denominator
#' ns_to_percents(denom = "all", total_n = nrow(mtcars)) # specify correct denominator

ns_to_percents <- function(dat, denom = "row", na.rm = TRUE, total_n = NULL){
# catch bad inputs
if(! denom %in% c("row", "col", "all")){stop("'denom' must be one of 'row', 'col', or 'all'")}
check_all_cols_after_first_are_numeric(dat)

numeric_cols <- which(unlist(lapply(dat, is.numeric)))
numeric_cols <- setdiff(numeric_cols, 1) # assume 1st column should not be included so remove it from numeric_cols. Moved up to this line so that if only 1st col is numeric, the function errors
if(length(numeric_cols) == 0){stop("at least one one of columns 2:n must be of class numeric")}

if(!is.null(total_n)){
if(!is.numeric(total_n)){stop("override_n must be numeric")}
complete_n <- total_n
} else{
complete_n <- sum(dat[, -1], na.rm = TRUE)
complete_n <- sum(dat[, numeric_cols], na.rm = TRUE)
}


n_col <- ncol(dat)


if(denom == "row"){
row_sum <- rowSums(dat[, 2:n_col], na.rm = na.rm)
dat[, 2:n_col] <- dat[, 2:n_col] / row_sum
row_sum <- rowSums(dat[numeric_cols], na.rm = na.rm)
dat[, numeric_cols] <- dat[numeric_cols] / row_sum
} else if(denom == "col"){
col_sum <- colSums(dat[, 2:n_col], na.rm = na.rm)
dat[, 2:n_col] <- sweep(dat[, 2:n_col], 2, col_sum,`/`) # from http://stackoverflow.com/questions/9447801/dividing-columns-by-colsums-in-r
col_sum <- colSums(dat[numeric_cols], na.rm = na.rm)
dat[, numeric_cols] <- sweep(dat[numeric_cols], 2, col_sum,`/`) # from http://stackoverflow.com/questions/9447801/dividing-columns-by-colsums-in-r
} else if(denom == "all"){
dat[, 2:n_col] <- dat[, 2:n_col] / complete_n
dat[numeric_cols] <- dat[numeric_cols] / complete_n
}

dat
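The three `denom` options and the role of `total_n` can be checked by hand on a tiny table. `pct_sketch()` below is a hypothetical base-R re-sketch, not janitor's function; like the new code, it divides only the numeric columns after the first. The last lines show why `total_n` matters once a totals row has been appended: without it, the "all" denominator counts every cell twice.

```r
# Hypothetical re-sketch of ns_to_percents()'s denominators (not janitor's code).
pct_sketch <- function(dat, denom = c("row", "col", "all"), total_n = NULL) {
  denom <- match.arg(denom)
  num <- setdiff(which(vapply(dat, is.numeric, logical(1))), 1L)  # skip column 1
  if (length(num) == 0) stop("at least one of columns 2:n must be numeric")
  m <- as.matrix(dat[num])
  n_all <- if (is.null(total_n)) sum(m, na.rm = TRUE) else total_n
  m <- switch(denom,
    row = m / rowSums(m, na.rm = TRUE),                # each row sums to 1
    col = sweep(m, 2, colSums(m, na.rm = TRUE), `/`),  # each column sums to 1
    all = m / n_all)                                   # divide by the table-level N
  dat[num] <- as.data.frame(m)
  dat
}

counts <- data.frame(g = c("a", "b"), x = c(1, 3), y = c(1, 1))
pct_sketch(counts, "row")  # x: 0.50, 0.75
pct_sketch(counts, "col")  # x: 0.25, 0.75

# After appending a totals row, sum(m) would be 12 instead of the true N of 6
with_total <- rbind(counts, data.frame(g = "Total", x = 4, y = 2))
pct_sketch(with_total, "all", total_n = 6)  # x: 1/6, 3/6, 4/6
```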
8 changes: 7 additions & 1 deletion codecov.yml
@@ -1 +1,7 @@
comment: false
comment:
layout: "reach, diff, flags, files"
behavior: default
require_changes: false # if true: only post the comment if coverage changes
require_base: no # [yes :: must have a base report to post]
require_head: yes # [yes :: must have a head report to post]
branches: null
6 changes: 3 additions & 3 deletions man/add_totals_col.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

8 changes: 5 additions & 3 deletions man/add_totals_row.Rd

10 changes: 6 additions & 4 deletions man/adorn_crosstab.Rd

29 changes: 29 additions & 0 deletions man/adorn_totals.Rd
