bring add_totals functions up to v1.0 as add_totals #99
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,9 +1,67 @@ | ||
#' @title Append a totals row and/or column to a data.frame. | ||
#' | ||
#' @description | ||
#' This function excludes the first column of the input data.frame, assuming it's a descriptive variable not to be summed. It also excludes other non-numeric columns. | ||
#' | ||
#' @param dat an input data.frame with at least one numeric column. | ||
#' @param which one of "row", "col", or \code{c("row", "col")} | ||
#' @param fill if there are multiple non-numeric columns, what string should fill the bottom row of those columns? | ||
#' @param na.rm should missing values (including NaN) be omitted from the calculations? | ||
#' @return Returns a data.frame augmented with a totals row, column, or both. | ||
#' @export | ||
#' @examples | ||
#' library(dplyr) # for the %>% pipe | ||
#' mtcars %>% | ||
#' crosstab(am, cyl) %>% | ||
#' add_totals() | ||
|
||
|
||
add_totals <- function(dat, which = c("row", "col"), fill = "-", na.rm = TRUE){ | ||
if("grouped_df" %in% class(dat)){ dat <- dplyr::ungroup(dat) } # grouped_df causes problems, #97 | ||
|
||
if(sum(unlist(lapply(dat, is.numeric))[-1]) == 0){stop("at least one of columns 2:n must be of class numeric")} | ||
|
||
if("row" %in% which){ | ||
dat[[1]] <- as.character(dat[[1]]) # for type matching when binding the word "Total" on a factor when adding Totals row | ||
# creates the totals row to be appended | ||
col_vec <- function(a_col, na_rm = na.rm){ | ||
if(is.numeric(a_col)){ # can't do this with if_else because it doesn't like the sum() of a character vector, even if that clause is not reached | ||
sum(a_col, na.rm = na_rm) | ||
} else {fill} | ||
} | ||
|
||
col_totals <- lapply(dat, col_vec) %>% | ||
as.data.frame(stringsAsFactors = FALSE) %>% | ||
stats::setNames(names(dat)) | ||
|
||
col_totals[nrow(col_totals), 1] <- "Total" # replace final row, first column with "Total" | ||
dat <- dplyr::bind_rows(dat %>% | ||
stats::setNames(names(dat)) %>% | ||
dplyr::mutate_at(1, as.character), col_totals) | ||
} | ||
|
||
if("col" %in% which){ | ||
# Add totals col | ||
clean_dat <- clean_names(dat) # bad names will make select_if choke | ||
Review comment: Reference the relevant dplyr issue here; this should be temporary.
Reply: Done.
|
||
row_totals <- clean_dat %>% | ||
dplyr::select(-1) %>% # don't include the first column, even if numeric | ||
dplyr::select_if(is.numeric) %>% | ||
dplyr::transmute(Total = rowSums(., na.rm = na.rm)) | ||
|
||
dat$Total <- row_totals$Total | ||
} | ||
|
||
dat | ||
} | ||
|
||
### Deprecated functions ----------------------------- | ||
#' @title Append a totals row to a data.frame. | ||
#' | ||
#' @description | ||
Review comment: Should the help reference that it is deprecated?
Reply: Yep, good call. I'll add that.
|
||
#' This function excludes the first column of the input data.frame, assuming that it contains a descriptive variable not to be summed. | ||
#' This function excludes the first column of the input data.frame, assuming it's a descriptive variable not to be summed. It also excludes other non-numeric columns. | ||
#' | ||
#' @param dat an input data.frame with numeric values in all columns beyond the first. | ||
#' @param dat an input data.frame with at least one numeric column. | ||
#' @param fill if there is more than one non-numeric column, what string should fill the bottom row of those columns? | ||
#' @param na.rm should missing values (including NaN) be omitted from the calculations? | ||
#' @return Returns a data.frame with a totals row, consisting of "Total" in the first column and column sums in the others. | ||
#' @export | ||
|
@@ -14,22 +72,20 @@ | |
#' add_totals_row | ||
|
||
|
||
add_totals_row <- function(dat, na.rm = TRUE){ | ||
check_all_cols_after_first_are_numeric(dat) | ||
dat[[1]] <- as.character(dat[[1]]) # for binding to the "Total" character value of add-on row | ||
col_totals <- data.frame(x1 = "Total", t(colSums(dat[-1], na.rm = na.rm)), stringsAsFactors = FALSE) %>% | ||
stats::setNames(names(dat)) | ||
dplyr::bind_rows(dat, col_totals) | ||
add_totals_row <- function(dat, fill = "-", na.rm = TRUE){ | ||
.Deprecated("add_totals(\"row\")") | ||
add_totals(dat, which = "row", fill = fill, na.rm = na.rm) | ||
|
||
} | ||
|
||
#' @title Append a totals column to a data.frame. | ||
#' | ||
#' @description | ||
#' This function excludes the first column of the input data.frame, assuming that it contains a descriptive variable not to be summed. | ||
#' This function excludes the first column of the input data.frame, assuming it's a descriptive variable not to be summed. It also excludes other non-numeric columns. | ||
#' | ||
#' @param dat an input data.frame with numeric values in all columns beyond the first. | ||
#' @param dat an input data.frame with at least one numeric column. | ||
#' @param na.rm should missing values (including NaN) be omitted from the calculations? | ||
#' @return Returns a data.frame with a totals column, consisting of "Total" in the first row and row sums in the others. | ||
#' @return Returns a data.frame with a totals column containing row-wise sums. | ||
#' @export | ||
#' @examples | ||
#' library(dplyr) # for the %>% pipe | ||
|
@@ -38,8 +94,7 @@ add_totals_row <- function(dat, na.rm = TRUE){ | |
#' add_totals_col | ||
|
||
add_totals_col <- function(dat, na.rm = TRUE){ | ||
check_all_cols_after_first_are_numeric(dat) | ||
row_totals <- data.frame(Total = rowSums(dat[-1], na.rm = na.rm)) | ||
dplyr::bind_cols(dat, row_totals) | ||
.Deprecated("add_totals(\"col\")") | ||
add_totals(dat, which = "col", fill = "-", na.rm = na.rm) | ||
} | ||
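The two wrappers above follow a common deprecation pattern: warn, then delegate to the replacement. A minimal base-R sketch of that pattern follows; `old_total` and `new_total` are hypothetical names used only for illustration and are not part of this PR.

```r
# Hypothetical replacement function (illustrative only).
new_total <- function(x, na.rm = TRUE) {
  sum(x, na.rm = na.rm)
}

# Hypothetical deprecated wrapper: warn, then pass every argument through.
old_total <- function(x, na.rm = TRUE) {
  .Deprecated("new_total")  # base R; signals a warning naming the replacement
  new_total(x, na.rm = na.rm)
}

# The wrapper still returns the same value; the warning is silenced here
# only to keep the example output clean.
result <- suppressWarnings(old_total(c(1, 2, NA)))
```

Passing each argument through by name keeps the old entry point behaving like the new one while users migrate.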
|
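To make the row and column logic of `add_totals` in the diff above easier to follow, here is a rough base-R sketch of the same idea. It is not the PR's implementation: `totals_sketch` and its `where` argument are hypothetical names, and it uses `rbind()` rather than `dplyr::bind_rows()`.

```r
# Hypothetical helper (not the PR's code): append a totals row and/or column.
totals_sketch <- function(dat, where = c("row", "col"), fill = "-", na.rm = TRUE) {
  is_num <- vapply(dat, is.numeric, logical(1))
  is_num[1] <- FALSE  # the first column is treated as a label, never summed
  if (!any(is_num)) {
    stop("at least one of columns 2:n must be of class numeric")
  }

  if ("row" %in% where) {
    # Convert label columns to character so "Total" and `fill` can be bound on.
    dat[!is_num] <- lapply(dat[!is_num], as.character)
    totals <- lapply(seq_along(dat), function(i) {
      if (is_num[i]) sum(dat[[i]], na.rm = na.rm) else fill
    })
    totals[[1]] <- "Total"
    total_row <- as.data.frame(totals, stringsAsFactors = FALSE)
    names(total_row) <- names(dat)  # restore the original column names
    dat <- rbind(dat, total_row)
  }

  if ("col" %in% where) {
    # Row-wise sums over the numeric columns, skipping the first column.
    dat$Total <- rowSums(dat[is_num], na.rm = na.rm)
  }

  dat
}

counts <- data.frame(am = c("0", "1"), `4` = c(3, 8), `6` = c(4, 3),
                     check.names = FALSE, stringsAsFactors = FALSE)
result <- totals_sketch(counts)
```

Because the column total is computed after the totals row is appended, the bottom-right cell holds the grand total, which matches the order of operations in the diff.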
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -7,6 +7,7 @@ | |
#' @param denom the denominator to use for calculating percentages. One of "row", "col", or "all". | ||
#' @param na.rm should missing values (including NaN) be omitted from the calculations? | ||
#' @param total_n an optional number to use as the denominator when calculating table-level percentages (when denom = "all"). Supply this if your input data.frame \code{dat} has values that would throw off the denominator if they were included, e.g., if there's a totals row appended to the bottom of the table. | ||
#' | ||
#' @return Returns a data.frame of percentages, expressed as numeric values between 0 and 1. | ||
#' @export | ||
#' @examples | ||
|
@@ -18,32 +19,32 @@ | |
#' # when total_n is needed | ||
#' mtcars %>% | ||
#' crosstab(am, cyl) %>% | ||
#' add_totals_row() %>% # add a totals row that should not be included in the denominator | ||
#' add_totals("row") %>% # add a totals row that should not be included in the denominator | ||
#' ns_to_percents(denom = "all", total_n = nrow(mtcars)) # specify correct denominator | ||
|
||
ns_to_percents <- function(dat, denom = "row", na.rm = TRUE, total_n = NULL){ | ||
Review comment: One option for a v1.0 approach: This is a function primarily intended to modify a tabyl, so it would be an If you included both It sounds hard to make
|
||
# catch bad inputs | ||
if(! denom %in% c("row", "col", "all")){stop("'denom' must be one of 'row', 'col', or 'all'")} | ||
check_all_cols_after_first_are_numeric(dat) | ||
|
||
numeric_cols <- which(unlist(lapply(dat, is.numeric))) | ||
numeric_cols <- setdiff(numeric_cols, 1) # assume 1st column should not be included so remove it from numeric_cols. Moved up to this line so that if only 1st col is numeric, the function errors | ||
if(length(numeric_cols) == 0){stop("at least one of columns 2:n must be of class numeric")} | ||
|
||
if(!is.null(total_n)){ | ||
if(!is.numeric(total_n)){stop("total_n must be numeric")} | ||
complete_n <- total_n | ||
} else{ | ||
complete_n <- sum(dat[, -1], na.rm = TRUE) | ||
complete_n <- sum(dat[, numeric_cols], na.rm = TRUE) | ||
} | ||
|
||
|
||
n_col <- ncol(dat) | ||
|
||
|
||
if(denom == "row"){ | ||
row_sum <- rowSums(dat[, 2:n_col], na.rm = na.rm) | ||
dat[, 2:n_col] <- dat[, 2:n_col] / row_sum | ||
row_sum <- rowSums(dat[numeric_cols], na.rm = na.rm) | ||
dat[, numeric_cols] <- dat[numeric_cols] / row_sum | ||
} else if(denom == "col"){ | ||
col_sum <- colSums(dat[, 2:n_col], na.rm = na.rm) | ||
dat[, 2:n_col] <- sweep(dat[, 2:n_col], 2, col_sum,`/`) # from http://stackoverflow.com/questions/9447801/dividing-columns-by-colsums-in-r | ||
col_sum <- colSums(dat[numeric_cols], na.rm = na.rm) | ||
dat[, numeric_cols] <- sweep(dat[numeric_cols], 2, col_sum,`/`) # from http://stackoverflow.com/questions/9447801/dividing-columns-by-colsums-in-r | ||
} else if(denom == "all"){ | ||
dat[, 2:n_col] <- dat[, 2:n_col] / complete_n | ||
dat[numeric_cols] <- dat[numeric_cols] / complete_n | ||
} | ||
|
||
dat | ||
|
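The change to `ns_to_percents` above replaces the fixed `2:n_col` range with an index of numeric columns, so extra non-numeric columns pass through untouched. A compact base-R sketch of that indexing idea follows; `percents_sketch` is a hypothetical name, and the code is an approximation of the diff rather than a copy of it.

```r
# Hypothetical helper (not the PR's code): convert counts to proportions,
# touching only numeric columns other than the first.
percents_sketch <- function(dat, denom = "row", na.rm = TRUE, total_n = NULL) {
  if (!denom %in% c("row", "col", "all")) {
    stop("'denom' must be one of 'row', 'col', or 'all'")
  }
  numeric_cols <- setdiff(which(vapply(dat, is.numeric, logical(1))), 1)
  if (length(numeric_cols) == 0) {
    stop("at least one of columns 2:n must be of class numeric")
  }

  if (denom == "row") {
    dat[numeric_cols] <- dat[numeric_cols] / rowSums(dat[numeric_cols], na.rm = na.rm)
  } else if (denom == "col") {
    col_sum <- colSums(dat[numeric_cols], na.rm = na.rm)
    dat[numeric_cols] <- sweep(dat[numeric_cols], 2, col_sum, `/`)
  } else {
    # Use the caller-supplied denominator when given, else the table total.
    complete_n <- if (is.null(total_n)) sum(dat[numeric_cols], na.rm = TRUE) else total_n
    dat[numeric_cols] <- dat[numeric_cols] / complete_n
  }
  dat
}

counts <- data.frame(am = c("0", "1"), note = c("a", "b"),
                     four = c(3, 8), six = c(4, 3), stringsAsFactors = FALSE)
by_row <- percents_sketch(counts, denom = "row")
by_col <- percents_sketch(counts, denom = "col")
by_all <- percents_sketch(counts, denom = "all")
```

Here the `note` column is character, so it is skipped by the index instead of triggering an error as the old all-numeric check would have.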
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1,7 @@ | ||
comment: false | ||
comment: | ||
layout: "reach, diff, flags, files" | ||
behavior: default | ||
require_changes: false # if true: only post the comment if coverage changes | ||
require_base: no # [yes :: must have a base report to post] | ||
require_head: yes # [yes :: must have a head report to post] | ||
branches: null |
Review comment: I prefer `adorn_totals` as this is something that is primarily intended to modify a `tabyl`. I think that `adorn_crosstab` may eventually become the outlier in the `adorn_` approach. I think I've given this feedback already, though, so I assume that you have heard it and decided on verb_object approach to things that modify a tabyl.

Reply: I think there's something to this, and I agree that `adorn_crosstab` will become the outlier (a) because the nomenclature of "crosstab" may go away and (b) if we can fully replicate its functionality with modular pieces like this. In that case, I do like the common prefix of `adorn_` and could see this as `adorn_totals`.

I want to merge this in now rather than wait for us to figure out #101, since this closes existing bugs. I'm not sure if it should be `add_totals` when we might later re-route people to `adorn_totals`... maybe `adorn_totals` now. I think this won't be set in stone until we resolve #101, the big question, and so no matter what we call it now, the name and behavior of the totals function could ultimately change. Say, if the totals function needs to add an attribute to a tabyl to convey info to `ns_to_percents` down the line.

Reply: okay, I renamed it `adorn_totals`.
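The idea floated in the thread, having the totals function attach an attribute that a later percentage step can read, could look roughly like the sketch below. Everything here is an assumption for illustration: the function names and the `"totals"` attribute name are hypothetical, and the thread does not settle on any such design.

```r
# Hypothetical: append a totals row and record that fact in an attribute.
add_totals_row_sketch <- function(dat) {
  dat[[1]] <- as.character(dat[[1]])
  total_row <- dat[1, , drop = FALSE]        # copy one row to reuse its layout
  total_row[[1]] <- "Total"
  total_row[-1] <- lapply(dat[-1], sum)      # assumes columns 2:n are numeric
  out <- rbind(dat, total_row)
  attr(out, "totals") <- "row"               # hypothetical attribute name
  out
}

# Hypothetical: table-level percentages that leave a flagged totals row
# out of the denominator, so no manual total_n is needed.
table_percents_sketch <- function(dat) {
  body <- dat
  if (identical(attr(dat, "totals"), "row")) {
    body <- dat[-nrow(dat), , drop = FALSE]
  }
  n <- sum(body[-1])
  dat[-1] <- dat[-1] / n
  dat
}

counts <- data.frame(am = c("0", "1"), four = c(3, 8), six = c(4, 3),
                     stringsAsFactors = FALSE)
with_totals <- add_totals_row_sketch(counts)
pct <- table_percents_sketch(with_totals)
```

With the attribute in place, the percentage step can work out the right denominator on its own, which is the job `total_n` currently does by hand in the `ns_to_percents` example.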