adorn_crosstab's helper functions always skip the first column #57

sfirke · 2016-08-28T01:57:43Z

add_totals_row, add_totals_col, and ns_to_percents all have it hard-coded that the first column is a character vector to be ignored. Fine when they were non-exported helper functions for adorn_crosstab, but if exported, this should be more flexible. There could be multiple non-numeric columns, in any order, or none.

Default to skipping an arbitrarily-wide number of initial non-numeric columns? Or detect non-numeric columns, and exclude (for summing) or leave as is (for in-place conversion to percentages)?

The text was updated successfully, but these errors were encountered:

sfirke · 2017-01-31T04:27:46Z

I guess the latter? Here's a good use case for this:
http://stackoverflow.com/questions/41949529/frequency-table-and-group-by-multiple-variables-in-r

per #57

sfirke · 2017-02-03T02:50:42Z

I kind of went with a hybrid. add_totals_col is simple, you just skip the non-numeric columns.

For the other two, add_totals_row and ns_to_percents, the first column always gets left alone. The result of say mtcars %>% crosstab(cyl, carb) will have a numeric first column. And I could make crosstab results have a character-class first column, but many users will be coming to this from dplyr::count + tidyr::spread and would still be in that pickle of their numeric 1st column getting summed when in practice it's not truly a numeric value anymore. And I don't like coercing the 1st column of the input data.frame to a character, since then the result can't be filtered for say am > 0.

So, first column always gets skipped and the word "Total" placed at the bottom.

But other non-numeric columns can occur in any position, mixed in among numeric vectors. This could be nice, say if a data source has a right-most text column but you still want a totals row.

Oh and if there are those additional non-numeric columns, and you call add_totals_row, there's a fill argument to specify the character string that goes in those bottom row cells.

@rgknight what do you think about that approach for the three functions, in particular add_totals_row and ns_to_percents which are more complicated? You can check out the "improve_adorn_helpers" branch, it is working. Try:

dat <- structure(list(a = c("hi", "lo"), b = c(1, 2), c = c(5, 10), 
    d = c("big", "small"), e = c(20, NA)), .Names = c("a", "b", 
"c", "d", "e"), row.names = c(NA, -2L), class = "data.frame")

to play around with.

Then if you are good with it, I'll write tests and merge it in pending Travis/codecov. But I didn't want to write tests yet in case you had other thoughts re: how these should work.

rgknight · 2017-02-06T20:12:26Z

For version 0.X:

I think ns_to_percents may not be working yet. The approach to add looks good.

For version 1.0:

I think there are two potential approaches that will work well here. One is to make everything an option instead of a helper function, and the other is to create a short prefix for all tabyl modifiers so that you know they are intended to modify a tabyl (assuming that crosstab and tabyl are eventually merged.

These would both mean moving away from making these fully-featured export functions. I think you will get into trouble if you start trying to support use cases for these functions other than modifying a tabyl. I keep going back to Hadley's idea that tidyr does less than reshape2 -- I think you will be more successful if you focus on making tabyl really good pivot table alternative.

If these were options, I would use:

tabyl(df, rows, cols, split, values = c("N", "row_percent", "col_percent"), totals = c("row", "col"))

Then row_percent and col_percent would be like adding ns_to_percents with the respective denom, and totals = "row" would give you the same thing as adding %>% adorn_total_row.

(Random thought: maybe we eliminate the split and just split on grouped vars when df is a grouped_df. Might be a bad idea though).

The second approach would be to make these all start with something like adorn_ or fmt_ (for format) so that users know that all of these things are intended to modify a tabyl. Then you would do

mtcars %>%
  tabyl(am, cycle) %>%
  adorn_values(row_percent) %>%
  adorn_totals(row)

Or something like that, which would let us add multiple values under adorn_values in the future. Looking at both, I'm leaning toward the adorn_ approach.

sfirke · 2017-03-17T20:28:57Z

Yeah I like the adorn_ approach better. I've had people using the add_totals_ functions with non-tabyl outputs, which would be lost if we made it an option to tabyl. Also a tabyl in 1.0 that subsumes crosstab will already be a more complex function, and starts to exceed the tidyverse principle of bite-sized simple functions.

I never loved "adorn" but at least that word is short and isn't used widely - and I couldn't think of a better one.

Well maybe I should take this to the 1.0 version on the "improve_adorn_helpers" branch, make both of the add_totals functions into a single function adorn_totals or adorn_add_totals (?) that takes "col" "row" or "both". I think ns_to_percents() isn't as simple a case so I'll just improve it slightly as stated above, for now.

sfirke · 2017-03-17T20:31:21Z

For consistency, should remove_empty_rows and remove_empty_cols be a single function too? remove_empty("cols") and it defaults to "both"? Dunno, "remove_empty()" isn't fully self-descriptive. remove_empty_lines()? Line is kind of synonymous with "row" though.

sfirke · 2017-03-17T20:32:54Z

@rgknight I'm grateful for your vision for 1.0 and a cleaner API, at the expense of some small helper functions or trying to do everything. The more I think about it, the more I am convinced it's the way to go.

rgknight · 2017-03-17T20:59:35Z

:-) Very glad that you have found this helpful. Huge props to you for being open to the feedback. It's not at all easy to open up your vision in this way!

I really like remove_empty() with it defaulting to both rows and columns, similar to how distinct() works now in dplyr.

See #57

sfirke · 2017-03-19T03:10:11Z

Whew I got the two add_totals functions merged into a single function and passing tests. The remaining things to do:

Decide what to call it. If adorn_totals, it clashes a little semantically with adorn_crosstab because in the latter, it's the crosstab being adorned.
Deprecate the old functions, add_totals_row_ and _col
Fill in a few unit tests that have long been missing, like whether na.rm flag works. Skeletons already exist for them.

@rgknight thoughts on keeping it as add_totals vs. switching to adorn_totals? You can add totals already through adorn_crosstab so I don't know how helpful it is to use the same word to draw associations between the two.

sfirke · 2017-03-22T02:45:02Z

I'll just go with add_totals for now.

sfirke added a commit that referenced this issue Feb 2, 2017

add_totals functions now work w/ non-numeric cols

3fbd673

per #57

sfirke self-assigned this Feb 3, 2017

sfirke added the enhancement label Feb 3, 2017

sfirke mentioned this issue Mar 17, 2017

add_totals_row() fails on grouped data_frame #97

Closed

sfirke added a commit that referenced this issue Mar 19, 2017

eliminate check of non-numeric 1st col for add_totals functions

4e8950e

See #57

This was referenced Mar 22, 2017

bring add_totals functions up to v1.0 as add_totals #99

Merged

Combine remove_empty functions, like how add_totals were combined #100

Closed

sfirke closed this as completed in #99 Mar 30, 2017

sydanny mentioned this issue Apr 16, 2019

adorn_total("col") still skips first column even when numeric #285

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

adorn_crosstab's helper functions always skip the first column #57

adorn_crosstab's helper functions always skip the first column #57

sfirke commented Aug 28, 2016

sfirke commented Jan 31, 2017

sfirke commented Feb 3, 2017 •

edited

Loading

rgknight commented Feb 6, 2017

sfirke commented Mar 17, 2017

sfirke commented Mar 17, 2017

sfirke commented Mar 17, 2017

rgknight commented Mar 17, 2017

sfirke commented Mar 19, 2017 •

edited

Loading

sfirke commented Mar 22, 2017

adorn_crosstab's helper functions always skip the first column #57

adorn_crosstab's helper functions always skip the first column #57

Comments

sfirke commented Aug 28, 2016

sfirke commented Jan 31, 2017

sfirke commented Feb 3, 2017 • edited Loading

rgknight commented Feb 6, 2017

sfirke commented Mar 17, 2017

sfirke commented Mar 17, 2017

sfirke commented Mar 17, 2017

rgknight commented Mar 17, 2017

sfirke commented Mar 19, 2017 • edited Loading

sfirke commented Mar 22, 2017

sfirke commented Feb 3, 2017 •

edited

Loading

sfirke commented Mar 19, 2017 •

edited

Loading