Percentages in 1-dim tabyl are not really percentages despite column name. #460

Closed
phargarten2 opened this issue Oct 5, 2021 · 1 comment

Comments

@phargarten2

Thank you for making the tabyl function! It makes viewing tables very easy in R, and I use it all the time in my work.

For one-dimensional tabyls, the numbers in the "percent" column are actually proportions, not percents, unless the table is adorned. For example, looking at the iris data:

library(janitor)
#> 
#> Attaching package: 'janitor'
#> The following objects are masked from 'package:stats':
#> 
#>     chisq.test, fisher.test
library(magrittr)

data(iris)
iris %>% tabyl(Species)
#>     Species  n   percent
#>      setosa 50 0.3333333
#>  versicolor 50 0.3333333
#>   virginica 50 0.3333333

The 0.333 is a proportion, not a percent. I would expect the column header to read "proportion" or, if possible, the values to be actual percents (i.e. multiplied by 100). One way to get actual percents is to apply adorn_pct_formatting() to the 1-dimensional tabyl:

iris %>% tabyl(Species) %>% adorn_pct_formatting()
#>     Species  n percent
#>      setosa 50   33.3%
#>  versicolor 50   33.3%
#>   virginica 50   33.3%

This is what I would expect the table to look like: the column header matches the numbers in the column. But users may forget to add this extra function call, and the unadorned output may confuse the average reader, including someone who isn't a statistician.
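
Until then, a user who wants the label and the values to agree without the "%" formatting could adjust the tabyl by hand. The following is only a minimal sketch of two such workarounds using base R on the data frame that tabyl() returns; the object names tab, tab_prop, and tab_pct are made up for illustration and are not part of janitor.

```r
library(janitor)

# 1-dim tabyl: the "percent" column holds proportions (e.g. 0.3333333)
tab <- tabyl(iris, Species)

# Workaround 1 (illustrative): keep the values, rename the column to match them
tab_prop <- tab
names(tab_prop)[names(tab_prop) == "percent"] <- "proportion"

# Workaround 2 (illustrative): keep the column name, scale the values to percents
# Note: do not also call adorn_pct_formatting() on this, or the values
# would be multiplied by 100 a second time.
tab_pct <- tab
tab_pct$percent <- tab_pct$percent * 100

tab_prop
tab_pct
```

Both are manual steps, so they share the drawback of adorn_pct_formatting(): a user has to remember to apply them.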

Created on 2021-10-05 by the reprex package (v2.0.1)

@sfirke
Owner

sfirke commented Oct 5, 2021

Hello, I'm glad janitor is useful for you in your work. You raise a valid point; however, I plan to leave this as-is. Any proposed change introduces bigger problems, so I think it's best if we all live with this. This has been discussed previously in #342 and #300. #342 makes a better case for why I'm leaving this as "percent" despite it being an inaccurate label.

@sfirke sfirke closed this as completed Oct 5, 2021