
Error : Computation failed in stat_bin(): binwidth must be positive #3043

Closed
rajkstats opened this issue Dec 20, 2018 · 14 comments · Fixed by #3047

Comments

@rajkstats

This might be related to a previous issue:
#2312

I am using the function below to create multiple histograms in one plot. It seems that features with zero variance (only one distinct value) are not getting plotted. This worked with a previous version of ggplot2. Please help me fix this.

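library(tidyverse)  # dplyr, tidyr, ggplot2, forcats
library(tidyquant)  # palette_light(), theme_tq()

plot_hist_facet <- function(data, bins = 10, ncol = 5,
                            fct_reorder = FALSE, fct_rev = FALSE,
                            fill = palette_light()[[3]],
                            color = "white", scale = "free") {

  # Convert character and factor columns to numeric codes, then reshape
  # to long format: one row per (key, value) pair
  data_factored <- data %>%
    mutate_if(is.character, as.factor) %>%
    mutate_if(is.factor, as.numeric) %>%
    gather(key = key, value = value, factor_key = TRUE)

  # Optionally re-level the facet keys alphabetically
  if (fct_reorder) {
    data_factored <- data_factored %>%
      mutate(key = as.character(key) %>% as.factor())
  }

  # Optionally reverse the facet order
  if (fct_rev) {
    data_factored <- data_factored %>%
      mutate(key = fct_rev(key))
  }

  # One histogram per feature, each facet with its own axis scales
  g <- data_factored %>%
    ggplot(aes(x = value, group = key)) +
    geom_histogram(bins = bins, fill = fill, color = color) +
    facet_wrap(~ key, ncol = ncol, scales = scale) +
    theme_tq()

  return(g)
}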

I'm getting the error `Computation failed in stat_bin(): binwidth must be positive`.

@jennybc
Member

jennybc commented Dec 21, 2018

This looks like an issue where you're using reprex (this repo, which helps with reprex mechanics) to pose a question about ggplot2, yes? In that case, I recommend you close this issue and open the same one over on https://github.com/tidyverse/ggplot2/issues

@batpigandme batpigandme transferred this issue from tidyverse/reprex Dec 21, 2018

@clauswilke
Member

Please reduce your code example to the absolute minimum necessary to produce the issue and then make it reproducible by running it through the reprex package. These articles may help:

  1. https://reprex.tidyverse.org/articles/articles/learn-reprex.html
  2. https://reprex.tidyverse.org/articles/articles/magic-reprex.html

@rajkstats
Author

Thanks @batpigandme for moving the issue to ggplot2. Thanks @jennybc & @clauswilke for pointing to reprex. This is a really nice way to have a conversation.

I hope the snippet below helps. I am trying to create multiple histograms in one plot, and features with zero variance (only one distinct value), such as Over 18 and Standard Hours, are not getting plotted. Please help me resolve this.

library(readxl)
library(httr)
library(tidyverse)
library(tidyquant)
#> Loading required package: lubridate
#> 
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#> 
#>     date
#> Loading required package: PerformanceAnalytics
#> Loading required package: xts
#> Loading required package: zoo
#> 
#> Attaching package: 'zoo'
#> The following objects are masked from 'package:base':
#> 
#>     as.Date, as.Date.numeric
#> 
#> Attaching package: 'xts'
#> The following objects are masked from 'package:dplyr':
#> 
#>     first, last
#> 
#> Attaching package: 'PerformanceAnalytics'
#> The following object is masked from 'package:graphics':
#> 
#>     legend
#> Loading required package: quantmod
#> Loading required package: TTR
#> Version 0.4-0 included new data defaults. See ?getSymbols.

url <- "https://community.watsonanalytics.com/wp-content/uploads/2016/06/HR-Employee-Attrition-data.xlsx"
GET(url, write_disk(tf <- tempfile(fileext = ".xlsx")))
#> Response [https://community.watsonanalytics.com/wp-content/uploads/2016/06/HR-Employee-Attrition-data.xlsx]
#>   Date: 2018-12-21 19:19
#>   Status: 200
#>   Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
#>   Size: 266 kB
#> <ON DISK>  /var/folders/gl/wb7rqfgd3v708q5m0ydj3pkc0000gq/T//RtmpYF44cr/filef2d148f65c0.xlsx
data <- read_excel(tf)

bins <- 10
ncol <- 5
fct_reorder <- FALSE
fct_rev <- FALSE
fill <- palette_light()[[3]]
color <- "white"
scale <- "free"

data_factored <- data %>%
  mutate_if(is.character, as.factor) %>%
  mutate_if(is.factor, as.numeric) %>%
  gather(key = key, value = value, factor_key = TRUE)

if(fct_reorder) {
  data_factored <- data_factored %>%
    mutate(key = as.character(key) %>% as.factor())
}

if(fct_rev){
  data_factored <- data_factored %>%
    mutate(key = fct_rev(key))
}

g <- data_factored %>%
  ggplot(aes(x = value, group = key)) +
  geom_histogram(bins = bins, fill = fill, color = color) +
  facet_wrap(~ key, ncol = ncol, scales = scale) +
  theme_tq()
g
#> Warning: Computation failed in `stat_bin()`:
#> `binwidth` must be positive

#> Warning: Computation failed in `stat_bin()`:
#> `binwidth` must be positive

#> Warning: Computation failed in `stat_bin()`:
#> `binwidth` must be positive

Created on 2018-12-22 by the reprex package (v0.2.1)

@ptoche

ptoche commented Dec 21, 2018

This does not look like a "minimal example"... Can you identify which of these many plots is associated with the error (retry plot by plot)? Judging from the error message, are you attempting to set too many bins (10) for too little data?

@rajkstats
Copy link
Author

rajkstats commented Dec 21, 2018

Hey @ptoche, thanks for your response. I hope the following example helps. I tried plotting one feature at a time and still get the same warning. The example below sets bins = 5 and produces the same error.
The features Employee Count, Over 18, and Standard Hours are the ones associated with the error/warning.

library(readxl)
library(httr)
library(tidyverse)
library(tidyquant)
#> Loading required package: lubridate
#> 
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#> 
#>     date
#> Loading required package: PerformanceAnalytics
#> Loading required package: xts
#> Loading required package: zoo
#> 
#> Attaching package: 'zoo'
#> The following objects are masked from 'package:base':
#> 
#>     as.Date, as.Date.numeric
#> 
#> Attaching package: 'xts'
#> The following objects are masked from 'package:dplyr':
#> 
#>     first, last
#> 
#> Attaching package: 'PerformanceAnalytics'
#> The following object is masked from 'package:graphics':
#> 
#>     legend
#> Loading required package: quantmod
#> Loading required package: TTR
#> Version 0.4-0 included new data defaults. See ?getSymbols.

url <- "https://community.watsonanalytics.com/wp-content/uploads/2016/06/HR-Employee-Attrition-data.xlsx"
GET(url, write_disk(tf <- tempfile(fileext = ".xlsx")))
#> Response [https://community.watsonanalytics.com/wp-content/uploads/2016/06/HR-Employee-Attrition-data.xlsx]
#>   Date: 2018-12-21 20:22
#>   Status: 200
#>   Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
#>   Size: 266 kB
#> <ON DISK>  /var/folders/gl/wb7rqfgd3v708q5m0ydj3pkc0000gq/T//RtmpfHG0YO/filef3692827f789.xlsx
data <- read_excel(tf)

bins <- 5
ncol <- 1
fill <- palette_light()[[3]]
color <- "white"
scale <- "free"

zeroVar <- function(data, useNA = 'ifany') {
  out <- apply(data, 2, function(x) {length(table(x, useNA = useNA))})
  which(out==1)
}

# The following features produce the error
zeroVar(data)
#> Employee Count        Over 18 Standard Hours 
#>              1              8             23

data_factored <- data %>%
  select('Employee Count', 'Over 18', 'Standard Hours') %>%
  mutate_if(is.character, as.factor) %>%
  mutate_if(is.factor, as.numeric) %>%
  gather(key = key, value = value, factor_key = TRUE)

data_factored %>%
  ggplot(aes(x = value, group = key)) +
  geom_histogram(bins = bins, fill = fill, color = color) +
  facet_wrap(~ key, ncol = ncol, scales = scale) +
  theme_tq()
#> Warning: Computation failed in `stat_bin()`:
#> `binwidth` must be positive

#> Warning: Computation failed in `stat_bin()`:
#> `binwidth` must be positive

#> Warning: Computation failed in `stat_bin()`:
#> `binwidth` must be positive

Created on 2018-12-22 by the reprex package (v0.2.1)
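Until the single-value case is handled, one possible workaround is to drop zero-variance columns before reshaping, so that every remaining facet has a non-zero range to bin. A minimal base-R sketch, assuming a data frame like the one read above; the helper `drop_constant_cols()` and the `toy` data are hypothetical stand-ins, not part of any package:

```r
# Hypothetical helper: keep only columns with more than one distinct
# non-missing value, so every facet has a non-zero range to bin.
drop_constant_cols <- function(data) {
  keep <- vapply(
    data,
    function(x) length(unique(x[!is.na(x)])) > 1,
    logical(1)
  )
  data[, keep, drop = FALSE]
}

# Toy data standing in for the HR spreadsheet
toy <- data.frame(
  `Employee Count` = rep(1, 5),
  Age = c(25, 31, 40, 28, 35),
  check.names = FALSE
)
names(drop_constant_cols(toy))
#> [1] "Age"
```

The trade-off is that constant features simply disappear from the faceted plot instead of producing warnings; they are not drawn as bars.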

@ptoche

ptoche commented Dec 23, 2018

Can you cut that down to one plot, dput the minimal data (only the variables used), and reprex it?

@rajkstats
Copy link
Author

Thanks @ptoche for following up. I hope the following is what you were looking for to reproduce the problem:

library(readxl)
library(httr)
library(tidyverse)
library(tidyquant)
#> Loading required package: lubridate
#> 
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#> 
#>     date
#> Loading required package: PerformanceAnalytics
#> Loading required package: xts
#> Loading required package: zoo
#> 
#> Attaching package: 'zoo'
#> The following objects are masked from 'package:base':
#> 
#>     as.Date, as.Date.numeric
#> 
#> Attaching package: 'xts'
#> The following objects are masked from 'package:dplyr':
#> 
#>     first, last
#> 
#> Attaching package: 'PerformanceAnalytics'
#> The following object is masked from 'package:graphics':
#> 
#>     legend
#> Loading required package: quantmod
#> Loading required package: TTR
#> Version 0.4-0 included new data defaults. See ?getSymbols.

url <- "https://community.watsonanalytics.com/wp-content/uploads/2016/06/HR-Employee-Attrition-data.xlsx"
GET(url, write_disk(tf <- tempfile(fileext = ".xlsx")))
#> Response [https://community.watsonanalytics.com/wp-content/uploads/2016/06/HR-Employee-Attrition-data.xlsx]
#>   Date: 2018-12-24 21:07
#>   Status: 200
#>   Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
#>   Size: 266 kB
#> <ON DISK>  /var/folders/gl/wb7rqfgd3v708q5m0ydj3pkc0000gq/T//RtmpA1oFEE/file108fe1d346270.xlsx
data <- read_excel(tf)
data <- data %>%
  select('Over 18')

bins <- 5
ncol <- 1
fct_reorder <- FALSE
fct_rev <- FALSE
fill <- palette_light()[[3]]
color <- "white"
scale <- "free"

data_factored <- data %>%
  mutate_if(is.character, as.factor) %>%
  mutate_if(is.factor, as.numeric) %>%
  gather(key = key, value = value, factor_key = TRUE)

dput(head(data_factored,10))
#> structure(list(key = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
#> 1L, 1L, 1L), .Label = "Over 18", class = "factor"), value = c(1, 
#> 1, 1, 1, 1, 1, 1, 1, 1, 1)), row.names = c(NA, -10L), class = c("tbl_df", 
#> "tbl", "data.frame"))

data_factored %>%
  ggplot(aes(x = value, group = key)) +
  geom_histogram(bins = bins, fill = fill, color = color) +
  facet_wrap(~ key, ncol = ncol, scales = scale) +
  theme_tq()
#> Warning: Computation failed in `stat_bin()`:
#> `binwidth` must be positive

Created on 2018-12-25 by the reprex package (v0.2.1)

@yutannihilation
Member

You can use the output of `dput()` as the minimal data, so that others can reproduce your code directly.

data_factored <- structure(list(key = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L), .Label = "Over 18", class = "factor"), value = c(1, 
1, 1, 1, 1, 1, 1, 1, 1, 1)), row.names = c(NA, -10L), class = c("tbl_df", 
"tbl", "data.frame"))

data_factored %>%
  ggplot(aes(x = value, group = key)) +
  geom_histogram(bins = bins, fill = fill, color = color) +
  facet_wrap(~ key, ncol = ncol, scales = scale) +
  theme_tq()
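`dput()` output can serve as minimal data because it is R code that rebuilds the object when evaluated. A quick base-R check of that round trip, using a small toy data frame shaped like `data_factored` (an assumption, not the original spreadsheet):

```r
# Toy single-value data, similar in shape to data_factored
d <- data.frame(key = factor(rep("Over 18", 3)), value = rep(1, 3))

# dput() prints R code that rebuilds `d`; capture that code as text
code <- paste(capture.output(dput(d)), collapse = "\n")

# Evaluating the text recreates an identical object
d2 <- eval(parse(text = code))
identical(d, d2)
#> [1] TRUE
```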

I guess the code below is the minimal version of the problem: if there's only one distinct value, `stat_bin()` cannot calculate the binwidth.

library(ggplot2)

d <- data.frame(x = rep(1, 100))
ggplot(d, aes(x = x)) +
  geom_histogram()
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> Warning: Computation failed in `stat_bin()`:
#> `binwidth` must be positive

Created on 2018-12-25 by the reprex package (v0.2.1)

This is because `bin_breaks_bins()` fails if the lower and upper ends of the range are the same. Maybe it should fall back to a fixed value in such cases? At the very least, I want to make the error message friendlier.

From ggplot2/R/bin.R, line 104 (at commit 7f13dfa):

width <- (x_range[2] - x_range[1]) / (bins - 1)
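Plugging single-valued data into that line shows where the zero comes from. A minimal base-R sketch that just evaluates the quoted formula by hand, assuming the default `bins = 30` reported in the message above:

```r
x <- rep(1, 100)
bins <- 30                    # default reported by stat_bin() when bins is unset
x_range <- range(x)           # c(1, 1): lower and upper ends coincide
width <- (x_range[2] - x_range[1]) / (bins - 1)
width
#> [1] 0
```

For any `bins` greater than 1, the numerator is 0, so the width is 0, which is exactly what the "binwidth must be positive" check rejects. The number of observations plays no role; only the range does.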

@ptoche
Copy link

ptoche commented Dec 25, 2018

@yutannihilation: this is pretty much the problem we suspected from the start. I don't think the error message is that bad... Having said that, a possible improvement would be to set `bins` to 1 if there isn't enough data to compute a width, e.g. along the lines of `if (length(unique(na.omit(data[, "x"]))) == 1) bins = 1`, so that a single bin is used in such degenerate cases.
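The degenerate-case test suggested above can be written as a small predicate. A base-R sketch; the name `is_single_valued()` is hypothetical, and it takes a plain vector rather than `data[, "x"]`:

```r
# Hypothetical predicate: TRUE when x has exactly one distinct non-missing value
is_single_valued <- function(x) {
  length(unique(stats::na.omit(x))) == 1
}

is_single_valued(rep(1, 100))
#> [1] TRUE
is_single_valued(c(1, NA, 1))
#> [1] TRUE
is_single_valued(c(1, 2, 3))
#> [1] FALSE
```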

@yutannihilation
Member

Thanks, but I think `geom_histogram(bins = 1)` still gives an error, since `bin_breaks_bins()` cannot calculate the binwidth for a zero-width range. So I think what we need to special-case here is the width.
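With `bins = 1`, a single bin's width is naturally the width of the whole range, which is zero here, so changing the number of bins alone does not help. A hypothetical base-R sketch of special-casing the width instead; `pick_bin_width()` is an illustrative name, the fallback of 0.1 is an arbitrary assumption, and this is not ggplot2's `bin_breaks_bins()`:

```r
# Hypothetical sketch, not ggplot2's implementation: choose a bin width from
# a data range and a number of bins, special-casing a zero-width range.
pick_bin_width <- function(x_range, bins = 30, fallback = 0.1) {
  span <- x_range[2] - x_range[1]
  if (span == 0) {
    return(fallback)            # degenerate range: use an arbitrary fixed width
  }
  if (bins == 1) {
    return(span)                # one bin covering the whole range
  }
  span / (bins - 1)             # same formula as the line quoted from bin.R
}

pick_bin_width(c(1, 1), bins = 1)   # a naive single bin would have width 0
#> [1] 0.1
pick_bin_width(c(0, 29), bins = 30)
#> [1] 1
```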

@ptoche

ptoche commented Dec 26, 2018

@yutannihilation, indeed you're right: `geom_histogram()` would produce an error in this case. But I don't think that's the correct behaviour either. I think a histogram for a single category ought to be... a simple bar chart. Basically, `geom_histogram()` should fall back to `geom_bar()`.

From a basic definition of a histogram (Wikipedia, I'm afraid): "To construct a histogram, divide the entire range of values into a series of intervals and then count how many values fall into each interval." In the case you have highlighted, `data.frame(x = rep(1, 100))`, my instinct would be to draw a single bin of some arbitrary width. What do you think?

What I mean is that I think the histogram should look like this:

library(ggplot2)
d <- data.frame(x = rep(1, 100))
ggplot(d, aes(x = x)) +
    geom_bar()

Created on 2018-12-26 by the reprex package (v0.2.1)

Does that make sense?
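Until something like that exists in ggplot2, the fallback could be approximated on the user side by choosing the layer from the data. A sketch assuming ggplot2 is installed; `histogram_or_bar()` is a hypothetical helper, not a ggplot2 function:

```r
library(ggplot2)

# Hypothetical helper: a bar chart when x has a single distinct non-missing
# value (where stat_bin() cannot compute a width), otherwise a histogram.
histogram_or_bar <- function(x, bins = 30) {
  if (length(unique(stats::na.omit(x))) == 1) {
    geom_bar()
  } else {
    geom_histogram(bins = bins)
  }
}

d <- data.frame(x = rep(1, 100))
p <- ggplot(d, aes(x = x)) + histogram_or_bar(d$x)
# Printing p draws one bar at x = 1 (count 100) instead of an empty panel
```

Note that `geom_bar()` places bars at the discrete value itself, so the single bar has a default width rather than one derived from the data range.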

@yutannihilation
Member

I basically agree with you that this is a problem; I just disagreed with this part:

a possible improvement would be to set bins to 1

@lock

lock bot commented Jul 25, 2019

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

@lock lock bot locked and limited conversation to collaborators Jul 25, 2019