Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Violin quantiles are based on observations #5912

Merged
merged 16 commits into from
Jan 27, 2025

Conversation

teunbrand
Copy link
Collaborator

This PR aims to fix #4120.

Briefly, instead of GeomViolin estimating quantiles based on the density, StatYdensity includes quantile as a computed variable. GeomViolin is then able to draw the quantiles based on that computed variable.

I consider this a breaking change as StatYdensity how 'owns' the draw_quantiles parameter, rather than GeomViolin. For users, this should only matter when they are breaking the link between geom_violin() and stat_ydensity(). It is also a visual change, as the quantiles are now correct.

Reprex from the issue, notice how the red quantiles of the violin now aligns with the boxplot quantiles.

devtools::load_all("~/packages/ggplot2")
#> ℹ Loading ggplot2
set.seed(5)

types <- list(
  rnorm(n = 20, mean = 5),
  c(10, 9, 9, 9, 9, 7, 7, 6, 5, 4, 1),
  c(rnorm(n = 10, mean = 2.5), rnorm(n = 10, mean = 7.5, sd = 0.5))
)

df = data.frame(
  type = rep(paste0("Type ", seq_along(types)), lengths(types)),
  val  = unlist(types)
)

ggplot(df, aes(x = type, y = val)) + theme_classic() + 
  geom_boxplot(alpha = 0.5) + 
  geom_violin(scale = "area", alpha = 0.5, draw_quantiles = c(0.25, 0.5, 0.75), 
              colour = "red") +
  geom_dotplot(binaxis = "y", stackdir = "center", alpha = 0.3, dotsize = 0.4)
#> Bin width defaults to 1/30 of the range of the data. Pick better value with
#> `binwidth`.

Created on 2024-05-28 with reprex v2.1.0

@teunbrand teunbrand added the breaking change ☠️ API change likely to affect existing code label May 28, 2024
@thomasp85
Copy link
Member

I'm considering whether draw_quantiles should still be a parameter of the geom. It seems wrong that drawing is controlled by a stat.

Would it make sense to have the stat always compute the quantile, and the geom drawing it depending on the value of draw_quantile?

@teunbrand
Copy link
Collaborator Author

teunbrand commented Jul 12, 2024

In principle I agree with you, though there are some practicalities that may get in the way.
The issue is that draw_quantiles currently is not a boolean and carries the numeric values for which quantiles the stat should know about, but the geom doesn't need to know about.
Another issue is what the default quantiles that are computed should be, but it'd probably should be c(0.25, 0.5, 0.75).

Copy link
Member

@thomasp85 thomasp85 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still do not like how the stat flows into the how the geom behaves. Is it not possible to gracefully deprecate the old behaviour.

So, geom_violin() will check the value of draw_quantile. If it is not a boolean it will emit a deprecation warning and forward the value to quantiles in the stat (which does what draw_quantile does now)

@teunbrand
Copy link
Collaborator Author

We should discuss if we should give this the #5423 treatment. So instead of draw_quantile = TRUE/FALSE we use quantiles.linetype = 0 and add quantiles.colour = NULL, quantiles.linewidth = NULL.

@@ -29,7 +29,7 @@
stat_ydensity <- function(mapping = NULL, data = NULL,
geom = "violin", position = "dodge",
...,
draw_quantiles = NULL,
quantiles = c(0.25, 0.50, 0.75),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to properly deprecate draw_quantiles here

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because stat_ydensity(draw_quantiles) never was a formal argument, I've deprecated in the parameter setup rather than constructor.

@teunbrand teunbrand merged commit 123b26e into tidyverse:main Jan 27, 2025
13 checks passed
@teunbrand teunbrand deleted the violin_data_quantiles branch January 27, 2025 12:19
teunbrand added a commit to teunbrand/ggplot2 that referenced this pull request Jan 27, 2025
teunbrand added a commit that referenced this pull request Jan 27, 2025
* bump NA to back of key

* add test

* add news bullet

* Fix interaction between #5912 and #6072
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change ☠️ API change likely to affect existing code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature request: quantiles based on observations instead of density estimates with geom_violin
2 participants