feature request: pp_check histograms for ordinal regression #73

silberzwiebel · 2017-02-15T12:32:17Z

Hi,

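I'm running ordinal regression with 'brms' and would like to produce a plot similar to what Kruschke does in his book:

[image: bildschirmfoto 2017-02-15 um 13 22 31] <https://cloud.githubusercontent.com/assets/4533862/22974356/ca638966-f379-11e6-9ca5-cff576b1ac70.png>

Running the default pp_check gives me continuous lines, which is misleading because the data are ordinal:

[image: bildschirmfoto 2017-02-15 um 13 28 58] <https://cloud.githubusercontent.com/assets/4533862/22974523/97aa0d5a-f37a-11e6-8b70-ffe6600a3c71.png>

I know there is a histogram style, but not in overlay mode, which makes the plots rather big and not that easy to compare.
I also tried the new rootogram style, which works well but, if I understand it correctly, is not suited for ordinal models because the expected values are still shown as continuous.

[image: bildschirmfoto 2017-02-15 um 13 29 30] <https://cloud.githubusercontent.com/assets/4533862/22974527/9b5ee0ec-f37a-11e6-8767-b9bd0d20e10c.png>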

I think Kruschke's style is quite nice to see data and predictions (+ uncertainty) at one glance.

Thanks for your efforts!

jgabry · 2017-02-15T17:42:09Z

Thanks for the feature request! I definitely want to add some plots that are more useful for ordinal models than what we currently have. In the Kruschke plot I assume the histogram is the observed y and the blue intervals are computed from the posterior predictive simulations, right? It shouldn't be hard to add something like that.

Also, I'm curious whether the overlaid ecdf plot already implemented in bayesplot would be useful here. What does that plot look like for your model? (You should be able to make it by calling ppc_ecdf_overlay via pp_check.)

silberzwiebel · 2017-02-16T09:39:38Z

> In the Kruschke plot I assume the histogram is the observed y and the blue intervals are computed from the posterior predictive simulations, right? It shouldn't be hard to add something like that.

Yes, right. Here's the corresponding quote from the book: "The top-right subpanel of Figure 23.2 superimposes posterior predictive probabilities of the outcomes. At each outcome value, a dot plots the median posterior predictive probability and a vertical segment indicates the 95% HDI of posterior predictive probabilities."
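For readers who want to reproduce that summary by hand, here is a minimal sketch in base R. It assumes `y` is a vector of observed integer categories and `yrep` is a draws-by-observations matrix of posterior predictive categories; both are simulated here, and all variable names are illustrative. Equal-tailed 95% quantile intervals stand in for the HDI that Kruschke uses.

```r
set.seed(1)

# Simulated stand-ins for real data and posterior predictive draws (assumption).
K <- 5                                   # number of ordinal categories
n_obs <- 200
n_draws <- 500
true_probs <- c(0.1, 0.2, 0.4, 0.2, 0.1)
y <- sample(1:K, n_obs, replace = TRUE, prob = true_probs)
yrep <- matrix(sample(1:K, n_draws * n_obs, replace = TRUE, prob = true_probs),
               nrow = n_draws)

# Proportion of each category within each draw: an n_draws x K matrix.
prop_by_draw <- t(apply(yrep, 1, function(draw) {
  tabulate(draw, nbins = K) / length(draw)
}))

# Per-category summary: observed proportion, predictive median, 95% interval.
summary_tab <- data.frame(
  category = 1:K,
  observed = tabulate(y, nbins = K) / n_obs,
  median   = apply(prop_by_draw, 2, median),
  lower    = apply(prop_by_draw, 2, quantile, probs = 0.025),
  upper    = apply(prop_by_draw, 2, quantile, probs = 0.975)
)
print(summary_tab)
```

Each row corresponds to one bar (observed) with one dot and segment (median, lower, upper) in the plot described in the quote.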

> Also, I'm curious if the overlaid ecdf plot already implemented in bayesplot would be useful here. What does that plot look like for your model? (you should be able to make that by calling ppc_ecdf_overlay via pp_check).

The ecdf_overlay for my model is in the attachment (let's see how github's issue plays with e-mail attachments ...)

silberzwiebel · 2017-02-16T09:44:30Z

E-mail replies work with neither markdown nor images, good to know. Never mind, here's the ecdf_overlay picture:

jgabry · 2017-02-17T16:53:23Z

Cool, thanks. We can definitely add a plot like that. Also, I think the ecdf plot does OK here, although I should look at it for some models that don't fit well too. It might need more yrep draws to be clearer, but it shows that the model does well at predicting the proportion of observations in category <= j. Unlike the regular density plot, the cdf plot should be a good option for discrete data as well as continuous data.
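The statement about the proportion of observations in category <= j can also be checked numerically. The following is a rough sketch in base R, not bayesplot code: `y` and `yrep` are simulated, the names are illustrative, and the equal-tailed 95% band is an assumption about what counts as "doing well".

```r
set.seed(2)

# Simulated observed categories and posterior predictive draws (assumption).
K <- 5
n_obs <- 200
n_draws <- 500
true_probs <- c(0.1, 0.2, 0.4, 0.2, 0.1)
y <- sample(1:K, n_obs, replace = TRUE, prob = true_probs)
yrep <- matrix(sample(1:K, n_draws * n_obs, replace = TRUE, prob = true_probs),
               nrow = n_draws)

# Cumulative proportion of observations in category <= j, for j = 1..K.
cum_prop <- function(x, K) cumsum(tabulate(x, nbins = K)) / length(x)

obs_cum <- cum_prop(y, K)                      # length-K vector
rep_cum <- t(apply(yrep, 1, cum_prop, K = K))  # n_draws x K matrix

# 95% band of the replicated cumulative proportions (2 x K matrix).
band <- apply(rep_cum, 2, quantile, probs = c(0.025, 0.975))

# TRUE where the observed cumulative proportion lies inside the band.
inside <- obs_cum >= band[1, ] & obs_cum <= band[2, ]
print(rbind(observed = obs_cum, band, inside = inside))
```

This is the same comparison the overlaid ecdf plot shows graphically, evaluated only at the category values.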

jgabry · 2017-02-19T20:10:20Z

Any suggestions for what to name the function that creates this plot?

silberzwiebel · 2017-02-20T10:10:23Z

Hmm, what about ppc_hist_overlay?

jgabry · 2017-02-20T21:33:40Z

Hmm, thanks for the suggestion. I like the name but I think it would be more appropriate for a plot that is actually one histogram on top of another histogram (e.g. what the plot from the mcmc_nuts_energy function looks like). And hist suggests that the function should be useful for continuous data too, but in this case the plot is particularly designed for discrete data. So maybe a name like ppc_barplot, or ppc_bars, or ppc_count or something like that?

@paul-buerkner Any suggestions?

paul-buerkner · 2017-02-20T21:36:58Z

I wouldn't use ppc_count since we are not (necessarily) dealing with count data. ppc_bars sounds fine.
Or maybe ppc_categorical, but that doesn't describe the plot well.

jgabry · 2017-02-20T21:53:03Z

Thanks. I think I'll go with ppc_bars for now.

silberzwiebel · 2017-02-21T11:32:08Z

It would be super useful for me if you could also add ppc_bars_grouped, comparable to ppc_violin_grouped. That is, using ppc_bars_grouped with a model that has one condition with two levels would give me two barplots (and not as many rows as in yrep), which can easily be compared.

Thanks for your responsiveness! Looking forward to plotting my data and model fits!

jgabry · 2017-02-21T22:09:07Z

Yup I'll definitely add both ppc_bars and ppc_bars_grouped. I'll have them up on a feature branch soon and you can try them out. It'd be great to get your feedback on them before releasing them.

silberzwiebel · 2017-02-27T11:29:20Z

I saw the new branch containing this feature and tested it with one of my models (through brms). It looks great and already helps a lot in inspecting the fit of my models.
I'm not sure to what extent this is still a work in progress, so you might have the following on your list anyway.
I've tested ppc_bars_grouped and added some things via additional ggplot2 commands like the following:

# m is the fitted brms model; "condition" is the grouping variable in its data
p <- pp_check(m, type = "bars_grouped", group = "condition", size = 0.8, fatten = 1, prob = 0.95) +
  ylab("Number of Ratings") +
  scale_fill_manual("", values = "lightblue", labels = "emp. data") +
  scale_colour_manual("", values = "black", labels = "model predict.")

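This results in this plot:

[image: noylim] <https://cloud.githubusercontent.com/assets/4533862/23359584/2e8d334e-fce7-11e6-96df-9566568b8ca5.png>

Since the y-axes differ, I tried to put them on the same scale with p + ylim(0, 1000). This does not have the desired effect; instead it gives a plot that I would need to interpret completely differently, as you can see here:

[image: ylim] <https://cloud.githubusercontent.com/assets/4533862/23359663/8ebd4920-fce7-11e6-8a33-274a70fd2fe0.png>

Maybe ylim is not the correct thing to do here?

A related point that could serve as a workaround would be to use density instead of counts. I tried the freq=F argument, but this seems not (yet) to be implemented.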

By the way, is there a nicer way to change the labels for the legend than what I've tried above (y -> emp. data, yrep -> model predict.)?

silberzwiebel · 2017-02-27T12:45:55Z

Oh, and I'm just seeing that the x-axis still provides a metric representation of the data (2.5, 5.0, 7.5, instead of 1,2,3,4,5,6,7,8,9).

jgabry · 2017-02-27T19:15:22Z

> I'm not sure to what amount this is still a work-in-progress, so you might have the following on your list anyway.

Definitely still a work in progress, but I think the bulk of the work is done. Just need to sort out some small issues like the ones you mention.

> Since the y-axes are differing, I've tried to put them on the same scale with p+ylim(0,1000). This does not have the desired effect but gives a plot I'd need to interpret completely different [...] Maybe ylim is not the correct thing to do here?

Another way to ensure the same y-axis limits is to add the argument facet_args = list(scales = "fixed") to your call to ppc_bars_grouped, but I think that will result in something similar to what happens when you do ylim(). So yeah you're right that ppc_bars and ppc_bars_grouped should provide the option of putting density on the y-axis instead of counts. I'll see about adding a 'freq' argument.

> By the way, is there a nicer way to change the labels for the legend than what I've tried above (y -> emp. data, yrep -> model predict.)?

Good question. Unfortunately not that I know of, but I agree that would be nice. You *could* edit the ggplot object like this

p$scales$scales[[1]]$labels <- "emp. data"
p$scales$scales[[2]]$labels <- "model predict."

but that's not especially nice, and that particular code would only work for this plotting function (for other plotting functions in bayesplot you'd have to check the order of the scales to know which one you were editing). It would be really nice if bayesplot could provide a helper function for editing the legends, but I don't think that's simple to do given the way legends are created by ggplot2. It's easy to edit the legend aesthetics after creating the plot but not so easy to change the labels.

Another option would be to add arguments to all of the bayesplot plotting functions so that users can specify the names to be used in the legend at the same time they create the plot. For example, it could look something like this:

my_legend <- list(y = "emp. data", yrep = "model predict.")
ppc_bars(y, yrep, ..., legend_labels = my_legend)

That way the legends could be created with those labels instead of having to change them after the fact. I'd need to figure out the best way to implement that (it would vary depending on the plotting function), but that should be much easier to do. What do you think about that option?

Of course brms, rstanarm, and any other packages providing a pp_check function interfacing with bayesplot could then let you also pass the legend_labels argument to pp_check.

jgabry · 2017-02-27T19:27:45Z

> Oh, and I'm just seeing that the x-axis still provides a metric representation of the data (2.5, 5.0, 7.5, instead of 1,2,3,4,5,6,7,8,9).

Yes, good catch. I just pushed a commit to the ppc_bars branch that adds the line scale_x_continuous(breaks = pretty) inside ppc_bars and ppc_bars_grouped. That should force it to display whole numbers on the x-axis, although it won't force it to label every valid x value (e.g., it might show 2, 5, 7 instead of 2.5, 5.0, 7.5, but not 1,2,3,4,5,6,7,8,9). I think that's preferable to marking every valid value on the x-axis by default, because that will often be very cluttered. But if you do want every x value labeled, it should be easy to override the new default with scale_x_continuous, providing the breaks and limits arguments that you want.
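To label every category, the override might look like the following sketch. The data are simulated, and the nine rating categories are an assumption taken from the example in the quoted comment. Because the plot already carries an x scale, ggplot2 may print a message that the existing scale is being replaced.

```r
library(bayesplot)
library(ggplot2)

set.seed(3)

# Simulated ratings on a 1..9 scale and predictive draws (assumption).
K <- 9
n_obs <- 300
n_draws <- 50
y <- sample(1:K, n_obs, replace = TRUE)
yrep <- matrix(sample(1:K, n_draws * n_obs, replace = TRUE), nrow = n_draws)

# Replace the default x scale so that every category gets its own tick label.
p <- ppc_bars(y, yrep) +
  scale_x_continuous(breaks = 1:K)
print(p)
```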

jgabry · 2017-02-27T23:22:27Z

@silberzwiebel I just added the freq argument in 2d63b80. The default is TRUE, but if set to FALSE it should put proportions on the y-axis instead of counts. Does your plot that groups by "condition" look better using freq=FALSE?
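A sketch of how the new argument could be combined with fixed facet scales, calling bayesplot directly rather than through pp_check. The data, the group labels, and the deliberately unequal group sizes are simulated assumptions.

```r
library(bayesplot)
library(ggplot2)

set.seed(4)

# Simulated ratings with two conditions of different size (assumption).
K <- 9
n_obs <- 400
n_draws <- 50
y <- sample(1:K, n_obs, replace = TRUE)
yrep <- matrix(sample(1:K, n_draws * n_obs, replace = TRUE), nrow = n_draws)
group <- factor(rep(c("A", "B"), times = c(100, 300)))

# Proportions instead of counts, with a shared y-axis across the facets.
p <- ppc_bars_grouped(
  y, yrep, group,
  freq = FALSE,
  facet_args = list(scales = "fixed")
)
print(p)
```

With counts, the larger group would dominate a shared axis; proportions put both conditions on a comparable scale.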

silberzwiebel · 2017-02-28T10:55:56Z

It's a pleasure to get these quick responses including implementations, many thanks!

> Another way to ensure the same y-axis limits is to add the argument facet_args = list(scales = "fixed") to your call to ppc_bars_grouped, but I think that will result in something similar to what happens when you do ylim().

This option works nicely (and maybe it should be the default? Neighbouring plots with different axes are mostly confusing, if not misleading, in my opinion).

However, I got the following error (translated from German) until I explicitly loaded the dplyr package:

 error in function_list[[i]](value) : 
  could not find function "mutate_"

> Does your plot that groups by "condition" look better using freq=FALSE?

Yes, but if you compare the following two plots, I'd again suggest using facet_args = list(scales = "fixed") as the default. The first plot is without this argument, the second is with it; both also use freq=FALSE.

The x-scale looks good now and I was able to manually change it afterwards via scale_x_continuous(), thanks.

About the labels of the legend:

> But another option would be to add additional arguments to all of the bayesplot plotting functions so that users can specify the names to be used in the legend at the same time they create the plot. [...] What do you think about that option?

I'd like such an option very much! It would also make it easy to plot the output from different models and name them model A and model B. I guess this use case might be relevant for visual model comparison?
But maybe there should be a separate issue for this?

jgabry · 2017-02-28T21:59:00Z

> This option works nicely (and maybe should be the default? Neighbouring plots with different axes are mostly confusing if not even misleading in my opinion)

Yeah, defaulting to fixed scales is probably the best way to go. I agree that it's almost always preferable to have the same axis limits when comparing plots. The bayesplot_grid function that I added in the last release is also useful for that when you're comparing different ggplot objects (as opposed to facets within a single plot object). It lets you pass in a bunch of plot objects and specify a single set of axis limits, and then it lays out the plots in a grid and applies those axis limits to all of them.

> I got however the following error (translated from German) before I explicitly loaded the dplyr package: error in function_list[[i]](value) : could not find function "mutate_"

Yeah, I needed to import mutate_ from dplyr in the NAMESPACE file. Should be fixed.

> Does your plot that groups by "condition" look better using freq=FALSE? Yes, but if you compare the following two plots I'd again suggest to use facet_args = list(scales = "fixed") as default.

I made the switch. If you reinstall that should be the default now for ppc_bars_grouped.

> I'd like such option very much! This would make it also easy to plot the output from different models and name them model A and model B. I guess, this use-case might be relevant as a visual model comparison? But maybe there should be another issue for this?

Ok cool. I think it's a good idea too and we should definitely open a separate issue. Do you want to open the issue for that so you'll get notified when progress is made (when the issue is closed or if there's discussion)? I think if I open the issue then you won't be notified at all unless you've made bayesplot a repository that you're "watching" (or whatever GitHub calls it).

silberzwiebel · 2017-03-07T12:20:26Z

I found a minor issue, and I'm actually not sure whether it is related to ppc_bars_grouped or to the ggplot2 theme_bw() that I added afterwards.
In the following plot, at the top right, the interval is not fully displayed but is overlapped by the facet title:

jgabry · 2017-03-07T19:27:46Z

In a lot of the bayesplot functions I have a line in the code that tells ggplot2 not to expand the y-axis. By default ggplot2 will add some cushion below zero and above the highest value in order to "ensure that the data is placed some distance away from the axes". But I didn't like that there was extra white space under the bars. What happens if you add

scale_y_continuous(expand = c(0.05, 0))

to your plot? It probably gets rid of the problem at the top but adds some whitespace at the bottom, right? The ideal solution I think would be to remove the whitespace at the bottom without removing it at the top, but that doesn't seem to be possible as far as I can tell.

silberzwiebel · 2017-03-08T09:30:44Z

Yeah, I don't like these extra spaces either.
Adding your command does exactly what you thought it would: it fixes the problem at the top but adds whitespace between the x-axis and the bars.

Using one of the following three commands, however, worked for me:

scale_y_continuous(limits = c(0.0, 0.4), expand = c(0.0,0.0)) # explicitly turning the expansion off
expand_limits(y = c(0.0, 0.4))
coord_cartesian(ylim = c(0.0, 0.4))

It might be possible to find out the maximal value (0.4 for me) inside the ppc_bars function, right?
I'm not sure whether any of these has side effects, though. The first gives me a warning:

Scale for 'y' is already present. Adding another scale for 'y', which will replace the existing scale.

The other two remain silent. The plot looks the same for all.

jgabry · 2017-03-08T16:25:28Z

Thanks for following up. It should be possible to detect the max for the y-axis and use expand_limits like you suggest. I think the only case when that won't work is for ppc_bars_grouped with facet_args = list(scales = "free") because then every facet needs a unique adjustment (which doesn't seem possible, at least not obviously possible). But for ppc_bars_grouped with the default of scales="fixed" it should also work fine.
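As a user-side illustration of that idea (not the change that went into the package), the sketch below computes the largest value the plot needs to show and passes it to expand_limits. The data are simulated, the 5% headroom is an arbitrary choice, and the 0.95 quantile assumes the default prob = 0.9 of ppc_bars.

```r
library(bayesplot)
library(ggplot2)

set.seed(5)

# Simulated ratings and posterior predictive draws (assumption).
K <- 9
n_obs <- 300
n_draws <- 100
y <- sample(1:K, n_obs, replace = TRUE)
yrep <- matrix(sample(1:K, n_draws * n_obs, replace = TRUE), nrow = n_draws)

# Per-draw category proportions (n_draws x K) and their upper interval ends.
prop_by_draw <- t(apply(yrep, 1, function(draw) {
  tabulate(draw, nbins = K) / length(draw)
}))
upper <- apply(prop_by_draw, 2, quantile, probs = 0.95)

# Highest point on the plot: tallest observed bar or highest interval end.
obs_prop <- tabulate(y, nbins = K) / n_obs
y_max <- max(upper, obs_prop)

# Add headroom at the top only; the bars still start directly at the x-axis.
p <- ppc_bars(y, yrep, freq = FALSE) +
  expand_limits(y = 1.05 * y_max)
print(p)
```

Unlike adding a second y scale, expand_limits only widens the existing limits, which is why it stays silent in the comparison above.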

I'll make the change soon.

jgabry · 2017-03-08T18:11:20Z

Just added the extra space in 8afc0b0

jgabry · 2017-03-14T00:36:33Z

@silberzwiebel Merged and will be in the next release. Thanks for your help with this!

jgabry added the feature label Feb 16, 2017

This was referenced Mar 1, 2017:
feature request: allow changing the labels of the legends #75 (Open)
feature request: marginal_effects plot for ordinal regression paul-buerkner/brms#190 (Closed)

jgabry mentioned this issue Mar 13, 2017:
Introduce ppc_bars and ppc_bars_grouped for ordinal/categorical data #78 (Merged)

jgabry closed this as completed in #78 Mar 14, 2017
