feature request: pp_check histograms for ordinal regression #73
Comments
Thanks for the feature request! I definitely want to add some plots that
are more useful for ordinal models than what we currently have. In the
Kruschke plot I assume the histogram is the observed y and the blue
intervals are computed from the posterior predictive simulations, right? It
shouldn't be hard to add something like that.
Also, I'm curious if the overlaid ecdf plot already implemented in
bayesplot would be useful here. What does that plot look like for your
model? (you should be able to make that by calling ppc_ecdf_overlay via
pp_check).
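A minimal sketch of the overlaid ECDF check mentioned above, calling bayesplot directly on simulated data. The category probabilities, sample sizes, and object names are made up for illustration; with a fitted brms model (say `fit`, a hypothetical name) the same plot should be reachable through something like `pp_check(fit, type = "ecdf_overlay")`.

```r
library(bayesplot)

set.seed(73)
n_obs   <- 300   # number of observations (illustrative)
n_draws <- 50    # number of posterior predictive draws (illustrative)
probs   <- c(0.10, 0.25, 0.30, 0.25, 0.10)  # made-up category probabilities

# Observed ordinal outcome with 5 categories
y <- sample(1:5, n_obs, replace = TRUE, prob = probs)

# Stand-in for posterior predictive draws: one row per draw, one column per observation
yrep <- matrix(sample(1:5, n_draws * n_obs, replace = TRUE, prob = probs),
               nrow = n_draws)

# Empirical CDF of y overlaid on the ECDFs of the yrep draws
p <- ppc_ecdf_overlay(y, yrep)
```

Printing `p` draws the plot. Because a CDF is well defined for discrete outcomes, the step-shaped curves can be read as the proportion of observations in category <= j.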
On Wed, Feb 15, 2017 at 7:32 AM silberzwiebel <[email protected]> wrote:
Hi,
I'm running ordinal regression with 'brms' and would like to produce a plot
similar to what Kruschke does in his book:
[image: bildschirmfoto 2017-02-15 um 13 22 31]
<https://cloud.githubusercontent.com/assets/4533862/22974356/ca638966-f379-11e6-9ca5-cff576b1ac70.png>
Running the default pp_check gives me continuous lines which is misleading,
as the data are ordinal:
[image: bildschirmfoto 2017-02-15 um 13 28 58]
<https://cloud.githubusercontent.com/assets/4533862/22974523/97aa0d5a-f37a-11e6-8b70-ffe6600a3c71.png>
I know there is a histogram style, but not in overlay mode, making plots
rather big and not that easy to compare.
I also tried the new rootogram style, which works well but is not suited
for ordinal models, if I got this right, because still the expected values
are shown as continuous.
[image: bildschirmfoto 2017-02-15 um 13 29 30]
<https://cloud.githubusercontent.com/assets/4533862/22974527/9b5ee0ec-f37a-11e6-8767-b9bd0d20e10c.png>
I think Kruschke's style is quite nice to see data and predictions (+
uncertainty) at one glance.
Thanks for your efforts!
|
are more useful for ordinal models than what we currently have. In the
Kruschke plot I assume the histogram is the observed y and the blue
intervals are computed from the posterior predictive simulations, right? It
shouldn't be hard to add something like that.
Yes, right. Here's the corresponding quote from the book:
"The top-right subpanel of Figure 23.2 superimposes posterior predictive
probabilities of the outcomes. At each outcome value, a dot plots the
median posterior predictive probability and a vertical segment indicates
the 95% HDI of posterior predictive
probabilities."
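The summary Kruschke describes can be sketched in base R as follows. This is an illustration on simulated draws, not the bayesplot implementation; the helper `hdi()` and all object names are hypothetical, and the HDI is approximated as the shortest interval containing 95% of the sorted draws.

```r
set.seed(73)
K       <- 5     # number of ordinal categories (illustrative)
n_obs   <- 200
n_draws <- 500
probs   <- c(0.10, 0.20, 0.40, 0.20, 0.10)  # made-up category probabilities

y    <- sample(1:K, n_obs, replace = TRUE, prob = probs)
yrep <- matrix(sample(1:K, n_draws * n_obs, replace = TRUE, prob = probs),
               nrow = n_draws)

# Proportion of each category within each draw: n_draws x K matrix
props <- t(apply(yrep, 1, function(d) tabulate(d, nbins = K) / length(d)))

# Hypothetical helper: shortest interval containing `mass` of the draws
hdi <- function(x, mass = 0.95) {
  x <- sort(x)
  m <- length(x)
  w <- ceiling(mass * m)                 # number of draws inside the interval
  starts <- seq_len(m - w + 1)           # candidate left end points
  widths <- x[starts + w - 1] - x[starts]
  i <- which.min(widths)
  c(lower = x[i], upper = x[i + w - 1])
}

# One row per category: observed proportion, predictive median, 95% HDI
summary_tab <- data.frame(
  category = 1:K,
  observed = tabulate(y, nbins = K) / n_obs,
  median   = apply(props, 2, median),
  t(apply(props, 2, hdi))
)
```

`summary_tab` then holds what the plot needs: bars from `observed`, a dot at `median`, and a vertical segment from `lower` to `upper` for each category.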
Also, I'm curious if the overlaid ecdf plot already implemented in
bayesplot would be useful here. What does that plot look like for your
model? (you should be able to make that by calling ppc_ecdf_overlay via
pp_check).
The ecdf_overlay for my model is in the attachment (let's see how
github's issue plays with e-mail attachments ...)
|
Cool, thanks. We can definitely add a plot like that.
Also, I think the ecdf plot does do ok here, although I should look at it for
some models that don't fit well too. Might need to add more yrep draws to
the plot for it to be more clear, but it shows that the model does well
predicting the proportion of obs in category <= j. Unlike the regular
density plot the cdf plot should be a good option for discrete data as well
as continuous data.
|
Any suggestions for what to name the function that creates this plot? |
hmm, what about ppc_hist_overlay?
On 19.02.17 at 21:10, Jonah Gabry wrote:
… Any suggestions for what to name the function that creates this plot?
|
Hmm, thanks for the suggestion. I like the name but I think it would be more appropriate for a plot that is actually one histogram on top of another histogram (e.g. what the plot from the @paul-buerkner Any suggestions? |
I wouldn't use |
Thanks. I think I'll go with ppc_bars for now.
…On Mon, Feb 20, 2017 at 4:37 PM Paul-Christian Bürkner < ***@***.***> wrote:
I wouldn't use ppc_count since we are not (necessarily) dealing with
count data. ppc_bars sounds fine.
Or maybe ppc_categorical, but that doesn't describe the plot well.
|
It would be super useful for me, if you could also add Thanks for your responsiveness! Looking forward to plot my data and model fits! |
Yup I'll definitely add both |
Oh, and I'm just seeing that the x-axis still provides a metric representation of the data (2.5, 5.0, 7.5, instead of 1,2,3,4,5,6,7,8,9). |
On Mon, Feb 27, 2017 at 6:29 AM, silberzwiebel ***@***.***> wrote:
I'm not sure to what amount this is still a work-in-progress, so you might
have the following on your list anyway.
Definitely still a work in progress, but I think the bulk of the work is
done. Just need to sort out some small issues like the ones you mention.
I've tested the ppc_bars_grouped and added some stuff via additional
ggplot2 commands like the following:
p = pp_check(m, type = "bars_grouped", group = "condition", size=0.8, fatten=1, prob=0.95) +
ylab("Number of Ratings") +
scale_fill_manual("", values ="lightblue", labels = "emp. data") +
scale_colour_manual("", values ="black", labels = "model predict.")
This results in this plot:
[image: noylim]
<https://cloud.githubusercontent.com/assets/4533862/23359584/2e8d334e-fce7-11e6-96df-9566568b8ca5.png>
Since the y-axes differ, I've tried to put them on the same scale
with p+ylim(0,1000). This does not have the desired effect but gives a
plot I'd need to interpret completely differently, as you can see here:
[image: ylim]
<https://cloud.githubusercontent.com/assets/4533862/23359663/8ebd4920-fce7-11e6-8a33-274a70fd2fe0.png>
Maybe ylim is not the correct thing to do here?
Another way to ensure the same y-axis limits is to add the argument
facet_args = list(scales = "fixed")
to your call to ppc_bars_grouped, but I think that will result in something
similar to what happens when you do ylim(). So yeah you're right that
ppc_bars and ppc_bars_grouped should provide the option of putting density
on the y-axis instead of counts. I'll see about adding a 'freq' argument.
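A sketch of why counts on a shared y-axis are hard to compare when groups differ in size, and how proportions help. The data are simulated and the group sizes are made up; `freq = FALSE` is the argument proposed here, which later in the thread is reported as added.

```r
library(bayesplot)

set.seed(73)
n_draws <- 100
probs   <- c(0.10, 0.25, 0.30, 0.25, 0.10)  # made-up category probabilities

# Two conditions of unequal size, so raw counts differ even if the
# category proportions are the same in both groups
group <- factor(rep(c("A", "B"), times = c(200, 100)))
n_obs <- length(group)

y    <- sample(1:5, n_obs, replace = TRUE, prob = probs)
yrep <- matrix(sample(1:5, n_draws * n_obs, replace = TRUE, prob = probs),
               nrow = n_draws)

# Counts on the y-axis
p_counts <- ppc_bars_grouped(y, yrep, group, prob = 0.95)

# Proportions on the y-axis with the same axis limits in every facet
p_props <- ppc_bars_grouped(
  y, yrep, group,
  prob = 0.95,
  freq = FALSE,
  facet_args = list(scales = "fixed")
)
```

With proportions, a fixed scale compares the shape of the distributions across groups rather than the group sizes.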
By the way, is there a nicer way to change the labels for the legend than
what I've tried above (y -> emp. data, yrep -> model predict.)?
Good question. Unfortunately not that I know of, but I agree that would be
nice. You *could* edit the ggplot object like this
p$scales$scales[[1]]$labels <- "emp. data"
p$scales$scales[[2]]$labels <- "model predict."
but that's not especially nice and that particular code would only work for
this plotting function (for other plotting functions in bayesplot you'd
have to check the order of the scales to know which you were editing). It
would be really nice if bayesplot could provide a helper function for
editing the legends but I don't think that's simple to do given the way
legends are created by ggplot2. It's easy to edit the legend aesthetics
after creating the plot but not so easy to change the labels.
But another option would be to add additional arguments to all of the
bayesplot plotting functions so that users can specify the names to be
used in the legend at the same time they create the plot. For example, it
could look something like this maybe:
my_legend <- list(y = "emp. data", yrep = "model predict.")
ppc_bars(y, yrep, ..., legend_labels = my_legend)
That way the legends could be created with those labels instead of having
to change them after the fact. I'd need to figure out the best way to
implement that (it would vary depending on the plotting function) but that
should be much easier to do. What do you think about that option? Of course
brms, rstanarm, and any other packages providing a pp_check function
interfacing with bayesplot could then let you also pass the legend_labels
argument to pp_check.
|
On Mon, Feb 27, 2017 at 7:45 AM, silberzwiebel ***@***.***> wrote:
Oh, and I'm just seeing that the x-axis still provides a metric
representation of the data (2.5, 5.0, 7.5, instead of 1,2,3,4,5,6,7,8,9).
Yes, good catch. I just pushed a commit to the ppc_bars branch that adds
the line
scale_x_continuous(breaks = pretty)
inside ppc_bars and ppc_bars_grouped. That should force it to display whole
numbers on the x-axis, although it won't force it to label every valid x value
(e.g., it might just do 2, 5, 7 instead of 2.5, 5.0, 7.5, but not
1,2,3,4,5,6,7,8,9). I think that's preferable to defaulting to marking
every valid value on the x-axis because it will often be very cluttered.
But if you do want it to label every x value then I think it should be easy
to override the new default with scale_x_continuous, providing the breaks
and limits arguments that you want.
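Overriding the default breaks so that every category is labelled could look like the sketch below. The data are simulated for a 9-point rating scale (an assumption based on the values 1 to 9 mentioned above). Adding a second x scale makes ggplot2 emit a message that the existing scale is being replaced, which is expected here.

```r
library(bayesplot)
library(ggplot2)

set.seed(73)
n_obs   <- 300
n_draws <- 100
probs   <- rep(1 / 9, 9)  # made-up probabilities for a 9-point scale

y    <- sample(1:9, n_obs, replace = TRUE, prob = probs)
yrep <- matrix(sample(1:9, n_draws * n_obs, replace = TRUE, prob = probs),
               nrow = n_draws)

# Replace the default x scale so that every category 1..9 gets a tick label
p <- suppressMessages(
  ppc_bars(y, yrep) + scale_x_continuous(breaks = 1:9)
)
```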
|
@silberzwiebel I just added the |
On Tue, Feb 28, 2017 at 5:55 AM, silberzwiebel ***@***.***> wrote:
It's a pleasure to get these quick responses including implementations,
many thanks!
Another way to ensure the same y-axis limits is to add the argument
facet_args = list(scales = "fixed") to your call to ppc_bars_grouped, but
I think that will result in something similar to what happens when you do
ylim().
This option works nicely (and maybe should be the default? Neighbouring
plots with different axes are mostly confusing if not even misleading in my
opinion)
Yeah, defaulting to fixed scales is probably the best way to go. I agree
that it's almost always preferable to have the same axis limits when
comparing plots. The bayesplot_grid function that I added in the last
release is also useful for that when you're comparing different ggplot
objects (as opposed to facets within a single plot object). It lets you
pass in a bunch of plot objects and specify a single set of axis limits,
and then it lays out the plots in a grid and applies those axis limits to
all of them.
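A sketch of that use of bayesplot_grid, comparing two separate plot objects on common y-axis limits. The two yrep matrices stand in for draws from two hypothetical models; the titles, limits, and layout are illustrative choices.

```r
library(bayesplot)

set.seed(73)
n_obs   <- 300
n_draws <- 100
probs_a <- c(0.10, 0.25, 0.30, 0.25, 0.10)  # made-up "model A" probabilities
probs_b <- c(0.20, 0.20, 0.20, 0.20, 0.20)  # made-up "model B" probabilities

y      <- sample(1:5, n_obs, replace = TRUE, prob = probs_a)
yrep_a <- matrix(sample(1:5, n_draws * n_obs, replace = TRUE, prob = probs_a),
                 nrow = n_draws)
yrep_b <- matrix(sample(1:5, n_draws * n_obs, replace = TRUE, prob = probs_b),
                 nrow = n_draws)

# Two separate ggplot objects, one per model, with proportions on the y-axis
p_a <- ppc_bars(y, yrep_a, freq = FALSE)
p_b <- ppc_bars(y, yrep_b, freq = FALSE)

# Lay the plots out side by side and apply one set of y-axis limits to both
g <- suppressMessages(
  bayesplot_grid(
    p_a, p_b,
    ylim      = c(0, 0.5),
    titles    = c("Model A", "Model B"),
    grid_args = list(ncol = 2)
  )
)
```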
I got however the following error (translated from German) before I
explicitly loaded the dplyr package:
error in function_list[[i]](value) :
could not find function "mutate_"
Yeah, I needed to import mutate_ from dplyr in the NAMESPACE file. Should be
fixed.
Does your plot that groups by "condition" look better using freq=FALSE?
Yes, but if you compare the following two plots I'd again suggest to use facet_args
= list(scales = "fixed") as default.
I made the switch. If you reinstall that should be the default now for
ppc_bars_grouped.
About the labels of the legend:
But another option would be to add additional arguments to all of the
bayesplot plotting functions so that users can specify the names to be
used in the legend at the same time they create the plot. For example, it
could look something like this maybe: my_legend <- list(y = "emp. data",
yrep = "model predict.") ppc_bars(y, yrep, ..., legend_labels = my_legend)
That way the legends could be created with those labels instead of having
to change them after the fact. I'd need to figure out the best way to
implement that (it would vary depending on the plotting function) but that
should be much easier to do. What do you think about that option?
I'd like such an option very much! This would also make it easy to plot the
output from different models and name them model A and model B. I guess,
this use-case might be relevant as a visual model comparison?
But maybe there should be another issue for this?
Ok cool. I think it's a good idea too and we should definitely open a
separate issue. Do you want to open the issue for that so you'll get
notified when progress is made (when the issue is closed or if there's
discussion)? I think if I open the issue then you won't be notified at all
unless you've made bayesplot a repository that you're "watching" (or
whatever GitHub calls it).
|
In a lot of the bayesplot functions I have a line in the code that tells ggplot2 not to expand the y-axis. By default ggplot2 will add some cushion below zero and above the highest value in order to "ensure that the data is placed some distance away from the axes". But I didn't like that there was extra white space under the bars. What happens if you add scale_y_continuous(expand = c(0.05, 0)) to your plot? It probably gets rid of the problem at the top but adds some whitespace at the bottom, right? The ideal solution I think would be to remove the whitespace at the bottom without removing it at the top, but that doesn't seem to be possible as far as I can tell. |
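For readers on a newer ggplot2 than was available when this was written: ggplot2 3.3.0 and later provide expansion(), which sets the padding at the lower and upper end of an axis separately. The sketch below uses a plain ggplot2 bar chart with made-up proportions. It is not what bayesplot does internally.

```r
library(ggplot2)

# Made-up category proportions for illustration
df <- data.frame(
  category = 1:5,
  prop     = c(0.10, 0.25, 0.30, 0.25, 0.10)
)

# No padding below zero, 5% multiplicative padding above the tallest bar
p <- ggplot(df, aes(x = category, y = prop)) +
  geom_col(fill = "lightblue") +
  scale_y_continuous(expand = expansion(mult = c(0, 0.05)))
```

The first element of `mult` controls the lower end of the axis and the second the upper end, so the bars sit directly on the x-axis while the top keeps some headroom.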
Yeah, I do not like these extra spaces either. Using one of the following three commands, however, worked for me:
It might be possible to find out the maximal value (0.4 for me) inside the
The other two remain silent. The plot looks the same for all. |
Thanks for following up. It should be possible to detect the max for the y-axis and use I'll make the change soon. |
Just added the extra space in 8afc0b0 |
@silberzwiebel Merged and will be in the next release. Thanks for your help with this! |