Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature request: pp_check histograms for ordinal regression #73

Closed
silberzwiebel opened this issue Feb 15, 2017 · 24 comments
Closed

feature request: pp_check histograms for ordinal regression #73

silberzwiebel opened this issue Feb 15, 2017 · 24 comments
Labels

Comments

@silberzwiebel
Copy link
Contributor

Hi,

I'm running ordinal regression with 'brms' and would like to produce a plot similar to what Kruschke does in his book:
bildschirmfoto 2017-02-15 um 13 22 31

Running the default pp_check gives me continuous lines which is misleading, as the data are ordinal:
bildschirmfoto 2017-02-15 um 13 28 58

I know there is a histogram style, bu not in overlay mode, making plots rather big and not that easy to compare.
I also tried the new rootogram style, which works well but is not suited for ordinal models, if I got this right, because still the expected values are shown as continuous.
bildschirmfoto 2017-02-15 um 13 29 30

I think Kruschke's style is quite nice to see data and predictions (+ uncertainty) at one glance.

Thanks for your efforts!

@jgabry
Copy link
Member

jgabry commented Feb 15, 2017 via email

@jgabry jgabry added the feature label Feb 16, 2017
@silberzwiebel
Copy link
Contributor Author

silberzwiebel commented Feb 16, 2017 via email

@silberzwiebel
Copy link
Contributor Author

e-mail reply does neither work with markdown nor with images, good to know. nevermind, here's the ecdf_overlay picture:

ecdf

@jgabry
Copy link
Member

jgabry commented Feb 17, 2017 via email

@jgabry
Copy link
Member

jgabry commented Feb 19, 2017

Any suggestions for what to name the function that creates this plot?

@silberzwiebel
Copy link
Contributor Author

silberzwiebel commented Feb 20, 2017 via email

@jgabry
Copy link
Member

jgabry commented Feb 20, 2017

Hmm, thanks for the suggestion. I like the name but I think it would be more appropriate for a plot that is actually one histogram on top of another histogram (e.g. what the plot from the mcmc_nuts_energy function looks like). And hist suggests that the function should be useful for continuous data too, but in this case the plot is particularly designed for discrete data. So maybe a name like ppc_barplot, or ppc_bars, or ppc_count or something like that?

@paul-buerkner Any suggestions?

@paul-buerkner
Copy link
Collaborator

I wouldn't use ppc_count since we are not (necessarily) dealing with count data. ppc_bars sounds fine.
Or maybe ppc_categorical, but that doesn't describe the plot well.

@jgabry
Copy link
Member

jgabry commented Feb 20, 2017 via email

@silberzwiebel
Copy link
Contributor Author

silberzwiebel commented Feb 21, 2017

It would be super useful for me, if you could also add ppc_bars_grouped comparable to ppc_violin_grouped, i.e., using ppc_bars_grouped with a model that has one condition with two levels gives me two barplots (and not as many rows as in yrep) which can easily be compared.

Thanks for your responsiveness! Looking forward to plot my data and model fits!

@jgabry
Copy link
Member

jgabry commented Feb 21, 2017

Yup I'll definitely add both ppc_bars and ppc_bars_grouped. I'll have them up on a feature branch soon and you can try them out. It'd be great to get your feedback on them before releasing them.

@silberzwiebel
Copy link
Contributor Author

I saw the new branch containing this feature and tested it with one of my models (through brms). It looks great and helps already a lot in inspecting the fit of my models.
I'm not sure to what amount this is still a work-in-progress, so you might have the following on your list anyway.
I've tested the ppc_bars_grouped and added some stuff via additional ggplot2 commands like the following:

p =  pp_check(m, type = "bars_grouped", group = "condition", size=0.8, fatten=1, prob=0.95) + 
		ylab("Number of Ratings") + 
		scale_fill_manual("", values ="lightblue", labels = "emp. data") +
		scale_colour_manual("", values ="black", labels = "model predict.") + 

This results in this plot:
noylim

Since the y-axes are differing, I've tried to put them on the same scale with p+ylim(0,1000). This does not have the desired effect but gives a plot I'd need to interpret completely different as you can see here:
ylim

Maybe ylim is not the correct thing to do here?

A related point that could be used as a workaround would be to use density instead of counts. I tried the freq=F argument but this seems not (yet) to be implemented.

By the way, is there a nicer way to change the labels for the legend than what I've tried above (y -> emp. data, yrep -> model predict.)?

@silberzwiebel
Copy link
Contributor Author

Oh, and I'm just seeing that the x-axis still provides a metric representation of the data (2.5, 5.0, 7.5, instead of 1,2,3,4,5,6,7,8,9).

@jgabry
Copy link
Member

jgabry commented Feb 27, 2017 via email

@jgabry
Copy link
Member

jgabry commented Feb 27, 2017 via email

@jgabry
Copy link
Member

jgabry commented Feb 27, 2017

@silberzwiebel I just added the freq argument in 2d63b80. The default is TRUE, but if set to FALSE it should put proportions on the y-axis instead of counts. Does your plot that groups by "condition" look better using freq=FALSE?

@silberzwiebel
Copy link
Contributor Author

It's a pleasure to get these quick responses including implementations, many thanks!

Another way to ensure the same y-axis limits is to add the argument
facet_args = list(scales = "fixed") to your call to ppc_bars_grouped, but I think that will result in something similar to what happens when you do ylim().

This option works nicely (and maybe should be the default? Neighbouring plots with different axes are mostly confusing if not even misleading in my opinion)
facet

I got however the following error (translated from German) before I explicitly loaded the dplyr package:

 error in function_list[[i]](value) : 
  could not find function "mutate_"

Does your plot that groups by "condition" look better using freq=FALSE?

Yes, but if you compare the following two plots I'd again suggest to use facet_args = list(scales = "fixed") as default. The first plot is without specifying this argument, the second is with the argument, both have also freq=FALSE
freqnf
freqf

The x-scale looks good now and I was able to manually change it afterwards via scale_x_continuous(), thanks.

About the labels of the legend:

But another option would be to add additional arguments to all of the bayesplot plotting functions so that user's can specify the names to be used in the legend at the same time they create the plot. For example, it could look something like this maybe: my_legend <- list(y = "emp. data", yrep = "model predict.") ppc_bars(y, yrep, ..., legend_labels = my_legend) That way the legends could be created with those labels instead of having to change them after the fact. I'd need to figure out the best way to implement that (it would vary depending on the plotting function) but that should be much easier to do. What do you think about that option?

I'd like such option very much! This would make it also easy to plot the output from different models and name them model A and model B. I guess, this use-case might be relevant as a visual model comparison?
But maybe there should be another issue for this?

@jgabry
Copy link
Member

jgabry commented Feb 28, 2017 via email

@silberzwiebel
Copy link
Contributor Author

I found a minor issue and I'm actually not sure if this is related to ppc_bars_groupedor to the ggplot2 theme_bw()that I added afterwards.
In the following plot, at the top right, the interval is not fully displayed but overlapped from the facet title:
bars_ylim

@jgabry
Copy link
Member

jgabry commented Mar 7, 2017

In a lot of the bayesplot functions I have a line in the code that tells ggplot2 not to expand the y-axis. By default ggplot2 will add some cushion below zero and above the highest value in order to "ensure that the data is placed some distance away from the axes". But I didn't like that there was extra white space under the bars. What happens if you add

scale_y_continuous(expand = c(0.05, 0))

to your plot? It probably gets rid of the problem at the top but adds some whitespace at the bottom, right? The ideal solution I think would be to remove the whitespace at the bottom without removing it at the top, but that doesn't seem to be possible as far as I can tell.

@silberzwiebel
Copy link
Contributor Author

Yeah, I do not like these extra spaces, too.
Adding your command does exactly, what you thought it would. It fixes the problem at the top but adds withespace between the x-axis and the bars.

Using one of the following three commands, however, worked for me:

scale_y_continuous(limits = c(0.0, 0.4), expand = c(0.0,0.0)) # explicitly turning the expansion off
expand_limits(y = c(0.0, 0.4))
coord_cartesian(ylim = c(0.0, 0.4))

It might be possible to find out the maximal value (0.4 for me) inside the ppc_barsfunction, right?
Not sure whether one of these has any side effects, though. The first gives me a warning:

Scale for 'y' is already present. Adding another scale for 'y', which will replace the existing scale.

The other two remain silent. The plot looks the same for all.

@jgabry
Copy link
Member

jgabry commented Mar 8, 2017

Thanks for following up. It should be possible to detect the max for the y-axis and use expand_limits like you suggest. I think the only case when that won't work is for ppc_bars_grouped with facet_args = list(scales = "free") because then every facet needs a unique adjustment (which doesn't seem possible, at least not obviously possible). But for ppc_bars_grouped with the default of scales="fixed" it should also work fine.

I'll make the change soon.

@jgabry
Copy link
Member

jgabry commented Mar 8, 2017

Just added the extra space in 8afc0b0

@jgabry
Copy link
Member

jgabry commented Mar 14, 2017

@silberzwiebel Merged and will be in the next release. Thanks for your help with this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants