Default print() output #32

Closed
mjskay opened this issue Oct 31, 2019 · 15 comments

@mjskay
Collaborator

mjskay commented Oct 31, 2019

As I've been writing rename_variables() I've found it's a little awkward to work with draws objects when the default print output at the console is typically gigantic. This also makes examples a little verbose, as it feels necessary to call summarise_draws() constantly.

Two thoughts:

  1. Any objections to making the default print() for draws objects call summarise_draws()?
  2. If we agree to do (1), we will now have three ways of getting the same info (print, summary, and summarise_draws). That possibly feels a bit overkill? I can see how they are typically used in different ways, so having them all as aliases is probably fine, but it is worth considering.
@paul-buerkner
Collaborator

I agree we should have a better print() method than the current one. For now, just using print() as another alias of summarise_draws() is fine, but we may want to consider doing something similar to what tibbles do, that is, truncating a lot of the output to make the structure visible without printing an overwhelming amount of information. The problem with summarise_draws() is that it is computationally non-trivial for large posteriors, so we may not want to do all those computations for a simple print call (or we would have to store the summary in some internal environment, as rstan does, to avoid recomputation).
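
The caching idea mentioned above could be sketched roughly as follows. This is only an illustration in base R: `summary_cache`, `slow_summary()`, `cached_summary()`, and the `key` argument are hypothetical names, not part of posterior or rstan, and `slow_summary()` is merely a stand-in for an expensive call such as summarise_draws().

```r
# Hypothetical cache: an environment that stores summaries by key.
summary_cache <- new.env(parent = emptyenv())

# Stand-in for an expensive summary; counts how often it really runs.
n_computed <- 0
slow_summary <- function(x) {
  n_computed <<- n_computed + 1
  apply(x, 2, function(v) c(mean = mean(v), sd = sd(v)))
}

# Return the stored summary if present, otherwise compute and store it.
cached_summary <- function(x, key) {
  if (!exists(key, envir = summary_cache, inherits = FALSE)) {
    assign(key, slow_summary(x), envir = summary_cache)
  }
  get(key, envir = summary_cache, inherits = FALSE)
}

set.seed(1)
draws <- matrix(rnorm(400 * 3), ncol = 3,
                dimnames = list(NULL, c("mu", "tau", "theta")))
s1 <- cached_summary(draws, "fit1")  # computed
s2 <- cached_summary(draws, "fit1")  # served from the cache
```

A real implementation would also need to invalidate the cached entry whenever the draws change, which this sketch does not attempt.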

@paul-buerkner paul-buerkner added the feature New feature or request label Nov 1, 2019
@jgabry
Member

jgabry commented Nov 1, 2019

Hmm, what if print() just provides a useful summary of the structure? (e.g. like str() but doesn’t have to look like str() output)

@paul-buerkner
Collaborator

paul-buerkner commented Nov 1, 2019 via email

@mjskay
Collaborator Author

mjskay commented Nov 1, 2019

Good idea. What about something like this:

 'draws_array' num [1:100, 1:4, 1:10]
 - 4 chains × 100 iterations
 - 10 variables (median±mad):
      mu      tau theta[1] theta[2] theta[3] theta[4] theta[5] theta[6] theta[7] theta[8] 
 4.5±3.5  2.9±2.6  5.5±4.9  4.5±4.1  4.5±4.6  5.0±4.7  3.8±4.3  4.4±4.6  6.2±4.5  4.5±4.6 

Where the first line will give the basic structure according to the specific format, but the remaining lines would be the same structure for all formats. Could either keep listing variables at the end or truncate it to the first k (10?) with an option not to truncate. That might also save us from needing to pre-compute summaries and keep them around.

I have some formatting code I have been playing with while experimenting with rv-like interfaces that can output the last line, so I'd be happy to write this if we want it.
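
For illustration only, a structure-style print like the one proposed above might look something like this in base R. It is not the formatting code referred to in the comment: `format_median_mad()` and `print_structure()` are hypothetical helpers, and the input is assumed to be a plain 3-d array laid out as iteration x chain x variable.

```r
# Hypothetical helper: one "median±mad" label per variable of a
# 3-d array laid out as iteration x chain x variable.
format_median_mad <- function(x, digits = 1) {
  stopifnot(length(dim(x)) == 3)
  med <- apply(x, 3, median)
  spread <- apply(x, 3, mad)
  out <- paste0(
    formatC(med, format = "f", digits = digits), "\u00b1",
    formatC(spread, format = "f", digits = digits)
  )
  names(out) <- dimnames(x)[[3]]
  out
}

# Hypothetical print routine: structure first, then at most
# `max_variables` per-variable labels.
print_structure <- function(x, max_variables = 10) {
  d <- dim(x)
  cat(" num [1:", d[1], ", 1:", d[2], ", 1:", d[3], "]\n", sep = "")
  cat(" - ", d[2], " chains \u00d7 ", d[1], " iterations\n", sep = "")
  cat(" - ", d[3], " variables (median\u00b1mad):\n", sep = "")
  print(head(format_median_mad(x), max_variables), quote = FALSE)
  invisible(x)
}

set.seed(1)
x <- array(rnorm(100 * 4 * 3), dim = c(100, 4, 3),
           dimnames = list(NULL, NULL, c("mu", "tau", "theta[1]")))
print_structure(x)
```

Because the medians and MADs are computed on every call, this avoids storing summaries but still costs one pass over the draws per print.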

@paul-buerkner
Collaborator

I would very much like such a light-weight print method!

@mjskay
Collaborator Author

mjskay commented Nov 2, 2019

Hmm, now that I've tried implementing something like this, I've realized it becomes a bit annoying in other ways: because it masks the default print output of the underlying format, it becomes harder to get a sense of what the draws look like in the specific format you are using.

So perhaps we should leave print alone?

@paul-buerkner
Collaborator

paul-buerkner commented Nov 2, 2019 via email

@jgabry
Member

jgabry commented Nov 7, 2019

> Not sure how reasonable and doable such a print method would be for the other formats though.

It’s definitely trickier but I think worth thinking about at some point. Default print for arrays isn’t super helpful in my experience, so I’d love to have something nicer if possible!

@mjskay
Collaborator Author

mjskay commented Nov 7, 2019

Seems reasonable. The approach I was taking (with mean+/-sd) is fine for the rvar stuff but does weird things when people start subsetting the other formats (which was what made me realize it won't really work here). But something that shows structure and abbreviates only as necessary, more like tibbles, makes sense.

@jgabry
Member

jgabry commented Nov 7, 2019

I also meant to say that it's fine by me if you want to leave print() alone for now. In that case we can leave this issue open to make sure we come back to it at some point.

@paul-buerkner
Collaborator

paul-buerkner commented Mar 24, 2020

I have now added some print methods that mostly truncate the output and give additional meta-information but make sure the underlying format is still visible. I would be happy to hear your thoughts.
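
As a rough sketch of the general idea (not the code that was actually added to posterior), a truncating print for a plain matrix could look like this. `print_truncated()` and its `max_draws` and `max_variables` arguments are hypothetical names chosen for the example.

```r
# Hypothetical truncating print: header with meta-information, the
# underlying matrix restricted to its first rows and columns, and a
# footer saying what was left out.
print_truncated <- function(x, max_draws = 10, max_variables = 8) {
  n_draws <- nrow(x)
  n_vars <- ncol(x)
  rows <- seq_len(min(n_draws, max_draws))
  cols <- seq_len(min(n_vars, max_variables))
  cat("# A matrix: ", n_draws, " draws, and ", n_vars, " variables\n", sep = "")
  # The default matrix print is reused so the format stays recognizable.
  print(x[rows, cols, drop = FALSE], digits = 2)
  more_draws <- n_draws - length(rows)
  more_vars <- n_vars - length(cols)
  if (more_draws > 0 || more_vars > 0) {
    cat("# ... with ", more_draws, " more draws, and ", more_vars,
        " more variables\n", sep = "")
  }
  invisible(x)
}

set.seed(1)
m <- matrix(rnorm(400 * 10), nrow = 400,
            dimnames = list(NULL, paste0("v", 1:10)))
print_truncated(m)
```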

@paul-buerkner paul-buerkner self-assigned this Mar 24, 2020
@MansMeg
Collaborator

MansMeg commented Apr 2, 2020

@paul-buerkner asked me to check the print methods, so I'll print their output here for ease of discussion:

x <- example_draws()
print(as_draws_matrix(x))
print(as_draws_array(x))
print(as_draws_df(x))
print(as_draws_list(x))

This results in:

> print(as_draws_matrix(x))
# A draws_matrix: 400 draws, and 10 variables
    variable
draw   mu tau theta[1] theta[2] theta[3] theta[4] theta[5] theta[6]
  1  2.01 2.8     3.96    0.271    -0.74      2.1    0.923      1.7
  2  1.46 7.0     0.12   -0.069     0.95      7.3   -0.062     11.3
  3  5.81 9.7    21.25   14.931     1.83      1.4    0.531      7.2
  4  6.85 4.8    14.70    8.586     2.67      4.4    4.758      8.1
  5  1.81 2.8     5.96    1.156     3.11      2.0    0.769      4.7
  6  3.84 4.1     5.76    9.909    -1.00      5.3    5.889     -1.7
  7  5.47 4.0     4.03    4.151    10.15      6.6    3.741     -2.2
  8  1.20 1.5    -0.28    1.846     0.47      4.3    1.467      3.3
  9  0.15 3.9     1.81    0.661     0.86      4.5   -1.025      1.1
  10 7.17 1.8     6.08    8.102     7.68      5.6    7.106      8.5
# ... with 390 more draws, and 2 more variables
> print(as_draws_array(x))
# A draws_array: 100 iterations, 4 chains, and 10 variables
, , variable = mu

         chain
iteration   1    2     3   4
        1 2.0  3.0  1.79 6.5
        2 1.5  8.2  5.99 9.1
        3 5.8 -1.2  2.56 0.2
        4 6.8 10.9  2.79 3.7
        5 1.8  9.8 -0.03 5.5

, , variable = tau

         chain
iteration   1    2    3   4
        1 2.8 2.80  8.7 3.8
        2 7.0 2.76  2.9 6.8
        3 9.7 0.57  8.4 5.3
        4 4.8 2.45  4.4 1.6
        5 2.8 2.80 11.0 3.0

, , variable = theta[1]

         chain
iteration     1     2    3     4
        1  3.96  6.26 13.3  5.78
        2  0.12  9.32  6.3  2.09
        3 21.25 -0.97 10.6 15.72
        4 14.70 12.45  5.4  2.69
        5  5.96  9.75  8.2 -0.91

, , variable = theta[2]

         chain
iteration      1    2   3   4
        1  0.271  1.0 2.1 5.0
        2 -0.069  9.4 7.3 8.2
        3 14.931 -1.2 5.7 6.0
        4  8.586 12.5 2.8 2.7
        5  1.156 11.9 3.2 3.2

, , variable = theta[3]

         chain
iteration     1     2     3   4
        1 -0.74  0.22   1.4 5.7
        2  0.95  9.68   4.1 3.5
        3  1.83 -1.37  -8.3 3.1
        4  2.67 11.15 -10.8 3.2
        5  3.11 12.72 -27.8 2.6

# ... with 95 more iterations, and 5 more variables
> print(as_draws_df(x))
# A draws_df: 100 iterations, 4 chains, and 10 variables
     mu tau theta[1] theta[2] theta[3] theta[4] theta[5] theta[6]
1  2.01 2.8     3.96    0.271    -0.74      2.1    0.923      1.7
2  1.46 7.0     0.12   -0.069     0.95      7.3   -0.062     11.3
3  5.81 9.7    21.25   14.931     1.83      1.4    0.531      7.2
4  6.85 4.8    14.70    8.586     2.67      4.4    4.758      8.1
5  1.81 2.8     5.96    1.156     3.11      2.0    0.769      4.7
6  3.84 4.1     5.76    9.909    -1.00      5.3    5.889     -1.7
7  5.47 4.0     4.03    4.151    10.15      6.6    3.741     -2.2
8  1.20 1.5    -0.28    1.846     0.47      4.3    1.467      3.3
9  0.15 3.9     1.81    0.661     0.86      4.5   -1.025      1.1
10 7.17 1.8     6.08    8.102     7.68      5.6    7.106      8.5
# ... with 390 more draws, and 2 more variables
> print(as_draws_list(x))
# A draws_list: 100 iterations, 4 chains, and 10 variables

[chain = 1]
$mu
 [1] 2.01 1.46 5.81 6.85 1.81 3.84 5.47 1.20 0.15 7.17

$tau
 [1] 2.8 7.0 9.7 4.8 2.8 4.1 4.0 1.5 3.9 1.8

$`theta[1]`
 [1]  3.96  0.12 21.25 14.70  5.96  5.76  4.03 -0.28  1.81  6.08

$`theta[2]`
 [1]  0.271 -0.069 14.931  8.586  1.156  9.909  4.151  1.846  0.661
[10]  8.102

$`theta[3]`
 [1] -0.74  0.95  1.83  2.67  3.11 -1.00 10.15  0.47  0.86  7.68


[chain = 2]
$mu
 [1]   2.99   8.17  -1.15  10.93   9.82 -10.90  -9.26   1.79   5.35
[10]   0.87

$tau
 [1] 2.80 2.76 0.57 2.45 2.80 6.08 9.33 6.81 2.82 6.69

$`theta[1]`
 [1]  6.26  9.32 -0.97 12.45  9.75  2.56 11.92  9.89  4.31  9.26

$`theta[2]`
 [1]  1.0  9.4 -1.2 12.5 11.9 -8.8 -6.1 11.6  2.8  8.4

$`theta[3]`
 [1]   0.22   9.68  -1.37  11.15  12.72 -20.73 -12.17   1.77   5.98
[10]  -3.31


[chain = 3]
$mu
 [1]  1.79  5.99  2.56  2.79 -0.03  1.06  3.67  3.51  8.85  8.85

$tau
 [1]  8.72  2.91  8.41  4.39 11.03  2.70  1.68  0.52  5.96  5.96

$`theta[1]`
 [1] 13.3  6.3 10.6  5.4  8.2  5.0  5.2  3.7 13.1 13.1

$`theta[2]`
 [1] 2.1 7.3 5.7 2.8 3.2 4.3 4.1 4.1 4.7 4.7

$`theta[3]`
 [1]   1.38   4.11  -8.27 -10.77 -27.78  -3.94   0.36   3.84   2.75
[10]   2.75


[chain = 4]
$mu
 [1]  6.46  9.15  0.20  3.69  5.48  2.38 11.82  4.90  0.88  3.81

$tau
 [1]  3.8  6.8  5.3  1.6  3.0  2.3  4.3  3.1 15.8  2.7

$`theta[1]`
 [1]  5.78  2.09 15.72  2.69 -0.91  0.59 18.87  1.50  9.07  7.52

$`theta[2]`
 [1]  5.0  8.2  6.0  2.7  3.2  1.1 13.0  6.1 11.6  4.3

$`theta[3]`
 [1]  5.69  3.47  3.13  3.16  2.55 -0.12 14.96  3.31  4.29  4.69

# ... with 90 more iterations, and 5 more variables
> 

I think these print methods are really nice. I only have two - very minor - suggestions.

Suggestions

  1. I think that we would like to limit the number of chains by default as well. It doesn't look like that is currently done, but in posteriordb I work with 10 chains, and then print() would give too much. I think 1 or 3 chains would be enough to show using print().
  2. If I shift my console width, the print jumps to a second row. I think it would be nice to have a solution where the columns are adapted to the console size. But that is a nice-to-have rather than a need-to-have. See below:
# A draws_matrix: 400 draws, and 10 variables
    variable
draw   mu tau theta[1] theta[2] theta[3]
  1  2.01 2.8     3.96    0.271    -0.74
  2  1.46 7.0     0.12   -0.069     0.95
  3  5.81 9.7    21.25   14.931     1.83
  4  6.85 4.8    14.70    8.586     2.67
  5  1.81 2.8     5.96    1.156     3.11
  6  3.84 4.1     5.76    9.909    -1.00
  7  5.47 4.0     4.03    4.151    10.15
  8  1.20 1.5    -0.28    1.846     0.47
  9  0.15 3.9     1.81    0.661     0.86
  10 7.17 1.8     6.08    8.102     7.68
    variable
draw theta[4] theta[5] theta[6]
  1       2.1    0.923      1.7
  2       7.3   -0.062     11.3
  3       1.4    0.531      7.2
  4       4.4    4.758      8.1
  5       2.0    0.769      4.7
  6       5.3    5.889     -1.7
  7       6.6    3.741     -2.2
  8       4.3    1.467      3.3
  9       4.5   -1.025      1.1
  10      5.6    7.106      8.5
# ... with 390 more draws, and 2 more variables

@paul-buerkner
Collaborator

Thank you for your comments!

> I think that we would like to limit the number of chains as a default as well - it doesn't look like that currently done, but in posteriordb I work with 10 chains, and then the print() would give too much. I think 1 or 3 chains would be enough to show using print.

They are limited by default in a format-dependent manner, and the limit can be set globally via options(max_chains = <x>). The format-dependent defaults are shown in ?print.draws_<format>.

For example, draws_list shows 4 chains by default, but I honestly think this may be too much. Perhaps just show 2 by default for this format?
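
To make the mechanism concrete, here is a minimal sketch of an argument that falls back to a global option. `limit_chains()` is a hypothetical helper, the option name max_chains is taken from the comment above, and the fallback of 2 is only illustrative; the real print methods may differ.

```r
# Hypothetical helper: keep at most `max_chains` chains of a 3-d array
# laid out as iteration x chain x variable. The default is read from the
# global option "max_chains" and falls back to 2 when the option is unset.
limit_chains <- function(x, max_chains = getOption("max_chains", 2)) {
  shown <- min(dim(x)[2], max_chains)
  x[, seq_len(shown), , drop = FALSE]
}

x <- array(seq_len(5 * 4 * 3), dim = c(5, 4, 3))
d_default <- dim(limit_chains(x))   # fallback applies if the option is unset

old <- options(max_chains = 3)      # set the limit globally
d_option <- dim(limit_chains(x))
options(old)                        # restore the previous setting
```

An explicit argument, as in limit_chains(x, max_chains = 1), still takes precedence over the global option.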

> If I shift my console width the print jumps to second row. I think it is nice to have a solution where the columns are adapted to the console size. But, that is need to have.

I agree but have two concerns.

First, this may not be trivial to implement. I know tibble has some features in that regard, but as far as I can see, the related code is not quite trivial. Printing is one of the primary concerns for tibble, so I understand why they put so much effort into it. I am not sure I can put that effort into the print methods of posterior though. But perhaps this adaptive printing is easier than I think, so if anybody has experience with that, I would love to hear their thoughts.

Second, I currently control the number of variables, iterations, chains, etc. shown via format-specific defaults and the option to set defaults globally via options(). However, to my understanding, this interferes with console-width-dependent printing. I think we can have printing that does one or the other, but not both, at least not for the dimension that spans the console width.
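
As a very rough sketch of what width-dependent printing involves, the following base R helper estimates how many leading columns of a matrix fit into the console. `n_cols_that_fit()` and `row_label_width` are hypothetical, and the width calculation is a simplification of what print() actually does.

```r
# Hypothetical helper: how many leading columns of a numeric matrix fit
# into `width` characters when printed side by side.
n_cols_that_fit <- function(x, width = getOption("width"),
                            row_label_width = 4) {
  # Width of each column: the widest of its formatted values and its name.
  col_widths <- vapply(seq_len(ncol(x)), function(j) {
    max(nchar(format(x[, j], digits = 3)), nchar(colnames(x)[j]))
  }, integer(1))
  # Each column is preceded by one separating space.
  used <- row_label_width + cumsum(col_widths + 1L)
  sum(used <= width)
}

m <- matrix(1, nrow = 2, ncol = 3, dimnames = list(NULL, c("a", "b", "c")))
n_cols_that_fit(m, width = 8)
```

One conceivable compromise would be to show min(n_cols_that_fit(x), max_variables) columns, but that means the user-set limit is silently overridden on narrow consoles, which is the conflict described above.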

@MansMeg
Collaborator

MansMeg commented Apr 2, 2020

Yes. I agree that it is not trivial, and - at least to me - it is more of a nice to have than anything important.

Regarding how many chains to print - I actually think that print should show the minimal possible so I agree that 2 chains is probably a good idea as default.

@paul-buerkner
Collaborator

I am going to close this issue for now since we have reasonable print outputs to start with. If, at a later stage, we want to make things prettier, for instance, more adaptive to the console width, we can open a specific new issue dedicated to that purpose.
