Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Constructing a censored distribution #1468

Closed
sethaxen opened this issue Jan 11, 2022 · 17 comments · Fixed by #1470
Closed

Constructing a censored distribution #1468

sethaxen opened this issue Jan 11, 2022 · 17 comments · Fixed by #1470

Comments

@sethaxen
Copy link
Contributor

It would be handy to have a Censored object and corresponding censored function that behaved just like Truncated and truncate, respectively, but for censored data. The implementations would be much the same, with two main differences:

  • instead of evaluating the logpdf of the wrapped distribution at the bounds, they would evaluate its logcdf at the lower-bound and logccdf at the upper-bound
  • rand would use clamp instead of rejection sampling

inspired by the addition of Censored to PyMC and this Twitter thread

@mschauer
Copy link
Member

This is a good opportunity to see how well we did with https://github.com/JuliaMath/DensityInterface.jl

@devmotion
Copy link
Member

This is a good opportunity to see how well we did with https://github.com/JuliaMath/DensityInterface.jl

Can you explain how it is related to the DensityInterface integration? I would imagine that censored/Censored is basically the same as truncated/Truncated but with different values for the upper and lower bounds and different sampling, as @sethaxen said.

@mschauer
Copy link
Member

It's a mixture between the truncated and a Dirac at the boundary. We can already express this in Distributions as

using Distributions
B = Binomial(10, 0.5)
censoredB = MixtureModel([Truncated(B, 0, 4), Dirac(5)], [cdf(B, 4), 1-cdf(B, 4)])

in the discrete case. For the Gaussian censored we could extend

using Distributions
N = Normal(10, 0.5)
censoredN = MixtureModel([Truncated(N, 0, 5), Dirac(5)], [cdf(N, 5), 1-cdf(N, 5)])

but currently we get into value support hell

ERROR: LoadError: MethodError: no method matching value_support(::Type{UnivariateDistribution})
Closest candidates are:
  value_support(::Type{<:Distribution{VF, VS}}) where {VF, VS} at ~/.julia/packages/Distributions/Gnx7g/src/common.jl:141

because discrete-continuous mixtures are not allowed in Distributions.

These are exactly the discussions we had before Christmas. The good news is though: the example https://discourse.julialang.org/t/turing-censored-regression-example/53837/7 doesn't pose problems because the likelihood is weird, but the posterior density doesn't contain the truncated parameters so no problems with reference measures etc.

@devmotion
Copy link
Member

Yes, definitely MixtureModel, and in particular this example, would benefit from the approaches in the ProductDistribution PR.

However, I assume you wouldn't define it as a MixtureModel but with a custom Censored default fallback and possibly some optimizations for specific distributions, as suggested by @sethaxen. A custom distribution would avoid all the MixtureModel issues and also allow to dispatch on the censored distribution (not possible with the current design of MixtureModel, would require breaking changes such as tuples of distributions - which IMO we should support but this would hold back a PR and even then the order of the tuple elements is ambiguous).

@mschauer
Copy link
Member

I have to think if it's working exactly as in ProductDistribution. And yes, MixtureModel isn't the optimal container, but as you say mostly because of the short-comings from the implementation. It is just already there. And tuples for mixture models would be a good idea sometimes.

@sethaxen
Copy link
Contributor Author

I have to think if it's working exactly as in ProductDistribution. And yes, MixtureModel isn't the optimal container, but as you say mostly because of the short-comings from the implementation. It is just already there. And tuples for mixture models would be a good idea sometimes.

Also because sample generation for MixtureModel wrapping Truncated would generally be slower than clamp due to the possibility of a draw being rejected. Dispatching also on Censored would be preferable for defining bijectors, etc. So my view is that while the MixtureModel approach should be possible, a special distribution modifier should be preferred.

@sethaxen
Copy link
Contributor Author

I also still don't understand how DensityInterface.jl is relevant here.

@mschauer
Copy link
Member

I also still don't understand how DensityInterface.jl is relevant here.

If we did it right with DensityInterface.jl you can have a hierarchical model where X ~ censored(...) is not directly observed and you can still compute a logdensityof from the model such that a sampler supporting it (you know, a sampler like https://discourse.julialang.org/t/ann-zigzagboomerang-jl/57287/24 ...) can sample the posterior of the latent X.

@mschauer
Copy link
Member

I suggest we can have censored return an object which is almost the same as Truncated except the different definition of rand and cdf etc.

struct Censored{....}
    uncensored::D      # the original distribution (uncensored)
    lower::T      # lower bound
    upper::T      # upper bound
    lcdf::T       # cdf of lower bound (exclusive): P(X < lower)
    ucdf::T       # cdf of upper bound (inclusive): P(X ≤ upper)
    tp::T         # the probability of the truncated part, i.e. ucdf - lcdf
    logtp::T      # log(tp)
end

@devmotion
Copy link
Member

I think this was @sethaxen's suggestion above as well 🙂

@mschauer
Copy link
Member

And then you want to follow the line of reasoning in TuringLang/Turing.jl#847 (comment) ?

@sethaxen
Copy link
Contributor Author

And then you want to follow the line of reasoning in TuringLang/Turing.jl#847 (comment) ?

Which line of reasoning would that be?

@devmotion
Copy link
Member

Hmm I'm not sure what you would like to refer to here. Maybe I'm oversimplifying a possible implementation since I haven't made a concrete draft but I don't think there are any problems with our type hierarchies: Analogously to truncated/Truncated a censored DiscreteDistribution would be a DiscreteDistribution (since its support is still only a discrete set of values) and a censored ContinuousDistribution would be a ContinuousDistribution (since its support is still an uncountable set of values - well, apart from special cases which would be a ContinuousDistribution to avoid type stability issues and which are similar to e.g. the limit case of Normal(m, s) as s to 0 which of course is also a ContinuousDistribution). (If the probability mass on the censored interval is 0, we should probably throw an error anyway in both cases.)

@mschauer
Copy link
Member

Which line of reasoning would that be?

Hmm I'm not sure what you would like to refer to here.

Come on! We have to figure out if a censored continuous distribution is continuous. And it says there:

Discrete and DiscreteDistribution are (more or less commonly) understood as a countable support and distributions with countable support (i.e., distributions of discrete random variables in line with e.g. the simple notion on Wikipedia), and Continuous and ContinuousDistribution are viewed to refer to uncountable support and distributions with uncountable support (i.e., distributions of continuous random variables). This seems to deviate from their original interpretation but is consistent with the implemented DiscreteDistributions and ContinuousDistributions, allows to characterize each distribution as either DiscreteDistribution or ContinuousDistribution, and seemed the practically most useful approach for e.g. defining product distributions of discrete and continuous distributions (as e.g. in JuliaStats/Distributions.jl#1391) without having to break or change the type structure.

@mschauer
Copy link
Member

Analogously to truncated/Truncated

Note that a truncated continuous distribution has no atoms, but a censored has.

@devmotion
Copy link
Member

I assumed that you wanted to refer to this question

We have to figure out if a censored continuous distribution is continuous.

and hence I commented on it. However, I was confused and unsure about your intentions since you linked to a comment in a completely different repo (and which I had already forgotten 😄) and more importantly we already agreed on this definition some weeks ago (well, at least that was my impression: #1391 (comment)) and hence it was unclear to me which part would be a new reasoning.

I assumed we already agreed that

  • a DiscreteDistribution is characterized by a discrete (possibly countable infinite) support
  • a ContinuousDistribution is characterized by an uncountable support.
    As discussed in Generalize Product #1391 this keeps the type structure simple, allows to characterize any Distribution as either DiscreteDistribution or ContinuousDistribution, and matches the (more or less common) understanding of discrete and continuous random variables. But since it is not based on the continuity of the probability density function or the base measure for the type it does not matter if a distribution has atoms or not. Such properties would have to be assessed with specific functions but are not encoded in the type. (As a side remark, this also implies that a truncated ContinuousDistribution can actually have atoms if the untruncated ContinuousDistributions has some.)

The point I wanted to raise above was that we have to decide what to do in the special cases where censoring of a ContinuousDistribution leaves us only with a discrete support: E.g., if we censor a mixture of a Uniform(0, 1) and a Dirac(2) to the interval [1.5, 2.5]. I guess we could either try to check this, probably with a to be introduced function, and then error or make it a DiscreteDistribution. The latter creates type instabilities though and hence does seem suboptimal. Alternatively, we could just ignore these special cases (for now), since e.g. mixtures of DiscreteDistribution and ContinuousDistribution aren't even supported. And even if they would, we also consider other special cases such as Normal(0, 0) of ContinuousDistributions where the support collapses to a discrete set still as ContinuousDistributions. So I guess at least initially it seems fine to ignore these special cases.

Hence I guess the question regarding these atoms is not the type structure but mainly how to define functions such as logpdf and pdf in a good and useful way. I assume one useful and (hopefully) natural and unsurprising way would be to use an unweighted sum of two Diracs and the base measure of the original distribution restricted to the interior of the censored interval as a base measure of the censored distribution. Then for ContinuousDistributions pdf just returns the cdf and ccdf at the boundaries plus any potential mass at the upper boundary (if the original distribution already had an atom there) and otherwise the pdf, and for DiscreteDistributions it always returns cdf (is inclusive) and ccdf + pdf at the boundaries and pdf otherwise.

@mschauer
Copy link
Member

mschauer commented Jan 13, 2022

Yes, I understand we both think this is the way to go but we haven't really made it Distributions.jl-official including edging out the interface which allows us to check for atoms in continuous distributions: if continuous distributions can have atoms (they couldn't have before) all the distribution function calculations need to take this into account, there are perhaps many places where we assume that P(X < c) = P(X ≤ c) for continuous distributions. Thinking about it, this should be issue-tracked with a paragraph added to the docs clarifying the interface when we settled it. In retrospect it was nice that <: Continuous meant no atoms and <: Discrete meant only atoms. The case of the product distribution #1391 this issue is more subtle and less pressing. Subtle e.g. because the product between a Uniform and a Bernoulli has no atoms but also no density with respect to the joint Lebesgue measure and less pressing because distribution functions of multivariate distributions don't play such an important role.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants