
Unexpected behavior in random #2909

Closed
ColCarroll opened this issue Mar 28, 2018 · 4 comments · Fixed by #2979
@ColCarroll (Member)

Consider the following:

import pymc3 as pm

with pm.Model() as model:
    a = pm.Uniform('a', lower=0, upper=1, shape=10)
    b = pm.Binomial('b', n=1, p=a, shape=10)

b.random(size=10000).mean(axis=0)

# array([0.7022, 0.0073, 0.9857, 0.5378, 0.9821, 0.7176, 0.0905, 0.2513, 0.5835, 0.0521])

I would have expected this mean to be close to 0.5 for each element, since marginally each b[i] is Bernoulli with p = a[i] ~ Uniform(0, 1), so E[b[i]] = E[a[i]] = 0.5.

This is happening because the implementations of the .random methods all follow this pattern:

# Binomial.random; draw_values and generate_samples come from
# pymc3.distributions.distribution, and stats is scipy.stats.
def random(self, point=None, size=None):
    n, p = draw_values([self.n, self.p], point=point)
    return generate_samples(stats.binom.rvs, n=n, p=p,
                            dist_shape=self.shape,
                            size=size)

Note that a single value is drawn for p and then used for every draw of the random variable. Indeed, in the example above, the value drawn for p was quite close to the mean of the draws!
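For illustration, here is a minimal scipy-only sketch (variable names are mine) contrasting the two schemes:

from scipy import stats

# Current behavior: one p is drawn up front and reused for every sample,
# so each column of draws is Bernoulli(p_fixed[i]) and its mean tracks
# p_fixed[i], not 0.5.
p_fixed = stats.uniform.rvs(size=10)
fixed_draws = stats.binom.rvs(n=1, p=p_fixed, size=(10000, 10))
print(fixed_draws.mean(axis=0))  # approximately p_fixed, element-wise

# Expected behavior: a fresh p for every sample marginalizes over the
# prior, so each column's mean is close to E[p] = 0.5.
fresh_p = stats.uniform.rvs(size=(10000, 10))
fresh_draws = stats.binom.rvs(n=1, p=fresh_p)
print(fresh_draws.mean(axis=0))  # approximately 0.5, element-wise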

This is causing strange behavior in sample_prior (#2876). If this is considered a bug, I can fix it by allowing draw_values to have a size argument. Otherwise I can change #2876 to work around the behavior.
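For concreteness, the fixed pattern might look roughly like this, assuming draw_values grows a size keyword (a sketch of the proposal, not the current API):

def random(self, point=None, size=None):
    # Hypothetical: with a size argument, draw_values would return
    # `size` independent draws of n and p rather than one shared value.
    n, p = draw_values([self.n, self.p], point=point, size=size)
    return generate_samples(stats.binom.rvs, n=n, p=p,
                            dist_shape=self.shape,
                            size=size)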

(NB: I think this does not affect any of the sampling procedures.)

@junpenglao (Member)

I hope it doesn't affect sample_ppc...

@ColCarroll (Member Author)

A quick experiment says it does not:

import numpy as np
import pymc3 as pm

with pm.Model() as model:
    a = pm.Uniform('a', lower=0, upper=1, shape=10)
    b = pm.Binomial('b', n=1, p=a, shape=10, observed=np.ones(10))
    trace = pm.sample(1000)
    post = pm.sample_ppc(trace)

# Posterior predictive means are tightly clustered around the observed data...
post['b'].mean(axis=0).std()  # 0.0139301830569451
# ...whereas direct .random draws show the much larger spread caused by the bug.
b.distribution.random(size=post['b'].shape[0]).mean(axis=0).std()  # 0.2558969323770803

@lucianopaz (Contributor)

Sorry to jump in on this discussion without being a member, but I think that what @ColCarroll pointed out should be considered a bug, and should be fixed, as he says, by adding a size parameter to draw_values.

Consider his first example. The current behavior of random is to draw a single value of b's distribution's metaparameters. I think this has at least two problems:

  1. The current implementation is not self-consistent: I would expect a call to b.random(size=10000) to have the same distribution as [b.random() for i in range(10000)], with the first call merely being faster because it can be vectorized or otherwise distributed more efficiently (see the sketch at the end of this comment).

  2. It is not fully consistent with the Bayesian interpretation of the metaparameters. In his example, each b[i] is sampled from P(b[i] | a), where a is drawn once at the beginning and shares a common value across all b[i]. That would match a frequentist interpretation, in which the model parameters are fixed real numbers in nature and should be the same for each b[i] that is drawn. However, this was clearly not intended: the value of a is not stored between calls to random, so different calls will very likely use different a's. A method more consistent with the Bayesian interpretation, where the uncertainty in the metaparameters is encoded by a probability distribution, would be to sample a and b jointly, in other words, to draw a value of a for each draw of b.

In the end, if draw_values also took a size parameter, as @ColCarroll suggests, it would be possible to get an independent value of a for each entry of the b array, solving both of the problems mentioned above.
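To make the consistency property in point 1 concrete, a check along these lines (assuming the model from the first example is in scope) should pass once draw_values takes size:

import numpy as np

vectorized = b.random(size=10000)                      # one call, shape (10000, 10)
looped = np.array([b.random() for _ in range(10000)])  # a is redrawn on every call

# `looped` already has element-wise means near E[a] = 0.5; with the fix,
# `vectorized` should too, instead of tracking a single draw of a.
print(vectorized.mean(axis=0))
print(looped.mean(axis=0))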

@springcoil (Contributor)

@ColCarroll I'd agree with @lucianopaz on this. There's a bug in draw_values.
