Fix categorical random shape #3060

canyon289 · 2018-06-26T15:02:31Z

Hello,
This pull request is for #3035 and is very much a WIP.

To summarize, I tried to hit two objectives per @junpenglao's request: one was removing the duplication of random draws from Categorical, the other was addressing the shape.

For removing duplication
I created a function in distribution called random_choice and pointed the previous methods to it. All tests are passing in this structure.

Questions I have
Is this name good or should it be called categorical_random_choice?
Is distributions.py a good place for it?

For the shape comment
I believe there is an expectation that if a (10,5) size is requested, the method returns a (10,5) array.
These tests specifically seem to test for that behavior.
pymc3.tests.test_distributions_random.BaseTestCases.BaseTestCase#test_scalar_parameter_shape
pymc3.tests.test_distributions_random.BaseTestCases.BaseTestCase#test_scalar_shape
Given that, I don't understand how these tests reconcile with the original issue, though this is not to suggest the request was wrong. I'm very much at the edge of my understanding and feel like I'm missing something.
How should I proceed?

I've attached some supporting screenshots that may help. Thanks for the chance to participate and learn.

junpenglao · 2018-06-26T15:06:01Z

> Is this name good or should it be called categorical_random_choice?

I am fine with it

> Is distributions.py a good place for it?

distribution.dist_math might be better

junpenglao · 2018-06-26T15:06:20Z

Also rebase needed.

canyon289 · 2018-06-27T04:00:50Z

@junpenglao Currently running linting and coverage overnight but do let me know if anything is missing.

I still do have a question about the comment on the shapes. Given the test_scalar_parameter_shape test does it still make sense to have random_choice return an (x,1) shape array?

Thanks for the quick comment

canyon289 · 2018-06-27T04:08:19Z

Some tests are failing. I'll check them out

canyon289 · 2018-06-27T04:27:51Z

One last thing, I still need to run the examples in docs/source/notebooks, but wanted to make sure my implementation at the code level was ok before proceeding.

junpenglao · 2018-06-27T06:00:55Z

Cool.
Need a test to make sure the case in #3035 now returns an array with the correct shape.

twiecki · 2018-06-27T09:54:26Z

pymc3/distributions/dist_math.py

+        samples = np.row_stack([np.random.choice(k, p=p_) for p_ in p])
+    else:
+        samples = np.random.choice(k, p=p, size=size)
+    return samples


missing new-line
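
For readers following along, the diff above shows only the tail of the helper. Below is a minimal sketch of what a complete function along these lines could look like, assuming `p` is either a 1-D probability vector or a 2-D array with one probability row per draw. The name `random_choice_sketch`, its signature, and the `p.ndim` branch are assumptions for illustration, not the merged code. `np.vstack` is used in place of `np.row_stack`, which is an alias for it.

```python
import numpy as np


def random_choice_sketch(k, p, size=None):
    """Draw category indices in range(k) with probabilities p (hypothetical helper)."""
    p = np.asarray(p)
    if p.ndim == 2:
        # One draw per row of p; stacking the scalar draws row-wise yields shape (n, 1).
        samples = np.vstack([np.random.choice(k, p=p_) for p_ in p])
    else:
        # A single probability vector: let NumPy handle the requested size.
        samples = np.random.choice(k, p=p, size=size)
    return samples


# 10 rows, each a uniform distribution over 5 categories.
p_2d = np.full((10, 5), 0.2)
print(random_choice_sketch(5, p_2d).shape)                  # (10, 1)
print(random_choice_sketch(5, p_2d[0], size=(4, 3)).shape)  # (4, 3)
```

With a 2-D `p` the `size` argument is ignored in this sketch, which is one reason the interaction between `p` and `size` needs explicit tests.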

canyon289 · 2018-06-28T04:22:59Z

@junpenglao and @twiecki thanks for the feedback

In regards to the shape, I think an input size of (10,5) matrix with an output size of (10,) would break the below but I could be mistaken. Is my understanding correct? Below is a screenshot as well showing how the test expects random_choice to output the same size array as the input.

Lastly I'll be away for the next week but can finish this when I get back!

https://github.com/pymc-devs/pymc3/blob/master/pymc3/tests/test_distributions_random.py#L148-L168

junpenglao · 2018-06-28T06:20:26Z

Yes you should change that test for Categorical - the shape of the output from Categorical random is not the same as parameters.

canyon289 · 2018-07-05T15:14:38Z

Hello, I'm back and will be working on this again this weekend! Just wanted to give a heads up

canyon289 · 2018-07-06T15:21:54Z

pymc3/distributions/discrete.py

-            )
-        else:
-            return np.asarray(np.random.choice(k, p=p, size=size))
-
    def random(self, point=None, size=None):
        p, k = draw_values([self.p, self.k], point=point, size=size)


@junpenglao @twiecki Just for my own understanding why is draw_values called here? Why can't we just use generate_samples directly?

I realize this may be a really basic question, but I'm biasing towards asking rather than assuming as I learn more about pymc3 :)

Sorry, I think this was a question for me: draw_values generally gets the parameters of a distribution into the right shape (size=size), with the right conditioning (point=point). The generate_samples function actually draws new samples for an RV.

If you trace the code, draw_values will call .random on each of the variables, which will (usually) end up at generate_samples. This is how we traverse back up the computation graph.
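
The division of labour described here can be sketched with a toy model. This is not pymc3 code: `ToyConstant`, `ToyNormal`, and `toy_draw_values` are hypothetical names, shapes are ignored entirely, and the only point is to show how resolving parameters calls `.random` on parent variables and so walks back up the graph, while values given in `point` short-circuit that walk.

```python
import random


class ToyConstant:
    """A fixed parameter; 'drawing' it just returns its value."""

    def __init__(self, value):
        self.value = value

    def random(self, point=None, size=None):
        return self.value


class ToyNormal:
    """A toy random variable whose parameters may themselves be random."""

    def __init__(self, name, mu, sd):
        self.name, self.mu, self.sd = name, mu, sd

    def random(self, point=None, size=None):
        # Step 1: resolve the parameters (the role draw_values plays).
        mu, sd = toy_draw_values([self.mu, self.sd], point=point)
        # Step 2: draw new samples (the role generate_samples plays).
        if size is None:
            return random.gauss(mu, sd)
        return [random.gauss(mu, sd) for _ in range(size)]


def toy_draw_values(params, point=None):
    """Return one concrete value per parameter."""
    point = point or {}
    values = []
    for param in params:
        name = getattr(param, "name", None)
        if name in point:
            # Conditioning: use the value supplied in `point`.
            values.append(point[name])
        else:
            # Otherwise ask the parent for a value, walking up the graph.
            values.append(param.random(point=point))
    return values


mu = ToyNormal("mu", ToyConstant(0.0), ToyConstant(1.0))
x = ToyNormal("x", mu, ToyConstant(0.5))
draws = x.random(size=3)                            # mu is drawn first, then x
conditioned = x.random(point={"mu": 10.0}, size=3)  # mu is fixed at 10.0
```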

canyon289 · 2018-07-08T22:31:07Z

pymc3/tests/test_distributions_random.py

+    def test_scalar_parameter_shape(self):
+        rv = self.get_random_variable(None)
+        # for size, expected in ((None, (1,)), (5, (5, 1)), ((4, 5), (4, 1))):
+        for size, expected in (((4, 5), (4, 1)),):


So while I fixed the original inciting case in the issue, I don't know enough about all the possible ways that categorical.random could be called to know how to modify these tests correctly.

The original example passes p as a parameter to dist.random, and that case is handled.
This test passes size to dist.random, and I don't understand how that should be handled by Categorical.random, or whether this ever happens during nominal use of this library.

Any guidance would be appreciated and thanks for bearing with me while I learn the ropes.

Just to be clear, in this example, you expected v.shape to be (10, 5), right? I tried to get draw_values and generate_samples to return something shaped roughly the following way:

1. Broadcast the parameters together using np.broadcast to get their shape. Call it d1.

2. Look at the shape argument passed to the random variable, and call it d2. Try to combine d1 and d2 in a sensible way, and throw an exception instead of guessing too much. This is the bulk of the draw_values code. We'll say d3 is the combination of d1 and d2.

3. Return something of shape size x d3.

Explicitly, consider `x = pm.Normal.dist(mu=np.random.rand(5), sd=1, shape=(2, 5)).random(size=10)`.

The shape of mu is (5,), and that of sd is (), and broadcasting them gives (5,).

The shape argument ((2, 5)) fits nicely with the broadcast shape, so d3 above is (2, 5). Note that shape=(2, 4) throws an exception.

We return an array of shape (10, 2, 5), since the size argument was 10.

I hope this helps!
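
The shape bookkeeping described in this comment can be sketched in plain NumPy. `expected_sample_shape` is a hypothetical helper, not part of pymc3, and it assumes that "combining d1 and d2 in a sensible way" amounts to ordinary NumPy broadcasting, which is only an approximation of what draw_values actually does.

```python
import numpy as np


def expected_sample_shape(params, shape=(), size=None):
    """Hypothetical helper: predict the shape of the returned samples."""
    # d1: shape of the parameters after broadcasting them together.
    d1 = np.broadcast(*[np.asarray(p) for p in params]).shape
    # d2: the shape argument passed to the random variable.
    d2 = tuple(shape)
    # d3: combine d1 and d2; np.broadcast raises ValueError if they do not fit.
    d3 = np.broadcast(np.empty(d1), np.empty(d2)).shape
    # Normalize size to a tuple and prepend it to d3.
    if size is None:
        size = ()
    elif isinstance(size, int):
        size = (size,)
    return tuple(size) + d3


mu = np.random.rand(5)  # shape (5,)
sd = 1                  # shape ()
print(expected_sample_shape([mu, sd], shape=(2, 5), size=10))  # (10, 2, 5)

try:
    expected_sample_shape([mu, sd], shape=(2, 4), size=10)
except ValueError as err:
    print("incompatible shapes:", err)
```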

Hey @ColCarroll
I believe v.shape should be (10,1) per the issue @junpenglao filed? #3035

I've read your comments and want to read them a couple more times before asking any other follow up questions. This is very helpful!

@ColCarroll and @junpenglao

Would either of you mind if I put the deduplication of random_choice on another pull request so it can be merged independently of the shape issue? All tests were passing for that portion of the code.

I ask because I think it'll streamline the code changes and reduce the risk of conflicts later.

Sure thing. Which PR is that?

Just created it here. Pending CI tests at time of writing
#3084

canyon289 · 2018-07-14T14:40:11Z

@junpenglao and @ColCarroll Starting a new thread since I jumbled the last one. I also updated the commit tree with code that is only relevant to the discussion around shape.

To summarize
I fixed the original issue that @junpenglao referenced: when a 2-dimensional array is passed to p, pymc3 should return an (n,1) column vector for Categorical distributions.
#3035

To directly address @ColCarroll's statement: I do not expect v.shape to be (10,5). Per @junpenglao's original ticket, I expect it to be a (10,1) column vector.

However, I have no idea if this change breaks people's code. I certainly broke a number of tests. The tests all ensure that the correct size is returned when a size argument is passed.

I tried reading a bunch of closed issues but I ended up a little more confused than I originally started.
#2614
#2623
#2876
#2979

So starting off with my first question
@ColCarroll When you say parameter in Step 1 above, in this case the parameter would be p and the first step is for np.broadcast to get p's shape?

Lastly, if it becomes too much to answer my questions, please let me know. My intention is to help contribute to pymc3, but if it takes 5 hours of questions from me to get 1 hour of work done, then it may end up detracting from you guys :( I truly do appreciate the help thus far.

Thank you

junpenglao · 2018-07-14T21:18:42Z

Looking at the failed test, I think you are getting really close :-)

Currently, most of the test failures are mismatches between (n, ) and (n, 1). I don't have a strong feeling about this; I guess modifying it to return (n, ) is fine as well. Not sure which one makes more sense though... I asked Twitter: https://twitter.com/junpenglao/status/1018242975682453504

For the test mismatch between (4, 5) and (4, 1), you should just go ahead and modify the test.

The test mismatch between (5, 10, 5) and (50, 5) would need some investigation.
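
As a small illustration of why the (n, ) versus (n, 1) choice matters, the NumPy-only snippet below (independent of pymc3) shows that the two layouts hold the same values but fail a strict shape comparison, and that mixing them in arithmetic broadcasts to an (n, n) array rather than raising an error.

```python
import numpy as np

flat = np.arange(4)           # shape (4,)
column = flat.reshape(-1, 1)  # shape (4, 1), same values as a column vector

print(flat.shape == column.shape)  # False: a strict shape assertion fails
print((flat + column).shape)       # (4, 4): broadcasting silently expands
print(column.ravel().shape)        # (4,): flattening recovers the 1-D form
```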

canyon289 · 2018-07-17T14:36:47Z

@ColCarroll @junpenglao
I modified Categorical.random back to (n,) output and all tests now pass. I could use a review.

junpenglao · 2018-07-17T17:05:38Z

Did you implement a new test for this?

canyon289 · 2018-07-17T23:16:35Z

I did not but should have. I'll change this back to WIP and do that sometime before Sunday. My apologies.

canyon289 · 2018-07-20T02:41:01Z

@junpenglao I think I'm ready for what I hope is the final code review! Hopefully this is the one that makes it

junpenglao · 2018-07-20T04:17:39Z

Awesome! One less shape issue ;-)

canyon289 force-pushed the fix_categorical_random_shape branch from 3ee36c5 to 7252674 Compare June 27, 2018 03:52

canyon289 force-pushed the fix_categorical_random_shape branch from 7252674 to 9a84dac Compare June 27, 2018 04:11

twiecki reviewed Jun 27, 2018

canyon289 force-pushed the fix_categorical_random_shape branch from ac5d6f2 to 51c7f1b Compare June 28, 2018 04:26

canyon289 commented Jul 6, 2018

canyon289 force-pushed the fix_categorical_random_shape branch from 51c7f1b to efa5816 Compare July 8, 2018 22:24

canyon289 commented Jul 8, 2018

canyon289 force-pushed the fix_categorical_random_shape branch from efa5816 to cadee9c Compare July 14, 2018 02:07

canyon289 mentioned this pull request Jul 15, 2018

Parametrize shape and size tests #3099

Merged

canyon289 force-pushed the fix_categorical_random_shape branch 4 times, most recently from 1c19ce1 to be8c44b Compare July 17, 2018 04:05

canyon289 changed the title ~~[WIP] Fix categorical random shape~~ Fix categorical random shape Jul 17, 2018

canyon289 changed the title ~~Fix categorical random shape~~ [WIP] Fix categorical random shape Jul 17, 2018

Modify random_choice to return correct shape

8dd412f

canyon289 added 2 commits July 18, 2018 20:43

Add test for categorical distribution shape

d7644d3

Remove unused arrays in test

1263e02

canyon289 force-pushed the fix_categorical_random_shape branch from be8c44b to 1263e02 Compare July 19, 2018 03:43

junpenglao changed the title ~~[WIP] Fix categorical random shape~~ Fix categorical random shape Jul 20, 2018

junpenglao merged commit 973f25d into pymc-devs:master Jul 20, 2018

canyon289 deleted the fix_categorical_random_shape branch July 21, 2018 05:12
