Skip to content
This repository has been archived by the owner on Nov 19, 2020. It is now read-only.

Broadcasting dimension seems counter-intuitive #927

Closed
CatchemAL opened this issue Sep 30, 2017 · 4 comments
Closed

Broadcasting dimension seems counter-intuitive #927

CatchemAL opened this issue Sep 30, 2017 · 4 comments
Assignees
Labels
Milestone

Comments

@CatchemAL
Copy link
Collaborator

Hi @cesarsouza ,

Is my intuition failing me here?

[Test]
public void BroadcastingTest()
{
    double[,] matrix = Matrix.Zeros(rows: 100, columns: 3);
    double[] vec = { 1, 2, 3 };

    double[,] broadcasted1 = matrix.Add(vec, dimension: 0); // works!     :O
    double[,] broadcasted2 = matrix.Add(vec, dimension: 1); // fails!     :\
}

Certainly took me by surprise. It's not a massive issue if you get an IndexOutOfRangeException but it's a lot harder to spot on square matrices.

Thanks,
A

ps - Perhaps my intuition agrees with yours (#902)?

@cesarsouza
Copy link
Member

cesarsouza commented Sep 30, 2017

Hi @AlexJCross,

Thanks for raising this issue! This is actually something that I always wanted to double-check with another first time user of the elementwise operations. I've been exposed for those methods for so long, that I kind of lost my intuition of what a first time user would expect them to mean.

I understand that the "dimension" parameter in the elementwise operations is not intuitive, but I am not sure it is counter-intuitive either. If I didn't know how it worked already, I think it also would not be immediate to me why it would have worked with dimension: 1 and failed with dimension: 0.

The current behavior of those methods is based on the following behavior in Python:

In [3]: matrix = np.zeros((100, 3))

In [4]: vec = np.array([1,2,3])

In [5]: result = np.add(matrix, np.expand_dims(vec, axis=0))

In [6]: result = np.add(matrix, np.expand_dims(vec, axis=1))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-e3923ebe2c5e> in <module>()
----> 1 result = np.add(matrix, np.expand_dims(vec, axis=1))

ValueError: operands could not be broadcast together with shapes (100,3) (3,1)

However, I see that the documentation is certainly not helping, since the documentation for the "dimension" parameter is plain wrong in some cases. The real documentation should have been something along the lines of (let me know if you have ideas to improve it):

The dimension along which the operation will be carried. If set to 0, then the vector b must have the same dimensions as rows of A, and the operation will be carried at each column vector. If 1, then it must have the same dimensions as columns in A, and the operation will be carried at each row vector.

Please let me know how did you interpret the dimension parameter at a first look, and if you have any ideas to improve this.

Thanks a lot!
Cesar

@CatchemAL
Copy link
Collaborator Author

Hey @cesarsouza,

Thanks very much for the response. I agree that the issue is not clear cut!

Please let me know how did you interpret the dimension parameter at a first look, and if you have any ideas to improve this.

My (initial) interpretation was that dimension refers to the axis along which the vector should lie (as opposed to the expand dimension). I asked my gf this morning - she programs python - and she independently went for dimension: 1. I must admit, I never even considered the possibility of an ambiguity when you first told me about the method (although you did also tell me dimension: 1 in #902 😜 which is why I thought I'd check it with you).

However, thinking about it this morning, I've decided my thinking was a bit woolly and doesn't readily generalise to higher dimensions. If I were to sum() a tensor of order 5 along axes x0, x1, x3, x4 I would expect to collapse the tensor to a vector whose axis lies along x2. By that logic, I should probably expect to specify the four expansion axes when broadcasting a vector from x2 (and not the axis the vector simply lies on). Equally, if I were adding a matrix to a 3D tensor, it feels natural to specify the remaining (expansion) axis. So yes, I think what you have gone for is the most sensible after all.

Agree the documentation not being wrong would help (although I only went to read it once my code bombed out). To my mind, the clearest thing to do is to be more explicit that dimension refers to the axis that should be expanded:

`double[,] broadcasted1 = matrix.Add(vec, expandDimension: 0); // preferred`

Numpy does exactly that and it makes it a lot clearer I think: np.expand_dims(...). Alternatively, I suppose you could do something like this (which is very clear) but doesn't generalise so well.

`double[,] broadcasted1 = matrix.Add(vec, isRowVector: true);`

Regarding the proposed documentation...

The dimension along which the operation will be carried. If set to 0, then the vector b must have the same dimensions as rows of A, and the operation will be carried at each column vector. If 1, then it must have the same dimensions as columns in A, and the operation will be carried at each row vector.

I'm sorry but I'm afraid I don't understand this. Are you proposing changing the current behaviour? If currently set to 0, the vector b must have the same dimensions as columns of A right? Apologies if I have completely misunderstood!

Best,
Alex

@cesarsouza
Copy link
Member

Hi @AlexJCross,

Regarding the proposed documentation...

The dimension along which the operation will be carried. If set to 0, then the vector b must have the same dimensions as rows of A, and the operation will be carried at each column vector. If 1, then it must have the same dimensions as columns in A, and the operation will be carried at each row vector.
I'm sorry but I'm afraid I don't understand this. Are you proposing changing the current behaviour? If currently set to 0, the vector b must have the same dimensions as columns of A right? Apologies if I have completely misunderstood!

I am sorry. As you see, this was confusing even for me and I switched the axes in that proposal! I guess that's a sign that my brain keeps telling me that the inverted approach is better (#902), and with two independent sources also reporting that dimension: 1 is more intuitive, maybe it would be more sensible to invert the axes after all. The problem is doing so without breaking everything...

I guess your suggestion for the boolean parameter is indeed the clearest (thanks!), and the fact that it uses a different method signature is a big plus because it would allow us to deprecate the previous methods instead of changing their behavior abruptly.

Then maybe what we could consider is to add a new VectorType enumeration that can be either Row or Column, add new Add(double[,] m, double[] vec, VectorType dimension) overloads to the elementwise methods, and mark Add(double[,] m, double[] vec, int dimension) as deprecated. This would enable us to keep both versions working for a while. Then, when its time (let's say, the day the framework finally adds support for general tensors), we can finally write the correct Add(double[,] m, double[] vec, int dimension) with inverted axes w.r.t what we have today.

What do you think?

Thanks a lot for the feedback on this!

Regards,
Cesar

@cesarsouza cesarsouza self-assigned this Oct 2, 2017
cesarsouza added a commit that referenced this issue Oct 3, 2017
 - Updates GH-927: Broadcasting dimension seems counter-intuitive
cesarsouza added a commit that referenced this issue Oct 4, 2017
 - Adding new overloads for element-wise operations that accept a VectorType enumeration instead of a raw integer;
 - Marking the previous methods as obsolete.
@cesarsouza
Copy link
Member

Added in 3.8.0.

@cesarsouza cesarsouza added this to the 3.8 milestone Oct 21, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

2 participants