Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: design API with performance in mind #59

Closed
garrison opened this issue Aug 11, 2013 · 21 comments
Closed

Proposal: design API with performance in mind #59

garrison opened this issue Aug 11, 2013 · 21 comments

Comments

@garrison
Copy link
Contributor

Have you considered making performance/speed an explicit goal of math.js? For instance, if you were to require all matrices to have all numbers of a specified type (as e.g. numpy does), then you could in the future use typed arrays or even call into BLAS libraries using node-ffi to gain additional performance advantages. It will require some API changes, but could very well be worth it. Even if we don't work directly on performance at the moment, refactoring parts of the API in ways that would allow them to be implemented with high performance could be quite useful in preparing for future growth.

@sebpiq
Copy link
Collaborator

sebpiq commented Aug 11, 2013

Typed arrays is on the lists of things to do (see #43). I need it so I will implement it myself in the near future.

I also think performance is something to consider, but as we are doing a rather big refactoring those days to get the API right, maybe this will come bit later.

One problem, is that the library should run on the browser. Of course I think it would be great at some point to have both C implementation and a JavaScript one so it runs even faster on node. But from my perspective, having a fast implementation that runs in the browser has a bigger priority than an even faster implementation on node.

I think there's already a lot of things that could be optimized on the JavaScript side. For this we would need to write some benchmarks. I have tried a while ago to benchmark some node code, but couldn't find an easy to use library (seems like you need to know V8 internals, the profiler is quite hardcore to use). Any tips?

@garrison
Copy link
Contributor Author

On 08/11/2013 01:21 AM, Sebastien Piquemal wrote:

I also think performance is something to consider, but as we are doing a
rather big refactoring those days to get the API right, maybe this will
come bit later.

Actually, this is precisely why I raise this issue now. If we are ever
going to care about performance in the future, getting the API "right"
means designing thing in such a way that allows us to plug in typed code
or C code later without losing backwards compatibility. Performance is
not something that can be added later to an API that was not designed
with performance in mind. Since the current goal is the stabilize the
API, we should explicitly decide whether a goal of the API is to allow
future implementations to focus on performance.

Here's one example in which the API is currently not optimal for
performance optimization in the future:

  • The release notes for version 0.10.0 mention "For complex
    calculations, all functions now automatically replace results having an
    imaginary part of zero with a Number. (2i * 2i now returns a Number
    -4 instead of a Complex -4 + 0i)." It is not clear to me why this
    change was made, but it is actually directly at odds with implementing a
    complex-valued matrix using typed arrays.

I bet I could find more examples if I were to dig deeper.

@sebpiq
Copy link
Collaborator

sebpiq commented Aug 11, 2013

Well ... @josdejong can tell more, but the example you just gave - to me - would almost qualify as a bug ... (cf #53). Jos says "The idea is that when you feed a function Arrays, you get back an Array, and when you feed it a Matrix, you get back a Matrix". This should be the same thing here.

To me, performance is definitely a priority. But knowing next to nothing about C, I have no idea what needs to be kept in mind when designing for a future integration of portions C code ...

@josdejong
Copy link
Owner

The main goal of math.js is to offer an integrated solution to work with real and complex numbers, units, and matrices. That's first. Second, the faster performance the better, of course :). But a user friendly, flexible API where you can use mixed types goes first. This way of thinking and working fits perfectly in the loosly typed JavaScript language.

The basic choice to allow mixed inputs to functions is by itself at odds with performance: for every call the function has to do type checking on the arguments before it can do an operation, which causes a little overhead. Reducing a returned result from complex to real is similar but then on the output side of a function (note that Octave/Matlab do the same). However, it is exactly this fact that makes math.js so convenient to use: you can just add a complex and a real number together without having to do type conversion yourself. Similarly, it is really powerful that a matrix can contain mixed types. A much desired feature is support for arbitrary precision calculations - that will be terrible for performance but really convenient (floating point round-off errors is a big annoyance under JavaScript developers), but don't worry, it will be an option which you can turn on/off as you please.

There is plenty of room to improve performance. It all circles around utilizing knowledge of the type of data you are dealing with and optimize for that. One interesting improvement will indeed be to add support for typed arrays. An other idea would be to create some kind of a JIT compiler which can optimize calculations based on knowledge about the data types (like stripping type checking). And we can probably improve performance by implementing algorithms in native c and use these when running server side JavaScript (like if I execute a fft with input of type Float32Array, I switch to the c implementation that is a factor x faster).

However all these kind of performance optimizations do not influence the API. What kind of things do you think we should reckon in with the API in this regard?

@garrison
Copy link
Contributor Author

I find the notion that you can multiply two number of complex data type and have it return a number of real type on rare occasions to be problematic. What if the user calling a function assumes the returned type is complex, and then intermittently it happens to be a real number, which is returned as a different data type? It could lead to some border cases that are tricky to pin down and resolve. And suddenly the surface area for writing test cases is much larger.

you can just add a complex and a real number together without having to do type conversion yourself.

I think this is great, but I would suggest that the returned number should always be of complex data type, even if it has zero imaginary part.

Similarly, it is really powerful that a matrix can contain mixed types. A much desired feature is support for arbitrary precision calculations - that will be terrible for performance but really convenient (floating point round-off errors is a big annoyance under JavaScript developers), but don't worry, it will be an option which you can turn on/off as you please.

Arbitrary precision calculations could still be done if a given matrix were forced to have the same data type throughout. With pre-specified data type for each matrix, multiplying two 1000x1000 matrices would require type checking to be performed only once, not 1000^3 times during the multiplication. It would also mean that the array itself could be stored in a single typed array with strides (i.e. row-major or column-major format). This is the way numpy works, and I would assume that matlab does something similar.

So with regard to the API, I think

  • The type returned by a function should be predictable by the types of the arguments given to it. Having a function return complex numbers sometimes and real numbers other times can be problematic.
  • Matrices should have a type, which can be specified by the user. The default for a new matrix with unspecified type should be something sensible, such as the basic real or complex type.
  • Nothing about the API for matrices should depend on the assumption that an N-dimensional matrix is stored in the array-inside-array... format. As far as the API is concerned, it should be possible for the implementation to store it in a single typed array, in either row major or column major format.

@sebpiq
Copy link
Collaborator

sebpiq commented Aug 12, 2013

The type returned by a function should be predictable by the types of the arguments given to it.

Couldn't agree more. I've already had quite a few headaches because of this behavior.

Jos I disagree that the library should be as flexible as possible. When you do math you should know a little bit what you are doing!!! For the use cases of this library I don't find it a desirable behavior that types can be mixed together, and returned indifferently. As I said in #53 , when you want to implement robust algorithms, you need to keep control over what you are doing, so no different types returned, no automatic transtyping!

@josdejong
Copy link
Owner

Ah come guys I was so proud of this automatic conversion from complex to real, it's just beautiful that an expression like square(sqrt(-2)) returns a real value -2, and that (2+3i) + (4-3i) returns a real value 2 ;). But I agree it is kind of an edge case and less practical when programming algorithms.

I think we are having a couple of different discussions in one thread now:

  1. In order to enable performance optimizations it should be possible to use optimized types (like typed arrays or some extended ComplexMatrix or whatever), and be able to create optimized implementations for different algorithms/functions. So far the library doesn't have generics like a universal interface for matrices (which is your concern if I'm right - matrices using nested arrays work really different than matrices using flat, indexed arrays). Right now if you introduce a new type of matrix, you will have to implement support for it in each of the functions. That is quite cumbersome but gives you complete freedom to implement an optimized solution. I think we need to improve on that some day though, make it easier to register new data types to existing functions. Good to keep this in mind.
  2. Right now the type of matrix output is determined by the type of input (Matrix as input -> Matrix as output, Array as input -> Array as output, and yes there are edge cases), which can be confusing (Confusing it is that some functions return JS arrays other matrices #53). So we need a way to control the type of matrix output. I think a configuration option like output.matrix.type which can be "auto", "Matrix", or "Array" would be nice.
  3. The automatic conversion from complex to real is something nice from a maths point of view, but less practical from a programming point of view: when you are dealing with external algorithms which cannot handle mixed data types the way math.js does. I think you are right that this is kind of an edge case, so it makes sense remove this feature. (if you want this kind of behavior you could always wrap your expressions in some utility to handle this). Note though that in general we can't at all predict the type of output from the type of input. sqrt(x) will return a number for inputs >=0, and a complex for values < 0. pow(x,y) will return real values only when y is integer and x >= 0, etc etc. Anyway removing the conversion from complex to number will improve the predictability of your output which is a good thing.

nice discussion btw...

@sebpiq
Copy link
Collaborator

sebpiq commented Aug 12, 2013

I think in general Jos (correct me if I am wrong), we seem to have rather different perspective, because you mostly have the math point of view in mind ...

But honestly, I don't think there is need for another Matlab. On the other hand there is a huge need for a good/fast library to implement algorithms in JavaScript. And your library is the best candidate to fill this need! So I think you should forget the perspective of the maths, and view everything from the programming point of view!!!

And from that point of view, not having full control over outputs is not acceptable :(

@sebpiq
Copy link
Collaborator

sebpiq commented Aug 12, 2013

@garrison , do you have more suggestions on what should be kept in mind if we want to design a good API now?

At this point as it is hard to have any visibility, wouldn't it make sense to actually try interfacing a BLAS library with mathjs, and see what implications this has on the API on JavaScript side?

I see the "typed arrays" issue as a semi-separated topic, so maybe we can talk about it separately there #43

@josdejong
Copy link
Owner

@sebpiq yes indeed. That's why I like these discussions :). Let's try to combine the best of both worlds...

josdejong added a commit that referenced this issue Aug 14, 2013
  value with an imaginary part equal to zero to a number (see #59)
@garrison
Copy link
Contributor Author

The main goal of math.js is to offer an integrated solution to work with real and complex numbers, units, and matrices. That's first. Second, the faster performance the better, of course :).

I think it would be great to add this explicitly to the documentation somewhere. Documenting clearly the goals, priorities, and design guidelines of the project can at times be even more important than detailed API documentation.

With regard to matrix storage, I believe the current API does not make any use of the actual storage of the matrix, so that allows the implementation to be flexible in the future. At some point in the future, I would suggest storing all dense matrices (even untyped ones) as a one-dimensional array with strides (configurable to be row-major or column-major). This may simplify the implementation ultimately (instead of having different data types for typed and untyped matrices), and it makes it much easier to assert that a matrix is a valid shape. (I do still think it should be possible to construct a new array using the array-in-array syntax, though, which can be quite convenient and intuitive.)

(What I say above provides a uniform mechanism for storing dense matrices, but it does not consider sparse matrices -- that is, matrices whose elements are assumed to be zero except for a small number of elements which are explicitly nonzero -- which we may wish to implement one day. Note that there are multiple ways of storing such matrices in memory, each with advantages and disadvantages.)

Another thing I should mention here is that the API to matrix.set() might be too clever for its own good. Consider a call like

var b = math.matrix([[5, 6], [1, 1]]);
b.set([1, [0, 1]], [[7, 8]]);

It's a bit confusing how the indices work for modifying multiple elements at once, and keeping functionality like this might make it difficult to optimize things in the future.

Another thing I noticed is (from the README)

var a = math.matrix();                          // Matrix, []
a.set([math.range('1:4')], [7, 2, 1, 5]);       // Matrix, [0, 7, 2, 1, 5]

I find it troubling that the set() method can implicitly cause a matrix to be resized, and that it implicitly inserts zeroes everywhere that is not mentioned. I think it would be much better/predictable/optimizable to make such resizes explicit.

Note though that in general we can't at all predict the type of output from the type of input. sqrt(x) will return a number for inputs >=0, and a complex for values < 0. pow(x,y) will return real values only when y is integer and x >= 0, etc etc. Anyway removing the conversion from complex to number will improve the predictability of your output which is a good thing.

Actually, IMO the best solution here would be for sqrt() to return the type it is given. If it is given a complex number, it should go ahead and return a complex number. If it is given a real number, it should return a real value or NaN if the square root would be imaginary. Doing it this way allows the function to return a predictable type. If the user wants a complex square root, explicitly casting to the complex type would be the way to go. (Speaking of which, it would be useful to allow math.complex() to take a single value, which would create a "complex number" with zero imaginary part. That way, one could call a complex square root by e.g. saying math.sqrt(math.complex(-4)).

This is how std::sqrt() works in C++ (see here and here) and also how numpy.sqrt() works.

@josdejong
Copy link
Owner

The main goal of math.js is to offer an integrated solution to work with real and complex numbers, units, and matrices. That's first. Second, the faster performance the better, of course :).

I think it would be great to add this explicitly to the documentation somewhere. Documenting clearly the goals, priorities, and design guidelines of the project can at times be even more important than detailed API documentation.

ehhh, this is already literally the project description, first paragraph of the readme and the website ;). As for the documentation, there is a lot to be done there, currently the only documentation there is is the readme which only gives an overview.

I think we can distinguish two groups of matrices: the ones using nested arrays for dimensions, and the ones storing all data in a one dimensions array, using strides to locate values in the matrix. And maybe a third one is sparse matrices? Supporting only one of them (I think you favor the second) will be easiest. However, the initial point that you make in this topic is to not enforce a particular way to store data in a matrix in the libraries API, and that is an important point.
I think we have to do a lot of testing with both types of arrays, and benchmark how fast particular operations are. I have really no clue how fast JavaScript is for operations on two such different types of matrices. If it turns out that one of the two types is better on all fronts (like creating a matrix, resizing, iterating over all values, retrieving a particular value, exporting to a nested Array again, etc etc), then we can limit ourselves to only this type of matrix and make it easier for ourselves. However if both approaches turn out to have strong and weak sides, I think it is best to support both types, and clearly document the advantages and disadvantages of both so people can decide which one they need.

As for the matrix index, you may like the changes in this regard currently under development, as discussed in #61.

Letting functions like sqrt only return the input type as output is an interesting idea. Let me give this some more thought. From an implementation/machine point of view it will allow you to nicely decouple data types. For end users using the expression parser it will not be fun though.

@garrison
Copy link
Contributor Author

The main goal of math.js is to offer an integrated solution to work with real and complex numbers, units, and matrices. That's first. Second, the faster performance the better, of course :).
I think it would be great to add this explicitly to the documentation somewhere. Documenting clearly the goals, priorities, and design guidelines of the project can at times be even more important than detailed API documentation.
ehhh, this is already literally the project description, first paragraph of the readme and the website ;). As for the documentation, there is a lot to be done there, currently the only documentation there is is the readme which only gives an overview.

I meant it should be mentioned somewhere that performance is indeed a secondary priority.

Everything else you say sounds great.

Letting functions like sqrt only return the input type as output is an interesting idea. Let me give this some more thought. From an implementation/machine point of view it will allow you to nicely decouple data types. For end users using the expression parser it will not be fun though.

As for the expression parser, I am not opposed to it having more "magic" than the programming API.

@garrison
Copy link
Contributor Author

One thing we mentioned in this thread is a universal interface for matrix/array operations. There exists a document detailing numpy's internals, including the implementation of universal functions. This is surely overkill at the moment, as the priority now is to stabilize and API, but may be good to keep in mind for the future.

@garrison
Copy link
Contributor Author

I think we have to do a lot of testing with both types of arrays, and benchmark how fast particular operations are. I have really no clue how fast JavaScript is for operations on two such different types of matrices. If it turns out that one of the two types is better on all fronts (like creating a matrix, resizing, iterating over all values, retrieving a particular value, exporting to a nested Array again, etc etc), then we can limit ourselves to only this type of matrix and make it easier for ourselves. However if both approaches turn out to have strong and weak sides, I think it is best to support both types, and clearly document the advantages and disadvantages of both so people can decide which one they need.

Minor note: I think storing the matrix as a single array will have better performance on nearly all fronts, but Javascript performance has surprised me more than once in the past. One nice thing about storing matrices in a single array is that there are many tricks you can use, such as "converting" a matrix to its transpose by only changing the stride lengths. There exist many tricks like this that allow you to take advantage of the data's internal structure. Keeping everything in a single memory-contiguous array is also nice from a cache (locality) point of view as well.

@sebpiq
Copy link
Collaborator

sebpiq commented Aug 19, 2013

I am eager to experiment with that! Jos is finishing-up a rather big refactoring, and I think next we can think more about the internal structure of the library.

What I think would be great is a plugin-based architecture, which would allow to implement different types of matrices as different plugins, thus different projects, therefore everybody is happy, and can use its favorite implementation of matrices.

In any case, I'll try my hand at implementing single array matrices, as I it is probably the best way to work with typed arrays, and typed arrays I need !!!

@garrison
Copy link
Contributor Author

The more I think about this, the more I am convinced that storing a matrix as a single array is really the best thing to do. Not only will it almost certainly be more efficient, but the operations it easily enables can be very useful in scientific computing. I mentioned a quick matrix transpose in an above comment, but also of particular interest are numpy's and Matlab's "reshape" functionality, which will reshape a matrix to have the same data but with a different shape. Understanding (and at times specifying) an underlying storage order is necessary to make this functionality possible (and efficient).

@josdejong
Copy link
Owner

Yes at least when we are going to use typed arrays this is needed and I guess will improve performance a lot. I'm curious to see the implications on generic matrix manipulations, we will have to do some testing and benchmarking on that.

It may be a good idea as well to have a closer look at asm.js, see if we can integrate with these patterns to improve performance.

@rreusser
Copy link

@sebpiq @garrison @josdejong — Sorry to bother you, but looks like you expressed interest in numpy-style ndarrays a couple years back and in the roadmap. Did you make or are you aware of any progress on this front? Just a couple days into implementing my own math/science library, I've already realized that if I don't either use or build a coherent API that confronts the issue of real and complex matrices and vectors, then my effort is probably going to be wasted.

On a larger note, it seems like maybe asm.js + C/C++ is the one true future of trying to use JavaScript for math/science stuff. I could give it a couple years and wait until someone bites the bullet and compiles numpy into javascript, but whatever the solution is, I don't want to wait that long. Life is short. I'm on a sabbatical of sorts, so I have the bandwidth to work on this; I just don't want to waste my effort.

I totally understand if rewriting numpy isn't the aim of math.js, but I do think there's a need for some means of doing real-ish computation in JS (because JS excels where other languages fail—easily communicating content to large audience) so I'd love to hear thoughts or ideas on how to get there.

@josdejong
Copy link
Owner

@rreusser thanks for your feedback. I've read the introduction of your math library, it's exactly the same reason why I started math.js in the first place! It's indeed not the goal to create an exact clone of numpy, but numpy is a great source of inspiration here. We want to create a toolbox for advanced mathematics, and integrate working with various data types (like matrices, complex numbers, etc).

Right now math.js supports multidimensional (mixed) matrices, and 2d sparse matrices (implemented recently by @rjbaucells). We're right now refactoring the code into a modular structure (see discussion on modularization and on v2). When we are there it will be straight forward to implement support for typed arrays (homogeneous), which can improve performance a lot (no type checking needed for example).

I think it's important to keep the library available for both node.js and browsers, which means that we cannot just put functionality in C/C++ (though it could be implemented as a progressive enhancement for node.js environments). Optimizations like with asm.js and maybe object pools may help too in improving performance.

Currently math.js offers a stable, powerful API, a broad base of basic mathematical functions, and supports the most commonly used data types. Next level is to implement more advanced maths functions, and to improve the performance. Think numerical functions like lup and lusolve which is recently implemented by @rjbaucells too). Integration and differentiation will be awesome, and CAS in general (see #87, #35, #38).

Feel free to email me to change minds on your intentions and ideas via github or by mailing me. It would be great to have a new contributor on the project :).

@josdejong
Copy link
Owner

Closing this discussion now because it has been inactive for a long time

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants