Proposal: design API with performance in mind #59
Typed arrays are on the list of things to do (see #43). I need them, so I will implement them myself in the near future. I also think performance is something to consider, but as we are doing a rather big refactoring these days to get the API right, maybe this will come a bit later. One problem is that the library should run in the browser. Of course I think it would be great at some point to have both a C implementation and a JavaScript one so it runs even faster on node. But from my perspective, having a fast implementation that runs in the browser has a bigger priority than an even faster implementation on node. I think there are already a lot of things that could be optimized on the JavaScript side. For this we would need to write some benchmarks. I tried a while ago to benchmark some node code, but couldn't find an easy-to-use library (it seems like you need to know V8 internals; the profiler is quite hardcore to use). Any tips?
On 08/11/2013 01:21 AM, Sebastien Piquemal wrote:
Actually, this is precisely why I raise this issue now. If we are ever […] Here's one example in which the API is currently not optimal for […]
I bet I could find more examples if I were to dig deeper.
Well ... @josdejong can tell more, but the example you just gave would - to me - almost qualify as a bug ... (cf #53). Jos says "The idea is that when you feed a function Arrays, you get back an Array, and when you feed it a Matrix, you get back a Matrix". It should be the same thing here. To me, performance is definitely a priority. But knowing next to nothing about C, I have no idea what needs to be kept in mind when designing for a future integration of portions of C code ...
The main goal of math.js is to offer an integrated solution to work with real and complex numbers, units, and matrices. That's first. Second, the faster the performance the better, of course :). But a user-friendly, flexible API where you can use mixed types comes first. This way of thinking and working fits perfectly in the loosely typed JavaScript language.

The basic choice to allow mixed inputs to functions is by itself at odds with performance: on every call the function has to do type checking on the arguments before it can perform an operation, which causes a little overhead. Reducing a returned result from complex to real is similar, but on the output side of a function (note that Octave/Matlab do the same). However, it is exactly this that makes math.js so convenient to use: you can just add a complex and a real number together without having to do type conversion yourself. Similarly, it is really powerful that a matrix can contain mixed types.

A much desired feature is support for arbitrary precision calculations. That will be terrible for performance but really convenient (floating point round-off errors are a big annoyance among JavaScript developers) - but don't worry, it will be an option which you can turn on/off as you please.

There is plenty of room to improve performance. It all circles around utilizing knowledge of the type of data you are dealing with and optimizing for that. One interesting improvement will indeed be to add support for typed arrays. Another idea would be to create some kind of JIT compiler which can optimize calculations based on knowledge about the data types (like stripping type checking). And we can probably improve performance by implementing algorithms in native C and using these when running server-side JavaScript (like if I execute a […]).

However, all these kinds of performance optimizations do not influence the API. What kind of things do you think we should keep in mind for the API in this regard?
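To make the trade-off concrete, here is a minimal sketch (hypothetical names, not math.js's actual implementation) of mixed-type dispatch with output reduction. Every call pays for type inspection before the arithmetic runs, which is the overhead being discussed:

```javascript
// Hypothetical sketch of mixed-type dispatch with complex-to-real
// output reduction. Not the actual math.js code.
function isComplex(x) {
  return typeof x === 'object' && x !== null && 're' in x && 'im' in x;
}

function add(a, b) {
  // Type checks happen on every invocation -- the per-call overhead.
  if (typeof a === 'number' && typeof b === 'number') {
    return a + b;
  }
  const ar = isComplex(a) ? a.re : a, ai = isComplex(a) ? a.im : 0;
  const br = isComplex(b) ? b.re : b, bi = isComplex(b) ? b.im : 0;
  const re = ar + br, im = ai + bi;
  // Output reduction: collapse a pure-real result back to a plain number.
  return im === 0 ? re : { re: re, im: im };
}
```

The convenience is that `add(1, {re: 2, im: 3})` "just works" with mixed types, at the cost of the checks on every call.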
I find the notion that you can multiply two numbers of complex data type and have it return a number of real type on rare occasions to be problematic. What if the user calling a function assumes the returned type is complex, and then intermittently it happens to be a real number, which is returned as a different data type? It could lead to some border cases that are tricky to pin down and resolve. And suddenly the surface area for writing test cases is much larger.
I think this is great, but I would suggest that the returned number should always be of complex data type, even if it has zero imaginary part.
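A type-stable version of the operation being proposed might look like this (a hypothetical sketch; the function name and complex representation are assumptions, not the math.js API):

```javascript
// Sketch of a type-stable complex multiply: complex in, complex out,
// even when the imaginary part of the result happens to be zero.
function complexMultiply(a, b) {
  return {
    re: a.re * b.re - a.im * b.im,
    im: a.re * b.im + a.im * b.re
  };
}
```

With this convention, multiplying i by i yields `{re: -1, im: 0}` - still a complex value - so callers never have to branch on the returned type.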
Arbitrary precision calculations could still be done if a given matrix were forced to have the same data type throughout. With a pre-specified data type for each matrix, multiplying two 1000x1000 matrices would require type checking to be performed only once, not 1000^3 times during the multiplication. It would also mean that the array itself could be stored in a single typed array with strides (i.e. row-major or column-major format). This is the way numpy works, and I would assume that Matlab does something similar. So with regard to the API, I think […]
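The homogeneous, strided storage described above can be sketched as follows (hypothetical helper names, assuming row-major layout; this is an illustration, not the math.js implementation). The element type is fixed once at construction, so no per-element type checks are needed during an O(n^3) multiplication:

```javascript
// Hypothetical homogeneous matrix: one flat typed array plus strides.
function createMatrix(rows, cols) {
  return {
    rows: rows,
    cols: cols,
    strides: [cols, 1],              // row-major layout
    data: new Float64Array(rows * cols)  // type fixed once, here
  };
}

// Element access computes a flat offset from the strides.
function get(m, i, j) {
  return m.data[i * m.strides[0] + j * m.strides[1]];
}

function set(m, i, j, value) {
  m.data[i * m.strides[0] + j * m.strides[1]] = value;
}
```

Column-major storage would simply use `strides: [1, rows]` over the same flat array.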
Couldn't agree more. I've already had quite a few headaches because of this behavior. Jos, I disagree that the library should be as flexible as possible. When you do math you should know a little bit what you are doing!!! For the use cases of this library I don't find it desirable that types can be mixed together and returned indifferently. As I said in #53, when you want to implement robust algorithms, you need to keep control over what you are doing, so no different types returned, no automatic type conversion!
Ah come on guys, I was so proud of this automatic conversion from complex to real, it's just beautiful that an expression like […] I think we are having a couple of different discussions in one thread now: […]
nice discussion btw...
I think in general, Jos (correct me if I am wrong), we seem to have rather different perspectives, because you mostly have the math point of view in mind ... But honestly, I don't think there is a need for another Matlab. On the other hand there is a huge need for a good/fast library to implement algorithms in JavaScript. And your library is the best candidate to fill this need! So I think you should forget the perspective of the maths, and view everything from the programming point of view!!! And from that point of view, not having full control over outputs is not acceptable :(
@garrison, do you have more suggestions on what should be kept in mind if we want to design a good API now? At this point, as it is hard to have any visibility, wouldn't it make sense to actually try interfacing a BLAS library with […]? I see the "typed arrays" issue as a semi-separate topic, so maybe we can talk about it separately there: #43
@sebpiq yes indeed. That's why I like these discussions :). Let's try to combine the best of both worlds...
[…] value with an imaginary part equal to zero to a number (see #59)
I think it would be great to add this explicitly to the documentation somewhere. Documenting clearly the goals, priorities, and design guidelines of the project can at times be even more important than detailed API documentation.

With regard to matrix storage, I believe the current API does not make any use of the actual storage of the matrix, which allows the implementation to be flexible in the future. At some point I would suggest storing all dense matrices (even untyped ones) as a one-dimensional array with strides (configurable to be row-major or column-major). This may ultimately simplify the implementation (instead of having different data types for typed and untyped matrices), and it makes it much easier to assert that a matrix has a valid shape. (I do still think it should be possible to construct a new array using the array-in-array syntax, though, which can be quite convenient and intuitive.)

(What I say above provides a uniform mechanism for storing dense matrices, but it does not consider sparse matrices -- that is, matrices whose elements are assumed to be zero except for a small number of elements which are explicitly nonzero -- which we may wish to implement one day. Note that there are multiple ways of storing such matrices in memory, each with advantages and disadvantages.)

Another thing I should mention here is that the API to […]
It's a bit confusing how the indices work for modifying multiple elements at once, and keeping functionality like this might make it difficult to optimize things in the future. Another thing I noticed (from the README) is […]
I find it troubling that the […]
Actually, IMO the best solution here would be for […]. This is how […]
ehhh, this is already literally the project description, first paragraph of the readme and the website ;). As for the documentation, there is a lot to be done there; currently the only documentation is the readme, which only gives an overview.

I think we can distinguish two groups of matrices: the ones using nested arrays for dimensions, and the ones storing all data in a one-dimensional array, using strides to locate values in the matrix. And maybe a third one is sparse matrices? Supporting only one of them (I think you favor the second) will be easiest. However, the initial point that you make in this topic is to not enforce a particular way to store data in a matrix in the library's API, and that is an important point.

As for the matrix index, you may like the changes in this regard currently under development, as discussed in #61. Letting functions like […]
I meant it should be mentioned somewhere that performance is indeed a secondary priority. Everything else you say sounds great.
As for the expression parser, I am not opposed to it having more "magic" than the programming API.
One thing we mentioned in this thread is a universal interface for matrix/array operations. There exists a document detailing numpy's internals, including the implementation of universal functions. This is surely overkill at the moment, as the priority now is to stabilize the API, but it may be good to keep in mind for the future.
Minor note: I think storing the matrix as a single array will have better performance on nearly all fronts, but JavaScript performance has surprised me more than once in the past. One nice thing about storing matrices in a single array is that there are many tricks you can use, such as "converting" a matrix to its transpose by only changing the stride lengths. There exist many tricks like this that allow you to take advantage of the data's internal structure. Keeping everything in a single memory-contiguous array is also nice from a cache-locality point of view.
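The stride-swapping transpose trick can be sketched like this (hypothetical helper names, assuming a matrix is stored as a flat array with a `strides` pair; an illustration, not the math.js code):

```javascript
// A strided view: look up an element via a flat offset.
function get(m, i, j) {
  return m.data[i * m.strides[0] + j * m.strides[1]];
}

// Zero-copy transpose: share the data array, swap shape and strides.
// No element is moved in memory.
function transpose(m) {
  return {
    rows: m.cols,
    cols: m.rows,
    strides: [m.strides[1], m.strides[0]],
    data: m.data
  };
}
```

The transpose is O(1) regardless of matrix size, because only the metadata changes; reading `t[j][i]` then visits the same memory as `m[i][j]`.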
I am eager to experiment with that! Jos is finishing up a rather big refactoring, and I think next we can think more about the internal structure of the library. What I think would be great is a plugin-based architecture, which would allow implementing different types of matrices as different plugins, thus different projects; that way everybody is happy and can use their favorite implementation of matrices. In any case, I'll try my hand at implementing single-array matrices, as it is probably the best way to work with typed arrays, and typed arrays I need!!!
The more I think about this, the more I am convinced that storing a matrix as a single array is really the best thing to do. Not only will it almost certainly be more efficient, but the operations it easily enables can be very useful in scientific computing. I mentioned a quick matrix transpose in an above comment, but also of particular interest are numpy's and Matlab's "reshape" functionality, which will reshape a matrix to have the same data but with a different shape. Understanding (and at times specifying) an underlying storage order is necessary to make this functionality possible (and efficient). |
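Reshape is another operation that becomes trivial once the storage order is explicit. A minimal sketch, assuming contiguous row-major data and hypothetical names (not the math.js or numpy API):

```javascript
// Sketch of numpy/Matlab-style reshape on a contiguous row-major
// matrix: the flat data array is untouched, only shape and strides
// change, so the operation is O(1).
function reshape(m, rows, cols) {
  if (rows * cols !== m.data.length) {
    throw new Error('reshape: total number of elements must not change');
  }
  return {
    rows: rows,
    cols: cols,
    strides: [cols, 1],   // valid only because the data is contiguous
    data: m.data          // shared, not copied
  };
}
```

A 2x3 matrix reshaped to 3x2 keeps the same six values in the same memory order; only the interpretation of the offsets changes. This is exactly why a specified storage order makes reshape both possible and efficient.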
Yes, at least when we are going to use typed arrays this is needed, and I guess it will improve performance a lot. I'm curious to see the implications for generic matrix manipulations; we will have to do some testing and benchmarking on that. It may be a good idea as well to have a closer look at asm.js, to see if we can integrate these patterns to improve performance.
@sebpiq @garrison @josdejong - Sorry to bother you, but it looks like you expressed interest in numpy-style ndarrays a couple of years back and in the roadmap. Did you make, or are you aware of, any progress on this front? Just a couple of days into implementing my own math/science library, I've already realized that if I don't either use or build a coherent API that confronts the issue of real and complex matrices and vectors, then my effort is probably going to be wasted.

On a larger note, it seems like maybe asm.js + C/C++ is the one true future of trying to use JavaScript for math/science stuff. I could give it a couple of years and wait until someone bites the bullet and compiles numpy into JavaScript, but whatever the solution is, I don't want to wait that long. Life is short. I'm on a sabbatical of sorts, so I have the bandwidth to work on this; I just don't want to waste my effort. I totally understand if rewriting numpy isn't the aim of math.js, but I do think there's a need for some means of doing real-ish computation in JS (because JS excels where other languages fail: easily communicating content to a large audience), so I'd love to hear thoughts or ideas on how to get there.
@rreusser thanks for your feedback. I've read the introduction of your math library; it's exactly the same reason why I started math.js in the first place! It's indeed not the goal to create an exact clone of numpy, but numpy is a great source of inspiration here. We want to create a toolbox for advanced mathematics, and integrate working with various data types (like matrices, complex numbers, etc).

Right now math.js supports multidimensional (mixed) matrices, and 2d sparse matrices (implemented recently by @rjbaucells). We're currently refactoring the code into a modular structure (see discussion on modularization and on v2). When we are there it will be straightforward to implement support for (homogeneous) typed arrays, which can improve performance a lot (no type checking needed, for example). I think it's important to keep the library available for both node.js and browsers, which means that we cannot just put functionality in C/C++ (though it could be implemented as a progressive enhancement for node.js environments). Optimizations like asm.js and maybe object pools may help too in improving performance.

Currently math.js offers a stable, powerful API, a broad base of basic mathematical functions, and supports the most commonly used data types. The next level is to implement more advanced maths functions, and to improve the performance. Think numerical functions like […]. Feel free to exchange thoughts on your intentions and ideas via GitHub or by mailing me. It would be great to have a new contributor on the project :).
Closing this discussion now because it has been inactive for a long time.
Have you considered making performance/speed an explicit goal of math.js? For instance, if you were to require all matrices to have all numbers of a specified type (as e.g. numpy does), then you could in the future use typed arrays or even call into BLAS libraries using node-ffi to gain additional performance advantages. It will require some API changes, but could very well be worth it. Even if we don't work directly on performance at the moment, refactoring parts of the API in ways that would allow them to be implemented with high performance could be quite useful in preparing for future growth.