Performance issues with deepMap #3266

dvd101x · 2024-09-16T18:04:47Z

Hi Jos,

This PR addresses issue #3262 and introduces improvements to our recursion functions which increases performance in general:

Key changes:

collection.deepMap and collection.deepForEach: are optimized to run the callback with only one argument. Thus avoiding the use of matrix.map.

Other changes:

Fixed collection.deepMap: Uses skipZeros argument
Performance Boost: Recursion functions now use the correct number of arguments, clone the index only when necessary, and separate map and forEach recursions for improved speed.

Areas for Future Consideration:

Merging Recursion Functions: Explore merging the recursion functions in collection and array for potential code consolidation and simplification.
Argument Handling: Consider a more robust approach to determining the correct number of arguments for regular functions. (not func.length)
skipZeros for BigNumbers: In this PR it only works for regular numbers.
Array recursion: The recurse functions in array are set up in a way that they can also work for matrices. Thus the duplicated first and second argument for arrays (array, array, callback) but for matrices is (matrix.valueOf(), matrix, callback).

While I previously suggested discussing these changes before submitting a PR, I decided to proceed directly without breaking changes and to facilitate further discussion based on the concrete implementation.

Please review the changes and provide feedback. I'm open to discussing potential modifications or alternative approaches.

john9384

LGTM

josdejong · 2024-09-18T10:13:27Z

This looks really promising @dvd101x ! Some thoughts:

I expect that the most important performance difference is between one callback argument and more than one. Maybe we can handle the cases of 2 and 3 arguments the same way to reduce the amount of code?
In #3251, @Galm007 has created a single _forEach method used by both map and forEach methods to deduplicate the logic. It may be worth trying to let deepMap use deepForEach under the hood to deduplicate logic of this PR too. What do you think?
Can you use index.slice() instead of [...index]? I tried it out and it is a bit faster.

dvd101x · 2024-09-18T14:50:48Z

Thanks Jos!

Will test performance and report back.
I did some tests previously and it affects speed whether we use map or forEach, using forEach and pushing into an array was slower than map. I will check Speed up the map() and forEach() functions in DenseMatrix.js by over 300% #3251 and try to come up with something.
OK

dvd101x · 2024-09-20T23:03:40Z

While working on tests I found something very interesting. If a typed function is wrapped in a regular function it seems that the map and forEach function gains significant speed.

abs(array)                                 x 585,307 ops/sec ±1.17% (97 runs sampled)
...
map(array, abs)                            x 131,558 ops/sec ±0.37% (98 runs sampled)
map(array, (x)=>abs(abs))                  x 515,884 ops/sec ±0.54% (97 runs sampled)
map(array, abs.signatures.number)          x 580,084 ops/sec ±1.20% (96 runs sampled)

It might be due to the try catch that happens on each step (only for typed functions) and if it is possible to run it at a higher level.

dvd101x · 2024-09-21T03:43:30Z

In the latest version some big improvements were made regarding speed, even in a few cases surpassing abs(array)

By running benchmarks on a MacBook Air M1 with 8 GB the following results were found comparing develop vs this.

Note: When a particular signature is mapped, map and abs(matrix) show similar results, probably eliminating the issue with deepMap.

Note: There are more benefits in forEach due to the current implementation using mostly the same map functions without returning their value and this implementation supplying separate recurse functions.

The next step is to try to reduce the lines of code while trying to maintain performance as previously commented.

Also I will try merging the array.deepMap and collection.deepMap and merging array.deepForEach and collection.deepForEach with the expectation of closing the gap between mapping a matrix and an array.

josdejong · 2024-09-25T08:19:41Z

😎 thanks for the updates

josdejong · 2024-09-27T13:23:26Z

@dvd101x I've just merged #3251, which improves DenseMatrix.map and DenseMatrix.forEach. There is a merge conflict, and I think we should figure out how to dedupe any deepmap and deepForeach logic, since that overlaps with DenseMatrix.map and DenseMatrix.forEach. Maybe we should move (part of?) the logic of DenseMatrix._forEach into deepMap or deepForEach? I'm not sure what will be handiest.

For reference: I compared the current develop against the current deepMap-2 (without the latest changes from develop)

# develop

abs(genericMatrix)                         x 344,632 ops/sec ±0.50% (96 runs sampled)
abs(array)                                 x 630,263 ops/sec ±0.90% (96 runs sampled)
abs(numberMatrix)                          x 320,922 ops/sec ±1.79% (87 runs sampled)
genericMatrix.map(abs)                     x 152,504 ops/sec ±0.68% (97 runs sampled)
numberMatrix.map(abs)                      x 152,673 ops/sec ±0.82% (96 runs sampled)
map(genericMatrix, abs)                    x 150,779 ops/sec ±1.11% (97 runs sampled)
map(numberMatrix, abs)                     x 149,533 ops/sec ±1.25% (90 runs sampled)
map(array, abs)                            x 50,960 ops/sec ±0.92% (96 runs sampled)
map(array, abs.signatures.number)          x 74,270 ops/sec ±0.95% (95 runs sampled)
genericMatrix.map(abs.signatures.number)   x 417,552 ops/sec ±1.12% (92 runs sampled)
numberMatrix.map(abs.signatures.number)    x 417,524 ops/sec ±1.26% (93 runs sampled)

# deepMap-2 (without the latest develop)

abs(genericMatrix)                         x 244,629 ops/sec ±1.79% (93 runs sampled)
abs(array)                                 x 829,277 ops/sec ±0.55% (95 runs sampled)
abs(numberMatrix)                          x 249,694 ops/sec ±0.41% (96 runs sampled)
genericMatrix.map(abs)                     x 228,617 ops/sec ±0.48% (98 runs sampled)
numberMatrix.map(abs)                      x 169,613 ops/sec ±0.53% (98 runs sampled)
map(genericMatrix, abs)                    x 227,495 ops/sec ±0.38% (98 runs sampled)
map(numberMatrix, abs)                     x 168,517 ops/sec ±0.58% (97 runs sampled)
map(array, abs)                            x 292,857 ops/sec ±0.51% (94 runs sampled)
map(array, abs.signatures.number)          x 825,040 ops/sec ±0.49% (97 runs sampled)
genericMatrix.map(abs.signatures.number)   x 253,256 ops/sec ±0.64% (97 runs sampled)
numberMatrix.map(abs.signatures.number)    x 176,966 ops/sec ±1.32% (94 runs sampled)
map(array, abs + idx)                      x 412,065 ops/sec ±0.87% (94 runs sampled)
genericMatrix.map(abs + idx)               x 189,221 ops/sec ±0.57% (96 runs sampled)
numberMatrix.map(abs + idx)                x 147,322 ops/sec ±0.55% (97 runs sampled)
map(array, abs + idx + arr)                x 397,918 ops/sec ±0.58% (98 runs sampled)
genericMatrix.map(abs + idx + matrix)      x 130,700 ops/sec ±0.53% (92 runs sampled)
numberMatrix.map(abs + idx + matrix)       x 110,057 ops/sec ±0.73% (96 runs sampled)

dvd101x · 2024-09-27T14:53:02Z

Hi Jos,

Thanks for the update

In this PR the intention was to see how much performance could be gained while maintaining recursion, as I understand the main improvement of #3251 is this new algorithm that eliminates recursion. I wasn't sure at the beginning but I'm glad that even with all the attempts to fix the issues with deepMap the algorithm in #3251 is still faster.

I agree that we should merge some of the logic in a way that arrays can also benefit from this new algorithm (if possible).

I did test using forEach as the main algorithm for map but the difference seems significant, I think if we test both approaches the benefit in splitting the logic could be shown also in develop.

I'm still reviewing the new algorithm and trying to understand how does it manages the differences when a callback has one, two or three arguments and it's effect in performance.

I'm also not sure what is the best way to proceed. I can try something for the following weeks.

josdejong · 2024-09-27T15:09:57Z

Maybe it is easiest to start in a new PR and split the task in two steps:

Unify logic: extract DenseMatrix._forEach into the deepForEach utility function and use that from DenseMatrix.map, DenseMatrix.forEach, deepMap.
Optimize callback: add the optimizations again to not provide the callback index when not needed (mostly copying the logic from this PR).

dvd101x · 2024-10-01T03:18:53Z

Maybe you are right. I found another bottleneck in this code and it made the recursion functions faster, but they were still slower than develop.

I will attempt first unify logic into a utility, maybe the option of skip zeros will have to be removed or implemented with a special callback.

The next step is that the unification eventually is lost, when the dense matrix is stored as a flat array, I assume it might be faster even with the index conversions.

I think there is another area of opportunity for the optimize callback utility, including an option to also simplify when the callback has only one signature with the right amount of arguments.

Please let me give it a try and eventually close this PR.

josdejong · 2024-10-02T07:37:15Z

Sounds good, thanks!

dvd101x · 2024-11-07T16:26:32Z

Hi Jos,

Just to let you know, I did have some progress with this, but couldn't make it work for all tests.

I was thinking of some options to move forwards:

Leaving the matrix functions as such and make some faster recursion on the array functions and deepMap and deepForEach in collections.
Updating the current branch I have (including the failing tests) and request for help.
Keep working on it and maybe have the solution by the end of this year (2024).

Ideally I don't think it's needed for all to use the same logic, as I think the main goal is for matrices to use flat arrays internally (as tensorflowjs does).

josdejong · 2024-11-08T13:10:20Z

Thanks for the update David! Good to hear you're planning to work on this next month.

dvd101x added 2 commits September 16, 2024 10:50

Faster deepMap recurse

7b10e01

Callback with right number of arguments.

c7d98df

dvd101x mentioned this pull request Sep 16, 2024

Performance issues in deepMap #3262

Open

Removed return in forEach

8324f63

john9384 approved these changes Sep 17, 2024

View reviewed changes

Merge branch 'develop' into deepMap-2

0aad351

dvd101x added 5 commits September 18, 2024 20:38

Clone with slice

9c42720

Added index benchmarks

2ec54ad

Format

81f6d76

Optimize callback returns the number of arguments

edc61eb

Added tests for callbacks with three arguments.

9f80b90

optimizeCallback returns a non typed function.

a5618b5

Optimize recursre functions by mapping the last dimension.

c7a4210

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance issues with deepMap #3266

Performance issues with deepMap #3266

dvd101x commented Sep 16, 2024 •

edited

Loading

john9384 left a comment

josdejong commented Sep 18, 2024

dvd101x commented Sep 18, 2024

dvd101x commented Sep 20, 2024

dvd101x commented Sep 21, 2024

josdejong commented Sep 25, 2024

josdejong commented Sep 27, 2024

dvd101x commented Sep 27, 2024

josdejong commented Sep 27, 2024

dvd101x commented Oct 1, 2024 •

edited

Loading

josdejong commented Oct 2, 2024

dvd101x commented Nov 7, 2024

josdejong commented Nov 8, 2024

Performance issues with deepMap #3266

Are you sure you want to change the base?

Performance issues with deepMap #3266

Conversation

dvd101x commented Sep 16, 2024 • edited Loading

Key changes:

Other changes:

Areas for Future Consideration:

john9384 left a comment

Choose a reason for hiding this comment

josdejong commented Sep 18, 2024

dvd101x commented Sep 18, 2024

dvd101x commented Sep 20, 2024

dvd101x commented Sep 21, 2024

josdejong commented Sep 25, 2024

josdejong commented Sep 27, 2024

dvd101x commented Sep 27, 2024

josdejong commented Sep 27, 2024

dvd101x commented Oct 1, 2024 • edited Loading

josdejong commented Oct 2, 2024

dvd101x commented Nov 7, 2024

josdejong commented Nov 8, 2024

dvd101x commented Sep 16, 2024 •

edited

Loading

dvd101x commented Oct 1, 2024 •

edited

Loading