Optimize the invocation of parallel methods (HAST-246) #36

Piedone · 2019-10-23T14:43:59Z

When Tasks are started in a loop, the generated VHDL contains an InvocationProxy-like structure (see invocationIndex) that selects the next available instance of the parallelized method (which was created from a lambda expression) to start. This structure has as many branches as there are instances. That's not an issue with lower instance counts or simpler initialization logic, but once it gets more complex, timing errors can happen.

Then there's a corresponding wait state ("Waiting for the state machine invocation of the following method to finish...") that checks all the instances' start and finish signals, which again can get complex.
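
To make the two structures concrete, here is a rough behavioral model in Python (not the generated VHDL). All names (`Instance`, `start_next`, `all_finished`) are hypothetical, and the exact conditions are assumptions: the selection is modelled as one branch per instance, and the wait state as a check over every instance's started and finished flags.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instance:
    """Hypothetical model of one instance (FSM) of the parallelized method."""
    started: bool = False
    finished: bool = False
    argument: Optional[int] = None


def start_next(instances: List[Instance], argument: int) -> Optional[int]:
    """Start the first idle instance and return its index.

    Each loop iteration stands for one branch of the selection logic,
    so the number of branches equals the number of instances.
    """
    for index, instance in enumerate(instances):
        if not instance.started:
            instance.started = True
            instance.argument = argument
            return index
    return None  # Every instance is already running.


def all_finished(instances: List[Instance]) -> bool:
    """Wait-state condition (assumed): every started instance has finished."""
    return all(instance.finished for instance in instances if instance.started)
```

Both functions touch every instance, which is the linear growth in logic described above.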

While both just scale linearly, we could optimize them to be simpler somehow. I don't know whether we can really do anything with the wait state, as it's unavoidable for us to check at some point whether all the FSMs have finished. But the invocation part could be made simpler by pairing invocations, similar to invocations between standard methods' FSMs.

Possible approaches:

The most promising: Since only a single FSM is started at any given time, it would work to push input data to common global registers and have an invocationIndex register as well (keeping in mind that a single signal can't have multiple drivers). If invocationIndex contains the index corresponding to a given FSM, then that FSM will start itself. However, this would need significant architectural changes. Alternatively, we could add small pieces of glue logic between the existing parallel FSMs and such global registers (for every FSM there would be some combinatorial logic listening for its corresponding invocationIndex). However, this might not help at all because the current logic is supposed to describe the same thing too.
Possibly related: Loop unrolling (HAST-114) #14. One solution might be to unroll the Task-creating loops and pair an instance in each unrolled loop body. This, however, would be pretty hard to implement and if the loop body is complex then it'd also greatly increase resource usage.
This SO answer mentions using shift registers instead of multiplexers. However, at higher levels of parallelism, shifting out inputs from a register would take a lot of time for higher FSM indices.
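
As a sketch of the first approach above (common registers plus an invocationIndex register), here is a small behavioral model in Python. It is only an illustration under assumptions: `SharedInvocationBus`, `ParallelFsm` and `invoke` are hypothetical names, and `None` stands in for "no invocation pending". The point it shows is that each FSM only compares the shared index with its own, instead of the invoker branching over every instance.

```python
from typing import List, Optional


class SharedInvocationBus:
    """Hypothetical common registers, driven only by the invoking FSM."""

    def __init__(self) -> None:
        self.input_data: Optional[int] = None
        self.invocation_index: Optional[int] = None  # None: nothing pending.


class ParallelFsm:
    """Hypothetical parallel FSM that starts itself when its index is selected."""

    def __init__(self, own_index: int) -> None:
        self.own_index = own_index
        self.started = False
        self.latched_input: Optional[int] = None

    def clock(self, bus: SharedInvocationBus) -> None:
        # A single comparison per FSM replaces the invoker-side branching.
        if not self.started and bus.invocation_index == self.own_index:
            self.started = True
            self.latched_input = bus.input_data


def invoke(bus: SharedInvocationBus, fsms: List[ParallelFsm], index: int, data: int) -> None:
    """Write the shared registers once, then let every FSM observe them."""
    bus.input_data = data
    bus.invocation_index = index
    for fsm in fsms:
        fsm.clock(bus)
```

Since only the invoker writes the shared registers, the single-driver constraint is kept in this model. Whether synthesis would actually produce simpler logic than the current structure is an open question, as noted above.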

Jira issue


Piedone added the enhancement label Oct 23, 2019

Piedone mentioned this issue Oct 23, 2019

Optimize InternalInvocationProxy #30

Closed

github-actions bot changed the title ~~Optimize the invocation of parallel methods~~ Optimize the invocation of parallel methods (HAST-246) Sep 18, 2022
