When Tasks are started in a loop, the generated VHDL contains an InvocationProxy-like structure (see invocationIndex) that selects the next available instance of the parallelized method (which was created from a lambda expression) to start. This structure has as many branches as there are instances. That's not an issue with lower instance counts or simpler initialization logic, but once it gets more complex, timing errors can happen.
Then there's a corresponding wait state ("Waiting for the state machine invocation of the following method to finish...") that checks all the instances' start and finish signals, which again can get complex.
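To make the two structures concrete, here is a rough behavioral model in Python. It is not the generated VHDL: the names `InstanceFsm`, `start_next` and `all_started_finished` are made up for illustration, and the model only assumes that the dispatch has one branch per instance and that the wait condition looks at every instance's start and finish flags.

```python
from dataclasses import dataclass


@dataclass
class InstanceFsm:
    """One hardware copy of the parallelized method (hypothetical model)."""
    started: bool = False
    finished: bool = False
    argument: int = 0


def start_next(instances, invocation_index, argument):
    """Central dispatch: conceptually one branch per instance.

    In hardware this corresponds to an N-way selection driven by
    invocationIndex, so the logic grows with the instance count.
    """
    for index, fsm in enumerate(instances):
        if index == invocation_index:
            fsm.argument = argument
            fsm.started = True
    return invocation_index + 1


def all_started_finished(instances):
    """Wait state condition: every started instance must have finished.

    This combines all the start/finish pairs, so it also grows with
    the instance count.
    """
    return all((not fsm.started) or fsm.finished for fsm in instances)
```

Both functions touch every instance, which is the linear growth described here. In hardware the concern is less the amount of logic than the length of the resulting combinational path.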
While both scale only linearly, we could try to make them simpler somehow. I don't know whether we can really do anything about the wait state, since we unavoidably have to check at some point whether all the FSMs have finished. But the invocation part could be made simpler by pairing invocations, similar to how invocations between standard methods' FSMs work.
Possible approaches:
The most promising: Since only a single FSM is started at any given time, it would work to push input data to common global registers and to have an invocationIndex register as well (but note that a single signal can't have multiple drivers). If invocationIndex contains the index corresponding to a given FSM, then that FSM will start itself. However, this would need significant architectural changes. Alternatively, we could add small pieces of glue logic between the existing parallel FSMs and such global registers (for every FSM there would be some combinatorial logic listening for its corresponding invocationIndex value). However, this might not help at all, because the current logic is supposed to describe the same thing.
Possibly related: Loop unrolling (HAST-114) #14. One solution might be to unroll the Task-creating loops and pair an instance in each unrolled loop body. This, however, would be pretty hard to implement and if the loop body is complex then it'd also greatly increase resource usage.
This SO answer mentions using shift registers instead of multiplexers. However, at higher levels of parallelism, shifting inputs out of a register would take a lot of time for higher FSM indices.
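As a rough illustration of the first and last approaches, the Python sketch below models (a) FSMs that start themselves by comparing a shared invocationIndex register with their own index, and (b) how long a value takes to travel down a shift register. All names here (`SelfStartingFsm`, `run_loop`, `cycles_to_reach`) are hypothetical, and the model assumes one shift stage per clock cycle. It is a behavioral sketch only, not a proposal for the actual VHDL.

```python
class SelfStartingFsm:
    """Hypothetical FSM that watches shared registers instead of being
    selected by a central dispatcher."""

    def __init__(self, own_index):
        self.own_index = own_index
        self.started = False
        self.argument = None

    def clock(self, invocation_index, shared_argument, start_strobe):
        # One equality comparison per FSM; its size does not depend on
        # how many other instances exist.
        if start_strobe and not self.started and invocation_index == self.own_index:
            self.argument = shared_argument
            self.started = True


def run_loop(fsms, arguments):
    """The invoking FSM is the only writer of the shared registers, so
    each register keeps a single driver."""
    for invocation_index, argument in enumerate(arguments):
        for fsm in fsms:  # every FSM sees the same register values
            fsm.clock(invocation_index, argument, start_strobe=True)


def cycles_to_reach(stage_count, target_index, value):
    """Count clock cycles until `value` (must not be None), fed in at
    stage 0, arrives at `target_index` in a shift register that moves
    one stage per cycle."""
    chain = [None] * stage_count
    next_in = value
    cycles = 0
    while chain[target_index] != value:
        chain = [next_in] + chain[:-1]
        next_in = None
        cycles += 1
    return cycles
```

In this model the per-FSM comparison stays the same size regardless of the instance count, while the shift register needs `target_index + 1` cycles to reach a given FSM, which matches the latency concern for higher FSM indices. Whether a synthesis tool would actually produce simpler logic from the first variant than from the current structure is exactly the open question raised above.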