Limitations of deadlines #403
Comments
With centralized coordination, we have a potential problem that a federate could block for an unbounded amount of time on a network input after having been granted a PTAG. While we could use a This suggests that federated execution provides a clean alternative to the setjmp/longjmp solution given above. A long-running computation can be allowed to complete without blocking other federates, and once it completes, its outputs will be ignored (though not its state updates). If it never completes, then, as long as all downstream federates use this deadline trick, the fault handler may be repeatedly invoked. If the network gets partitioned, we get the same result: the fault handler is repeatedly invoked until the network is repaired. This seems to me like a very practical solution.
We met today (@hokeun, @lhstrh, and @edwardalee) and figured out a really nice enhancement to the deadline functionality that also plays beautifully with an extension to support LET. Consider this reactor (apologies, @Soroosh129):
Notice the Hence, the above reactor implements an "anytime" computation. Soroush keeps writing until he runs out of time, then he files a thesis. :-)
This is a very interesting proposal, but for some reason, reading it gives me anxiety :)
I have two questions about this proposal at the moment.
Ah, yes, these questions remind me of the key difficulty in implementing the LET concept, namely that we need to ensure that any reaction that has a non-zero LET cannot see the logical time advances around it that are occurring while it executes. Logical time should never advance during the execution of a reaction, so, conceptually, the As a result, this is actually quite easy to implement such that the semantics is preserved. However, if we want the reaction with LET to execute in parallel with other reactions, and to allow logical time to advance around it during its execution, the implementation is harder. We have to figure out a way to freeze this one reaction's view of current logical time while, for reactions in other reactors, logical time can advance. One way to do this would be change
On the question of mutual exclusion, we do not propose to compromise on that. During the LET, this one reactor will be unable to react to any events (inputs, timers, or actions). @lhstrh suggested that the reactor have a property that specifies what to do with such events if they appear. By default, they would be deferred and reassigned a logical time equal to the completion time of the LET (or, if multiple instances of a single event appear, at an advanced microstep). Alternatively, the reactor could specify that such events be dropped. This is similar to the minimum spacing property of actions and could presumably be implemented in a similar way.
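The default deferral policy described in this comment can be modeled in a few lines. This is only an illustrative Python sketch, not Lingua Franca runtime code: the names `Tag`, `defer_events`, and the `policy` values are hypothetical, and starting the deferred events at microstep 0 is an assumption.

```python
from collections import defaultdict
from typing import NamedTuple


class Tag(NamedTuple):
    time: int       # logical time, in nanoseconds
    microstep: int  # index among events sharing the same logical time


def defer_events(events, let_end, policy="defer"):
    """Decide what happens to events that arrive while a reactor is in its LET.

    events:  list of (trigger_name, Tag) pairs that arrived during the LET.
    let_end: logical time at which the LET completes.
    policy:  "defer" (assumed default) or "drop" (hypothetical names).
    Returns the list of (trigger_name, Tag) pairs to handle after the LET.
    """
    if policy == "drop":
        return []
    if policy != "defer":
        raise ValueError(f"unknown policy: {policy}")
    next_microstep = defaultdict(int)  # per-trigger microstep counter
    deferred = []
    for name, _original_tag in events:
        # Repeated events on the same trigger get successive microsteps.
        deferred.append((name, Tag(let_end, next_microstep[name])))
        next_microstep[name] += 1
    return deferred
```

For example, two events on the same input that arrive during a LET ending at logical time 10 would be retagged (10, 0) and (10, 1), while an event on a different trigger would get (10, 0).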
This sounds like a reasonable solution to me. It seems like with this proposed solution, removing the barrier synchronization on tag advancement would still be needed to allow for parallelism and preemptive scheduling across tags. I was thinking that, at least for reactor-c, the mechanism implemented in Whenever a reaction is set to be executed that has a LET, there would be a tag advancement barrier raised for the current tag plus the LET for that reaction. Whenever the reaction is done executing, it could remove its barrier (which would decrement the Would it be appropriate to create a discussion to keep track of the discussion and the design related to LET?
Absolutely. I think there are actually multiple discussion topics here. I really like the syntax proposed by @edwardalee above! It's funny how we are going back to reactions consuming logical time, as this is something I implemented accidentally when I first implemented reactors in C++ and didn't fully understand the model yet. I see the proposed syntax for LET tasks (reactions) as orthogonal to execution details and strategies for relaxing the barrier synchronization. "Traditional" LF programs could probably also benefit from the latter. I have also been thinking about such strategies in the back of my head lately, but nothing concrete has fallen out so far. So it would be great to get a discussion started.
Just trying to summarize the issue of possible confusion with the name of So far, we discussed some potential options (could be more) to address the issue:
Another shorter name that might help a bit would be
I like the idea of combining renaming and adding the boolean argument!
I think Another possibility could be to have two separate functions:
I think it's important to note that with the current design, 1 and 2 are checked using the same numerical value, for both a late release and a longer than expected execution time for a portion of the current reaction. I think these two concepts represent different things. This is not a problem if the goal of the deadline handler mechanism is to keep the total lag of the program in check (e.g., to keep it below 2 msec) and do something else if lag exceeds that value. I think the design of
I don't immediately see a use case where two different deadlines would be needed, but if there is such a use case, it is easily implemented by just checking physical time in the body of the reaction directly against some arbitrary other deadline. So I don't think we really need to add anything to the infrastructure to support this.
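A hand-rolled check of this kind might look roughly like the following. This is a Python model rather than LF target code; the helper names `lag_ns` and `exceeds_budget` are hypothetical, and in a real reaction the logical and physical times would come from the runtime rather than from arguments.

```python
import time


def lag_ns(logical_time_ns, physical_time_ns):
    """Lag is how far physical time has run ahead of logical time."""
    return physical_time_ns - logical_time_ns


def exceeds_budget(logical_time_ns, budget_ns, now_ns=None):
    """Return True if the lag at `now_ns` is larger than `budget_ns`.

    `now_ns` defaults to the wall clock; passing it explicitly keeps the
    check deterministic for testing.
    """
    if now_ns is None:
        now_ns = time.time_ns()
    return lag_ns(logical_time_ns, now_ns) > budget_ns
```

A reaction body could call such a check at any point with a budget different from the one in its deadline clause, for instance a 2 msec lag bound expressed as `2_000_000` nanoseconds.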
Yes, that makes sense. However, to play the devil's advocate, both the
I added a boolean argument
The deadline construct in LF is capable of detecting deadline violations only after the fact. If a reaction runs longer than expected, this can be detected by a downstream reactor that receives an input from an output produced by that reaction, but the detection occurs only after the upstream reaction has completed execution. Suppose instead that we want to react as soon as the upstream reaction has exceeded its time budget. One way to do this might be with the following syntax:
Notice the additional argument to the `deadline` keyword, which is the input port `expected`. The meaning of this could be as follows:

Let t be the logical time and T be the physical time at which the `trigger` reaction fires. If T > t + 10 msec, then the PANIC body is invoked, as it would be for an ordinary deadline instruction. Otherwise, the ordinary reaction is invoked, but in addition, a physical timer is set to expire after 10 msec - (T - t). When that physical timer expires, a reaction is put on the reaction queue at highest priority. When that executes, it checks to see whether the `expected` input is known (absent or present), and if not, it invokes the PANIC reaction. It would also have to check whether logical time has advanced past the one at which the timer was started, in which case, there is no need to PANIC.

This seems kind of complicated, but I think it would work, at least in a multithreaded environment that either has multiple cores or has an underlying preemptive priority-driven thread scheduler.
Why is it so complicated? First, I think the detection has to be done in a separate reactor from the one where the violation occurs. Suppose we were to try this:
In this case, the PANIC reaction body would be invoked concurrently with the Long-running computation, a violation of the mutual exclusion principle in Lingua Franca. To allow this violation, both the Long-running computation and the PANIC reaction bodies would have to use mutexes to access outputs and state variables, which is a lot to expect from programmers and is impossible to enforce.

Instead, in my proposed solution, the Upstream reactor has no deadline clause, and the input that triggers its Long-running computation is simultaneously provided to the `trigger` input of the downstream reactor. As a consequence, this solution is not adversely affected by the barrier synchronization we currently perform when advancing time on each federate.

This would only work with multithreaded execution.
Although I think this solution solves some problems, it does not solve all of them. For example, notice that logical time cannot advance until the Long-running computation completes, regardless of how long it takes. Solving that problem requires a more drastic addition to the language, namely a mechanism for aborting reactions while they are executing (e.g., using setjmp/longjmp). This is drastic: it would require repair of data structures storing any state of the system, and it would still require disciplined programming from users (e.g., to not allocate memory in reactions). If we want to support that, I suggest the following syntax:
This would use setjmp/longjmp to abort the Long-running computation after 10 msec and execute the PANIC body. This is dangerous, but with some user discipline, could work. In this case, I would suggest that all state variables and output port states be cached before starting the Long-running computation and restored if and when the kill triggers. Any outputs that have been produced would be retracted before they trigger downstream reactions, and state updates would be reversed. The keyword `kill` is violent, but not as triggering as `abort` might be, and it's very descriptive of what will happen. But we could choose a less violent keyword, I suppose.

Note that in my suggestion, the `kill 10 msec` clause would still be relating logical time to physical time. That is, the Long-running computation would be killed when physical time hits t + 10 msec, not when it hits T + 10 msec. Thus, this could be viewed as a variant of the `deadline` keyword that references the completion of the reaction execution rather than the start of the execution body.

On reflection, the `kill` keyword seems rather easy to implement (easier than the first proposal), so perhaps we should start with that.
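The cache-and-restore behavior suggested for `kill` can be modeled as follows. This is a Python sketch of the semantics only, under loud assumptions: the `Killed` exception stands in for the non-local exit that longjmp would perform, the body cooperatively calls `check()` (a real implementation would interrupt it asynchronously), and all names here (`run_with_kill`, `Killed`, `clock`) are hypothetical.

```python
import copy

MSEC = 1_000_000  # nanoseconds per millisecond


class Killed(Exception):
    """Models the non-local exit out of the reaction body."""


def run_with_kill(state, outputs, body, t, budget, clock):
    """Run `body`, rolling back `state` and `outputs` if it gets killed.

    t:      logical time at which the reaction was triggered.
    budget: kill budget; the kill fires once clock() > t + budget,
            i.e. relative to logical time t, not the physical start time.
    clock:  zero-argument callable returning physical time (assumption).
    """
    saved_state = copy.deepcopy(state)      # cache state variables
    saved_outputs = copy.deepcopy(outputs)  # cache output port states
    kill_at = t + budget

    def check():
        if clock() > kill_at:
            raise Killed()

    try:
        body(state, outputs, check)
        return "completed"
    except Killed:
        # Reverse state updates and retract any outputs already produced.
        state.clear()
        state.update(saved_state)
        outputs.clear()
        outputs.update(saved_outputs)
        return "killed"
```

In this model a caller would run the PANIC body whenever `run_with_kill` returns `"killed"`, at which point the state and outputs look as they did before the Long-running computation started.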