-
-
Notifications
You must be signed in to change notification settings - Fork 63
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TAN messages no longer used and in-transit messages recorded in the RTI #1074
Conversation
I've been wrestling with this issue for the past few days. I think I have identified two possible race conditions, one of which I think I found a fix for (but not the other). Imagine the following federated program (PassThrough and TestCount are simple library reactors with trivial implementation that is elided for the sake of simplicity): target C {
timeout: 4 sec
};
import PassThrough from "../lib/PassThrough.lf"
import TestCount from "../lib/TestCount.lf"
reactor WithLogicalAction {
output out:int;
state thread_id:lf_thread_t(0);
state counter:int(1);
logical action act(0):int;
reaction(startup, act) -> act, out {=
SET(out, self->counter);
schedule_int(act, MSEC(1), self->counter++);
=}
}
federated reactor {
a = new WithLogicalAction();
test = new TestCount(num_inputs=21);
passThroughs = new PassThrough();
a.out, passThroughs.out ->
passThroughs.in, test.in
} Race condition 1: NET with FOREVER tag from the federate after message forwarding by the RTI but before deliveryExample scenario:
Fix: RTI checks that a recorded next event is already completed before replacing it with a larger next event. Race condition 2: NET with smaller tag from the federate before message forwarding by the RTIExample scenario:
Fix: ? Any suggestion on how to fix the second race condition? |
It seems that maybe the RTI needs to keep a list of tags of messages it has forwarded to a federate for which it has not yet received an LTC. When it receives an LTC, it removes items from this list with tags <= the LTC. When it receives a NET, the RTI checks and sets its local record of the NET to the minimum of the NET it received and the tags in the list. This is unfortunate because, although this list is likely to be small in practice, there is no upper bound on its size, so the implementation inevitably will require a malloc. But it probably could be optimized to minimize the likelihood of a malloc. This whole protocol is screaming to be formally verified... |
This was my initial instinct as well. I implemented a prototype using a vector but I quickly realized that this list of tags needs to be sorted to be efficient. I think our existing pqueue sounds like the right structure for this.
I agree. Tagging @lsk567 to see if he would be interested in continuing our unfinished work toward formally verifying the federated execution. |
Here is an implementation for the queue of in-transit messages: lf-lang/reactor-c@98dd2c2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
This PR removes TAN messages entirely.
Instead, a federate with a physical action (that is connected to a network output) is going to periodically create a dummy event (with the period controlled by coordination-options: {advance-message-interval: 10 msec}) which forces the federate to advance its tag and allow downstream federates to make progress.
After fixing this bug, another bug was exposed in the RTI, in which the RTI could potentially lose track of a federate's actual earliest next event (see #1074 (comment) comment for more detail). This caused the RTI to grant incorrect tag advance grant (TAG) messages. This bug was fixed by adding a queue to the RTI that keeps a record of all currently in-transit messages.