-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Removal of the "waiting for tag" field #266
Conversation
Oops, now there are STP violations. My goal is still to get these changes ready in the next few days. |
The |
When a federate sends a NET for a certain tag, and it has upstream federates, we incorrectly assumed that the federate needs to receive a TAG or PTAG in order to allow it to proceed to that NET. This assumption is incorrect because after sending the NET, the federate might receive a message from an upstream federate; that message might lead to a newly lowered NET. The bug that resulted from the incorrect assumption is that the federate would not proceed to execute the event enabled by the message it received after sending the NET. This is because the federate does not realize that it has anything to do; it still thinks that it cannot do anything until it receives a TAG or PTAG that enables it to process the NET. Therefore, the federation deadlocks. The fix is to compute a new next event tag whenever the event queue changes.
The problem was that the NET obtained by get_next_event_tag might be farther into the future than the NET that was just submitted to the RTI. This is possible when executing the start tag because all federates send a NET for the start tag regardless of their event queues.
3ba4d73
to
49483cf
Compare
This PR is almost ready, except that there is one failure on macOS in
But I can't think of any reason why the event queue would ever be in this state, even temporarily. It seems like we are looking at the event queue at a time when it is temporarily in an invalid state. |
On Ubuntu I ran the test about 700 times without reproducing the error. Will take another look tonight. |
The remove-waiting-for-tag branch doesn't compile for me, so I can't check the fix. The first compile error is in federate.c:
|
4ae2a20
to
49483cf
Compare
Sorry about that. The latest commit was wrong, so I reverted it. This means that I am still not sure whether this commit introduces a rare bug on macOS, or if the bug already existed. |
I think we should just merge this as-is because I cannot reproduce the error that occurred here (it is not even reproducible in CI), and also because the logs do not include any information about what happened in federate 1 before the error -- they do not even include debug prints from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, let's merge this. It is clearly an improvement even if problems remain.
I think that this fixes a bug which was reported over email.
When a federate sends a NET for a certain tag, and it has upstream federates, we incorrectly assumed that the federate needs to receive a TAG or PTAG in order to allow it to proceed to that NET.
This assumption is incorrect because after sending the NET, the federate might receive a message from an upstream federate; that message might lead to a newly lowered NET.
The bug that resulted from the incorrect assumption is that the federate would not proceed to execute the event enabled by the message it received after sending the NET. This is because the federate does not realize that it has anything to do; it still thinks that it cannot do anything until it receives a TAG or PTAG that enables it to process the NET. Therefore, the federation deadlocks.
The fix is to compute a new next event tag whenever the event queue changes.
For my own sake, I also point out that this issue appears to have been in the code for about 2 years, so I do not think it was introduced by the changes I made in the last few months.
The original minimal reproduction is as follows: