Fix docker tests in fed-gen #1556
Looks like the RTI issue also occurs in CI: https://github.com/lf-lang/lingua-franca/actions/runs/3949937855/jobs/6761805738
if (LFCommand.get("docker", List.of()) == null) {
    throw new TestError("Executable 'docker' not found", Result.NO_EXEC_FAIL);
}
if (LFCommand.get("docker-compose", List.of()) == null) {
Hasn't the API changed to just docker (no docker-compose)?
I think we can just remove this check...
You need to install docker-compose separately, though (even if you use docker compose). This also puzzled me at first, but they ship as separate packages.
Should we revert the removal?
Good to merge but left a question.
Much cleaner indeed! Thank you so much for these fixes :-)
There is a long-standing flaw in the RTI/federate mechanism for handling ports. The RTI tries to get a default port, and if it is unavailable, it tries a port number that is one larger, and if that fails, it tries one more, etc. The federates go through a similar sequence, trying the default port number first and, if that fails, trying one more. However, this really doesn't work. In particular, if you start a federate before the RTI, it skips the default port, and it takes a very long time for it to circle around to try that default port again.
The problem this was trying to address is that when a program releases a port, the OS does not make the port available to other programs for some time. There is a good reason for this: the OS wants to prevent a program from grabbing a port and then receiving messages that were intended for a program that has exited. It therefore holds the port long enough that any messages that were in flight die before it releases the port. This behavior was making CI fail because CI runs many federated programs in sequence.
I think a better solution is for the RTI to use a fixed port, optionally specified as a command-line argument (which the federates will also need to be told). Then we just have to figure out how to make CI work (wait long enough between federated tests?).
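The failure mode described above (a federate that starts before the RTI skips the default port) can be illustrated with a small simulation. This is only a sketch: the class and method names are hypothetical, 15045 is a placeholder port number, and the connection attempt is replaced by a predicate rather than a real socket. The actual RTI and federate code is written in C and differs in detail.

```java
import java.util.function.IntPredicate;

final class PortProbeSketch {
    /** Probes start, start+1, ... and returns the first port accepted by canConnect, or -1. */
    static int probeUpward(int start, int attempts, IntPredicate canConnect) {
        for (int i = 0; i < attempts; i++) {
            int port = start + i;
            if (canConnect.test(port)) {
                return port;
            }
        }
        return -1;
    }

    /** Simulates an RTI on rtiPort that only starts listening after `delay` probes were made. */
    static IntPredicate lateRti(int rtiPort, int delay) {
        int[] probes = {0}; // counts every probe, successful or not
        return port -> probes[0]++ >= delay && port == rtiPort;
    }

    public static void main(String[] args) {
        int defaultPort = 15045; // placeholder value for illustration only

        // RTI is already up: the federate connects on the default port.
        System.out.println(probeUpward(defaultPort, 10, lateRti(defaultPort, 0))); // prints 15045

        // RTI comes up one probe too late: the federate has moved past the
        // default port and never finds the RTI within its attempts.
        System.out.println(probeUpward(defaultPort, 10, lateRti(defaultPort, 1))); // prints -1
    }
}
```

With a single fixed port, the federate would instead keep retrying the same port until the RTI appears, so the order in which the processes start would no longer matter.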
Why not pass the port to use as a command line argument to the RTI and the federates? Then we can even have multiple instances running in parallel on the same host using different ports. What we are missing is an actual deployment.
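As a rough sketch of the command-line idea, the following shows how a process could read its port from its arguments and fall back to a default. The `--port` flag name, the class name, and the default value are all hypothetical and chosen for illustration; they are not taken from the actual RTI or federate command-line interface.

```java
final class PortArgSketch {
    // Placeholder default; the real default port may differ.
    static final int PLACEHOLDER_DEFAULT_PORT = 15045;

    /** Returns the value following a hypothetical "--port" flag, or the placeholder default. */
    static int parsePort(String[] args) {
        for (int i = 0; i + 1 < args.length; i++) {
            if (args[i].equals("--port")) {
                int port = Integer.parseInt(args[i + 1]);
                if (port < 1 || port > 65535) {
                    throw new IllegalArgumentException("Port out of range: " + port);
                }
                return port;
            }
        }
        return PLACEHOLDER_DEFAULT_PORT;
    }

    public static void main(String[] args) {
        System.out.println("Using port " + parsePort(args));
    }
}
```

If the RTI and every federate of one federation are launched with the same value, two federations on the same host can run side by side as long as their values differ.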
I personally think we need a service/daemon that listens on a default, recognizable port (like
Yes, you can improve what we have or start over with another framework. That choice seems clear.
Yes, this would be easy to implement and would delete a bunch of code.
Update: resolving the race condition between RTI and federates by declaring a dependency in
I converted this discussion to an issue lf-lang/reactor-c#146
This fixes several problems in the docker test execution. It simplifies and generalizes the script that is generated in TestBase for running federated docker tests. The script now has a little "smartness" built into it: it simply starts all containers and uses grep to check whether any of them failed (as opposed to starting each container separately, as was done before).
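The "start everything, then scan for failures" idea can be sketched as follows. This is not the generated script, which is a shell script using grep. It is a Java illustration of the same check, and the status-line format `<container name> exited (<code>)` is a hypothetical stand-in for whatever the real script inspects.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class ContainerCheckSketch {
    // Matches a hypothetical status line ending in "exited (<code>)" with a non-zero code.
    private static final Pattern FAILED = Pattern.compile("exited \\((?!0\\))\\d+\\)");

    /** Returns the status lines that report a non-zero exit code. */
    static List<String> failedLines(List<String> statusLines) {
        return statusLines.stream()
                .filter(line -> FAILED.matcher(line).find())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> status = List.of(
                "rti exited (0)",
                "federate_a exited (1)",
                "federate_b exited (0)");
        // The test fails if any container reported a non-zero exit code.
        List<String> failed = failedLines(status);
        System.out.println(failed.isEmpty() ? "all containers succeeded" : "failed: " + failed);
    }
}
```

Checking all containers in one pass means a single code path can serve both the federated and the non-federated case, since a non-federated test is just the special case of one container.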
I also noticed that the non-federated docker tests were broken, as any errors within the container were not picked up by our test framework. Both test categories now use the same script. This also allowed a significant cleanup of the test code.
We might not be fully in the green yet. During my testing I observed multiple times that federates could not connect to the RTI. The log always looked something like this:
I am not sure if this is a problem in my setup or a problem with our tests. I always had to restart the docker daemon to make it work again.