Fix docker tests in fed-gen #1556

cmnrd · 2023-01-18T14:39:46Z

This fixes several problems in the docker test execution. It simplifies and generalizes the script that is generated in TestBase for running federated docker tests. The script now has a little "smartness" built into it: it simply starts all containers and uses grep to check whether any of them failed (as opposed to starting each container separately, as was done before).
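
To make the grep-based check concrete, here is a minimal Java sketch of the same idea. It is not the script that TestBase generates. It assumes status lines in the style of the `docker ps` STATUS column (for example `Exited (1) 3 seconds ago`); the class name `ContainerStatusCheck`, the method `failedContainers`, and the sample lines are hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class ContainerStatusCheck {
    // Assumed status format: "Exited (N) ...". The negative lookahead
    // skips "Exited (0)", so only non-zero exit codes match.
    private static final Pattern FAILED =
        Pattern.compile("Exited \\((?!0\\))\\d+\\)");

    /** Returns every status line that reports a non-zero exit code. */
    static List<String> failedContainers(List<String> statusLines) {
        return statusLines.stream()
            .filter(line -> FAILED.matcher(line).find())
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical sample input, one line per container.
        List<String> lines = List.of(
            "rti  Exited (0) 2 seconds ago",
            "c    Exited (1) 2 seconds ago");
        List<String> failed = failedContainers(lines);
        // A test harness would fail the test if this list is non-empty.
        System.out.println(failed.isEmpty() ? "all containers succeeded" : "failed: " + failed);
    }
}
```

The point of checking all containers after they have run, rather than launching them one by one, is that a single pass over the statuses catches a failure in any container.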

I also noticed that the non-federated docker tests were broken, as any errors within the container were not picked up by our test framework. Both test categories now use the same script. This also allowed a significant cleanup of the test code.

We might not be fully in the green yet. During my testing I repeatedly observed that federates could not connect to the RTI. The log always looked something like this:

DistributedCountContainerized-c    | Federate 0: Could not connect to RTI at rti. Will try again every 2 seconds.
DistributedCountContainerized-c    | Federate 0: Failed to connect to RTI on port 15065. Trying 15066.
DistributedCountContainerized-c    | Federate 0: Could not connect to RTI at rti. Will try again every 2 seconds.
DistributedCountContainerized-c    | Federate 0: Failed to connect to RTI on port 15066. Trying 15067.
DistributedCountContainerized-rti  | WARNING: RTI failed to accept the socket. Resource temporarily unavailable. Trying again.
DistributedCountContainerized-c    | Federate 0: Could not connect to RTI at rti. Will try again every 2 seconds.
DistributedCountContainerized-c    | Federate 0: Failed to connect to RTI on port 15067. Trying 15068.
DistributedCountContainerized-c    | Federate 0: Could not connect to RTI at rti. Will try again every 2 seconds.
DistributedCountContainerized-c    | Federate 0: Failed to connect to RTI on port 15068. Trying 15069.
DistributedCountContainerized-c    | Federate 0: Could not connect to RTI at rti. Will try again every 2 seconds.
DistributedCountContainerized-c    | Federate 0: Failed to connect to RTI on port 15069. Trying 15070.

I am not sure if this is a problem in my setup or a problem with our tests. I always had to restart the docker daemon to make it work again.

cmnrd · 2023-01-18T16:09:06Z

Looks like the RTI issue also occurs in CI: https://github.com/lf-lang/lingua-franca/actions/runs/3949937855/jobs/6761805738

lhstrh · 2023-01-18T16:18:13Z

The RTI problem is puzzling and needs looking into, but this PR sure is a major improvement!

lhstrh · 2023-01-18T16:20:36Z

org.lflang.tests/src/org/lflang/tests/TestBase.java

+        if (LFCommand.get("docker", List.of()) == null) {
+            throw new TestError("Executable 'docker' not found" , Result.NO_EXEC_FAIL);
+        }
+        if (LFCommand.get("docker-compose", List.of()) == null) {


Hasn't the API changed to just docker (no docker-compose)?

I think we can just remove this check...

You need to install docker-compose separately, though (even if you use docker compose). This also puzzled me at first, but they ship as separate packages.

Should we revert the removal?

lhstrh

Good to merge but left a question.

lhstrh · 2023-01-18T16:21:41Z

Much cleaner indeed! Thank you so much for these fixes :-)

edwardalee · 2023-01-18T18:01:32Z

There is a long-standing flaw in the RTI/federate mechanism for handling ports. The RTI tries to get a default port, and if it is unavailable, it tries the port number that is one larger, and if that fails, it tries one more, etc. The federates go through a similar sequence, trying the default port number first, and if that fails, trying one more.

However, this really doesn't work. In particular, if you start a federate before the RTI, it skips the default port, and it takes a very long time for it to circle around to try that default port again.
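
The probing scheme described above can be sketched roughly as follows. This is an illustration in Java, not the RTI's actual code; the class `PortProbe`, its methods, the attempt limit, and the starting port (borrowed from the log earlier in this thread) are all assumptions.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.util.OptionalInt;
import java.util.function.IntPredicate;

public class PortProbe {
    /**
     * Tries start, start + 1, ... and returns the first port for which
     * canBind holds, or empty if none of the attempted ports is usable.
     */
    static OptionalInt firstFreePort(int start, int attempts, IntPredicate canBind) {
        for (int port = start; port < start + attempts; port++) {
            if (canBind.test(port)) {
                return OptionalInt.of(port);
            }
        }
        return OptionalInt.empty();
    }

    /** Returns true if a listening socket can currently be opened on the port. */
    static boolean canBindLocally(int port) {
        try (ServerSocket socket = new ServerSocket(port)) {
            return true;
        } catch (IOException e) {
            return false; // in use, or not yet released by the OS
        }
    }

    public static void main(String[] args) {
        // 15065 appears in the log above; it is used here only as an example.
        OptionalInt port = firstFreePort(15065, 16, PortProbe::canBindLocally);
        System.out.println(port.isPresent() ? "bound candidate: " + port.getAsInt() : "no free port");
    }
}
```

A connecting side that walks the same range faces the problem described above: if it starts before the listener exists, it moves past the port the listener later binds and only returns to it after cycling through the range.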

The problem this was trying to address is that when a program releases a port, the OS does not make the port available to other programs for some time. There is a good reason for this: the OS wants to prevent a program from grabbing a port and then receiving messages that were intended for a program that has exited. It therefore holds the port long enough that any messages that were in flight die before it releases the port.

This feature was making CI fail because it runs many federated programs in sequence.

I think a better solution is for the RTI to simply use a fixed port, perhaps optionally specified as a command-line argument (which the federates will also need to be told). Then we just have to figure out how to make CI work (wait long enough between federated tests?).

cmnrd · 2023-01-19T07:45:53Z

Why not pass the port to use as a command line argument to the RTI and the federates? Then we can even have multiple instances running in parallel on the same host using different ports. What we are missing is an actual deployment.
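
As a rough sketch of what this proposal could look like, the following Java snippet reads a port from the command line and falls back to a default. The `--port` flag name, the class `PortArgument`, and the default value are hypothetical and chosen only for illustration; the real RTI and federates are not written in Java and may expose this differently.

```java
public class PortArgument {
    /**
     * Returns the value following a hypothetical "--port" flag,
     * or defaultPort if the flag is absent.
     */
    static int parsePort(String[] args, int defaultPort) {
        for (int i = 0; i < args.length - 1; i++) {
            if (args[i].equals("--port")) {
                int port = Integer.parseInt(args[i + 1]);
                if (port < 1 || port > 65535) {
                    throw new IllegalArgumentException("Port out of range: " + port);
                }
                return port;
            }
        }
        return defaultPort;
    }

    public static void main(String[] args) {
        // The default here is an arbitrary example value.
        int port = parsePort(args, 15065);
        System.out.println("Using port " + port);
    }
}
```

If the RTI and every federate of one federation receive the same value, no probing is needed, and a second federation on the same host can simply be given a different port.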

lhstrh · 2023-01-19T07:57:43Z

I personally think we need a service/daemon that listens on a default, recognizable port (like 80 for HTTP) with the sole purpose of brokering connections between participants in federations. We never built this and implemented this kind of port-knocking scheme instead, but I think this issue belongs in the category of problems solved by existing frameworks...

edwardalee · 2023-01-19T08:05:03Z

Yes, you can improve what we have or start over with another framework. That choice seems clear.

edwardalee · 2023-01-19T08:07:51Z

> Why not pass the port to use as a command line argument to the RTI and the federates? Then we can even have multiple instances running in parallel on the same host using different ports. What we are missing is an actual deployment.

Yes, this would be easy to implement and would delete a bunch of code.

lhstrh · 2023-01-19T22:42:31Z

Update: resolving the race condition between RTI and federates by declaring a dependency in docker-compose.yml seems to have done the trick. We have some remaining errors, but those are of a different nature.

cmnrd · 2023-01-20T10:21:13Z

I converted this discussion to an issue: lf-lang/reactor-c#146

cmnrd added 7 commits January 18, 2023 10:01

fix error handling in case docker is not installed

4f69401

fix dockerfile output

492116d

Also check if docker-compose is installed

e98ec25

build a more reliable docker test script

6a6b099

use the new script for running the federated docker tests

6e1b726

use the same script to run non-federated docker tests

fda6a04

create only a single process builder

1ff2ad0

cmnrd requested review from lhstrh and petervdonovan January 18, 2023 14:39

lhstrh reviewed Jan 18, 2023

lhstrh approved these changes Jan 18, 2023

Remove check for docker-compose which is no longer used.

3f4f156

lhstrh merged commit 539a12e into fed-gen Jan 18, 2023

lhstrh deleted the fix-docker-tests branch January 18, 2023 16:35

cmnrd mentioned this pull request Jan 20, 2023

Make the handling of RTI connections more robust lf-lang/reactor-c#146

Open
