[Bug]: LocalStack & @SqsListener - SdkClientException "Unable to execute HTTP request: Connection refused: /127.0.0.1:{portNumber}" during tests #7454
Replies: 12 comments 12 replies
-
Hi @daniel-frak, I've moved the issue to a discussion until we can identify whether it is an issue or not. Please consider raising a discussion next time in order to triage it, or join the Slack. I am not able to reproduce the issue with https://github.com/testcontainers/tc-guide-testing-aws-service-integrations-using-localstack. Did you change something? If SQS were not available, the test would fail, and that's not the case, right? I think this is related to the container/test shutdown. Once the test finishes, the container is killed and removed. At that point the SQS client is still trying to poll the LocalStack container, but it no longer exists. So the logs you see relate to the end of the test, not to the beginning, which I understand is your concern.
-
I have not changed the code in any way:
I'm running this on Ubuntu with OpenJDK 17 and Maven 3.9.0:

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.3 LTS
Release: 22.04
Codename: jammy
java --version
openjdk 17.0.8.1 2023-08-24
OpenJDK Runtime Environment (build 17.0.8.1+1-Ubuntu-0ubuntu122.04)
OpenJDK 64-Bit Server VM (build 17.0.8.1+1-Ubuntu-0ubuntu122.04, mixed mode, sharing)
Apache Maven 3.9.0 (9b58d2bad23a66be161c4664ef21ce219c2c8584)
Maven home: /opt/maven
Java version: 17.0.8.1, vendor: Private Build, runtime: /usr/lib/jvm/java-17-openjdk-amd64
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "6.2.0-31-generic", arch: "amd64", family: "unix"

But the issue also happens on GitLab CI runners.

Upon further investigation, it seems you were right that it's related to container/test shutdown - when I add a

In production code, I also noticed exceptions related to acknowledgement of messages by an @SqsListener (among others), but I didn't include them so as not to muddy the issue. There I also get a

Finally, it seems like this is an issue that has been persisting for some time; looking at this random blog example: the author silences 'Connection refused' messages, which in that earlier version of the AWS library (which his code depends on) used to be logged as WARN instead of ERROR:

<!-- Noisy logs when shutting down the context, connection refuse messages for LocalStack -->
<logger name="io.awspring.cloud.messaging.listener" level="error"/>

Let me know if I can provide any more info.
-
I have the same issue: the SqsListener spams the logs with "connection refused" after the test has finished (so the LocalStack container is shut down) until the Spring context is destroyed. Is there any way to stop the listener directly after the test?
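One possible way (a sketch, not taken from this thread; the class name is illustrative) is to stop the context's Lifecycle beans at the end of the test class, before LocalStack is removed - assuming the SQS listener containers participate in Spring's Lifecycle handling, which they appear to, given that they keep polling until the context is destroyed:

import org.junit.jupiter.api.AfterAll;
import org.junit.jupiter.api.TestInstance;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.context.ConfigurableApplicationContext;

@SpringBootTest
@TestInstance(TestInstance.Lifecycle.PER_CLASS) // allows a non-static @AfterAll
class SqsListenerIntegrationTest {

    @Autowired
    private ConfigurableApplicationContext context;

    @AfterAll
    void stopLifecycleBeans() {
        // Stops the running Lifecycle beans (which should include the SQS listener
        // containers) without closing the cached test context.
        context.stop();
    }
}

Note the trade-off: the test context is cached and shared, so later test classes reusing it would start with those Lifecycle beans stopped.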
-
I have the exact same issue too. It also happens mostly on Jenkins.
-
I have one more quite minimalistic repo where this issue can be reproduced (though in my case it's DynamoDB sending these messages). It started happening with no code change, and now it happens for every commit. This is not the first time this error has happened to me; last time it was fixed by bumping the LocalStack image version, so I have a suspicion that the problem is on the LocalStack end.
-
My theory is that this happens both because Spring keeps the beans in a cache that can be reused later to save on context start-up time, and because requests that have already gone out are left hanging while the LocalStack SQS container is shut down. Our workaround is to mark the context as dirty (see the sketch below). Use whichever mode fits your needs best, but remember that the narrower the scope, the longer it takes, since a new context has to be spun up for every dirtied one. If you only need to do it once per test class, you only spin up the context one extra time; if you have 10 tests and dirty the context after each test, you spin it up 10 more times.
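For reference, a minimal sketch of that workaround (the test class name is illustrative); pick the classMode that matches the trade-off described above:

import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.annotation.DirtiesContext;
import org.springframework.test.annotation.DirtiesContext.ClassMode;

@SpringBootTest
// Close the application context after this test class, so no cached beans keep
// polling a LocalStack container that no longer exists. A fresh context is created
// for the next test class that needs one.
@DirtiesContext(classMode = ClassMode.AFTER_CLASS)
class SqsListenerIntegrationTest {
    // tests...
}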
-
I think the problem basically boils down to this:
And that's it, basically. There's no easy way around this - you can disable Ryuk with an environment variable, but then you get no cleanup at all. I think the "easiest" solution is to turn off logging for
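For example, the logback equivalent for the logger name that shows up in the error output of current Spring Cloud AWS versions (see the original report at the bottom of this page) would be something like the following - with the obvious caveat that it also hides genuine polling errors:

<!-- Silence the "Error polling for messages in queue" noise that appears while the
     context shuts down after LocalStack has already been removed. -->
<logger name="io.awspring.cloud.sqs.listener.source.AbstractPollingMessageSource" level="OFF"/>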
-
The problem is that sometimes the processes executed by Spring's JVM shutdown hook expect and require a working connection (e.g. to the DB) that is handled by Testcontainers. In the examples above the main problem is just log spam, so the warnings can be ignored, but this is not always the case. E.g. Spring Integration executes

When Testcontainers provides this connection, all connections in the pool become invalid before the pool itself is shut down. Example:

Marking each method with

I think we need a mechanism to synchronize the shutdown of Testcontainers and the Spring context. They should not shut down in parallel. They should shut down in a guaranteed order, where Testcontainers always shuts down last (the same way it always starts first). Maybe Spring's
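One way to approximate that ordering today (a sketch, not a built-in Testcontainers mechanism; a later comment in this thread shows the same pattern for LocalStack) is to let the Spring context own the container as a bean, so it is stopped as part of the context shutdown instead of by a separate Testcontainers JVM shutdown hook racing it. The Postgres flavour and names are illustrative, and how the dependent beans (DataSource, pool) get the container's connection details is omitted here - they still need to be wired to the container bean (e.g. via @ServiceConnection in Spring Boot 3.1+) for the destruction order to work out:

import org.springframework.boot.test.context.TestConfiguration;
import org.springframework.context.annotation.Bean;
import org.testcontainers.containers.PostgreSQLContainer;
import org.testcontainers.utility.DockerImageName;

@TestConfiguration(proxyBeanMethods = false)
class DatabaseContainerConfig {

    // Started when the context is created and stopped via destroyMethod during
    // context close, i.e. inside Spring's ordered shutdown rather than in an
    // unrelated JVM shutdown hook.
    @Bean(initMethod = "start", destroyMethod = "stop")
    PostgreSQLContainer<?> postgres() {
        return new PostgreSQLContainer<>(DockerImageName.parse("postgres:16-alpine"));
    }
}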
-
We experienced this issue more frequently after a recent round of library upgrades, which included Testcontainers, LocalStack, Camel and Spring Boot. The way we addressed it was by creating an @EventListener that stops all Camel routes when the application starts shutting down:

/**
* This is a **Test** [EventListener] used to stop all Camel routes when the application begins the process
* of shutting down by switching to a [ReadinessState.REFUSING_TRAFFIC] state.
*
* This is a hack to avoid the situation occurring after running tests where the _localstack_ shutdown hook
* terminates the process before the Camel shutdown hook performs its shutdown routine as is described in the
* following [testcontainers discussion](https://github.com/testcontainers/testcontainers-java/discussions/7454).
*/
@EventListener
fun stopRoutesOnShutdown(event: AvailabilityChangeEvent<*>) {
val source = event.source
if (event.state == ReadinessState.REFUSING_TRAFFIC && source is ApplicationContext) {
logger.info { "Event received: eventState=${event.state}. Shutting down Camel routes in reverse startup order.." }
with(source.getBean<CamelContext>()) {
routes.sortedByDescending { it.startupOrder }.forEach { route ->
routeController.stopRoute(route.routeId, 250, TimeUnit.MILLISECONDS)
}
}
}
}
-
My update on this issue: I was able to prevent this error from happening by enabling container reuse:

public abstract class IntegrationTest {
static final LocalStackContainer localstack =
new LocalStackContainer(DockerImageName.parse("localstack/localstack:3.2.0"))
.withServices(LocalStackContainer.Service.SQS)
.withReuse(true);
static {
TestcontainersConfiguration.getInstance()
.updateUserConfig("testcontainers.reuse.enable", "true");
}
@BeforeAll
static void init() {
localstack.start();
localstack.followOutput(new Slf4jLogConsumer(LoggerFactory.getLogger("localstack")));
}
}

In case you're running your tests in the context of the Spring Boot extension, another option would be to manage the container lifecycle through Spring:

@SpringBootTest
public abstract class IntegrationTest {
}
@Configuration
public class TestContainersConfig {
@Bean(initMethod = "start", destroyMethod = "stop")
public LocalStackContainer container() {
return new LocalStackContainer(DockerImageName.parse("localstack/localstack:3.2.0"))
.withServices(LocalStackContainer.Service.SQS);
}
}
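A small addition (an assumption on my part, since the snippet above doesn't show it): the configuration class still needs to be part of the test context, and the application still needs the container's endpoint, e.g. via @ServiceConnection (Spring Boot 3.1+) or a dynamic property. Pulling the config in could look like:

@SpringBootTest
@Import(TestContainersConfig.class)
public abstract class IntegrationTest {
}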
-
According to the testcontainers commit that introduced this issue last October:
As per the above description, this change was aimed at expediting Ryuk's finalisation routine, but it has introduced a whole raft of other issues when components such as Spring JMS or Camel routes that depend on the services monitored by Ryuk are suddenly terminated - akin to having the rug pulled out from under them while they are still in the middle of their own shutdown. Should we consider making the shutdown hook above opt-in until a more permanent solution has been found?
-
Module
LocalStack
Testcontainers version
1.19.0
Using the latest Testcontainers version?
Yes
Host OS
Linux
Host Arch
x86
Docker version
Client: Docker Engine - Community
 Version:           24.0.5
 API version:       1.43
 Go version:        go1.20.6
 Git commit:        ced0996
 Built:             Fri Jul 21 20:35:18 2023
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          24.0.5
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.6
  Git commit:       a61e2b4
  Built:            Fri Jul 21 20:35:18 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.22
  GitCommit:        8165feabfdfe38c65b599c4993d227328c231fca
 runc:
  Version:          1.1.8
  GitCommit:        v1.1.8-0-g82f18fe
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
What happened?
When using the LocalStack container in tests with an @SqsListener, the logs get spammed with software.amazon.awssdk.core.exception.SdkClientException for a while, even though the container is supposedly already running. It seems that Testcontainers is not quite ready after the "Ready." log.
This can make CI logs unreadable, as sometimes the LocalStack initialization takes a long time, overwhelming the log file with errors.
Relevant log output
Additional Information
To reproduce the issue, the official Testcontainers example can be cloned:
https://github.com/testcontainers/tc-guide-testing-aws-service-integrations-using-localstack
The MessageListenerTest::shouldHandleMessageSuccessfully test creates the attached log with:
ERROR io.awspring.cloud.sqs.listener.source.AbstractPollingMessageSource - Error polling for messages in queue
I have also tested this in my own project, using the latest version of Testcontainers (1.19.0), and the result is the same.
Additionally, I have tried creating the queue using a LocalStack init script (/etc/localstack/init/ready.d/init-aws.sh), but the error persists. I have tried explicitly waiting for my SQS queue to be created:
This also does not fix the error.
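For context, such an init script is usually just an awslocal call placed at the path mentioned above; a minimal sketch (the queue name is illustrative, not taken from the reproducer):

#!/bin/bash
# Placed at /etc/localstack/init/ready.d/init-aws.sh inside the LocalStack container;
# LocalStack runs it once the container reports "Ready."
awslocal sqs create-queue --queue-name test-queue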