[Bug]: duct-tape:1.0.8 contains a Thread Leak #9227

Open
dadoonet opened this issue Sep 16, 2024 · 1 comment
@dadoonet
Contributor

Module

Core

Testcontainers version

1.20.1

Using the latest Testcontainers version?

Yes

Host OS

MacOS

Host Arch

Apple M3 Pro

Docker version

Client:
 Cloud integration: v1.0.35+desktop.10
 Version:           25.0.3
 API version:       1.44
 Go version:        go1.21.6
 Git commit:        4debf41
 Built:             Tue Feb  6 21:13:26 2024
 OS/Arch:           darwin/arm64
 Context:           desktop-linux

Server: Docker Desktop 4.27.2 (137060)
 Engine:
  Version:          25.0.3
  API version:      1.44 (minimum version 1.24)
  Go version:       go1.21.6
  Git commit:       f417435
  Built:            Tue Feb  6 21:14:22 2024
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.6.28
  GitCommit:        ae07eda36dd25f8a1b98dfbf587313b99c0190bb
 runc:
  Version:          1.1.12
  GitCommit:        v1.1.12-0-g51d5e94
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

What happened?

Here are my findings.

When using Testcontainers in tests, some threads are never stopped, which leaves zombie threads behind.
I'm using the RandomizedTesting framework on my projects, and it automatically detects threads that are still running after everything else has been stopped.

The problem does not come directly from TC but from duct-tape, which was archived 2 years ago by @rnorth.

duct-tape is a dependency of TC: https://github.com/testcontainers/testcontainers-java/blob/0217e78eb986da4e73402288959d05f34b37546f/core/build.gradle#L77C1-L79C6

api ('org.rnorth.duct-tape:duct-tape:1.0.8') {
  exclude(group: 'org.jetbrains', module: 'annotations')
}

The problem in duct-tape starts here: https://github.com/rnorth/duct-tape/blob/2a1c5be9f2ef3f16bf036cec8752a170d130b61e/src/main/java/org/rnorth/ducttape/timeouts/Timeouts.java#L15-L25

    private static final ExecutorService EXECUTOR_SERVICE = Executors.newCachedThreadPool(new ThreadFactory() {


        final AtomicInteger threadCounter = new AtomicInteger(0);


        @Override
        public Thread newThread(@NotNull Runnable r) {
            Thread thread = new Thread(r, "ducttape-" + threadCounter.getAndIncrement());
            thread.setDaemon(true);
            return thread;
        }
    });

As soon as you call one of the methods in the Timeouts class, a thread is started and never stopped.
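
To illustrate the mechanism with the JDK alone, here is a minimal sketch that assumes nothing about duct-tape beyond the factory shown above. The class name `CachedPoolLingerDemo` and the thread name `demo-ducttape-0` are made up for this example. A cached pool keeps an idle worker parked for its keep-alive period (60 seconds for `Executors.newCachedThreadPool`, per the JDK documentation), so the worker is still alive right after the task returns and only exits promptly once the pool is shut down.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class CachedPoolLingerDemo {

    /** Returns {worker alive after the task, worker alive after shutdown}. */
    public static boolean[] run() throws Exception {
        AtomicReference<Thread> worker = new AtomicReference<>();

        // Same shape as the factory above: a named daemon thread in a cached pool.
        // The thread name is hypothetical and only used by this demo.
        ExecutorService pool = Executors.newCachedThreadPool(r -> {
            Thread t = new Thread(r, "demo-ducttape-0");
            t.setDaemon(true);
            worker.set(t);
            return t;
        });

        // Run one trivial task and wait for it to complete.
        pool.submit(() -> System.out.println("Hello world!")).get(1, TimeUnit.SECONDS);

        // The task is done, but the worker is parked waiting for more work.
        boolean aliveAfterTask = worker.get().isAlive();

        // An explicit shutdown lets the idle worker exit.
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        worker.get().join(5_000);
        boolean aliveAfterShutdown = worker.get().isAlive();

        return new boolean[] {aliveAfterTask, aliveAfterShutdown};
    }

    public static void main(String[] args) throws Exception {
        boolean[] result = run();
        System.out.println("alive after task: " + result[0]);
        System.out.println("alive after shutdown: " + result[1]);
    }
}
```

Running `main` should print `Hello world!`, then `alive after task: true`, then `alive after shutdown: false`. Because the executor in `Timeouts` is a private static field, callers have no equivalent of the `shutdown()` call used here.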

We do call Timeouts in LazyFuture:

    @Override
    public T get(long timeout, TimeUnit unit) throws TimeoutException {
        try {
            return Timeouts.getWithTimeout((int) timeout, unit, this::get);
        } catch (org.rnorth.ducttape.TimeoutException e) {
            throw new TimeoutException(e.getMessage());
        }
    }

so we end up creating a thread ducttape-1.

Here is a simple test which reproduces the problem:

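import com.carrotsearch.randomizedtesting.RandomizedRunner;
import com.carrotsearch.randomizedtesting.annotations.ThreadLeakLingering;
import com.carrotsearch.randomizedtesting.annotations.ThreadLeakScope;
import com.carrotsearch.randomizedtesting.annotations.TimeoutSuite;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.rnorth.ducttape.timeouts.Timeouts;

import java.util.concurrent.TimeUnit;

@RunWith(RandomizedRunner.class)
@TimeoutSuite(millis = 5 * 60 * 1000)
@ThreadLeakScope(ThreadLeakScope.Scope.SUITE)
@ThreadLeakLingering(linger = 10000) // 10 sec lingering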
public class ZombieDucttapeDemoIT {

    @Test
    public void testZombie() throws Exception {
        Timeouts.doWithTimeout(1, TimeUnit.SECONDS, () -> {
            System.out.println("Hello world!");
        });
    }
}
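
For projects that do not use RandomizedTesting, a rough check for the same symptom can be done with the JDK alone by listing live threads by name prefix. This is only a sketch: `ThreadNameScan`, `liveThreadNames` and the `scan-demo-` prefix are hypothetical names, and the demo uses a plain parked thread rather than duct-tape itself.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.stream.Collectors;

public class ThreadNameScan {

    /** Names of live threads whose name starts with the given prefix, sorted. */
    public static List<String> liveThreadNames(String prefix) {
        return Thread.getAllStackTraces().keySet().stream()
            .filter(Thread::isAlive)
            .map(Thread::getName)
            .filter(name -> name.startsWith(prefix))
            .sorted()
            .collect(Collectors.toList());
    }

    /** Returns the scan result while a demo thread is parked, then after it has exited. */
    public static List<List<String>> demo() throws InterruptedException {
        CountDownLatch release = new CountDownLatch(1);

        // A daemon thread that stays parked until released, standing in for an idle pool worker.
        Thread parked = new Thread(() -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "scan-demo-0");
        parked.setDaemon(true);
        parked.start();

        List<String> before = liveThreadNames("scan-demo-");

        // Let the thread finish, wait for it, and scan again.
        release.countDown();
        parked.join();
        List<String> after = liveThreadNames("scan-demo-");

        return Arrays.asList(before, after);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(demo());
    }
}
```

In a real test, calling `liveThreadNames("ducttape-")` after the code under test has finished would show whether any duct-tape worker is still around.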

When the test run finishes, I can see this:

Hello world!
sept. 16, 2024 5:05:56 PM com.carrotsearch.randomizedtesting.ThreadLeakControl checkThreadLeaks
WARNING: Will linger awaiting termination of 1 leaked thread(s).
sept. 16, 2024 5:06:06 PM com.carrotsearch.randomizedtesting.ThreadLeakControl checkThreadLeaks
SEVERE: 1 thread leaked from SUITE scope at fr.pilato.test.zombie.minio.ZombieDucttapeDemoIT: 
   1) Thread[id=24, name=ducttape-0, state=TIMED_WAITING, group=TGRP-ZombieDucttapeDemoIT]
        at java.base/jdk.internal.misc.Unsafe.park(Native Method)
        at java.base/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:410)
        at java.base/java.util.concurrent.LinkedTransferQueue$DualNode.await(LinkedTransferQueue.java:452)
        at java.base/java.util.concurrent.SynchronousQueue$Transferer.xferLifo(SynchronousQueue.java:194)
        at java.base/java.util.concurrent.SynchronousQueue.xfer(SynchronousQueue.java:233)
        at java.base/java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:336)
        at java.base/java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1069)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
        at java.base/java.lang.Thread.run(Thread.java:1583)
sept. 16, 2024 5:06:06 PM com.carrotsearch.randomizedtesting.ThreadLeakControl tryToInterruptAll
INFO: Starting to interrupt leaked threads:
   1) Thread[id=24, name=ducttape-0, state=TIMED_WAITING, group=TGRP-ZombieDucttapeDemoIT]
sept. 16, 2024 5:06:08 PM com.carrotsearch.randomizedtesting.ThreadLeakControl tryToInterruptAll
SEVERE: There are still zombie threads that couldn't be terminated:
   1) Thread[id=24, name=ducttape-0, state=TIMED_WAITING, group=TGRP-ZombieDucttapeDemoIT]
        at java.base/jdk.internal.misc.Unsafe.park(Native Method)
        at java.base/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:410)
        at java.base/java.util.concurrent.LinkedTransferQueue$DualNode.await(LinkedTransferQueue.java:452)
        at java.base/java.util.concurrent.SynchronousQueue$Transferer.xferLifo(SynchronousQueue.java:194)
        at java.base/java.util.concurrent.SynchronousQueue.xfer(SynchronousQueue.java:233)
        at java.base/java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:336)
        at java.base/java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1069)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
        at java.base/java.lang.Thread.run(Thread.java:1583)

com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE scope at fr.pilato.test.zombie.minio.ZombieDucttapeDemoIT: 
   1) Thread[id=24, name=ducttape-0, state=TIMED_WAITING, group=TGRP-ZombieDucttapeDemoIT]
        at java.base/jdk.internal.misc.Unsafe.park(Native Method)
        at java.base/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:410)
        at java.base/java.util.concurrent.LinkedTransferQueue$DualNode.await(LinkedTransferQueue.java:452)
        at java.base/java.util.concurrent.SynchronousQueue$Transferer.xferLifo(SynchronousQueue.java:194)
        at java.base/java.util.concurrent.SynchronousQueue.xfer(SynchronousQueue.java:233)
        at java.base/java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:336)
        at java.base/java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1069)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
        at java.base/java.lang.Thread.run(Thread.java:1583)

	at __randomizedtesting.SeedInfo.seed([5F166AAD0B3CB2D7]:0)


com.carrotsearch.randomizedtesting.ThreadLeakError: There are still zombie threads that couldn't be terminated:
   1) Thread[id=24, name=ducttape-0, state=TIMED_WAITING, group=TGRP-ZombieDucttapeDemoIT]
        at java.base/jdk.internal.misc.Unsafe.park(Native Method)
        at java.base/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:410)
        at java.base/java.util.concurrent.LinkedTransferQueue$DualNode.await(LinkedTransferQueue.java:452)
        at java.base/java.util.concurrent.SynchronousQueue$Transferer.xferLifo(SynchronousQueue.java:194)
        at java.base/java.util.concurrent.SynchronousQueue.xfer(SynchronousQueue.java:233)
        at java.base/java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:336)
        at java.base/java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1069)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
        at java.base/java.lang.Thread.run(Thread.java:1583)

	at __randomizedtesting.SeedInfo.seed([5F166AAD0B3CB2D7]:0)


Process finished with exit code 255

I suggest the following:

  1. Move the source code of duct-tape into Testcontainers.
  2. Update the code to provide a way to stop the threads it has started.
  3. Ideally, stop those threads automatically when container.close() is called.
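
As a rough illustration of step 2, a timeout helper could own its executor and expose a way to stop it, instead of holding it in a static field. The sketch below is an assumption about one possible shape, not the duct-tape or Testcontainers API: `ClosableTimeouts`, its method signatures and the `closable-timeouts-` thread name are all hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical timeout helper whose thread pool is owned by the instance and can be closed. */
public class ClosableTimeouts implements AutoCloseable {

    private final AtomicInteger threadCounter = new AtomicInteger(0);

    // Instance-level pool (not static), so its lifetime is tied to this object.
    private final ExecutorService executor = Executors.newCachedThreadPool(r -> {
        Thread thread = new Thread(r, "closable-timeouts-" + threadCounter.getAndIncrement());
        thread.setDaemon(true);
        return thread;
    });

    /** Runs the task on the pool and waits at most the given time for its result. */
    public <T> T getWithTimeout(long timeout, TimeUnit unit, Callable<T> task)
            throws TimeoutException, ExecutionException, InterruptedException {
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the task so the worker is freed
            throw e;
        }
    }

    /** Stops the pool and waits briefly for its workers to exit. */
    @Override
    public void close() {
        executor.shutdownNow();
        try {
            executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public boolean isTerminated() {
        return executor.isTerminated();
    }

    public static void main(String[] args) throws Exception {
        ClosableTimeouts timeouts = new ClosableTimeouts();
        try {
            System.out.println(timeouts.getWithTimeout(1, TimeUnit.SECONDS, () -> "Hello world!"));
        } finally {
            timeouts.close();
        }
        System.out.println("terminated: " + timeouts.isTerminated());
    }
}
```

A container could then hold one such instance and call `close()` on it from its own stop logic, which is roughly what step 3 asks for.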

Relevant log output

No response

Additional Information

The code can be found at https://github.com/dadoonet/demo-ssh-mino/blob/master/src/test/java/fr/pilato/test/zombie/ducctape/ZombieDucttapeDemoIT.java

dadoonet added a commit to dadoonet/testcontainers-java that referenced this issue Sep 16, 2024
1st step: move code from the archived repository

Related to testcontainers#9227.
dadoonet added a commit to dadoonet/testcontainers-java that referenced this issue Sep 16, 2024
2nd step: call shutdown() when stopping the container

Related to testcontainers#9227.
@eddumelendez
Member

Hi @dadoonet, thanks for the very detailed explanation. I've been thinking for quite some time about replacing duct-tape with resilience4j. Its rate-limiter and time-limiter modules depend only on the core module and don't bring in more transitive dependencies.
