
Perform the port checks in parallel #4463

Merged: kiview merged 3 commits into master from parallel_port_checks on Oct 13, 2021
Conversation

@bsideup (Member) commented Sep 16, 2021

No description provided.

@bsideup added this to the next milestone on Sep 16, 2021
@@ -41,10 +49,52 @@ protected void waitUntilReady() {
     Callable<Boolean> externalCheck = new ExternalPortListeningCheck(waitStrategyTarget, externalLivenessCheckPorts);

     try {
-        Unreliables.retryUntilTrue((int) startupTimeout.getSeconds(), TimeUnit.SECONDS,
-            () -> getRateLimiter().getWhenReady(() -> internalCheck.call() && externalCheck.call()));
+        List<Future<Boolean>> futures = EXECUTOR.invokeAll(Arrays.asList(
Member:
👍


public abstract class AbstractWaitStrategy implements WaitStrategy {

static final ExecutorService EXECUTOR = Executors.newCachedThreadPool(new ThreadFactory() {
Member:
Could we use Guava's ThreadFactoryBuilder here?

Member Author (@bsideup):
I would rather reduce Guava usage, not increase it 😅 I could change it, but this is some "static" code, and it is not huge, so... WDYT? :)

Member:
I'd say don't use Guava and instead use a real class, WaitStrategyDaemonThreadFactory or something.

Member Author (@bsideup):
What's wrong with a non-reusable, 10-line anonymous class close to the usage? 😅

Member:
That I am confronted with a lower-level implementation detail at the head of the file instead of a class name acting as an abstraction. But these are differences in our style, which we might never overcome 😅

Member:
I don't mind; I'd rather see the removal of Guava as a dependency too, but it's there.

Perhaps we could extract this to our own class, a bit like @kiview suggests? We have the same code over in Startables.

Member:
This has dragged on a bit, so let's leave it. A bit of repetition doesn't do any harm. If we ever add a thread factory again, though, let's make all occurrences use a shared class.
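For context, the anonymous daemon ThreadFactory under discussion follows a common pattern along these lines (a sketch only, not the exact code merged in this PR; the thread-name prefix and counter are illustrative):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicLong;

class DaemonExecutorSketch {

    // Cached thread pool backed by daemon threads, so pending port checks
    // never keep the JVM alive after the tests finish.
    static final ExecutorService EXECUTOR = Executors.newCachedThreadPool(new ThreadFactory() {

        private final AtomicLong counter = new AtomicLong(0);

        @Override
        public Thread newThread(Runnable r) {
            Thread thread = new Thread(r, "port-check-" + counter.getAndIncrement());
            thread.setDaemon(true);
            return thread;
        }
    });
}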

@kiview (Member) commented Sep 17, 2021

Could you measure a performance improvement under certain conditions? For me, it is the same with Kafka.

@bsideup (Member Author) commented Sep 17, 2021

@kiview this one is a bit hard to measure, as it only saves tens or sometimes hundreds of milliseconds, so we have to rely on common sense about the code (running the checks in parallel and caching their results, as opposed to sequential, non-cached execution).

That said, I did observe a slight improvement with Kafka (100ms or so)
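To illustrate the idea (a simplified sketch, not the merged implementation; the class and method names are made up for this example): both checks are submitted to the shared executor at once, so the external check no longer waits behind the internal one, and each result is computed exactly once.

import java.time.Duration;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class ParallelPortCheckSketch {

    static final ExecutorService EXECUTOR = Executors.newCachedThreadPool();

    // Each Callable is assumed to poll its ports until they are ready (or fail).
    static boolean waitForPorts(Callable<Boolean> internalCheck,
                                Callable<Boolean> externalCheck,
                                Duration startupTimeout) throws Exception {
        // Run both checks concurrently, bounded by the startup timeout.
        List<Future<Boolean>> futures = EXECUTOR.invokeAll(
            Arrays.asList(internalCheck, externalCheck),
            startupTimeout.getSeconds(), TimeUnit.SECONDS
        );
        for (Future<Boolean> future : futures) {
            // Tasks that did not finish in time are cancelled by invokeAll.
            if (future.isCancelled() || !future.get()) {
                return false;
            }
        }
        return true;
    }
}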

// Polling
() -> {
    Instant now = Instant.now();
    RateLimiter rateLimiter = RateLimiterBuilder
Member:
WDYT about replacing duct-tape with awaitility here?

Member Author (@bsideup):
Apparently, Awaitility was performing much slower (~100ms vs 10ms). Need to investigate why (and maybe revisit our previous plans to migrate).

Member:
Not sure if it's directly related to your observation, but the way Awaitility handles its executors by default might introduce some overhead; see awaitility/awaitility#168.

Member Author (@bsideup):
I am an idiot - I missed the fact that, unless explicitly set, pollDelay is equal to pollInterval, so it was adding a static delay 😅 With pollDelay set to 0, Awaitility is as fast as while (...) {} 🎉
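For illustration only (not code from this PR, and assuming Awaitility 4's java.time.Duration overloads): by default the poll delay equals the poll interval, so the first evaluation only happens after one full interval; setting the delay to zero makes the first attempt run immediately.

import static org.awaitility.Awaitility.await;

import java.time.Duration;
import java.util.concurrent.Callable;

class AwaitilityPollDelaySketch {

    static void waitFor(Callable<Boolean> condition) {
        await()
            // Without this line the first check would only run after ~100ms,
            // because the default poll delay equals the poll interval.
            .pollDelay(Duration.ZERO)
            .pollInterval(Duration.ofMillis(100))
            .atMost(Duration.ofSeconds(30))
            .until(condition);
    }
}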

Member:
Ouh yeah! But this is easy to get wrong; not the most intuitive default IMO.

@kiview (Member) commented Sep 17, 2021

The failing OrientDB tests in CI are weird.

@rnorth (Member) commented Sep 20, 2021

Yeah, FTR:

Gradle Test Executor 1 > org.testcontainers.containers.OrientDBContainerTest > shouldReturnTheSameSession FAILED
    com.orientechnologies.orient.core.exception.OSecurityAccessException: Wrong user/password to [connect] to the remote OrientDB Server instance

...

Gradle Test Executor 1 > org.testcontainers.containers.OrientDBContainerTest > shouldInitializeWithCommands FAILED
    com.orientechnologies.orient.core.exception.OSecurityAccessException: Wrong user/password to [connect] to the remote OrientDB Server instance

Will try this multiple times locally; maybe there is a race condition that's become exposed by faster startup.

@kiview (Member) commented Sep 20, 2021

Which would indicate that the actual WaitStrategy is not stable (which might very well be the case).

@rnorth (Member) commented Sep 20, 2021

Running repeatedly on this branch, I'm getting roughly a 5-10% failure rate on these and other tests in the OrientDB module.

It seems like there's a 0% failure rate when running the same tests on the master branch.

Maybe this would be a good moment to switch the OrientDB module to use a log-based wait strategy?

Of course I think we (and users) may have to be on the lookout for other race conditions that this faster startup check exposes, which is a real shame 😭.
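A log-based wait strategy of the kind suggested above could look roughly like this (a sketch only; the log-line regex is a placeholder, not the actual OrientDB startup message):

import java.time.Duration;

import org.testcontainers.containers.GenericContainer;
import org.testcontainers.containers.wait.strategy.Wait;
import org.testcontainers.containers.wait.strategy.WaitStrategy;

class OrientDBLogWaitSketch {

    // Waits until the container has printed the given line once,
    // instead of relying on open ports that may precede full configuration.
    static final WaitStrategy LOG_WAIT = Wait
        .forLogMessage(".*OrientDB Server is active.*", 1)
        .withStartupTimeout(Duration.ofMinutes(2));

    static void apply(GenericContainer<?> container) {
        container.setWaitStrategy(LOG_WAIT);
    }
}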

@kiview (Member) commented Sep 20, 2021

TBH I don't understand how this race condition can happen, given that OrientDBContainer already uses an additional HttpWaitStrategy:

waitStrategy = new WaitAllStrategy()
            .withStrategy(Wait.forListeningPort())
            .withStrategy(waitForHttp)
            .withStartupTimeout(Duration.ofMinutes(2));

I will look into this a little bit, to see what it does at startup.

@rnorth (Member) commented Sep 20, 2021

Perhaps the container is applying configuration while already listening? I have created #4471 as a draft PR to change the strategy a bit.

@kiview (Member) commented Sep 20, 2021

Let's merge #4471 first and then this PR once CI is green. Better to not increase the flakiness of our own CI pipeline (even if the improvement just highlights existing flakiness).

@rnorth (Member) commented Oct 12, 2021

/azp run

@azure-pipelines

Azure Pipelines successfully started running 2 pipeline(s).

@kiview merged commit 69cf13c into master on Oct 13, 2021
The delete-merged-branch bot deleted the parallel_port_checks branch on October 13, 2021 at 11:12
@kiview added a commit that referenced this pull request on Oct 21, 2021