Perform the port checks in parallel #4463
Conversation
@@ -41,10 +49,52 @@ protected void waitUntilReady() {
Callable<Boolean> externalCheck = new ExternalPortListeningCheck(waitStrategyTarget, externalLivenessCheckPorts);

try {
    Unreliables.retryUntilTrue((int) startupTimeout.getSeconds(), TimeUnit.SECONDS,
        () -> getRateLimiter().getWhenReady(() -> internalCheck.call() && externalCheck.call()));
List<Future<Boolean>> futures = EXECUTOR.invokeAll(Arrays.asList(
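For orientation, a minimal sketch of what the new shape does: both checks are submitted to an executor at once, so they run in parallel and each `Future` caches its result. The check names mirror the diff; the timeout handling and return type here are simplified assumptions, not the PR's exact code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class ParallelChecksSketch {

    static final ExecutorService EXECUTOR = Executors.newCachedThreadPool();

    // Runs both checks concurrently; returns false if either check fails
    // or does not complete within the timeout.
    static boolean checkInParallel(Callable<Boolean> internalCheck,
                                   Callable<Boolean> externalCheck,
                                   long timeoutSeconds) throws InterruptedException {
        // invokeAll blocks until all tasks finish or the timeout expires;
        // tasks still running at the deadline come back cancelled
        List<Future<Boolean>> futures = EXECUTOR.invokeAll(
            Arrays.asList(internalCheck, externalCheck),
            timeoutSeconds, TimeUnit.SECONDS);

        for (Future<Boolean> future : futures) {
            try {
                if (future.isCancelled() || !future.get()) {
                    return false;
                }
            } catch (ExecutionException e) {
                return false; // a check threw instead of returning false
            }
        }
        return true;
    }
}
```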
👍
public abstract class AbstractWaitStrategy implements WaitStrategy {

    static final ExecutorService EXECUTOR = Executors.newCachedThreadPool(new ThreadFactory() {
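The hunk cuts off mid-expression, so the factory body isn't visible here. A plausible completion, purely for illustration — the counter and the thread-name prefix are assumptions:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicLong;

abstract class AbstractWaitStrategySketch {

    static final ExecutorService EXECUTOR = Executors.newCachedThreadPool(new ThreadFactory() {

        private final AtomicLong counter = new AtomicLong(0);

        @Override
        public Thread newThread(Runnable r) {
            Thread thread = new Thread(r, "wait-strategy-" + counter.getAndIncrement());
            // daemon threads don't keep the JVM alive once the tests are done
            thread.setDaemon(true);
            return thread;
        }
    });
}
```

Guava's `ThreadFactoryBuilder`, discussed below, would express roughly the same thing as `new ThreadFactoryBuilder().setDaemon(true).setNameFormat("wait-strategy-%d").build()`.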
Could we use Guava's `ThreadFactoryBuilder` here?
I would rather reduce Guava usage, not increase it 😅 I could change it, but this is some "static" code, and it is not huge, so... WDYT? :)
I'd say not use Guava and use a real class, `WaitStrategyDaemonThreadFactory` or something.
What's wrong with a non-reusable 10-line anonymous class close to the usage? 😅
That I am confronted with a lower-level implementation detail at the head of the file instead of a class name acting as an abstraction. But these are differences in our style, which we might never overcome 😅
I don't mind; I'd rather see the removal of Guava as a dependency too, but it's there.
Perhaps we could extract this to our own class, a bit like @kiview suggests? We have the same code over in `Startables`.
This has dragged on a bit, so: let's leave it. A bit of repetition doesn't do any harm. If we ever add a thread factory again, though, let's make all occurrences use a shared class.
Could you measure a performance improvement in certain conditions? For me, it is the same with Kafka.
@kiview this one is a bit hard to measure, as it only saves tens or sometimes hundreds of milliseconds, so we have to rely on a common-sense reading of the code (running checks in parallel and caching their results, as opposed to sequential, non-cached execution). That said, I did observe a slight improvement with Kafka (100ms or so).
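To illustrate the "cached" half of that argument: once a check has succeeded, a retry loop shouldn't pay for it again. A hypothetical helper, not code from this PR:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicBoolean;

class CachedCheckSketch {

    // Wraps a check so that a success is remembered; later retries return
    // immediately instead of re-running an already-passed check.
    static Callable<Boolean> cached(Callable<Boolean> check) {
        AtomicBoolean passed = new AtomicBoolean(false);
        return () -> {
            if (!passed.get() && check.call()) {
                passed.set(true);
            }
            return passed.get();
        };
    }
}
```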
// Polling
() -> {
    Instant now = Instant.now();
    RateLimiter rateLimiter = RateLimiterBuilder
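The hunk is truncated right after `RateLimiterBuilder`. For context, a duct-tape rate-limited poll is typically assembled like this — the rate and the loop around it are illustrative, not the PR's exact values:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;
import org.rnorth.ducttape.ratelimits.RateLimiter;
import org.rnorth.ducttape.ratelimits.RateLimiterBuilder;

class PollingSketch {

    // Polls the check, but never more often than the limiter allows.
    // A real implementation would also enforce an overall timeout.
    static boolean pollUntilTrue(Callable<Boolean> check) throws Exception {
        RateLimiter rateLimiter = RateLimiterBuilder
            .newBuilder()
            .withRate(10, TimeUnit.SECONDS) // illustrative: at most 10 attempts/second
            .withConstantThroughput()
            .build();

        boolean ready = false;
        while (!ready) {
            // getWhenReady blocks until the limiter permits the next attempt
            ready = rateLimiter.getWhenReady(check);
        }
        return ready;
    }
}
```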
WDYT about replacing duct-tape with awaitility here?
Apparently, Awaitility was performing much slower (~100ms vs 10ms). Need to investigate why (and maybe revisit our previous plans to migrate)
Not sure if directly related to your observation, but the way Awaitility handles its executors by default might introduce some overhead, see awaitility/awaitility#168.
I am an idiot - I missed the fact that, unless explicitly set, `pollDelay` is equal to `pollInterval`, so it was adding a static delay 😅 With `pollDelay` set to `0`, Awaitility is as fast as `while (...) {}` 🎉
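A sketch of that fix, using Awaitility's `pollDelay`/`pollInterval`/`atMost` knobs (those are the real API; the interval and timeout values here are just illustrative):

```java
import static org.awaitility.Awaitility.await;

import java.time.Duration;
import java.util.concurrent.Callable;

class AwaitilitySketch {

    static void waitFor(Callable<Boolean> check) {
        await()
            // without this, the first poll waits a full pollInterval before checking
            .pollDelay(Duration.ZERO)
            .pollInterval(Duration.ofMillis(100)) // illustrative interval
            .atMost(Duration.ofSeconds(30))       // illustrative timeout
            .until(check);
    }
}
```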
Oh yeah! But this is easy to get wrong, not the most intuitive default IMO.
The failing OrientDB tests in CI are weird.
Yeah, FTR:
Will try this multiple times locally; maybe there is a race condition that's become exposed by faster startup.
Which would indicate that the actual
Running repeatedly on this branch, I'm getting roughly a 5-10% failure rate on these and other tests in the OrientDB module, while the same tests on the master branch seem to have a 0% failure rate. Maybe this would be a good moment to switch the OrientDB module to a log-based wait strategy? Of course, I think we (and users) may have to be on the lookout for other race conditions that this faster startup check exposes, which is a real shame 😭.
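What the suggested log-based wait might look like, sketched with Testcontainers' `Wait.forLogMessage`; the image tag, exposed port, and log pattern are assumptions that would need checking against the container's actual output:

```java
import org.testcontainers.containers.GenericContainer;
import org.testcontainers.containers.wait.strategy.Wait;

class OrientDbWaitSketch {

    static GenericContainer<?> container() {
        return new GenericContainer<>("orientdb:3.0") // image tag is an assumption
            .withExposedPorts(2424)
            // wait for a startup log line instead of just an open port;
            // the regex must match a whole line the container actually prints
            .waitingFor(Wait.forLogMessage(".*OrientDB Server is active.*\\n", 1));
    }
}
```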
TBH I don't understand how this race condition can happen, given that
I will look into this a little bit, to see what it does at startup.
Perhaps the container is applying configuration while already listening? I have created #4471 as a draft PR to change the strategy a bit.
Let's merge #4471 first and then this PR once CI is green. Better not to increase the flakiness of our own CI pipeline (even if the improvement just highlights existing flakiness).
/azp run
Azure Pipelines successfully started running 2 pipeline(s).