Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RATIS-1968. Remove unreached else block of trySendDelayed #989

Draft
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

symious
Copy link
Contributor

@symious symious commented Dec 13, 2023

What changes were proposed in this pull request?

In "

", the else block should not be reached, since only the first request is send out, all other request are still in the "delayedRequests", we can not receive out-of-order replies if "firstReplied" is not set.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/RATIS-1968

Please replace this section with the link to the Apache JIRA)

How was this patch tested?

No tests.

@szetszwo
Copy link
Contributor

Since testWithLoadAsync fails, it probably needs the else. Sorry that the code was long time ago. I might not remember the details. It may be related to failover to another server (i.e. leader change).

@symious
Copy link
Contributor Author

symious commented Dec 13, 2023

Yes, seems the variables are reset on exceptions. I will try to dive deeper, if needed, I'll close the PR then.

@symious symious marked this pull request as draft December 13, 2023 05:32
@szetszwo
Copy link
Contributor

@symious , I recall now -- when there is a failover, there could possibly be multiple calls needed to be retried. Since the retires are async, some other calls can be retried before the first call.

In short, client calls are always in order but retries may be out of order.

@symious
Copy link
Contributor Author

symious commented Dec 13, 2023

@szetszwo The error seems caused by repeating resets on the slidingWindow, could you help to check?

@szetszwo
Copy link
Contributor

The error seems caused by repeating resets on the slidingWindow, ...

Where is the code?

@symious
Copy link
Contributor Author

symious commented Dec 14, 2023

repeating reset.

https://github.com/apache/ratis/blob/cedcd2ad4cd2da13230aaa0d15678e5aee9b0729/ratis-client/src/main/java/org/apache/ratis/client/impl/OrderedAsync.java#L257C61-L257C61

Steps:

  1. Client send 100 async requests to server, here firstReplied is true
  2. Leader changed, 100 requests replied with exceptions, trying to reset the slidingWindow.
  3. Say the requests id are [100, 199], client receives "id100" and and reset the slidingWindow, firstReplied changed to false.
  4. Client retry on "id100" and receives the reply, so set firstReplied to true.
  5. Client then receives other requests' exception reply from Step2, then reset the slidingWindow again, but the connection is ok now, no need to reset again.

@szetszwo

@szetszwo
Copy link
Contributor

The failAllAsyncRequests may fail with a different exception (AlreadyClosedException may work, or we should add a new AlreadyFailedException). For that exception, do not resetSlidingWindow.

@symious
Copy link
Contributor Author

symious commented Dec 15, 2023

@szetszwo I think failAllAsyncRequests is not invoked here, the slidingWindow is reset, then CompletionException(e) is thrown and failAllAsyncRequests is skipped. (Correct me if I'm wrong.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants