Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RATIS-2173. Fix zero-copy bugs for non-gRPC cases. #1167

Merged
merged 4 commits into from
Oct 20, 2024

Conversation

szetszwo
Copy link
Contributor

@szetszwo szetszwo commented Oct 11, 2024

RATIS-2173

This is to fix the following bugs discovered in #1156

  1. LogAppenderDefault.sendAppendEntriesWithRetries(..) calls release() incorrectly for exception.

.
12. MemoryRaftLog has similar problems as in SegmentedRaftLog.

@szetszwo
Copy link
Contributor Author

After a few retries, this can pass everything (although there are still zero-copy bugs).

Let's move ReferenceCountedLeakDetector.enable(..) back to MiniRaftClusterWithGrpc for now in order to easier passing the tests.

  • Indeed, the non-gRPC leaks are not real leaks since there are no underlying buffers to release. Just the reference counts are incorrect.
  • One kind of bugs may be related to CodeInjectionForTesting -- some tests may inject some code but not cleaning them up correctly.

@szetszwo
Copy link
Contributor Author

@adoroszlai , in https://github.com/apache/ratis/actions/runs/11389343669/job/31704203350?pr=1167 , there were no test failures but the build failed. Do you know why? I see it serval times.

@adoroszlai
Copy link
Contributor

there were no test failures but the build failed. Do you know why?

Can happen if the test does not complete:

[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  17:01 min
[INFO] Finished at: 2024-10-17T23:08:41Z
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M4:test (default-test) on project ratis-test: There was a timeout or other error in the fork -> [Help 1]

https://github.com/apache/ratis/actions/runs/11389343669/job/31704203350?pr=1167#step:5:909

@szetszwo
Copy link
Contributor Author

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M4:test (default-test) on project ratis-test: There was a timeout or other error in the fork -> [Help 1]

I saw this error but somehow there were no test timeout errors. Not sure why.

@szetszwo
Copy link
Contributor Author

@adoroszlai , could you review this? I will continue debugging zero-copy after this.

@adoroszlai adoroszlai self-requested a review October 18, 2024 18:06
@adoroszlai adoroszlai merged commit fc39c38 into apache:master Oct 20, 2024
12 checks passed
@adoroszlai
Copy link
Contributor

Thanks @szetszwo for the patch, @133tosakarin for the review.

@szetszwo
Copy link
Contributor Author

@adoroszlai , thanks a lot for reviewing the merging this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants