-
Notifications
You must be signed in to change notification settings - Fork 517
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HDDS-11325. Intermittent failure in TestBlockOutputStreamWithFailures#testContainerClose #7099
Conversation
…#testContainerClose. Change-Id: Ie3cad994ad003bac75b1cae82c63cb8d9b45684f
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for the fix @jojochuang .
Can you also run a repeat test on the TestBlockOutputStreamWithFailures#testContainerClose to confirm fixing the intermittent failure.
To run the repeating test on your branch, you have to up date your for master branch, there will be a workflow available at https://github.com/jojochuang/ozone/actions/workflows/intermittent-test-check.yml
You can refer to the repeating test submissions here
Cool bean! Running it now. |
Here's the test run 10x10 without failure |
Thanks @duongkame merged. |
* master: (50 commits) HDDS-11331. Fix Datanode unable to report for a long time (apache#7090) HDDS-11346. FS CLI gives incorrect recursive volume deletion prompt (apache#7102) HDDS-11349. Add NullPointer handling when volume/bucket tables are not initialized (apache#7103) HDDS-11209. Avoid insufficient EC pipelines in the container pipeline cache (apache#6974) HDDS-11284. refactor quota repair non-blocking while upgrade (apache#7035) HDDS-9790. Add tests for Overview page (apache#6983) HDDS-10904. [hsync] Enable PutBlock piggybacking and incremental chunk list by default (apache#7074) HDDS-11322. [hsync] Block ECKeyOutputStream from calling hsync and hflush (apache#7098) HDDS-11325. Intermittent failure in TestBlockOutputStreamWithFailures#testContainerClose (apache#7099) HDDS-11340. Avoid extra PubBlock call when a full block is closed (apache#7094) HDDS-11155. Improve Volumes page UI (apache#7048) HDDS-11324. Negative value preOpLatencyMs in DN audit log (apache#7093) HDDS-11246. [Recon] Use optional chaining instead of explicit undefined check for Objects in Container and Pipeline Module. (apache#7037) HDDS-11323. Mark TestLeaseRecovery as flaky HDDS-11338. Bump zstd-jni to 1.5.6-4 (apache#7085) HDDS-11337. Bump Spring Framework to 5.3.39 (apache#7084) HDDS-11327. [hsync] Revert config default ozone.fs.hsync.enabled to false (apache#7079) HDDS-11325. Mark testWriteMoreThanMaxFlushSize as flaky HDDS-11336. Bump slf4j to 2.0.16 (apache#7086) HDDS-11335. Bump exec-maven-plugin to 3.4.1 (apache#7087) ... Conflicts: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/statemachine/DatanodeConfiguration.java hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/ozoneimpl/OzoneContainer.java
I'm trying to assess the current health of the master branch before bring it in to our feature branch and I'm confused by this change:
So is this PR supposed to fix failues in TestBlockOutputStream#testWriteMoreThanMaxFlushSize, TestBlockOutputStreamWithFailures#testContainerClose, or both? |
it's supposed to fix both. TestBlockOutputStreamWithFailures#testContainerClose calls TestBlockOutputStream#testWriteMoreThanMaxFlushSize |
Intellij shows no callers of Flaky annotations on both tests mentioned in my original comment are also still present and now pointing to a resolved Jira so it is still unclear what was fixed here. |
Aaah. ok I should reopen and redo it again. |
* master: (50 commits) HDDS-11331. Fix Datanode unable to report for a long time (apache#7090) HDDS-11346. FS CLI gives incorrect recursive volume deletion prompt (apache#7102) HDDS-11349. Add NullPointer handling when volume/bucket tables are not initialized (apache#7103) HDDS-11209. Avoid insufficient EC pipelines in the container pipeline cache (apache#6974) HDDS-11284. refactor quota repair non-blocking while upgrade (apache#7035) HDDS-9790. Add tests for Overview page (apache#6983) HDDS-10904. [hsync] Enable PutBlock piggybacking and incremental chunk list by default (apache#7074) HDDS-11322. [hsync] Block ECKeyOutputStream from calling hsync and hflush (apache#7098) HDDS-11325. Intermittent failure in TestBlockOutputStreamWithFailures#testContainerClose (apache#7099) HDDS-11340. Avoid extra PubBlock call when a full block is closed (apache#7094) HDDS-11155. Improve Volumes page UI (apache#7048) HDDS-11324. Negative value preOpLatencyMs in DN audit log (apache#7093) HDDS-11246. [Recon] Use optional chaining instead of explicit undefined check for Objects in Container and Pipeline Module. (apache#7037) HDDS-11323. Mark TestLeaseRecovery as flaky HDDS-11338. Bump zstd-jni to 1.5.6-4 (apache#7085) HDDS-11337. Bump Spring Framework to 5.3.39 (apache#7084) HDDS-11327. [hsync] Revert config default ozone.fs.hsync.enabled to false (apache#7079) HDDS-11325. Mark testWriteMoreThanMaxFlushSize as flaky HDDS-11336. Bump slf4j to 2.0.16 (apache#7086) HDDS-11335. Bump exec-maven-plugin to 3.4.1 (apache#7087) ... Conflicts: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/statemachine/DatanodeConfiguration.java hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/ozoneimpl/OzoneContainer.java
…an-on-error * HDDS-10239-container-reconciliation: (428 commits) HDDS-11081. Use thread-local instance of FileSystem in Freon tests (apache#7091) HDDS-11333. Avoid hard-coded current version in upgrade/xcompat tests (apache#7089) Mark TestPipelineManagerMXBean#testPipelineInfo as flaky Mark TestAddRemoveOzoneManager#testForceBootstrap as flaky HDDS-11352. HDDS-11353. Mark TestOzoneManagerHAWithStoppedNodes as flaky HDDS-11354. Mark TestOzoneManagerSnapshotAcl#testLookupKeyWithNotAllowedUserForPrefixAcl as flaky HDDS-11355. Mark TestMultiBlockWritesWithDnFailures#testMultiBlockWritesWithIntermittentDnFailures as flaky HDDS-11227. Use server default key provider to encrypt/decrypt keys from multiple OMs. (apache#7081) HDDS-11316. Improve Create Key and Chunk IO Dashboards (apache#7075) HDDS-11239. Fix KeyOutputStream's exception handling when calling hsync concurrently (apache#7047) HDDS-11254. Reconcile commands should be handled by datanode ReplicationSupervisor (apache#7076) HDDS-11331. Fix Datanode unable to report for a long time (apache#7090) HDDS-11346. FS CLI gives incorrect recursive volume deletion prompt (apache#7102) HDDS-11349. Add NullPointer handling when volume/bucket tables are not initialized (apache#7103) HDDS-11209. Avoid insufficient EC pipelines in the container pipeline cache (apache#6974) HDDS-11284. refactor quota repair non-blocking while upgrade (apache#7035) HDDS-9790. Add tests for Overview page (apache#6983) HDDS-10904. [hsync] Enable PutBlock piggybacking and incremental chunk list by default (apache#7074) HDDS-11322. [hsync] Block ECKeyOutputStream from calling hsync and hflush (apache#7098) HDDS-11325. Intermittent failure in TestBlockOutputStreamWithFailures#testContainerClose (apache#7099) ... Conflicts: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/checksum/ContainerChecksumTreeManager.java hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainerCheck.java hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/ozoneimpl/OzoneContainer.java
…rrupt-files * HDDS-10239-container-reconciliation: (61 commits) HDDS-11081. Use thread-local instance of FileSystem in Freon tests (apache#7091) HDDS-11333. Avoid hard-coded current version in upgrade/xcompat tests (apache#7089) Mark TestPipelineManagerMXBean#testPipelineInfo as flaky Mark TestAddRemoveOzoneManager#testForceBootstrap as flaky HDDS-11352. HDDS-11353. Mark TestOzoneManagerHAWithStoppedNodes as flaky HDDS-11354. Mark TestOzoneManagerSnapshotAcl#testLookupKeyWithNotAllowedUserForPrefixAcl as flaky HDDS-11355. Mark TestMultiBlockWritesWithDnFailures#testMultiBlockWritesWithIntermittentDnFailures as flaky HDDS-11227. Use server default key provider to encrypt/decrypt keys from multiple OMs. (apache#7081) HDDS-11316. Improve Create Key and Chunk IO Dashboards (apache#7075) HDDS-11239. Fix KeyOutputStream's exception handling when calling hsync concurrently (apache#7047) HDDS-11254. Reconcile commands should be handled by datanode ReplicationSupervisor (apache#7076) HDDS-11331. Fix Datanode unable to report for a long time (apache#7090) HDDS-11346. FS CLI gives incorrect recursive volume deletion prompt (apache#7102) HDDS-11349. Add NullPointer handling when volume/bucket tables are not initialized (apache#7103) HDDS-11209. Avoid insufficient EC pipelines in the container pipeline cache (apache#6974) HDDS-11284. refactor quota repair non-blocking while upgrade (apache#7035) HDDS-9790. Add tests for Overview page (apache#6983) HDDS-10904. [hsync] Enable PutBlock piggybacking and incremental chunk list by default (apache#7074) HDDS-11322. [hsync] Block ECKeyOutputStream from calling hsync and hflush (apache#7098) HDDS-11325. Intermittent failure in TestBlockOutputStreamWithFailures#testContainerClose (apache#7099) ... Conflicts: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/checksum/ContainerChecksumTreeManager.java
What changes were proposed in this pull request?
HDDS-11325. Intermittent failure in TestBlockOutputStreamWithFailures#testContainerClose
Please describe your PR in detail:
The assertion assumes the buffer pool is allocated 4 buffers.
After HDDS-9844, the block output stream buffer allocation becomes asynchronous and it's possible to allocate fewer buffers than before, because some response already come back and the associated buffer can be reused.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-11325
How was this patch tested?
Unit test change only.