HDDS-11243. SCM SafeModeRule Support EC. #7008

Merged
merged 2 commits into from
Nov 26, 2024

Contributor

@slfan1989 slfan1989 commented Jul 31, 2024

What changes were proposed in this pull request?

We want SCM to be able to switch to leader immediately after it exits safe mode. Currently, because of the issues described below, we must wait for at least one full container report from a DataNode before proceeding with the switch.

Currently, SCM SafeMode has the following issues:

Issue1

DataNodeSafeModeRule cannot effectively verify the registration status of DataNodes. In most cases, this rule passes as soon as at least one DataNode has registered. Therefore, we need to strengthen this rule.

Issue2

ContainerSafeModeRule does not support verification of EC (Erasure Coding) Containers. EC Containers differ significantly from RATIS/THREE Containers because EC Containers require determining how many replicas are needed based on the EC type. For instance, for EC-6-3-1024K, we need to ensure that the Container reports having all 6 replicas before it can provide services.

This PR aims to enhance and improve the above two points.
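As a rough illustration of the replica requirement described in Issue2, the sketch below derives the required replica count from a label written the way it is in this description (EC-<data>-<parity>-<chunk size>). The class EcLabel and its methods are hypothetical and not part of Ozone; in the PR itself the count is obtained from ECReplicationConfig.

```java
import java.util.Set;

/** Hypothetical helper for illustration only; not an Ozone class. */
final class EcLabel {
  final int data;
  final int parity;

  private EcLabel(int data, int parity) {
    this.data = data;
    this.parity = parity;
  }

  /** Parses a label such as "EC-6-3-1024K" into its data and parity counts. */
  static EcLabel parse(String label) {
    String[] parts = label.split("-");
    if (parts.length != 4 || !parts[0].equalsIgnoreCase("EC")) {
      throw new IllegalArgumentException("Unexpected EC label: " + label);
    }
    return new EcLabel(Integer.parseInt(parts[1]), Integer.parseInt(parts[2]));
  }

  /** True once at least `data` distinct Datanodes have reported a replica. */
  boolean hasEnoughReplicas(Set<String> reportingDatanodes) {
    return reportingDatanodes.size() >= data;
  }
}
```

For EC-6-3-1024K this gives a requirement of 6 reporting Datanodes, whereas a Ratis container would be counted after a single report.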

Code improvements:

Enhance DataNodeSafeModeRule

For the registration of Datanodes, we need to obtain the complete list of Datanodes from SCM. This list can be retrieved from the Pipeline. I pass PipelineManager as a parameter into DataNodeSafeModeRule to calculate the number of Datanodes.

Enhance ContainerSafeModeRule

  • Enhance replica validation for EC containers. Obtain the required replicas based on ECReplicationConfig. Consider container reporting complete only when sufficient replicas have been reported.

  • Modify the message sending location of ContainerSafeModeRule.

// TODO : Return the list of Nodes that forms the SCM HA.
RegisteredCommand registeredCommand = scm.getScmNodeManager()
    .register(datanodeDetails, nodeReport, pipelineReportsProto,
        layoutInfo);
if (registeredCommand.getError()
    == SCMRegisteredResponseProto.ErrorCode.success) {
  eventPublisher.fireEvent(CONTAINER_REPORT,
      new SCMDatanodeHeartbeatDispatcher.ContainerReportFromDatanode(
          datanodeDetails, containerReportsProto));
  eventPublisher.fireEvent(SCMEvents.NODE_REGISTRATION_CONT_REPORT,
      new NodeRegistrationContainerReport(datanodeDetails,
          containerReportsProto));
  eventPublisher.fireEvent(PIPELINE_REPORT,
      new PipelineReportFromDatanode(datanodeDetails,
          pipelineReportsProto));
}

There are some issues in this part of the code. NODE_REGISTRATION_CONT_REPORT and CONTAINER_REPORT are handled asynchronously, so NODE_REGISTRATION_CONT_REPORT processing can complete while CONTAINER_REPORT processing has not. In that scenario the rule can still pass with insufficient EC replicas.

I adjusted the sending position of NODE_REGISTRATION_CONT_REPORT (requiring the message to be sent only after CONTAINER_REPORT processing completes) and introduced a new type, CONTAINER_REGISTRATION_REPORT, to distinguish it.
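The ordering change can be pictured with the following minimal sketch, in which the registration event is raised from inside the container report handler only after the report has been processed. ContainerReportFlow and its methods are hypothetical stand-ins, and the string events merely record the order; this is not the actual SCM handler code.

```java
import java.util.List;

/** Hypothetical stand-in for the container report handler; illustration only. */
final class ContainerReportFlow {
  private final List<String> events;

  ContainerReportFlow(List<String> events) {
    this.events = events;
  }

  /** Processes a container report, then notifies the safe mode rule if needed. */
  void onContainerReport(String datanode, boolean isRegister) {
    // Step 1: the replicas in the report are processed first.
    events.add("CONTAINER_REPORT processed: " + datanode);
    // Step 2: only a report sent as part of registration triggers the follow-up event.
    if (isRegister) {
      fireContainerRegistrationReport(datanode);
    }
  }

  private void fireContainerRegistrationReport(String datanode) {
    events.add("CONTAINER_REGISTRATION_REPORT fired: " + datanode);
  }
}
```

Because the second event is fired from within the handler, it cannot be observed before the container report for the same Datanode has been processed.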

Page display:

image

What is the link to the Apache JIRA

JIRA: HDDS-11243: SCM SafeModeRule Support EC.

How was this patch tested?

JUnit tests and validation in a production environment.

@slfan1989 slfan1989 marked this pull request as draft July 31, 2024 02:15
@slfan1989 slfan1989 changed the title HDDS-11243: SCM SafeModeRule Support EC. HDDS-11243. SCM SafeModeRule Support EC. Aug 1, 2024
@slfan1989 slfan1989 marked this pull request as ready for review August 4, 2024 13:04
@@ -199,6 +201,11 @@ public void onMessage(final ContainerReportFromDatanode reportFromDatanode,
// list
processMissingReplicas(datanodeDetails, expectedContainersInDatanode);
containerManager.notifyContainerReportProcessing(true, true);
if (reportFromDatanode.isRegister()) {
Contributor Author

After CONTAINER_REPORT processing completes, we fire the CONTAINER_REGISTRATION_REPORT event to ensure that the container count is accurate.

@slfan1989
Contributor Author

@errose28 @siddhantsangwan Can you help review this PR? Thank you very much!

@siddhantsangwan
Contributor

@slfan1989 Thanks for taking this up, I was earlier thinking of fixing this myself. I'll review the PR soon.

@slfan1989
Contributor Author

slfan1989 commented Aug 22, 2024

@siddhantsangwan Can you help review this PR? Thank you very much! The unit test errors are not caused by our changes.

Contributor

@siddhantsangwan siddhantsangwan left a comment

@slfan1989 I've reviewed this partly. Have some comments below.

if (replicationConfig != null && replicationConfig instanceof ECReplicationConfig) {
ECReplicationConfig ecReplicationConfig = (ECReplicationConfig) replicationConfig;
int data = ecReplicationConfig.getData();
if (uuids != null && uuids.size() > data) {
Contributor

For Ratis, just one replica per container is required. So for EC, replicas reported by as many Datanodes as the data count should be sufficient. What do you think?

Contributor Author

You are right. For EC, having the data replicas reported is already sufficient. I will improve the code.

Comment on lines 175 to 150
if (ratisContainerMap.containsKey(containerID)) {
  ratisContainerDNsMap.computeIfAbsent(containerID, key -> Sets.newHashSet());
  ratisContainerDNsMap.get(containerID).add(datanodeUUID);
  if (!reportedConatinerIDSet.contains(containerID)) {
    Set<UUID> uuids = ratisContainerDNsMap.get(containerID);
    if (uuids != null && uuids.size() >= 1) {
      ratisContainerWithMinReplicas.getAndAdd(1);
      reportedConatinerIDSet.add(containerID);
      getSafeModeMetrics()
          .incCurrentContainersWithOneReplicaReportedCount();
    }
  }
}
Contributor

I didn't really understand this change. It seems correct, but is there any reason this logic isn't the same as before? Why do we need to track Datanodes in a set for Ratis containers? Is it because ratisContainerDNsMap and reportedConatinerIDSet are going to be used somewhere else as well? Or is it done this way just so it's similar to the EC logic?

Contributor Author

Thank you for the question!

I didn't really understand this change. It seems correct, but is there any reason this logic isn't the same as before?

The previous logic was correct. I made this modification for two reasons:

  • To align with the EC logic and improve code readability.
  • To make it easier to expose additional data in a future PR. For example, this will allow users not only to see the progress but also to identify which containers have not been reported and which DataNodes have reported each container.

Why do we need to track Datanodes in a set for Ratis containers? Is it because ratisContainerDNsMap and reportedConatinerIDSet are going to be used somewhere else as well?

The type of ratisContainerDNsMap is Map<Long, Set<UUID>>, where the key is the ContainerId. The reason for using a Set as the value is to avoid retaining duplicate DN information, as we may encounter the same DN registering multiple times.

Or is it done this way just so it's similar to the EC logic?

Yes, that is one of the reasons, as explained above.

Can we modify it this way? The original code does not retain enough information.

Contributor Author

@siddhantsangwan Can you help review this PR again? Thank you very much!

I improved some of the code, made it less repetitive, and added some comments.

long ratisCutOff = (long) Math.ceil(ratisMaxContainer * safeModeCutoff);
long ecCutOff = (long) Math.ceil(ecMaxContainer * safeModeCutoff);

getSafeModeMetrics().setNumContainerWithOneReplicaReportedThreshold(ratisCutOff);
Contributor

Let's set EC metrics as well.

Contributor Author

I will improve this part of the code.
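For reference, the cutoff arithmetic in the snippet under review can be sketched as below. SafeModeCutoff and validate are hypothetical names, and the assumption that both the Ratis and the EC counts must reach their own cutoff is an illustration rather than a statement of what the merged rule does.

```java
/** Hypothetical helper mirroring the Math.ceil cutoff computation above. */
final class SafeModeCutoff {
  private SafeModeCutoff() {
  }

  /** Number of containers that must be reported for the given cutoff fraction. */
  static long cutoff(long totalContainers, double safeModeCutoff) {
    return (long) Math.ceil(totalContainers * safeModeCutoff);
  }

  /** Assumed check: each replication type must independently reach its cutoff. */
  static boolean validate(long ratisReported, long ratisTotal,
      long ecReported, long ecTotal, double safeModeCutoff) {
    return ratisReported >= cutoff(ratisTotal, safeModeCutoff)
        && ecReported >= cutoff(ecTotal, safeModeCutoff);
  }
}
```

With 7 EC containers and a cutoff of 0.5, the EC threshold is 4, so three reported EC containers are not enough even if every Ratis container has been reported.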

Comment on lines 276 to 310
private void reInitializeRule() {
containerMap.clear();

Contributor

Looks like most of the code inside this method is the same as before. If possible, let's refactor this to avoid repetition.

Contributor Author

I will improve this part of the code.

@@ -75,10 +79,18 @@ public void setNumContainerWithOneReplicaReportedThreshold(long val) {
this.numContainerWithOneReplicaReportedThreshold.set(val);
}

public void setNumContainerWithECDataReplicaReportedThreshold(long val) {
this.numContainerWithECDataReplicaReportedThreshold.incr(val);
Contributor

Should use set() instead of incr().
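The difference matters whenever the setter is called more than once. The sketch below uses a plain AtomicLong as a stand-in for the metric; ThresholdGauge is a hypothetical class, and the assumption that the threshold can be set repeatedly (for example when the rule is re-initialized) is mine.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical gauge used only to contrast incr() with set(). */
final class ThresholdGauge {
  private final AtomicLong value = new AtomicLong();

  /** Adds delta to the current value; repeated calls accumulate. */
  void incr(long delta) {
    value.addAndGet(delta);
  }

  /** Overwrites the current value; repeated calls are idempotent. */
  void set(long newValue) {
    value.set(newValue);
  }

  long get() {
    return value.get();
  }
}
```

Calling incr(100) twice leaves the gauge at 200, while calling set(100) twice leaves it at 100, which is what a threshold setter should do.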

@slfan1989
Contributor Author

@slfan1989 I've reviewed this partly. Have some comments below.

Thank you very much for reviewing this PR! I will respond to your questions as soon as possible.

@@ -1695,6 +1695,15 @@
</description>
</property>

<property>
<name>hdds.scm.safemode.reported.datanode.pct</name>
<value>0.90</value>
Contributor

I think 90% is too much and a significant change from the previous config. Consider a cluster with under-utilized DNs, where roughly 30-40% of the DNs hold no data: safe mode would still wait for these to be registered. IMO DatanodeSafemodeRule is there to ensure that Datanodes are available for a write to go through. We already check whether enough containers are available for reading in the containerSafemodeRule.

cc @nandakumar131

Contributor Author

Thank you very much for helping review the code! From my personal perspective, I believe we should still have an optional configuration to control this. You made a valid point: 0.9 might be a relatively large value. However, if the rule passes with only one DN registered, it seems a bit too lenient. What if we set the default value to 0.1? Do you think that would be acceptable?

Contributor

I think 0.1 sounds good, thanks

Contributor

I also think that the Datanode safe mode rule is meant to ensure writes work. So that means we only need one Datanode to be present as Ozone still allows single replica writes.

@slfan1989
Contributor Author

slfan1989 commented Oct 3, 2024

@errose28 @siddhantsangwan @sadanand48 @adoroszlai

Thank you all very much for paying attention to this PR!

To facilitate a better review of this PR, I'll summarize some additional information for your reference.

Background:

In our production environment, we use EC (Erasure Coding) strategy, and we have written a lot of EC data.
Sometimes we need to restart the SCM.
After the SCM restarts, it can exit safe mode quickly, but when we switch leadership to it, user applications report the error: There are insufficient datanodes to read the EC block.

Solution Process:

Our hope is that once the SCM meets the safe mode criteria, it can switch to become the leader SCM, and users' access will no longer report errors.

We found that there are some issues with the two rules for safe mode.

ContainerSafeModeRule is missing the EC validation.

From reading the code, we found that ContainerSafeModeRule.java does not handle EC Containers. As a result, an EC Container is counted as reported once a single replica has been reported, whereas the number of replicas EC needs depends on its replication config. For EC-6-3-1024K, we need 6 replicas to meet the criteria.

The rules in DataNodeSafeModeRule are lenient.

In DataNodeSafeModeRule, the default condition for exiting the rule is that only one DataNode is registered. I think it would be better if we could configure a proportional parameter to control when we can exit.

Therefore, we added the parameter hdds.scm.safemode.reported.datanode.pct and calculated the actual required number of registered DataNodes based on the number of DataNodes stored in the pipeline.
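One way the required number could be derived from the percentage is sketched below. The exact formula is not shown in this conversation, so the rounding up and the floor of one Datanode are assumptions, and DatanodeThreshold is a hypothetical class.

```java
/** Hypothetical calculation of the Datanode registration threshold. */
final class DatanodeThreshold {
  private DatanodeThreshold() {
  }

  /**
   * Returns how many Datanodes must register, given the number of Datanodes
   * found in pipelines and the configured fraction (0.0 to 1.0).
   * Assumption: round up, and always require at least one Datanode.
   */
  static int requiredDatanodes(int datanodesInPipelines, double reportedPct) {
    if (reportedPct < 0.0 || reportedPct > 1.0) {
      throw new IllegalArgumentException("pct must be within [0, 1]: " + reportedPct);
    }
    int byPct = (int) Math.ceil(datanodesInPipelines * reportedPct);
    return Math.max(1, byPct);
  }
}
```

Under these assumptions, 25 Datanodes with a value of 0.1 would require 3 registrations, and an SCM with no pipelines yet would still require 1.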

This is an example of the actual usage effect.
image

if (pipeLineDnSet.contains(dnUUID) || !registeredDnSet.contains(dnUUID)) {
registeredDnSet.add(dnUUID);
registeredDns = registeredDnSet.size();
unRegisteredDn.remove(dnUUID);
Contributor Author

@errose28 Regarding the issue we discussed together in HDDS-11481, I plan to add a variable here to store unRegisteredDn and display it in the status. Do you think this approach is acceptable?

Contributor

This is using the persisted pipeline membership to determine which nodes have not been seen yet? That should work. It won't catch the cases where a new DN not in any pipelines has not registered yet, but it at least provides more information.

I'm not sure that putting all nodes back in the unregistered list on refresh is the correct behavior though, since nodes that have already registered should remain accounted for by the rule on refresh.

Contributor Author

Thank you for your message! I have made adjustments to this part of the logic. During re-registration, only DNs that are not in the registeredDnSet will be placed in the unRegisteredDn.
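The adjusted refresh behavior can be sketched as follows: Datanodes that have already registered stay registered, and only pipeline members not yet seen are listed as unregistered. DatanodeRegistrationTracker is a hypothetical class with String IDs in place of UUIDs; it is not the code from the PR.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical tracker illustrating refresh without losing registrations. */
final class DatanodeRegistrationTracker {
  private final Set<String> registered = new HashSet<>();
  private final Set<String> unregistered = new HashSet<>();

  /** Rebuilds the unregistered set from pipeline membership. */
  void refresh(Set<String> pipelineDatanodes) {
    unregistered.clear();
    for (String dn : pipelineDatanodes) {
      // Already registered Datanodes are not put back on the pending list.
      if (!registered.contains(dn)) {
        unregistered.add(dn);
      }
    }
  }

  void onRegister(String dn) {
    registered.add(dn);
    unregistered.remove(dn);
  }

  /** Returns a copy so callers cannot modify internal state. */
  Set<String> unregistered() {
    return new HashSet<>(unregistered);
  }

  int registeredCount() {
    return registered.size();
  }
}
```

As noted in the review, a Datanode that belongs to no pipeline never appears in the unregistered set, so the list is informative rather than complete.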

@slfan1989
Contributor Author

@siddhantsangwan A friendly ping! Supporting EC Container recognition is crucial for SafeMode. I’d like to continue contributing to gain recognition for this change. What additional tasks can I pursue? I would appreciate any guidance or suggestions you can provide.

@siddhantsangwan
Contributor

@slfan1989 thanks for updating! I'll be able to review this and have a discussion with you next week, probably on Tuesday.

* @param isEcContainer true, means ECContainer, false, means not ECContainer.
*/
private void recordReportedContainer(long containerID, boolean isEcContainer) {
if (!reportedContainerIDSet.contains(containerID)) {
Contributor

Is it possible to get rid of reportedContainerIDSet and just use the ratis and ec maps?

Contributor Author

Thank you for your suggestion! We can indeed remove reportedContainerIDSet, and I will improve the code.

return 1;
}

private void initContainerDNsMap(long containerID, Map<Long, Set<UUID>> containerDNsMap,
Contributor

How about renaming this to putInContainerDNsMap?

Contributor Author

I will improve the code.


private AtomicLong containerWithMinReplicas = new AtomicLong(0);
private Set<Long> reportedContainerIDSet = new HashSet<>();
private Map<Long, ContainerInfo> ratisContainerMap;
Contributor

Do we really need ratisContainerMap and ecContainerMap to be maps at all? As far as I can see, all we need are the container IDs. We can then just use container manager to get the container info object when needed. There's probably at least a couple of GBs of overhead for storing references to billions of ContainerInfo objects in the map, which we don't really need. It can just be a set of container IDs.

Going a step further, I feel like we don't need the List of ContainerInfo objects that's being passed into the constructor of this class. Ultimately, all we need is a mapping from container id to container info for all container IDs. So the constructor should either have that as an argument, or just the container manager, since the container manager can simply be used to get any information we need about the containers in the system.

Contributor Author

This suggestion is also very reasonable. I’ve been using ContainerInfo solely to retrieve the minimum replica count of the container. I can optimize ratisContainerMap and ecContainerMap so that these two variables only store the mapping between ContainerID and its corresponding minimum replica count.

ReplicationConfig replicationConfig = container.getReplicationConfig();

if (checkContainerState(containerState) && container.getNumberOfKeys() > 0) {
if (replicationConfig instanceof RatisReplicationConfig) {
Contributor

It's more intuitive to do something like:

container.getReplicationType().equals(HddsProtos.ReplicationType.RATIS)

}
}
}

private void reInitializeRule() {
containerMap.clear();
Contributor

This method was also clearing this map but the new code isn't; can you check if we need to clear the map?

Contributor Author

I carefully read the code, and indeed we should also preserve the logic for clearing the Map. I have added the relevant logic in the initializeRule method.

@slfan1989 slfan1989 marked this pull request as ready for review November 11, 2024 10:23
@slfan1989
Contributor Author

@siddhantsangwan Thank you very much for reviewing the code! I have made the changes according to your suggestions and added unit tests. The unit tests cover common EC types, such as EC-3-2-1024K and EC-6-3-1024K. I would appreciate it if you could find some time to review this PR again.

I have pushed the code to my personal repository, and the CI run (https://github.com/slfan1989/ozone/actions/runs/11771095727) shows that all checks have passed. I have now changed the status of this PR to "Ready for Review."

@adoroszlai
Contributor

@nandakumar131 can you please review as well?

@slfan1989
Contributor Author

@nandakumar131 can you please review as well?

@adoroszlai Thank you very much for reviewing this PR! This improvement is very important to us. Currently, when we restart the SCM, it cannot determine whether the EC Container has finished reporting because, similar to the Ratis 3-replica Container, the SCM considers the Container ready as soon as just one replica reports successfully. This results in an issue where we are unable to promote the SCM to leader when it has just restarted and has already exited safe mode.

This PR has been in use internally for several months, and I personally believe it has met expectations. Currently, we have fully transitioned our internal Ozone cluster to the EC-6-3-1024K strategy (meaning there is almost no 3-replica data in the cluster, with only a small amount, less than 10PB, as exceptions). This decision was driven by cost considerations, as we have already stored over 100PB of data.

I sincerely hope we can continue to push this PR forward. If there are any suggestions for improvement, I will continue to make the necessary changes.

cc: @siddhantsangwan @sadanand48 @errose28

Contributor

@adoroszlai adoroszlai left a comment

I would like to suggest relatively simple code changes to reduce duplication in ContainerSafeModeRule: create an instance for each replication type. See: adoroszlai@fb815f3

@slfan1989
Contributor Author

I would like to suggest relatively simple code changes to reduce duplication in ContainerSafeModeRule: create an instance for each replication type. See: adoroszlai@fb815f3

@adoroszlai Thank you very much for the code modifications you provided! I am currently reviewing this part of the code and optimizing the PR based on your suggestions.

Contributor

@siddhantsangwan siddhantsangwan left a comment

@slfan1989 thanks for your sustained efforts. I've left some more comments below, mostly regarding maintaining unnecessary data in memory. At high scale, these memory optimisations will make a big difference!

private double maxContainer;

private AtomicLong containerWithMinReplicas = new AtomicLong(0);
private Map<Long, Integer> ratisContainerMinReplicaMap;
Contributor

This map is not needed as far as I can tell. It can just be a Set of Ratis container ids (long), since the min replica count for ratis containers is always 1.

Contributor Author

Thank you for the suggestion! I have improved the relevant code.


private AtomicLong containerWithMinReplicas = new AtomicLong(0);
private Map<Long, Integer> ratisContainerMinReplicaMap;
private Map<Long, Set<UUID>> ratisContainerDNsMap;
Contributor

Why maintain a mapping for ratis containers? If we use the set of container id that I mentioned above, we can simply remove a container id from the set when a datanode reports having a replica of that container.

Contributor Author

I have improved the code based on the method you suggested.
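The set-based approach suggested in the review might look like the sketch below: since one reported replica is enough for a Ratis container, the ID is simply removed from a pending set on the first report. RatisContainerProgress is a hypothetical class and not the code that was merged.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical progress tracker for Ratis containers using only a set of IDs. */
final class RatisContainerProgress {
  private final Set<Long> pending;
  private final long total;
  private long reported;

  RatisContainerProgress(Collection<Long> containerIds) {
    this.pending = new HashSet<>(containerIds);
    this.total = pending.size();
  }

  /** Counts a container the first time any Datanode reports a replica of it. */
  void onReplicaReported(long containerId) {
    // remove() returns true only for the first report of a tracked container.
    if (pending.remove(containerId)) {
      reported++;
    }
  }

  long reported() {
    return reported;
  }

  /** Fraction of tracked containers reported; 1.0 when nothing is tracked. */
  double fraction() {
    return total == 0 ? 1.0 : (double) reported / total;
  }
}
```

Repeated reports for the same container and reports for untracked containers leave the count unchanged, and memory use shrinks as containers are reported.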

private AtomicLong containerWithMinReplicas = new AtomicLong(0);
private Map<Long, Integer> ratisContainerMinReplicaMap;
private Map<Long, Set<UUID>> ratisContainerDNsMap;
private Map<Long, Integer> ecContainerMinReplicaMap;
Contributor

@siddhantsangwan siddhantsangwan Nov 22, 2024

We don't need to maintain this mapping either, as far as I can tell. Whenever we have the container id and need the replication factor of that container, we can use a container manager method to get it. That'll be a constant time (O(1)) lookup for container manager on average.

Contributor Author

This part of the logic has also been improved.
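A sketch of the lookup-based alternative for EC containers follows. Instead of storing a map from container ID to minimum replica count, the tracker asks a lookup function each time; here a LongFunction stands in for the container manager call, and EcContainerProgress is a hypothetical class.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.function.LongFunction;

/** Hypothetical EC tracker that looks up the required replica count on demand. */
final class EcContainerProgress {
  // Stand-in for a container manager lookup; returns null for unknown containers.
  private final LongFunction<Integer> dataReplicaLookup;
  private final Map<Long, Set<UUID>> reporters = new HashMap<>();
  private long ready;

  EcContainerProgress(LongFunction<Integer> dataReplicaLookup) {
    this.dataReplicaLookup = dataReplicaLookup;
  }

  /** Records a replica report and counts the container once enough DNs reported. */
  void onReplicaReported(long containerId, UUID datanode) {
    Integer required = dataReplicaLookup.apply(containerId);
    if (required == null) {
      return;
    }
    Set<UUID> dns = reporters.computeIfAbsent(containerId, k -> new HashSet<>());
    int before = dns.size();
    dns.add(datanode);
    // Count the container exactly once, when it crosses the threshold.
    if (before < required && dns.size() >= required) {
      ready++;
    }
  }

  long ready() {
    return ready;
  }
}
```

The set of reporting Datanodes is still kept per container, because unlike Ratis the EC check needs several distinct reporters before the container counts.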

@siddhantsangwan
Contributor

@slfan1989 I'm not sure about the Datanode safe mode rule related improvements in this pull request. Logically it's a separate change from adding EC safe mode support, and so it should have a different jira and pull request. I feel it requires more thinking and I'm not sure how carefully others have reviewed it. It's best suited for a different PR.

So in the interest of time, I suggest removing those changes from this PR and introducing them in a separate PR. That way, we'll be able to merge this PR sooner.

Contributor

@siddhantsangwan siddhantsangwan left a comment

The latest commits look good to me. With this, all the EC container safe mode related changes are ready to be merged IMO. If we can remove the datanode rule related changes and have a green CI run, we'll be able to merge the PR.

@slfan1989
Contributor Author

The latest commits look good to me. With this, all the EC container safe mode related changes are ready to be merged IMO. If we can remove the datanode rule related changes and have a green CI run, we'll be able to merge the PR.

@siddhantsangwan Thank you for your continued improvement suggestions! I will remove the datanode rule related changes in this PR.

@siddhantsangwan
Contributor

@slfan1989 test failure looks related, can you take a look? Also please let me know once it's ready for a final review.

@slfan1989
Contributor Author

@slfan1989 test failure looks related, can you take a look? Also please let me know once it's ready for a final review.

@siddhantsangwan I have fixed the errors in the unit tests and am waiting for the CI to pass. Once that’s done, I will ask you to help with another review. Thank you again!

@slfan1989
Contributor Author

@siddhantsangwan I have rechecked the code, and this version is the final one. I have also rebased the code. Could you please review it again? Thank you very much!

@adoroszlai
Contributor

@slfan1989
Contributor Author

Please try to avoid force-push when updating the PR. Here are some great articles that explain why:

https://developers.mattermost.com/blog/submitting-great-prs/#4-avoid-force-pushing https://www.freecodecamp.org/news/optimize-pull-requests-for-reviewer-happiness#request-a-review

@adoroszlai Thank you for providing this information! I will pay attention to this detail in future development to avoid issues caused by force-pushing.

Contributor

@siddhantsangwan siddhantsangwan left a comment

Some comments on the tests.

Contributor

@siddhantsangwan siddhantsangwan left a comment

LGTM, pending green CI.

@siddhantsangwan siddhantsangwan merged commit a99ab27 into apache:master Nov 26, 2024
40 checks passed
@siddhantsangwan
Contributor

Merged, thanks everyone!

@slfan1989
Contributor Author

Thank you all for your support in helping us complete this PR. I greatly appreciate everyone’s valuable feedback throughout the process. @siddhantsangwan's professionalism left a strong impression on me—many thanks once again. I’d also like to extend my gratitude to @adoroszlai for their continued assistance. I’ve learned a lot throughout this process.

fapifta pushed a commit to fapifta/hadoop-ozone that referenced this pull request Dec 8, 2024
aswinshakil added a commit to aswinshakil/ozone that referenced this pull request Dec 19, 2024
…239-container-reconciliation-merge

Commits:
0066526 HDDS-11869. Enable OM Ratis in TestOzoneDelegationTokenSecretManager (apache#7594)
4fe166d HDDS-11957. Make breadcrumb scrollable for long path names in DiskUsage page (apache#7590)
a523e17 HDDS-11846. [Recon] Recon Schema version_number column is always set as -1. (apache#7554)
f5ff2f0 HDDS-11868. Enable OM Ratis in TestQuotaRepairTask (apache#7593)
3a0e3b5 HDDS-11845. Extract k8s definitions for HttpFS and Recon from getting-started example (apache#7523)
6e0c753 HDDS-11509. logging improvements for directory deletion (apache#7314)
1f29e05 HDDS-11934. Split compat suite to old/new (apache#7578)
bde8cf4 HDDS-11759. Remove LegacyReplicationManager (apache#7580)
a27e4ec HDDS-11779. Add DN metrics to show deletion progress (apache#7552)
976e45f HDDS-11711. Add SCM metrics for delete commands sent and response received per datanode (apache#7522)
c28e16e HDDS-11950. Enable sortpom in dev-support module. (apache#7586)
dae388b HDDS-11907. OzoneSecretKey does not need to implement Writable (apache#7574)
8bb0587 HDDS-11712. Process other DeletedBlocksTransaction before retrying failed one. (apache#7532)
3648b59 HDDS-11906. Add sortpom dependency, sort root POM. (apache#7555)
54f0272 HDDS-11807. Make callId different for each request in openKeyCleanupService (apache#7551)
c523825 HDDS-11926 - Rename bucket name for bucket info/ls for linked buckets (apache#7581)
daf2f9f HDDS-11863. Speed up TestFSORepairTool (apache#7561)
f5e5493 HDDS-11927. Fix flaky TestContainerBalancerStatusInfo.testGetCurrentStatisticsWhileBalancingInProgress (apache#7579)
bef2415 HDDS-11940. Bump jline to 3.28.0 (apache#7576)
1453fd9 HDDS-11935. Bump develocity-maven-extension to 1.23 (apache#7577)
202b0c7 HDDS-11860. Improve BufferUtils.writeFully. (apache#7564)
008f9a6 HDDS-11852. Reduce duplication in some GenericCli subclasses (apache#7553)
7a46080 HDDS-11914. Snapshot diff should not filter SST Files based by reading SST file reader (apache#7563)
1835326 HDDS-11927. Mark testGetCurrentStatisticsWhileBalancingInProgress as flaky
66ccc25 HDDS-11908. Snapshot diff DAG traversal should not skip node based on prefix presence (apache#7567)
16ba289 Revert "HDDS-11413. PipelineManagerImpl lock optimization reduces AllocateBlock latency (apache#7160)"
bf6f323 HDDS-11413. PipelineManagerImpl lock optimization reduces AllocateBlock latency (apache#7160)
853d657 HDDS-11893. Fix full snapshot diff fallback logic because of DAG pruning (apache#7549)
b5d04e2 HDDS-11915. Netty OpenSsl not available in acceptance tests on arm64 (apache#7570)
8536490 HDDS-11367. Fix flaky balancer robot test (apache#7569)
745ed1c HDDS-11367. Improve ozone balancing status command output (apache#7139)
6b9cbe0 HDDS-11909. Intermittent timeout building Hadoop in s3a test (apache#7559)
eea5600 HDDS-11911. Return consistent error code when snapshot is not found in the DB or Snapshot Chain. (apache#7557)
e8f3b25 HDDS-11873. Skip old-only xcompat read tests (apache#7534)
ec348a7 HDDS-11889. Include Maven dependencies for hdds-rocks-native in cache (apache#7546)
aa37ae8 HDDS-11885. Download Hadoop for S3A test from mirrors if available (apache#7545)
befd64e HDDS-11694. Safemode Improvement: Introduce factory class to create safemode rules. (apache#7433)
a46153d HDDS-11872. Disable Apache snapshots repo (apache#7536)
092b000 HDDS-11890. Update project description in GitHub (apache#7547)
80c6446 HDDS-8101. Add tool to repair broken FSO tree (apache#7368)
23197e2 HDDS-11605. Directory deletion service should support multiple threads (apache#7349)
055b13c HDDS-11751. Use Java 21 in CI (apache#7458)
d0d82c5 HDDS-11886. Bump license-maven-plugin to 2.5.0 (apache#7539)
f7fe30a HDDS-11691. Support object tags in ObjectEndpointStreaming#put (apache#7543)
9854591 HDDS-11882. Make BOM, not aggregate one (apache#7544)
af345b2 HDDS-11877. Enable Maven cache for more checks (apache#7538)
51c6ed6 HDDS-11830. Subcommands should not extend GenericCli. (apache#7537)
959a39d HDDS-11851. Finer-grained subcommand interface for OzoneDebug and OzoneRepair. (apache#7526)
d4c41e5 HDDS-11334. Improve EC xcompat acceptance test (apache#7492)
8526f2e HDDS-11826. Interactive mode for ozone shell. (apache#7515)
fc63710 HDDS-11833. Return NotImplemented for S3 put-object-acl request. (apache#7531)
e8ad7ad HDDS-11782. ozone debug ldb --with-keys defaults to false instead of true (apache#7521)
cb0a402 HDDS-11859. Remove mention of fuse from s3 interface docs page (apache#7530)
e17f92c HDDS-10821. Ensure ChunkBuffer fully writes buffer to FileChannel (apache#6652)
d66c088 HDDS-11848. Serialization bug in Recon listKeys API (apache#7524)
faab1e8 HDDS-11656. Default native ACL limits to user and user's primary group (apache#7455)
8a1967e HDDS-11719. Remove dependency on server components from ozonefs-common (apache#7438)
9fcecc1 HDDS-11855. Mark TestContainerBalancerDatanodeNodeLimit#checkIterationResultException as flaky
65df308 HDDS-11794. Display HostName in OM / SCM Overview. (apache#7482)
0bde3a2 HDDS-11410. Refactoring more tests from TestContainerBalancerTask (apache#7156)
98e070e HDDS-11728. Refactor subcommand layouts of ozone debug and repair (apache#7489)
1c5676b HDDS-11849. Mark TestBlockOutputStreamWithFailures.test2DatanodesFailure as flaky
9dd8bb9 HDDS-11847. Mark TestSnapshotDeletingServiceIntegrationTest#testParallelExcecutionOfKeyDeletionAndSnapshotDeletion as flaky
b27714f HDDS-11266. Update proto.lock for Ozone 1.4.1 (apache#7504)
9b61937 HDDS-11806. Add HttpFS and Recon in getting-started k8s example (apache#7485)
77ce962 HDDS-10568. When the ldb command is executed, it is output by line (apache#7467)
69538b0 HDDS-11687. Robot warning: replace "is not" with "!=" (apache#7516)
b60d897 HDDS-11820. Create test principals at test run time (apache#7507)
6ba309a HDDS-11831. Finer-grained interface for dynamically registered subcommands (apache#7514)
db36e39 HDDS-11822. Register subcommands in OzoneShell (apache#7513)
e6bd3f5 HDDS-11829. Bump zstd-jni to 1.5.6-8 (apache#7510)
09a4a90 HDDS-11827. Bump exec-maven-plugin to 3.5.0 (apache#7512)
ad867bc HDDS-11824. Bump sqlite-jdbc to 3.47.1.0 (apache#7511)
850306d HDDS-11823. Bump cyclonedx-maven-plugin to 2.9.1 (apache#7508)
34f9d9e HDDS-11742. Update metrics with leaderId if known when starting SCM (apache#7471)
2c6c116 HDDS-11265. Add Ozone 1.4.1 to compatibility acceptance tests (apache#7503)
ebcdc6a HDDS-11810. Secure acceptance test on arm64 fails with LoginException: Checksum failed (apache#7498)
7f40624 HDDS-11821. Mark TestECKeyOutputStream#testECKeyCreatetWithDatanodeIdChange as unhealthy
9b26156 HDDS-11811. rocksdbjni deleted on exit could be used by other components (apache#7493)
3f92663 HDDS-11785. DataNode aborts ContainerStateMachine if it does not know any follower next index (apache#7480)
f0a2c87 HDDS-11773. Prevent frequent DataNode Ratis snapshotting. (apache#7473)
1383c18 HDDS-11718. Some CI checks passing despite error (apache#7483)
f98eac2 HDDS-11561. Refactor Open Key Search Endpoint and Consolidate with OmDBInsightEndpoint Using StartPrefix Parameter. (apache#7336)
a99ab27 HDDS-11243. SCM SafeModeRule Support EC. (apache#7008)
6871547 HDDS-11716. Address Incomplete Upgrade Scenario in Recon Upgrade Framework (apache#7452)
9bc9145 HDDS-10411. Support incremental ChunkBuffer checksum calculation (apache#7189)
579a38e HDDS-11723. Tool to better micro benchmark hbase performance in Ozone (apache#7463)
cc1a374 HDDS-11704. Hadoop test leaves running containers in case of failure (apache#7435)
c4b2056 Update documentation to mention that container schemaV3 is default (apache#7481)
20c4cfa HDDS-11386. Multithreading bug in ContainerBalancerTask (apache#7339)
b090312 HDDS-11780. Increase client write retry when SCM is in safe mode (apache#7470)
f8e4db9 HDDS-11793. Bump maven-checkstyle-plugin to 3.6.0 (apache#7476)
6a1ff84 HDDS-11791. Bump commons-io to 2.18.0 (apache#7478)
d7f8235 HDDS-11788. Bump log4j2 to 2.24.2 (apache#7479)
6547de7 HDDS-11790. Bump commons-lang3 to 3.17.0 (apache#7475)
1d8abd6 HDDS-11789. Bump zstd-jni to 1.5.6-7 (apache#7477)
6ca7230 HDDS-11769. Add tools folder into ozone src package. (apache#7466)
3b8ed58 HDDS-11682. Bump maven-resources-plugin to 3.3.0 (apache#7384)
f4a9ee0 HDDS-11702. Merge test_bucket_encryption into robot compatibility test (apache#7451)
d6a5488 HDDS-11713. Use seek to reach the start transaction. (apache#7460)
d52615a HDDS-11733. Remove okio versioning and unused okhttp dependency (apache#7447)
1a49991 HDDS-11617. Update hadoop to 3.4.1 (apache#7376)
9945de6 HDDS-11667. Validating DatanodeID on any request to the datanode (apache#7418)
fc6a2ea HDDS-11650. ContainerId list to track all containers created in a datanode (apache#7402)
a8db9cd HDDS-11749. Extract moveToTrash implementation to client code (apache#7453)
3ba3474 HDDS-11755. mktemp --suffix does not work on Mac (apache#7457)
433c7bb HDDS-11729. Update skipRecon property to skip only frontend build (apache#7454)
6b40003 HDDS-11739. Extract generic unmarshaller for S3 requests (apache#7449)
c7f65e7 HDDS-11740. Add debug command to show internal component versions (apache#7450)
0f7104e HDDS-11708. Recon ListKeys API should return a proper http response status code if NSSummary rebuild is in progress. (apache#7437)
0e0d5e9 HDDS-11163. Improve Heatmap page UI (apache#7420)
2cef393 HDDS-11696. Limit max number of entries in list keys/status response (apache#7431)
e96e314 HDDS-11697. Integrate Ozone Filesystem Implementation with Ozone ListStatusLight API (apache#7440)
ebcbce7 HDDS-11644. Close OMLayoutVersionManager (apache#7445)
20e4969 HDDS-11737. UnsupportedOperationException in S3 setBucketAcl (apache#7448)
b252181 HDDS-10804. Include only limited set of ports in Pipeline proto (apache#6655)
79ca956 HDDS-8829. Symmetric Keys for Delegation Tokens (apache#7394)
3e798e6 HDDS-11698. Use hadoop images from GitHub in CI (apache#7432)
3e278b7 HDDS-10655. Support PutObjectTagging, GetObjectTagging, and DeleteObjectTagging (apache#6756)
036e727 HDDS-11732. Fix ACL check on bucket resolution while reading from snapshot (apache#7446)
dbda703 HDDS-11736. Bump maven-javadoc-plugin to 3.11.1 (apache#7444)
238f232 HDDS-11692. Skip spotbugs for modules with only generated code. (apache#7428)
f60ad61 HDDS-11705. Snapshot operations on linked buckets should work on actual underlying bucket (apache#7434)
dd22dbe HDDS-11615. Add Upgrade Action for Initial Schema Constraints for Unhealthy Container Table in Recon. (apache#7372)
4066c7c HDDS-117. Add convenience methods for port management in DatanodeDetails (apache#7408)
12419fa HDDS-11695. SCM follower should not log NotLeaderException during Pipeline Report processing. (apache#7430)
5275ded HDDS-10133. Add a method to check key name in OMKeyRequest (apache#6012)
fd5c6d8 HDDS-11689. Extract scheduled workflow for populate-cache (apache#7429)
889ba80 HDDS-11653. Bump Ratis to 3.1.2 (apache#7427)
47ec4dd HDDS-11671. Refer to website for supported versions (apache#7412)
10cac80 HDDS-11686. Use ozone image from GitHub in CI (apache#7425)
aa6da3e HDDS-9781. Limited maxOpenFiles, disabled enableCompactionDag, and createCheckpointDirs when creating OMMetadataManager instance for bootstrapping (apache#7095)
9dd6a83 HDDS-11645. Mark TestReconScmSnapshot#testExplicitRemovalOfNode as flaky
8e617dc HDDS-11672. Mark TestSnapshotBackgroundServices#testCompactionLogBackgroundService as flaky
d09e6d4 HDDS-11646. Mark TestXceiverClientMetrics#testMetrics as flaky
8e4a508 HDDS-11668. Recon List Keys API: Reuse key prefix if parentID is the same (apache#7410)
ee63232 HDDS-11684. Remove suppression of HiddenField (apache#7423)
a33d8a3 HDDS-10166. Replace GenericTestUtils temporary directories with `@TempDir` (apache#7399)
3a18a9d HDDS-11664. Hadoop download failure not reported as error (apache#7421)
47c2409 HDDS-64. OzoneClientException should extend IOException. (apache#7403)
27fcd0c HDDS-11685. Use ozone-testkrb5 from GitHub (apache#7424)
5663971 HDDS-11665. Minor optimizations on the write path (apache#7407)
cb81f0c HDDS-11683. Skip shade in most integration checks (apache#7422)
952e0ec HDDS-11681. Bump Bouncy Castle to 1.79 (apache#7387)
2797c45 HDDS-11677. Bump sqlite-jdbc to 3.47.0.0 (apache#7413)
358534b HDDS-11675. Bump maven-site-plugin to 3.21.0 (apache#7414)
cf79245 HDDS-11674. Bump junit to 5.11.3 (apache#7415)
6dd566f HDDS-11583. Use ozone-runner from GitHub in CI (apache#7409)
ef2bf98 HDDS-11669. In OmUtils.normalizeKey isDebugEnabled should be evaluated first (apache#7411)
a7e3014 HDDS-11660. Recon List Key API: Reduce object creation and buffering memory (apache#7405)
5d18b9c HDDS-11659. Improve HSync compatibility test (apache#7404)
4e603aa HDDS-11462. Enhancing DataNode I/O Monitoring Capabilities. (apache#7206)
18f6e8a HDDS-11311. Added Compatibility test for HSync (apache#7400)
0415c0b HDDS-11649. Recon ListKeys API: Simplify filter predicates (apache#7395)
2547ac0 HDDS-11652. Fix SCM start command in SCM-HA doc (apache#7398)
c045839 HDDS-11623. Improve OM Ratis Configuration change log message (apache#7388)
2b1524b HDDS-11609. Switch to Recon v2 UI as the default UI (apache#7358)
efe5892 HDDS-11641. Allow testing Hadoop with custom docker images (apache#7393)
c055036 HDDS-11637. Compile failure is ignored in build check (apache#7389)
67e5261 HDDS-11563. Display OM/SCM service ID as Namespace in web UI (apache#7321)
0fb5e50 HDDS-11587. Ozone Manager not processing file put requests with multi-tenancy enabled (apache#7316)
786bb49 HDDS-11642. MutableQuantiles should be stopped (apache#7392)
76ec9b9 HDDS-11639. Upgrade ozone-runner to Rocky Linux 9.3 (apache#7391)
3bc3b8a HDDS-11621. Fix missing HADOOP_ variables in MR acceptance test (apache#7375)
6f9db61 HDDS-11200. Hsync client-side metrics (apache#7371)
58d1443 HDDS-10240. Cleanup zero-copy EC (apache#7340)
5b065d8 HDDS-11638. Bump cyclonedx-maven-plugin to 2.9.0 (apache#7383)
c7a196f HDDS-11635. Memory leak when using Ozone FS via Hadoop FileContext API (apache#7382)
c9956a1 HDDS-11601. Intermittent failure in acceptance balancer test. (apache#7343)
a737fc3 HDDS-11619. Remove dependency on hadoop-shaded-guava (apache#7373)
c4d6857 HDDS-11584. Document ozone debug ldb command (apache#7313)
dded26e HDDS-11588. Add main artifact jar to classpath file (apache#7324)
afed6d9 HDDS-11558. Make OM client retry idempotent (apache#7329)
e85b32d HDDS-11591. Copy dependencies when building each module (apache#7325)
72e56d7 HDDS-11601. Disable flaky EC balancer acceptance test
ab16cbe HDDS-11507. Add error information to log while handling ServiceException. (apache#7367)
980b960 HDDS-11380. Make node decommission error message more comprehensive (apache#7155)
61c094f HDDS-11614. Speed up TestTransferLeadershipShell (apache#7370)
91188b3 HDDS-11352. Remove Flaky annotation from TestOzoneManagerHAWithStoppedNodes using Ratis 3.1.1
faf133d HDDS-11220. Initialize block length using the chunk list from DataNode before seek (apache#7221)
91d41a0 HDDS-11465. Introducing Schema Versioning for Recon to Handle Fresh Installs and Upgrades. (apache#7213)
7a27db2 HDDS-11134. Create compatibility test for FSO bucket usage (apache#7350)
30906d1 HDDS-11612. Bump jnr-posix to 3.1.20 (apache#7360)
bed4aef HDDS-11611. Bump docker-maven-plugin to 0.45.1 (apache#7362)
e2c3d57 HDDS-11610. Bump maven-dependency-plugin to 3.8.1 (apache#7361)
24c1000 HDDS-11041. Add admin request filter for S3 requests and UGI support for GrpcOmTransport (apache#7268)
0b84998 HDDS-11594. Update batchPut buffer log for rocksdb. (apache#7356)
c013516 HDDS-11608. Client should not retry invalid protobuf request (apache#7354)
32a8c09 HDDS-11160. Improve Insights page UI (apache#7327)
782ad62 HDDS-11388. Fix unnecessary call to the DB for ContainerBalancer#getBalancerStatusInfo (apache#7224)
35b6a3a HDDS-11600. Intermittent failure in repro due to ordering differences in builddef.lst (apache#7342)
e7bf154 HDDS-11132. Revert client version bump done as part of HDDS-10983 (apache#7348)
ea5cbff HDDS-11602. Bump ozone-runner to 20241022-jdk17-1 (apache#7347)
3f98df5 HDDS-11580. Validate 'hdds.datanode.dir.du.reserved' property (apache#7328)
8568075 HDDS-11570. Fix HDDS Docs build failure with Hugo v0.135.0 (apache#7337)
721ae58 HDDS-11057. Enable reproducible builds (apache#6856)
86b7aae HDDS-11205. Implement a search feature for users to locate keys pending Deletion within the OM Deleted Keys Insights section (apache#6969)
f7b428d HDDS-11503. Add Robot test to verify Container Balancer for EC containers. (apache#7311)
85eb89b HDDS-11483. Make s3g object get and put operation buffer configurable (apache#7233)
515977a HDDS-11582. Bump body-parser to 1.20.3 (apache#7307)
9b66267 HDDS-11589. ReconSCMDBDefinition should be singleton. (apache#7323)
f784a84 HDDS-11578. Unify constants for RATIS_SNAPSHOT_DIR (apache#7310)
4670a5e HDDS-11498. Improve SCM deletion efficiency. (apache#7249)
3fb2cf0 HDDS-11108. Extract keywords for multipart upload tests (apache#7318)
4b24aa9 HDDS-11545. [UI] Add OM and SCM ID information (apache#7287)
860e269 HDDS-11538. Let coverage report link to java sources (apache#7280)
2139367 HDDS-11581. Remove duplicate ContainerStateMachine#RaftGroupId (apache#7312)
64e035d HDDS-11573. Remove lib/gson-2.10.1.jar (apache#7309)
ce07a3c HDDS-11456. Require successful dependency/licence checks for acceptance/compile/kubernetes (apache#7209)
c579d06 HDDS-11574. Ozone client leak in TestS3SDKV1 (apache#7308)
c044b79 HDDS-10390. MiniOzoneCluster to support S3 Gateway (apache#7281)
8eef589 HDDS-11557. Simplify DBColumnFamilyDefinition. (apache#7298)
4c77f6b HDDS-11562. Parameterize TestSCMNodeManager#testProcessLayoutVersion (apache#7300)
b51c4b3 HDDS-11572. Bump commons-io to 2.17.0 (apache#7305)
3a37870 HDDS-11571. Bump log4j2 to 2.24.1 (apache#7301)
494798c HDDS-11564. Mark TestBlockOutputStream#testWriteExactly... as flaky
fabf512 HDDS-11569. Bump restrict-imports-enforcer-rule from 2.5.0 to 2.6.0 (apache#7303)
1e62a0a HDDS-11568. Bump commons-codec to 1.17.1 (apache#7304)
e9f92a7 HDDS-11567. Bump common-custom-user-data-maven-extension to 2.0.1 (apache#7302)
cb44d5e HDDS-11555. SCMDBDefinition should be singleton. (apache#7296)
d473134 HDDS-11486. Reduce log level for NativeLibraryNotLoadedException in SnapshotDiffManager (apache#7290)
3348d91 HDDS-11564. Mark TestBlockOutputStream as flaky
e2f2aeb HDDS-11548. Add some logging to the StateMachine (apache#7291)
523c860 HDDS-11439. De-duplicate code for ReplicatedFileChecksumHelper and ECFileChecksumHelper (apache#7264)
05a409e HDDS-11519. Clean up unused lines in BlockOutputStream
ffe7198 HDDS-11544. Improve work with arrays (apache#7286)
5657604 HDDS-11556. Add a getTypeClass method to Codec. (apache#7295)
256aad9 HDDS-11546. Add regex operation to filter option of ldb scan command. (apache#7289)
7ef7de2 HDDS-11482. EC Checksum throws IllegalArgumentException because the buffer limit is negative (apache#7230)
77c17df HDDS-11551. Provide details about integration check failure (apache#7294)
911a583 HDDS-8188. Support max allowed length in response of ozone admin container list (apache#7181)
7f2e0e3 HDDS-11554. OMDBDefinition should be singleton. (apache#7292)
170761c HDDS-11547. Make MAVEN_OPTS optional (apache#7288)
4846e97 HDDS-11543. Track OzoneClient object leaks via LeakDetector framework. (apache#7285)
e00f7ae HDDS-11159. Improve Containers page UI (apache#7267)
cfda951 HDDS-11520. Fix Delete pending directories key mapping (apache#7269)
2e3de8a HDDS-11476. Implement lesser/greater operation for --filter option of ldb scan command (apache#7222)
06ccdb3 HDDS-11526. Fix hdds.datanode.metadata.rocksdb.cache.size default value mismatch (apache#7284)
b3afaec HDDS-11535. Incomplete SCM roles table header (apache#7278)
ed2a073 HDDS-11536. Bump macOS runner version to macos-13 (apache#7279)
1887f83 HDDS-11537. Bump frontend-maven-plugin to 1.15.1 (apache#7276)
1f1e618 HDDS-6776. Cleanup TestSCMSafeModeManager (apache#7272)
4bee3e9 HDDS-11534. Bump cyclonedx-maven-plugin to 2.8.2 (apache#7277)
789fb53 HDDS-11533. Bump maven-gpg-plugin to 3.2.7 (apache#7275)
eb26677 HDDS-11268. Add --table mode for OM/SCM Roles CLI (apache#7016)
28ea480 HDDS-11527. Avoid unnecessary duplicate build (apache#7270)
30da31f HDDS-3498. Shutdown datanode if address is already in use (apache#7256)
2401d27 HDDS-11046. Coverage decreased due to running tests with Java 17 (apache#7263)
78d8418 HDDS-11524. Bump snappy-java to 1.1.10.7 (apache#7202)
8747c0e HDDS-11518. Recon OmDB Insights show isKey=true for directories (apache#7260)
5d2bbc3 HDDS-11480. Refactor OM volume response tests (apache#7265)
31f9f2c HDDS-11517. Update version to 2.0.0-SNAPSHOT (apache#7258)
a0f0872 HDDS-11444. Make Datanode Command metrics consistent across all commands (apache#7191)
d3b63c6 HDDS-11492. Directory deletion get stuck having millions of directory (apache#7254)
f52f0af HDDS-11127. [hsync] Improve test coverage for XceiverClientRatis.java (apache#7225)
360fea5 HDDS-11494. Improve the duration option of freon ombg (apache#7246)
10d3b21 HDDS-11504. Update Ratis to 3.1.1. (apache#7257)
ce46297 HDDS-11162. Improve Disk Usage page UI (apache#7214)
c91f1c7 HDDS-11491. Avoid sharing clientId among deleting services (apache#7250)
b0943d5 HDDS-11501. Improve logging in XceiverServerRatis (apache#7252)
55925ab HDDS-11502. Class path contains multiple SLF4J providers (apache#7255)
d0ad836 HDDS-11472. Avoid recreating external access authorizer on OM state reload (apache#7238)
254db9e HDDS-11500. RootCARotationManager cancelling wrong task in notifyStatusChanged (apache#7251)
1e6e4b3 HDDS-11499. Remove redundant code from ECReconstructionCoordinator. (apache#7248)
adb2821 HDDS-11490. Bump rollup to 3.29.5 (apache#7232)
189a9fe HDDS-11484. Validate javadoc in CI (apache#7245)
64a29c6 HDDS-11497. Bump commons-configuration2 to 2.11.0 (apache#7242)
95cfadd HDDS-11496. Bump maven-install-plugin to 3.1.3 (apache#7244)
0a999cf HDDS-11493. Bump sqlite-jdbc to 3.46.1.3 (apache#7243)
a214a31 HDDS-11329. Update Ozone images to Rocky Linux-based runner (apache#7241)
56ddb85 HDDS-11371. Handle cases where OM does not have getServerDefaults() implemented. (apache#7130)
b5097c7 HDDS-11347. Add rocks_tools_native lib check in Ozone CLI checknative subcommand (apache#7101)
fb0bf77 HDDS-11489. Bump maven-site-plugin to 3.20.0 (apache#7226)
70e6e40 HDDS-11122. Fix javadoc warnings (apache#7234)
acf3fdc HDDS-11458. Selective checks: trigger checkstyle for properties file changes (apache#7196)
6b87207 HDDS-11469. Statistics of Pipeline and Container (apache#7217)
1b8468b HDDS-11411. Snapshot garbage collection should not run when the keys are moved from a deleted snapshot to the next snapshot in the chain (apache#7193)
1f86ce8 HDDS-10617. Unexpected number of files in ITestS3AContractGetFileStatusV1List (apache#7208)
73a3bcc HDDS-11467. Bump vite to 4.5.5 (apache#7212)
d45aa1d HDDS-11460. Bump express to 4.21.0 (apache#7197)
e2e30b8 HDDS-11354. Intermittent failure in TestOzoneManagerSnapshotAcl#testLookupKeyWithNotAllowedUserForPrefixAcl (apache#7205)
0fcb645 HDDS-11477. [doc] Add configuration description for datanode docs (apache#7223)
3598ee3 HDDS-11464. Removed unused constants from OzoneConsts. (apache#7207)
8c0b54e HDDS-11408. Snapshot rename table entries are propagated incorrectly on snapshot deletes (apache#7200)
719bdf9 HDDS-11396. NPE due to empty Handler#clusterId (apache#7145)
40c4001 HDDS-10479. Add ozone admin ratis local raftMetaConf (apache#7170)
45f9138 HDDS-11394. Fix pipeline close --all command (apache#7138)
2b196d1 HDDS-11468. Enabled DB sync button (apache#7216)
d3899d2 Clean up files created after TestKeyValueHandlerWithUnhealthyContainer#testMarkContainerUnhealthyInFailedVolume (apache#7219)
70b8dd5 HDDS-11157. Improve Datanodes page UI (apache#7168)
151709a HDDS-11446. Downgrade picocli to 4.7.5 due to regression (apache#7215)
7a26aff HDDS-11158. Improve Pipelines page UI (apache#7171)
c365aa0 HDDS-11181. Cleanup of unnecessary try-catch blocks (apache#7210)
88dd436 HDDS-11423. Implement equals operation for --filter option to ozone ldb scan (apache#7167)
e0060a8 HDDS-11196. Improve SCM WebUI Display (apache#6960)
22ddfb9 Revert "HDDS-11456. Require successful dependency/licence checks for acceptance/compile/kubernetes (apache#7192)"
9f5bf43 HDDS-11457. Internal error on S3 CompleteMultipartUpload if parts are not specified (apache#7195)
10c47a1 HDDS-11459. Bump develocity-maven-extension to 1.22.1 (apache#7201)
50f2563 HDDS-11419. Fix waitForCheckpointDirectoryExist log message (apache#7199)
a7d7e37 HDDS-11456. Require successful dependency/licence checks for acceptance/compile/kubernetes (apache#7192)
5feb9ea HDDS-11453. OmSnapshotPurge should be in a different ozone manager double buffer batch (apache#7188)
703c4d5 HDDS-10984. Tool to restore SCM certificates from RocksDB. (apache#6781)
d221065 HDDS-11440. Add a lastTransactionInfo field in SnapshotInfo to check for transactions in flight on the snapshot (apache#7179)
e573701 HDDS-11448. Improve documentation in ContainerStateMachine (apache#7183)
0e49f7a HDDS-11449. Remove unnecessary log from client console. (apache#7184)
cd251f2 HDDS-11438. Ensure DataInputBuffer is closed in OMPBHelper#convert (apache#7182)
4b47812 HDDS-11389. Incorrect number of deleted containers shown in Recon UI. (apache#7149)
0915f0b HDDS-10985. EC Reconstruction failed because the size of currentChunks was not equal to checksumBlockDataChunks. (apache#7009)
0f16195 HDDS-11416. refactor ratis submit request avoid code duplicate (apache#7166)
86fe920 HDDS-11376. Improve ReplicationSupervisor to record replication metrics (apache#7140)
883a63f HDDS-11441. ozone sh key put should only accept positive expectedGeneration (apache#7180)
33dbd4a HDDS-11357. Datanode Usageinfo Support Display Pipeline. (apache#7105)
9477aa6 HDDS-11436. Minor update in Recon API handling. (apache#7178)
8ca33c7 HDDS-11414. Key listing for FSO buckets fails with forward client (apache#7161)
f1ebd39 HDDS-11435. Bump sqlite-jdbc to 3.46.1.0 (apache#7174)
aaf8bd0 HDDS-11434. Bump log4j2 to 2.24.0 (apache#7176)
3510ce7 HDDS-11433. Bump Jetty to 9.4.56.v20240826 (apache#7175)
0047cd2 HDDS-11400. Bump maven-core to 3.9.9 (apache#7144)
274da83 HDDS-10488. Datanode OOM due to run out of mmap handler (apache#6690)
7a452ca HDDS-11391. Addendum to fix test failure.
7e1d9b0 HDDS-11145. ozone admin om cancelprepare --service-id improvement (apache#7159)
6888cf2 HDDS-11383. Improve read key dashboard to include add the read key related OM metrics. (apache#7131)
3e0d76c HDDS-11369. [hsync] Remove KeyOutputStreamSemaphore logs (apache#7136)
b23981c HDDS-11342. [hsync] Add a config as HBase-related features master switch (apache#7126)
3e1188a HDDS-11285. cli to trigger quota repair and status (apache#7104)
2e33978 HDDS-11401. Code cleanup in DatanodeStateMachine (apache#7146)
f563d67 HDDS-11391. Frequent Ozone DN Crashes During OM + DN Decommission with Freon (apache#7154)
18b28d2 HDDS-11312. [hsync] Added upgrade tests (apache#7110)
b29beb3 HDDS-11350. NullPointerException thrown on checking container balancer status (apache#7134)
111b9df HDDS-11407. Use OMLayoutFeature.HBASE_SUPPORT for HSYNC (apache#7152)
966b8d0 HDDS-11390. Removed hsync and hflush capability check in ContentGenerator (apache#7153)
877504a HDDS-11156. Improve Buckets page UI (apache#7100)
814f78f HDDS-11392. ChecksumByteBufferImpl's static initializer fails with java 17+ (apache#7135)
b5e1a8b HDDS-11398. Bump commons-compress to 1.27.1 (apache#7142)
a8e3ea9 HDDS-11397. Bump Jersey2 to 2.45 (apache#7141)
5992837 HDDS-11399. Bump maven-deploy-plugin to 3.1.3 (apache#7143)
47564bb HDDS-11359. Intermittent timeout in TestPipelineManagerMXBean#testPipelineInfo (apache#7132)
cc4e026 HDDS-11304. Make up for the missing functionality in CommandDispatcher (apache#7062)
2d372f6 HDDS-11339. Let PrometheusServlet rely on periodically published metrics (apache#7092)
f22c6f8 HDDS-11164. Improve Navbar UI (apache#7088)
23211c1 HDDS-11381. Adding logging for sortByDistanceCost in NetworkTopologyImpl (apache#7133)
3e9cdb6 HDDS-11378. Allow disabling OM version-specific feature via config (apache#7129)
23f3e5b HDDS-11152. OMDoubleBuffer error when handling snapshot's background operations (apache#7112)
5659b7e HDDS-11375. DN startup fails due to illegal configuration of raft.grpc.message.size.max (apache#7128)
41d8147 HDDS-11368. Remove dependency on Babel in Vite (apache#7119)
0bd8ba1 HDDS-11372. No coverage for org.apache.ozone packages (apache#7124)
3bd237d HDDS-11325. (addendum) Intermittent failure in TestBlockOutputStreamWithFailures#testContainerClose (apache#7121)
8306290 HDDS-11373. Log for EC reconstruction command lists the missing indexes as ASCII control characters (apache#7123)
dab1538 HDDS-11216. Replace HAUtils#buildCAX509List usages with other direct usages (apache#6981)
51a5fb9 Revert "HDDS-11235. Spare InfoBucket RPC call in FileSystem#mkdir() call. (apache#6990)" (apache#7122)
fab56b4 HDDS-11229. Chain optionals in Recon Insight (apache#7064)
2236041 HDDS-11365. Fix the NOTICE file (apache#7120)
2e30dc1 HDDS-11190. Add --fields option to ldb scan command (apache#6976)
0b75cb0 HDDS-11251. Deprecate definitions and remove listTrash and recoverTrash APIs (apache#7060)
be34303 HDDS-9198. Maintain local cache in OMSnapshotPurgeRequest to get updated snapshotInfo and pass the same to OMSnapshotPurgeResponse (apache#7045)
c07b408 HDDS-11208. Change RatisBlockOutputStream to use HDDS-11174. (apache#7072)
8f8d809 HDDS-11309. Increase CONTAINER_STATE Column Length in UNHEALTHY_CONTAINERS to Avoid Truncation (apache#7071)
350a340 HDDS-11364. Bump jgraphx to 3.9.12 (apache#7116)
45b7056 HDDS-11363. Bump develocity-maven-extension to 1.22 (apache#7115)
9dd18f1 HDDS-11362. Bump snappy-java to 1.1.10.6 (apache#7114)
637cb91 HDDS-11361. Bump Jersey2 to 2.44 (apache#7113)

Conflicts:
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/ozoneimpl/OzoneContainer.java
hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/keyvalue/TestKeyValueHandlerWithUnhealthyContainer.java
hadoop-ozone/dist/src/main/smoketest/admincli/container.robot

Modified during conflict:
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/checksum/DNContainerOperationClient.java
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/checksum/ReconcileContainerTask.java
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/client/rpc/TestECKeyOutputStream.java
aswinshakil added a commit to aswinshakil/ozone that referenced this pull request Dec 19, 2024
…239-container-reconciliation-merge

Commits:
0066526 HDDS-11869. Enable OM Ratis in TestOzoneDelegationTokenSecretManager (apache#7594)
4fe166d HDDS-11957. Make breadcrumb scrollable for long path names in DiskUsage page (apache#7590)
a523e17 HDDS-11846. [Recon] Recon Schema version_number column is always set as -1. (apache#7554)
f5ff2f0 HDDS-11868. Enable OM Ratis in TestQuotaRepairTask (apache#7593)
3a0e3b5 HDDS-11845. Extract k8s definitions for HttpFS and Recon from getting-started example (apache#7523)
6e0c753 HDDS-11509. logging improvements for directory deletion (apache#7314)
1f29e05 HDDS-11934. Split compat suite to old/new (apache#7578)
bde8cf4 HDDS-11759. Remove LegacyReplicationManager (apache#7580)
a27e4ec HDDS-11779. Add DN metrics to show deletion progress (apache#7552)
976e45f HDDS-11711. Add SCM metrics for delete commands sent and response received per datanode (apache#7522)
c28e16e HDDS-11950. Enable sortpom in dev-support module. (apache#7586)
dae388b HDDS-11907. OzoneSecretKey does not need to implement Writable (apache#7574)
8bb0587 HDDS-11712. Process other DeletedBlocksTransaction before retrying failed one. (apache#7532)
3648b59 HDDS-11906. Add sortpom dependency, sort root POM. (apache#7555)
54f0272 HDDS-11807. Make callId different for each request in openKeyCleanupService (apache#7551)
c523825 HDDS-11926 - Rename bucket name for bucket info/ls for linked buckets (apache#7581)
daf2f9f HDDS-11863. Speed up TestFSORepairTool (apache#7561)
f5e5493 HDDS-11927. Fix flaky TestContainerBalancerStatusInfo.testGetCurrentStatisticsWhileBalancingInProgress (apache#7579)
bef2415 HDDS-11940. Bump jline to 3.28.0 (apache#7576)
1453fd9 HDDS-11935. Bump develocity-maven-extension to 1.23 (apache#7577)
202b0c7 HDDS-11860. Improve BufferUtils.writeFully. (apache#7564)
008f9a6 HDDS-11852. Reduce duplication in some GenericCli subclasses (apache#7553)
7a46080 HDDS-11914. Snapshot diff should not filter SST Files based by reading SST file reader (apache#7563)
1835326 HDDS-11927. Mark testGetCurrentStatisticsWhileBalancingInProgress as flaky
66ccc25 HDDS-11908. Snapshot diff DAG traversal should not skip node based on prefix presence (apache#7567)
16ba289 Revert "HDDS-11413. PipelineManagerImpl lock optimization reduces AllocateBlock latency (apache#7160)"
bf6f323 HDDS-11413. PipelineManagerImpl lock optimization reduces AllocateBlock latency (apache#7160)
853d657 HDDS-11893. Fix full snapshot diff fallback logic because of DAG pruning (apache#7549)
b5d04e2 HDDS-11915. Netty OpenSsl not available in acceptance tests on arm64 (apache#7570)
8536490 HDDS-11367. Fix flaky balancer robot test (apache#7569)
745ed1c HDDS-11367. Improve ozone balancing status command output (apache#7139)
6b9cbe0 HDDS-11909. Intermittent timeout building Hadoop in s3a test (apache#7559)
eea5600 HDDS-11911. Return consistent error code when snapshot is not found in the DB or Snapshot Chain. (apache#7557)
e8f3b25 HDDS-11873. Skip old-only xcompat read tests (apache#7534)
ec348a7 HDDS-11889. Include Maven dependencies for hdds-rocks-native in cache (apache#7546)
aa37ae8 HDDS-11885. Download Hadoop for S3A test from mirrors if available (apache#7545)
befd64e HDDS-11694. Safemode Improvement: Introduce factory class to create safemode rules. (apache#7433)
a46153d HDDS-11872. Disable Apache snapshots repo (apache#7536)
092b000 HDDS-11890. Update project description in GitHub (apache#7547)
80c6446 HDDS-8101. Add tool to repair broken FSO tree (apache#7368)
23197e2 HDDS-11605. Directory deletion service should support multiple threads (apache#7349)
055b13c HDDS-11751. Use Java 21 in CI (apache#7458)
d0d82c5 HDDS-11886. Bump license-maven-plugin to 2.5.0 (apache#7539)
f7fe30a HDDS-11691. Support object tags in ObjectEndpointStreaming#put (apache#7543)
9854591 HDDS-11882. Make BOM, not aggregate one (apache#7544)
af345b2 HDDS-11877. Enable Maven cache for more checks (apache#7538)
51c6ed6 HDDS-11830. Subcommands should not extend GenericCli. (apache#7537)
959a39d HDDS-11851. Finer-grained subcommand interface for OzoneDebug and OzoneRepair. (apache#7526)
d4c41e5 HDDS-11334. Improve EC xcompat acceptance test (apache#7492)
8526f2e HDDS-11826. Interactive mode for ozone shell. (apache#7515)
fc63710 HDDS-11833. Return NotImplemented for S3 put-object-acl request. (apache#7531)
e8ad7ad HDDS-11782. ozone debug ldb --with-keys defaults to false instead of true (apache#7521)
cb0a402 HDDS-11859. Remove mention of fuse from s3 interface docs page (apache#7530)
e17f92c HDDS-10821. Ensure ChunkBuffer fully writes buffer to FileChannel (apache#6652)
d66c088 HDDS-11848. Serialization bug in Recon listKeys API (apache#7524)
faab1e8 HDDS-11656. Default native ACL limits to user and user's primary group (apache#7455)
8a1967e HDDS-11719. Remove dependency on server components from ozonefs-common (apache#7438)
9fcecc1 HDDS-11855. Mark TestContainerBalancerDatanodeNodeLimit#checkIterationResultException as flaky
65df308 HDDS-11794. Display HostName in OM / SCM Overview. (apache#7482)
0bde3a2 HDDS-11410. Refactoring more tests from TestContainerBalancerTask (apache#7156)
98e070e HDDS-11728. Refactor subcommand layouts of ozone debug and repair (apache#7489)
1c5676b HDDS-11849. Mark TestBlockOutputStreamWithFailures.test2DatanodesFailure as flaky
9dd8bb9 HDDS-11847. Mark TestSnapshotDeletingServiceIntegrationTest#testParallelExcecutionOfKeyDeletionAndSnapshotDeletion as flaky
b27714f HDDS-11266. Update proto.lock for Ozone 1.4.1 (apache#7504)
9b61937 HDDS-11806. Add HttpFS and Recon in getting-started k8s example (apache#7485)
77ce962 HDDS-10568. When the ldb command is executed, it is output by line (apache#7467)
69538b0 HDDS-11687. Robot warning: replace "is not" with "!=" (apache#7516)
b60d897 HDDS-11820. Create test principals at test run time (apache#7507)
6ba309a HDDS-11831. Finer-grained interface for dynamically registered subcommands (apache#7514)
db36e39 HDDS-11822. Register subcommands in OzoneShell (apache#7513)
e6bd3f5 HDDS-11829. Bump zstd-jni to 1.5.6-8 (apache#7510)
09a4a90 HDDS-11827. Bump exec-maven-plugin to 3.5.0 (apache#7512)
ad867bc HDDS-11824. Bump sqlite-jdbc to 3.47.1.0 (apache#7511)
850306d HDDS-11823. Bump cyclonedx-maven-plugin to 2.9.1 (apache#7508)
34f9d9e HDDS-11742. Update metrics with leaderId if known when starting SCM (apache#7471)
2c6c116 HDDS-11265. Add Ozone 1.4.1 to compatibility acceptance tests (apache#7503)
ebcdc6a HDDS-11810. Secure acceptance test on arm64 fails with LoginException: Checksum failed (apache#7498)
7f40624 HDDS-11821. Mark TestECKeyOutputStream#testECKeyCreatetWithDatanodeIdChange as unhealthy
9b26156 HDDS-11811. rocksdbjni deleted on exit could be used by other components apache#7493
3f92663 HDDS-11785. DataNode aborts ContainerStateMachine if it does not know any follower next index (apache#7480)
f0a2c87 HDDS-11773. Prevent frequent DataNode Ratis snapshotting. (apache#7473)
1383c18 HDDS-11718. Some CI checks passing despite error (apache#7483)
f98eac2 HDDS-11561. Refactor Open Key Search Endpoint and Consolidate with OmDBInsightEndpoint Using StartPrefix Parameter. (apache#7336)
a99ab27 HDDS-11243. SCM SafeModeRule Support EC. (apache#7008)
6871547 HDDS-11716. Address Incomplete Upgrade Scenario in Recon Upgrade Framework (apache#7452)
9bc9145 HDDS-10411. Support incremental ChunkBuffer checksum calculation (apache#7189)
579a38e HDDS-11723. Tool to better micro benchmark hbase performance in Ozone (apache#7463)
cc1a374 HDDS-11704. Hadoop test leaves running containers in case of failure (apache#7435)
c4b2056 Update documentation to mention that container schemaV3 is default (apache#7481)
20c4cfa HDDS-11386. Multithreading bug in ContainerBalancerTask (apache#7339)
b090312 HDDS-11780. Increase client write retry when SCM is in safe mode (apache#7470)
f8e4db9 HDDS-11793. Bump maven-checkstyle-plugin to 3.6.0 (apache#7476)
6a1ff84 HDDS-11791. Bump commons-io to 2.18.0 (apache#7478)
d7f8235 HDDS-11788. Bump log4j2 to 2.24.2 (apache#7479)
6547de7 HDDS-11790. Bump commons-lang3 to 3.17.0 (apache#7475)
1d8abd6 HDDS-11789. Bump zstd-jni to 1.5.6-7 (apache#7477)
6ca7230 HDDS-11769. Add tools folder into ozone src package. (apache#7466)
3b8ed58 HDDS-11682. Bump maven-resources-plugin to 3.3.0 (apache#7384)
f4a9ee0 HDDS-11702. Merge test_bucket_encryption into robot compatibility test (apache#7451)
d6a5488 HDDS-11713. Use seek to reach the start transaction. (apache#7460)
d52615a HDDS-11733. Remove okio versioning and unused okhttp dependency (apache#7447)
1a49991 HDDS-11617. Update hadoop to 3.4.1 (apache#7376)
9945de6 HDDS-11667. Validating DatanodeID on any request to the datanode (apache#7418)
fc6a2ea HDDS-11650. ContainerId list to track all containers created in a datanode (apache#7402)
a8db9cd HDDS-11749. Extract moveToTrash implementation to client code (apache#7453)
3ba3474 HDDS-11755. mktemp --suffix does not work on Mac (apache#7457)
433c7bb HDDS-11729. Update skipRecon property to skip only frontend build (apache#7454)
6b40003 HDDS-11739. Extract generic unmarshaller for S3 requests (apache#7449)
c7f65e7 HDDS-11740. Add debug command to show internal component versions (apache#7450)
0f7104e HDDS-11708. Recon ListKeys API should return a proper http response status code if NSSummary rebuild is in progress. (apache#7437)
0e0d5e9 HDDS-11163. Improve Heatmap page UI (apache#7420)
2cef393 HDDS-11696. Limit max number of entries in list keys/status response (apache#7431)
e96e314 HDDS-11697. Integrate Ozone Filesystem Implementation with Ozone ListStatusLight API (apache#7440)
ebcbce7 HDDS-11644. Close OMLayoutVersionManager (apache#7445)
20e4969 HDDS-11737. UnsupportedOperationException in S3 setBucketAcl (apache#7448)
b252181 HDDS-10804. Include only limited set of ports in Pipeline proto (apache#6655)
79ca956 HDDS-8829. Symmetric Keys for Delegation Tokens (apache#7394)
3e798e6 HDDS-11698. Use hadoop images from GitHub in CI (apache#7432)
3e278b7 HDDS-10655. Support PutObjectTagging, GetObjectTagging, and DeleteObjectTagging (apache#6756)
036e727 HDDS-11732. Fix ACL check on bucket resolution while reading from snapshot (apache#7446)
dbda703 HDDS-11736. Bump maven-javadoc-plugin to 3.11.1 (apache#7444)
238f232 HDDS-11692. Skip spotbugs for modules with only generated code. (apache#7428)
f60ad61 HDDS-11705. Snapshot operations on linked buckets should work on actual underlying bucket (apache#7434)
dd22dbe HDDS-11615. Add Upgrade Action for Initial Schema Constraints for Unhealthy Container Table in Recon. (apache#7372)
4066c7c HDDS-117. Add convenience methods for port management in DatanodeDetails (apache#7408)
12419fa HDDS-11695. SCM follower should not log NotLeaderException during Pipeline Report processing. (apache#7430)
5275ded HDDS-10133. Add a method to check key name in OMKeyRequest (apache#6012)
fd5c6d8 HDDS-11689. Extract scheduled workflow for populate-cache (apache#7429)
889ba80 HDDS-11653. Bump Ratis to 3.1.2 (apache#7427)
47ec4dd HDDS-11671. Refer to website for supported versions (apache#7412)
10cac80 HDDS-11686. Use ozone image from GitHub in CI (apache#7425)
aa6da3e HDDS-9781. Limited maxOpenFiles, disabled enableCompactionDag, and createCheckpointDirs when creating OMMetadataManager instance for bootstrapping (apache#7095)
9dd6a83 HDDS-11645. Mark TestReconScmSnapshot#testExplicitRemovalOfNode as flaky
8e617dc HDDS-11672. Mark TestSnapshotBackgroundServices#testCompactionLogBackgroundService as flaky
d09e6d4 HDDS-11646. Mark TestXceiverClientMetrics#testMetrics as flaky
8e4a508 HDDS-11668. Recon List Keys API: Reuse key prefix if parentID is the same (apache#7410)
ee63232 HDDS-11684. Remove suppression of HiddenField (apache#7423)
a33d8a3 HDDS-10166. Replace GenericTestUtils temporary directories with `@TempDir` (apache#7399)
3a18a9d HDDS-11664. Hadoop download failure not reported as error (apache#7421)
47c2409 HDDS-64. OzoneClientException should extend IOException. (apache#7403)
27fcd0c HDDS-11685. Use ozone-testkrb5 from GitHub (apache#7424)
5663971 HDDS-11665. Minor optimizations on the write path (apache#7407)
cb81f0c HDDS-11683. Skip shade in most integration checks (apache#7422)
952e0ec HDDS-11681. Bump Bouncy Castle to 1.79 (apache#7387)
2797c45 HDDS-11677. Bump sqlite-jdbc to 3.47.0.0 (apache#7413)
358534b HDDS-11675. Bump maven-site-plugin to 3.21.0 (apache#7414)
cf79245 HDDS-11674. Bump junit to 5.11.3 (apache#7415)
6dd566f HDDS-11583. Use ozone-runner from GitHub in CI (apache#7409)
ef2bf98 HDDS-11669. In OmUtils.normalizeKey isDebugEnabled should be evaluated first (apache#7411)
a7e3014 HDDS-11660. Recon List Key API: Reduce object creation and buffering memory (apache#7405)
5d18b9c HDDS-11659. Improve HSync compatibility test (apache#7404)
4e603aa HDDS-11462. Enhancing DataNode I/O Monitoring Capabilities. (apache#7206)
18f6e8a HDDS-11311. Added Compatibility test for HSync (apache#7400)
0415c0b HDDS-11649. Recon ListKeys API: Simplify filter predicates (apache#7395)
2547ac0 HDDS-11652. Fix SCM start command in SCM-HA doc (apache#7398)
c045839 HDDS-11623. Improve OM Ratis Configuration change log message (apache#7388)
2b1524b HDDS-11609. Switch to Recon v2 UI as the default UI (apache#7358)
efe5892 HDDS-11641. Allow testing Hadoop with custom docker images (apache#7393)
c055036 HDDS-11637. Compile failure is ignored in build check (apache#7389)
67e5261 HDDS-11563. Display OM/SCM service ID as Namespace in web UI (apache#7321)
0fb5e50 HDDS-11587. Ozone Manager not processing file put requests with multi-tenancy enabled (apache#7316)
786bb49 HDDS-11642. MutableQuantiles should be stopped (apache#7392)
76ec9b9 HDDS-11639. Upgrade ozone-runner to Rocky Linux 9.3 (apache#7391)
3bc3b8a HDDS-11621. Fix missing HADOOP_ variables in MR acceptance test (apache#7375)
6f9db61 HDDS-11200. Hsync client-side metrics (apache#7371)
58d1443 HDDS-10240. Cleanup zero-copy EC (apache#7340)
5b065d8 HDDS-11638. Bump cyclonedx-maven-plugin to 2.9.0 (apache#7383)
c7a196f HDDS-11635. Memory leak when using Ozone FS via Hadoop FileContext API (apache#7382)
c9956a1 HDDS-11601. Intermittent failure in acceptance balancer test. (apache#7343)
a737fc3 HDDS-11619. Remove dependency on hadoop-shaded-guava (apache#7373)
c4d6857 HDDS-11584. Document ozone debug ldb command (apache#7313)
dded26e HDDS-11588. Add main artifact jar to classpath file (apache#7324)
afed6d9 HDDS-11558. Make OM client retry idempotent (apache#7329)
e85b32d HDDS-11591. Copy dependencies when building each module (apache#7325)
72e56d7 HDDS-11601. Disable flaky EC balancer acceptance test
ab16cbe HDDS-11507. Add error information to log while handling ServiceException. (apache#7367)
980b960 HDDS-11380. Make node decommission error message more comprehensive (apache#7155)
61c094f HDDS-11614. Speed up TestTransferLeadershipShell (apache#7370)
91188b3 HDDS-11352. Remove Flaky annotation from TestOzoneManagerHAWithStoppedNodes using Ratis 3.1.1
faf133d HDDS-11220. Initialize block length using the chunk list from DataNode before seek (apache#7221)
91d41a0 HDDS-11465. Introducing Schema Versioning for Recon to Handle Fresh Installs and Upgrades. (apache#7213)
7a27db2 HDDS-11134. Create compatibility test for FSO bucket usage (apache#7350)
30906d1 HDDS-11612. Bump jnr-posix to 3.1.20 (apache#7360)
bed4aef HDDS-11611. Bump docker-maven-plugin to 0.45.1 (apache#7362)
e2c3d57 HDDS-11610. Bump maven-dependency-plugin to 3.8.1 (apache#7361)
24c1000 HDDS-11041. Add admin request filter for S3 requests and UGI support for GrpcOmTransport (apache#7268)
0b84998 HDDS-11594. Update batchPut buffer log for rocksdb. (apache#7356)
c013516 HDDS-11608. Client should not retry invalid protobuf request (apache#7354)
32a8c09 HDDS-11160. Improve Insights page UI (apache#7327)
782ad62 HDDS-11388. Fix unnecessary call to the DB for ContainerBalancer#getBalancerStatusInfo (apache#7224)
35b6a3a HDDS-11600. Intermittent failure in repro due to ordering differences in builddef.lst (apache#7342)
e7bf154 HDDS-11132. Revert client version bump done as part of HDDS-10983 (apache#7348)
ea5cbff HDDS-11602. Bump ozone-runner to 20241022-jdk17-1 (apache#7347)
3f98df5 HDDS-11580. Validate 'hdds.datanode.dir.du.reserved' property (apache#7328)
8568075 HDDS-11570. Fix HDDS Docs build failure with Hugo v0.135.0 (apache#7337)
721ae58 HDDS-11057. Enable reproducible builds (apache#6856)
86b7aae HDDS-11205. Implement a search feature for users to locate keys pending Deletion within the OM Deleted Keys Insights section (apache#6969)
f7b428d HDDS-11503. Add Robot test to verify Container Balancer for EC containers. (apache#7311)
85eb89b HDDS-11483. Make s3g object get and put operation buffer configurable (apache#7233)
515977a HDDS-11582. Bump body-parser to 1.20.3 (apache#7307)
9b66267 HDDS-11589. ReconSCMDBDefinition should be singleton. (apache#7323)
f784a84 HDDS-11578. Unify constants for RATIS_SNAPSHOT_DIR (apache#7310)
4670a5e HDDS-11498. Improve SCM deletion efficiency. (apache#7249)
3fb2cf0 HDDS-11108. Extract keywords for multipart upload tests (apache#7318)
4b24aa9 HDDS-11545. [UI] Add OM and SCM ID information (apache#7287)
860e269 HDDS-11538. Let coverage report link to java sources (apache#7280)
2139367 HDDS-11581. Remove duplicate ContainerStateMachine#RaftGroupId (apache#7312)
64e035d HDDS-11573. Remove lib/gson-2.10.1.jar (apache#7309)
ce07a3c HDDS-11456. Require successful dependency/licence checks for acceptance/compile/kubernetes (apache#7209)
c579d06 HDDS-11574. Ozone client leak in TestS3SDKV1 (apache#7308)
c044b79 HDDS-10390. MiniOzoneCluster to support S3 Gateway (apache#7281)
8eef589 HDDS-11557. Simplify DBColumnFamilyDefinition. (apache#7298)
4c77f6b HDDS-11562. Parameterize TestSCMNodeManager#testProcessLayoutVersion (apache#7300)
b51c4b3 HDDS-11572. Bump commons-io to 2.17.0 (apache#7305)
3a37870 HDDS-11571. Bump log4j2 to 2.24.1 (apache#7301)
494798c HDDS-11564. Mark TestBlockOutputStream#testWriteExactly... as flaky
fabf512 HDDS-11569. Bump restrict-imports-enforcer-rule from 2.5.0 to 2.6.0 (apache#7303)
1e62a0a HDDS-11568. Bump commons-codec to 1.17.1 (apache#7304)
e9f92a7 HDDS-11567. Bump common-custom-user-data-maven-extension to 2.0.1 (apache#7302)
cb44d5e HDDS-11555. SCMDBDefinition should be singleton. (apache#7296)
d473134 HDDS-11486. Reduce log level for NativeLibraryNotLoadedException in SnapshotDiffManager (apache#7290)
3348d91 HDDS-11564. Mark TestBlockOutputStream as flaky
e2f2aeb HDDS-11548. Add some logging to the StateMachine (apache#7291)
523c860 HDDS-11439. De-duplicate code for ReplicatedFileChecksumHelper and ECFileChecksumHelper (apache#7264)
05a409e HDDS-11519. Clean up unused lines in BlockOutputStream
ffe7198 HDDS-11544. Improve work with arrays (apache#7286)
5657604 HDDS-11556. Add a getTypeClass method to Codec. (apache#7295)
256aad9 HDDS-11546. Add regex operation to filter option of ldb scan command. (apache#7289)
7ef7de2 HDDS-11482. EC Checksum throws IllegalArgumentException because the buffer limit is negative (apache#7230)
77c17df HDDS-11551. Provide details about integration check failure (apache#7294)
911a583 HDDS-8188. Support max allowed length in response of ozone admin container list (apache#7181)
7f2e0e3 HDDS-11554. OMDBDefinition should be singleton. (apache#7292)
170761c HDDS-11547. Make MAVEN_OPTS optional (apache#7288)
4846e97 HDDS-11543. Track OzoneClient object leaks via LeakDetector framework. (apache#7285)
e00f7ae HDDS-11159. Improve Containers page UI (apache#7267)
cfda951 HDDS-11520. Fix Delete pending directories key mapping (apache#7269)
2e3de8a HDDS-11476. Implement lesser/greater operation for --filter option of ldb scan command (apache#7222)
06ccdb3 HDDS-11526. Fix hdds.datanode.metadata.rocksdb.cache.size default value mismatch (apache#7284)
b3afaec HDDS-11535. Incomplete SCM roles table header (apache#7278)
ed2a073 HDDS-11536. Bump macOS runner version to macos-13 (apache#7279)
1887f83 HDDS-11537. Bump frontend-maven-plugin to 1.15.1 (apache#7276)
1f1e618 HDDS-6776. Cleanup TestSCMSafeModeManager (apache#7272)
4bee3e9 HDDS-11534. Bump cyclonedx-maven-plugin to 2.8.2 (apache#7277)
789fb53 HDDS-11533. Bump maven-gpg-plugin to 3.2.7 (apache#7275)
eb26677 HDDS-11268. Add --table mode for OM/SCM Roles CLI (apache#7016)
28ea480 HDDS-11527. Avoid unnecessary duplicate build (apache#7270)
30da31f HDDS-3498. Shutdown datanode if address is already in use (apache#7256)
2401d27 HDDS-11046. Coverage decreased due to running tests with Java 17 (apache#7263)
78d8418 HDDS-11524. Bump snappy-java to 1.1.10.7 (apache#7202)
8747c0e HDDS-11518. Recon OmDB Insights show isKey=true for directories (apache#7260)
5d2bbc3 HDDS-11480. Refactor OM volume response tests (apache#7265)
31f9f2c HDDS-11517. Update version to 2.0.0-SNAPSHOT (apache#7258)
a0f0872 HDDS-11444. Make Datanode Command metrics consistent across all commands (apache#7191)
d3b63c6 HDDS-11492. Directory deletion get stuck having millions of directory (apache#7254)
f52f0af HDDS-11127. [hsync] Improve test coverage for XceiverClientRatis.java (apache#7225)
360fea5 HDDS-11494. Improve the duration option of freon ombg (apache#7246)
10d3b21 HDDS-11504. Update Ratis to 3.1.1. (apache#7257)
ce46297 HDDS-11162. Improve Disk Usage page UI (apache#7214)
c91f1c7 HDDS-11491. Avoid sharing clientId among deleting services (apache#7250)
b0943d5 HDDS-11501. Improve logging in XceiverServerRatis (apache#7252)
55925ab HDDS-11502. Class path contains multiple SLF4J providers (apache#7255)
d0ad836 HDDS-11472. Avoid recreating external access authorizer on OM state reload (apache#7238)
254db9e HDDS-11500. RootCARotationManager cancelling wrong task in notifyStatusChanged (apache#7251)
1e6e4b3 HDDS-11499. Remove redundant code from ECReconstructionCoordinator. (apache#7248)
adb2821 HDDS-11490. Bump rollup to 3.29.5 (apache#7232)
189a9fe HDDS-11484. Validate javadoc in CI (apache#7245)
64a29c6 HDDS-11497. Bump commons-configuration2 to 2.11.0 (apache#7242)
95cfadd HDDS-11496. Bump maven-install-plugin to 3.1.3 (apache#7244)
0a999cf HDDS-11493. Bump sqlite-jdbc to 3.46.1.3 (apache#7243)
a214a31 HDDS-11329. Update Ozone images to Rocky Linux-based runner (apache#7241)
56ddb85 HDDS-11371. Handle cases where OM does not have getServerDefaults() implemented. (apache#7130)
b5097c7 HDDS-11347. Add rocks_tools_native lib check in Ozone CLI checknative subcommand (apache#7101)
fb0bf77 HDDS-11489. Bump maven-site-plugin to 3.20.0 (apache#7226)
70e6e40 HDDS-11122. Fix javadoc warnings (apache#7234)
acf3fdc HDDS-11458. Selective checks: trigger checkstyle for properties file changes (apache#7196)
6b87207 HDDS-11469. Statistics of Pipeline and Container (apache#7217)
1b8468b HDDS-11411. Snapshot garbage collection should not run when the keys are moved from a deleted snapshot to the next snapshot in the chain (apache#7193)
1f86ce8 HDDS-10617. Unexpected number of files in ITestS3AContractGetFileStatusV1List (apache#7208)
73a3bcc HDDS-11467. Bump vite to 4.5.5 (apache#7212)
d45aa1d HDDS-11460. Bump express to 4.21.0 (apache#7197)
e2e30b8 HDDS-11354. Intermittent failure in TestOzoneManagerSnapshotAcl#testLookupKeyWithNotAllowedUserForPrefixAcl (apache#7205)
0fcb645 HDDS-11477. [doc] Add configuration description for datanode docs (apache#7223)
3598ee3 HDDS-11464. Removed unused constants from OzoneConsts. (apache#7207)
8c0b54e HDDS-11408. Snapshot rename table entries are propagated incorrectly on snapshot deletes (apache#7200)
719bdf9 HDDS-11396. NPE due to empty Handler#clusterId (apache#7145)
40c4001 HDDS-10479. Add ozone admin ratis local raftMetaConf (apache#7170)
45f9138 HDDS-11394. Fix pipeline close --all command (apache#7138)
2b196d1 HDDS-11468. Enabled DB sync button (apache#7216)
d3899d2 Clean up files created after TestKeyValueHandlerWithUnhealthyContainer#testMarkContainerUnhealthyInFailedVolume (apache#7219)
70b8dd5 HDDS-11157. Improve Datanodes page UI (apache#7168)
151709a HDDS-11446. Downgrade picocli to 4.7.5 due to regression (apache#7215)
7a26aff HDDS-11158. Improve Pipelines page UI (apache#7171)
c365aa0 HDDS-11181. Cleanup of unnecessary try-catch blocks (apache#7210)
88dd436 HDDS-11423. Implement equals operation for --filter option to ozone ldb scan (apache#7167)
e0060a8 HDDS-11196. Improve SCM WebUI Display (apache#6960)
22ddfb9 Revert "HDDS-11456. Require successful dependency/licence checks for acceptance/compile/kubernetes (apache#7192)"
9f5bf43 HDDS-11457. Internal error on S3 CompleteMultipartUpload if parts are not specified (apache#7195)
10c47a1 HDDS-11459. Bump develocity-maven-extension to 1.22.1 (apache#7201)
50f2563 HDDS-11419. Fix waitForCheckpointDirectoryExist log message (apache#7199)
a7d7e37 HDDS-11456. Require successful dependency/licence checks for acceptance/compile/kubernetes (apache#7192)
5feb9ea HDDS-11453. OmSnapshotPurge should be in a different ozone manager double buffer batch (apache#7188)
703c4d5 HDDS-10984. Tool to restore SCM certificates from RocksDB. (apache#6781)
d221065 HDDS-11440. Add a lastTransactionInfo field in SnapshotInfo to check for transactions in flight on the snapshot (apache#7179)
e573701 HDDS-11448. Improve documentation in ContainerStateMachine (apache#7183)
0e49f7a HDDS-11449. Remove unnecessary log from client console. (apache#7184)
cd251f2 HDDS-11438. Ensure DataInputBuffer is closed in OMPBHelper#convert (apache#7182)
4b47812 HDDS-11389. Incorrect number of deleted containers shown in Recon UI. (apache#7149)
0915f0b HDDS-10985. EC Reconstruction failed because the size of currentChunks was not equal to checksumBlockDataChunks. (apache#7009)
0f16195 HDDS-11416. refactor ratis submit request avoid code duplicate (apache#7166)
86fe920 HDDS-11376. Improve ReplicationSupervisor to record replication metrics (apache#7140)
883a63f HDDS-11441. ozone sh key put should only accept positive expectedGeneration (apache#7180)
33dbd4a HDDS-11357. Datanode Usageinfo Support Display Pipeline. (apache#7105)
9477aa6 HDDS-11436. Minor update in Recon API handling. (apache#7178)
8ca33c7 HDDS-11414. Key listing for FSO buckets fails with forward client (apache#7161)
f1ebd39 HDDS-11435. Bump sqlite-jdbc to 3.46.1.0 (apache#7174)
aaf8bd0 HDDS-11434. Bump log4j2 to 2.24.0 (apache#7176)
3510ce7 HDDS-11433. Bump Jetty to 9.4.56.v20240826 (apache#7175)
0047cd2 HDDS-11400. Bump maven-core to 3.9.9 (apache#7144)
274da83 HDDS-10488. Datanode OOM due to run out of mmap handler (apache#6690)
7a452ca HDDS-11391. Addendum to fix test failure.
7e1d9b0 HDDS-11145. ozone admin om cancelprepare --service-id improvement (apache#7159)
6888cf2 HDDS-11383. Improve read key dashboard to include add the read key related OM metrics. (apache#7131)
3e0d76c HDDS-11369. [hsync] Remove KeyOutputStreamSemaphore logs (apache#7136)
b23981c HDDS-11342. [hsync] Add a config as HBase-related features master switch (apache#7126)
3e1188a HDDS-11285. cli to trigger quota repair and status (apache#7104)
2e33978 HDDS-11401. Code cleanup in DatanodeStateMachine (apache#7146)
f563d67 HDDS-11391. Frequent Ozone DN Crashes During OM + DN Decommission with Freon (apache#7154)
18b28d2 HDDS-11312. [hsync] Added upgrade tests (apache#7110)
b29beb3 HDDS-11350. NullPointerException thrown on checking container balancer status (apache#7134)
111b9df HDDS-11407. Use OMLayoutFeature.HBASE_SUPPORT for HSYNC (apache#7152)
966b8d0 HDDS-11390. Removed hsync and hflush capability check in ContentGenerator (apache#7153)
877504a HDDS-11156. Improve Buckets page UI (apache#7100)
814f78f HDDS-11392. ChecksumByteBufferImpl's static initializer fails with java 17+ (apache#7135)
b5e1a8b HDDS-11398. Bump commons-compress to 1.27.1 (apache#7142)
a8e3ea9 HDDS-11397. Bump Jersey2 to 2.45 (apache#7141)
5992837 HDDS-11399. Bump maven-deploy-plugin to 3.1.3 (apache#7143)
47564bb HDDS-11359. Intermittent timeout in TestPipelineManagerMXBean#testPipelineInfo (apache#7132)
cc4e026 HDDS-11304. Make up for the missing functionality in CommandDispatcher (apache#7062)
2d372f6 HDDS-11339. Let PrometheusServlet rely on periodically published metrics (apache#7092)
f22c6f8 HDDS-11164. Improve Navbar UI (apache#7088)
23211c1 HDDS-11381. Adding logging for sortByDistanceCost in NetworkTopologyImpl (apache#7133)
3e9cdb6 HDDS-11378. Allow disabling OM version-specific feature via config (apache#7129)
23f3e5b HDDS-11152. OMDoubleBuffer error when handling snapshot's background operations (apache#7112)
5659b7e HDDS-11375. DN startup fails due to illegal configuration of raft.grpc.message.size.max (apache#7128)
41d8147 HDDS-11368. Remove dependency on Babel in Vite (apache#7119)
0bd8ba1 HDDS-11372. No coverage for org.apache.ozone packages (apache#7124)
3bd237d HDDS-11325. (addendum) Intermittent failure in TestBlockOutputStreamWithFailures#testContainerClose (apache#7121)
8306290 HDDS-11373. Log for EC reconstruction command lists the missing indexes as ASCII control characters (apache#7123)
dab1538 HDDS-11216. Replace HAUtils#buildCAX509List usages with other direct usages (apache#6981)
51a5fb9 Revert "HDDS-11235. Spare InfoBucket RPC call in FileSystem#mkdir() call. (apache#6990)" (apache#7122)
fab56b4 HDDS-11229. Chain optionals in Recon Insight (apache#7064)
2236041 HDDS-11365. Fix the NOTICE file (apache#7120)
2e30dc1 HDDS-11190. Add --fields option to ldb scan command (apache#6976)
0b75cb0 HDDS-11251. Deprecate definitions and remove listTrash and recoverTrash APIs (apache#7060)
be34303 HDDS-9198. Maintain local cache in OMSnapshotPurgeRequest to get updated snapshotInfo and pass the same to OMSnapshotPurgeResponse (apache#7045)
c07b408 HDDS-11208. Change RatisBlockOutputStream to use HDDS-11174. (apache#7072)
8f8d809 HDDS-11309. Increase CONTAINER_STATE Column Length in UNHEALTHY_CONTAINERS to Avoid Truncation (apache#7071)
350a340 HDDS-11364. Bump jgraphx to 3.9.12 (apache#7116)
45b7056 HDDS-11363. Bump develocity-maven-extension to 1.22 (apache#7115)
9dd18f1 HDDS-11362. Bump snappy-java to 1.1.10.6 (apache#7114)
637cb91 HDDS-11361. Bump Jersey2 to 2.44 (apache#7113)

Conflicts:
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/ozoneimpl/OzoneContainer.java
hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/keyvalue/TestKeyValueHandlerWithUnhealthyContainer.java
hadoop-ozone/dist/src/main/smoketest/admincli/container.robot

Modified during conflict:
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/checksum/DNContainerOperationClient.java
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/checksum/ReconcileContainerTask.java
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/client/rpc/TestECKeyOutputStream.java