
HDDS-11948. [Docs] [User Guide] DistCP integration. #7588

Open
wants to merge 5 commits into
base: master
from

Conversation

jojochuang
Contributor

What changes were proposed in this pull request?

HDDS-11948. [Docs] [User Guide] DistCP integration.

Please describe your PR in detail:

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-11948

How was this patch tested?

./hadoop-ozone/dev-support/checks/docs.sh

[Two screenshots attached: 2024-12-17 at 10:42:47 AM and 2024-12-17 at 11:38:08 AM]

Change-Id: I4c2e93c00b712b47720ef707d8e8830cbde0e5f2
Change-Id: I6f978c39e409bfb20ae3acce6e70bb95819ab707
@adoroszlai added the `documentation` label (Improvements or additions to documentation) on Dec 18, 2024
@errose28 (Contributor) left a comment

Thanks for adding this @jojochuang

weight: 4
menu:
  main:
    parent: "Application integrations"

nit. Use title case for the parent section: "Application Integration"


# Hadoop DistCp

The Hadoop DistCP is a command line, MapReduce-based tool for bulk data copying.

We can link to the Hadoop docs right at the top instead of waiting until the end of the document. We may still want to keep the link in the last sentence of the doc as well, for clarity.

Suggested change
- The Hadoop DistCP is a command line, MapReduce-based tool for bulk data copying.
+ [Hadoop DistCP](https://hadoop.apache.org/docs/current/hadoop-distcp/DistCp.html) is a command line, MapReduce-based tool for bulk data copying.

</property>
```

Next, define their logical mappings. For more details, refer to the [OM High Availability]({{< ref "OM-HA.md" >}}).

Suggested change
- Next, define their logical mappings. For more details, refer to the [OM High Availability]({{< ref "OM-HA.md" >}}).
+ Next, define their logical mappings. For more details, refer to [OM High Availability]({{< ref "OM-HA.md" >}}).


DistCp performs a file checksum check to ensure file integrity. However, since the default checksum type of HDFS (`CRC32C`) differs from that of Ozone (`CRC32`), the file checksum check will cause the DistCp job to fail.

To prevent job failures, specify checksum options in the DistCp command to force Ozone to use the same checksum type as HDFS. For example:

Is this also required if the source and destination are reversed?

@jojochuang (Contributor Author), Dec 19, 2024

Yes. I haven't tried it, but it will require `-Ddfs.checksum.type` to specify the HDFS checksum type.

@jojochuang (Contributor Author)

OK, I tried it myself and updated the doc to include commands for both directions: Ozone to HDFS and HDFS to Ozone.
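
The updated commands are not shown in this thread. As a rough sketch only, the two directions discussed above might look like the following. The cluster names (`ns1`, `ozone1`), the paths, and the `distcp_cmd` helper are hypothetical, and `-Ddfs.checksum.combine.mode=COMPOSITE_CRC` is an assumption that this conversation does not mention. The script is a dry run: it prints the commands rather than launching a job.

```shell
#!/bin/sh
# Dry-run sketch: prints DistCp commands instead of running them.
# Hypothetical endpoints; replace with your own clusters and paths.
HDFS_PATH="hdfs://ns1/tmp/data"
OZONE_PATH="ofs://ozone1/vol1/bucket1/data"

# distcp_cmd CHECKSUM_OPT SRC DST (hypothetical helper, not part of Hadoop)
distcp_cmd() {
  # Assumption: composite CRC keeps file checksums comparable across
  # the two file systems; this option is not confirmed by the thread.
  echo "hadoop distcp -Ddfs.checksum.combine.mode=COMPOSITE_CRC $1 $2 $3"
}

# HDFS -> Ozone: have the Ozone client use CRC32C, the HDFS default.
HDFS_TO_OZONE=$(distcp_cmd "-Dozone.client.checksum.type=CRC32C" "$HDFS_PATH" "$OZONE_PATH")

# Ozone -> HDFS: have the HDFS client use CRC32, the Ozone default.
OZONE_TO_HDFS=$(distcp_cmd "-Ddfs.checksum.type=CRC32" "$OZONE_PATH" "$HDFS_PATH")

echo "$HDFS_TO_OZONE"
echo "$OZONE_TO_HDFS"
```

In both directions the idea is the same: the destination's client is told to write with the source's checksum type, so the post-copy checksum comparison has matching inputs.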

Change-Id: I8f34de3bc2f8b33463b17c9413d76d8b65d1bfe8
Change-Id: Ie16e7f508021dbb0b23dd55e83f654bfd37449de
Change-Id: Id3443092796f00890b44a07adce05055710f5d4e