HDDS-11948. [Docs] [User Guide] DistCP integration. #7588
base: master
Change-Id: I4c2e93c00b712b47720ef707d8e8830cbde0e5f2
Change-Id: I6f978c39e409bfb20ae3acce6e70bb95819ab707
Thanks for adding this @jojochuang
weight: 4
menu:
main:
parent: "Application integrations"
Nit: use title case for the parent section: "Application Integration".
# Hadoop DistCp
The Hadoop DistCp is a command-line, MapReduce-based tool for bulk data copying.
We can link to the Hadoop docs right at the top instead of waiting until the end of the document. We may still want to keep the link in the last sentence of the doc for clarity.
[Hadoop DistCp](https://hadoop.apache.org/docs/current/hadoop-distcp/DistCp.html) is a command-line, MapReduce-based tool for bulk data copying.
</property>
``` | ||
Next, define their logical mappings. For more details, refer to the [OM High Availability]({{< ref "OM-HA.md" >}}).
Next, define their logical mappings. For more details, refer to [OM High Availability]({{< ref "OM-HA.md" >}}).
DistCp compares source and destination file checksums to verify file integrity. However, because the default checksum type of HDFS (`CRC32C`) differs from that of Ozone (`CRC32`), this checksum comparison fails and causes the DistCp job to fail.
To prevent job failures, specify checksum options in the DistCp command to force Ozone to use the same checksum type as HDFS. For example:
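A minimal sketch of such commands, assuming the HDFS properties `dfs.checksum.combine.mode` and `dfs.checksum.type` and the Ozone client property `ozone.client.checksum.type`. The `hdfs://` and `ofs://` paths are hypothetical placeholders. The Ozone-to-HDFS variant follows the review thread's suggestion of setting `-Ddfs.checksum.type` and is an untested assumption. The script only prints the commands, so verify the property names against your Hadoop and Ozone versions before running them.

```bash
#!/usr/bin/env bash
# Sketch: build DistCp commands whose checksum settings match on both ends.
# Assumption: the paths below are hypothetical placeholders.
set -euo pipefail

HDFS_PATH="hdfs://ns1/tmp/data"
OZONE_PATH="ofs://ozone1/vol1/bucket1/data"

# HDFS -> Ozone: compare file checksums as composite CRCs and make the
# Ozone client write CRC32C, the HDFS default checksum type.
HDFS_TO_OZONE=(
  hadoop distcp
  -Ddfs.checksum.combine.mode=COMPOSITE_CRC
  -Dozone.client.checksum.type=CRC32C
  "$HDFS_PATH" "$OZONE_PATH"
)

# Ozone -> HDFS (assumption from the review thread): make HDFS write
# CRC32, the Ozone default checksum type.
OZONE_TO_HDFS=(
  hadoop distcp
  -Ddfs.checksum.combine.mode=COMPOSITE_CRC
  -Ddfs.checksum.type=CRC32
  "$OZONE_PATH" "$HDFS_PATH"
)

# Dry run: print both commands. To execute one, run "${HDFS_TO_OZONE[@]}".
echo "${HDFS_TO_OZONE[*]}"
echo "${OZONE_TO_HDFS[*]}"
```

Building each command as a Bash array keeps every `-D` option and path a separate argument, even if a path contains spaces.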
Is this also required if the source and destination are reversed?
Yes. I haven't tried it, but it will require `-Ddfs.checksum.type` to specify the HDFS checksum type.
OK. I tried it myself and updated the doc to include commands for both directions: Ozone to HDFS and HDFS to Ozone.
Change-Id: I8f34de3bc2f8b33463b17c9413d76d8b65d1bfe8
Change-Id: Ie16e7f508021dbb0b23dd55e83f654bfd37449de
Change-Id: Id3443092796f00890b44a07adce05055710f5d4e
What changes were proposed in this pull request?
HDDS-11948. [Docs] [User Guide] DistCP integration.
Please describe your PR in detail:
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-11948
How was this patch tested?
./hadoop-ozone/dev-support/checks/docs.sh