Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BIGTOP-3921: Add ZSTD Codec Support for hadoop #1095

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

rzuo
Copy link
Contributor

@rzuo rzuo commented Mar 28, 2023

Description of PR

Add ZSTD codec compiled to hadoop native library by default.

How was this patch tested?

For code changes:

  • Does the title or this PR starts with the corresponding JIRA issue id (e.g. 'BIGTOP-3638. Your PR title ...')?
  • Make sure that newly added files do not have any licensing issues. When in doubt refer to https://www.apache.org/licenses/

@iwasakims
Copy link
Member

We need fix for toolchain too to support this on all supported distros/platforms.

@iwasakims
Copy link
Member

Is the Zstandard licensed under the one we can bundle it by default?
https://www.apache.org/legal/resolved.html

@rzuo
Copy link
Contributor Author

rzuo commented Mar 28, 2023

We need fix for toolchain too to support this on all supported distros/platforms.

In bigtop_toolchain/manifests/packages.pp, libzstd-devel has been included, so anything else need to modify?

Thanks
Robin

@rzuo
Copy link
Contributor Author

rzuo commented Mar 28, 2023

Is the Zstandard licensed under the one we can bundle it by default? https://www.apache.org/legal/resolved.html

By my understanding, this fix only make the hadoop native library compiled with pre-installed ZSTD binary, no ZSTD (GPL-2.0 license) code included in this repository.
If no zstd library present, this script will compile as before.

Thanks
Robin

@guyuqi
Copy link
Member

guyuqi commented Mar 28, 2023

Is the Zstandard licensed under the one we can bundle it by default? https://www.apache.org/legal/resolved.html

zsdt's license is BSD (https://github.com/facebook/zstd/blob/dev/LICENSE).
As far as I know, Apache 2.0 license is compatible with BSD. (Pls correct me if I had misunderstanding)

@rzuo
Copy link
Contributor Author

rzuo commented Mar 28, 2023

Is the Zstandard licensed under the one we can bundle it by default? https://www.apache.org/legal/resolved.html

zsdt's license is BSD (https://github.com/facebook/zstd/blob/dev/LICENSE). As far as I know, Apache 2.0 license is compatible with BSD. (Pls correct me if I had misunderstanding)

aha, you are correct, i miss understood that zstd is GPL licensed.

@iwasakims
Copy link
Member

In bigtop_toolchain/manifests/packages.pp, libzstd-devel has been included, so anything else need to modify?

Oops. I forgot about BIGTOP-3535.

@iwasakims
Copy link
Member

iwasakims commented Mar 28, 2023

By my understanding, this fix only make the hadoop native library compiled with pre-installed ZSTD binary, no ZSTD (GPL-2.0 license) code included in this repository.

We are publishing pre built packages for users' convenience. We can not link it against pre-built library if it is distributed under GPL. Zstandard seems to be dual-licensed under BSD and GPLv2. I'm not confident about which is applied to pre-built libzstd.so redistributed by OS distros.

Comment on lines +124 to +126
[ -f /usr/lib/libzstd.so ] && BUNDLE_ZSTD="-Dbundle.zstd=true -Dzstd.lib=/usr/lib"
[ -f /usr/lib64/libzstd.so ] && BUNDLE_ZSTD="-Dbundle.zstd=true -Dzstd.lib=/usr/lib64"
[ -f /usr/lib/${HOSTTYPE}-linux-gnu/libzstd.so ] && BUNDLE_ZSTD="-Dbundle.zstd=true -Dzstd.lib=/usr/lib/${HOSTTYPE}-linux-gnu"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we expect all supported distro/platform have libzstd.so? We can add package dependency on libzstd to make it certain that the required library is installed to runtime environment if so.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

emmm, I only have CentOS 7.4 at hand.
Can someone help to verify other distros/platforms?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

emmm, I only have CentOS 7.4 at hand. Can someone help to verify other distros/platforms?

Assume your development environment is x86 64/amd64-based.
Please try to test it in various Distros of Bigtop docker images for x86 64/amd64 first.
(https://hub.docker.com/r/bigtop/slaves/tags)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'll paste the result later, thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With bigtop docker images for x86_64, got following result:

  1. centos-7

image

image

2. debian-11

image

image

3. ubuntu-22.04

image

image

4. fedora-36

image

image

5. rockylinux-8

image

image

… rpm package, so that no system zstd library dependency is needed.
@guyuqi
Copy link
Member

guyuqi commented Apr 17, 2023

By my understanding, this fix only make the hadoop native library compiled with pre-installed ZSTD binary, no ZSTD (GPL-2.0 license) code included in this repository.

We are publishing pre built packages for users' convenience. We can not link it against pre-built library if it is distributed under GPL. Zstandard seems to be dual-licensed under BSD and GPLv2. I'm not confident about which is applied to pre-built libzstd.so redistributed by OS distros.

I could completely understand your concerns for LICENSE. The terms of the license are too complicated to comprehend well.
Fortunately, it appears that Apache Spark does support ZSTD and Spark guys had updated LICENSE and NOTICE in SPARK-24654.
They provided the extra license statement in source tree: LICENSE-binary-zstd.
Shall we also create the similar LICENSE file for zstd binary in Bigtop?

What do you think of it? @iwasakims

@iwasakims
Copy link
Member

iwasakims commented Apr 18, 2023

Spark bundles pre-built binary of zstd-jni (containing zstd as part of it). I'm concerning about the license of libzstd binary distributed by OS distros here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants