pig-hyperloglog

Several Apache Pig user-defined functions (UDFs) to compute and use the HyperLogLog algorithm.

Other implementations exist (for example, this one). This project was written to complement the hyperloglog MySQL plugin and uses exactly the same implementation. As a result, you can compute an HLL string in a Pig script, import the results into MySQL, and then call the MySQL HLL functions on that data to obtain cardinality estimates.

Usage

Four UDFs are provided:
HLL_CREATE, HLL_COMPUTE, HLL_MERGE, HLL_MERGE_COMPUTE.
These are exactly the same functions as in the hyperloglog MySQL plugin, so see its documentation for details.
UdfTest.java also contains examples.

Note: When used from Apache Pig, you must register the project jar file and also make sure that the libpighll.so file (or the DLL on Windows) can be found on the Java library path.
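If the native library cannot be located, the JVM typically fails with an UnsatisfiedLinkError at load time. The following is a minimal sketch, not part of this project, of how one might check ahead of time whether a native library file is visible on `java.library.path`. The class name `NativeLibCheck` and the helper `findNativeLibrary` are hypothetical, and the base name `pighll` is an assumption inferred from the `libpighll.so` file name mentioned above; the actual file name on Windows may differ from what `System.mapLibraryName` produces.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of pig-hyperloglog.
class NativeLibCheck {

    // Returns every file on searchPath whose name matches the
    // platform-specific file name for libName (for example,
    // "libpighll.so" on Linux for the base name "pighll").
    static List<File> findNativeLibrary(String libName, String searchPath) {
        String fileName = System.mapLibraryName(libName);
        List<File> hits = new ArrayList<>();
        if (searchPath == null || searchPath.isEmpty()) {
            return hits;
        }
        for (String dir : searchPath.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue; // skip empty path entries
            }
            File candidate = new File(dir, fileName);
            if (candidate.isFile()) {
                hits.add(candidate);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // "pighll" is an assumed base name; adjust it if the build
        // produces a differently named library.
        String path = System.getProperty("java.library.path");
        List<File> hits = findNativeLibrary("pighll", path);
        if (hits.isEmpty()) {
            System.out.println("Native library not found on java.library.path: " + path);
        } else {
            System.out.println("Found native library at: " + hits.get(0).getAbsolutePath());
        }
    }
}
```

Running this with the same `-Djava.library.path=...` setting that the Pig job will use gives a quick indication of whether the directory containing the library has been picked up.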

What if I do not use Apache Pig?

The HyperLogLog class is a Java class that wraps the underlying C++ implementation.
It can be used from Hadoop MapReduce, Hive, HBase, or any other JVM-based program.

Compilation

Prerequisites: You should have CMake and Maven 2 installed.

git submodule update --init
cd jni/mysql-hyperloglog
git submodule update --init
cd ..
cmake .
make
cd ..
mvn package

Note: Tested on Ubuntu, but it should work on most platforms.
