Skip to content

Try to build a (probabalistic) dynamic range filter based on Bloom filter and order-preserving hashing ideas

Notifications You must be signed in to change notification settings

huangyihe/dynamic-range-filter

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Range filters

Idea

The idea is to have k uniform-randomly sampled order-preserving hash functions, h1, h2, ..., hk.

The hash functions map from the key space we want to do range query on to the Bloom filter bit index space (whose cardinality == number of bits in the Bloom filter), and preserve order. So if x, y in key space satisfy x ≤ y, then hi(x) ≤ hi(y).

Like in a Bloom filter, the output n of each hash function sets the n-th bit of the Bloom filter.

The conjecture is that on average, if k = O(log N) (N is cardinality of key/input space), given x, y where x < y, then:

Eall x, y[mini=1...k(hi(y) - hi(x))] = O(log(y-x))

If the conjecture is true, then approximate range query can be performed by checking O(log(y-x)) bits in the Bloom filter.

The asymptotic time complexity will be the same as a full-blown ordered data structures like balanced search trees, but the sapce required can be much smaller.

Constructing order-preserving hash functions

Assume the hash function is a map {0,1}N → {0,1}n. Typical choice of N may be 64 (64-bit keys), and n can be 30 (for a 64MB Bloom filter).

The hash function can be constructed by:

  1. Choose a "lead-in" offset, denoted by o uniformly from [0,2N-1), which is the hash output for the smallest possible key in the key space.

  2. The hash function is essentially defined by (2n - 1) "separators" inserted among all values [o, 2N).

  3. The separator locations can be sampled using resorvior sampling. This operation will take O(2n) time assuming that N is sufficiently larger than n.

  4. Construct interval map out of separator locations. The interval map maps the interval ends with a particular separators to the order of the separator in sorted order.

  5. The hash function is now just the interval map lookup function. Time complexity is O(n), space complexity O(2n).

About

Try to build a (probabalistic) dynamic range filter based on Bloom filter and order-preserving hashing ideas

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published