GitHub

Context: Storj is a service that aggregates unused storage of many, many people and makes it available for commercial purposes.

Each storage node is identified by a 32-byte node ID. The basic unit of content is a piece, identified by a 32-byte piece ID. Nodes store millions of pieces. To save on API calls and other types of overhead, instead of deleting every single piece separately, Storj sends to nodes a bloom filter which defines a probabilistic bitmask of pieces that the node is supposed to store, with the message that the node can remove any pieces that do not match the bloom filter.

As of writing this document, the approach to construct bloom filters is to keep bloom filters for all nodes in memory, iterate over content metadata and update all relevant filters. This requires a lot of memory, and Storj has noted this is a problem.

This repository is a proof-of-concept of a different approach to constructing bloom filters, where the list of pieces is first sorted by nodes using radix sort on external memory (files). Then, iterating over the sorted list, only a single filter needs to be stored in memory.

I wrote this proof-of-concept in C++ to refresh my knowledge of C++. As it is just a feasibility study, code is probably a bit substandard. Yet I believe that it demonstrates that a faster approach to construct bloom filters in Storj is possible, and pretty simple.

References:

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
CMakeLists.txt		CMakeLists.txt
README.md		README.md
bloomfilter.cpp		bloomfilter.cpp
generator.cpp		generator.cpp

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About

Releases

Packages

Languages

liori/bloom_filters_for_nodes

Folders and files

Latest commit

History

Repository files navigation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages