-
Notifications
You must be signed in to change notification settings - Fork 2
Implementation details for a concurrent map
- Map implementation: include/silicondb/map.h
- Tests: test/map.cc
First, we need to evaluate the different data structures that we can consider for the concurrent map:
- A binary tree
- A hash table
- A array
In the case of a binary tree, all tree operations start with the root. Hence every thread would require a unique mutex on the root node. This greatly reduces opportunities for real concurrency. This problem would persist for every subtree since the binary tree is recursive in nature.
Next, let's consider an array. Since an array is a random access data structure, every reader and writer thread would require a unique lock on the entire array to protect data consistency. This almost squanders all opportunity for useful concurrency.
Finally, we evaluate a hash table. In the case of a hash table, the bucket lookup operation for a given key is implicitly thread-safe since it is determined by the hash value of the key. Next, we consider adding elements in the hash bucket. In this case, a shared lock that allows multiple readers and a single writer to access the bucket suffices. The bucket may be implemented as a linked list.
In the following sections, let us elaborate the relevant implementation details:
template <typename Key, typename Value, typename Hash = std::hash<Key>>
class map
{
private:
// map::bucket: hash bucket within the hash table, for storing
// entries with clashing hash values.
class bucket
{
private:
typedef std::pair<Key, Value> bucket_value;
typedef std::list<bucket_value> bucket_data;
typedef typename bucket_data::iterator bucket_iterator;
bucket_data data;
mutable std::shared_mutex mutex; // multi-read, single-write
// map::bucket::find_entry_for: returns a iterator to the first
// (key, value) pair within this buckets data list with given
// argument key.
bucket_iterator find_entry_for(Key const &key)
{
return std::find_if(data.begin(), data.end(),
[&](bucket_value const &item)
{ return item.first == key; });
}
...
First the declaration:
template <typename Key, typename Value, typename Hash = std::hash<Key>>
class map
{
We specify the key, value types as templates, along with the hash function to be used for the Key. std::hash provides default hash functions for most of the common types, and thus qualifies as a sane default for the hash function.
Notice the usage of the std::shared_mutex:
mutable std::shared_mutex mutex; // multi-read, single-write
See https://stackoverflow.com/questions/14131572/always-declare-stdmutex-as-mutable-in-c11 for an in-depth explanation regarding the usage of mutable.
As stated above we required a mutex for multi-read, single-write access.
Finally, we provide a function to find the key, value entry corresponding to a given entry in this bucket.
// map::bucket::find_entry_for: returns a iterator to the first
// (key, value) pair within this buckets data list with given
// argument key.
bucket_iterator find_entry_for(Key const &key)
{
return std::find_if(data.begin(), data.end(),
[&](bucket_value const &item)
{ return item.first == key; });
}
...
Now, we could have qualified this function as const to mark this as a pure function, since it itself doesn't mutate the state of the bucket. However, it does return a mutable iterator which enables mutation in the future.
A more practical reason is that std::find_if takes a template parameter InputIt which is specified by the first parameter passed and returns the same InputIt.
template< class InputIt, class UnaryPredicate >
constexpr InputIt find_if( InputIt first, InputIt last,
UnaryPredicate p );
See https://en.cppreference.com/w/cpp/algorithm/find
If we had marked bucket::find_entry_for with the const qualifier, data.begin() would return a std::list<...>::const_iterator, which cannot be converted to bucket::bucket_iterator. By definition, a const_iterator doesn't allow modification to the elements pointed to by the iterator. However, we require an iterator that allows mutation to the elements, for implementing methods like bucket::put(...) and bucket::unmap(...).
Hence, we refrain from marking this method with the const qualifier.
Finally we implement the public interface for the bucket:
public:
// map::bucket::get: returns the value mapped to the given key
// if present, the default_value passed otherwise
Value get(Key const &key, Value const &default_value)
{
std::shared_lock<std::shared_mutex> lock(mutex);
bucket_iterator const entry = find_entry_for(key);
return entry == data.end() ? default_value : entry->second;
}
// map::bucket::put: establishes a key-value mapping for the
// key-value pair. Creates a new key-value pair in the bucket
// data if the mapping didn't exist before. Otherwise, it
// overwrites the old mapping with the new value.
void put(Key const &key, Value const &value)
{
std::unique_lock<std::shared_mutex> lock(mutex);
bucket_iterator const entry = find_entry_for(key);
if (entry == data.end())
data.push_back(bucket_value(key, value));
else
entry->second = value;
}
// map::bucket::unmap: erases the value associated with this
// key, if present, from the bucket data.
void unmap(Key const &key)
{
std::unique_lock<std::shared_mutex> lock(mutex);
bucket_iterator const entry = find_entry_for(key);
if (entry != data.end())
data.erase(entry);
}
};
In the reader method that is get() we acquire a shared lock with:
std::shared_lock<std::shared_mutex> lock(mutex);
Int the writer methods, put() and unmap(), we acquire a unique lock with:
std::unique_lock<std::shared_mutex> lock(mutex);
The lock is automatically released when their respective enclosing scopes end. In this case, when their enclosing functions exist.
std::shared_lock follows the RAII pattern for locking and unlocking.
In the map, we maintain the buckets and the hashing function as follows:
private:
std::vector<std::unique_ptr<bucket>> buckets;
Hash hasher;
A unique_ptr represents a unique, owning handle to a pointer. It follows RAII and hence can be used to automatically clean up the pointer it holds. In this case, we maintain a vector of unique pointers to buckets. The vector itself contains only the unique pointers and hence is small in size. The unique pointers point to buckets allocated on the heap. Again the vector itself is cleaned up using RAII, also cleaning up the buckets with the unique_ptr handles. Automatic memory deallocation using RAII FTW!
A map is initialized as follows:
public:
map(unsigned int number_of_buckets = 19, Hash hasher_ = Hash())
: buckets(number_of_buckets), hasher(hasher_)
{
for (int i{0}; i != number_of_buckets; ++i)
{
buckets[i].reset(new bucket);
}
Next, we have a bucket lookup method for the map:
private:
bucket &get_bucket(Key const &key) const
{
std::size_t const bucket_index = hasher(key) % buckets.size();
return *buckets[bucket_index];
}
As mentioned above, since this is a read operation, this doesn't require exclusivity.
Finally, we define the public interface for the map:
public:
...
Value get(Key const &key, Value const &default_value = Value()) const
{
return get_bucket(key).get(key, default_value);
}
void put(Key const &key, Value const &value)
{
get_bucket(key).put(key, value);
}
void unmap(Key const &key)
{
get_bucket(key).unmap(key);
}
};
This concludes our discussion of the concurrent, thread-safe map implementation.
Made with ❤️ @solidstatedb