S3-plain based disk supporting directory rename #61116
Merged: alexey-milovidov merged 22 commits into `ClickHouse:master` from `jkartseva:s3-plain-with-replace` on Apr 30, 2024.
Commits (all by jkartseva):

- `75147f3` S3-plain based disk supporting directory rename
- `7916792` Do not create directory metadata recursively
- `b437513` concurrency control
- `d1e5a09` better transaction rollback
- `a67a829` Do not list prefix.path in listDirectory
- `cb3cf73` more tests
- `a3ce616` add integration test
- `5e716bc` extend system logs integration tests
- `8c99b0d` docs
- `01ee500` improvements, cleanups, comments
- `89f28f3` explicitly disallow ALTERs and mutations for plain
- `70d55aa` Update src/Disks/ObjectStorages/CommonPathPrefixKeyGenerator.h
- `4f6a3e2` Update src/Disks/ObjectStorages/MetadataStorageFromPlainObjectStorage…
- `36a1cae` address feedback - pt.1
- `4a7f28f` address feedback - pt.2
- `802ee27` address feedback - pt.3
- `d1217af` address feedback - pt.4
- `24d5abb` extract plain_rewritable metadata type
- `c1d62dd` documentation for unbundled configuration
- `508a42b` use ordered map for path map
- `3c1207e` remove path normalization
- `dc95558` method rename
```diff
@@ -28,7 +28,7 @@ Starting from 24.1 clickhouse version, it is possible to use a new configuration
 It requires to specify:
 1. `type` equal to `object_storage`
 2. `object_storage_type`, equal to one of `s3`, `azure_blob_storage` (or just `azure` from `24.3`), `hdfs`, `local_blob_storage` (or just `local` from `24.3`), `web`.
-Optionally, `metadata_type` can be specified (it is equal to `local` by default), but it can also be set to `plain`, `web`.
+Optionally, `metadata_type` can be specified (it is equal to `local` by default), but it can also be set to `plain`, `web` and, starting from `24.4`, `plain_rewritable`.
 Usage of `plain` metadata type is described in [plain storage section](/docs/en/operations/storing-data.md/#storing-data-on-webserver), `web` metadata type can be used only with `web` object storage type, `local` metadata type stores metadata files locally (each metadata files contains mapping to files in object storage and some additional meta information about them).
 
 E.g. configuration option
```
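The rules in the hunk above (`metadata_type` defaults to `local`, the accepted values, and `web` metadata being valid only with `web` object storage) can be captured in a small validation sketch. `MetadataType`, `parseMetadataType` and `validateCombination` are hypothetical names used for illustration; this is not how ClickHouse itself parses disk configuration, and version gating (`plain_rewritable` only from `24.4`) is deliberately left out.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical mirror of the documented metadata_type values.
enum class MetadataType { Local, Plain, PlainRewritable, Web };

// Parses the optional `metadata_type` setting; an absent value means `local`,
// matching the documented default.
MetadataType parseMetadataType(const std::optional<std::string> & value)
{
    if (!value || *value == "local")
        return MetadataType::Local;
    if (*value == "plain")
        return MetadataType::Plain;
    if (*value == "plain_rewritable") // documented as available from 24.4
        return MetadataType::PlainRewritable;
    if (*value == "web")
        return MetadataType::Web;
    throw std::invalid_argument("unknown metadata_type: " + *value);
}

// Enforces the documented rule: `web` metadata only with `web` object storage.
void validateCombination(const std::string & object_storage_type, MetadataType metadata_type)
{
    if (metadata_type == MetadataType::Web && object_storage_type != "web")
        throw std::invalid_argument("metadata_type `web` requires object_storage_type `web`");
}
```

For example, `parseMetadataType(std::nullopt)` yields `MetadataType::Local`, and `validateCombination("s3", MetadataType::Web)` throws.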
````diff
@@ -341,6 +341,36 @@ Configuration:
 </s3_plain>
 ```
 
+### Using S3 Plain Rewritable Storage {#s3-plain-rewritable-storage}
+A new disk type `s3_plain_rewritable` was introduced in `24.4`.
+Similar to the `s3_plain` disk type, it does not require additional storage for metadata files; instead, metadata is stored in S3.
+Unlike `s3_plain` disk type, `s3_plain_rewritable` allows executing merges and supports INSERT operations.
+[Mutations](/docs/en/sql-reference/statements/alter#mutations) and replication of tables are not supported.
+
+A use case for this disk type are non-replicated `MergeTree` tables. Although the `s3` disk type is suitable for non-replicated
+MergeTree tables, you may opt for the `s3_plain_rewritable` disk type if you do not require local metadata for the table and are
+willing to accept a limited set of operations. This could be useful, for example, for system tables.
+
+Configuration:
+``` xml
+<s3_plain_rewritable>
+    <type>s3_plain_rewritable</type>
+    <endpoint>https://s3.eu-west-1.amazonaws.com/clickhouse-eu-west-1.clickhouse.com/data/</endpoint>
+    <use_environment_credentials>1</use_environment_credentials>
+</s3_plain_rewritable>
+```
+
+is equal to
+``` xml
+<s3_plain_rewritable>
+    <type>object_storage</type>
+    <object_storage_type>s3</object_storage_type>
+    <metadata_type>plain_rewritable</metadata_type>
+    <endpoint>https://s3.eu-west-1.amazonaws.com/clickhouse-eu-west-1.clickhouse.com/data/</endpoint>
+    <use_environment_credentials>1</use_environment_credentials>
+</s3_plain_rewritable>
+```
+
 ### Using Azure Blob Storage {#azure-blob-storage}
 
 `MergeTree` family table engines can store data to [Azure Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs/) using a disk with type `azure_blob_storage`.
````

Review thread on the first `s3_plain_rewritable` configuration example:

We should also support this type of configuration:

```xml
<s3_plain_rewritable>
    <type>object_storage</type>
    <object_storage_type>s3</object_storage_type>
    <metadata_type>plain_rewritable</metadata_type>
    <endpoint>https://s3.eu-west-1.amazonaws.com/clickhouse-eu-west-1.clickhouse.com/data/</endpoint>
    <use_environment_credentials>1</use_environment_credentials>
</s3_plain_rewritable>
```

Reply: It works now.
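The "is equal to" statement above describes expanding a shorthand `<type>` into the unbundled `type` / `object_storage_type` / `metadata_type` triple. A minimal sketch of that expansion, assuming a hypothetical `DiskTypeSpec` struct and `expandShorthand` function (neither is ClickHouse API). Only the `s3_plain_rewritable` row comes from this PR's documentation; the `s3` row assumes the documented default of `local` metadata, and the `s3_plain` row is likewise an assumption.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical unbundled form of a disk's storage settings.
struct DiskTypeSpec
{
    std::string type;
    std::string object_storage_type;
    std::string metadata_type;

    bool operator==(const DiskTypeSpec & other) const
    {
        return type == other.type && object_storage_type == other.object_storage_type
            && metadata_type == other.metadata_type;
    }
};

// Expands a shorthand disk type; returns nullopt for anything not in the table.
std::optional<DiskTypeSpec> expandShorthand(const std::string & shorthand)
{
    static const std::map<std::string, DiskTypeSpec> table = {
        {"s3", {"object_storage", "s3", "local"}},                             // assumption: default metadata
        {"s3_plain", {"object_storage", "s3", "plain"}},                       // assumption
        {"s3_plain_rewritable", {"object_storage", "s3", "plain_rewritable"}}, // from the docs above
    };
    if (auto it = table.find(shorthand); it != table.end())
        return it->second;
    return std::nullopt;
}
```

A table-driven expansion keeps both configuration styles resolving to one internal representation, which is what makes the two XML snippets interchangeable.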
```diff
@@ -28,6 +28,7 @@ enum class MetadataStorageType
     None,
     Local,
     Plain,
+    PlainRewritable,
     StaticWeb,
 };
 
```
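To make the effect of the new enumerator concrete, here is a hypothetical helper (not part of ClickHouse) that reports whether a MergeTree table on each metadata type can merge parts. It relies only on this PR's documentation (`plain_rewritable` allows merges, unlike `plain`; `local` is the default used by ordinary `s3` disks) plus the fact that `web` disks are read-only. The enum is re-declared so the block compiles on its own.

```cpp
#include <cassert>
#include <stdexcept>

// Standalone copy of the enum from the hunk above, for illustration only.
enum class MetadataStorageType
{
    None,
    Local,
    Plain,
    PlainRewritable,
    StaticWeb,
};

// Hypothetical helper: can a MergeTree table on this metadata type merge parts?
bool allowsMerges(MetadataStorageType type)
{
    switch (type)
    {
        case MetadataStorageType::Local:           // regular `s3` disks
        case MetadataStorageType::PlainRewritable: // new in this PR
            return true;
        case MetadataStorageType::Plain:           // write-once layout
        case MetadataStorageType::StaticWeb:       // read-only
            return false;
        case MetadataStorageType::None:
            break;
    }
    throw std::logic_error("metadata storage type is not set");
}
```

Keeping the `switch` exhaustive without a `default` lets compilers with `-Wswitch` flag every such function that must be revisited when an enumerator like `PlainRewritable` is added.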
New file, hunk `@@ -0,0 +1,72 @@`:

```cpp
#include "CommonPathPrefixKeyGenerator.h"

#include <Common/getRandomASCIIString.h>

#include <deque>
#include <filesystem>
#include <tuple>

namespace DB
{

CommonPathPrefixKeyGenerator::CommonPathPrefixKeyGenerator(
    String key_prefix_, SharedMutex & shared_mutex_, std::weak_ptr<PathMap> path_map_)
    : storage_key_prefix(key_prefix_), shared_mutex(shared_mutex_), path_map(std::move(path_map_))
{
}

ObjectStorageKey CommonPathPrefixKeyGenerator::generate(const String & path, bool is_directory) const
{
    const auto & [object_key_prefix, suffix_parts] = getLongestObjectKeyPrefix(path);

    auto key = std::filesystem::path(object_key_prefix.empty() ? storage_key_prefix : object_key_prefix);

    /// The longest prefix is the same as path, meaning that the path is already mapped.
    if (suffix_parts.empty())
        return ObjectStorageKey::createAsRelative(std::move(key));

    /// File and top-level directory paths are mapped as is.
    if (!is_directory || object_key_prefix.empty())
        for (const auto & part : suffix_parts)
            key /= part;
    /// Replace the last part of the directory path with a pseudorandom suffix.
    else
    {
        for (size_t i = 0; i + 1 < suffix_parts.size(); ++i)
            key /= suffix_parts[i];

        constexpr size_t part_size = 16;
        key /= getRandomASCIIString(part_size);
    }

    return ObjectStorageKey::createAsRelative(key);
}

std::tuple<std::string, std::vector<std::string>> CommonPathPrefixKeyGenerator::getLongestObjectKeyPrefix(const std::string & path) const
{
    std::filesystem::path p(path);
    std::deque<std::string> dq;

    std::shared_lock lock(shared_mutex);

    auto ptr = path_map.lock();

    while (p != p.root_path())
    {
        auto it = ptr->find(p / "");
        if (it != ptr->end())
        {
            std::vector<std::string> vec(std::make_move_iterator(dq.begin()), std::make_move_iterator(dq.end()));
            return std::make_tuple(it->second, std::move(vec));
        }

        if (!p.filename().empty())
            dq.push_front(p.filename());

        p = p.parent_path();
    }

    return {std::string(), std::vector<std::string>(std::make_move_iterator(dq.begin()), std::make_move_iterator(dq.end()))};
}

}
```

jkartseva marked the review conversation on the `/// The longest prefix is the same as path, ...` comment as resolved.
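The lookup-and-generate logic above can be exercised outside ClickHouse. The following is a minimal standalone sketch, not the PR's code: it replaces `SharedMutex`, `ObjectStorageKey` and `getRandomASCIIString` with standard-library types and a caller-supplied `make_suffix` callback (hypothetical, so results are deterministic), omits locking, and returns keys as `std::string`. `longestMappedPrefix` and `generateKey` are illustrative names.

```cpp
#include <cassert>
#include <deque>
#include <filesystem>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

namespace fs = std::filesystem;

// Local directory path (stored with a trailing slash) -> remote key prefix.
using PathMap = std::map<fs::path, std::string>;

// Walks from `path` towards the root; returns the remote prefix of the deepest
// mapped ancestor and the path components below it (empty prefix if none).
std::pair<std::string, std::vector<std::string>> longestMappedPrefix(const PathMap & map, const std::string & path)
{
    fs::path p(path);
    std::deque<std::string> unresolved;
    while (p != p.root_path())
    {
        if (auto it = map.find(p / ""); it != map.end()) // `p / ""` adds the trailing slash
            return {it->second, std::vector<std::string>(unresolved.begin(), unresolved.end())};
        if (!p.filename().empty())
            unresolved.push_front(p.filename().string());
        p = p.parent_path();
    }
    return {std::string(), std::vector<std::string>(unresolved.begin(), unresolved.end())};
}

// Files and top-level directories keep their names; a nested directory's last
// component is replaced by `make_suffix()` (stand-in for a random string).
std::string generateKey(
    const PathMap & map, const std::string & storage_key_prefix, const std::string & path, bool is_directory,
    const std::function<std::string()> & make_suffix)
{
    auto [object_key_prefix, suffix_parts] = longestMappedPrefix(map, path);
    fs::path key(object_key_prefix.empty() ? storage_key_prefix : object_key_prefix);
    if (suffix_parts.empty())
        return key.string(); // path is already mapped
    if (!is_directory || object_key_prefix.empty())
    {
        for (const auto & part : suffix_parts)
            key /= part;
    }
    else
    {
        for (size_t i = 0; i + 1 < suffix_parts.size(); ++i)
            key /= suffix_parts[i];
        key /= make_suffix();
    }
    return key.string();
}
```

With `store/` mapped to `data/abcdefgh`, a new directory `store/part_1/` gets a fresh suffix under `data/abcdefgh/`, while a file `store/columns.txt` keeps its name. Because a nested directory's remote prefix does not encode its local name, a directory rename can plausibly be handled by updating the path-map entry rather than rewriting objects, which appears to be the point of this design.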
New file, hunk `@@ -0,0 +1,41 @@`:

```cpp
#pragma once

#include <Common/ObjectStorageKeyGenerator.h>
#include <Common/SharedMutex.h>

#include <filesystem>
#include <map>

namespace DB
{

/// Object storage key generator used specifically with the
/// MetadataStorageFromPlainObjectStorage if multiple writes are allowed.

/// It searches for the local (metadata) path in a pre-loaded path map.
/// If no such path exists, it searches for the parent path, until it is found
/// or no parent path exists.
///
/// The key generator ensures that the original directory hierarchy is
/// preserved, which is required for the MergeTree family.
class CommonPathPrefixKeyGenerator : public IObjectStorageKeysGenerator
{
public:
    /// Local to remote path map. Leverages filesystem::path comparator for paths.
    using PathMap = std::map<std::filesystem::path, std::string>;

    explicit CommonPathPrefixKeyGenerator(String key_prefix_, SharedMutex & shared_mutex_, std::weak_ptr<PathMap> path_map_);

    ObjectStorageKey generate(const String & path, bool is_directory) const override;

private:
    /// Longest key prefix and unresolved parts of the source path.
    std::tuple<std::string, std::vector<String>> getLongestObjectKeyPrefix(const String & path) const;

    const String storage_key_prefix;

    SharedMutex & shared_mutex;
    std::weak_ptr<PathMap> path_map;
};

}
```
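The header pairs a `SharedMutex &` with a `std::weak_ptr<PathMap>`, so the generator owns neither the map nor its lock. A minimal sketch of that reader/writer split, assuming `std::shared_mutex` in place of ClickHouse's `SharedMutex` and a hypothetical `PathMapOwner` standing in for the metadata storage that updates the map; `PathMapReader` plays the generator's role.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <string>
#include <utility>

using PathMap = std::map<std::string, std::string>;

// Hypothetical owner: holds the map and takes an exclusive lock to modify it.
struct PathMapOwner
{
    std::shared_mutex mutex;
    std::shared_ptr<PathMap> map = std::make_shared<PathMap>();

    void addMapping(const std::string & local_dir, const std::string & remote_prefix)
    {
        std::unique_lock lock(mutex);
        (*map)[local_dir] = remote_prefix;
    }
};

// Reader side: borrows the mutex and observes the map through a weak_ptr.
class PathMapReader
{
public:
    PathMapReader(std::shared_mutex & mutex_, std::weak_ptr<PathMap> map_) : mutex(mutex_), map(std::move(map_)) { }

    std::optional<std::string> lookup(const std::string & local_dir) const
    {
        std::shared_lock lock(mutex); // many readers may look up concurrently
        auto ptr = map.lock();
        if (!ptr)
            return std::nullopt; // the owner has already released the map
        auto it = ptr->find(local_dir);
        if (it == ptr->end())
            return std::nullopt;
        return it->second;
    }

private:
    std::shared_mutex & mutex;
    std::weak_ptr<PathMap> map;
};
```

The PR code dereferences the result of `path_map.lock()` without a null check, which relies on the map outliving the generator. This sketch checks explicitly, so an expired map yields no result instead of undefined behavior; the `weak_ptr` also keeps the reader from extending the map's lifetime.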
Review comment:

What is an unbundled configuration option? (What type of metadata over the s3 object storage?)