S3-plain based disk supporting directory rename #61116

Merged: 22 commits, Apr 30, 2024
@@ -769,6 +769,7 @@ In addition to local block devices, ClickHouse supports these storage types:
- [`web` for read-only from web](#web-storage)
- [`cache` for local caching](/docs/en/operations/storing-data.md/#using-local-cache)
- [`s3_plain` for backups to S3](/docs/en/operations/backup#backuprestore-using-an-s3-disk)
- [`s3_plain_rewritable` for immutable, non-replicated tables in S3](/docs/en/operations/storing-data.md#s3-plain-rewritable-storage)
Review comment (Member):

What is an unbundled configuration option?

(what type of metadata over the s3 object storage?)


## Using Multiple Block Devices for Data Storage {#table_engine-mergetree-multiple-volumes}

32 changes: 31 additions & 1 deletion docs/en/operations/storing-data.md
@@ -28,7 +28,7 @@ Starting from 24.1 clickhouse version, it is possible to use a new configuration
It requires to specify:
1. `type` equal to `object_storage`
2. `object_storage_type`, equal to one of `s3`, `azure_blob_storage` (or just `azure` from `24.3`), `hdfs`, `local_blob_storage` (or just `local` from `24.3`), `web`.
Optionally, `metadata_type` can be specified (it is equal to `local` by default), but it can also be set to `plain`, `web`.
Optionally, `metadata_type` can be specified (it is equal to `local` by default), but it can also be set to `plain`, `web` and, starting from `24.4`, `plain_rewritable`.
Usage of the `plain` metadata type is described in the [plain storage section](/docs/en/operations/storing-data.md/#storing-data-on-webserver). The `web` metadata type can be used only with the `web` object storage type. The `local` metadata type stores metadata files locally (each metadata file contains a mapping to files in object storage and some additional meta information about them).

E.g. configuration option
@@ -341,6 +341,36 @@ Configuration:
</s3_plain>
```

### Using S3 Plain Rewritable Storage {#s3-plain-rewritable-storage}
A new disk type `s3_plain_rewritable` was introduced in `24.4`.
Similar to the `s3_plain` disk type, it does not require additional storage for metadata files; instead, metadata is stored in S3.
Unlike the `s3_plain` disk type, `s3_plain_rewritable` allows executing merges and supports `INSERT` operations.
[Mutations](/docs/en/sql-reference/statements/alter#mutations) and replication of tables are not supported.

A use case for this disk type is non-replicated `MergeTree` tables. Although the `s3` disk type is also suitable for non-replicated
`MergeTree` tables, you may opt for the `s3_plain_rewritable` disk type if you do not require local metadata for the table and are
willing to accept a limited set of operations. This could be useful, for example, for system tables.

Configuration:
``` xml
<s3_plain_rewritable>
<type>s3_plain_rewritable</type>
<endpoint>https://s3.eu-west-1.amazonaws.com/clickhouse-eu-west-1.clickhouse.com/data/</endpoint>
<use_environment_credentials>1</use_environment_credentials>
</s3_plain_rewritable>
Review comment (Member):

We should also support this type of configuration:

<s3_plain_rewritable>
    <type>object_storage</type>
    <object_storage_type>s3</object_storage_type>
    <metadata_type>plain_rewritable</metadata_type>
    <endpoint>https://s3.eu-west-1.amazonaws.com/clickhouse-eu-west-1.clickhouse.com/data/</endpoint>
    <use_environment_credentials>1</use_environment_credentials>
</s3_plain_rewritable>

Reply (Contributor Author):

It works now.

```

is equal to
``` xml
<s3_plain_rewritable>
<type>object_storage</type>
<object_storage_type>s3</object_storage_type>
<metadata_type>plain_rewritable</metadata_type>
<endpoint>https://s3.eu-west-1.amazonaws.com/clickhouse-eu-west-1.clickhouse.com/data/</endpoint>
<use_environment_credentials>1</use_environment_credentials>
</s3_plain_rewritable>
```
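
Note: the equivalence between the shorthand `type` and the explicit `object_storage` form can be pictured as a lookup table. The following is a minimal C++ sketch of that idea only. `ExplicitDiskConfig` and `expandShorthand` are hypothetical names, not ClickHouse code, and the `s3_plain` row is an assumption based on the existing plain storage documentation rather than on this diff.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Illustrative names only; none of this is ClickHouse code.
struct ExplicitDiskConfig
{
    std::string type;                 // always "object_storage" in the explicit form
    std::string object_storage_type;  // e.g. "s3"
    std::string metadata_type;        // e.g. "plain_rewritable"
};

// Expand a shorthand disk `type` into the explicit three-setting form.
// Returns std::nullopt for a type this sketch does not know about.
std::optional<ExplicitDiskConfig> expandShorthand(const std::string & type)
{
    static const std::map<std::string, ExplicitDiskConfig> shorthands = {
        // Assumption: taken from the existing plain storage docs, not from this diff.
        {"s3_plain", {"object_storage", "s3", "plain"}},
        // Stated in the section above.
        {"s3_plain_rewritable", {"object_storage", "s3", "plain_rewritable"}},
    };

    const auto it = shorthands.find(type);
    if (it == shorthands.end())
        return std::nullopt;
    return it->second;
}
```

The remaining settings (`endpoint`, `use_environment_credentials`) are identical in both forms, so the sketch leaves them out.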

### Using Azure Blob Storage {#azure-blob-storage}

`MergeTree` family table engines can store data to [Azure Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs/) using a disk with type `azure_blob_storage`.
6 changes: 5 additions & 1 deletion programs/keeper/CMakeLists.txt
@@ -121,9 +121,12 @@ if (BUILD_STANDALONE_KEEPER)
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Disks/DiskType.cpp

${CMAKE_CURRENT_SOURCE_DIR}/../../src/Disks/ObjectStorages/IObjectStorage.cpp
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Disks/ObjectStorages/MetadataOperationsHolder.cpp
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Disks/ObjectStorages/MetadataStorageFromPlainObjectStorageOperations.cpp
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Disks/ObjectStorages/MetadataStorageFromPlainObjectStorage.cpp
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Disks/ObjectStorages/MetadataStorageFromPlainRewritableObjectStorage.cpp
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Disks/ObjectStorages/MetadataStorageFromDisk.cpp
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Disks/ObjectStorages/MetadataFromDiskTransactionState.cpp
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Disks/ObjectStorages/MetadataStorageTransactionState.cpp
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Disks/ObjectStorages/DiskObjectStorageMetadata.cpp
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Disks/ObjectStorages/MetadataStorageFromDiskTransactionOperations.cpp
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Disks/ObjectStorages/DiskObjectStorage.cpp
@@ -137,6 +140,7 @@ if (BUILD_STANDALONE_KEEPER)
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Disks/ObjectStorages/S3/S3Capabilities.cpp
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Disks/ObjectStorages/S3/diskSettings.cpp
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Disks/ObjectStorages/S3/DiskS3Utils.cpp
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Disks/ObjectStorages/CommonPathPrefixKeyGenerator.cpp
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Disks/ObjectStorages/ObjectStorageFactory.cpp
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Disks/ObjectStorages/MetadataStorageFactory.cpp
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Disks/ObjectStorages/RegisterDiskObjectStorage.cpp
9 changes: 3 additions & 6 deletions src/Common/ObjectStorageKeyGenerator.cpp
@@ -14,10 +14,7 @@ class GeneratorWithTemplate : public DB::IObjectStorageKeysGenerator
, re_gen(key_template)
{
}
DB::ObjectStorageKey generate(const String &) const override
{
return DB::ObjectStorageKey::createAsAbsolute(re_gen.generate());
}
DB::ObjectStorageKey generate(const String &, bool) const override { return DB::ObjectStorageKey::createAsAbsolute(re_gen.generate()); }

private:
String key_template;
@@ -32,7 +29,7 @@ class GeneratorWithPrefix : public DB::IObjectStorageKeysGenerator
: key_prefix(std::move(key_prefix_))
{}

DB::ObjectStorageKey generate(const String &) const override
DB::ObjectStorageKey generate(const String &, bool) const override
{
/// Path to store the new S3 object.

@@ -63,7 +60,7 @@ class GeneratorAsIsWithPrefix : public DB::IObjectStorageKeysGenerator
: key_prefix(std::move(key_prefix_))
{}

DB::ObjectStorageKey generate(const String & path) const override
DB::ObjectStorageKey generate(const String & path, bool) const override
{
return DB::ObjectStorageKey::createAsRelative(key_prefix, path);
}
5 changes: 3 additions & 2 deletions src/Common/ObjectStorageKeyGenerator.h
@@ -1,16 +1,17 @@
#pragma once

#include "ObjectStorageKey.h"
#include <memory>
#include "ObjectStorageKey.h"

namespace DB
{

class IObjectStorageKeysGenerator
{
public:
virtual ObjectStorageKey generate(const String & path) const = 0;
virtual ~IObjectStorageKeysGenerator() = default;

virtual ObjectStorageKey generate(const String & path, bool is_directory) const = 0;
};

using ObjectStorageKeysGeneratorPtr = std::shared_ptr<IObjectStorageKeysGenerator>;
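
Note: every implementation of the interface now has to accept the extra `is_directory` argument, even when it ignores it. A minimal, self-contained sketch of that pattern follows. `ObjectStorageKey` here is a simplified stand-in for the real `DB::ObjectStorageKey`, and `AsIsWithPrefix` is a hypothetical class loosely modelled on `GeneratorAsIsWithPrefix` from the hunk above.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

using String = std::string;

// Simplified stand-in for DB::ObjectStorageKey: a prefix plus a relative suffix.
struct ObjectStorageKey
{
    String prefix;
    String suffix;

    static ObjectStorageKey createAsRelative(String prefix_, String suffix_)
    {
        return ObjectStorageKey{std::move(prefix_), std::move(suffix_)};
    }

    String serialize() const { return prefix + suffix; }
};

class IObjectStorageKeysGenerator
{
public:
    virtual ~IObjectStorageKeysGenerator() = default;

    virtual ObjectStorageKey generate(const String & path, bool is_directory) const = 0;
};

using ObjectStorageKeysGeneratorPtr = std::shared_ptr<IObjectStorageKeysGenerator>;

// Maps every path as is under a fixed prefix. The directory flag is unused,
// so the parameter is left unnamed.
class AsIsWithPrefix : public IObjectStorageKeysGenerator
{
public:
    explicit AsIsWithPrefix(String key_prefix_) : key_prefix(std::move(key_prefix_)) {}

    ObjectStorageKey generate(const String & path, bool) const override
    {
        return ObjectStorageKey::createAsRelative(key_prefix, path);
    }

private:
    String key_prefix;
};
```

A generator that treats files and directories the same way only needs the signature change; a generator such as `CommonPathPrefixKeyGenerator` is the one that actually branches on the flag.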
2 changes: 2 additions & 0 deletions src/Disks/DiskType.cpp
@@ -16,6 +16,8 @@ MetadataStorageType metadataTypeFromString(const String & type)
return MetadataStorageType::Local;
if (check_type == "plain")
return MetadataStorageType::Plain;
if (check_type == "plain_rewritable")
return MetadataStorageType::PlainRewritable;
if (check_type == "web")
return MetadataStorageType::StaticWeb;

1 change: 1 addition & 0 deletions src/Disks/DiskType.h
@@ -28,6 +28,7 @@ enum class MetadataStorageType
None,
Local,
Plain,
PlainRewritable,
StaticWeb,
};
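
Note: the two hunks above add one enum value and one string mapping. A self-contained sketch of how they fit together is shown below. `parseMetadataType` is a hypothetical name. The `"local"` branch, the case-insensitive comparison, and the `std::nullopt` result for unknown values are assumptions, since the hunk does not show how `check_type` is derived or how the real function reports an unknown value.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <optional>
#include <string>

enum class MetadataStorageType
{
    None,
    Local,
    Plain,
    PlainRewritable,
    StaticWeb,
};

// Hypothetical counterpart of metadataTypeFromString.
// Returns std::nullopt for an unrecognised value (an assumption of this sketch).
std::optional<MetadataStorageType> parseMetadataType(std::string type)
{
    // Assumption: the comparison is case-insensitive.
    std::transform(type.begin(), type.end(), type.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    if (type == "local")
        return MetadataStorageType::Local;
    if (type == "plain")
        return MetadataStorageType::Plain;
    if (type == "plain_rewritable")
        return MetadataStorageType::PlainRewritable;
    if (type == "web")
        return MetadataStorageType::StaticWeb;

    return std::nullopt;
}
```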

2 changes: 2 additions & 0 deletions src/Disks/IDisk.h
@@ -363,6 +363,8 @@ class IDisk : public Space

virtual bool isWriteOnce() const { return false; }

virtual bool supportsHardLinks() const { return true; }

/// Check if disk is broken. Broken disks will have 0 space and cannot be used.
virtual bool isBroken() const { return false; }

72 changes: 72 additions & 0 deletions src/Disks/ObjectStorages/CommonPathPrefixKeyGenerator.cpp
@@ -0,0 +1,72 @@
#include "CommonPathPrefixKeyGenerator.h"

#include <Common/getRandomASCIIString.h>

#include <deque>
#include <filesystem>
#include <tuple>

namespace DB
{

CommonPathPrefixKeyGenerator::CommonPathPrefixKeyGenerator(
String key_prefix_, SharedMutex & shared_mutex_, std::weak_ptr<PathMap> path_map_)
: storage_key_prefix(key_prefix_), shared_mutex(shared_mutex_), path_map(std::move(path_map_))
{
}

ObjectStorageKey CommonPathPrefixKeyGenerator::generate(const String & path, bool is_directory) const
{
const auto & [object_key_prefix, suffix_parts] = getLongestObjectKeyPrefix(path);

auto key = std::filesystem::path(object_key_prefix.empty() ? storage_key_prefix : object_key_prefix);

/// The longest prefix is the same as path, meaning that the path is already mapped.
if (suffix_parts.empty())
return ObjectStorageKey::createAsRelative(std::move(key));

/// File and top-level directory paths are mapped as is.
if (!is_directory || object_key_prefix.empty())
for (const auto & part : suffix_parts)
key /= part;
/// Replace the last part of the directory path with a pseudorandom suffix.
else
{
for (size_t i = 0; i + 1 < suffix_parts.size(); ++i)
key /= suffix_parts[i];

constexpr size_t part_size = 16;
key /= getRandomASCIIString(part_size);
}

return ObjectStorageKey::createAsRelative(key);
}

std::tuple<std::string, std::vector<std::string>> CommonPathPrefixKeyGenerator::getLongestObjectKeyPrefix(const std::string & path) const
{
std::filesystem::path p(path);
std::deque<std::string> dq;

std::shared_lock lock(shared_mutex);

auto ptr = path_map.lock();

while (p != p.root_path())
{
auto it = ptr->find(p / "");
if (it != ptr->end())
{
std::vector<std::string> vec(std::make_move_iterator(dq.begin()), std::make_move_iterator(dq.end()));
return std::make_tuple(it->second, std::move(vec));
}

if (!p.filename().empty())
dq.push_front(p.filename());

p = p.parent_path();
}

return {std::string(), std::vector<std::string>(std::make_move_iterator(dq.begin()), std::make_move_iterator(dq.end()))};
}

}
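
Note: the key generation in this file can be tried out in isolation. The sketch below follows the same steps (walk up the path until a mapped directory is found, then either append the remaining parts as is or replace the last directory part with a generated name), but it is not the ClickHouse implementation. Locking and the `weak_ptr` are omitted, `longestMappedPrefix` and `generateKey` are hypothetical free functions, and the random suffix is supplied by a caller-provided `random_name` callback instead of `getRandomASCIIString` so that the result is reproducible.

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

namespace fs = std::filesystem;

/// Local directory path (with a trailing slash) -> remote key prefix.
using PathMap = std::map<fs::path, std::string>;

/// Walk up from `path` until a mapped directory is found.
/// Returns the mapped key prefix (empty if nothing matched) and the path
/// components below it, in their original order.
std::pair<std::string, std::vector<std::string>> longestMappedPrefix(const PathMap & map, const std::string & path)
{
    std::vector<std::string> reversed_parts;
    fs::path p(path);

    while (p != p.root_path())
    {
        const auto it = map.find(p / ""); /// Normalise to a trailing slash.
        if (it != map.end())
            return {it->second, std::vector<std::string>(reversed_parts.rbegin(), reversed_parts.rend())};

        if (!p.filename().empty())
            reversed_parts.push_back(p.filename().string());

        p = p.parent_path();
    }

    return {std::string(), std::vector<std::string>(reversed_parts.rbegin(), reversed_parts.rend())};
}

/// `random_name` stands in for getRandomASCIIString(16) and is an assumption of this sketch.
std::string generateKey(
    const PathMap & map,
    const std::string & storage_key_prefix,
    const std::string & path,
    bool is_directory,
    const std::function<std::string()> & random_name)
{
    assert(random_name && "a name source is required");

    const auto [prefix, parts] = longestMappedPrefix(map, path);
    fs::path key(prefix.empty() ? storage_key_prefix : prefix);

    /// The path itself is already mapped.
    if (parts.empty())
        return key.generic_string();

    /// Files and top-level directories keep their last part;
    /// nested directories get a generated name instead.
    const bool keep_last_part = !is_directory || prefix.empty();

    for (std::size_t i = 0; i + 1 < parts.size(); ++i)
        key /= parts[i];
    key /= keep_last_part ? parts.back() : random_name();

    return key.generic_string();
}
```

With a map containing `"/store/t1/" -> "data/abc"`, a file below that directory keeps its relative layout under `data/abc`, while a new subdirectory is placed under `data/abc` with a generated name. Renaming a directory then only requires updating the map entry, which appears to be the point of the indirection.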
41 changes: 41 additions & 0 deletions src/Disks/ObjectStorages/CommonPathPrefixKeyGenerator.h
@@ -0,0 +1,41 @@
#pragma once

#include <Common/ObjectStorageKeyGenerator.h>
#include <Common/SharedMutex.h>

#include <filesystem>
#include <map>

namespace DB
{

/// Object storage key generator used specifically with the
/// MetadataStorageFromPlainObjectStorage if multiple writes are allowed.

/// It searches for the local (metadata) path in a pre-loaded path map.
/// If no such path exists, it searches for the parent path, until it is found
/// or no parent path exists.
///
/// The key generator ensures that the original directory hierarchy is
/// preserved, which is required for the MergeTree family.
class CommonPathPrefixKeyGenerator : public IObjectStorageKeysGenerator
{
public:
/// Local to remote path map. Leverages filesystem::path comparator for paths.
using PathMap = std::map<std::filesystem::path, std::string>;

explicit CommonPathPrefixKeyGenerator(String key_prefix_, SharedMutex & shared_mutex_, std::weak_ptr<PathMap> path_map_);

ObjectStorageKey generate(const String & path, bool is_directory) const override;

private:
/// Longest key prefix and unresolved parts of the source path.
std::tuple<std::string, std::vector<String>> getLongestObjectKeyPrefix(const String & path) const;

const String storage_key_prefix;

SharedMutex & shared_mutex;
std::weak_ptr<PathMap> path_map;
};

}
35 changes: 25 additions & 10 deletions src/Disks/ObjectStorages/DiskObjectStorage.cpp
@@ -112,20 +112,21 @@ size_t DiskObjectStorage::getFileSize(const String & path) const
return metadata_storage->getFileSize(path);
}

void DiskObjectStorage::moveDirectory(const String & from_path, const String & to_path)
{
if (send_metadata)
sendMoveMetadata(from_path, to_path);

auto transaction = createObjectStorageTransaction();
transaction->moveDirectory(from_path, to_path);
transaction->commit();
}

void DiskObjectStorage::moveFile(const String & from_path, const String & to_path, bool should_send_metadata)
{

if (should_send_metadata)
{
auto revision = metadata_helper->revision_counter + 1;
metadata_helper->revision_counter += 1;

const ObjectAttributes object_metadata {
{"from_path", from_path},
{"to_path", to_path}
};
metadata_helper->createFileOperationObject("rename", revision, object_metadata);
}
sendMoveMetadata(from_path, to_path);

auto transaction = createObjectStorageTransaction();
transaction->moveFile(from_path, to_path);
@@ -409,6 +410,15 @@ bool DiskObjectStorage::tryReserve(UInt64 bytes)

return false;
}
void DiskObjectStorage::sendMoveMetadata(const String & from_path, const String & to_path)
{
chassert(send_metadata);
auto revision = metadata_helper->revision_counter + 1;
metadata_helper->revision_counter += 1;

const ObjectAttributes object_metadata{{"from_path", from_path}, {"to_path", to_path}};
metadata_helper->createFileOperationObject("rename", revision, object_metadata);
}

bool DiskObjectStorage::supportsCache() const
{
@@ -425,6 +435,11 @@ bool DiskObjectStorage::isWriteOnce() const
return object_storage->isWriteOnce();
}

bool DiskObjectStorage::supportsHardLinks() const
{
return !isWriteOnce() && !object_storage->isPlain();
}
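
Note: the new capability flag defaults to `true` in `IDisk` and is narrowed here for object storage disks. The sketch below reproduces just that logic with stand-in classes. `Disk`, `ObjectStorageDiskSketch`, `CloneStrategy`, and `chooseCloneStrategy` are hypothetical names, and the caller that falls back to copying is an assumption about how such a flag might be consumed, not code from this PR.

```cpp
#include <cassert>

// Stand-in for IDisk: hard links are assumed to be available by default.
class Disk
{
public:
    virtual ~Disk() = default;

    virtual bool isWriteOnce() const { return false; }
    virtual bool supportsHardLinks() const { return true; }
};

// Stand-in for DiskObjectStorage: the two flags model
// object_storage->isWriteOnce() and object_storage->isPlain().
class ObjectStorageDiskSketch : public Disk
{
public:
    ObjectStorageDiskSketch(bool write_once_, bool plain_) : write_once(write_once_), plain(plain_) {}

    bool isWriteOnce() const override { return write_once; }
    bool supportsHardLinks() const override { return !isWriteOnce() && !plain; }

private:
    bool write_once;
    bool plain;
};

// Hypothetical consumer of the flag.
enum class CloneStrategy
{
    HardLink,
    Copy,
};

CloneStrategy chooseCloneStrategy(const Disk & disk)
{
    return disk.supportsHardLinks() ? CloneStrategy::HardLink : CloneStrategy::Copy;
}
```

Under this model, a write-once disk or a disk backed by plain object storage never reports hard link support, so a caller has to use some other mechanism there.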

DiskObjectStoragePtr DiskObjectStorage::createDiskObjectStorage()
{
const auto config_prefix = "storage_configuration.disks." + name;
5 changes: 4 additions & 1 deletion src/Disks/ObjectStorages/DiskObjectStorage.h
@@ -112,7 +112,7 @@ friend class DiskObjectStorageRemoteMetadataRestoreHelper;

void clearDirectory(const String & path) override;

void moveDirectory(const String & from_path, const String & to_path) override { moveFile(from_path, to_path); }
void moveDirectory(const String & from_path, const String & to_path) override;

void removeDirectory(const String & path) override;

@@ -183,6 +183,8 @@ friend class DiskObjectStorageRemoteMetadataRestoreHelper;
/// MergeTree table on this disk.
bool isWriteOnce() const override;

bool supportsHardLinks() const override;

/// Get structure of object storage this disk works with. Examples:
/// DiskObjectStorage(S3ObjectStorage)
/// DiskObjectStorage(CachedObjectStorage(S3ObjectStorage))
@@ -228,6 +230,7 @@ friend class DiskObjectStorageRemoteMetadataRestoreHelper;
std::mutex reservation_mutex;

bool tryReserve(UInt64 bytes);
void sendMoveMetadata(const String & from_path, const String & to_path);

const bool send_metadata;
