
理解 LSM 树:一种适用于频繁写入的数据库的结构 (Understanding LSM Trees: a structure for write-heavy databases) #7795

Merged: 35 commits, Jan 18, 2021

Conversation

cool-summer-021 (Contributor)

Translation complete; resolves #7771

cool-summer-021 and others added 30 commits September 1, 2020 16:24
Python optimization — the interning mechanism
Sync updates from the upstream project
Sync the upstream project's updated content
Notes on using Python List
Revised per proofreading feedback
Revised per proofreading feedback
Sync updates from the upstream repo
Sync updated content
Revisions per proofreading feedback complete
Why Deno is replacing Node.js everywhere today
Remove redundant content
Revisions per proofreading feedback complete
Update content from the upstream repo
Translation complete
Translation complete
@lsvih changed the title from "Translate/understanding lsm trees" to "理解 LSM 树:一种适用于频繁写入的数据库的结构" on Dec 29, 2020
chzh9311 (Contributor)

@lsvih Claiming the proofreading.

lsvih (Member) commented Jan 11, 2021

@chzh9311 Sounds good!

chzh9311 (Contributor) left a comment

@lsvih @SamYu2000 Proofreading complete.


# SSTables

LSM trees are persisted to disk using a **Sorted Strings Table (SSTable)** format. As indicated by the name, SSTables are a format for storing key-value pairs in which the keys are in sorted order. An SSTable will consist of multiple sorted files called **segments**. These segments are immutable once they are written to disk. A simple example could look like this:
LSM 树使用 **Sorted Strings Table (SSTable)** 格式持久化于磁盘中。顾名思义,SSTables 是一种存储 key-value 对的格式,其中 key 是经过排序的。一个 SSTable 是由若干已排序的文件组成的,这些文件称为 **segments**。这些 segments 一经写入磁盘,就处于不可变状态。我们来看一个简单的例子:
chzh9311 (Contributor):

『SSTables 是一种存储 key-value 对的格式』=>『SSTable 是一种存储 key-value 对的格式』
(plural form → singular form)
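To make the segment idea concrete, here is a minimal sketch of writing and reading one sorted segment file. The names and the line-per-pair text format are illustrative only; real SSTable formats are binary and considerably more involved.

```python
# Minimal sketch of an SSTable segment: key-value pairs written to an
# append-only file in sorted key order. Names and format are illustrative.

def write_segment(path, kv_pairs):
    """Write key-value pairs to a segment file in sorted key order."""
    with open(path, "w") as f:
        for key in sorted(kv_pairs):
            f.write(f"{key}:{kv_pairs[key]}\n")

def read_segment(path):
    """Read a segment file back into an ordered list of (key, value)."""
    with open(path) as f:
        return [tuple(line.rstrip("\n").split(":", 1)) for line in f]
```

Because the whole file is written once in sorted order, a segment never needs in-place updates, which matches the immutability described above.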


Recall that LSM trees only perform sequential writes. You may be wondering how we sequentially write our data in a sorted format when values may be written in any order. This is solved by using an in-memory tree structure. This is frequently referred to as a **memtable**, but the underlying data structure is generally some form of a sorted tree like a [red-black tree](https://en.wikipedia.org/wiki/Red%E2%80%93black_tree). As writes come in, the data is added to this red-black tree.
我们来回顾下,LSM 树只能处理顺序写入。您可能不知道如何在写入值是无序的情况下顺序写入数据。这个问题可以使用内存中的树结构来解决。它通常被称为 **内存表**,从本质上来看,它是一种经排序的树,类似于[红黑树](https://en.wikipedia.org/wiki/Red%E2%80%93black_tree)。进行数据更新时,数据存入这个红黑树。
chzh9311 (Contributor):

『您可能不知道如何在写入值是无序的情况下顺序写入数据。』=> 『您可能想知道如何在写入值是无序的情况下顺序写入数据。』

cool-summer-021 (Contributor, Author):

"wonder" can also mean 不知道 ("to not know").


![](https://yetanotherdevblog.com/content/images/2020/06/output-onlinepngtools--4-.png)

Our writes get stored in this red-black tree until the tree reaches a predefined size. Once the red-black tree has enough entries, it is flushed to disk as a segment on disk in sorted order. This allows us to write the segment file as a single sequential write even though the inserts may occur in any order.
我们写入的数据存在红黑树中,直到树的大小达到某个预设的值为止。此时红黑树有了足够的数据元素,它就作为一个有序的片段转移到磁盘上。因此,我们就能以单个顺序写入的方式更新这个片段,即使插入的数据是无序的也可以实现。
chzh9311 (Contributor):

『因此,我们就能以单个顺序写入的方式更新这个片段』=>『这样,我们就能以单次顺序写入的方式更新这个片段』
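The memtable-and-flush mechanism can be sketched roughly as follows. A sorted list stands in for the red-black tree the article describes, and the flush threshold is arbitrary; both are simplifications for illustration.

```python
import bisect

# Sketch of a memtable: writes land in a sorted in-memory structure
# (a sorted list here, standing in for a red-black tree), and once it
# is large enough the entries are emitted in sorted order as a segment.

class Memtable:
    def __init__(self, max_entries=4):
        self.keys = []            # kept sorted, like in-order tree traversal
        self.values = {}
        self.max_entries = max_entries

    def put(self, key, value):
        if key not in self.values:
            bisect.insort(self.keys, key)   # keep keys sorted on insert
        self.values[key] = value

    def is_full(self):
        return len(self.keys) >= self.max_entries

    def flush(self):
        """Emit all entries in sorted order: one sequential segment write."""
        segment = [(k, self.values[k]) for k in self.keys]
        self.keys, self.values = [], {}
        return segment
```

Even if `put` calls arrive in any order, `flush` produces the segment in sorted order, which is what allows the single sequential write.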


![](https://yetanotherdevblog.com/content/images/2020/06/output-onlinepngtools--6-.png)

We can use this index to quickly find the offsets for values that would come before and after the key we want. Now we only have to scan a small portion of each segment file based on those bounds. For example, let's consider a scenario where we want to look up the key `dollar` in the segment above. We can perform a binary search on our sparse index to find that `dollar` comes between `dog` and `downgrade`. Now we only need to scan from offset 17208 to 19504 in order to find the value (or determine it is missing).
我们使用这样的索引,可以快速得到需要的 key 前后的值的偏移量。现在我们只需要对边界符合条件的 segment 进行扫描。例如,我们需要在上述的 segment 中查找名为 `dollar` 的key。我们可以在稀疏索引中进行二分搜索,结果发现 `dollar` 位于 `dog` `downgrade` 之间。此时我们只需要在偏移量为 17208 19504 之间的数据进行扫描,就能找到需要的值。
chzh9311 (Contributor):

『现在我们只需要对边界符合条件的 segment 进行扫描。』=>『现在只需要对每个 segment 中符合边界条件的一小部分进行扫描。』
From the original text, this index applies within a single segment.

chzh9311 (Contributor):

『例如,我们需要在上述的 segment 中查找名为 dollar 的key。』=>『例如,我们需要在上图所示的 segment 中查找名为 dollar 的 key。』
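The `dollar` lookup can be sketched as a binary search over a sparse index. The offsets below mirror the article's example; `scan_bounds` is a hypothetical helper, not part of any real implementation.

```python
import bisect

# Sketch of a sparse-index lookup: binary-search the index for the keys
# that bound the target, then only the byte range between their offsets
# needs to be scanned in the segment file.

sparse_index = [("dog", 17208), ("downgrade", 19504)]  # (key, byte offset)

def scan_bounds(index, key):
    """Return the (lo, hi) byte offsets that must be scanned for `key`."""
    keys = [k for k, _ in index]
    i = bisect.bisect_right(keys, key)
    lo = index[i - 1][1] if i > 0 else 0
    hi = index[i][1] if i < len(index) else None  # None => scan to end of file
    return lo, hi
```

For `dollar`, which sorts between `dog` and `downgrade`, this yields the bounds (17208, 19504) from the example.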


This is a nice improvement, but what about looking up records that do not exist? We will still end up looping over all segment files and fail to find the key in each segment. This is something that a [bloom filter](https://yetanotherdevblog.com/bloom-filters/) can help us out with. A bloom filter is a space-efficient data structure that can tell us if a value is missing from our data. We can add entries to a bloom filter as they are written and check it at the beginning of reads in order to efficiently respond to requests for missing data.
这种优化方法很好,但如果查找不存在的记录会怎样呢?如果沿袭上述办法,我们仍然需要遍历所有的 segment 文件,才能得到查找目标不存在的结果。在此情况下,就需要使用[布隆过滤器](https://yetanotherdevblog.com/bloom-filters/)了。布隆过滤器是一种空间效率较高的数据结构,它用于检测数据中某个值素是否存在。我们可以把记录添加到布隆过滤器,这些记录写入后,布隆过滤器会在开始读取时进行检查,从而高效处理对不存在的数据的请求。
chzh9311 (Contributor):

『我们可以把记录添加到布隆过滤器,这些记录写入后,布隆过滤器会在开始读取时进行检查,从而高效处理对不存在的数据的请求。』
=>
『在写入数据的同时,我们可以把记录添加到布隆过滤器;在开始读取时,布隆过滤器就会进行检查,从而高效处理对不存在的数据的请求。』
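A minimal bloom-filter sketch follows; the bit-array size and the hashing scheme are illustrative, not tuned. The key property: it may answer "maybe present" for an absent key, but never "absent" for a present one.

```python
import hashlib

# Sketch of a bloom filter: a fixed bit array plus k hash functions
# (derived here by salting SHA-256). Size and hash count are arbitrary.

class BloomFilter:
    def __init__(self, size=1024, hashes=3):
        self.size = size
        self.hashes = hashes
        self.bits = [False] * size

    def _positions(self, key):
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(h, 16) % self.size

    def add(self, key):               # called as each record is written
        for p in self._positions(key):
            self.bits[p] = True

    def might_contain(self, key):     # checked at the start of each read
        return all(self.bits[p] for p in self._positions(key))
```

A read for a missing key can usually be rejected here without touching any segment file at all.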


![](https://yetanotherdevblog.com/content/images/2020/06/output-onlinepngtools--7-.png)

You can see in the example above that segments 1 and 2 both have a value for the key `dog`. Newer segments contain the latest values written, so the value from segment 2 is what gets carried forward into the segment 4. Once the compaction process has written a new segment for the input segments, the old segment files are deleted.
在上述例子中,你可以看到,1 号 segment 与 2 号 segment 中, `dog` 键都有对应的值。新的 segment 包含最新写入的值,所以 2 号 segment 中的值是传入 4 号 segment 中的值。当压缩进程把加入的数据写入一个新的 segment 时,旧 segment 文件就被删除了。
chzh9311 (Contributor):

『在上述例子中』=>『在上图所示的例子中』
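The newest-value-wins merge that compaction performs can be sketched as below; segments are ordered oldest to newest, so later segments simply overwrite earlier values, as segment 2's value for `dog` does in the figure.

```python
# Sketch of compaction: merge several old segments into one new sorted
# segment, keeping only the newest value written for each key.

def compact(segments):
    """Merge segments (ordered oldest -> newest) into one sorted segment."""
    merged = {}
    for segment in segments:          # later segments overwrite earlier ones
        for key, value in segment:
            merged[key] = value
    return sorted(merged.items())
```

After the merged segment is written, the input segment files can be deleted, as the text describes.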


![](https://yetanotherdevblog.com/content/images/2020/06/output-onlinepngtools--8-.png)

The example above shows that the key `dog` had the value 52 at some point in the past, but now it has a tombstone marker. This indicates that if we receive a request for the key `dog` then we should return a response indicating that the key does not exist. This means that delete requests actually take up disk space initially which many developers may find surprising. Eventually, tombstones will get compacted away so that the value no longer exists on disk.
上述例子说明,名为 `dog` 的 key 原来对应的值是 52,现在打上了 tombstone 标记。这说明如果收到一个获取 key `dog` 的数据的请求,我们应当得到的响应是数据不存在。这说明,删除请求起初占用的磁盘空间很大,令开发者感到吃惊。但最终,打上 tombstone 标记的数据被压缩了,因此相关的值就永远消失了。
chzh9311 (Contributor) Jan 11, 2021:

『这说明,删除请求起初占用的磁盘空间很大,令开发者感到吃惊。』=>『这说明,删除请求起初其实是占用磁盘空间的,很多开发者可能对此感到吃惊。』
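Delete-as-write can be sketched by treating the tombstone as a sentinel value that compaction filters out. `TOMBSTONE` and both function names are made up for illustration.

```python
# Sketch of delete-as-write: a delete takes the same path as a write,
# recording a tombstone; compaction later drops tombstoned keys entirely.

TOMBSTONE = object()   # arbitrary sentinel standing in for a tombstone marker

def delete(memtable_put, key):
    """Deletes follow the write path: write a tombstone for the key."""
    memtable_put(key, TOMBSTONE)

def compact_with_tombstones(segments):
    """Newest value wins; keys whose newest value is a tombstone vanish."""
    merged = {}
    for segment in segments:                  # ordered oldest -> newest
        for key, value in segment:
            merged[key] = value
    return sorted((k, v) for k, v in merged.items() if v is not TOMBSTONE)
```

This shows both halves of the text's point: the tombstone initially occupies space like any write, and only compaction finally removes the key from disk.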

2. When this tree becomes too large it is flushed to disk with the keys in sorted order.
3. When a read comes in we check the bloom filter. If the bloom filter indicates that the value is not present then we tell the client that the key could not be found. If the bloom filter indicates that the value is present then we begin iterating over our segment files from newest to oldest.
4. For each segment file, we check a sparse index and scan the offsets where we expect the key to be found until we find the key. We'll return the value as soon as we find it in a segment file.
1. 写入的数据存储在内存中的树结构中(也可以称为内存表)。任何支持的数据结构(布隆过滤器和稀疏索引)都会在必要时更新。
chzh9311 (Contributor):

『任何支持的数据结构』=>『任何辅助的数据结构』

I'm not sure why it should be "auxiliary" (辅助的) data structures; perhaps 「任何支持的数据结构类型(...)都会在必要时更新」 would read better.

Eminlin commented Jan 17, 2021

@lsvih Claiming the proofreading.

lsvih (Member) commented Jan 17, 2021

@Eminlin Sounds good!

Eminlin left a comment

Nice translation, thanks for the hard work; just a few suggestions.

PS. Images in the article such as https://yetanotherdevblog.com/content/images/2020/06/output-onlinepngtools--5-.png seem to load very slowly.


Recall that LSM trees only perform sequential writes. You may be wondering how we sequentially write our data in a sorted format when values may be written in any order. This is solved by using an in-memory tree structure. This is frequently referred to as a **memtable**, but the underlying data structure is generally some form of a sorted tree like a [red-black tree](https://en.wikipedia.org/wiki/Red%E2%80%93black_tree). As writes come in, the data is added to this red-black tree.
我们来回顾下,LSM 树只能处理顺序写入。您可能不知道如何在写入值是无序的情况下顺序写入数据。这个问题可以使用内存中的树结构来解决。它通常被称为 **内存表**,从本质上来看,它是一种经排序的树,类似于[红黑树](https://en.wikipedia.org/wiki/Red%E2%80%93black_tree)。进行数据更新时,数据存入这个红黑树。
Eminlin Jan 17, 2021:

『进行数据更新时,数据存入这个红黑树。』=>『当进行数据更新时,将会存入这个红黑树。』


This is a nice improvement, but what about looking up records that do not exist? We will still end up looping over all segment files and fail to find the key in each segment. This is something that a [bloom filter](https://yetanotherdevblog.com/bloom-filters/) can help us out with. A bloom filter is a space-efficient data structure that can tell us if a value is missing from our data. We can add entries to a bloom filter as they are written and check it at the beginning of reads in order to efficiently respond to requests for missing data.
这种优化方法很好,但如果查找不存在的记录会怎样呢?如果沿袭上述办法,我们仍然需要遍历所有的 segment 文件,才能得到查找目标不存在的结果。在此情况下,就需要使用[布隆过滤器](https://yetanotherdevblog.com/bloom-filters/)了。布隆过滤器是一种空间效率较高的数据结构,它用于检测数据中某个值素是否存在。我们可以把记录添加到布隆过滤器,这些记录写入后,布隆过滤器会在开始读取时进行检查,从而高效处理对不存在的数据的请求。
Eminlin Jan 17, 2021:

『这种优化方法很好』=>『这种方法有了很大的改进』


Over time, this system will accumulate more segment files as it continues to run. These segment files need to be cleaned up and maintained in order to prevent the number of segment files from getting out of hand. This is the responsibility of a process called compaction. Compaction is a background process that is continuously combining old segments together into newer segments.
随着时间的推移,只要系统持续运行,会有越来越多的 segment 文件累计起来。为了防止 segment 文件数量失控,应当对这些 segment 文件进行清理和维护。压缩进程就是负责这些工作的。它是一个后台进程,会持续地把旧 segment 跟新 segment 进行结合。
Eminlin Jan 17, 2021:

『随着时间的推移,只要系统持续运行,会有越来越多的 segment 文件累计起来。为了防止 segment 文件数量失控,应当对这些 segment 文件进行清理和维护。』=>
『随着时间的推移,系统在运行过程中,会累计越来越多的 segment 文件。为了防止 segment 文件数量逐渐庞大直至失控,应当对这些 segment 文件进行清理和维护。』


We've covered reading and writing data, but what about deleting data? How do you delete data from the SSTable when the segment files are considered immutable? Deletes  actually follow the exact same path as writing data.  Whenever a delete request is received, a unique marker called a **tombstone** is written for that key.
我们已经讨论了数据的读取和更新,那数据的删除呢?既然 segment 文件是不可变的,那如何把它从 SSTable 中删除呢?实际上,删除跟写入的过程是一样的。无论何时,只要收到删除请求,需要删除的那个 key 就打上了一个被称为 **tombstone** 的标记。
Eminlin Jan 17, 2021:

『需要删除的那个 key 就打上了一个被称为 tombstone 的标记』=>『需要删除的那个 key 就打上具有唯一标识的 tombstone 标记』

"a unique marker" can be rendered as 独特的标记 or 唯一的标记; "unique" has a specific meaning in computing, and after checking the relevant material, delete operations do indeed need a unique identifier.


![](https://yetanotherdevblog.com/content/images/2020/06/output-onlinepngtools--8-.png)

The example above shows that the key `dog` had the value 52 at some point in the past, but now it has a tombstone marker. This indicates that if we receive a request for the key `dog` then we should return a response indicating that the key does not exist. This means that delete requests actually take up disk space initially which many developers may find surprising. Eventually, tombstones will get compacted away so that the value no longer exists on disk.
上述例子说明,名为 `dog` 的 key 原来对应的值是 52,现在打上了 tombstone 标记。这说明如果收到一个获取 key `dog` 的数据的请求,我们应当得到的响应是数据不存在。这说明,删除请求起初占用的磁盘空间很大,令开发者感到吃惊。但最终,打上 tombstone 标记的数据被压缩了,因此相关的值就永远消失了。
「我们应当得到的响应是数据不存在」=> 「我们会收到一个数据不存在的响应」


3. When a read comes in we check the bloom filter. If the bloom filter indicates that the value is not present then we tell the client that the key could not be found. If the bloom filter indicates that the value is present then we begin iterating over our segment files from newest to oldest.
4. For each segment file, we check a sparse index and scan the offsets where we expect the key to be found until we find the key. We'll return the value as soon as we find it in a segment file.
1. 写入的数据存储在内存中的树结构中(也可以称为内存表)。任何支持的数据结构(布隆过滤器和稀疏索引)都会在必要时更新。
2. 当树结构太大时,会以一个有序的片段的形式转移到磁盘上。
Eminlin Jan 17, 2021:

「会以一个有序的片段的形式转移到磁盘上。」 => 「会以一个有序的片段的形式持久化到磁盘上。」

4. For each segment file, we check a sparse index and scan the offsets where we expect the key to be found until we find the key. We'll return the value as soon as we find it in a segment file.
1. 写入的数据存储在内存中的树结构中(也可以称为内存表)。任何支持的数据结构(布隆过滤器和稀疏索引)都会在必要时更新。
2. 当树结构太大时,会以一个有序的片段的形式转移到磁盘上。
3. 读取数据时,我们先检查布隆过滤器。如果布隆过滤器找不到相应的值,就告诉客户端相应的 key 不存在。如果布隆过滤器找到了相应的值,我们就开始从新到旧遍历 segment 文件。
Eminlin Jan 17, 2021:

「我们就开始从新到旧遍历 segment 文件」=> 「就会按照从最新到旧的顺序遍历 segment 文件」

1. 写入的数据存储在内存中的树结构中(也可以称为内存表)。任何支持的数据结构(布隆过滤器和稀疏索引)都会在必要时更新。
2. 当树结构太大时,会以一个有序的片段的形式转移到磁盘上。
3. 读取数据时,我们先检查布隆过滤器。如果布隆过滤器找不到相应的值,就告诉客户端相应的 key 不存在。如果布隆过滤器找到了相应的值,我们就开始从新到旧遍历 segment 文件。
4. 对于每个 segment 文件,我们需要检查稀疏索引并在估计能查找到需要的 key 的位置扫描偏移量,直到我们找到了目标 key 为止。一经找到,就可以返回相应的值。
「我们需要检查稀疏索引并在估计能查找到需要的 key 的位置扫描偏移量」=>「我们需要检查稀疏索引,并扫描我们期望找到的 key 的偏移量」
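The numbered read-path steps being reviewed above can be sketched end to end. For brevity, sparse-index scanning is collapsed into a plain scan of each segment, and `bloom` is assumed to be any object with a `might_contain` method.

```python
# Sketch of the read path: check the bloom filter first, then search
# segments from newest to oldest and return the first (newest) match.

def read(key, bloom, segments):
    """`segments` are ordered oldest -> newest; search newest first."""
    if not bloom.might_contain(key):
        return None                      # definitely absent: answer immediately
    for segment in reversed(segments):   # iterate newest to oldest
        for k, v in segment:             # scan (sparse index omitted here)
            if k == key:
                return v                 # first hit is the newest value
    return None                          # bloom filter false positive
```

Returning on the first hit in the newest segment is what makes stale values in older segments harmless until compaction removes them.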

Eminlin commented Jan 17, 2021

@lsvih @SamYu2000 @chzh9311 Proofreading complete.

Revisions per proofreading feedback complete
@lsvih lsvih merged commit 843bb37 into xitu:master Jan 18, 2021
lsvih (Member) commented Jan 18, 2021

@SamYu2000 It's merged! Please publish it to Juejin soon and send me the link so your points can be added promptly.

The Juejin Translation Project has its own Zhihu column; you're welcome to submit there too, and a handy plugin is recommended.
Column: https://zhuanlan.zhihu.com/juejinfanyi

cool-summer-021 (Contributor, Author)

lsvih (Member) commented Jan 18, 2021

Got it!
