数据分片是如何在分布式 SQL 数据库中起作用的 #6227

Ultrasteve · 2019-07-26T14:20:15Z

Translation complete; resolves #6204.

Translation complete. Thanks in advance to the proofreaders for their hard work.

JackEggie · 2019-07-29T06:06:03Z

Claiming this for proofreading.

fanyijihua · 2019-07-29T06:06:04Z

@JackEggie OK, sounds good 🍺

JaneLdq · 2019-07-30T03:05:02Z

Claiming this for proofreading.

fanyijihua · 2019-07-30T03:05:04Z

@JaneLdq All set 🍻

JackEggie

Proofreading complete. The translation quality is good. My review comments are offered for your reference.

JackEggie · 2019-07-30T05:17:45Z

TODO1/how-data-sharding-works-in-a-distributed-sql-database.md


-Enterprises of all sizes are embracing rapid modernization of user-facing applications as part of their broader digital transformation strategy. The relational database (RDBMS) infrastructure that such applications rely on suddenly needs to support much larger data sizes and transaction volumes. However, a monolithic RDBMS tends to quickly get overloaded in such scenarios. One of the most common architectures to get more performance and scalability in an RDBMS is to “shard” the data. In this blog, we will learn what sharding is and how it can be used to scale a database. We will also review the pros and cons of common sharding architectures, plus explore how sharding is implemented in distributed SQL-based RDBMS like [YugaByte DB.](https://github.com/YugaByte/yugabyte-db)
+如今，所有规模的企业都在拥抱用户导向应用的高速现代化，以此来作为它们迈向更广阔的数字转型策略的其中一步。因此，这些应用所依赖的 RDBMS（关系型数据库基础设施），如今就需要支持更大的数据量和事务量。然而，在这种场景中，一个单体 RDBMS 通常很快会达到过载状态。数据分片是用于解决这种问题的其中一种最为普遍的架构，它能够使 RDBMS 得到更好的性能和更高的扩展性。在这篇文章中，我们会探讨几种常见分片架构的优劣，还会探索在分布式 SQL 数据库中，例如 [YugaByte DB](https://github.com/YugaByte/yugabyte-db) 是如何实现数据分片的。 


Suggested change

-如今，所有规模的企业都在拥抱用户导向应用的高速现代化，以此来作为它们迈向更广阔的数字转型策略的其中一步。因此，这些应用所依赖的 RDBMS（关系型数据库基础设施），如今就需要支持更大的数据量和事务量。然而，在这种场景中，一个单体 RDBMS 通常很快会达到过载状态。数据分片是用于解决这种问题的其中一种最为普遍的架构，它能够使 RDBMS 得到更好的性能和更高的扩展性。在这篇文章中，我们会探讨几种常见分片架构的优劣，还会探索在分布式 SQL 数据库中，例如 [YugaByte DB](https://github.com/YugaByte/yugabyte-db) 是如何实现数据分片的。
+如今，所有规模的企业都在拥抱用户导向应用的高速现代化，以此来作为它们迈向更广阔的数字转型策略的其中一步。因此，这些应用所依赖的 RDBMS（关系型数据库基础设施），如今就需要支持更大的数据量和事务量。然而，在这种场景中，一个单体 RDBMS 通常很快会达到过载状态。数据分片是用于解决这种问题的其中一种最为普遍的架构，它能够使 RDBMS 得到更好的性能和更高的扩展性。在这篇文章中，我们会探讨什么是分片、如何使用分片来扩展数据库、以及几种常见分片架构的优劣。我们还会探索在分布式 SQL 数据库中，例如 [YugaByte DB](https://github.com/YugaByte/yugabyte-db) 是如何实现数据分片的。

One sentence was left untranslated.

JackEggie · 2019-07-30T05:33:07Z

TODO1/how-data-sharding-works-in-a-distributed-sql-database.md


-One of the most significant challenges with manual sharding is uneven shard allocation. Disproportionate distribution of data could cause shards to become unbalanced, with some overloaded while others remain relatively empty. It’s best to avoid accruing too much data on a shard, because a hotspot can lead to slowdowns and server crashes. This problem could also arise from a small shard set, which forces data to be spread across too few shards. This is acceptable in development and testing environments, but not in production. Uneven data distribution, hotspots, and storing data on too few shards can all cause shard and server resource exhaustion.
+手动分片的其中一个重大挑战便是不平均的分片。不成比例的分配数据将导致分片变得不平衡，这意味着当一些节点过载时其他节点可能是空闲的。因为部分节点的过载可能会拖累整体的响应速度并导致服务崩溃，我们最好在分片时尽可能少的增加数据。这个问题也有可能在一个小的分片集中发生，因为小的分片集意味着将数据分散到极少数量的分片中。这虽然在开发环境和测试环境中是可以接受的，但生产环境中是不允许的。不平均的数据分配，部分节点过载和过少的数据分配都会导致分片和服务资源的枯竭。


Suggested change

-手动分片的其中一个重大挑战便是不平均的分片。不成比例的分配数据将导致分片变得不平衡，这意味着当一些节点过载时其他节点可能是空闲的。因为部分节点的过载可能会拖累整体的响应速度并导致服务崩溃，我们最好在分片时尽可能少的增加数据。这个问题也有可能在一个小的分片集中发生，因为小的分片集意味着将数据分散到极少数量的分片中。这虽然在开发环境和测试环境中是可以接受的，但生产环境中是不允许的。不平均的数据分配，部分节点过载和过少的数据分配都会导致分片和服务资源的枯竭。
+手动分片的其中一个重大挑战便是不平均的分片。不成比例的分配数据将导致分片变得不平衡，这意味着当一些节点过载时其他节点可能是空闲的。因为部分节点的过载可能会拖累整体的响应速度并导致服务崩溃，我们要尽量避免在一个分片中存入过多的数据。这个问题也有可能在一个小的分片集中发生，因为小的分片集意味着将数据分散到极少数量的分片中。这虽然在开发环境和测试环境中是可以接受的，但生产环境中是不允许的。不平均的数据分配，部分节点过载和过少的数据分配都会导致分片和服务资源的枯竭。

JackEggie · 2019-07-30T05:35:41Z

TODO1/how-data-sharding-works-in-a-distributed-sql-database.md


-Finally, manual sharding can complicate operational processes. Backups will now have to be performed for multiple servers. Data migration and schema changes must be carefully coordinated to ensure all shards have the same schema copy. Without sufficient optimization, database joins across multiple servers could highly inefficient and difficult to perform.
+最后，手动分片会使操作过程复杂化。现在需要在多个服务器中进行备份了。为了保证所有分片都有相同的模式，数据迁移和模式的变化现在需要更小心的进行协调。在缺乏足够优化的情况下，在多个服务器中进行数据库 join 操作会变得不高效和难以执行。


Suggested change

-最后，手动分片会使操作过程复杂化。现在需要在多个服务器中进行备份了。为了保证所有分片都有相同的模式，数据迁移和模式的变化现在需要更小心的进行协调。在缺乏足够优化的情况下，在多个服务器中进行数据库 join 操作会变得不高效和难以执行。
+最后，手动分片会使操作过程复杂化。现在需要在多个服务器中进行备份了。为了保证所有分片都有相同的结构，数据迁移和表结构的变化现在需要更小心的进行协调。在缺乏足够优化的情况下，在多个服务器中进行数据库 join 操作会变得低效和难以执行。

JackEggie · 2019-07-30T05:38:26Z

TODO1/how-data-sharding-works-in-a-distributed-sql-database.md


 ![](https://3lr6t13cowm230cj0q42yphj-wpengine.netdna-ssl.com/wp-content/uploads/2019/06/data-sharding-distributed-sql-1.png)

-**Figure 1 : Vertical and Horizontal Data Partitioning (Source: Medium)**
+**图一 ：垂直切分与水平切分**


Suggested change

-**图一：垂直切分与水平切分**
+**图一：垂直切分与水平切分（来源：Medium）**

JackEggie · 2019-07-30T05:41:08Z

TODO1/how-data-sharding-works-in-a-distributed-sql-database.md


-Range-based sharding divides data based on ranges of the data value (aka the keyspace). Shard keys with nearby values are more likely to fall into the same range and onto the same shards. Each shard essentially preserves the same schema from the original database. Sharding becomes as easy as identifying the data’s appropriate range and placing it on the corresponding shard.
+基于范围的分片，参照数据值的范围来分割数据。切分主键值相同的数据更容易落到同一个范围中，因此也更容易落到同一个分片中。每个分片都必须保存于原数据库相同的模式。数据分片将变得十分简单，正如辨别数据正确范围并放到相应的分片中一样容易。


Suggested change

-基于范围的分片，参照数据值的范围来分割数据。切分主键值相同的数据更容易落到同一个范围中，因此也更容易落到同一个分片中。每个分片都必须保存于原数据库相同的模式。数据分片将变得十分简单，正如辨别数据正确范围并放到相应的分片中一样容易。
+基于范围的分片，参照数据值的范围来分割数据。分片主键值相近的数据更容易落到同一个范围中，因此也更容易落到同一个分片中。每个分片都必须保存与原数据库相同的结构。数据分片将变得十分简单，正如辨别数据正确范围并放到相应的分片中一样容易。

JackEggie · 2019-07-30T05:44:49Z

TODO1/how-data-sharding-works-in-a-distributed-sql-database.md


-The ideal solution to uneven shard sizes is to perform automatic shard splitting and merging. If the shard becomes to big or hosts a frequently accessed row, then breaking the shard into multiple shards and then rebalancing them across all the available nodes leads to better performance. Similarly, the opposite process can be undertaken when there are too many small shards.
+解决不均等分片的理想方法是进行归并和自动化。如果分片变得过大或者其中的某一行被频繁的访问，那么最好就将这个大的分片再进行更细的分片，并将这些小的分片重新平均的分配到各个节点中。同样的，当小分片过多的时候，我们可以做相反的事情。


Suggested change

-解决不均等分片的理想方法是进行归并和自动化。如果分片变得过大或者其中的某一行被频繁的访问，那么最好就将这个大的分片再进行更细的分片，并将这些小的分片重新平均的分配到各个节点中。同样的，当小分片过多的时候，我们可以做相反的事情。
+解决不均等分片的理想方法是进行归并和自动化分片。如果分片变得过大或者其中的某一行被频繁的访问，那么最好就将这个大的分片再进行更细的分片，并将这些小的分片重新平均的分配到各个节点中。同样的，当小分片过多的时候，我们可以做相反的事情。

JackEggie · 2019-07-30T05:47:10Z

TODO1/how-data-sharding-works-in-a-distributed-sql-database.md


-YugaByte DB is an auto-sharded, ultra-resilient, high-performance, geo-distributed SQL database built with inspiration from Google Spanner. It currently supports hash-based sharding by default. Range-based sharding is an active work-in-progress project while geo-based sharding is on the roadmap for later this year. Each data shard is called a tablet, and it resides on a corresponding tablet server.
+YugaByte DB 是一个具备自动分片功能和高度弹性的高性能分布式 SQL 数据库，它由 Google Spanner 开发。它目前默认支持基于哈希的分片方式。它是一个活跃更新的项目，而基于地理位置和基于范围的分片功能将在今年年尾加入。在 YugaByte DB 中每一个数据分片被称作子表，它们被分配在相应的子表服务器中。


Suggested change

-YugaByte DB 是一个具备自动分片功能和高度弹性的高性能分布式 SQL 数据库，它由 Google Spanner 开发。它目前默认支持基于哈希的分片方式。它是一个活跃更新的项目，而基于地理位置和基于范围的分片功能将在今年年尾加入。在 YugaByte DB 中每一个数据分片被称作子表，它们被分配在相应的子表服务器中。
+YugaByte DB 是一个具备自动分片功能和高度弹性的高性能分布式 SQL 数据库，它由 Google Spanner 开发。它目前默认支持基于哈希的分片方式。它是一个活跃更新的项目，而基于地理位置和基于范围的分片功能将在今年年尾加入。在 YugaByte DB 中每一个数据分片被称作子表（tablet），它们被分配在相应的子表服务器中。

JackEggie · 2019-07-30T05:51:22Z

TODO1/how-data-sharding-works-in-a-distributed-sql-database.md


-In read/write operations, the primary keys are first converted into internal keys and their corresponding hash values. The operation is served by collecting data from the appropriate tablets. (Figure 3)
+在读写操作中，主键是最先被转化成内键和它们对应的哈希值。这个操作通过收集可用子表中的数据来实现。（图三）


Suggested change

-在读写操作中，主键是最先被转化成内键和它们对应的哈希值。这个操作通过收集可用子表中的数据来实现。（图三）
+在读写操作中，主键是最先被转化成内键和它们对应的哈希值。这个操作通过收集可用子表中的数据来实现。（图五）

The source text appears to be wrong here; it should refer to Figure 5, right?
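
For readers following this hunk, the flow described in the source sentence (primary key, then hash value, then the appropriate tablet) can be sketched roughly as below. This is only an illustration under stated assumptions: the 16-bit hash space, the MD5-based `hash_key` stand-in, and the helper names `make_tablets` and `tablet_for` are hypothetical and are not taken from the article or from YugaByte DB's code.

```python
import hashlib

HASH_SPACE = 0x10000  # assumption: a 16-bit hash space, chosen for illustration


def make_tablets(num_tablets: int) -> list:
    """Divide the hash space into contiguous [start, end) ranges, one per tablet."""
    step = HASH_SPACE // num_tablets
    bounds = [i * step for i in range(num_tablets)] + [HASH_SPACE]
    return list(zip(bounds[:-1], bounds[1:]))


def hash_key(primary_key: str) -> int:
    """Stand-in hash: first two bytes of MD5 (not YugaByte DB's real function)."""
    digest = hashlib.md5(primary_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:2], "big")


def tablet_for(primary_key: str, tablets: list) -> int:
    """Return the index of the tablet whose hash range contains the key's hash."""
    h = hash_key(primary_key)
    for index, (start, end) in enumerate(tablets):
        if start <= h < end:
            return index
    raise ValueError("hash value outside the hash space")


tablets = make_tablets(4)
```

Because every hash value falls in exactly one range, a single-key read or write only needs to contact one tablet.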

JackEggie · 2019-07-30T05:55:42Z

TODO1/how-data-sharding-works-in-a-distributed-sql-database.md

-* [Get started](https://docs.yugabyte.com/latest/quick-start/) with YugaByte DB on macOS, Linux, Docker, and Kubernetes.
-* [Contact us](https://www.yugabyte.com/about/contact/) to learn more about licensing, pricing or to schedule a technical overview.
+* [深入比较](https://docs.yugabyte.com/latest/comparisons/) YugaByte DB 和 [CockroachDB](https://www.yugabyte.com/yugabyte-db-vs-cockroachdb/)，Google Cloud Spanner 与 MongoDB 的不同之处。
+* [开始](https://docs.yugabyte.com/latest/quick-start/)使用 YugaByte DB ，在 macOS，Linux，Docker 和 Kubernetes 中使用它.


Suggested change

-* [开始](https://docs.yugabyte.com/latest/quick-start/)使用 YugaByte DB ，在 macOS，Linux，Docker 和 Kubernetes 中使用它.
+* [初学](https://docs.yugabyte.com/latest/quick-start/) YugaByte DB ，在 macOS，Linux，Docker 和 Kubernetes 中使用它.

JackEggie · 2019-07-30T05:56:44Z

TODO1/how-data-sharding-works-in-a-distributed-sql-database.md

-* [Contact us](https://www.yugabyte.com/about/contact/) to learn more about licensing, pricing or to schedule a technical overview.
+* [深入比较](https://docs.yugabyte.com/latest/comparisons/) YugaByte DB 和 [CockroachDB](https://www.yugabyte.com/yugabyte-db-vs-cockroachdb/)，Google Cloud Spanner 与 MongoDB 的不同之处。
+* [开始](https://docs.yugabyte.com/latest/quick-start/)使用 YugaByte DB ，在 macOS，Linux，Docker 和 Kubernetes 中使用它.
+* [联系我们](https://www.yugabyte.com/about/contact/) 了解证书及收费问题或预约一个技术面谈。


Suggested change

-* [联系我们](https://www.yugabyte.com/about/contact/) 了解证书及收费问题或预约一个技术面谈。
+* [联系我们](https://www.yugabyte.com/about/contact/)了解证书及收费问题或预约一个技术面谈。

JaneLdq

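Proofreading complete. The translation quality is very high; my review comments are offered for your reference.
One small note: the translator may want to pay attention to when to use 「的」 versus 「地」.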

JaneLdq · 2019-07-30T12:29:28Z

TODO1/how-data-sharding-works-in-a-distributed-sql-database.md


-Enterprises of all sizes are embracing rapid modernization of user-facing applications as part of their broader digital transformation strategy. The relational database (RDBMS) infrastructure that such applications rely on suddenly needs to support much larger data sizes and transaction volumes. However, a monolithic RDBMS tends to quickly get overloaded in such scenarios. One of the most common architectures to get more performance and scalability in an RDBMS is to “shard” the data. In this blog, we will learn what sharding is and how it can be used to scale a database. We will also review the pros and cons of common sharding architectures, plus explore how sharding is implemented in distributed SQL-based RDBMS like [YugaByte DB.](https://github.com/YugaByte/yugabyte-db)
+如今，所有规模的企业都在拥抱用户导向应用的高速现代化，以此来作为它们迈向更广阔的数字转型策略的其中一步。因此，这些应用所依赖的 RDBMS（关系型数据库基础设施），如今就需要支持更大的数据量和事务量。然而，在这种场景中，一个单体 RDBMS 通常很快会达到过载状态。数据分片是用于解决这种问题的其中一种最为普遍的架构，它能够使 RDBMS 得到更好的性能和更高的扩展性。在这篇文章中，我们会探讨几种常见分片架构的优劣，还会探索在分布式 SQL 数据库中，例如 [YugaByte DB](https://github.com/YugaByte/yugabyte-db) 是如何实现数据分片的。 


「以此来作为它们迈向更广阔的数字转型策略的其中一步。」=> 「以此作为它们更广阔的数字转型策略中的一部分」
I personally think a literal translation works better here; “迈向策略” reads a bit oddly.

JaneLdq · 2019-07-30T12:35:01Z

TODO1/how-data-sharding-works-in-a-distributed-sql-database.md


-Enterprises of all sizes are embracing rapid modernization of user-facing applications as part of their broader digital transformation strategy. The relational database (RDBMS) infrastructure that such applications rely on suddenly needs to support much larger data sizes and transaction volumes. However, a monolithic RDBMS tends to quickly get overloaded in such scenarios. One of the most common architectures to get more performance and scalability in an RDBMS is to “shard” the data. In this blog, we will learn what sharding is and how it can be used to scale a database. We will also review the pros and cons of common sharding architectures, plus explore how sharding is implemented in distributed SQL-based RDBMS like [YugaByte DB.](https://github.com/YugaByte/yugabyte-db)
+如今，所有规模的企业都在拥抱用户导向应用的高速现代化，以此来作为它们迈向更广阔的数字转型策略的其中一步。因此，这些应用所依赖的 RDBMS（关系型数据库基础设施），如今就需要支持更大的数据量和事务量。然而，在这种场景中，一个单体 RDBMS 通常很快会达到过载状态。数据分片是用于解决这种问题的其中一种最为普遍的架构，它能够使 RDBMS 得到更好的性能和更高的扩展性。在这篇文章中，我们会探讨几种常见分片架构的优劣，还会探索在分布式 SQL 数据库中，例如 [YugaByte DB](https://github.com/YugaByte/yugabyte-db) 是如何实现数据分片的。 


「我们还会探索在分布式 SQL 数据库中，例如 YugaByte DB 是如何实现数据分片的。」=> 「我们还会探索在如 YugaByte DB 这样的分布式 SQL 数据库中是如何实现数据分片的。」

JaneLdq · 2019-07-30T12:44:12Z

TODO1/how-data-sharding-works-in-a-distributed-sql-database.md


-Sharding is the process of breaking up large tables into smaller chunks called **shards** that are spread across multiple servers. A **shard** is essentially a horizontal data partition that contains a subset of the total data set, and hence is responsible for serving a portion of the overall workload. The idea is to distribute data that can’t fit on a single node onto a **cluster** of database nodes. Sharding is also referred to as **horizontal partitioning**. The distinction between horizontal and vertical comes from the traditional tabular view of a database. A database can be split vertically — storing different table columns in a separate database, or horizontally — storing rows of the same table in multiple database nodes.
+分片是一种把大表切分成**数据分片**的过程，分割后的数据块会分布在多个服务器中。**数据分片**必须是水平切分的，各个分片是整个数据集的子集，它们各自负责总体工作量的一部分。这种方法的中心思想，便是将原本难以放在单体中的庞大数据，分散到一个**数据库集群**中。分片也称为**水平切分**，水平切分和垂直切分的区别来自于传统的表式数据库。一个数据库可以被垂直切分（把表中不同的列分散在数据库中），也可以被水平切分（把不同的行分散到多个数据库节点中）。


「单体」=> 「单节点」
In this context, I feel a literal rendering as 「单节点」 (single node) is enough.
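
As a side note for readers of this hunk, the difference between splitting a table vertically (by columns) and horizontally (by rows) can be shown with a small in-memory sketch. The sample rows and the helper names `split_vertically` and `split_horizontally` are made up for illustration, and the modulo rule used to pick a shard is just one simple choice.

```python
# Illustrative only: a tiny in-memory "table" represented as a list of row dicts.
rows = [
    {"id": 1, "name": "Ann", "city": "Oslo"},
    {"id": 2, "name": "Bob", "city": "Lima"},
    {"id": 3, "name": "Cai", "city": "Pune"},
    {"id": 4, "name": "Dee", "city": "Kyiv"},
]


def split_vertically(table, column_groups, key="id"):
    """Store different columns in separate tables; each part keeps the key column."""
    return [
        [{c: row[c] for c in [key, *cols]} for row in table]
        for cols in column_groups
    ]


def split_horizontally(table, num_shards, key="id"):
    """Store different rows on different shards (here: key modulo shard count)."""
    shards = [[] for _ in range(num_shards)]
    for row in table:
        shards[row[key] % num_shards].append(row)
    return shards


vertical = split_vertically(rows, [["name"], ["city"]])
horizontal = split_horizontally(rows, 2)
```

Each vertical part still has all four rows but fewer columns, while each horizontal shard has full rows but only a subset of them, which is what the article calls a shard.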

JaneLdq · 2019-07-30T12:56:31Z

TODO1/how-data-sharding-works-in-a-distributed-sql-database.md


-On the other hand, horizontally partitioning a table means more compute capacity to serve incoming queries, and therefore you end up with faster query response times and index builds. By continuously balancing the load and data set over additional nodes, sharding also allows easy expansion to accommodate more capacity. Moreover, a network of smaller, cheaper servers may be more cost effective in the long term than maintaining one big server.
+从另一方面来看，对表格进行水平切分意味着拥有更多的计算资源去应对查询请求，你会得到更短的响应时间并能够建立更多的索引。分片通过持续的平衡额外节点之间的数据量和工作量，能在扩张中更有效的利用新资源。不仅如此，维护一组更小更廉价的服务器比维护一个大型的服务器要实惠的多。


「你会得到更短的响应时间并能够建立更多的索引」=> 「你会得到更短的响应时间并能够更快地创建索引」
Here “faster” should modify both the query response time and the index build time.

JaneLdq · 2019-07-30T13:18:10Z

TODO1/how-data-sharding-works-in-a-distributed-sql-database.md


-On the other hand, horizontally partitioning a table means more compute capacity to serve incoming queries, and therefore you end up with faster query response times and index builds. By continuously balancing the load and data set over additional nodes, sharding also allows easy expansion to accommodate more capacity. Moreover, a network of smaller, cheaper servers may be more cost effective in the long term than maintaining one big server.
+从另一方面来看，对表格进行水平切分意味着拥有更多的计算资源去应对查询请求，你会得到更短的响应时间并能够建立更多的索引。分片通过持续的平衡额外节点之间的数据量和工作量，能在扩张中更有效的利用新资源。不仅如此，维护一组更小更廉价的服务器比维护一个大型的服务器要实惠的多。


「分片通过持续的平衡...更有效的利用」=> 「分片通过持续地平衡...更有效地利用」
Note: 「的」 => 「地」

JaneLdq · 2019-07-30T13:53:46Z

TODO1/how-data-sharding-works-in-a-distributed-sql-database.md


-Hash-based sharding takes a shard key’s value and generates a hash value from it. The hash value is then used to determine in which shard the data should reside. With a uniform hashing algorithm such as ketama, the hash function can evenly distribute data across servers, reducing the risk of hotspots. With this approach, data with close shard keys are unlikely to be placed on the same shard. This architecture is thus great for targeted data operations.
+基于哈希的分片使用分片主键来产生一些哈希值，这些哈希值将被用于决定这一条数据存储在哪里。通过使用一个通用的哈希算法 ketama ，哈希函数能够在服务器间平均的分摊数据，以此来减少部分节点的过载。在这种方法里，那些分片主键相近的数据不太可能会被分配在同一个分片中。这个架构因此十分适用于目标明确的数据操作。


「通过使用一个通用的哈希算法 ketama，」=> 「通过使用一个通用的哈希算法，比如 ketama，」
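
To make the mechanism in the source paragraph concrete, here is a minimal sketch of hash-based shard selection. It deliberately uses a plain MD5 hash with a modulo, which is an assumption made for brevity; it is not ketama (ketama is a consistent-hashing scheme, which also limits how many keys move when servers are added or removed). The function name `shard_for` and the sample keys are hypothetical.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a shard key to a shard index via a stable hash (MD5 here) modulo N."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Count how many of 1000 sample keys land on each of 4 shards.
keys = [f"user{i}" for i in range(1000)]
counts = [0] * 4
for k in keys:
    counts[shard_for(k, 4)] += 1
```

Neighbouring keys such as `user1` and `user2` are generally scattered across shards, which spreads load but means a range scan has to visit every shard.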

JaneLdq · 2019-07-30T13:59:08Z

TODO1/how-data-sharding-works-in-a-distributed-sql-database.md


-Range-based sharding allows for efficient queries that reads target data within a contiguous range or range queries. However, range-based sharding needs the user to apriori choose the shard keys, and poorly chosen shard keys could result in database hotspots.
+基于范围的分片能让依据目标数据范围的查询，或范围查询变得更加高效。然而这种分片方式需要用户事先选择分片主键，如果分片主键选的不好，可能会导致部分节点过载。


「基于范围的分片能让依据目标数据范围的查询，或范围查询变得更加高效。」=> 「基于范围的分片能让读取连续范围内的数据，或范围查询变得更加高效。」
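
The point about contiguous-range reads can be illustrated with a small sketch of range-based routing. The split points and the helper names `shard_for` and `shards_for_range` are hypothetical; the only library call is `bisect_right` from Python's standard `bisect` module.

```python
from bisect import bisect_right

# Hypothetical split points: shard 0 holds keys < "g", shard 1 holds
# "g" <= key < "n", shard 2 holds "n" <= key < "t", shard 3 holds the rest.
SPLIT_POINTS = ["g", "n", "t"]


def shard_for(key: str) -> int:
    """Return the index of the range that contains the key."""
    return bisect_right(SPLIT_POINTS, key)


def shards_for_range(lo: str, hi: str) -> range:
    """Shards that a range query over [lo, hi] has to touch."""
    return range(shard_for(lo), shard_for(hi) + 1)
```

A query over `"h"` to `"m"` touches only shard 1, whereas a poorly chosen key (for example one where most values start with the same letter) would send most traffic to a single shard.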

JaneLdq · 2019-07-30T14:01:49Z

TODO1/how-data-sharding-works-in-a-distributed-sql-database.md


-A good rule-of-thumb is to pick shard keys that have large cardinality, low recurring frequency, and that do not increase, or decrease, monotonically. Without proper shard key selections, data could be unevenly distributed across shards, and specific data could be queried more compared to the others, creating potential system bottlenecks in the shards that get a heavier workload.
+一个好的原则就是选择那些基数更大重复率更低的键作为分片主键，这些键通常十分稳定，不会增加和减少，是无变化的。如果没有正确的选择分片主键，数据会不均等的分配在分片中，特定的数据会比其他数据的访问频率更高，这让那些工作量较大的分片产生瓶颈。


「正确的选择分片主键」=> 「正确地选择分片主键」
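
The rule of thumb in the source paragraph (large cardinality, low recurring frequency, not monotonically increasing or decreasing) can be checked on sample data with a rough sketch like the following. The function name `key_profile` and the three reported metrics are assumptions made for illustration, not an established tool.

```python
from collections import Counter


def key_profile(values: list) -> dict:
    """Summarise a candidate shard key column, in insertion order."""
    pairs = list(zip(values, values[1:]))
    return {
        # Number of distinct values: higher is better.
        "cardinality": len(set(values)),
        # Share of rows taken by the most common value: lower is better.
        "max_share": Counter(values).most_common(1)[0][1] / len(values),
        # True if values only ever go up (or only ever go down) as rows arrive.
        "monotonic": all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs),
    }
```

An auto-incrementing ID scores well on cardinality but is monotonic, so under range-based sharding every new row would land on the last shard.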

JaneLdq · 2019-07-30T14:04:52Z

TODO1/how-data-sharding-works-in-a-distributed-sql-database.md


-The ideal solution to uneven shard sizes is to perform automatic shard splitting and merging. If the shard becomes to big or hosts a frequently accessed row, then breaking the shard into multiple shards and then rebalancing them across all the available nodes leads to better performance. Similarly, the opposite process can be undertaken when there are too many small shards.
+解决不均等分片的理想方法是进行归并和自动化。如果分片变得过大或者其中的某一行被频繁的访问，那么最好就将这个大的分片再进行更细的分片，并将这些小的分片重新平均的分配到各个节点中。同样的，当小分片过多的时候，我们可以做相反的事情。


「将这个大的分片再进行更细的分片」 => 「将这个大的分片再进行更细地分片」
If 「分片」 is used as a verb here, 「的」 needs to change to 「地」; alternatively, translate it as 「将这个大的分片切分成更细的分片」.
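
For context on what the source paragraph means by splitting, merging, and rebalancing, here is a simplified sketch. The thresholds `max_rows` and `min_rows`, the round-robin placement, and all three function names are assumptions for illustration; real systems typically also weigh access frequency, which this sketch ignores.

```python
def split_shard(shard: list, max_rows: int) -> list:
    """Halve a shard recursively until no piece holds more than max_rows rows."""
    if len(shard) <= max_rows:
        return [shard]
    mid = len(shard) // 2
    return split_shard(shard[:mid], max_rows) + split_shard(shard[mid:], max_rows)


def merge_small_shards(shards: list, min_rows: int) -> list:
    """Fold each shard into the previous one while the previous one is below min_rows.

    A small shard at the very end is left as-is in this simplified version.
    """
    merged = []
    for shard in shards:
        if merged and len(merged[-1]) < min_rows:
            merged[-1] = merged[-1] + shard
        else:
            merged.append(list(shard))
    return merged


def assign_round_robin(shards: list, nodes: list) -> dict:
    """Rebalance by spreading shards across nodes in round-robin order."""
    placement = {node: [] for node in nodes}
    for i, shard in enumerate(shards):
        placement[nodes[i % len(nodes)]].append(shard)
    return placement
```

Splitting keeps adjacent rows together, so the resulting pieces remain valid contiguous ranges that can be placed on different nodes.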

JaneLdq · 2019-07-30T14:14:14Z

TODO1/how-data-sharding-works-in-a-distributed-sql-database.md


-Data sharding is a solution for business applications with large data sets and scale needs. There are a variety of sharding architectures to choose from, each of which provides different capabilities. Before settling on a sharding architecture, the needs and workload requirements of your app must be mapped out. Manual sharding should be avoided in most circumstances given significant increase in application logic complexity. [YugaByte DB](https://github.com/YugaByte/yugabyte-db) is an auto-sharded distributed SQL database with support for hash-based sharding today and support for range-based/geo-based sharding coming soon. You can see YugaByte DB’s automatic sharding in action in this [tutorial.](https://docs.yugabyte.com/latest/explore/auto-sharding/)
+数据分片是一种在商业应用中用于建设大型数据集和满足扩展性需求的解决方案。目前有许多数据分片架构供我们选择，每一种都提供了不同的功能。在决定用哪一种架构之前，我们需要清晰的列出你的项目需求和预期负载量。由于会显著的增加应用逻辑的复杂度，我们应该在绝大部分情况下尽量避免手动分片。[YugaByte DB](https://github.com/YugaByte/yugabyte-db) 是一种具备自动分片功能的分布式 SQL 数据库，它目前支持基于哈希的分片，而基于范围和基于地理位置的分片功能将很快能够用到。你可以查看这个[教程](https://docs.yugabyte.com/latest/explore/auto-sharding/)来学习 YugaByte DB 的自动分片功能。


「清晰的列出...会显著的增加」 => 「清晰地列出...会显著地增加」

Ultrasteve · 2019-07-31T12:55:00Z

@JackEggie @JaneLdq @leviding
The proofreading revisions are done. If there are no objections, I'll submit it.

leviding

@Ultrasteve Quite a few review comments still haven't been addressed. Did you miss them, or is there another reason?

Ultrasteve · 2019-08-01T00:52:32Z

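> @Ultrasteve Quite a few review comments still haven't been addressed. Did you miss them, or is there another reason?

There aren't that many. The ones I left unchanged are the ones I felt didn't need changing.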

leviding · 2019-08-01T04:12:17Z

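@Ultrasteve Merged! Please go ahead and publish it on Juejin (掘金), then reply with the article link in this PR so we can add your points promptly.

The Juejin Translation Project also has its own Zhihu column, and you're welcome to submit there too; we recommend using a handy plugin.
Column URL: https://zhuanlan.zhihu.com/juejinfanyi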

Ultrasteve · 2019-08-01T06:29:44Z

@leviding @JackEggie @JaneLdq
Published: https://juejin.im/post/5d42867a6fb9a06ac76d915d

Ultrasteve added 2 commits July 26, 2019 22:15

数据分片是如何在分布式 SQL 数据库中起作用的

1be784c

翻译完成，幸苦校对的同学了

数据分片是如何在分布式 SQL 数据库中起作用的

64e8164

翻译完成，幸苦校对同学了

fanyijihua added the 校对认领 label Jul 26, 2019

fanyijihua mentioned this pull request Jul 26, 2019

数据分片是如何在分布式 SQL 数据库中起作用的 #6204

Closed

Ultrasteve changed the title ~~Translation/how data sharding works in a distributed sql database.md~~ 数据分片是如何在分布式 SQL 数据库中起作用的 Jul 26, 2019

leviding added the 后端 label Jul 27, 2019

fanyijihua added the 正在校对 label Jul 29, 2019

fanyijihua removed the 校对认领 label Jul 30, 2019

JackEggie reviewed Jul 30, 2019

JaneLdq reviewed Jul 30, 2019

JackEggie added enhancement 等待译者修改 and removed 正在校对 labels Jul 31, 2019

数据分片是如何在分布式 SQL 数据库中起作用的（校对完成）

262afb2

leviding added 标注待管理员 Review and removed enhancement 等待译者修改 labels Jul 31, 2019

leviding added 2 commits July 31, 2019 22:36

Update how-data-sharding-works-in-a-distributed-sql-database.md

8294c9b

Update how-data-sharding-works-in-a-distributed-sql-database.md

69d5905

leviding reviewed Jul 31, 2019

leviding added 审校 enhancement 等待译者修改 and removed 标注待管理员 Review 审校 labels Jul 31, 2019

leviding approved these changes Aug 1, 2019

leviding merged commit a5b2941 into xitu:master Aug 1, 2019

leviding added 翻译完成 and removed enhancement 等待译者修改 labels Aug 1, 2019
