
rfc: add a shared storage #247

Closed
wants to merge 5 commits into from

Conversation

huachaohuang (Owner) commented Jan 4, 2022

tisonkun (Contributor) commented Jan 4, 2022

Shall this PR supersede #246, so that we can close #246 in favor of this one?

huachaohuang (Owner, Author) commented:

> Shall this PR supersede #246, so that we can close #246 in favor of this one?

#246 describes an abstraction and this PR describes a more concrete design. But I have closed #246 anyway.

zojw (Contributor) left a comment

I have tried to take the implementation into account and have some detailed questions :)


### Write path

To write an object, a client contacts the master to get a list of locations to store the object. The client must ensure that the object has been persisted in the base storage before claiming a successful write. The client can further ensure that the object has been cached to avoid reading from the base storage later.
Contributor commented:

When we have 100 cache nodes, it is probably only useful to write to 2 or 3 (or ...) of the 100, but do we need that?

Also, should that 2 or 3 (or ...) be configured at the object level, bucket level, or cluster level? If so, we may need "some place" to store it, and it seems a little complex to keep N/M consistent as nodes go down and come up...

huachaohuang (Owner, Author) commented:

We can have some global options like `min_cache_replicas` and `max_cache_replicas`, and allow the storage to decide the appropriate number of replicas for individual objects for load balancing.
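
For illustration, such options might look like the sketch below; `min_cache_replicas` and `max_cache_replicas` come from the comment above, while the struct, its defaults, and the helper function are hypothetical and not part of any existing code.

```rust
/// Hypothetical cluster-level cache options. The two field names follow the
/// comment above; everything else here is illustrative only.
#[derive(Clone, Debug)]
pub struct CacheOptions {
    /// Lower bound on how many cache nodes should hold a copy of an object.
    pub min_cache_replicas: usize,
    /// Upper bound, so the master can add replicas for hot objects.
    pub max_cache_replicas: usize,
}

impl Default for CacheOptions {
    fn default() -> Self {
        Self {
            min_cache_replicas: 2,
            max_cache_replicas: 3,
        }
    }
}

/// The master could then pick a replica count per object within the
/// configured bounds, e.g. based on load statistics (illustrative only).
fn replicas_for_object(opts: &CacheOptions, is_hot: bool) -> usize {
    if is_hot {
        opts.max_cache_replicas
    } else {
        opts.min_cache_replicas
    }
}
```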

Three further review comments on docs/rfcs/2022-01-04-cached-storage.md are outdated and marked as resolved.

### Write path

To write an object, a client contacts the master to get a list of locations to store the object. The client must ensure that the object has been persisted in the base storage before claiming a successful write. The client can further ensure that the object has been cached to avoid reading from the base storage later.
Contributor commented:

> To write an object, a client contacts the master to get a list of locations to store the object.

How about reusing the existing mechanism in the read path, instead of having the writer contact the master to get locations to store the object?

> notifies the master to cache the object for future reads.

huachaohuang (Owner, Author) commented:

What do you mean by "existing mechanism in the read path"?

Contributor commented:

@huachaohuang

> If the object is not cached, the client reads from the base storage and then notifies the master to cache the object for future reads.

huachaohuang (Owner, Author) commented:

I will start prototyping to get more information about the design to further improve the document.

huachaohuang marked this pull request as draft on January 5, 2022, 11:07.
zojw mentioned this pull request on Jan 7, 2022.
huachaohuang mentioned this pull request on Jan 8, 2022.
Comment on lines 3 to 4
- Status: draft
- Pull Request:
Contributor commented:

Suggested change (from):

- Status: draft
- Pull Request:

Suggested change (to):

- Status: accepted
- Pull Request: https://github.com/engula/engula/pull/247
- Tracking Issue: https://github.com/engula/engula/issues/263


huachaohuang marked this pull request as ready for review on January 15, 2022, 03:33.
huachaohuang (Owner, Author) commented:

This RFC is ready. Although some implementation details are missing, it should be good enough at this early stage to get the work in #263 started.

huachaohuang changed the title from "rfc: add a cached storage" to "rfc: add a shared storage" on Jan 15, 2022.

### Implementation

A base storage can be built on a cheap and highly reliable cloud object storage (e.g., AWS S3). A cache storage can be a custom-built storage service that stores data on local SSDs or cloud block storage (e.g., AWS EBS). An orchestrator can be built on Kubernetes, acting as an operator; Kubernetes provides most of the features we need from the orchestrator.
Comment:

If the base storage is S3, at what level do you want to cache? (e.g., SST level, multi-part upload level, block level)

From public benchmarks like https://github.com/dvassallo/s3-benchmark, S3's latency is relatively high and is affected by multiple factors, such as how the file is uploaded to S3, the range of the GET, whether EC2 accesses S3 using a VPC, the configuration of the EC2 instance itself, etc. From my perspective, apart from the cache service design, the cache algorithm and the cached content are also interesting parts to investigate.

huachaohuang (Owner, Author) commented:

Thanks for the information. I'm only considering TP scenarios for now. In this case, I think the basic strategy is to cache all data, since we don't want to see even a single read from S3. In the future, we may leave some cold SSTs in S3 according to the access statistics of individual SSTs to further save costs. But at the current stage, a simple full cache is good enough for the luna engine.
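
Purely as an illustration of that strategy (nothing here is from the codebase; all names are made up), the two stages could be expressed as a small policy switch:

```rust
// Illustrative only: "cache everything now, maybe leave cold SSTs in S3
// later based on access statistics", expressed as a tiny policy enum.
enum CachePolicy {
    /// Current stage: every object is cached, so reads never hit S3.
    Full,
    /// Possible future: only cache objects read at least `min_reads` times
    /// recently, leaving cold SSTs in S3 to save cost.
    ByAccessCount { min_reads: u64 },
}

fn should_cache(policy: &CachePolicy, recent_reads: u64) -> bool {
    match policy {
        CachePolicy::Full => true,
        CachePolicy::ByAccessCount { min_reads } => recent_reads >= *min_reads,
    }
}
```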

tisonkun (Contributor) left a comment

Comments inline for understanding.


![Architecture](images/shared-storage-architecture.drawio.svg)

`SharedStorage` consists of a master, a base storage, a set of cache storages, and a storage orchestrator. The base storage is the single point of truth and should offer reliable object storage. The cache storages cache objects from the base storage to improve read performance.
Contributor commented:

Please define "master", "base storage", and "orchestrator" before you talk about them. They are all new concepts without definitions.

For example, a master is a coordinator of all cache storage instances that lives along with the Kernel. A base storage is... what? A storage server or an S3 cluster?

You may try to connect this architecture with the overall design so that we know which part of Engula it is, instead of an isolated design.
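
For illustration only, here is a rough sketch of how the components named in the excerpt might fit together; none of these traits or method signatures are from Engula, they are placeholders for the concepts discussed above.

```rust
use std::sync::Arc;

/// Durable object storage, e.g. backed by S3; the source of truth.
pub trait BaseStorage: Send + Sync {
    fn put(&self, object: &str, data: &[u8]) -> std::io::Result<()>;
    fn get(&self, object: &str) -> std::io::Result<Vec<u8>>;
}

/// A cache node that holds copies of objects on local SSD or EBS.
pub trait CacheStorage: Send + Sync {
    fn put(&self, object: &str, data: &[u8]) -> std::io::Result<()>;
    fn get(&self, object: &str) -> Option<Vec<u8>>;
}

/// The master tracks which cache nodes hold which objects.
pub trait Master: Send + Sync {
    /// Cache nodes currently serving `object`, if any.
    fn locate(&self, object: &str) -> Vec<usize>;
    /// Record that `object` is now cached on node `node`.
    fn record_cached(&self, object: &str, node: usize);
}

/// The shared storage described in the RFC: one base storage and many cache
/// storages, coordinated by a master (the orchestrator is omitted here).
pub struct SharedStorage {
    pub master: Arc<dyn Master>,
    pub base: Arc<dyn BaseStorage>,
    pub caches: Vec<Arc<dyn CacheStorage>>,
}
```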


### Read path

To read an object, a client contacts the master to get a list of locations that serve the object. If the object is not cached, the client reads from the base storage and then notifies the master to cache the object for future reads.
Contributor commented:

Pseudocode or an ordered list could be better.
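
For illustration, the read path described in the excerpt might look roughly like the following sketch; every type and function name here is a hypothetical placeholder, not an Engula API.

```rust
// A minimal, self-contained sketch of the read path described above.
struct Client;

impl Client {
    fn read(&self, object: &str) -> Vec<u8> {
        // 1. Ask the master for cache locations that serve the object.
        let locations = self.locate_from_master(object);

        // 2. If the object is cached, read it from one of the cache nodes.
        for node in &locations {
            if let Some(data) = self.read_from_cache(*node, object) {
                return data;
            }
        }

        // 3. Otherwise read from the base storage (e.g. S3)...
        let data = self.read_from_base(object);

        // 4. ...and notify the master to cache the object for future reads.
        self.notify_master_to_cache(object);
        data
    }

    // Placeholder stubs so the sketch compiles; real implementations would
    // talk to the master, cache nodes, and base storage over the network.
    fn locate_from_master(&self, _object: &str) -> Vec<usize> { Vec::new() }
    fn read_from_cache(&self, _node: usize, _object: &str) -> Option<Vec<u8>> { None }
    fn read_from_base(&self, _object: &str) -> Vec<u8> { Vec::new() }
    fn notify_master_to_cache(&self, _object: &str) {}
}
```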


### Write path

To write an object, a client contacts the master to get a list of locations to store the object. The client must ensure that the object has been persisted in the base storage before claiming a successful write. The client can further ensure that the object has been cached to avoid reading from the base storage later.
Contributor commented:

Pseudocode or an ordered list could be better.
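
Likewise, a minimal sketch of the write path described in the excerpt, again with hypothetical placeholder names only:

```rust
// A minimal, self-contained sketch of the write path described above.
struct Client;

impl Client {
    fn write(&self, object: &str, data: &[u8]) -> std::io::Result<()> {
        // 1. Ask the master for a list of locations to store the object.
        let cache_nodes = self.locations_from_master(object);

        // 2. The write only succeeds once the object has been persisted in
        //    the base storage, which is the source of truth.
        self.write_to_base(object, data)?;

        // 3. Optionally also write to the cache nodes so that later reads
        //    do not have to hit the base storage.
        for node in cache_nodes {
            self.write_to_cache(node, object, data);
        }
        Ok(())
    }

    // Placeholder stubs so the sketch compiles.
    fn locations_from_master(&self, _object: &str) -> Vec<usize> { Vec::new() }
    fn write_to_base(&self, _object: &str, _data: &[u8]) -> std::io::Result<()> { Ok(()) }
    fn write_to_cache(&self, _node: usize, _object: &str, _data: &[u8]) {}
}
```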

tisonkun (Contributor) commented:

@huachaohuang Given the current state, I tend to regard this RFC and #280 as evolving design & implementation documents, rather than proposals that must be accepted before we start implementing.

In this way, we can keep prototyping while polishing these documents to stabilize our ideas, instead of being forced to merge something indeterminate or unclear.

huachaohuang (Owner, Author) commented:

> @huachaohuang Given the current state, I tend to regard this RFC and #280 as evolving design & implementation documents, rather than proposals that must be accepted before we start implementing.
>
> In this way, we can keep prototyping while polishing these documents to stabilize our ideas, instead of being forced to merge something indeterminate or unclear.

Sounds good to me. My original intention with this document was just to align on the design and guide the development, not to figure out all the details first. I think we can leave these PRs as drafts and evolve them with the implementation until they are stable enough.

huachaohuang marked this pull request as draft on January 18, 2022, 03:21.
huachaohuang (Owner, Author) commented:

Closed in favor of #361.

huachaohuang deleted the cached-storage-rfc branch on February 8, 2022.