I have tried to think through how to implement this and have some detailed questions :)
> ### Write path
>
> To write an object, a client contacts the master to get a list of locations to store the object. The client must ensure that the object has been persisted in the base storage before claiming a successful write. The client can further ensure that the object has been cached to avoid reading from the base storage later.
When we have 100 cache nodes, it is probably only useful to write to 2 or 3 or ... of the 100, but do we need that? And is that 2 or 3 or ... best configured at the object level, bucket level, or cluster level? If so, we may need "some place" to store it, and it seems a little complex to keep N/M consistent as nodes go down and come back up...
We can have some global options like `min_cache_replicas` and `max_cache_replicas`, and allow the storage to decide the appropriate number of replicas for individual objects for load balancing.
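For illustration, a minimal sketch of what such global options could look like. The `CacheOptions` struct, its field names, and the default values are hypothetical and not part of the RFC:

```rust
/// Hypothetical global cache options; names and defaults are illustrative only.
pub struct CacheOptions {
    /// Lower bound on how many cache nodes an object is written to.
    pub min_cache_replicas: usize,
    /// Upper bound; the storage picks a value between the two per object
    /// based on load, so replica counts can adapt as nodes go down and come up.
    pub max_cache_replicas: usize,
}

impl Default for CacheOptions {
    fn default() -> Self {
        Self {
            min_cache_replicas: 1,
            max_cache_replicas: 3,
        }
    }
}
```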
> To write an object, a client contacts the master to get a list of locations to store the object.

How about reusing the existing mechanism in the read path, instead of having a writer contact the master to get locations to store the object?

> notifies the master to cache the object for future reads.
What do you mean by "existing mechanism in the read path"?
> If the object is not cached, the client reads from the base storage and then notifies the master to cache the object for future reads.

I will start prototyping to get more information about the design to further improve the document.
> - Status: draft
> - Pull Request:
Suggested change:

> - Status: accepted
> - Pull Request: https://github.com/engula/engula/pull/247
> - Tracking Issue: https://github.com/engula/engula/issues/263
This RFC is ready. Although some implementation details are missing, it should be good enough in this early stage to get the work in #263 started.
> ### Implementation
>
> A base storage can be built on a cheap and highly reliable cloud object storage (e.g., AWS S3). A cache storage can be a custom-built storage service that stores data on local SSD or cloud block storage (e.g., AWS EBS). An orchestrator can be built on Kubernetes, which acts as an operator. Kubernetes provides most of the features we need from the orchestrator.
If the base storage is S3, at what level do you want to cache? (e.g., SST level, multi-part upload level, block level)

From public benchmarks like https://github.com/dvassallo/s3-benchmark, S3 latency is relatively high and is affected by multiple factors, like how the file is uploaded to S3, the range of the GET, whether EC2 accesses S3 through a VPC, and the configuration of the EC2 instance itself, etc. From my perspective, apart from the cache service design, the cache algorithm and cache contents are also an interesting part to investigate.
Thanks for the information. I'm only considering TP scenarios for now. In this case, I think the basic strategy is to cache all data, since we don't want to see even a single read from S3. In the future, we may leave some cold SSTs in S3 according to the access statistics of individual SSTs to further save costs. But at the current stage, a simple full cache is good enough for the luna engine.
Comments inline for understanding.
> ![Architecture](images/shared-storage-architecture.drawio.svg)
>
> `SharedStorage` consists of a master, a base storage, a set of cache storages, and a storage orchestrator. The base storage is the single point of truth and should offer reliable object storage. The cache storages cache objects from the base storage to improve read performance.
Please define "master", "base storage", and "orchestrator" before you talk about them. They're all new concepts without definitions.

For example, a master is a coordinator of all `cache storage` instances that lives alongside the `Kernel`. A base storage is... what? A storage server or an S3 cluster?

You may try to connect this architecture with the overall design so that we know what part of Engula it is, instead of an isolated design.
> ### Read path
>
> To read an object, a client contacts the master to get a list of locations that serve the object. If the object is not cached, the client reads from the base storage and then notifies the master to cache the object for future reads.
Pseudo code or ordered list could be better.
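For example, here is a minimal sketch of the read path as ordered steps in code, assuming hypothetical `Master`, `CacheStore`, and `BaseStore` traits; the RFC does not define a concrete client API, so all names below are illustrative:

```rust
use std::io::Result;

type ObjectId = String;
type Location = String;

/// Hypothetical master interface (illustrative only).
trait Master {
    /// Returns the cache locations currently serving the object (may be empty).
    fn locate(&self, id: &ObjectId) -> Result<Vec<Location>>;
    /// Asks the master to schedule caching of the object.
    fn request_cache(&self, id: &ObjectId) -> Result<()>;
}

/// Hypothetical cache storage interface.
trait CacheStore {
    fn read(&self, loc: &Location, id: &ObjectId) -> Result<Vec<u8>>;
}

/// Hypothetical base storage interface.
trait BaseStore {
    fn read(&self, id: &ObjectId) -> Result<Vec<u8>>;
}

fn read_object(
    master: &dyn Master,
    cache: &dyn CacheStore,
    base: &dyn BaseStore,
    id: &ObjectId,
) -> Result<Vec<u8>> {
    // 1. Contact the master to get the locations that serve the object.
    let locations = master.locate(id)?;
    // 2. If the object is cached, read it from a cache location.
    if let Some(loc) = locations.first() {
        return cache.read(loc, id);
    }
    // 3. Otherwise read from the base storage, then notify the master to
    //    cache the object for future reads.
    let data = base.read(id)?;
    master.request_cache(id)?;
    Ok(data)
}
```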
> ### Write path
>
> To write an object, a client contacts the master to get a list of locations to store the object. The client must ensure that the object has been persisted in the base storage before claiming a successful write. The client can further ensure that the object has been cached to avoid reading from the base storage later.
Pseudo code or ordered list could be better.
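Similarly, a minimal sketch of the write path under the same assumptions (hypothetical types; the location lookup from the master is represented here by the `caches` argument):

```rust
use std::io::Result;

type ObjectId = String;

/// Hypothetical storage interface, illustrative only.
trait Store {
    fn write(&self, id: &ObjectId, data: &[u8]) -> Result<()>;
}

fn write_object(
    base: &dyn Store,      // the base storage, the single point of truth
    caches: &[&dyn Store], // cache locations obtained from the master
    id: &ObjectId,
    data: &[u8],
) -> Result<()> {
    // 1. The object must be persisted in the base storage before the write
    //    can be claimed successful.
    base.write(id, data)?;
    // 2. Optionally push the object to the caches as well so that later
    //    reads don't fall back to the base storage. Cache failures are
    //    non-fatal here: a missed cache write only costs a slower read.
    for cache in caches {
        let _ = cache.write(id, data);
    }
    Ok(())
}
```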
@huachaohuang Given the current state, I tend to regard this RFC and #280 as evolving design & implementation documents rather than proposals that must be accepted before we start implementing. This way, we can continuously prototype while polishing the documents to stabilize our ideas, instead of being forced to merge something indeterminate or unclear.
Sounds good to me. My original intention with this document is just to align on the design and guide the development, rather than to figure out all the details first. I think we can leave these PRs as drafts and evolve them with the implementation until they are stable enough.
Closed in favor of #361.
Rendered version: https://github.com/huachaohuang/engula/blob/cached-storage-rfc/docs/rfcs/2022-01-04-cached-storage.md
Ref: #245 #246