
[RFC] Memory cache for preheat tasks #3742

Open
SouthWest7 opened this issue Jan 2, 2025 · 0 comments
Labels: enhancement (New feature or request)
SouthWest7 commented Jan 2, 2025

Feature request:

Currently, Dragonfly downloads data directly to disk when processing preheat tasks. To enhance performance and reduce latency, I propose introducing a caching mechanism. This will optimize both download and upload efficiency by writing data to both memory and disk, allowing faster access from memory while ensuring data persistence on disk. Specifically, the caching mechanism will work as follows:

  • During preheat tasks: In addition to being persisted to disk, downloaded content is also written to the cache, enabling faster access in future operations.
  • During regular uploads: The system will first check the cache for the required content. If a cache miss occurs, it will then fall back to disk storage to retrieve the data.

This approach aims to reduce disk IO, improve overall system efficiency, and significantly lower the time spent retrieving data from remote peers during preheat tasks.
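The core of this mechanism is a capacity-bounded in-memory store keyed by piece ID. A minimal standalone sketch (not Dragonfly's actual code; the `PieceCache` name and the FIFO eviction policy are illustrative assumptions, and a production cache would likely use LRU):

```rust
use std::collections::{HashMap, VecDeque};

/// A minimal capacity-bounded cache for piece content, keyed by piece ID.
/// Eviction is FIFO for brevity; a real implementation would likely use LRU.
struct PieceCache {
    capacity: usize,
    entries: HashMap<String, Vec<u8>>,
    order: VecDeque<String>, // oldest inserted piece at the front
}

impl PieceCache {
    fn new(capacity: usize) -> Self {
        Self { capacity, entries: HashMap::new(), order: VecDeque::new() }
    }

    /// Insert piece content, evicting the oldest piece when the cache is full.
    fn put(&mut self, id: &str, content: Vec<u8>) {
        if !self.entries.contains_key(id) {
            if self.entries.len() >= self.capacity {
                if let Some(evicted) = self.order.pop_front() {
                    self.entries.remove(&evicted);
                }
            }
            self.order.push_back(id.to_string());
        }
        self.entries.insert(id.to_string(), content);
    }

    /// Look up piece content; `None` means the caller falls back to disk.
    fn get(&self, id: &str) -> Option<&[u8]> {
        self.entries.get(id).map(|v| v.as_slice())
    }
}
```

The bound on the number of entries corresponds to the `capacity` setting proposed in the configuration section below.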

Use case:

UI Example:

Scope:

  • The caching mechanism will only affect whether piece content is read/written from the cache or disk during downloads and uploads. Other functional modules are not impacted.

Design

Write to Cache

  • Goal: Store downloaded piece content into the local cache after retrieving it from a remote peer.
  • Design Details:
    • Extend the existing method to include cache-writing logic after processing the downloaded content.
    • Design a caching mechanism that ensures the downloaded pieces can be retrieved on demand.
      (figure: write flow diagram)
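The cache-writing step could hook in right after the existing piece-persistence call, gated by a per-task flag. A sketch with plain maps standing in for the real storage and cache layers (function and parameter names here are hypothetical):

```rust
use std::collections::HashMap;

// Sketch of the extended write path: the piece is persisted to disk as today,
// and additionally written to the memory cache when the task requested it.
// Plain HashMaps stand in for the real cache and local-storage layers.
fn on_piece_downloaded(
    cache: &mut HashMap<String, Vec<u8>>,
    disk: &mut HashMap<String, Vec<u8>>,
    piece_id: &str,
    content: Vec<u8>,
    load_to_cache: bool,
) {
    // Existing flow: persist the downloaded piece to local storage.
    disk.insert(piece_id.to_string(), content.clone());

    // New flow: also keep a copy in memory for fast future uploads.
    if load_to_cache {
        cache.insert(piece_id.to_string(), content);
    }
}
```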

Read from Cache

  • Goal: Retrieve piece content from the cache. If the cache does not contain the data, fall back to reading from local storage.
  • Design Details:
    • Add cache-reading logic to the download_piece method.
    • If the cache contains the corresponding piece, return it as the content for DownloadPieceResponse. If not, proceed with the current flow to read from local storage.
      (figure: read flow diagram)
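The fallback described for `download_piece` amounts to a cache-first lookup. Again a sketch with maps as stand-ins (the real method would build a `DownloadPieceResponse` rather than return raw bytes):

```rust
use std::collections::HashMap;

// Sketch of the read path: serve piece content from the memory cache when
// present, otherwise fall back to local storage. HashMaps stand in for the
// real cache and storage layers.
fn read_piece(
    cache: &HashMap<String, Vec<u8>>,
    disk: &HashMap<String, Vec<u8>>,
    piece_id: &str,
) -> Option<Vec<u8>> {
    if let Some(content) = cache.get(piece_id) {
        return Some(content.clone()); // cache hit: no disk IO
    }
    disk.get(piece_id).cloned() // cache miss: current flow, read from disk
}
```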

Configuration

storage:
  # cache defines configuration settings for the cache, used to store piece content for preheat tasks.
  cache:
    # enable determines whether the cache is enabled. Set to true to enable caching of piece content, false to disable it.
    enable: true
    
    # capacity: Specifies the maximum number of entries the cache can hold. The default value is 100 entries.
    # Adjust this value based on the expected number of piece content entries for preheat tasks that need to be cached.
    capacity: 100
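On the client side this section could map to a small config struct. The defaults below simply echo the sample values above; the struct and field names are assumptions, not a final schema:

```rust
/// Sketch of the proposed `storage.cache` section as a config struct.
#[derive(Debug, Clone)]
struct CacheConfig {
    /// Whether the memory cache for preheat piece content is enabled.
    enable: bool,
    /// Maximum number of piece-content entries held in memory.
    capacity: usize,
}

impl Default for CacheConfig {
    fn default() -> Self {
        // Defaults mirror the sample YAML: enabled, 100 entries.
        Self { enable: true, capacity: 100 }
    }
}
```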

API Definition

message Download {
  // load_to_cache indicates whether the content downloaded will be stored in the storage cache.
  // Storage cache is designed to store downloaded piece content from preheat tasks, 
  // allowing other peers to access the content from memory instead of disk.
  bool load_to_cache = 21;
}

Actions:

  • protocol definition & configuration: w1
  • implementation: w1
  • full process: w2
  • unit tests, E2E, performance testing: w3