Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate Library’s In-Memory Dictionaries to Redis with Minimal Code Changes #2198

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

akiva10b
Copy link
Contributor

@akiva10b akiva10b commented Dec 22, 2024

Summary
This pull request replaces the large in-memory dictionaries in Library with Redis-backed dictionary wrappers, drastically reducing our Python RAM footprint while preserving the original dictionary syntax in the rest of the codebase.

Background
The Library class previously stored a large amount of data (indexes, titles, terms, etc.) as standard Python dictionaries. Over time, this led to high memory usage and increased startup times. We want to offload these dictionaries to Redis, but minimize disruptive code changes.

Implementation

  1. Redis Wrappers
    • Added two new utility classes:
      • RedisNestedHash: Behaves like a “dictionary of dictionaries,” allowing usage like library._index_title_maps["en"]["Genesis"] = {...}.
      • RedisHashPrefix: Behaves like a single-level dictionary, storing keys in a Redis hash under a specific prefix.
    • Both classes store/retrieve data in JSON-serialized form to/from Redis.
  2. Library Class Refactor
    • Replaced _index_title_maps, _title_node_maps, and _term_ref_maps with RedisNestedHash instances.
    • Replaced _index_map with a RedisHashPrefix instance.
    • Preserved the same public interface, so existing code (e.g., library._index_title_maps["en"][...] = ...) continues to work without modifications.
  3. Minimal Code Disruption
    • Since the Redis wrappers implement standard dictionary methods (__getitem__, __setitem__, etc.), there are no major changes needed in consumer code.
    • The rest of the class initialization and usage remain the same.

Testing & Verification

  • Verified that setting and retrieving data (e.g., library._index_title_maps["en"]["Genesis"]) reads/writes correctly to Redis.
  • Confirmed that removing items from the dictionary removes them from Redis.
  • Basic load/performance tests show reduced memory usage in the main Python process.

Next Steps

  • Monitor Redis memory usage and adjust Redis eviction policies if needed.
  • Migrating other data structures (like _toc or _topic_toc).

By merging this PR, we ensure that the Library class now leverages Redis for data storage without breaking existing code patterns or forcing widespread refactoring.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant