Implement Hash-Based Session Caching in Genomeshader #11

Open · bshifaw opened this issue on Jun 20, 2024 · 0 comments

Problem:

Currently, Genomeshader has no mechanism for reusing previously created sessions. Creating the same session more than once therefore repeats the same computation and stores a duplicate copy of the session data.

Proposed Solution:

Implement a caching mechanism that uses hash-based identifiers for sessions. The idea is to generate a unique hash from the input used to create a session and use that hash as the name of the session's parquet file in the cache directory (which may be local or in cloud storage). When a user starts a new session, Genomeshader should:

  1. Generate a hash from the provided input.
  2. Check if a parquet file with a name matching the generated hash already exists in the cache directory.
  3. If a match is found, reuse the existing parquet file to create the session.
  4. If no match is found, create a new session and save the session's parquet file in the cache directory with the generated hash as its name.
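The four steps above can be sketched in Python as follows. This is a minimal sketch, not Genomeshader's actual code: `session_hash`, `get_or_create_session_file`, and the `build_session` callback are hypothetical names, the session input is assumed to be a JSON-serializable dict, and only a local cache directory is handled (a cloud bucket would need a different storage backend behind the same existence check).

```python
import hashlib
import json
from pathlib import Path
from typing import Any, Callable, Mapping


def session_hash(session_input: Mapping[str, Any]) -> str:
    """Step 1: derive a deterministic hex digest from the session input."""
    # sort_keys makes the digest independent of dict insertion order.
    payload = json.dumps(session_input, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def get_or_create_session_file(
    session_input: Mapping[str, Any],
    cache_dir: Path,
    build_session: Callable[[Mapping[str, Any], Path], None],
) -> Path:
    """Return the cached parquet path for this input, building it only on a miss.

    `build_session` is a hypothetical callback that writes the session's
    parquet file to the path it is given.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{session_hash(session_input)}.parquet"
    # Step 2: look for an existing file named after the hash.
    if path.exists():
        # Step 3: cache hit, so reuse the existing parquet file.
        return path
    # Step 4: cache miss. Build into a temporary name first, then rename,
    # so a half-written file is never mistaken for a valid cache entry.
    tmp = path.with_name(path.name + ".tmp")
    build_session(session_input, tmp)
    tmp.replace(path)
    return path
```

Writing to a temporary file and renaming it means an interrupted build cannot leave behind a partial file that a later lookup would treat as a cache hit (on POSIX filesystems the rename within one directory is atomic).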

This approach will allow Genomeshader to avoid unnecessary computations and storage usage by reusing previously created sessions when the same input is provided.


Tasks:

  • Implement a function to generate a unique hash from the input used to create a session.
  • Modify the session creation process to check for an existing parquet file with a name matching the generated hash before creating a new session.
  • If a matching parquet file is found, modify the session creation process to reuse the existing parquet file.
  • If no matching parquet file is found, modify the session creation process to save the new session's parquet file with the generated hash as its name.
  • Test the new functionality with various inputs to ensure it works as expected.
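For the first task, the main subtlety is determinism: equivalent inputs must serialize to identical bytes before hashing, or the cache will miss on requests that should match. The sketch below assumes, hypothetically, that a session is defined by a set of BAM paths, a set of loci, and some keyword options, and that the order of BAMs and loci does not change the resulting session. `canonical_session_input` and `CACHE_FORMAT_VERSION` are illustrative names, not part of Genomeshader.

```python
import hashlib
import json
from typing import Any, Iterable

# Hypothetical: bump this whenever the layout of the session parquet file
# changes, so stale cache entries written by older code are no longer matched.
CACHE_FORMAT_VERSION = 1


def canonical_session_input(bam_paths: Iterable[str], loci: Iterable[str], **options: Any) -> dict:
    """Normalize inputs so that equivalent requests produce the same hash.

    Assumes (hypothetically) that the order of BAM files and loci does not
    affect the session, so both are deduplicated and sorted.
    """
    return {
        "version": CACHE_FORMAT_VERSION,
        "bams": sorted(set(bam_paths)),
        "loci": sorted(set(loci)),
        "options": options,
    }


def session_hash(canonical_input: dict) -> str:
    """Hash the canonical input; sort_keys also fixes the order of nested option keys."""
    payload = json.dumps(canonical_input, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Because this hashes file paths rather than file contents, a BAM overwritten in place would still map to the old cache entry. Including file metadata such as size and modification time, or hashing the contents themselves, would close that gap at extra cost.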

Acceptance Criteria:

  • It should be possible to create a session identified by a unique hash generated from its input.
  • If a session with the same hash already exists, Genomeshader should reuse the existing session instead of creating a new one.
  • If a session with the same hash does not exist, Genomeshader should create a new session and save its parquet file with the generated hash as its name.
  • The new functionality should be covered by tests to ensure it works as expected.
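The acceptance criteria translate fairly directly into unit tests. The sketch below uses the standard `unittest` and `unittest.mock` modules. `get_or_create_session_file` is a hypothetical, deliberately compact stand-in for Genomeshader's real session-creation entry point, and the fake builder writes placeholder bytes rather than a real parquet file so the tests need no third-party packages.

```python
import hashlib
import json
import tempfile
import unittest
from pathlib import Path
from unittest import mock


def get_or_create_session_file(session_input, cache_dir, build_session):
    """Hypothetical stand-in for the cached session-creation entry point."""
    payload = json.dumps(session_input, sort_keys=True)
    path = Path(cache_dir) / (hashlib.sha256(payload.encode()).hexdigest() + ".parquet")
    if not path.exists():
        build_session(session_input, path)
    return path


def _fake_build(session_input, path):
    # Writes placeholder bytes instead of a real parquet file.
    path.write_bytes(b"placeholder")


class SessionCacheTest(unittest.TestCase):
    def setUp(self):
        self._tmp = tempfile.TemporaryDirectory()
        self.cache_dir = Path(self._tmp.name)
        # The mock records every call while delegating the actual write.
        self.build = mock.Mock(side_effect=_fake_build)

    def tearDown(self):
        self._tmp.cleanup()

    def test_file_is_named_after_input_hash(self):
        path = get_or_create_session_file({"locus": "chr1:1-100"}, self.cache_dir, self.build)
        self.assertTrue(path.exists())
        self.assertEqual(len(path.stem), 64)  # hex SHA-256 digest

    def test_same_input_reuses_existing_session(self):
        first = get_or_create_session_file({"locus": "chr1:1-100"}, self.cache_dir, self.build)
        second = get_or_create_session_file({"locus": "chr1:1-100"}, self.cache_dir, self.build)
        self.assertEqual(first, second)
        self.build.assert_called_once()

    def test_different_input_creates_new_session(self):
        a = get_or_create_session_file({"locus": "chr1:1-100"}, self.cache_dir, self.build)
        b = get_or_create_session_file({"locus": "chr2:1-100"}, self.cache_dir, self.build)
        self.assertNotEqual(a, b)
        self.assertEqual(self.build.call_count, 2)
```

Counting calls on the mocked builder is what distinguishes "reused" from "recreated"; checking only that a file exists would also pass if every call rebuilt the session.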