Optimized SHA-256 Hash Function for Performance and Parallelism #10

Merged
analyzer1 merged 1 commit into main from feature/PADW-66-hash-function-optimization
Sep 26, 2024

Conversation

analyzer1
Collaborator

The function is now marked as immutable and parallel safe. Additionally, the function was simplified.
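
For readers unfamiliar with these attributes, here is a minimal sketch of what an immutable, parallel-safe declaration looks like in plain SQL. `demo_hash` is a hypothetical stand-in built on the native `sha256` function. It is not the `auto_dw.hash` implementation changed in this PR, and it is only meant to show which catalog flags the declaration sets.

```sql
-- Hypothetical wrapper for illustration only; not the extension's function.
CREATE OR REPLACE FUNCTION demo_hash(input text)
RETURNS text
LANGUAGE sql
IMMUTABLE      -- same input always yields the same output
PARALLEL SAFE  -- may be executed inside parallel worker processes
AS $$
    SELECT encode(sha256(convert_to(input, 'UTF8')), 'hex');
$$;

-- provolatile = 'i' means immutable; proparallel = 's' means parallel safe.
SELECT proname, provolatile, proparallel
FROM pg_proc
WHERE proname = 'demo_hash';

-- Clean up the illustrative function.
DROP FUNCTION demo_hash(text);
```

The same two catalog columns are what the `pg_proc` queries in the performance test inspect for `hash` and `sha256`.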

Initial testing indicates performance comparable to or better than the native sha256 function, in both serial and parallel execution. Testing was completed within the PGRX Postgres 16 instance; however, more testing is needed to confirm the improvement.

-- Performance Test
DROP TABLE IF EXISTS random_text_data;
CREATE TABLE random_text_data (
    random_text TEXT
);
CREATE INDEX idx_random_text ON random_text_data (random_text);

-- Populate 1M rows of hex-encoded SHA-256 digests to use as test input
INSERT INTO random_text_data (random_text)
SELECT encode(sha256(convert_to(n::varchar, 'UTF8')), 'hex') AS random_text
FROM generate_series(1, 1000000) AS g(n);

SHOW max_parallel_workers_per_gather;
SHOW max_parallel_workers;
SHOW max_worker_processes;
SHOW parallel_setup_cost;
SHOW parallel_tuple_cost;

SET max_parallel_workers_per_gather = 4;
SET max_parallel_workers = 8;
SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;

VACUUM ANALYZE random_text_data;

-- Initial indication: 40%+ speedup vs. the native function
SELECT proname, proparallel, provolatile, procost FROM pg_proc WHERE proname = 'hash';
EXPLAIN ANALYZE
SELECT auto_dw.hash(random_text)
FROM random_text_data; -- 1.082s, 0.918s, 0.939s per 1M records

SELECT proname, proparallel, provolatile, procost FROM pg_proc WHERE proname = 'sha256';
EXPLAIN ANALYZE
SELECT encode(sha256(convert_to(random_text, 'UTF8')), 'hex')
FROM random_text_data; -- 1.485s, 1.910s, 1.844s per 1M records
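
The description mentions both serial and parallel runs, but the script above only sets up the parallel case. One possible way to compare the two in the same session is sketched below. It assumes the `random_text_data` table and the session settings from the test above, and whether the planner actually chooses a parallel plan depends on the instance configuration.

```sql
-- Serial baseline: disable parallel plans for this session only.
SET max_parallel_workers_per_gather = 0;
EXPLAIN (ANALYZE)
SELECT auto_dw.hash(random_text)
FROM random_text_data;

-- Parallel run: allow up to 4 workers per Gather node, as in the test above.
SET max_parallel_workers_per_gather = 4;
EXPLAIN (ANALYZE)
SELECT auto_dw.hash(random_text)
FROM random_text_data;
-- A parallel plan contains a Gather node reporting
-- "Workers Planned" and "Workers Launched".

-- Restore the default for the rest of the session.
RESET max_parallel_workers_per_gather;
```

Running each statement several times, as was done for the timings above, helps separate cache warm-up from the effect of the setting.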

…el_safe for improved performance and parallel query execution.
analyzer1 requested a review from theory September 25, 2024 14:57
analyzer1 merged commit 5d9de8b into main Sep 26, 2024
1 check passed
analyzer1 deleted the feature/PADW-66-hash-function-optimization branch September 26, 2024 18:47