[ENH] SIMD vectorization for distance metrics #2084

sanketkedia · 2024-04-30T07:03:01Z

Description of changes

Adds SIMD vectorization for the Euclidean, cosine, and inner product distance metrics on x86, x86_64, and ARM. The instruction sets now supported are SSE, AVX, and NEON.
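
As a rough illustration of what vectorizing a metric involves, the sketch below computes a squared Euclidean distance two ways: a plain scalar loop, and a version that keeps eight independent partial sums and reduces them at the end, which is the general shape of a SIMD kernel. This is not the code from this PR; the function names and the lane width of 8 are assumptions chosen for illustration.

```rust
/// Scalar reference: sum of squared differences (hypothetical name).
fn l2_sqr_scalar(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Assumed lane count; 8 matches the number of f32 values in a 256-bit register.
const LANES: usize = 8;

/// Lane-wise version: accumulate LANES partial sums, reduce them once,
/// then handle the leftover tail with the scalar loop.
fn l2_sqr_lanes(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    let mut acc = [0.0f32; LANES];
    let a_chunks = a.chunks_exact(LANES);
    let b_chunks = b.chunks_exact(LANES);
    // Elements that do not fill a whole chunk.
    let a_tail = a_chunks.remainder();
    let b_tail = b_chunks.remainder();
    for (ca, cb) in a_chunks.zip(b_chunks) {
        for i in 0..LANES {
            let d = ca[i] - cb[i];
            acc[i] += d * d;
        }
    }
    let total: f32 = acc.iter().sum();
    total + l2_sqr_scalar(a_tail, b_tail)
}
```

Because the two versions add terms in a different order, their floating-point results can differ slightly, so any comparison between them should use a tolerance.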

Test plan

[+] Tests pass locally with pytest for Python, yarn test for JS, cargo test for Rust

Documentation Changes

No

github-actions · 2024-04-30T07:03:13Z

HammadB · 2024-04-30T15:35:31Z

rust/worker/Cargo.toml

+[[bench]]
+name = "distance_metrics"
+path = "src/benches/distance_metrics.rs"
+harness = false


Learning question: what does harness do here?

Setting harness = false disables the default libtest benchmark harness, since I am using the criterion crate, which provides its own entry point.
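
For context: with harness = false, Cargo does not link libtest's runner into the bench target, so the bench file has to supply its own entry point, and criterion's criterion_main! macro generates one. The std-only sketch below shows the same idea without criterion: a hand-rolled timing helper that such a target's main could call. The names time_it, dot, and run_benchmarks are hypothetical and not part of this PR.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Function under measurement (hypothetical stand-in for a distance metric).
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Runs `f` exactly `iters` times and returns the total elapsed wall-clock time.
fn time_it<F: FnMut()>(iters: u32, mut f: F) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        f();
    }
    start.elapsed()
}

/// What the `fn main` of a `harness = false` bench target might call.
fn run_benchmarks() -> Duration {
    let a = vec![1.0f32; 1024];
    let b = vec![2.0f32; 1024];
    let elapsed = time_it(1_000, || {
        // black_box keeps the optimizer from discarding the computation.
        black_box(dot(black_box(a.as_slice()), black_box(b.as_slice())));
    });
    println!("dot/1024: {:?} total for 1000 iterations", elapsed);
    elapsed
}
```

A raw timer like this gives a single number with no statistical analysis, which is the gap criterion is meant to fill.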

HammadB · 2024-04-30T15:36:00Z

rust/worker/Cargo.toml

@@ -47,6 +52,7 @@ proptest = "1.4.0"
 proptest-state-machine = "0.1.0"
 "rand" = "0.8.5"
 rayon = "1.8.0"
+criterion = "0.3"


Out of curiosity, what alternatives did we evaluate?

Good question. By default, libtest is available for basic benchmarking. My main motive for using criterion was to enable more sophisticated benchmarking in the future. In particular, criterion provides:

Statistics: Statistical analysis detects if, and by how much, performance has changed since the last benchmark run

Charts: Uses gnuplot to generate detailed graphs of benchmark results

These would be useful in the future for running the benchmarks in our CI/CD pipeline, or on a nightly or weekly cadence, to detect perf regressions, for example.

Awesome, thank you.

HammadB · 2024-04-30T15:37:31Z

rust/worker/src/distance/types.rs

+                    all(target_feature = "avx", target_feature = "fma")
+                ))]
+                {
+                    if std::arch::is_x86_feature_detected!("avx")


Should we add tests to validate that the SIMD impls (whichever can run on the target machine) match the base impl?

I think there is already a test in types.rs, test_distance_function_l2sqr(), which I was relying on. If it regresses, there is a bug in the SIMD impls. That test validates inner product and the L2 norm. Since cosine is the same as inner product for us, I am guessing a separate test for it is not needed.

OK, great, good point.
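
To make the idea in this thread concrete, here is a minimal sketch of runtime dispatch plus an equivalence check against the scalar baseline. It is not the PR's implementation, and the existing test in types.rs is not shown in this thread; dot_scalar, dot_avx, dot, and approx_eq are hypothetical names. The only thing taken from the diff is the use of std::arch::is_x86_feature_detected!("avx"). The AVX path is compiled only on x86_64; every other target uses the scalar path.

```rust
/// Scalar baseline (hypothetical name).
fn dot_scalar(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// AVX kernel: 8 f32 lanes per iteration, scalar loop for the tail.
/// Callers must confirm AVX support before calling.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn dot_avx(a: &[f32], b: &[f32]) -> f32 {
    use std::arch::x86_64::*;
    assert_eq!(a.len(), b.len());
    let n = a.len();
    let mut acc = _mm256_setzero_ps();
    let mut i = 0;
    while i + 8 <= n {
        // Unaligned loads: slices carry no 32-byte alignment guarantee.
        let va = _mm256_loadu_ps(a.as_ptr().add(i));
        let vb = _mm256_loadu_ps(b.as_ptr().add(i));
        acc = _mm256_add_ps(acc, _mm256_mul_ps(va, vb));
        i += 8;
    }
    // Horizontal reduction of the 8 partial sums.
    let mut lanes = [0.0f32; 8];
    _mm256_storeu_ps(lanes.as_mut_ptr(), acc);
    let mut total: f32 = lanes.iter().sum();
    while i < n {
        total += a[i] * b[i];
        i += 1;
    }
    total
}

/// Picks the fastest kernel the current CPU supports.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx") {
            // SAFETY: AVX support was confirmed at runtime just above.
            return unsafe { dot_avx(a, b) };
        }
    }
    dot_scalar(a, b)
}

/// Relative comparison, since SIMD changes the order of additions.
fn approx_eq(x: f32, y: f32, rel_tol: f32) -> bool {
    (x - y).abs() <= rel_tol * x.abs().max(y.abs()).max(1.0)
}
```

A test along these lines would call dot and dot_scalar on the same inputs, including lengths that are not a multiple of 8, and assert approx_eq on the results.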

codetheweb

This is super cool. I wonder if it'd be worth using an existing vectorized implementation like this one instead of implementing it ourselves?

sanketkedia · 2024-05-01T02:36:00Z

@codetheweb the main issue with using a third-party implementation is that our distance functions have slightly different definitions. For example, our cosine similarity assumes the vectors are normalized.
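
A small sketch of why the normalization assumption matters: for unit-length vectors, cosine similarity reduces to a plain dot product, so the kernel can skip both norm computations. The function names are hypothetical, and defining the distance as 1 minus the similarity is an assumption for illustration, not something stated in this thread.

```rust
fn dot(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Scales `v` to unit length. Assumes `v` is non-zero.
fn normalize(v: &[f32]) -> Vec<f32> {
    let norm = dot(v, v).sqrt();
    v.iter().map(|x| x / norm).collect()
}

/// General cosine similarity: works for vectors of any non-zero length.
fn cosine_similarity_general(a: &[f32], b: &[f32]) -> f32 {
    dot(a, b) / (dot(a, a).sqrt() * dot(b, b).sqrt())
}

/// Cosine distance when both inputs are already unit length (assumed
/// definition: 1 - similarity). No norms are computed here.
fn cosine_distance_normalized(a: &[f32], b: &[f32]) -> f32 {
    1.0 - dot(a, b)
}
```

A general-purpose library would typically compute the norms on every call, which both costs more and yields different values when callers pass vectors that are not normalized.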

vercel · 2024-05-01T20:21:39Z

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name	Status	Preview	Comments	Updated (UTC)
chroma	❌ Failed (Inspect)			May 1, 2024 8:21pm

sanketkedia · 2024-05-01T20:39:51Z

Verified roughly a 10x perf improvement on both ARM and x86_64. Going to merge this unless anyone has objections.

SIMD for distance metrics

8ae56c9

sanketkedia requested review from HammadB and Ishiihara April 30, 2024 07:21

sanketkedia self-assigned this Apr 30, 2024

HammadB reviewed Apr 30, 2024

codetheweb reviewed Apr 30, 2024

Add license attribution

4a7eaf2

vercel bot had a problem deploying to Preview May 1, 2024 20:21 Failure

sanketkedia enabled auto-merge (squash) May 1, 2024 21:38

sanketkedia disabled auto-merge May 1, 2024 21:42

sanketkedia merged commit e53ab49 into chroma-core:main May 1, 2024
123 of 124 checks passed

