perf(vector): updated marshalling of vector #9109

Merged
merged 1 commit into from
Jul 25, 2024

@harshil-goel harshil-goel commented Jul 5, 2024

Earlier we unmarshalled bytes to []float64 by iterating over each element and reading it as little endian. We now do the conversion with unsafe pointers instead. This reduces the conversion time from O(size(bytes)) to effectively O(1), since no per-element work is needed.
Benchmark stats:

goos: linux
goarch: amd64
pkg: github.com/dgraph-io/dgraph/tok/index
cpu: 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
BenchmarkEncodeDecodeUint64Matrix/Binary_Encoding/Decoding-8              185269              5991 ns/op
BenchmarkEncodeDecodeUint64Matrix/Gob_Encoding/Decoding-8                  46299             25547 ns/op
BenchmarkEncodeDecodeUint64Matrix/JSON_Encoding/Decoding-8                 77666             13723 ns/op
BenchmarkEncodeDecodeUint64Matrix/PB_Encoding/Decoding-8                  500092              2346 ns/op
BenchmarkEncodeDecodeUint64Matrix/Unsafe_Encoding/Decoding-8          1465863          826.2 ns/op
BenchmarkDotProduct/vek:size=96000-8                                      528499              2331 ns/op
BenchmarkDotProduct/dotProduct:size=96000-8                               170630              7765 ns/op
BenchmarkDotProduct/dotProductT:size=96000-8                              145855              8314 ns/op
BenchmarkFloatConverstion/Current:size=96000-8                          284873474                4.172 ns/op
BenchmarkFloatConverstion/pointerFloat:size=96000-8                     263052618                3.988 ns/op
BenchmarkFloatConverstion/littleEndianFloat:size=96000-8                   40446             35287 ns/op

With this change, indexing 500k vectors takes about 5 minutes, compared with more than 5 hours before.
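
As a rough illustration of the change described above, the Go sketch below contrasts element-by-element little-endian decoding with an unsafe reinterpretation of the same bytes. The function names are hypothetical and not taken from the dgraph code. The unsafe version assumes the host byte order matches the stored bytes and that the buffer is 8-byte aligned, and the slice it returns aliases the input rather than copying it.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
	"unsafe"
)

// bytesToFloatsLE decodes little-endian float64 values one element at a time.
// The cost is linear in len(b) and a new slice is allocated.
func bytesToFloatsLE(b []byte) []float64 {
	out := make([]float64, len(b)/8)
	for i := range out {
		out[i] = math.Float64frombits(binary.LittleEndian.Uint64(b[i*8:]))
	}
	return out
}

// bytesToFloatsUnsafe reinterprets b as []float64 without copying.
// Assumptions (hypothetical sketch): b is 8-byte aligned, its contents are in
// host byte order, and the caller accepts that the result shares memory with b.
func bytesToFloatsUnsafe(b []byte) []float64 {
	if len(b) < 8 {
		return nil
	}
	return unsafe.Slice((*float64)(unsafe.Pointer(&b[0])), len(b)/8)
}

// floatsToBytesUnsafe is the inverse view: the bytes backing v, without copying.
func floatsToBytesUnsafe(v []float64) []byte {
	if len(v) == 0 {
		return nil
	}
	return unsafe.Slice((*byte)(unsafe.Pointer(&v[0])), len(v)*8)
}

func main() {
	vec := []float64{1.5, -2.25, 3}
	raw := floatsToBytesUnsafe(vec)

	fmt.Println(bytesToFloatsLE(raw))     // per-element decode (assumes little-endian data)
	fmt.Println(bytesToFloatsUnsafe(raw)) // constant-time view of the same bytes
}
```

The unsafe view only builds a new slice header, which is why its cost does not grow with the vector size. The trade-off is that the bytes are no longer portable across machines with a different byte order.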

@harshil-goel harshil-goel requested a review from a team as a code owner July 5, 2024 04:40
@dgraph-bot dgraph-bot added the area/testing (Testing related issues), area/core (internal mechanisms), and go (Pull requests that update Go code) labels Jul 5, 2024
CLAassistant commented Jul 15, 2024

CLA assistant check
All committers have signed the CLA.

@harshil-goel harshil-goel force-pushed the harshil-goel/vector-fix branch 2 times, most recently from 107975e to 167743d Compare July 19, 2024 03:26
.golangci.yml Outdated
dgraphtest/local_cluster.go Outdated
dgraphtest/local_cluster.go Outdated
dgraphtest/local_cluster.go Outdated
posting/index.go
systest/license/license_test.go Outdated
tok/hnsw/helper.go
tok/hnsw/helper.go Outdated
@@ -335,7 +326,7 @@ func populateEdgeDataFromKeyWithCacheType(
 	if data == nil {
 		return false, nil
 	}
-	err = json.Unmarshal(data.([]byte), &edgeData)
+	err = decodeUint64MatrixUnsafe(data.([]byte), edgeData)
Member

Does this change the disk format? Why can't we use protobuf here like we do for everything we write to disk?

Contributor Author

We have to iterate over the list to convert it to a protobuf, so both marshalling and unmarshalling become linear-time operations, which takes too much time.

Contributor Author

BenchmarkEncodeDecodeUint64Matrix/JSON_Encoding/Decoding-8 91692 12608 ns/op
BenchmarkEncodeDecodeUint64Matrix/PB_Encoding/Decoding-8 467174 2221 ns/op
BenchmarkEncodeDecodeUint64Matrix/Unsafe_Encoding/Decoding-8 1609965 748.9 ns/op
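
To make the trade-off in this thread concrete, here is a minimal Go sketch of how a [][]uint64 matrix could be encoded with bulk byte copies instead of per-element conversion. The byte layout (row count, then each row's length followed by its raw words) and the names `encodeMatrixSketch` and `decodeMatrixSketch` are assumptions for illustration only; they are not the format or the signature of `decodeUint64MatrixUnsafe` in this PR.

```go
package main

import (
	"errors"
	"fmt"
	"unsafe"
)

// asBytes views a []uint64 as its underlying bytes without copying (host byte order).
func asBytes(v []uint64) []byte {
	if len(v) == 0 {
		return nil
	}
	return unsafe.Slice((*byte)(unsafe.Pointer(&v[0])), len(v)*8)
}

// encodeMatrixSketch writes the row count, then each row's length and raw words.
// Hypothetical layout, chosen only to illustrate bulk copying.
func encodeMatrixSketch(m [][]uint64) []byte {
	size := 8
	for _, row := range m {
		size += 8 + len(row)*8
	}
	out := make([]byte, 0, size)
	out = append(out, asBytes([]uint64{uint64(len(m))})...)
	for _, row := range m {
		out = append(out, asBytes([]uint64{uint64(len(row))})...)
		out = append(out, asBytes(row)...) // one bulk copy per row
	}
	return out
}

// readWord copies 8 bytes into a uint64, so data needs no particular alignment.
func readWord(data []byte) (uint64, []byte, error) {
	if len(data) < 8 {
		return 0, nil, errors.New("truncated input")
	}
	w := make([]uint64, 1)
	copy(asBytes(w), data[:8])
	return w[0], data[8:], nil
}

// decodeMatrixSketch reverses encodeMatrixSketch, validating lengths as it goes.
func decodeMatrixSketch(data []byte) ([][]uint64, error) {
	rows, rest, err := readWord(data)
	if err != nil {
		return nil, err
	}
	if rows > uint64(len(rest))/8 { // every row needs at least a length word
		return nil, errors.New("truncated input")
	}
	m := make([][]uint64, rows)
	for i := range m {
		var n uint64
		if n, rest, err = readWord(rest); err != nil {
			return nil, err
		}
		if n > uint64(len(rest))/8 {
			return nil, errors.New("truncated input")
		}
		m[i] = make([]uint64, n)
		copy(asBytes(m[i]), rest[:n*8]) // bulk copy instead of a per-element loop
		rest = rest[n*8:]
	}
	return m, nil
}

func main() {
	m := [][]uint64{{1, 2, 3}, {}, {42}}
	got, err := decodeMatrixSketch(encodeMatrixSketch(m))
	fmt.Println(got, err)
}
```

A layout like this stores words in host byte order, so it bears directly on the disk-format question raised above: data written on a little-endian machine would not decode correctly on a big-endian one without an extra conversion step.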

tok/hnsw/helper.go
@harshil-goel harshil-goel force-pushed the harshil-goel/vector-fix branch from e52bdb7 to 3cce9dc Compare July 25, 2024 08:16
@harshil-goel harshil-goel force-pushed the harshil-goel/vector-fix branch from 60fca00 to fe5159c Compare July 25, 2024 11:33
@harshil-goel harshil-goel merged commit 7e02491 into main Jul 25, 2024
13 checks passed
@harshil-goel harshil-goel deleted the harshil-goel/vector-fix branch July 25, 2024 13:18
5 participants