[rhythm] Make ID generator more robust #4416

Merged
merged 7 commits into from
Dec 18, 2024
Changes from 2 commits
37 changes: 13 additions & 24 deletions modules/blockbuilder/util/id.go
@@ -21,43 +21,32 @@ type IDGenerator interface {
 var _ IDGenerator = (*DeterministicIDGenerator)(nil)

 type DeterministicIDGenerator struct {
-	seeds []int64
-	seq   *atomic.Int64
+	tenantBytes []byte
+	seeds       []int64
+	seq         *atomic.Int64
 }

 func NewDeterministicIDGenerator(tenantID string, seeds ...int64) *DeterministicIDGenerator {
-	seeds = append(seeds, int64(binary.LittleEndian.Uint64(stringToBytes(tenantID))))
 	return &DeterministicIDGenerator{
-		seeds: seeds,
-		seq:   atomic.NewInt64(0),
+		tenantBytes: []byte(tenantID),
+		seeds:       seeds,
+		seq:         atomic.NewInt64(0),
 	}
 }

 func (d *DeterministicIDGenerator) NewID() backend.UUID {
 	seq := d.seq.Inc()
-	seeds := append(d.seeds, seq)
-	return backend.UUID(newDeterministicID(seeds))
+	return backend.UUID(newDeterministicID(d.tenantBytes, append(d.seeds, seq)))
 }

-func newDeterministicID(seeds []int64) uuid.UUID {
-	b := int64ToBytes(seeds...)
+func newDeterministicID(b []byte, seeds []int64) uuid.UUID {
+	sl, dl := len(seeds), len(b)
+	data := make([]byte, dl+sl*8) // 8 bytes per int64
+	copy(b, data)

-	return uuid.NewHash(hash, ns, b, 5)
-}
-
-// TODO - Try to avoid allocs here
-func stringToBytes(s string) []byte {
-	return []byte(s)
-}
-
-func int64ToBytes(seeds ...int64) []byte {
-	l := len(seeds)
-	bytes := make([]byte, l*8)
-
-	// Use binary.LittleEndian or binary.BigEndian depending on your requirement
 	for i, seed := range seeds {
-		binary.LittleEndian.PutUint64(bytes[i*8:], uint64(seed))
+		binary.LittleEndian.PutUint64(data[dl+i*8:], uint64(seed))
 	}

-	return bytes
+	return uuid.NewHash(hash, ns, data, 5)
Contributor

Any reason not to call uuid.NewSHA1, since we are already using that hash function and version?

Member Author

It's so hash.Hash is not created each time, saving some allocations.

Contributor

The current global var isn't safe for concurrent use, how about making it a struct var? I know the ID generator isn't called concurrently yet, but there's nothing preventing it, and seems likely (parallelism in the block builder).

Member

I've been strongly preferring the fnv1a static methods that require no struct and have no concurrency concerns.

https://pkg.go.dev/github.com/segmentio/fasthash/fnv1a

Member Author

@mapno mapno Dec 18, 2024

> The current global var isn't safe for concurrent use, how about making it a struct var?

Moved the hash.Hash to a struct field. Agree it's safer, and the difference won't be noticeable.

> I've been strongly preferring the fnv1a static methods that require no struct and have no concurrency concerns.
> https://pkg.go.dev/github.com/segmentio/fasthash/fnv1a

From what I read, the main arguments for using fasthash are saving allocs and avoiding inefficient string-to-bytes conversions.

We'd need to do the alloc manually since we're reusing the data byte slice between sequential IDs and already have bytes. I don't see a big benefit in using anything other than Go's sha1 in this case.
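
For readers following the thread, here is a minimal sketch of what "hash.Hash as a struct field, reusing the data slice between sequential IDs" could look like. It uses only the standard library. `idGenerator` and `newIDGenerator` are hypothetical names, the mutex is an assumption added to make concurrent calls safe (the thread only mentions moving the hash into the struct), and the UUID namespace that `uuid.NewHash` also hashes is omitted for brevity.

```go
package main

import (
	"crypto/sha1"
	"encoding/binary"
	"fmt"
	"hash"
	"sync"
)

// idGenerator is a hypothetical stand-in for the PR's generator. It owns its
// hash.Hash instead of sharing a package-level one.
type idGenerator struct {
	mu   sync.Mutex // assumption: guards h, data and seq for concurrent callers
	h    hash.Hash
	data []byte // tenant bytes + seeds + 8 bytes reserved for the sequence
	seq  int64
}

func newIDGenerator(tenant string, seeds ...int64) *idGenerator {
	data := make([]byte, len(tenant)+(len(seeds)+1)*8)
	copy(data, tenant) // copy(dst, src)
	for i, s := range seeds {
		binary.LittleEndian.PutUint64(data[len(tenant)+i*8:], uint64(s))
	}
	return &idGenerator{h: sha1.New(), data: data}
}

// NewID hashes the fixed prefix plus the next sequence number and shapes the
// first 16 bytes of the SHA-1 digest into a version 5 UUID.
func (g *idGenerator) NewID() [16]byte {
	g.mu.Lock()
	defer g.mu.Unlock()

	g.seq++
	// Only the trailing 8 bytes change between sequential IDs.
	binary.LittleEndian.PutUint64(g.data[len(g.data)-8:], uint64(g.seq))

	g.h.Reset()
	g.h.Write(g.data)
	sum := g.h.Sum(nil)

	var id [16]byte
	copy(id[:], sum)
	id[6] = (id[6] & 0x0f) | 0x50 // version 5
	id[8] = (id[8] & 0x3f) | 0x80 // RFC 4122 variant
	return id
}

func main() {
	a := newIDGenerator("tenant", 0, 1734480000000)
	b := newIDGenerator("tenant", 0, 1734480000000)
	fmt.Println(a.NewID() == b.NewID()) // same inputs are expected to give the same ID
}
```

Without the mutex, two goroutines calling `NewID` on the same generator would race on both the hash state and the shared `data` slice, which is the concern raised earlier in the thread.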

Member

@joe-elliott joe-elliott Dec 18, 2024

i may be misunderstanding the issue, but hashing bytes does not alloc with fnv1a.

https://github.com/segmentio/fasthash/blob/v1.0.3/fnv1a/hash.go#L76-L108

while working on ingester locking I benched it against this:

func (s *Tracker) token(traceID []byte) uint64 {
	s.hash.Reset()
	s.hash.Write(traceID)
	return s.hash.Sum64()
}

neither function alloc'ed, but fnv1a is twice as fast, has no state and requires no locking. i think the only question is whether or not its hash alg is as collision resistant.
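
The contrast drawn above between a stateless function and a stateful `hash.Hash` can be sketched with the standard library alone. `fnv1a64` below is a hand-written FNV-1a for illustration, not the segmentio/fasthash implementation. `tracker` and `newTracker` are hypothetical, and backing the tracker with `fnv.New64a()` is an assumption, since the thread does not say which hash `Tracker` uses.

```go
package main

import (
	"fmt"
	"hash"
	"hash/fnv"
)

const (
	offset64 = 14695981039346656037 // FNV-1a 64-bit offset basis
	prime64  = 1099511628211        // FNV 64-bit prime
)

// fnv1a64 is stateless: it needs no struct and no locking, so it can be
// called from any number of goroutines.
func fnv1a64(b []byte) uint64 {
	h := uint64(offset64)
	for _, c := range b {
		h ^= uint64(c)
		h *= prime64
	}
	return h
}

// tracker mirrors the shape of the token method quoted above. The hash field
// is mutable state, so concurrent callers would need their own locking.
type tracker struct {
	hash hash.Hash64
}

// newTracker is a hypothetical constructor; FNV-1a is assumed here.
func newTracker() *tracker {
	return &tracker{hash: fnv.New64a()}
}

func (s *tracker) token(traceID []byte) uint64 {
	s.hash.Reset()
	s.hash.Write(traceID)
	return s.hash.Sum64()
}

func main() {
	id := []byte("0123456789abcdef")
	// Both paths compute the same FNV-1a value; only the state handling differs.
	fmt.Println(fnv1a64(id) == newTracker().token(id))
}
```

This only illustrates the state and locking difference. It says nothing about collision resistance relative to SHA-1, which is the open question in the thread.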

Contributor

Yes, agreed, that would be my concern with fnv. I think we can start with the SHAs in the uuid package and go from there. Realistically this is not called much (once per block), much less than trace/span hashing.

}
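
One detail in this revision of `newDeterministicID` is worth spelling out: Go's built-in `copy` takes the destination first, so `copy(b, data)` as written copies the zeroed `data` over the tenant bytes rather than the tenant bytes into `data`. A minimal sketch of the layout the function appears to be aiming for (tenant bytes followed by 8 little-endian bytes per seed) is shown below. `buildHashInput` is a hypothetical helper name, not something from the PR.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// buildHashInput is a hypothetical helper (not part of the PR) that lays out
// the tenant bytes followed by each seed as 8 little-endian bytes.
func buildHashInput(tenant []byte, seeds []int64) []byte {
	dl := len(tenant)
	data := make([]byte, dl+len(seeds)*8) // 8 bytes per int64

	// copy(dst, src): the destination comes first, so tenant is left untouched.
	copy(data, tenant)

	for i, seed := range seeds {
		binary.LittleEndian.PutUint64(data[dl+i*8:], uint64(seed))
	}
	return data
}

func main() {
	tenant := []byte("single-tenant")
	data := buildHashInput(tenant, []int64{0, 42, 1})

	fmt.Printf("prefix: %q\n", data[:len(tenant)])
	fmt.Printf("total length: %d\n", len(data))
}
```

With the arguments reversed, the hash input would start with zero bytes and the caller's tenant slice would be zeroed as a side effect, so IDs would no longer depend on the tenant.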
21 changes: 20 additions & 1 deletion modules/blockbuilder/util/id_test.go
@@ -36,9 +36,28 @@ func TestDeterministicIDGenerator(t *testing.T) {
 	}
 }

+func FuzzDeterministicIDGenerator(f *testing.F) {
+	f.Skip()
+
+	f.Add(util.FakeTenantID, int64(42), int64(100))
+	f.Fuzz(func(t *testing.T, tenantID string, seed1, seed2 int64) {
+		gen := NewDeterministicIDGenerator(tenantID, seed1, seed2)
+
+		for i := 0; i < 3; i++ {
+			id := gen.NewID()
+			_, err := uuid.Parse(id.String())
+			if err != nil {
+				t.Fatalf("failed to parse UUID: %v", err)
+			}
+		}
+	})
+}
+
 func BenchmarkDeterministicID(b *testing.B) {
+	tenant := util.FakeTenantID
 	ts := time.Now().UnixMilli()
-	gen := NewDeterministicIDGenerator(util.FakeTenantID, ts)
+	partitionID := int64(0)
+	gen := NewDeterministicIDGenerator(tenant, partitionID, ts)
 	for i := 0; i < b.N; i++ {
 		_ = gen.NewID()
 	}