-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
prefix lookup implementation #2
base: noot/double-hash
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
providers/providers_manager.go
Outdated
@@ -317,6 +361,7 @@ func (pm *ProviderManager) getProviderSetForKey(ctx context.Context, k []byte) ( | |||
} | |||
|
|||
if len(pset.providers) > 0 { | |||
// TODO: if this is a prefix, do we want to cache it? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to add only the target CID to the cache (if found), and not all content matching the prefix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I can change this to only cache full CIDs!
providers/providers_manager.go
Outdated
@@ -296,6 +319,27 @@ func (pm *ProviderManager) GetProviders(ctx context.Context, k []byte) ([]peer.A | |||
} | |||
} | |||
|
|||
// GetProvidersForPrefix returns the set of providers with the given prefix, as well | |||
// as the full key they provide. | |||
func (pm *ProviderManager) GetProvidersForPrefix(ctx context.Context, k []byte) (map[peer.ID][][]byte, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we return map[peer.ID][][]byte
here? Wouldn't it be better to return map[[]byte]peer.ID
? In the end we only care about one provider record associated with the CID. If we have a map[peer.ID][][]byte
we need to iterate on the full map.
How does it integrate with provider record decryption?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the map value is what keys with the prefix the peer has. it's possible the peer has multiple keys with the same prefix, so that's why it's [][]byte
.
for provider record decryption, the peer.ID
is actually encrypted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But when requesting a specific key, we expect Pr[many peerID for one CID matching prefix] >> Pr[many CIDs matching prefix provided by the same peerID]
as if a provider provides multiples CIDs, they will be stored at random locations in the DHT, hence we don't expect to have multiple hits for the same content provider for a single request (targeting a specific location).
I think that it would make more sense to have a map: second hash (dht identifier) -> list of encrypted peer.ID
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, I guess even though that case is unlikely, it still could potentially happen (especially if the prefix length chosen isn't that short). =
I think a map of double hash -> encrypted peer IDs would also work, I think the reason I didn't do that is due to the format of the Message_Peer
type and how the pb messages are sent generally - they are sent by peer, not by provided data (since before there were no prefixes). let me revisit this and try to find a better solution!
update: I've updated the code to now use map[string][]*peer.AddrInfo
for prefix lookups, where the map key is the full lookup key. I think it's cleaner now, let me know what you think!
…uery is for a prefix key
Would it make things easier to add a |
@@ -419,6 +431,21 @@ func makeRoutingTable(dht *IpfsDHT, cfg dhtcfg.Config, maxLastSuccessfulOutbound | |||
return rt, err | |||
} | |||
|
|||
// SetPrefixLength sets the prefix length for DHT provider lookups. | |||
func (dht *IpfsDHT) SetPrefixLength(prefixLength int) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is SetPrefixLength
called somewhere in go-libp2p-kad-dht
? Or is this function meant to be called by kubo
or go-libp2p
to configure the DHT settings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, it's meant to be called externally. I needed this for testing, but thought it could be useful for users of the library also, since the only other way to set the prefix length is at DHT initialization.
@@ -97,7 +97,7 @@ func (m *messageSenderImpl) SendRequest(ctx context.Context, p peer.ID, pmes *pb | |||
|
|||
stats.Record(ctx, | |||
metrics.SentRequests.M(1), | |||
metrics.SentBytes.M(int64(pmes.Size())), | |||
metrics.SentBytes.M(int64(pmes.XXX_Size())), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does the XXX
stand for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know, it's auto-generated from protobuf, haha
providers/provider_set.go
Outdated
set map[peer.ID]time.Time | ||
providers []peer.ID | ||
set map[peer.ID]time.Time | ||
keyToProviders map[string][]peer.ID // TODO change to map of maps, otherwise there can be duplicate peer IDs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that it would indeed make sense to have providers
as a map, and keyToProviders
as a map of maps 👍🏻
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually now that I thought about it, I think these are okay as slices because there can't be duplicate provider records in the datastore, since the DB key is the provider ID + provided CID essentially. so when these slices are written to while the database is being iterated over, it shouldn't be possible to have duplicate peer IDs.
Type: typ, | ||
Key: &Message_Key{ | ||
Key: key, | ||
PrefixBitLength: uint32(prefixBitLength), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uint16
should be enough, but I am totally unfamiliar with how pb
works and whether the type can actually be changed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's no uint16 type in protobuf sadly :( uint32 is the smallest
@@ -197,6 +203,12 @@ func New(ctx context.Context, h host.Host, options ...Option) (*IpfsDHT, error) | |||
return nil, err | |||
} | |||
|
|||
dht.prefixLength = cfg.PrefixLookupLength | |||
if dht.prefixLength >= 256 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General question: Is the prefix length set globally for the kubo
node, or can it be changed for each request (e.g with Options, or argument)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it can be changed for each request by calling SetPrefixLength
right before the lookup. Another option I just thought of (that might be nicer API-wise?) would be to create an option specifically for the FindProviders
function that the caller can optionally pass in. what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that would be awesome if the prefix can be set on a per request basis, with a global default value (either static e.g taken from the conf, or adaptive) for if the option isn't specifically set.
prefixLength
as a config option to the DHTprefixLength
is set, only lookup prefix of hash(CID)Message_Peer
message type to include the key the peer provides (maybe there's a nicer way to do this?)