Guard Tower Redux: Initial interface and struct setups for distributed key generation #43
Conversation
We re-separate the dep ensure step so that it doesn't run when a simple Go file changes. We also fix the directory paths so that imports that reference the full keep-core GitHub path will resolve correctly (currently no files with these references are compiled, which is how we missed it, but they're coming).
Includes a healthy dose of documentation, as well as a NewMember constructor function. There are still some open questions around how to handle the id, but this should be a good starting point.
The one Member struct becomes a LocalMember, SharingMember, JustifyingMember, and Member struct. Each one of these represents a phase in DKG, except SharingMember which represents both the commitment and private share receiving/validation phase. These can occur at the same time without any issues, so we combine the struct. Each struct has an associated method to transition it to the next phase, in which we do any final work (e.g., when we go from LocalMember to SharingMember we compute the private member shares that are to be sent to each member of the threshold group). Each struct also exposes the various methods that make sense for its phase. The Member struct now represents a fully-fleshed-out member, ready to participate in threshold signatures. Note that there are still a few todos around properly tracking accusation/justification. There is also some validation that needs to happen so that we don't generate a final Member if we've failed to meet our threshold requirements for qualified members.
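The phase progression above can be sketched as a chain of one-way transitions. This is a hypothetical skeleton, not the real thresholdgroup code; the actual structs carry BLS state, and the method names here only illustrate the shape described.

```go
package main

import "fmt"

// Hypothetical sketch of the DKG phase structs; field names and signatures
// are illustrative, not the real thresholdgroup package.
type LocalMember struct{ ID string }
type SharingMember struct {
	ID     string
	shares map[string][]byte // private shares computed for each other member
}
type JustifyingMember struct{ ID string }
type Member struct{ ID string }

// InitializeSharing does the final work of the local phase (computing the
// private member shares destined for each group member) and moves on.
func (lm *LocalMember) InitializeSharing() *SharingMember {
	return &SharingMember{ID: lm.ID, shares: make(map[string][]byte)}
}

// InitializeJustification moves into the accusation/justification phase.
func (sm *SharingMember) InitializeJustification() *JustifyingMember {
	return &JustifyingMember{ID: sm.ID}
}

// FinalizeMember yields a fully-fleshed-out member; the real version would
// first validate that enough qualified members remain to meet the threshold.
func (jm *JustifyingMember) FinalizeMember() *Member {
	return &Member{ID: jm.ID}
}

func main() {
	m := (&LocalMember{ID: "1"}).InitializeSharing().InitializeJustification().FinalizeMember()
	fmt.Println(m.ID) // prints 1
}
```

Because each transition returns a new type, code that only has a LocalMember cannot accidentally call a sharing-phase method; the compiler enforces the phase ordering.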
This starts the formalization of distinctions between these two. Eventually private messages will need to encrypt, for example.
This allows message sends to not block the caller, so that the caller can move on to receiving from other participants.
There are a few places where we need it for filtering, though it's very likely these will be unnecessary if our underlying channels give us a way to not receive our own messages.
We were forgetting to set it, so even though we took it into the function we never propagated it to the member. We'll be using this for better public key creation once the DKG process is done.
OtherMemberIDs is just a convenient way to get the list of group member IDs other than a member's own ID, which we use when computing and shipping out private shares.
Without these methods we don't have a way externally of knowing that the member no longer needs to receive additional shares or commitments.
We were accidentally checking ourselves for received shares while deciding who to accuse. Of course we didn't receive shares for ourselves -- we simply have them available.
We were expecting to find this member's commitments in the commitments map, but they are tracked separately. We also change the combined commitments allocation to reference the threshold, since this is most directly correct.
BeaconConfig is a stub for getting the current configuration of the relay beacon from the chain. BlockCounter is a stub for setting up notifications of when a certain number of blocks has been observed.
DKG.Execute takes a block counter, broadcast channel, group size, and threshold, and attempts to initialize a single member in a threshold group. Multiple DKG.Execute calls executed in parallel, combined with a local broadcast channel, will result in a local DKG process occurring. This gives us the infrastructure to do a local non-networked test of DKG, with interface stubs to interact with a networked version as well. Note that currently the DKG relies entirely on block timeouts for phase synchronization. We'll need to consider whether that's necessary throughout.
We use a local broadcast channel and a channel for reporting the complete members as they wrap up. No group signing yet though!
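The driver loop described above looks roughly like this. Everything here is a stand-in: member and executeDKG substitute for thresholdgroup.Member and relay.ExecuteDKG, and the real version also threads a block counter and broadcast channel through.

```go
package main

import "fmt"

// member is a stand-in for thresholdgroup.Member.
type member struct{ id int }

// executeDKG is a stand-in for relay.ExecuteDKG; the real function runs the
// full DKG phases over the broadcast channel.
func executeDKG(id int) *member {
	return &member{id: id}
}

// runLocalDKG launches one goroutine per group member and collects the
// completed members from a shared reporting channel.
func runLocalDKG(groupSize int) []*member {
	memberChannel := make(chan *member)
	for i := 0; i < groupSize; i++ {
		go func(i int) { // pass i explicitly so each goroutine gets its own copy
			memberChannel <- executeDKG(i)
		}(i)
	}
	members := make([]*member, 0, groupSize)
	for len(members) < groupSize {
		members = append(members, <-memberChannel)
	}
	return members
}

func main() {
	fmt.Printf("completed %d members\n", len(runLocalDKG(10))) // prints "completed 10 members"
}
```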
Not a huge deal, but don't want a million packages.
We were expecting the group member's own share to be in the receivedShares map, but it was not. Instead, we have to query the member's secret shares to see the share it generated for itself. Then we add all the received shares.
No substantive change, but the explicit assignment makes it clear where the public key is coming from.
thresholdgroup.Member now has SignatureShare(message) to compute its share of a threshold signature, and VerifySignature(shares, message) to take a set of shares and verify that they are a valid signature of the message for this group. The shares are all passed around as byte arrays, so that the details of BLS are tucked away behind everything with the exception of BLS IDs (which we may hide later).
Notably, this doesn't do anything about thresholds, it does the whole thing.
When we receive an accusation for a member A from another member B, we need to track this fact so that we only consider member A to be a valid member if we receive their justification for member B. We were tracking these at a coarser granularity, namely by simply marking the member as accused, and accepting any justification from them as validating all accusations against them. We now track the accusations between all member pairs, and record the justifications between member pairs. When we go to finalize the member, we do a final disqualification of all members who were accused without a corresponding valid justification. This yields a correct final list of qualified members.
The methods are defined in order from first used to last used during the DKG/relay process.
This is what we'll need to work with on-chain implementations, from our current understanding.
heightMutex implies the mutex locks only access to block height information, but in fact the mutex protects access to all struct members.
We append + assign further down in the function anyway.
We were wrapping the whole tick in the mutex, including waiter notification, but really we can read all the state we need quickly and notify waiters separately.
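The narrowed tick looks something like the following. The names (blockCounter, tick, waiters) are illustrative stand-ins for the actual chain package: state is read and cleared under the lock, then waiters are notified after release, so a slow waiter can never hold up the mutex.

```go
package main

import (
	"fmt"
	"sync"
)

// blockCounter is a hypothetical sketch of the tick/waiter structure.
type blockCounter struct {
	mu      sync.Mutex
	height  int
	waiters []chan int
}

func (c *blockCounter) tick() {
	c.mu.Lock()
	c.height++
	height := c.height  // snapshot the state we need...
	waiters := c.waiters
	c.waiters = nil
	c.mu.Unlock() // ...and release the lock before notifying anyone

	for _, waiter := range waiters {
		waiter <- height
	}
}

func main() {
	c := &blockCounter{}
	waiter := make(chan int, 1)
	c.waiters = append(c.waiters, waiter)
	c.tick()
	fmt.Println(<-waiter) // prints 1
}
```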
We were dividing seconds, but multiplication > division.
It's unclear how these interact with the rest of the BLS stuff, so let's just avoid them altogether.
We were failing to do this, which resulted in all sorts of weird runtime behavior when every member's channel didn't end up in the channel list.
The buffer is big enough to hold all private shares in a share exchange for a 250-member group. We need these buffers so that a message broadcast doesn't block on a member who is behind, which is likely to happen if number of cores < number of members.
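The sizing reasoning can be demonstrated directly: with a buffer large enough for every private share in a 250-member exchange (250×250 = 62500), sends land in the buffer instead of blocking on a member that is behind.

```go
package main

import "fmt"

func main() {
	const groupSize = 250
	// Buffer sized for a full share exchange, matching the figure above.
	recvChan := make(chan int, groupSize*groupSize)

	// No receiver is running yet, but none of these sends block, because
	// they all fit in the buffer.
	for i := 0; i < groupSize; i++ {
		recvChan <- i
	}
	fmt.Println(len(recvChan)) // prints 250
}
```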
We were letting these contribute to our overall count, which meant we could stop looking for justifications or accusations before we had actually received all of them. Specifically, it meant some members could miss exactly one such message.
These timeouts were tested for 250 members on a 2017 MacBook Pro. They are meant to allow all members to complete one phase before any member proceeds to the next one, in a fully local simulation.
We now report how many ids we saw joining the broadcast channel (helps to make sure everyone made it if something goes wrong), and we stop printing output for each received share and each sent share (this adds up fast). Instead, we print when we start and finish shipping out all private shares. Then we rely on other output to note when all shares have been received.
The Bn254 curve doesn't actually produce valid combined signatures... We'll look into that separately in our research on using the curve implemented by Ethereum.
We weren't starting with the function name like we're supposed to ;)
go/beacon/broadcast/broadcast.go

	go func(recvChans []chan Message) {
		for _, recvChan := range recvChans {
			recvChan <- message
		}
	}(channel.recvChans)
	channel.recvChansMutex.Unlock()
This lock may potentially get let go before the go routine finishes executing
Mmm… Which is not fine. I meant it to be fine because we snapshot channel.recvChans
when passing it to the goroutine, but we actually don't. I'll copy it <_<
go/beacon/broadcast/broadcast.go

	return true
}

func (channel *localChannel) RecvChan() <-chan Message {
-	newChan := make(chan Message)
+	newChan := make(chan Message, 62500)
What's this number?
250*250, or roughly the number of total expected private share messages we're throwing around during private share exchange with a 250-member group. I don't think it actually needs to be this high, but I didn't want to waste further time on figuring out a better number since this is really just for local smoke testing.
go/beacon/broadcast/broadcast.go

	name           string
	recvChansMutex sync.Mutex
	recvChans      []chan Message
nice!
go/beacon/broadcast/broadcast.go

@@ -73,5 +80,5 @@ func (channel *localChannel) RecvChan() <-chan Message {
// that is returned to the caller, so that all receive channels can receive
// the message.
func LocalChannel(name string) Channel {
-	return &localChannel{name, make([]chan Message, 0)}
+	return &localChannel{name, sync.Mutex{}, make([]chan Message, 0)}
you don't need the sync.Mutex (unless you'd like us to be very explicit)
I don't feel super-strongly, but the other option was to use named parameters (that is, I can't just leave the parameter off as the compiler will choke). I just did the thing that looked easiest, tbh.
go/beacon/broadcast/broadcast.go

@@ -73,5 +80,5 @@ func (channel *localChannel) RecvChan() <-chan Message {
// that is returned to the caller, so that all receive channels can receive
// the message.
func LocalChannel(name string) Channel {
-	return &localChannel{name, make([]chan Message, 0)}
+	return &localChannel{name, sync.Mutex{}, make([]chan Message, 0)}
This is a simple slice (not a slice of slices, right?) - so you can drop the zero
Don't believe that's true. Slices need an initial length. Compiler seems to agree:
beacon/broadcast/broadcast.go:83:47: missing len argument to make([]chan Message)
mhmm, good to know, thanks.
go/beacon/relay/dkg.go

@@ -113,7 +117,7 @@ func ExecuteDKG(blockCounter chain.BlockCounter, channel broadcast.Channel, grou
	fmt.Printf("[member:%v] Waiting for share exchange timeout...\n", memberID)
	<-waiter

-	waiter = blockCounter.BlockWaiter(3)
+	waiter = blockCounter.BlockWaiter(15)
I take it these numbers are for demo/illustrative purposes and we'll be ripping things out from here later?
Yah, see above.
go/beacon/relay/dkg.go

@@ -197,11 +201,12 @@ done:
}

func sendShares(channel broadcast.Channel, member *thresholdgroup.SharingMember) error {
+	fmt.Printf("[member:%v] Despatching shares!\n", member.ID)
We sure are!
go/beacon/relay/dkg.go

		channel.Send(broadcast.NewPrivateMessage(member.BlsID, receiverID, MemberShareMessage{share}))
	}
	fmt.Printf("[member:%v] Shares despatched!\n", member.ID)
Boom! 🎉
go/beacon/relay/dkg.go

	memberIDs := justifyingMember.OtherMemberIDs()
	seenAccusations := make(map[bls.ID]bool, len(memberIDs))
done:
	for msg := range recvChan {
		switch accusationMsg := msg.Data.(type) {
		case AccusationsMessage:
			if msg.Sender.IsEqual(myID) {
If we are the sender of this message, move on (ie. avoid the loopback case)
Notably, we'll ultimately probably handle the “ignore messages from me” at a different level, I just needed a quick-and-dirty solution here.
go/main.go

@@ -12,7 +12,7 @@ import (
)

func main() {
-	bls.Init(bls.CurveFp254BNb)
+	bls.Init(bls.CurveFp382_1)
I still don't understand all these curves and what going with one vs the other means.
Though I did kick out my old number theory book and remembered that I actually know some of this stuff ;-P
I don't think we have a full understanding yet either.
The core of it is in #31 and #40; namely, we would like to use the curve that has optimized support for contract usage on the EVM, which was chosen for its relevance to zkSNARKs. This is actually neither of these curves, and @mhluongo is working on that piece. What I can say is that using CurveFp254BNb failed to produce verifiable group signatures. I didn't get a chance to investigate that in depth, and figured I'd punt since @mhluongo is playing in that space and I'd rather move forward on other stuff while he investigates. For now I stuck with the curve that is giving good sigs, which we can use to make sure we haven't broken everything :)
The mutex wasn't guarding access to recvChans, since recvChans was passed to a goroutine that escaped the mutex's lock. We now lock the mutex and create a snapshot of recvChans, then use that snapshot in the goroutine.
Clarified the zero-ID loop and fixed the mutex.

Ahhhh, mutability.
go/beacon/broadcast/broadcast.go

			recvChan <- message
		}
	}(channel.recvChans)
	channel.recvChansMutex.Unlock()
}()
pass in snapshot?
ie.
go func(snappyshot []chan Message) {
...
}(snapshot)
Meh. Annoying, but style is style :p
I say that… Is this accepted style? We're working with a local variable that's never written again, seems like we should be okay with a simple closure?
Yeah, I think you're right that it doesn't matter.
This looks good to me. I'll leave it to you to wait for more reviews or merge this in.
Gah, sorry. That button is right there haha
If you could give an approval review, that would be cool. Would still like to see another person look at it, but if I get far enough ahead on the followup branch we may skip that for now. We'll see tomorrow.
	members := make([]*thresholdgroup.Member, 0, beaconConfig.GroupSize)
	memberChannel := make(chan *thresholdgroup.Member)
	for i := 0; i < beaconConfig.GroupSize; i++ {
The loop variable (i) is not correct at the time it's referenced in the goroutine; i will always be beaconConfig.GroupSize (in our case 10).
Adjusted the fmt.Fprintf verbs.
Perhaps, we could pass i into the goroutine?
Also, maybe we could also pass i into the relay.ExecuteDKG function, s.t. we can see the member number in the log messages?
It's fun to see which one gets initialized first!
[member:ce9386d781fe8c823babdd7751995eb9dff85ef5edad30cda4f1bbba27f864183] Initializing member.
[member:172f95991f82dbcfcaba39977c01696b3943c678a63c1b01543472f3ba7d080b6] Initializing member.
[member:22123ac33a8dbcf8369a1fe78253028c70341e8c7ad9e924501ca8ac252bfa7d9] Initializing member.
[member:47c7f73fbc1ab3614d907ff0c40d36687487f5de25f58e3dcc039fb20ddcc5e94] Initializing member.
[member:ddeb99f4891253efd82ce981cd3d55a1f00e9b6a47e2e34402b53e43e89f3e1b0] Initializing member.
[member:521d24ded34005a945f4740265fd9508cb916871e7eb59b6f1ff534980cfd2405] Initializing member.
[member:c4bf40b166ac476c10231e7e5f2d7f428c5fd9ef16b88f0092f88c0e3c56a7bf1] Initializing member.
[member:bcac18c6f1fb2cb9f1dd5e59566270d6ed18ca1feb80aaf56599f147dc265dc32] Initializing member.
[member:b16cd724b2b317093e4285c7e4146a55d33c9cdaa54b5bd75c738213d6627b197] Initializing member.
[member:3bf3ea889db364e81cdda5ba30dd5b83c84aba6538e761e353e4865a5d1f5a4e8] Initializing member.
. . .
go/main.go
for i := 0; i < beaconConfig.GroupSize; i++ {
go func(i int) { // <= change
member, err := relay.ExecuteDKG(chainCounter, channel, beaconConfig.GroupSize, beaconConfig.Threshold, i) // <= change
if err != nil {
fmt.Fprintf(os.Stderr, "Failed to run DKG for member %d: [%v].", i, err) // <= change
memberChannel <- nil
return
}
memberChannel <- member
}(i)
}
go/beacon/relay/dkg.go
func ExecuteDKG(blockCounter chain.BlockCounter, channel broadcast.Channel, groupSize, threshold, memberNum int) (*thresholdgroup.Member, error) { // <= change
// FIXME Probably pass in a way to ask for a receiver's public key?
// FIXME Need a way to time out in a given stage, especially the waiting
// ones.
// Generate a nonzero memberID; loop until rand.NewRand returns something
// other than 0, hopefully no more than once :)
memberID := "0"
for memberID = rand.NewRand().String(); memberID == "0"; {
}
memberID += fmt.Sprintf("%d", memberNum) // <= change
fmt.Printf("[member:%v] Initializing member.\n", memberID)
. . .
I'm not really interested in including the index in the output; members are identified entirely by their BLS id everywhere except where we don't have that id available, so I don't think we gain much by having it.
However, you're completely right that the index isn't properly captured. Nice catch on your part… Embarrassing oversight on mine hahaha.
Actually I just realized the way this is structured right now there's no way for the BLS id to not be set when we error out either… So we can just report that error alongside the BLS id instead of the index 🎉
members are identified entirely by their BLS id

Put differently: the only situation where we care about the index is if the member's BLS id wasn't initialized properly. That is only hypothetically possible in the return of the ExecuteDKG call; in fact, it's not possible in that loop, because we initialize the BLS id before we start getting the ability to return an error. However, for completeness, and because we have no way to require that the BLS id is set when we return from ExecuteDKG, I implemented it so that the error reports both BLS id AND index.
Thinking about this some more, since apparently I'm in a comment-spamming mood lol… I actually don't think we ever care about the index, since it doesn't give us actionable information. If the member fails to run DKG, it's unlikely to have anything to do with what index it was.
We include the index because the calling code has no way to know that the BLS id is definitely initialized on an error return; however, we also include the BLS id because it should be.
@rargulati still need an approving review from you here 🙏 @mhluongo once we have that perhaps you can either review or merge, depending on how you're feeling about it? :)
Couple comments, otherwise LGTM!
	channel.recvChansMutex.Lock()
	snapshot := make([]chan Message, len(channel.recvChans))
	copy(snapshot, channel.recvChans)
	channel.recvChansMutex.Unlock()
Would defer make sense here? In this case I don't think it matters much, but bare resource frees make me nervous, and are best avoided in most langs.
Mostly my goal here was to keep the scope of the lock super-tight. It doesn't really matter that much, though; this is a local implementation that we'll be swapping out for a network-based one in practice.
That said, it seems odd that sync.Mutex doesn't provide a Wrapped method or something that basically lets you run a func inside the lock and return a result. Guess that would be too functional for the Go folks :p
	WaitForBlocks(numBlocks int)
	// BlockWaiter returns a channel that will emit the current block height
	// after the given number of blocks has elapsed and then immediately close.
	BlockWaiter(numBlocks int) <-chan int
The name here is unfortunate. How do folks typically distinguish between sync and async in Go func names?
time.Timer just exposes the channel directly. Doesn't work as well here with multiple timers and consumers though. Could just be that instead of a BlockCounter interface, we have a package with a function that returns a block waiter for a particular block count. It could look basically the same as time.Timer.
That said, (a) the naming seems all right to me (a waiter is something you wait on, no?) and (b) this interface needs to be a little more abstract anyway (discovered in super-smush-bros). Let's keep the issue in mind but handle it when we hit the right abstraction level.
Merging this in current state!
Okay, this is a rebase of #32 onto the current master. This gives us a working local run of group distributed key gen over a simulated broadcast channel, proven by a group signature (not distributed over the channel).
The committed config is 10 members with a threshold of 4, but I've tested it up to 250 members with a threshold of 125. The latter is very slow and super-CPU-intensive (on the order of 10-15 minutes), but works (mostly… ran into some issues here that I need to check again).
Still missing (left for a future PR):
This incorporates some fixes to issues in guard-tower; you can restrict the diff to just those changes by clicking here (and do your usual review).