Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generalize expiry based de-duplication, dsmr #1810

Open
wants to merge 38 commits into
base: main
Choose a base branch
from

Conversation

tsachiherman
Copy link
Contributor

@tsachiherman tsachiherman commented Nov 22, 2024

What ?

Integrate the generic de-deduplication logic into the dsmr

@tsachiherman tsachiherman self-assigned this Nov 22, 2024
x/dsmr/block.go Outdated Show resolved Hide resolved
x/dsmr/node.go Outdated Show resolved Hide resolved
x/dsmr/node.go Outdated Show resolved Hide resolved
x/dsmr/node.go Outdated Show resolved Hide resolved
x/dsmr/node.go Outdated

return n.storage.SetMin(block.Timestamp, chunkIDs)
func (n *Node[T]) GetExecutionBlock(ctx context.Context, blkID ids.ID) (validitywindow.ExecutionBlock[*ChunkCertificate], error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can defer this to the VM to maintain a chain index for us that we pass in - similar to how we handle this in the existing chain/ package

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

np

@tsachiherman tsachiherman marked this pull request as ready for review November 25, 2024 18:20
x/dsmr/node.go Outdated
@@ -172,19 +193,28 @@ func (n *Node[T]) BuildChunk(
return chunk, n.storage.AddLocalChunkWithCert(chunk, &chunkCert)
}

func (n *Node[T]) BuildBlock(parent Block, timestamp int64) (Block, error) {
if timestamp <= parent.Timestamp {
// BuildBlock(ctx context.Context, parentView state.View, parent *ExecutionBlock) (*ExecutionBlock, *ExecutedBlock, merkledb.View, error)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we remove this comment?

x/dsmr/node_test.go Outdated Show resolved Hide resolved
x/dsmr/node_test.go Outdated Show resolved Hide resolved
x/dsmr/node_test.go Outdated Show resolved Hide resolved
@tsachiherman
Copy link
Contributor Author

tsachiherman commented Nov 25, 2024 via email

Comment on lines 73 to 81
// make sure we have no repeats within the block itself.
blkTxsIDs := make(map[ids.ID]bool, len(blk.Txs()))
for _, tx := range blk.Txs() {
id := tx.GetID()
if _, has := blkTxsIDs[id]; has {
return fmt.Errorf("%w: duplicate in block", ErrDuplicateContainer)
}
blkTxsIDs[id] = true
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do this check twice now in chain since we previously handled this within NewExecutionBlock

We should only apply the check once, which do you think is the better place for it? The other addition within NewExecutionBlock is pre-populating the signature job. It would probably be better to remove that from the execution block and handle it in AsyncVerify. One good reason to do this is that handling this within ParseBlock can be a DoS vector.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why testing this twice is an issue on it's own, and I think that I have a better solution here;
Testing it in NewExecutionBlock is not ideal from my perspective, as it should be construction step ( i.e. no errors ).
How do you feel about the following:
in ExecutionBlock, we would add the following function:

func (b *ExecutionBlock) ValidateDuplicateTransactions() error {
    if len(b.Txs) != b.txsSet.Len() {
        ErrDuplicateTx
    }
    return nil

than in validitywindow.go VerifyExpiryReplayProtection, we can just call this method:

   ...
   if err := blk.ValidateDuplicateTransactions(); err != nil {
      return err
   }
   ...

and remove the test from NewExecutionBlock

x/dsmr/block.go Outdated
Comment on lines 147 to 154
func (b Block) Contains(id ids.ID) bool {
for _, c := range b.ChunkCerts {
if c.ChunkID == id {
return true
}
}
return false
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we create a set of chunk cert IDs to make this faster? Could also leave for now and add only if needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm perfectly fine adding it. will do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

x/dsmr/node.go Outdated
Comment on lines 103 to 124
type (
Node[T Tx] struct {
nodeID ids.NodeID
networkID uint32
chainID ids.ID
pk *bls.PublicKey
signer warp.Signer
getChunkClient *TypedClient[*dsmr.GetChunkRequest, Chunk[T], []byte]
getChunkSignatureClient *TypedClient[*dsmr.GetChunkSignatureRequest, *dsmr.GetChunkSignatureResponse, []byte]
chunkCertificateGossipClient *TypedClient[[]byte, []byte, *dsmr.ChunkCertificateGossip]
validators []Validator
validityWindow timeValidityWindow

GetChunkHandler *GetChunkHandler[T]
GetChunkSignatureHandler *acp118.Handler
ChunkCertificateGossipHandler *ChunkCertificateGossipHandler[T]
storage *chunkStorage[T]
log logging.Logger
tracer trace.Tracer
validityWindowDuration int64
}
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove the type declaration block in favor of simply declaring the node type? We do not use this style anywhere else in the codebase, so would prefer not to introduce it here.

x/dsmr/node.go Outdated
Comment on lines 222 to 225
if dup.Len() == len(chunkCerts) && len(chunkCerts) > 0 {
return Block{}, ErrAllChunkCertsDuplicate
}
return Block{}, ErrNoAvailableChunkCerts
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why differentiate this error? This handling seems overly complicated to provide a different error type

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that you've explicitly asked for a separate error when all the certs are duplicate.. did I get it wrong ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I meant a separate test case if we attempt to build an empty block: #1810

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh.. yes, I can do that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

x/dsmr/node.go Outdated
Comment on lines 247 to 250
if block.Tmstmp <= parentBlock.Tmstmp && parentBlock.Hght > 0 {
return ErrTimestampNotMonotonicallyIncreasing
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove this check from this PR?

We should enforce timestamp >= parent timestamp (should allow them to be equal in case a block builder pushes the timestamp ahead of wall clock time for some nodes.

We should not allow the case that a malicious node builds a block with a timestamp that we consider valid less than 1s ahead of our wall clock time, but still ahead of our wall clock time, such that when we build a block on top of it, we fail because the current timestamp is ahead of our local timestamp.

We should update the check applied in BuildBlock imo.

We should also never execute the genesis block, so the check for parentBlock.Hght > 0 should be removed.

if blk, has := ti.blocks[id]; has {
return blk, nil
}
return nil, nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fetching a non-existing block should return an error, can we return database.ErrNotFound if the block is not available?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the ux of us using the sentinel database.ErrNotFound experience to be very awkward. This is due to the existing interface definition and not the changes in this PR though.... if we had something like GetExecutionBlock() (ExecutionBlock[T], bool, error) we wouldn't have to introspect the error and could just return an error if error is non-nil and check the bool if it just wasn't there.

@@ -69,6 +70,15 @@ func (v *TimeValidityWindow[Container]) VerifyExpiryReplayProtection(
if dup.Len() > 0 {
return fmt.Errorf("%w: duplicate in ancestry", ErrDuplicateContainer)
}
// make sure we have no repeats within the block itself.
blkTxsIDs := make(map[ids.ID]bool, len(blk.Txs()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use set.Set here?

Comment on lines 58 to 60
func (c ChunkCertificate) GetExpiry() int64 { return c.Expiry }

func (c *ChunkCertificate) GetSlot() int64 { return c.Expiry }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these are duplicated - GetSlot is meant to be Expiry

x/dsmr/node.go Outdated
log logging.Logger,
tracer trace.Tracer,
chainIndex ChainIndex,
validityWindowDuration int64,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use time.Duration here?

x/dsmr/node.go Outdated
Comment on lines 204 to 207
oldestAllowed := timestamp - n.validityWindowDuration
if oldestAllowed < 0 {
oldestAllowed = 0
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: min(0, oldestAllowed)

blocks map[ids.ID]validitywindow.ExecutionBlock[*ChunkCertificate]
}

func (ti *testingChainIndex) GetExecutionBlock(_ context.Context, id ids.ID) (validitywindow.ExecutionBlock[*ChunkCertificate], error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nits:

  1. receiver name should just be t
  2. id -> blkID or blockID

x/dsmr/block.go Outdated
Comment on lines 127 to 131
func (b Block) GetID() ids.ID {
return b.blkID
}

func (b Block) Parent() ids.ID {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't care which we pick, but we should be consistent on the naming of either Foo() or GetFoo().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to dodge this one by saying that this method is no longer needed.

x/dsmr/block.go Outdated
Comment on lines 143 to 145
func (b Block) Txs() []*ChunkCertificate {
return b.ChunkCerts
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems weird because these are not returning txs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I agree. let's discuss this as a group ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use the term containers?

x/dsmr/node.go Outdated
Comment on lines 58 to 59
log logging.Logger,
tracer trace.Tracer,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: make these (logger + tracer) the first args in this fn

x/dsmr/node.go Outdated
Comment on lines 37 to 40
type (
ChainIndex = validitywindow.ChainIndex[*ChunkCertificate]
timeValidityWindow = *validitywindow.TimeValidityWindow[*ChunkCertificate]
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like we're doing this to avoid the caller depending on an internal package. I'm wondering if it even makes sense for validitywindow to be in internal at all... should this just be merged into dsmr?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aaronbuchwald asked to remove this section completely.

x/dsmr/block.go Outdated
"github.com/ava-labs/hypersdk/utils"
)

const InitialChunkSize = 250 * 1024

type Tx interface {
GetID() ids.ID
GetExpiry() int64
emap.Item
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This exposes the internal/emap package into the caller. I think the previous wrapping pattern where we wrapped this interface w/ a type that implemented the emap interface actually looked cleaner.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

once merged, we won't need wrapping interface anymore. . unless I'm missing something ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @joshua-kim 's point is to have the internal type implemented as makes sense in this package and then wrap it with another type that converts between that structure and the required interface when we need to use it in the validity window or expiry map where we need a specific interface. Correct me if I'm wrong @joshua-kim

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

x/dsmr/block.go Outdated
Comment on lines 143 to 145
func (b Block) Txs() []*ChunkCertificate {
return b.ChunkCerts
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use the term containers?

x/dsmr/node.go Outdated
Comment on lines 222 to 225
if dup.Len() == len(chunkCerts) && len(chunkCerts) > 0 {
return Block{}, ErrAllChunkCertsDuplicate
}
return Block{}, ErrNoAvailableChunkCerts
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I meant a separate test case if we attempt to build an empty block: #1810

x/dsmr/block.go Outdated
"github.com/ava-labs/hypersdk/utils"
)

const InitialChunkSize = 250 * 1024

type Tx interface {
GetID() ids.ID
GetExpiry() int64
emap.Item
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @joshua-kim 's point is to have the internal type implemented as makes sense in this package and then wrap it with another type that converts between that structure and the required interface when we need to use it in the validity window or expiry map where we need a specific interface. Correct me if I'm wrong @joshua-kim

}

type ChainIndex[Container emap.Item] interface {
GetExecutionBlock(ctx context.Context, blkID ids.ID) (ExecutionBlock[Container], error)
GetExecutionBlock(ctx context.Context, blkID ids.ID) (ExecutionBlock[Container], bool, error)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we adding the boolean here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My guess is that this is to return true/false, and use the error if something goes wrong instead of just returning the block type + an error, and you have to introspect the error using a check against database.ErrNotFound.

Copy link
Contributor Author

@tsachiherman tsachiherman Dec 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aaronbuchwald see @joshua-kim's request above to change the function signature to GetExecutionBlock() (ExecutionBlock[T], bool, error)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When is it valid to not have an execution block that the validity window requests?

We should always have a window of blocks that goes past the time validity window. I am not sure of any case where not having a block ie. returning false should not also be treated as an error. Unless there is such a case, I think we're better off just returning an error and not adding to the return type here.

parent, err := v.chainIndex.GetExecutionBlock(ctx, blk.Parent())
if err != nil {
parent, hasBlock, err := v.chainIndex.GetExecutionBlock(ctx, blk.Parent())
if err != nil || !hasBlock {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should return a boolean for hasBlock here, but if we did not have the block, we would not be able correctly verify the block. It's not safe to mark a block as passing verification if we are missing the information required to verify it, we must guarantee we don't reach a point like that in the first place.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're correct. I'll be adding an error case here which would be returned in case the hasBlock is false ( since it shouldn't happen ).

Comment on lines 1031 to 1036
blk := Block{
ParentID: ids.GenerateTestID(),
Hght: 1,
Tmstmp: 1,
blkID: ids.GenerateTestID(),
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be building this block through node.BuildBlock and calling Verify prior to calling Accept (see other tests as an example). This is technically not a valid block because it references an unknown parent instead of the last accepted block which would be caught if Verify was called on it. Building a block should generally take the form of calling BuildChunk -> BuildBlock -> Verify -> Accept

}
r.NoError(node.Accept(context.Background(), blk))

r.NoError(node.storage.AddLocalChunkWithCert(chunk, &chunkCert))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't have to call this, since the chunk cert is already added locally once the chunk is finished being built from node.BuildChunk, and remote chunks are added within Accept if we don't have them (ref).

I also think this is a weird way of writing the test because we're manually injecting the chunk into the database instead of testing how this would happen through the actual user-flow of dsmr. Maybe a better way would be to test the cases where you are calling BuildChunk a second time w/ the same args to make a duplicate chunk, and trying to build + accept the block? We should also test the case where a peer sends us a block referencing a duplicate chunk that we try to verify and fail.

Comment on lines 1097 to 1117
initChunks := func(node *Node[tx]) ([]Chunk[tx], []*ChunkCertificate) {
var chunks []Chunk[tx]
var chunksCerts []*ChunkCertificate
for expiry := int64(0); expiry < 5; expiry++ {
chunk, cert, err := node.BuildChunk(
context.Background(),
[]tx{
{
ID: ids.GenerateTestID(),
Expiry: expiry,
},
},
expiry,
codec.Address{},
)
r.NoError(err)
chunks = append(chunks, chunk)
chunksCerts = append(chunksCerts, &cert)
}
return chunks, chunksCerts
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just move this to the test body? It's only used once so I find that this makes the test more confusing to read.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure.

}
return chunks, chunksCerts
}
testCases := []struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: In our codebase we usually just call this tests

Comment on lines 1176 to 1261
indexer := newTestingChainIndexer()
valWind := testCase.validalidityWindowDuration
if valWind == 0 {
valWind = testingDefaultValidityWindowDuration
}

getChunkPeers := make(map[ids.NodeID]p2p.Handler)

chunkCertGossipPeers := make(map[ids.NodeID]p2p.Handler)

chunkStorage, err := NewChunkStorage[tx](NoVerifier[tx]{}, memdb.New())
require.NoError(t, err)

getChunkHandler := &GetChunkHandler[tx]{
storage: chunkStorage,
}
chunkSignatureRequestHandler := acp118.NewHandler(ChunkSignatureRequestVerifier[tx]{
verifier: NoVerifier[tx]{},
storage: chunkStorage,
}, signer)
chunkCertificateGossipHandler := ChunkCertificateGossipHandler[tx]{
storage: chunkStorage,
}
genesisBlk := Block{
ParentID: ids.Empty,
Hght: 0,
Tmstmp: 0,
blkID: ids.Empty,
}

nodeID := ids.GenerateTestNodeID()

validators := []Validator{
{
NodeID: nodeID,
Weight: 1,
PublicKey: pk,
},
}

chunkSignaturePeers := map[ids.NodeID]p2p.Handler{
nodeID: chunkSignatureRequestHandler,
}

node, err := New[tx](
logging.NoLog{},
trace.Noop,
nodeID,
networkID,
chainID,
pk,
signer,
chunkStorage,
getChunkHandler,
chunkSignatureRequestHandler,
chunkCertificateGossipHandler,
p2ptest.NewClientWithPeers(
t,
context.Background(),
ids.EmptyNodeID,
getChunkHandler,
getChunkPeers,
),
p2ptest.NewClientWithPeers(
t,
context.Background(),
ids.EmptyNodeID,
chunkSignatureRequestHandler,
chunkSignaturePeers,
),
p2ptest.NewClientWithPeers(
t,
context.Background(),
ids.EmptyNodeID,
chunkCertificateGossipHandler,
chunkCertGossipPeers,
),
validators,
genesisBlk,
1,
1,
indexer,
valWind,
)
r.NoError(err)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use newTestNode here? I think it would simplify a lot of the setup

@@ -38,6 +40,29 @@ var (
chainID = ids.Empty
)

const testingDefaultValidityWindowDuration = time.Duration(5)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we move this to a const block at the top of the file?

@@ -38,6 +40,29 @@ var (
chainID = ids.Empty
)

const testingDefaultValidityWindowDuration = time.Duration(5)

type testingChainIndex struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we call this something like testValidityWindow + an interface check at the top of this file?

}

type ChainIndex[Container emap.Item] interface {
GetExecutionBlock(ctx context.Context, blkID ids.ID) (ExecutionBlock[Container], error)
GetExecutionBlock(ctx context.Context, blkID ids.ID) (ExecutionBlock[Container], bool, error)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My guess is that this is to return true/false, and use the error if something goes wrong instead of just returning the block type + an error, and you have to introspect the error using a check against database.ErrNotFound.

)
for {
parent, err = s.chainIndex.GetExecutionBlock(ctx, parent.Parent())
if err != nil {
parent, hasBlock, err = s.chainIndex.GetExecutionBlock(ctx, parent.Parent())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: ok is usually idiomatic in go for existence checks

return e.innerBlock.ParentID
}

func (e ExecutionBlock) Txs() []*emapChunkCertificate {
Copy link
Collaborator

@aaronbuchwald aaronbuchwald Dec 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we rename this to a neutral name instead of txs like containers?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants