Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AcceptListGrpcQuerier is not concurrent-safe #2051

Open
dzmitryhil opened this issue Dec 5, 2024 · 2 comments
Open

AcceptListGrpcQuerier is not concurrent-safe #2051

dzmitryhil opened this issue Dec 5, 2024 · 2 comments

Comments

@dzmitryhil
Copy link

AcceptListGrpcQuerier is not concurrent-safe

Issue description

The AcceptListGrpcQuerier is not concurrent-safe. This leads to the state when one query changes the result of another. Or even worse, when a query changes the state of the transaction that uses the query (leads to consensus failure).

The AcceptListGrpcQuerier accepts the map of the [request.Path]Response that is a singleton for the app. Hence there is a chance, when you query a smart contract (which uses the GRPC query to query the chain) to get the result of the query which is processing in parallel.

Example with step-by-step comments:

// 2 goroutines read bank balance of acc1 and acc2 concurrently, so they both get BalanceResponse{}
protoResponse, accepted := acceptList[request.Path]
if !accepted {
return nil, wasmvmtypes.UnsupportedRequest{Kind: fmt.Sprintf("'%s' path is not allowed from the contract", request.Path)}
}

// the rote is also the same for both goroutines
route := queryRouter.Route(request.Path)
if route == nil {
return nil, wasmvmtypes.UnsupportedRequest{Kind: fmt.Sprintf("No route to query '%s'", request.Path)}
}

// the requests are different, so the res is different for each goroutine
res, err := route(ctx, &abci.RequestQuery{
Data: request.Data,
Path: request.Path,
})
if err != nil {
return nil, err
}

// but here, becasue of the race condition, one goroutine can get the result of another goroutine
return ConvertProtoToJSONMarshal(codec, protoResponse, res.Value)

Test that proves the statement:

package keeper_test

import (
  "fmt"
  "sync/atomic"
  "testing"

  abci "github.com/cometbft/cometbft/abci/types"
  "github.com/cosmos/cosmos-sdk/baseapp"
  "github.com/cosmos/cosmos-sdk/codec"
  sdk "github.com/cosmos/cosmos-sdk/types"
  banktypes "github.com/cosmos/cosmos-sdk/x/bank/types"
  "github.com/stretchr/testify/require"
  "golang.org/x/sync/errgroup"

  "github.com/CosmWasm/wasmd/app"
  "github.com/CosmWasm/wasmd/x/wasm/keeper"
  wasmvmtypes "github.com/CosmWasm/wasmvm/v2/types"
)

const (
  denom1 = "denom1"
  denom2 = "denom2"
)

var _ keeper.GRPCQueryRouter = mockedQueryRouter{}

type mockedQueryRouter struct {
  codec codec.Codec
}

func (m mockedQueryRouter) Route(_ string) baseapp.GRPCQueryHandler {
  return func(ctx sdk.Context, req *abci.RequestQuery) (*abci.ResponseQuery, error) {
    balanceReq := &banktypes.QueryBalanceRequest{}
    if err := m.codec.Unmarshal(req.Data, balanceReq); err != nil {
      return nil, err
    }

    coin := sdk.NewInt64Coin(balanceReq.Denom, 1)
    balanceRes := &banktypes.QueryBalanceResponse{
      Balance: &coin,
    }

    resValue, err := m.codec.Marshal(balanceRes)
    if err != nil {
      return nil, err
    }

    return &abci.ResponseQuery{
      Value: resValue,
    }, nil
  }
}

func TestGRPCQuerier(t *testing.T) {
  acceptedQueries := keeper.AcceptedQueries{
    "/bank.Balance": &banktypes.QueryBalanceResponse{},
  }
  cdc := app.MakeEncodingConfig(t).Codec
  router := mockedQueryRouter{
    codec: cdc,
  }
  querier := keeper.AcceptListStargateQuerier(acceptedQueries, router, cdc)

  eg := errgroup.Group{}
  //eg.SetLimit(1) // uncomment to check that the test passes with a single goroutine
  errorsCount := atomic.Uint64{}
  for range 50 {
    for _, denom := range []string{denom1, denom2} {
      denom := denom // copy
      eg.Go(func() error {
        queryReq := &banktypes.QueryBalanceRequest{
          Address: keeper.RandomBech32AccountAddress(t),
          Denom:   denom,
        }
        grpcData, err := cdc.Marshal(queryReq)
        if err != nil {
          return err
        }

        wasmGrpcReq := &wasmvmtypes.StargateQuery{
          Data: grpcData,
          Path: "/bank.Balance",
        }

        wasmGrpcRes, err := querier(sdk.Context{}, wasmGrpcReq)
        if err != nil {
          return err
        }

        queryRes := &banktypes.QueryBalanceResponse{}
        if err := cdc.UnmarshalJSON(wasmGrpcRes, queryRes); err != nil {
          return err
        }

        expectedCoin := sdk.NewInt64Coin(denom, 1)
        if queryRes.Balance == nil {
          fmt.Printf(
            "Error: expected %s, got nil\n",
            expectedCoin.String(),
          )
          errorsCount.Add(1)
          return nil
        }
        if queryRes.Balance.String() != expectedCoin.String() {
          fmt.Printf(
            "Error: expected %s, got %s\n",
            expectedCoin.String(),
            queryRes.Balance.String(),
          )
          errorsCount.Add(1)
          return nil
        }

        return nil
      })
    }
  }

  require.NoError(t, eg.Wait())
  require.Zero(t, errorsCount.Load())
}

The output is:

=== RUN   TestGRPCQuerier
Error: expected 1denom2, got nil
Error: expected 1denom1, got 1denom2
Error: expected 1denom2, got nil
Error: expected 1denom1, got nil
Error: expected 1denom1, got 1denom2
Error: expected 1denom2, got 1denom1
Error: expected 1denom2, got nil
Error: expected 1denom1, got nil
Error: expected 1denom1, got 1denom2
Error: expected 1denom1, got nil
Error: expected 1denom1, got nil
Error: expected 1denom1, got nil
Error: expected 1denom2, got nil
Error: expected 1denom2, got 1denom1
Error: expected 1denom1, got 1denom2
Error: expected 1denom2, got nil
Error: expected 1denom1, got nil
Error: expected 1denom1, got nil
Error: expected 1denom2, got nil
Error: expected 1denom2, got nil
Error: expected 1denom1, got 1denom2
Error: expected 1denom1, got nil
Error: expected 1denom2, got 1denom1
Error: expected 1denom2, got nil
Error: expected 1denom2, got 1denom1
Error: expected 1denom2, got nil
Error: expected 1denom1, got 1denom2
Error: expected 1denom1, got nil
Error: expected 1denom2, got 1denom1
Error: expected 1denom1, got 1denom2
Error: expected 1denom2, got nil
Error: expected 1denom1, got nil
Error: expected 1denom1, got 1denom2
Error: expected 1denom1, got nil
Error: expected 1denom2, got 1denom1
Error: expected 1denom1, got 1denom2
Error: expected 1denom1, got nil
Error: expected 1denom1, got 1denom2
Error: expected 1denom1, got 1denom2
    query_plugins_test.go:121:
        	Error Trace:	/go/src/github.com/CosmWasm/wasmd/x/wasm/keeper/query_plugins_test.go:121
        	Error:      	Should be zero, but was 39
        	Test:       	TestGRPCQuerier

Solutions

  • The best solution for the issue is to change the signature of the AcceptedQueries, and instead of the proto.Message, use the func()proto.Message to use a unique proto response per-goroutine/request.
AcceptedQueries map[string]func()proto.Message
  • Or, add a synchronization (locks), which is not the best solution, because it can be a bottleneck for the app.

  • Or, add a deep copy of the proto object before using it, which is not the best solution, because it might not work with all possible implementations of the proto.Message interface.

@chipshort
Copy link
Collaborator

Great finding! So, basically the problem is that we are using the same Response object per path, right?
I agree with your three possible solutions.

The best solution for the issue is to change the signature of the AcceptedQueries, and instead of the proto.Message, use the func()proto.Message to use a unique proto response per-goroutine/request.

That requires an (addmitedly small) breaking change. Not the end of the world, but nice if we can avoid it.

Or, add a deep copy of the proto object before using it, which is not the best solution, because it might not work with all possible implementations of the proto.Message interface.

So, basically calling proto.Clone on the message in the map. Do you have an example where that would not work?
Because I think this would be the ideal solution otherwise.

@dzmitryhil
Copy link
Author

It depends on the way you define the AcceptedQueries.

If you define it as in example:

acceptedQueries := keeper.AcceptedQueries{
  "/bank.Balance": &banktypes.QueryBalanceResponse{},
}

The proto.Clone works.

But if you parse the proto-files to find a tag "cosmos.query.v1.module_query_safe" and build the results dynamically using "google.golang.org/protobuf/types/dynamicpb", for example. The usage of the proto.Clone leads to panic. Because of the parallel access on the map used in the dynamicpb,

But it might be too specific use case. For such the developers can use their own implementation of the querier.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants