Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proto: Unmarshal takes more than 100 seconds at 22k QPS #888

Closed
Anteoy opened this issue Jul 4, 2019 · 17 comments
Closed

proto: Unmarshal takes more than 100 seconds at 22k QPS #888

Anteoy opened this issue Jul 4, 2019 · 17 comments

Comments

@Anteoy
Copy link

Anteoy commented Jul 4, 2019

Is this normal performance?
mass goroutine are blocked at table_unmarshal.go:117
unmarshalInfoLock.Lock().

207251 @ 0x43105f 0x441619 0x4415ef 0x44138d 0x474939 0xc9025a 0xc901de 0xc6ed8c 0xc6e8f1 0xce167c 0x45e8f1
#	0x44138c	sync.runtime_SemacquireMutex+0x3c							/home/zhoudazhuang/usr/local/go1.12.1/go/src/runtime/sema.go:71
#	0x474938	sync.(*Mutex).Lock+0x108								/home/zhoudazhuang/usr/local/go1.12.1/go/src/sync/mutex.go:134
#	0xc90259	acs/vendor/github.com/golang/protobuf/proto.getUnmarshalInfo+0x39			/home/zhoudazhuang/db11/jm/pro/src/acs/vendor/github.com/golang/protobuf/proto/table_unmarshal.go:117
#	0xc901dd	acs/vendor/github.com/golang/protobuf/proto.(*InternalMessageInfo).Unmarshal+0xdd	/home/zhoudazhuang/db11/jm/pro/src/acs/vendor/github.com/golang/protobuf/proto/table_unmarshal.go:59
#	0xc6ed8b	acs/vendor/github.com/golang/protobuf/proto.(*Buffer).Unmarshal+0x1eb			/home/zhoudazhuang/db11/jm/pro/src/acs/vendor/github.com/golang/protobuf/proto/decode.go:425
#	0xc6e8f0	acs/vendor/github.com/golang/protobuf/proto.Unmarshal+0x170				/home/zhoudazhuang/db11/jm/pro/src/acs/vendor/github.com/golang/protobuf/proto/decode.go:343
#	0xce167b	acs/comet/client.(*Client).HandleRequest+0xecb						/home/zhoudazhuang/db11/jm/pro/src/acs/comet/client/client.go:626

I use pb like that:

s := time.Now()
		err := pb.Unmarshal(req.Data, registerInfo)
		e := time.Now().Sub(s).Seconds()
		if e > 1 {
			fmt.Printf("pb unmarshal cost1: %+v\n",e)
}

I got :

pb unmarshal cost1: 124.445243586
pb unmarshal cost1: 124.463055651
pb unmarshal cost1: 128.71240536
pb unmarshal cost1: 128.712357591
pb unmarshal cost1: 128.712357605
@Anteoy
Copy link
Author

Anteoy commented Jul 4, 2019

$ go version
go version go1.12.1 linux/amd64

$ go env
GOARCH="amd64"
GOBIN="/home/zhoudazhuang/gobin/"
GOCACHE="/home/zhoudazhuang/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/zhoudazhuang/db11/jm/pro"
GOPROXY=""
GORACE=""
GOROOT="/home/zhoudazhuang/usr/local/go1.12.1/go"
GOTMPDIR=""
GOTOOLDIR="/home/zhoudazhuang/usr/local/go1.12.1/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build631445118=/tmp/go-build -gno-record-gcc-switches"

@Anteoy
Copy link
Author

Anteoy commented Jul 4, 2019

My local pb version is obtained with go get, how do I get its version?

@Anteoy
Copy link
Author

Anteoy commented Jul 4, 2019

func getUnmarshalInfo(t reflect.Type) *unmarshalInfo {
	// It would be correct to return a new unmarshalInfo
	// unconditionally. We would end up allocating one
	// per occurrence of that type as a message or submessage.
	// We use a cache here just to reduce memory usage.
	unmarshalInfoLock.Lock()
	defer unmarshalInfoLock.Unlock()
	u := unmarshalInfoMap[t]
	if u == nil {
		u = &unmarshalInfo{typ: t}
		// Note: we just set the type here. The rest of the fields
		// will be initialized on first use.
		unmarshalInfoMap[t] = u
	}
	return u
}

Any kind of pb.Unmarshal will call this method? I also call pb.Unmarshal elsewhere, and it is blocked here too.Equally time consuming

@Anteoy
Copy link
Author

Anteoy commented Jul 4, 2019

top - 17:12:03 up 41 days, 18:45,  7 users,  load average: 11.42, 12.44, 12.89
Tasks: 644 total,   2 running, 642 sleeping,   0 stopped,   0 zombie
Cpu(s): 18.1%us, 15.9%sy,  0.0%ni, 54.1%id,  0.0%wa,  0.0%hi, 12.0%si,  0.0%st
Mem:  99061716k total, 50930052k used, 48131664k free,    16052k buffers
Swap:        0k total,        0k used,        0k free, 32538128k cached

My server load is very low. Not the same unmarshal object

@dsnet
Copy link
Member

dsnet commented Jul 4, 2019

It is difficult to diagnose this without knowing:

  1. what version of the golang/protobuf package you are using.
  2. what the structure of the Go type being passed to Unmarshal looks like.
  3. related to 2, is whether you are passing in messages generated by golang/protobuf/protoc-gen-go?

@dsnet dsnet changed the title pb.Unmarshal takes more than 100 seconds at 22k QPS proto: Unmarshal takes more than 100 seconds at 22k QPS Jul 4, 2019
@Anteoy
Copy link
Author

Anteoy commented Jul 5, 2019

  1. golang protobuf version: v1.3.1 (I update for the lastest, And it recurrent)
  2. the structrue:
// RegisterInfo 客户端向服务发送的注册信息.
type RegisterInfo struct {
	Appver   *string `protobuf:"bytes,1,opt,name=Appver" json:"Appver,omitempty"`
	// app id 用于区别不同的用户
	BundleID *string `protobuf:"bytes,2,opt,name=BundleID" json:"BundleID,omitempty"`
	// 系统类型。0 Iphone, 1 Android, 2 Ipad, 3 AndroidPad
	System    *int32  `protobuf:"varint,3,opt,name=System" json:"System,omitempty"`
	OSVersion *string `protobuf:"bytes,4,opt,name=OSVersion" json:"OSVersion,omitempty"`
	Brand     *string `protobuf:"bytes,5,opt,name=Brand" json:"Brand,omitempty"`
	Model     *string `protobuf:"bytes,6,opt,name=Model" json:"Model,omitempty"`
	UID       *string `protobuf:"bytes,7,opt,name=UID" json:"UID,omitempty"`
	UUID      *string `protobuf:"bytes,8,opt,name=UUID" json:"UUID,omitempty"`
	// 上次更新的patch id
	LPID *string `protobuf:"bytes,9,opt,name=LPID" json:"LPID,omitempty"`
	// 上次更新的schema id
	LSID             *string `protobuf:"bytes,10,opt,name=LSID" json:"LSID,omitempty"`
	XXX_unrecognized []byte  `json:"-"`
}
  1. I confuse about what you say. I accept []byte throught tcp connection, can Unmarshal correctly, no error . what dose it mean of messages generated by golang/protobuf/protoc-gen-go? I generated the pb go file only once

@dsnet
Copy link
Member

dsnet commented Jul 5, 2019

What are the methods on RegisterInfo?

@Anteoy
Copy link
Author

Anteoy commented Jul 5, 2019

image
this?

@dsnet
Copy link
Member

dsnet commented Jul 5, 2019

If that's the full method set, it seems that RegisterInfo has not be regenerated in some time. Since v1.1.0, each message type has an XXX_{Marshal,Unmarshal} method that provides an optimized implementation.

@Anteoy
Copy link
Author

Anteoy commented Jul 5, 2019

Yes, that's the full method set. no XXX_{Marshal,Unmarshal}.
I take over the project which is start at few years ago.The code is old ,And we did't have mass user a few years ago.
Does it mean I should generate the proto buf code by lastest version ? And use the XXX_Marshal instead of the Marshal?

@Anteoy
Copy link
Author

Anteoy commented Jul 5, 2019

Did you have the performance bottleneck about this lib?
how much qps it can be reached on normal use?

@dsnet
Copy link
Member

dsnet commented Jul 5, 2019

Does it mean I should generate the proto buf code by lastest version ?

Yes, please.

And use the XXX_Marshal instead of the Marshal?

Nope. The proto.Unmarshal and proto.Marshal knows to automatically use the XXX_Marshal and XXX_Unmarshal methods if present. They are never intended to be called directly from user code. If fact, doing so is likely to cause issues.

@dsnet
Copy link
Member

dsnet commented Jul 5, 2019

how much qps it can be reached on normal use?

It's pretty much impossible to provide QPS metrics because the cost of a "query" is dependent on the user's workload.

Did you have the performance bottleneck about this lib?

The golang/protobuf library is intended to be as fast as reasonably can be. It is heavily used in production by many services at Google. I believe your performances issues come from messages that have not been generated in a long time. There seems to be heavy lock contention trying to grab the unmarshalInfoLock lock. That lock is only obtained when the optimized XXX_Unmarshal method is not present.

@Anteoy
Copy link
Author

Anteoy commented Jul 5, 2019

This has a certain effect,not particularly large.
image
image
thx very much.
It’s not as big cost as it used to be.
I'll go and see other optimization problems of the system.

@Anteoy
Copy link
Author

Anteoy commented Jul 5, 2019

image
... It Reappeared.
And the system load and cpu cost is low when it reappearing. Any other ideas ?

@dsnet
Copy link
Member

dsnet commented Jul 6, 2019

Any other ideas?

Unfortunately not. Without a minimal reproduction, it is very difficult for us to diagnose this.

@TennyZhuang
Copy link

@golang golang locked as resolved and limited conversation to collaborators Jun 25, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants