
proto: optimize global (un)marshal lock using RWMutex #1004

Closed
wants to merge 2 commits

Conversation

@TennyZhuang commented Dec 25, 2019

Signed-off-by: TennyZhuang <[email protected]>

This PR uses an RWMutex to optimize getMarshalInfo and getUnmarshalInfo. For these functions, only n calls (one per message type) hit the write path, while the remaining m - n calls (where m is the number of messages) hit the read path, which is the ideal case for an RWMutex instead of a Mutex.
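
For illustration, here is a minimal sketch of that read-mostly locking pattern, including the write-lock re-check discussed later in the thread. The names (typeInfo, infoMu, infoMap, computeInfo) are hypothetical placeholders, not the actual package internals:

package rwcache

import (
    "reflect"
    "sync"
)

// typeInfo is a hypothetical placeholder for the per-type marshal metadata.
type typeInfo struct{}

var (
    infoMu  sync.RWMutex
    infoMap = map[reflect.Type]*typeInfo{}
)

// getInfo returns the cached info for t, computing it at most once per type.
// The common case (a type that has already been seen) takes only the read
// lock; the write lock is taken roughly once per distinct message type.
func getInfo(t reflect.Type) *typeInfo {
    infoMu.RLock()
    info, ok := infoMap[t]
    infoMu.RUnlock()
    if ok {
        return info
    }

    infoMu.Lock()
    defer infoMu.Unlock()
    // Re-check under the write lock: another goroutine may have inserted
    // the entry between RUnlock and Lock.
    if info, ok := infoMap[t]; ok {
        return info
    }
    info = computeInfo(t) // expensive one-time initialization
    infoMap[t] = info
    return info
}

// computeInfo stands in for the expensive per-type computation.
func computeInfo(t reflect.Type) *typeInfo {
    return &typeInfo{}
}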

This optimization introduces a huge improvement in our scenario.

We have 1000 workers and 1 controller, and the workers and controller keep heartbeats over gRPC. They also exchange job info with each other.

The messages look like this:

message Job {
    uint64 job_id = 1;
    // Some other info
}

message Heartbeat {
    repeated Job jobs = 1;
}

There are about 10000 jobs in every Heartbeat, and the heartbeat QPS at the controller is about 1000.

The controller handles a Heartbeat in about 10ms and the network latency is about 10ms, but the client can take up to about 30s to finish an RPC call.

We used the Go pprof block profile, and it seems that almost all blocking is caused by a single global Mutex in the protobuf package.
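
For reference, one way to collect such a block profile (a minimal sketch, not necessarily the exact setup we used) is to set a block profile rate and expose the net/http/pprof endpoints:

package main

import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers the /debug/pprof/* handlers
    "runtime"
)

func main() {
    // Record every blocking event (mutex waits, channel waits) so lock
    // contention shows up in the block profile. Rate 1 has measurable
    // overhead; use a larger sampling rate in production.
    runtime.SetBlockProfileRate(1)

    // The profile can then be inspected with:
    //   go tool pprof http://localhost:6060/debug/pprof/block
    log.Fatal(http.ListenAndServe("localhost:6060", nil))
}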

[pprof block profile screenshot]

After the optimization, in our use case, the RPC call from the client takes only about 30ms, as expected.

@TennyZhuang
Author

@dsnet PTAL

@puellanivis (Collaborator) left a comment

The most computationally taxing thing here is acquiring the write lock. Rechecking your critical conditions adds insignificant overhead, and it prevents the overhead of redundant reinitialization.

Eliding the recheck of the critical conditions not only goes against good locking hygiene, it is at best a micro-optimization, which I guarantee is not saving more than a few nanoseconds over an entire run of the program. The code under the write lock is not hot-loop code that needs to be micro-optimized.

proto/table_marshal.go (review thread, outdated)
proto/table_unmarshal.go (review thread, outdated)
Signed-off-by: TennyZhuang <[email protected]>
@TennyZhuang
Author

OK, I've added a double check.

@puellanivis

@TennyZhuang
Author

A simple reproduction can be found at https://github.com/TennyZhuang/protobuf-lock-reproduce

The PR also resolves #888.

@puellanivis (Collaborator) left a comment

Looks good now.

@TennyZhuang
Author

Any more reviewers?

@dsnet
Member

dsnet commented Jan 8, 2020

getMarshalInfo is expected to be costly, but is only computed once per type in the program. After a program has been running for some time, it is no longer called. This PR is suggesting a solution in a situation where the root problem itself is not well understood.

@TennyZhuang
Author

TennyZhuang commented Jan 8, 2020

@dsnet In our scenario, I added a counter log at the beginning of getMarshalInfo and saw about 1000000 calls in 1 second. Is there some other bug causing the function to be called so many times?
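
(The counter was a temporary local patch; a hypothetical sketch of that kind of instrumentation, with made-up names, is below. A similar atomic counter at the top of getMarshalInfo gives the per-second call count mentioned above.)

package main

import (
    "fmt"
    "sync/atomic"
    "time"
)

// calls counts invocations of the instrumented function.
var calls uint64

func instrumentedGetMarshalInfo() {
    atomic.AddUint64(&calls, 1)
    // ... original getMarshalInfo body ...
}

func main() {
    // Simulate a caller hammering the function, then report the count.
    go func() {
        for {
            instrumentedGetMarshalInfo()
        }
    }()
    time.Sleep(time.Second)
    fmt.Println("calls in 1s:", atomic.LoadUint64(&calls))
}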

@dsnet
Member

dsnet commented Jan 8, 2020

Is there some other bug causing the function to be called so many times?

Possibly? That's the question that is more interesting to figure out. This code here:

u := atomicLoadMarshalInfo(&a.marshal)
if u == nil {
    // Get marshal information from type of message.
    t := reflect.ValueOf(msg).Type()
    if t.Kind() != reflect.Ptr {
        panic(fmt.Sprintf("cannot handle non-pointer message type %v", t))
    }
    u = getMarshalInfo(t.Elem())
    // Store it in the cache for later users.
    // a.marshal = u, but atomically.
    atomicStoreMarshalInfo(&a.marshal, u)
}
return u

atomically caches the computed marshalInfo, so it shouldn't happen again and again.

@dsnet
Member

dsnet commented Jan 8, 2020

Have you used the most recent protoc-gen-go to generate your .pb.go files? The pprof graph you show shouldn't happen with newly generated .pb.go files.

@TennyZhuang
Author

I reproduced it in https://github.com/TennyZhuang/protobuf-lock-reproduce (very high latency), using protoc-gen-go 1.3.2.

I will try to look into it later.

@TennyZhuang
Author

TennyZhuang commented Jan 8, 2020

Sorry, the reproduction demo is not correct; I will try to create a correct reproduction case.

@TennyZhuang
Author

Sorry, this is caused by a gogo/protobuf bug: gogo/protobuf#656

@dsnet
Member

dsnet commented Jan 12, 2020

Got it. I'm going to close this then. In v2, we use a sync.Map as the cache for the equivalent code, which scales much better as it reaches steady state.
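
For comparison, here is a rough sketch of a sync.Map-based type cache in the spirit of what is described above; the names are illustrative, not the actual v2 internals:

package cache

import (
    "reflect"
    "sync"
)

// messageInfo is a hypothetical stand-in for the per-type metadata.
type messageInfo struct{}

// infos caches one messageInfo per message type. Once the cache reaches
// steady state, lookups for already-seen types do not contend on a lock.
var infos sync.Map // reflect.Type -> *messageInfo

func infoFor(t reflect.Type) *messageInfo {
    if v, ok := infos.Load(t); ok {
        return v.(*messageInfo)
    }
    // LoadOrStore keeps the first value stored for t, so concurrent
    // callers converge on a single cached entry.
    v, _ := infos.LoadOrStore(t, newMessageInfo(t))
    return v.(*messageInfo)
}

// newMessageInfo stands in for the expensive per-type computation.
func newMessageInfo(t reflect.Type) *messageInfo {
    return &messageInfo{}
}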

@dsnet closed this Jan 12, 2020
@golang locked and limited the conversation to collaborators Jul 23, 2020