server: fix the issue that panic when collecting hot-cache metrics #1091

Merged
merged 3 commits into from
May 25, 2018

Contributor

@nolouch nolouch commented May 25, 2018

The error looks like this:

fatal error: concurrent map iteration and map write

goroutine 2133516 [running]:
runtime.throw(0xfa9913, 0x26)
	/usr/local/go/src/runtime/panic.go:616 +0x81 fp=0xc424d699f8 sp=0xc424d699d8 pc=0x42acf1
runtime.mapiternext(0xc424d69aa8)
	/usr/local/go/src/runtime/hashmap.go:747 +0x55c fp=0xc424d69a88 sp=0xc424d699f8 pc=0x408b3c
github.com/pingcap/pd/server/core.(*StoresInfo).TotalWrittenBytes(0xc4276262b0, 0xc4294793c0)
	/home/jenkins/workspace/build_pd_2.0/go/src/github.com/pingcap/pd/server/core/store.go:430 +0x8f fp=0xc424d69b18 sp=0xc424d69a88 pc=0x885e5f
github.com/pingcap/pd/server/schedule.calculateWriteHotThreshold(0xc4276262b0, 0x0)
	/home/jenkins/workspace/build_pd_2.0/go/src/github.com/pingcap/pd/server/schedule/hot_cache.go:104 +0x2f fp=0xc424d69b38 sp=0xc424d69b18 pc=0x88fe9f
github.com/pingcap/pd/server/schedule.(*HotSpotCache).CollectMetrics(0xc423f17ac0, 0xc4276262b0)
	/home/jenkins/workspace/build_pd_2.0/go/src/github.com/pingcap/pd/server/schedule/hot_cache.go:210 +0x185 fp=0xc424d69c10 sp=0xc424d69b38 pc=0x890795
github.com/pingcap/pd/server.(*coordinator).collectHotSpotMetrics(0xc42437bc20)
	/home/jenkins/workspace/build_pd_2.0/go/src/github.com/pingcap/pd/server/coordinator.go:369 +0xc11 fp=0xc424d69e80 sp=0xc424d69c10 pc=0xc82f11
github.com/pingcap/pd/server.(*RaftCluster).collectMetrics(0xc42021f030)
	/home/jenkins/workspace/build_pd_2.0/go/src/github.com/pingcap/pd/server/cluster.go:484 +0x121 fp=0xc424d69f08 sp=0xc424d69e80 pc=0xc7bae1
github.com/pingcap/pd/server.(*RaftCluster).runBackgroundJobs(0xc42021f030, 0xdf8475800)
	/home/jenkins/workspace/build_pd_2.0/go/src/github.com/pingcap/pd/server/cluster.go:519 +0xf7 fp=0xc424d69fd0 sp=0xc424d69f08 pc=0xc7beb7
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:2361 +0x1 fp=0xc424d69fd8 sp=0xc424d69fd0 pc=0x459cb1
created by github.com/pingcap/pd/server.(*RaftCluster).start
	/home/jenkins/workspace/build_pd_2.0/go/src/github.com/pingcap/pd/server/cluster.go:113 +0x528

@nolouch nolouch requested review from huachaohuang and disksing May 25, 2018 08:54
@huachaohuang
Contributor

LGTM

@@ -236,6 +237,29 @@ func dispatchHeartbeat(c *C, co *coordinator, region *core.RegionInfo, stream *m
	co.dispatch(region)
}

func (s *testCoordinatorSuite) TestCollectMetrics(c *C) {
Contributor

Will the test take too much time?

Contributor Author

No, it finishes quickly.

@nolouch nolouch force-pushed the fix-concurrent-map branch from 2c7feb0 to 1af9ab5 Compare May 25, 2018 09:21
@nolouch nolouch added the needs-cherry-pick-release-2.0 The PR needs to cherry pick to release-2.0 branch. label May 25, 2018
@disksing
Contributor

LGTM. @nolouch Please file an issue and check whether we have similar problems elsewhere, especially in accesses to the inner fields of clusterInfo.

@nolouch nolouch merged commit a5c2be9 into tikv:master May 25, 2018
@nolouch nolouch deleted the fix-concurrent-map branch May 25, 2018 10:12
@disksing disksing added the type/bug The issue is confirmed as a bug. label May 25, 2018
nolouch added a commit to nolouch/pd that referenced this pull request May 30, 2018
…ikv#1091)

* server: fix the issue that panic when collecting hot-cache metrics
siddontang pushed a commit that referenced this pull request May 31, 2018
* server: fix the issue that panic when collecting hot-cache metrics (#1091)

* server: fix the issue that panic when collecting hot-cache metrics

* server, schedule: check region epoch before adding operators. (#1095)

* server, schedule: check region epoch before adding operators.

* add test.
ti-chi-bot pushed a commit that referenced this pull request Feb 15, 2023
…1.11.1 in /tools/pd-tso-bench (#5990)

ref #897, ref #962, ref #969, ref #974, ref #975, ref #976, ref #986, ref prometheus/client_golang#987, ref #987, ref #989, ref #998, ref #1013, ref #1014, ref #1025, ref #1028, ref #1031, ref #1043, ref #1055, ref #1075, ref #1091, ref #1094, ref #1102, ref #1103, ref #1118, ref #1146, ref #1148, ref #1150

Signed-off-by: dependabot[bot] <[email protected]>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
ti-chi-bot pushed a commit that referenced this pull request Feb 15, 2023
…1.11.1 in /tests/client (#5992)

ref #897, ref #962, ref #969, ref #974, ref #975, ref #976, ref #986, ref #987, ref prometheus/client_golang#987, ref #989, ref #998, ref #1013, ref #1014, ref #1025, ref #1028, ref #1031, ref #1043, ref #1055, ref #1075, ref #1091, ref #1094, ref #1102, ref #1103, ref #1118, ref #1146, ref #1148, ref #1150, ref #4399

Signed-off-by: dependabot[bot] <[email protected]>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
ti-chi-bot added a commit that referenced this pull request Feb 15, 2023
…1.11.1 in /tests/mcs (#5993)

ref #897, ref #962, ref #969, ref #974, ref #975, ref #976, ref #986, ref #987, ref prometheus/client_golang#987, ref #989, ref #998, ref #1013, ref #1014, ref #1025, ref #1028, ref #1031, ref #1043, ref #1055, ref #1075, ref #1091, ref #1094, ref #1102, ref #1103, ref #1118, ref #1146, ref #1148, ref #1150, ref #4399

Signed-off-by: dependabot[bot] <[email protected]>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Ti Chi Robot <[email protected]>
ti-chi-bot added a commit that referenced this pull request Feb 15, 2023
…1.11.1 in /client (#5991)

ref #897, ref #962, ref #969, ref #974, ref #975, ref #976, ref #986, ref #987, ref prometheus/client_golang#987, ref #989, ref #998, ref #1013, ref #1014, ref #1025, ref #1028, ref #1031, ref #1043, ref #1055, ref #1075, ref #1091, ref #1094, ref #1102, ref #1103, ref #1118, ref #1146, ref #1148, ref #1150, ref #4399

Signed-off-by: dependabot[bot] <[email protected]>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Ti Chi Robot <[email protected]>
Labels
needs-cherry-pick-release-2.0 The PR needs to cherry pick to release-2.0 branch. type/bug The issue is confirmed as a bug.
3 participants