Attempt at adding hostport info to logs #6152

Merged
merged 3 commits into from
Jul 17, 2024

agautam478
Contributor

What changed?
Our internal ring use / task processing doesn't report the intended destination, so zone-specific issues can be hard to identify and hard to narrow down to more specific causes. This is an attempt to make the logs more meaningful. An earlier attempt was made, but it is not optimal because it:

  • results in a (tiny) performance degradation
  • spams the logs, and the resulting entries cannot be aggregated

This PR tries putting the log at this location instead, since the retrier here has access to the peer and is used for nearly all similar requests.

Why?
As a part of the incident resolution.

How did you test it?

Potential risks

Release notes

Documentation Changes

@@ -158,6 +158,7 @@ func (c *clientImpl) DescribeHistoryHost(
		peer, err = c.peerResolver.FromHostAddress(request.GetHostAddress())
	}
	if err != nil {
		c.logger.Error("peer could not be resolved for host.", tag.Error(err), tag.Address(request.GetHostAddress()))
Member

let's also include shardid or workflowid as a tag (whichever is used above)

Contributor Author

on it.

@@ -158,6 +158,7 @@ func (c *clientImpl) DescribeHistoryHost(
		peer, err = c.peerResolver.FromHostAddress(request.GetHostAddress())
	}
	if err != nil {
		c.logger.Error("peer could not be resolved for host.", tag.Error(err), tag.ShardID(int(request.GetShardIDForHost())), tag.WorkflowID(request.ExecutionForHost.GetWorkflowID()), tag.Address(request.GetHostAddress()))
Member

request.ExecutionForHost.GetWorkflowID() can panic.

may be worth putting this in each branch tbh.
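
A minimal sketch of the "log in each branch" suggestion, assuming hypothetical stand-in types (`describeRequest`, `execution`, `resolveFunc`) rather than the real Cadence request, resolver, and tag packages. The point it illustrates: each branch logs only the identifier it has already checked and read, so a failure log never reaches through a field that may be nil.

```go
package main

import (
	"errors"
	"fmt"
)

// execution and describeRequest are hypothetical stand-ins for the request
// types in the diff; they are not the Cadence definitions.
type execution struct{ WorkflowID string }

type describeRequest struct {
	ShardID          *int32
	ExecutionForHost *execution
	HostAddress      string
}

// resolveFunc stands in for a peer-resolver lookup (assumed signature).
type resolveFunc func(key string) (string, error)

// errEmptyRing is an illustrative failure used by the demo below.
var errEmptyRing = errors.New("ring is empty")

// resolvePeer picks the lookup key the same way the surrounding branches do
// and, on failure, logs only the identifier that branch actually used.
func resolvePeer(req describeRequest, resolve resolveFunc, logf func(format string, args ...any)) (string, error) {
	switch {
	case req.ShardID != nil:
		peer, err := resolve(fmt.Sprintf("shard-%d", *req.ShardID))
		if err != nil {
			logf("peer could not be resolved: shard-id=%d err=%v", *req.ShardID, err)
		}
		return peer, err
	case req.ExecutionForHost != nil:
		// Safe: this branch only runs when ExecutionForHost is non-nil.
		id := req.ExecutionForHost.WorkflowID
		peer, err := resolve(id)
		if err != nil {
			logf("peer could not be resolved: workflow-id=%s err=%v", id, err)
		}
		return peer, err
	default:
		peer, err := resolve(req.HostAddress)
		if err != nil {
			logf("peer could not be resolved: address=%s err=%v", req.HostAddress, err)
		}
		return peer, err
	}
}

func main() {
	failing := func(string) (string, error) { return "", errEmptyRing }
	printf := func(format string, args ...any) { fmt.Printf(format+"\n", args...) }

	// ExecutionForHost is nil here; a single shared log line that read
	// req.ExecutionForHost.WorkflowID would panic on this request.
	req := describeRequest{HostAddress: "host-a:1234"}
	_, err := resolvePeer(req, failing, printf)
	fmt.Println("returned error:", err)
}
```

The trade-off is three near-identical log calls instead of one, in exchange for never touching a field outside the branch that validated it.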

Contributor Author

on it. tbh I am not sure if there is any other alternative here apart from putting these logs.

Member

@Groxx Groxx left a comment

a thing definitely worth fixing, but "put a log in here" seems reasonable. we don't have much of this info elsewhere.
eventually we can wrap these errors and take care of it in a centralized way, but we're not there yet.

if we're trying to log "host X has errored" and not "could not decide a host" (as I think that might be more accurate for the issue driving this PR), personally I was planning to add it to the op-retryer. it has access to the peer in all cases (except the ratelimiter API, but it doesn't retry by design)
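
One way to picture the op-retryer idea, as a sketch only: `peerError` and `retryWithPeer` are hypothetical names, not Cadence's retry helpers. It assumes the retry wrapper is handed the peer, so it can attach the peer to the final error once, and a centralized handler can later recover it with `errors.As` instead of every call site logging it.

```go
package main

import (
	"errors"
	"fmt"
)

// peerError is a hypothetical wrapper recording which peer a call targeted.
type peerError struct {
	Peer string
	Err  error
}

func (e *peerError) Error() string { return fmt.Sprintf("peer %s: %v", e.Peer, e.Err) }

// Unwrap keeps the underlying error visible to errors.Is and errors.As.
func (e *peerError) Unwrap() error { return e.Err }

// errUnavailable is an illustrative failure used by the demo below.
var errUnavailable = errors.New("host unavailable")

// retryWithPeer calls op at least once and at most `attempts` times. If every
// attempt fails, the last error is wrapped with the peer, so the destination
// is reported once per operation rather than once per attempt.
func retryWithPeer(peer string, attempts int, op func(peer string) error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 0; i < attempts; i++ {
		if err = op(peer); err == nil {
			return nil
		}
	}
	return &peerError{Peer: peer, Err: err}
}

func main() {
	calls := 0
	err := retryWithPeer("host-a:1234", 3, func(string) error {
		calls++
		return errUnavailable
	})

	// A centralized handler can pull the peer back out of the error chain.
	var pe *peerError
	if errors.As(err, &pe) {
		fmt.Printf("host %s has errored after %d attempts: %v\n", pe.Peer, calls, pe.Err)
	}
}
```

This reports "host X has errored"; the "could not decide a host" case happens before any peer exists, so it would still need its own log at the resolution site.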

@agautam478 agautam478 requested a review from Groxx July 8, 2024 16:34
Member

@Groxx Groxx left a comment

Sorry about the delay!

@agautam478 agautam478 enabled auto-merge (squash) July 17, 2024 16:37

codecov bot commented Jul 17, 2024

Codecov Report

Attention: Patch coverage is 0% with 9 lines in your changes missing coverage. Please review.

Project coverage is 72.67%. Comparing base (ca824a0) to head (30106b5).
Report is 13 commits behind head on master.

Additional details and impacted files
Files Coverage Δ
client/history/client.go 71.72% <0.00%> (-0.87%) ⬇️

... and 15 files with indirect coverage changes


Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@agautam478 agautam478 merged commit 568fcc2 into cadence-workflow:master Jul 17, 2024
19 of 21 checks passed
@coveralls

Pull Request Test Coverage Report for Build 01909332-6a65-4ffa-bb92-200b7f6577c6

Details

  • 3 of 12 (25.0%) changed or added relevant lines in 1 file are covered.
  • 55 unchanged lines in 11 files lost coverage.
  • Overall coverage decreased (-0.03%) to 71.491%

Changes Missing Coverage    Covered Lines    Changed/Added Lines    %
client/history/client.go    3                12                     25.0%

Files with Coverage Reduction                           New Missed Lines   %
common/task/weighted_round_robin_task_scheduler.go      2                  89.05%
service/matching/tasklist/db.go                         2                  73.23%
common/membership/hashring.go                           2                  84.69%
service/matching/tasklist/matcher.go                    2                  90.91%
service/history/task/transfer_standby_task_executor.go  3                  86.23%
service/history/task/task.go                            3                  84.81%
common/log/tag/tags.go                                  3                  49.64%
common/persistence/nosql/nosql_task_store.go            3                  85.52%
service/history/execution/cache.go                      6                  74.61%
service/matching/tasklist/task_reader.go                9                  75.86%

Totals Coverage Status
Change from base Build 01907465-3a13-43f8-89cc-21fc60d6d529: -0.03%
Covered Lines: 105277
Relevant Lines: 147260
