Do the worker HB timeout check when HB's are updated #706

d2r · 2013-10-10T21:25:02Z

Instead of asking ZooKeeper for the latest heartbeat data for all topologies and only then checking which heartbeats have timed out, this change combines the timeout check with the heartbeat update.

This makes nimbus more tolerant of a slow ZooKeeper server. When ZooKeeper is slow, updating the heartbeats can take so long that by the time nimbus examines the new heartbeats for timeouts, most of the topologies that were updated first already show all of their executors as timed out.
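To illustrate the idea, here is a minimal Python sketch (not the actual nimbus code, which is written in Clojure). It assumes a per-executor cache entry that remembers the timestamp carried by the last heartbeat and the local time at which that timestamp last changed. The names `CacheEntry`, `update_heartbeats`, `fetch_heartbeat`, and `time_secs` are hypothetical and chosen only for this example. The timeout flag is computed immediately after each heartbeat is read, so a slow fetch for later executors cannot make earlier ones look stale.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional


@dataclass
class CacheEntry:
    # Timestamp carried by the last heartbeat seen (None if none was found).
    reported_time: Optional[int]
    # Local clock reading taken when reported_time last changed.
    local_time: float
    is_timed_out: bool = False


def update_heartbeats(
    cache: Dict[str, CacheEntry],
    executors: Iterable[str],
    fetch_heartbeat: Callable[[str], Optional[dict]],
    timeout_secs: float,
    now: Callable[[], float] = time.time,
) -> None:
    """Refresh each executor's cache entry and check its timeout in one pass.

    fetch_heartbeat stands in for a (possibly slow) read from the
    coordination service; it is an assumption of this sketch.
    """
    for executor in executors:
        hb = fetch_heartbeat(executor)  # may take a long time
        current = now()                 # read the clock AFTER the fetch
        reported = hb["time_secs"] if hb else None

        entry = cache.get(executor)
        if entry is None or reported != entry.reported_time:
            # New or changed heartbeat: restart the timeout window.
            entry = CacheEntry(reported_time=reported, local_time=current)
            cache[executor] = entry

        # Timeout decision is made right here, per executor, rather than
        # in a separate pass after every heartbeat has been fetched.
        entry.is_timed_out = (current - entry.local_time) >= timeout_secs
```

With this shape, later code only needs to read `is_timed_out` from the cache; it never compares an old cache timestamp against a clock reading taken long after the update finished.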

d2r · 2013-10-10T21:25:43Z

This pull request is initially intended to start discussion.

Current unit tests pass with this change.

d2r · 2013-10-10T21:32:25Z

This could mitigate a possible root cause for #689.

brndnmtthws · 2013-10-19T18:47:24Z

👍

Good stuff.

ptgoetz · 2013-10-21T20:52:27Z

+1
(tested in distributed mode with 5-node cluster)

nathanmarz · 2013-10-21T21:01:56Z

+1

xumingming · 2013-10-22T02:36:52Z

+1

Do the worker HB timeout check when HB's are updated

edbb17c

ptgoetz added a commit that referenced this pull request Oct 25, 2013

Merge pull request #706 from d2r/d2r-nimbus-hb-check-timeout-on-update

a0bc262

Do the worker HB timeout check when HB's are updated

ptgoetz merged commit a0bc262 into nathanmarz:master Oct 25, 2013
