
Inconsistent log #6316

Closed
pihme opened this issue Feb 11, 2021 · 22 comments · Fixed by #6345
Labels
  • kind/bug: Categorizes an issue or PR as a bug
  • scope/broker: Marks an issue or PR to appear in the broker section of the changelog
  • severity/critical: Marks a stop-the-world bug, with a high impact and no existing workaround

Comments

@pihme
Contributor

pihme commented Feb 11, 2021

Describe the bug
Observed in logs: https://console.cloud.google.com/errors/CMyAu52UiKPKggE?service=zeebe&time=P7D&project=camunda-cloud-240911

Log/Stacktrace

Full Stacktrace

java.lang.IllegalStateException: Expected that position '204620965' of current event is higher then position '204621051' of last event, but was not. Inconsistent log detected!
	at io.zeebe.engine.processing.streamprocessor.ReProcessingStateMachine.scanLog(ReProcessingStateMachine.java:180) ~[zeebe-workflow-engine-0.26.0.jar:0.26.0]
	at io.zeebe.engine.processing.streamprocessor.ReProcessingStateMachine.startRecover(ReProcessingStateMachine.java:150) ~[zeebe-workflow-engine-0.26.0.jar:0.26.0]
	at io.zeebe.engine.processing.streamprocessor.StreamProcessor.onActorStarted(StreamProcessor.java:119) ~[zeebe-workflow-engine-0.26.0.jar:0.26.0]
	at io.zeebe.util.sched.ActorJob.invoke(ActorJob.java:73) [zeebe-util-0.26.0.jar:0.26.0]
	at io.zeebe.util.sched.ActorJob.execute(ActorJob.java:39) [zeebe-util-0.26.0.jar:0.26.0]
	at io.zeebe.util.sched.ActorTask.execute(ActorTask.java:122) [zeebe-util-0.26.0.jar:0.26.0]
	at io.zeebe.util.sched.ActorThread.executeCurrentTask(ActorThread.java:94) [zeebe-util-0.26.0.jar:0.26.0]
	at io.zeebe.util.sched.ActorThread.doWork(ActorThread.java:78) [zeebe-util-0.26.0.jar:0.26.0]
	at io.zeebe.util.sched.ActorThread.run(ActorThread.java:191) [zeebe-util-0.26.0.jar:0.26.0]
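
For context, the check that throws here is essentially a monotonicity assertion over record positions while the stream processor scans the log during recovery. A minimal, hypothetical sketch of such a check (names and shapes are assumptions, not the actual ReProcessingStateMachine code):

// Hypothetical sketch of a recovery-time scan that asserts record positions are
// strictly increasing; not the actual Zeebe ReProcessingStateMachine code.
final class LogScanSketch {

  // Assumed minimal record shape: only the position matters for this check.
  record LoggedEvent(long position) {}

  static void scanLog(final Iterable<LoggedEvent> log) {
    long lastPosition = -1;
    for (final LoggedEvent event : log) {
      if (event.position() <= lastPosition) {
        throw new IllegalStateException(
            String.format(
                "Expected that position '%d' of current event is higher then position '%d' "
                    + "of last event, but was not. Inconsistent log detected!",
                event.position(), lastPosition));
      }
      lastPosition = event.position();
    }
  }
}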

Logs

Debug 2021-02-11 01:44:25.539 CET "Opened database from '/usr/local/zeebe/data/raft-partition/partitions/4/runtime'."
Debug 2021-02-11 01:44:25.539 CET "Recovered state from snapshot 'FileBasedSnapshot{directory=/usr/local/zeebe/data/raft-partition/partitions/4/snapshots/123178899-2970-1612966216938-204608972-204608858, metadata=FileBasedSnapshotMetadata{index=123178899, term=2970, timestamp=2021-02-10 02:10:16,938, processedPosition=204608972, exporterPosition=204608858}}'"
Debug 2021-02-11 01:44:25.657 CET "Configured log appender back pressure at partition 4 as AppenderVegasCfg{initialLimit=1024, maxConcurrency=32768, alphaLimit=0.7, betaLimit=0.95}. Window limiting is disabled"
Debug 2021-02-11 01:44:25.657 CET "Recovering state of partition 4 from snapshot"
Info 2021-02-11 01:44:25.658 CET "Engine created. [value-mapper: CompositeValueMapper(List(io.zeebe.el.impl.feel.MessagePackValueMapper@17374569)), function-provider: io.zeebe.el.impl.feel.FeelFunctionProvider@4548ef45, clock: io.zeebe.el.impl.ZeebeFeelEngineClock@18598dfd, configuration: Configuration(false)]"
Info 2021-02-11 01:44:25.661 CET "Recovered state of partition 4 from snapshot at position 204608972"
Info 2021-02-11 01:44:25.662 CET "Engine created. [value-mapper: CompositeValueMapper(List(io.zeebe.el.impl.feel.MessagePackValueMapper@25c9e2e3)), function-provider: io.zeebe.el.impl.feel.FeelFunctionProvider@878f73e, clock: io.zeebe.el.impl.ZeebeFeelEngineClock@ac813b3, configuration: Configuration(false)]"
Debug 2021-02-11 01:44:25.666 CET "Partition 4 transition complete, installed 11 resources!"
Debug 2021-02-11 01:44:25.666 CET "Recovering exporter from snapshot"
Debug 2021-02-11 01:44:25.667 CET "Recovered exporter 'Broker-0-Exporter-4' from snapshot at lastExportedPosition 204608858"
Debug 2021-02-11 01:44:25.667 CET "Configure exporter with id 'elasticsearch'"
Debug 2021-02-11 01:44:25.668 CET "Exporter configured with ElasticsearchExporterConfiguration{url='http://elasticsearch:9200', index=IndexConfiguration{indexPrefix='zeebe-record', createTemplate=true, command=false, event=true, rejection=false, error=true, deployment=true, incident=true, job=true, message=false, messageSubscription=false, variable=true, variableDocument=true, workflowInstance=true, workflowInstanceCreation=false, workflowInstanceSubscription=false, ignoreVariablesAbove=8191}, bulk=BulkConfiguration{delay=5, size=1000, memoryLimit=10485760}}"
Debug 2021-02-11 01:44:25.668 CET "Configure exporter with id 'MetricsExporter'"
Debug 2021-02-11 01:44:25.668 CET "Set event filter for exporters: ExporterEventFilter{acceptRecordTypes={COMMAND_REJECTION=true, NULL_VAL=true, COMMAND=true, EVENT=true, SBE_UNKNOWN=true}, acceptValueTypes={JOB=true, MESSAGE_SUBSCRIPTION=true, WORKFLOW_INSTANCE_CREATION=true, WORKFLOW_INSTANCE_RESULT=true, VARIABLE_DOCUMENT=true, WORKFLOW_INSTANCE_SUBSCRIPTION=true, TIMER=true, DEPLOYMENT=true, JOB_BATCH=true, MESSAGE=true, ERROR=true, SBE_UNKNOWN=true, MESSAGE_START_EVENT_SUBSCRIPTION=true, VARIABLE=true, NULL_VAL=true, WORKFLOW_INSTANCE=true, INCIDENT=true}}"
Debug 2021-02-11 01:44:25.669 CET "Open exporter with id 'elasticsearch'"
Info 2021-02-11 01:44:25.670 CET "Exporter opened"
Debug 2021-02-11 01:44:25.670 CET "Received metadata change from Broker 0, partitions {1=LEADER, 2=FOLLOWER, 3=FOLLOWER, 4=LEADER}, terms {1=2557, 4=3028} and health {1=HEALTHY, 2=HEALTHY, 3=HEALTHY, 4=HEALTHY}."
Debug 2021-02-11 01:44:25.670 CET "Open exporter with id 'MetricsExporter'"
Error 2021-02-11 01:44:25.723 CET "Actor Broker-0-StreamProcessor-4 failed in phase STARTED."
Debug 2021-02-11 01:44:25.729 CET "Remove blocking request BrokerActivateJobsRequest{requestDto={"type":"adyen-authorisation","worker":"default","timeout":300000,"maxJobsToActivate":32,"jobKeys":[],"jobs":[],"variables":[],"truncated":false}} for job type adyen-authorisation after timeout of PT10S"
Debug 2021-02-11 01:44:25.729 CET "Expected to handle gRPC request, but request timed out between gateway and broker"
Debug 2021-02-11 01:44:25.731 CET "Expected to handle gRPC request, but request timed out between gateway and broker"
Debug 2021-02-11 01:44:25.736 CET "Remove blocking request BrokerActivateJobsRequest{requestDto={"type":"adyen-capture","worker":"default","timeout":300000,"maxJobsToActivate":32,"jobKeys":[],"jobs":[],"variables":[],"truncated":false}} for job type adyen-capture after timeout of PT10S"
Debug 2021-02-11 01:44:25.736 CET "Expected to handle gRPC request, but request timed out between gateway and broker"
Debug 2021-02-11 01:44:25.738 CET "Remove blocking request BrokerActivateJobsRequest{requestDto={"type":"zeebe-slurper","worker":"default","timeout":300000,"maxJobsToActivate":32,"jobKeys":[],"jobs":[],"variables":[],"truncated":false}} for job type zeebe-slurper after timeout of PT10S"
Debug 2021-02-11 01:44:25.746 CET "Detected unhealthy components. The current health status of components: {Raft-4=HEALTHY, ZeebePartition-4=HEALTHY, Broker-0-StreamProcessor-4=UNHEALTHY, logstream-raft-partition-partition-4=HEALTHY}"
Error 2021-02-11 01:44:25.750 CET "Partition-4 failed, marking it as unhealthy"

Environment:

  • OS: Camunda Cloud
  • Zeebe Version: 0.26.0
@pihme added the kind/bug label on Feb 11, 2021
@npepinpe added the Impact: Availability, scope/broker, and severity/critical labels on Feb 11, 2021
@pihme
Contributor Author

pihme commented Feb 11, 2021

@Zelldon asked me to look for append failures prior to this occurrence. Here is what I found (it is not clear yet whether it is actually related):

Warning 2021-02-10 16:31:55.260 CET "RaftServer{raft-partition-partition-4} - AppendRequest{term=3023, leader=2, prevLogIndex=123186215, prevLogTerm=2970, entries=0, commitIndex=123186321} to 1 failed: RaftError{type=UNAVAILABLE, message=null}"
Debug 2021-02-10 16:31:58.769 CET "Partition role transitioning from CANDIDATE to FOLLOWER"
Debug 2021-02-10 16:32:03.451 CET "Partition role transitioning from FOLLOWER to CANDIDATE"
Warning 2021-02-10 16:32:10.510 CET "RaftServer{raft-partition-partition-4} - AppendRequest{term=3023, leader=2, prevLogIndex=123186215, prevLogTerm=2970, entries=0, commitIndex=123186321} to 1 failed: RaftError{type=UNAVAILABLE, message=null}"
Debug 2021-02-10 16:32:11.054 CET "Created new snapshot FileBasedSnapshot{directory=/usr/local/zeebe/data/raft-partition/partitions/2/snapshots/123423799-2952-1612970881691-204956877-204956878, metadata=FileBasedSnapshotMetadata{index=123423799, term=2952, timestamp=2021-02-10 03:28:01,691, processedPosition=204956877, exporterPosition=204956878}}"
Info 2021-02-10 16:32:11.054 CET "Current commit position 204956880 is greater than 204956880, snapshot 123423799-2952-1612970881691-204956877-204956878 is valid and has been persisted."
Warning 2021-02-10 16:32:25.010 CET "RaftServer{raft-partition-partition-4} - AppendRequest{term=3023, leader=2, prevLogIndex=123186215, prevLogTerm=2970, entries=0, commitIndex=123186321} to 1 failed: RaftError{type=UNAVAILABLE, message=null}"
Warning 2021-02-10 16:32:37.760 CET "RaftServer{raft-partition-partition-4} - AppendRequest{term=3023, leader=2, prevLogIndex=123186215, prevLogTerm=2970, entries=0, commitIndex=123186321} to 1 failed: RaftError{type=UNAVAILABLE, message=null}"
Warning 2021-02-10 16:32:51.010 CET "RaftServer{raft-partition-partition-4} - AppendRequest{term=3023, leader=2, prevLogIndex=123186215, prevLogTerm=2970, entries=0, commitIndex=123186321} to 1 failed: RaftError{type=UNAVAILABLE, message=null}"
Debug 2021-02-10 16:32:58.865 CET "Opened database from '/usr/local/zeebe/data/raft-partition/partitions/4/runtime'."
Debug 2021-02-10 16:32:58.866 CET "Recovered state from snapshot 'FileBasedSnapshot{directory=/usr/local/zeebe/data/raft-partition/partitions/4/snapshots/123178899-2970-1612966216938-204608972-204608858, metadata=FileBasedSnapshotMetadata{index=123178899, term=2970, timestamp=2021-02-10 02:10:16,938, processedPosition=204608972, exporterPosition=204608858}}'"
Debug 2021-02-10 16:33:01.688 CET "Based on lowest exporter position '204957812' and last processed position '204957809', determined '204957809' as snapshot position."
Debug 2021-02-10 16:33:01.699 CET "Taking temporary snapshot into /usr/local/zeebe/data/raft-partition/partitions/2/pending/123424699-2952-1612971181699-204957809-204957812."
Debug 2021-02-10 16:33:03.268 CET "Configured log appender back pressure at partition 4 as AppenderVegasCfg{initialLimit=1024, maxConcurrency=32768, alphaLimit=0.7, betaLimit=0.95}. Window limiting is disabled"
Debug 2021-02-10 16:33:03.269 CET "Recovering state of partition 4 from snapshot"
Info 2021-02-10 16:33:03.269 CET "Engine created. [value-mapper: CompositeValueMapper(List(io.zeebe.el.impl.feel.MessagePackValueMapper@b2c6409)), function-provider: io.zeebe.el.impl.feel.FeelFunctionProvider@68b27fe5, clock: io.zeebe.el.impl.ZeebeFeelEngineClock@30914e88, configuration: Configuration(false)]"
Debug 2021-02-10 16:33:04.111 CET "Created snapshot for Broker-2-StreamProcessor-2"
Info 2021-02-10 16:33:04.112 CET "Finished taking snapshot, need to wait until last written event position 204957816 is committed, current commit position is 204957812. After that snapshot can be marked as valid."
Warning 2021-02-10 16:33:04.261 CET "RaftServer{raft-partition-partition-4} - AppendRequest{term=3023, leader=2, prevLogIndex=123186215, prevLogTerm=2970, entries=0, commitIndex=123186321} to 1 failed: RaftError{type=UNAVAILABLE, message=null}"
Debug 2021-02-10 16:33:05.028 CET "Purging snapshots older than FileBasedSnapshot{directory=/usr/local/zeebe/data/raft-partition/partitions/2/snapshots/123424699-2952-1612971181699-204957809-204957812, metadata=FileBasedSnapshotMetadata{index=123424699, term=2952, timestamp=2021-02-10 03:33:01,699, processedPosition=204957809, exporterPosition=204957812}}"
Debug 2021-02-10 16:33:05.028 CET "Deleting snapshot FileBasedSnapshot{directory=/usr/local/zeebe/data/raft-partition/partitions/2/snapshots/123423799-2952-1612970881691-204956877-204956878, metadata=FileBasedSnapshotMetadata{index=123423799, term=2952, timestamp=2021-02-10 03:28:01,691, processedPosition=204956877, exporterPosition=204956878}}"
Debug 2021-02-10 16:33:07.058 CET "Search for orphaned snapshots below oldest valid snapshot with index FileBasedSnapshotMetadata{index=123424699, term=2952, timestamp=2021-02-10 03:33:01,699, processedPosition=204957809, exporterPosition=204957812} in /usr/local/zeebe/data/raft-partition/partitions/2/pending"
Debug 2021-02-10 16:33:07.058 CET "New snapshot 123424699-2952-1612971181699-204957809-204957812 was persisted. Start replicating."
Debug 2021-02-10 16:33:07.058 CET "Compacting Atomix log up to index 123424699"
Info 2021-02-10 16:33:07.294 CET "Recovered state of partition 4 from snapshot at position 204608972"
Info 2021-02-10 16:33:07.294 CET "Engine created. [value-mapper: CompositeValueMapper(List(io.zeebe.el.impl.feel.MessagePackValueMapper@17e7b03a)), function-provider: io.zeebe.el.impl.feel.FeelFunctionProvider@7f0382e3, clock: io.zeebe.el.impl.ZeebeFeelEngineClock@4b597b1, configuration: Configuration(false)]"
Debug 2021-02-10 16:33:07.296 CET "Partition 4 transition complete, installed 11 resources!"
Debug 2021-02-10 16:33:07.296 CET "Recovering exporter from snapshot"
Error 2021-02-10 16:33:07.343 CET "Actor Broker-2-StreamProcessor-4 failed in phase STARTED."

The first occurrence was also close to an incident: https://docs.google.com/document/d/1_8mUHzksCWEn5vsJ3datYe2bt3e90RYZvvTRtWCgFnI/edit#heading=h.e7don26yefrg

@ChrisKujawa
Member

I checked the metrics and it looks like there were a LOT of leader changes before the first occurrence. At some point we seem to be missing metrics, which might be related to yesterday's outage.

[screenshot: leader-changes]

We can see that the backpressure for partition 4 is at 100%. I think the partition is completely dead now.
[screenshot: partition]

BTW: This is one of the problematic cloud clusters we currently have, and it is related to multiple incidents. The cluster has already reached 6 GB of snapshot size.

[screenshot: snapshot]

@hisImminence

Created snapshots for the zeebe-brokers:

@ChrisKujawa
Member

ChrisKujawa commented Feb 12, 2021

What I can see in the log before the inconsistency is detected is that a lot of elections were started; this correlates with the metrics I commented on here: #6316 (comment)

Interestingly, we can see that a snapshot was taken with positions processedPosition=204621068, exporterPosition=204621065.

We can also see that the dispatcher discards an append job from its actor queue:
Discard job io.zeebe.logstreams.impl.log.LogStorageAppender$$Lambda$1284/0x0000000800908c40 QUEUED from fastLane of Actor Broker-1-LogAppender-4.
I am not 100% sure how problematic that currently is; I think the dispatcher position is recalculated from the log on restart.

See related log:

D 2021-02-10T14:38:05.512915Z Search for orphaned snapshots below oldest valid snapshot with index FileBasedSnapshotMetadata{index=123186199, term=2984, timestamp=2021-02-10 02:37:55,754, processedPosition=204621068, exporterPosition=204621065} in /usr/local/zeebe/data/raft-partition/partitions/4/pending 
D 2021-02-10T14:38:05.513490Z Compacting Atomix log up to index 123186199 
D 2021-02-10T14:40:31.399415Z Partition role transitioning from LEADER to FOLLOWER 
D 2021-02-10T14:40:31.399763Z Closing Zeebe-Partition-4: ExporterDirector 
I 2021-02-10T14:40:31.407178Z Exporter closed 
D 2021-02-10T14:40:31.407901Z Closed exporter director 'Broker-1-Exporter-4'. 
D 2021-02-10T14:40:31.408077Z Closing Zeebe-Partition-4: ExporterDirector closed successfully 
D 2021-02-10T14:40:31.408221Z Closing Zeebe-Partition-4: RocksDB metric timer 
D 2021-02-10T14:40:31.408302Z Closing Zeebe-Partition-4: RocksDB metric timer closed successfully 
D 2021-02-10T14:40:31.408397Z Closing Zeebe-Partition-4: AsyncSnapshotDirector 
D 2021-02-10T14:40:34.608366Z Partition role transitioning from FOLLOWER to CANDIDATE 
D 2021-02-10T14:40:34.957089Z Partition role transitioning from CANDIDATE to FOLLOWER 
D 2021-02-10T14:40:38.499348Z Partition role transitioning from FOLLOWER to CANDIDATE 
D 2021-02-10T14:40:41.926864Z Partition role transitioning from CANDIDATE to FOLLOWER 
D 2021-02-10T14:40:43.605929Z Received metadata change for 1, partitions {1=FOLLOWER, 2=FOLLOWER, 3=LEADER, 4=LEADER} terms {3=5112, 4=2984} 
D 2021-02-10T14:40:50.909920Z Partition role transitioning from FOLLOWER to CANDIDATE 
E 2021-02-10T14:41:02.065983Z Uncaught exception in 'GatewayLongPollingJobHandler' in phase 'STARTED'. Continuing with next job. 
E 2021-02-10T14:41:02.066468Z Uncaught exception in 'GatewayLongPollingJobHandler' in phase 'STARTED'. Continuing with next job. 
D 2021-02-10T14:42:05.100792Z Partition role transitioning from CANDIDATE to FOLLOWER 
D 2021-02-10T14:42:07.952168Z Partition role transitioning from FOLLOWER to CANDIDATE 
D 2021-02-10T14:42:55.076068Z Partition role transitioning from CANDIDATE to FOLLOWER 
D 2021-02-10T14:42:57.631169Z Partition role transitioning from FOLLOWER to CANDIDATE 
E 2021-02-10T14:43:03.326871Z Uncaught exception in 'GatewayLongPollingJobHandler' in phase 'STARTED'. Continuing with next job. 
E 2021-02-10T14:43:03.327386Z Uncaught exception in 'GatewayLongPollingJobHandler' in phase 'STARTED'. Continuing with next job. 
E 2021-02-10T14:43:03.329571Z Uncaught exception in 'GatewayLongPollingJobHandler' in phase 'STARTED'. Continuing with next job. 
E 2021-02-10T14:43:03.330050Z Uncaught exception in 'GatewayLongPollingJobHandler' in phase 'STARTED'. Continuing with next job. 
D 2021-02-10T14:43:05.001287Z Created new snapshot FileBasedSnapshot{directory=/usr/local/zeebe/data/raft-partition/partitions/4/snapshots/123186199-2984-1612967875754-204621068-204621065, metadata=FileBasedSnapshotMetadata{index=123186199, term=2984, timestamp=2021-02-10 02:37:55,754, processedPosition=204621068, exporterPosition=204621065}} 
D 2021-02-10T14:43:05.002157Z Closing Zeebe-Partition-4: AsyncSnapshotDirector closed successfully 
D 2021-02-10T14:43:05.002367Z Closing Zeebe-Partition-4: StreamProcessor 
D 2021-02-10T14:43:05.003050Z Closed stream processor controller Broker-1-StreamProcessor-4. 
D 2021-02-10T14:43:05.003129Z Closing Zeebe-Partition-4: StreamProcessor closed successfully 
D 2021-02-10T14:43:05.003425Z Closing Zeebe-Partition-4: ZeebeDb 
D 2021-02-10T14:43:05.003560Z Closing Zeebe-Partition-4: ZeebeDb closed successfully 
D 2021-02-10T14:43:05.003623Z Closing Zeebe-Partition-4: RegisterSnapshotListener 
D 2021-02-10T14:43:05.003721Z Closing Zeebe-Partition-4: RegisterSnapshotListener closed successfully 
D 2021-02-10T14:43:05.003777Z Closing Zeebe-Partition-4: LogDeletionService 
D 2021-02-10T14:43:05.003914Z Closing Zeebe-Partition-4: LogDeletionService closed successfully 
D 2021-02-10T14:43:05.004052Z Closing Zeebe-Partition-4: StateController 
D 2021-02-10T14:43:09.822220Z Closing Zeebe-Partition-4: StateController closed successfully 
D 2021-02-10T14:43:09.822307Z Closing Zeebe-Partition-4: SnapshotReplication 
D 2021-02-10T14:43:09.822412Z Closing Zeebe-Partition-4: SnapshotReplication closed successfully 
D 2021-02-10T14:43:09.822470Z Closing Zeebe-Partition-4: RaftLogReader 
D 2021-02-10T14:43:09.822930Z Closing Zeebe-Partition-4: RaftLogReader closed successfully 
D 2021-02-10T14:43:09.823033Z Closing Zeebe-Partition-4: logstream 
I 2021-02-10T14:43:09.823131Z Close appender for log stream logstream-raft-partition-partition-4 
D 2021-02-10T14:43:09.823242Z Discard job io.zeebe.logstreams.impl.log.LogStorageAppender$$Lambda$1284/0x0000000800908c40 QUEUED from fastLane of Actor Broker-1-LogAppender-4. 

If we look at the first occurrence of the inconsistency detection, it complains about:

Expected that position '204620965' of current event is higher then position '204621051' of last event, but was not. Inconsistent log detected!

As we saw above, the last snapshot was taken at processedPosition=204621068, exporterPosition=204621065.

@ChrisKujawa
Member

ChrisKujawa commented Feb 12, 2021

If we take a closer look near the inconsistency detection, we see that the broker for that partition became INACTIVE; this normally indicates an error (e.g. an uncaught exception) in the Raft implementation.

D 2021-02-10T15:17:24.551426Z Partition role transitioning from FOLLOWER to INACTIVE 
E 2021-02-10T15:17:24.551567Z ZeebePartition-4 failed, marking it as unhealthy 
D 2021-02-10T15:17:24.551673Z Detected unhealthy components. The current health status of components: {Raft-4=HEALTHY, ZeebePartition-4=UNHEALTHY} 
E 2021-02-10T15:17:24.551820Z Raft-4 failed, marking it as unhealthy 
D 2021-02-10T15:17:24.551904Z Closing Zeebe-Partition-4: ConsumeReplicatedSnapshots 
D 2021-02-10T15:17:24.551987Z Closing Zeebe-Partition-4: ConsumeReplicatedSnapshots closed successfully 
D 2021-02-10T15:17:24.552085Z Closing Zeebe-Partition-4: LogDeletionService 
E 2021-02-10T15:17:24.552087Z Partition-4 failed, marking it as unhealthy 
D 2021-02-10T15:17:24.552290Z Detected unhealthy components. The current health status of components: {Partition-2=HEALTHY, Partition-1=HEALTHY, Partition-4=UNHEALTHY, Partition-3=HEALTHY} 
D 2021-02-10T15:17:24.552510Z Closing Zeebe-Partition-4: LogDeletionService closed successfully 
D 2021-02-10T15:17:24.552609Z Closing Zeebe-Partition-4: StateController 
D 2021-02-10T15:17:24.552660Z Closing Zeebe-Partition-4: StateController closed successfully 
D 2021-02-10T15:17:24.552703Z Closing Zeebe-Partition-4: SnapshotReplication 
D 2021-02-10T15:17:24.552784Z Closing Zeebe-Partition-4: SnapshotReplication closed successfully 
D 2021-02-10T15:17:24.552852Z Closing Zeebe-Partition-4: RaftLogReader 
D 2021-02-10T15:17:24.553007Z Closing Zeebe-Partition-4: RaftLogReader closed successfully 
D 2021-02-10T15:17:24.553082Z Partition 4 closed all previous open resources, before transitioning. 
D 2021-02-10T15:17:24.553147Z Partition 4 transition complete, installed 0 resources! 
D 2021-02-10T15:17:24.553213Z Partition 4 closed all previous open resources, before transitioning. 
D 2021-02-10T15:17:24.553261Z Partition 4 transition complete, installed 0 resources! 
D 2021-02-10T15:17:24.704381Z Received metadata change for 1, partitions {1=FOLLOWER, 2=FOLLOWER, 3=FOLLOWER, 4=FOLLOWER} terms {} 
E 2021-02-10T15:18:44.198672Z Uncaught exception in 'GatewayLongPollingJobHandler' in phase 'STARTED'. Continuing with next job. 
E 2021-02-10T15:18:44.199488Z Uncaught exception in 'GatewayLongPollingJobHandler' in phase 'STARTED'. Continuing with next job. 
E 2021-02-10T15:18:53.071301Z Partition-1 failed, marking it as unhealthy 
D 2021-02-10T15:18:53.074448Z Detected unhealthy components. The current health status of components: {Partition-2=HEALTHY, Partition-1=UNHEALTHY, Partition-4=HEALTHY, Partition-3=HEALTHY} 
D 2021-02-10T15:18:53.236071Z Received metadata change for 2, partitions {1=LEADER, 2=LEADER, 3=FOLLOWER, 4=FOLLOWER} terms {1=2530, 2=2952} 
E 2021-02-10T15:19:00.069354Z Partition-2 failed, marking it as unhealthy 
D 2021-02-10T15:19:00.264241Z Received metadata change for 2, partitions {1=LEADER, 2=LEADER, 3=FOLLOWER, 4=FOLLOWER} terms {1=2530, 2=2952} 
D 2021-02-10T15:19:15.886275Z Received metadata change for 1, partitions {1=FOLLOWER, 2=FOLLOWER, 3=LEADER, 4=FOLLOWER} terms {3=5116} 
I 2021-02-10T15:19:53.081169Z Partition-1 recovered, marking it as healthy 
D 2021-02-10T15:19:53.326933Z Received metadata change for 2, partitions {1=LEADER, 2=LEADER, 3=FOLLOWER, 4=FOLLOWER} terms {1=2530, 2=2952} 
I 2021-02-10T15:20:00.091772Z Partition-2 recovered, marking it as healthy 
D 2021-02-10T15:20:00.091905Z The components are healthy. The current health status of components: {Partition-2=HEALTHY, Partition-1=HEALTHY, Partition-4=HEALTHY, Partition-3=HEALTHY} 
D 2021-02-10T15:20:00.329888Z Received metadata change for 2, partitions {1=LEADER, 2=LEADER, 3=FOLLOWER, 4=FOLLOWER} terms {1=2530, 2=2952} 
E 2021-02-10T15:20:14.457584Z Uncaught exception in 'GatewayLongPollingJobHandler' in phase 'STARTED'. Continuing with next job. 
E 2021-02-10T15:20:14.459344Z Uncaught exception in 'GatewayLongPollingJobHandler' in phase 'STARTED'. Continuing with next job. 
E 2021-02-10T15:20:14.462943Z Uncaught exception in 'GatewayLongPollingJobHandler' in phase 'STARTED'. Continuing with next job. 
E 2021-02-10T15:20:14.463349Z Uncaught exception in 'GatewayLongPollingJobHandler' in phase 'STARTED'. Continuing with next job. 
D 2021-02-10T15:21:52.497564Z Dispatcher closed 
D 2021-02-10T15:21:52.645248Z Received metadata change for 2, partitions {1=FOLLOWER, 2=LEADER, 3=FOLLOWER, 4=FOLLOWER} terms {2=2952} 
D 2021-02-10T15:30:34.014433Z Partition role transitioning from CANDIDATE to LEADER 
D 2021-02-10T15:30:34.014816Z Closing Zeebe-Partition-4: ConsumeReplicatedSnapshots 
D 2021-02-10T15:30:34.014930Z Closing Zeebe-Partition-4: ConsumeReplicatedSnapshots closed successfully 
D 2021-02-10T15:30:34.015020Z Closing Zeebe-Partition-4: LogDeletionService 
D 2021-02-10T15:30:34.015165Z Closing Zeebe-Partition-4: LogDeletionService closed successfully 
D 2021-02-10T15:30:34.015218Z Closing Zeebe-Partition-4: StateController 
D 2021-02-10T15:30:34.015352Z Closing Zeebe-Partition-4: StateController closed successfully 
D 2021-02-10T15:30:34.015415Z Closing Zeebe-Partition-4: SnapshotReplication 
D 2021-02-10T15:30:34.015784Z Closing Zeebe-Partition-4: SnapshotReplication closed successfully 
D 2021-02-10T15:30:34.015872Z Closing Zeebe-Partition-4: RaftLogReader 
D 2021-02-10T15:30:34.016438Z Closing Zeebe-Partition-4: RaftLogReader closed successfully 
D 2021-02-10T15:30:34.016547Z Partition 4 closed all previous open resources, before transitioning. 
D 2021-02-10T15:33:03.268464Z Configured log appender back pressure at partition 4 as AppenderVegasCfg{initialLimit=1024, maxConcurrency=32768, alphaLimit=0.7, betaLimit=0.95}. Window limiting is disabled 
D 2021-02-10T15:33:03.269092Z Recovering state of partition 4 from snapshot 
I 2021-02-10T15:33:03.269683Z Engine created. [value-mapper: CompositeValueMapper(List(io.zeebe.el.impl.feel.MessagePackValueMapper@b2c6409)), function-provider: io.zeebe.el.impl.feel.FeelFunctionProvider@68b27fe5, clock: io.zeebe.el.impl.ZeebeFeelEngineClock@30914e88, configuration: Configuration(false)] 
I 2021-02-10T15:33:07.294142Z Recovered state of partition 4 from snapshot at position 204608972 
I 2021-02-10T15:33:07.294831Z Engine created. [value-mapper: CompositeValueMapper(List(io.zeebe.el.impl.feel.MessagePackValueMapper@17e7b03a)), function-provider: io.zeebe.el.impl.feel.FeelFunctionProvider@7f0382e3, clock: io.zeebe.el.impl.ZeebeFeelEngineClock@4b597b1, configuration: Configuration(false)] 
D 2021-02-10T15:33:07.296949Z Partition 4 transition complete, installed 11 resources! 
D 2021-02-10T15:33:07.296953Z Recovering exporter from snapshot 
E 2021-02-10T15:33:07.343722Z Actor Broker-2-StreamProcessor-4 failed in phase STARTED. 

After searching for the cause I found the following:

java.lang.IndexOutOfBoundsException: Cannot truncate committed index: 123186194
	at io.atomix.storage.journal.SegmentedJournalWriter.truncate(SegmentedJournalWriter.java:101) ~[atomix-storage-0.26.0.jar:0.26.0]
	at io.atomix.storage.journal.DelegatingJournalWriter.truncate(DelegatingJournalWriter.java:66) ~[atomix-storage-0.26.0.jar:0.26.0]
	at io.atomix.raft.roles.PassiveRole.replaceExistingEntry(PassiveRole.java:694) ~[atomix-cluster-0.26.0.jar:0.26.0]
	at io.atomix.raft.roles.PassiveRole.tryToAppend(PassiveRole.java:633) ~[atomix-cluster-0.26.0.jar:0.26.0]
	at io.atomix.raft.roles.PassiveRole.appendEntries(PassiveRole.java:589) ~[atomix-cluster-0.26.0.jar:0.26.0]
	at io.atomix.raft.roles.PassiveRole.handleAppend(PassiveRole.java:441) ~[atomix-cluster-0.26.0.jar:0.26.0]
	at io.atomix.raft.roles.ActiveRole.onAppend(ActiveRole.java:51) ~[atomix-cluster-0.26.0.jar:0.26.0]
	at io.atomix.raft.roles.FollowerRole.onAppend(FollowerRole.java:195) ~[atomix-cluster-0.26.0.jar:0.26.0]
	at io.atomix.raft.impl.RaftContext.lambda$registerHandlers$12(RaftContext.java:208) ~[atomix-cluster-0.26.0.jar:0.26.0]
	at io.atomix.raft.impl.RaftContext.lambda$runOnContext$19(RaftContext.java:219) ~[atomix-cluster-0.26.0.jar:0.26.0]
	at io.atomix.utils.concurrent.SingleThreadContext$WrappedRunnable.run(SingleThreadContext.java:188) [atomix-utils-0.26.0.jar:0.26.0]
	at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) [?:?]
	at java.util.concurrent.FutureTask.run(Unknown Source) [?:?]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
	at java.lang.Thread.run(Unknown Source) [?:?]

If we take a look at the code, we see that this only happens if the follower tries to truncate entries that are already committed.

  @Override
  public void truncate(final long index) {
    if (index < journal.getCommitIndex()) {
      throw new IndexOutOfBoundsException("Cannot truncate committed index: " + index);
    }
    // ... truncation of the uncommitted tail follows
  }
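
For context, the stack trace above shows this guard being hit from the follower's append path (PassiveRole.handleAppend -> tryToAppend -> replaceExistingEntry): when an entry received from the leader conflicts with the local entry at the same index (different term), the follower truncates its log from that point, and such a conflict should only ever involve uncommitted indices. A simplified, hypothetical sketch of that rule follows; the Journal interface and method names are assumptions for illustration, not the Atomix implementation:

// Simplified sketch of the follower-side Raft rule that ends up calling truncate();
// types and method names are assumptions, not the Atomix PassiveRole code.
final class FollowerAppendSketch {

  record Entry(long index, long term) {}

  interface Journal {
    Entry entryAt(long index);       // local entry at this index, or null if absent
    long commitIndex();              // highest index known to be committed
    void truncateAfter(long index);  // drop all local entries with index > the argument
    void append(Entry entry);
  }

  /** Applies one entry from a leader's AppendRequest to the local journal. */
  static void applyFromLeader(final Journal journal, final Entry received) {
    final Entry local = journal.entryAt(received.index());
    if (local != null && local.term() != received.term()) {
      // Conflict: the local tail diverges from the leader's log and must be dropped.
      // A committed entry must never be part of that tail; if it is, the replicas'
      // logs have already diverged and we fail loudly instead of corrupting data.
      if (received.index() <= journal.commitIndex()) {
        throw new IllegalStateException(
            "Cannot truncate committed index: " + received.index());
      }
      journal.truncateAfter(received.index() - 1);
    }
    if (journal.entryAt(received.index()) == null) {
      journal.append(received);
    }
  }
}

If the guard fires anyway, as it did here, it suggests the committed prefixes of the replicas have already diverged, which fits the position inconsistency described above.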

@ChrisKujawa
Member

Interestingly, only Brokers 1 and 2 are really participating in the Raft. Broker 0 seems to be partitioned away. It was also down for around 30 minutes, which might be related to the Google resource outage we experienced that day.

[screenshot: broker-0]

Still, Broker 0 can RECEIVE snapshots from the other nodes, but it is not able to send POLL requests to them.

D 2021-02-10T15:24:20.801814Z Consume snapshot snapshotChunk MANIFEST-682616 of snapshot 123178899-2970-1612966216938-204608972-204608858 
D 2021-02-10T15:24:20.818487Z Consume snapshot snapshotChunk OPTIONS-682619 of snapshot 123178899-2970-1612966216938-204608972-204608858 
W 2021-02-10T15:24:29.215382Z RaftServer{raft-partition-partition-1}{role=FOLLOWER} - Poll request to 1 failed: java.util.concurrent.TimeoutException: Request type raft-partition-partition-1-poll timed out in 5000 milliseconds 
D 2021-02-10T15:24:30.220861Z Purging snapshots older than FileBasedSnapshot{directory=/usr/local/zeebe/data/raft-partition/partitions/4/snapshots/123178899-2970-1612966216938-204608972-204608858, metadata=FileBasedSnapshotMetadata{index=123178899, term=2970, timestamp=2021-02-10 02:10:16,938, processedPosition=204608972, exporterPosition=204608858}} 
D 2021-02-10T15:24:30.239225Z Deleting snapshot FileBasedSnapshot{directory=/usr/local/zeebe/data/raft-partition/partitions/4/snapshots/123115799-2970-1612963371553-204504887-204504580, metadata=FileBasedSnapshotMetadata{index=123115799, term=2970, timestamp=2021-02-10 01:22:51,553, processedPosition=204504887, exporterPosition=204504580}} 
D 2021-02-10T15:24:34.740621Z Search for orphaned snapshots below oldest valid snapshot with index FileBasedSnapshotMetadata{index=123178899, term=2970, timestamp=2021-02-10 02:10:16,938, processedPosition=204608972, exporterPosition=204608858} in /usr/local/zeebe/data/raft-partition/partitions/4/pending 
D 2021-02-10T15:24:34.823578Z Created new snapshot FileBasedSnapshot{directory=/usr/local/zeebe/data/raft-partition/partitions/4/snapshots/123178899-2970-1612966216938-204608972-204608858, metadata=FileBasedSnapshotMetadata{index=123178899, term=2970, timestamp=2021-02-10 02:10:16,938, processedPosition=204608972, exporterPosition=204608858}} 
W 2021-02-10T15:24:38.423514Z RaftServer{raft-partition-partition-1}{role=FOLLOWER} - Poll request to 1 failed: java.util.concurrent.TimeoutException: Request type raft-partition-partition-1-poll timed out in 5000 milliseconds 
W 2021-02-10T15:24:38.423851Z RaftServer{raft-partition-partition-1}{role=FOLLOWER} - Poll request to 2 failed: java.util.concurrent.TimeoutException: Request type raft-partition-partition-1-poll timed out in 5000 milliseconds 
W 2021-02-10T15:24:43.401707Z RaftServer{raft-partition-partition-4}{role=FOLLOWER} - Poll request to 2 failed: java.util.concurrent.TimeoutException: Request type raft-partition-partition-4-poll timed out in 5000 milliseconds 
W 2021-02-10T15:24:46.049585Z RaftServer{raft-partition-partition-1}{role=FOLLOWER} - Poll request to 1 failed: java.util.concurrent.TimeoutException: Request type raft-partition-partition-1-poll timed out in 5000 milliseconds 
W 2021-02-10T15:24:46.049951Z RaftServer{raft-partition-partition-1}{role=FOLLOWER} - Poll request to 2 failed: java.util.concurrent.TimeoutException: Request type raft-partition-partition-1-poll timed out in 5000 milliseconds 
W 2021-02-10T15:24:52.702658Z RaftServer{raft-partition-partition-4}{role=FOLLOWER} - Poll request to 2 failed: java.util.concurrent.TimeoutException: Request type raft-partition-partition-4-poll timed out in 5000 milliseconds 

@ChrisKujawa
Member

After discussing and debugging this further together with @deepthidevaki, our current assumptions are the following:

  • Due to a gcloud outage (not enough resources in our zone) we experienced pod scheduling and network issues.
  • One of the brokers (Broker 0) was restarted multiple times and was not able to communicate with the other brokers (outgoing traffic doesn't work), but it looks like incoming traffic works. For example, it was able to receive snapshots, and we see no timeouts from the other nodes to Broker 0.
  • At the same time the Zeebe cluster became unstable, which caused several leader changes and elections in a short period of time.
  • Several leader changes happened between Broker 1 and Broker 2; it looks like Broker 0 is able to accept vote requests, which allows the other brokers to become leader.
  • Note: on leader changes it can happen that followers need to truncate their log, because they have uncommitted data. But it seems that the logs of Broker 1 and Broker 2 do not match after this period of leader changes, so Broker 1, which is a follower at this point, tries to truncate committed data, which fails with the above exception.
  • The above error should normally never happen and means the logs of the different brokers are inconsistent.
  • These log inconsistencies are then also related to the detected position inconsistencies.
  • The assumption is that the journal would give some more insight here, but we assume it has already been truncated.

The cluster seems to have recovered and keeps going. I can try to reproduce this via a chaos experiment where I introduce some network partitions.

@ChrisKujawa
Member

ChrisKujawa commented Feb 12, 2021

I tried to reproduce this with a cluster (one partition, three nodes, replication factor three) on stable nodes, but without success.

I used the following script to cause repeated network partitions:

#!/bin/bash
set -exuo pipefail

# this script expects a setup of 5 nodes with replication factor 5 or higher

source utils.sh

partition=1
namespace=$(getNamespace)
gateway=$(getGateway)

broker0=$(getBroker "0")
broker0Ip=$(kubectl get pod "$broker0" -n "$namespace" --template="{{.status.podIP}}")
broker1=$(getBroker "1")
broker1Ip=$(kubectl get pod "$broker1" -n "$namespace" --template="{{.status.podIP}}")
broker2=$(getBroker "2")
broker2Ip=$(kubectl get pod "$broker2" -n "$namespace" --template="{{.status.podIP}}")

# To print the topology in the journal
 kubectl exec "$gateway" -n "$namespace" -- zbctl status --insecure

# we put all into one function because we need to make sure that even after preemption the 
# dependency is installed
function disconnect() {
 toChangedPod="$1"
 targetIp="$2"

 # install iproute2 so the ip command is available in the pod
 kubectl exec -n "$namespace" "$toChangedPod" -- apt update
 kubectl exec -n "$namespace" "$toChangedPod" -- apt install -y iproute2
 kubectl exec "$toChangedPod" -n "$namespace" -- ip route add unreachable "$targetIp"
}

function connect() {
 toChangedPod="$1"
 targetIp="$2"

 # install iproute2 so the ip command is available in the pod
 kubectl exec -n "$namespace" "$toChangedPod" -- apt update
 kubectl exec -n "$namespace" "$toChangedPod" -- apt install -y iproute2
 kubectl exec "$toChangedPod" -n "$namespace" -- ip route del unreachable "$targetIp"
}


function netloss() {
 toChangedPod="$1"

 # install iproute2 so the ip command is available in the pod
 kubectl exec -n "$namespace" "$toChangedPod" -- apt update
 kubectl exec -n "$namespace" "$toChangedPod" -- apt install -y iproute2
 kubectl exec "$toChangedPod" -n "$namespace" -- tc qdisc add dev eth0 root netem loss 5%
}


#retryUntilSuccess netloss "$broker1"
#retryUntilSuccess netloss "$broker2"
retryUntilSuccess netloss "$broker0"

# We disconnect Broker 0 from the others; they can still send requests to it

#retryUntilSuccess disconnect "$broker0" "$broker1Ip"
#retryUntilSuccess disconnect "$broker0" "$broker2Ip"

# We disconnect Broker 1 from Broker 2, to make the cluster a bit disruptive

retryUntilSuccess disconnect "$broker1" "$broker2Ip"

previousCoin=1
while true;
do

  echo "Disconnected..."
  sleep 5

  if [ $previousCoin -eq 1 ]
  then
    retryUntilSuccess connect "$broker1" "$broker2Ip"
  else
    retryUntilSuccess connect "$broker2" "$broker1Ip"
  fi

  sleep 145

  coin=$(($RANDOM%2))

  if [ $coin -eq 1 ]
  then
    retryUntilSuccess disconnect "$broker1" "$broker2Ip"
  else
    retryUntilSuccess disconnect "$broker2" "$broker1Ip"
  fi
  previousCoin=$coin
  echo "set prev coin: $previousCoin"

  coin=$(($RANDOM%2))
  if [ $coin -eq 1 ]
  then
    retryUntilSuccess connect "$broker0" "$broker1Ip"
    sleep 45
    retryUntilSuccess disconnect "$broker0" "$broker1Ip"
  else
    retryUntilSuccess connect "$broker0" "$broker2Ip"
    sleep 45
    retryUntilSuccess disconnect "$broker0" "$broker2Ip"
  fi

done

[screenshot: leaderchanges]

I think what comes into play in the failing cluster is that we have multiple partitions and a BIG state (~6 GB of snapshots), which can consume a lot of resources when we try to replicate it. I can imagine that this also plays a role here.

If we take a look at the metrics of the broken cluster, we see, for example, quite high Java heap usage.

[screenshot: resources]

@ChrisKujawa
Member

ChrisKujawa commented Feb 12, 2021

I checked the data of Broker 1 with zdb:

[zell data-1/ cluster: zeebe-long-running ns:zell-inconsistent-log]$ zdb log consistency -p raft-partition/partitions/4/
Opening log raft-partition/partitions/4
Inconsistent log lastRecordPosition204621051 is higher than next lowestRecordPosition 204620965 at index123186195
LOG IS CONSISTENT. <=== WRONG OUTPUT
Scanned entries: 7489
Log checked in 44 ms

We can see that it reports the same positions for Broker 1 as above, and the index corresponds to the index where it tried to compact.
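
Conceptually, such a consistency scan only needs to remember the highest record position of the previous entry and compare it with the lowest record position of the next one. A minimal sketch under that assumption, using a simplified entry shape (this is not the actual zdb implementation):

// Minimal sketch of a position-consistency scan over Zeebe log entries, assuming a
// simplified entry shape; not the actual zdb code.
final class ConsistencyScanSketch {

  record ZeebeEntry(long index, long term, long lowestPosition, long highestPosition) {}

  static boolean isConsistent(final Iterable<ZeebeEntry> entries) {
    boolean consistent = true;
    long lastHighestPosition = -1;
    for (final ZeebeEntry entry : entries) {
      if (lastHighestPosition != -1 && entry.lowestPosition() <= lastHighestPosition) {
        System.out.printf(
            "Inconsistent log lastRecordPosition '%d' is higher than next lowestRecordPosition '%d' at index '%d'%n",
            lastHighestPosition, entry.lowestPosition(), entry.index());
        consistent = false; // remember the violation instead of only printing it
      }
      lastHighestPosition = Math.max(lastHighestPosition, entry.highestPosition());
    }
    return consistent;
  }
}

The "LOG IS CONSISTENT." line above, printed despite the reported violation, is already marked as wrong output; the zdb-0.4.0-SNAPSHOT runs further down report "LOG IS INCONSISTENT!" for the same data.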

When I check the entries, we see the following:

[zell data-1/ cluster: zeebe-long-running ns:zell-inconsistent-log]$ zdb log search -idx 123186195 -p raft-partition/partitions/4/
Searching log raft-partition/partitions/4
Found entry with index '123186195'
Indexed Entry: Indexed{index=123186195, entry=ZeebeEntry{term=2970, timestamp=1612966726056, lowestPosition=204620965, highestPosition=204620965}}
Searched log in 57 ms

[zell data-1/ cluster: zeebe-long-running ns:zell-inconsistent-log]$ zdb log search -idx 123186194 -p raft-partition/partitions/4/
Searching log raft-partition/partitions/4
Found entry with index '123186194'
Indexed Entry: Indexed{index=123186194, entry=ZeebeEntry{term=2984, timestamp=1612967753896, lowestPosition=204621050, highestPosition=204621051}}
Searched log in 58 ms

The Zeebe entry at index 123186194 has a higher term (2984) than the entry at index 123186195 (term=2970). Furthermore, its position is also higher than that of the next index.

Interestingly, when I search for position 204620965, I find an entry at index 123186173.

[zell data-1/ cluster: zeebe-long-running ns:zell-inconsistent-log]$ zdb log search -pos 204620965 -p raft-partition/partitions/4/
Searching log raft-partition/partitions/4
Searched for record position '204620965' and found entry on index 123186173 with lowestPosition 204620961 and highestPosition 204620978
Found: {"partitionId":1,"value":{"name":"saleId","value":"\"a3ee3e5ecfc9a7d9cfa5721f2d05c516\"","scopeKey":9007199374004152,"workflowInstanceKey":9007199374004140,"workflowKey":2251799825545053},"timestamp":1612967753850,"key":9007199374004164,"position":204620965,"valueType":"VARIABLE","intent":"CREATED","recordType":"EVENT","rejectionType":"NULL_VAL","rejectionReason":"","brokerVersion":"0.26.0","sourceRecordPosition":204620957}
Searched log in 310 ms

If we check the status of the log, the initial entries are printed and we see that their terms are out of order.

InitialEntries: [
  InitializeEntry{term=2971, timestamp=2021-02-10 03:15:59,934},
  InitializeEntry{term=2973, timestamp=2021-02-10 03:18:56,289},
  InitializeEntry{term=2975, timestamp=2021-02-10 03:19:09,022},
  InitializeEntry{term=2977, timestamp=2021-02-10 03:22:09,518},
  InitializeEntry{term=2979, timestamp=2021-02-10 03:22:22,408},
  InitializeEntry{term=2981, timestamp=2021-02-10 03:24:38,394},
  InitializeEntry{term=2983, timestamp=2021-02-10 03:27:30,906},
  InitializeEntry{term=2984, timestamp=2021-02-10 03:31:03,999},
  InitializeEntry{term=2972, timestamp=2021-02-10 03:18:51,044},
  InitializeEntry{term=2974, timestamp=2021-02-10 03:19:01,392},
  InitializeEntry{term=2976, timestamp=2021-02-10 03:22:03,033},
  InitializeEntry{term=2978, timestamp=2021-02-10 03:22:14,059},
  InitializeEntry{term=2980, timestamp=2021-02-10 03:24:20,144},
  InitializeEntry{term=2982, timestamp=2021-02-10 03:26:48,999},
  InitializeEntry{term=2985, timestamp=2021-02-10 03:38:03,701},
  InitializeEntry{term=2987, timestamp=2021-02-10 03:40:38,517},
  InitializeEntry{term=2989, timestamp=2021-02-10 03:42:07,966},
  InitializeEntry{term=2991, timestamp=2021-02-10 03:44:47,562},
  InitializeEntry{term=2993, timestamp=2021-02-10 03:46:54,454},
  InitializeEntry{term=2995, timestamp=2021-02-10 03:49:20,204},
  ...
]

It seems that the log is identical (and equally inconsistent) on the other nodes.
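
For reference, a healthy Raft log has non-decreasing terms along increasing indexes, which is exactly the invariant the InitializeEntry listing above violates. A small sketch of that invariant check, with an assumed entry shape for illustration only:

// Sketch of the Raft invariant violated above: terms must be non-decreasing along
// the log. The entry shape is assumed for illustration, not taken from Atomix.
final class TermOrderSketch {

  record IndexedEntry(long index, long term) {}

  static boolean termsAreOrdered(final Iterable<IndexedEntry> entries) {
    long lastTerm = -1;
    for (final IndexedEntry entry : entries) {
      if (entry.term() < lastTerm) {
        System.out.printf(
            "Term went backwards at index %d: %d -> %d%n", entry.index(), lastTerm, entry.term());
        return false;
      }
      lastTerm = entry.term();
    }
    return true;
  }
}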

\cc @deepthidevaki

@ChrisKujawa
Member

It turned out that the logs were the same because the disk hadn't been properly mounted on the VM. After we fixed that, we now have access to the different logs, but the log still seems to be inconsistent on all nodes.

Status

[zell target/ cluster: zeebe-long-running ns:zell-inconsistent-log]$ java -jar zdb-0.4.0-SNAPSHOT.jar log status -p /home/zell/playground/inconsistentLog/data-0/raft-partition/partitions/4/
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.esotericsoftware.kryo.util.UnsafeUtil (file:/home/zell/goPath/src/github.com/zelldon/zdb/zdb/target/zdb-0.4.0-SNAPSHOT.jar) to constructor java.nio.DirectByteBuffer(long,int,java.lang.Object)
WARNING: Please consider reporting this to the maintainers of com.esotericsoftware.kryo.util.UnsafeUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Scan log...
Log scanned in 122 ms
Meta: 
	Last voted for: 0
	Persisted term: 3028
Configuration: Configuration{index=0, time=2020-11-13 11:41:30,995, members=[DefaultRaftMember{id=1, type=ACTIVE, updated=2020-11-13T10:41:30.995318Z}, DefaultRaftMember{id=0, type=ACTIVE, updated=2020-11-13T10:41:30.995318Z}, DefaultRaftMember{id=2, type=ACTIVE, updated=2020-11-13T10:41:30.995318Z}]}

Scanned entries: 7489
Maximum entry size: 11848
Minimum entry size: 14
Avg entry size: 773.4973961810656
LowestRecordPosition: 204608746
HighestRecordPosition: 204621183
HighestIndex: 123186388
LowestIndex: 123178900
InitialEntries: [InitializeEntry{term=2971, timestamp=2021-02-10 03:15:59,934}, InitializeEntry{term=2973, timestamp=2021-02-10 03:18:56,289}, InitializeEntry{term=2975, timestamp=2021-02-10 03:19:09,022}, InitializeEntry{term=2977, timestamp=2021-02-10 03:22:09,518}, InitializeEntry{term=2979, timestamp=2021-02-10 03:22:22,408}, InitializeEntry{term=2981, timestamp=2021-02-10 03:24:38,394}, InitializeEntry{term=2983, timestamp=2021-02-10 03:27:30,906}, InitializeEntry{term=2984, timestamp=2021-02-10 03:31:03,999}, InitializeEntry{term=2972, timestamp=2021-02-10 03:18:51,044}, InitializeEntry{term=2974, timestamp=2021-02-10 03:19:01,392}, InitializeEntry{term=2976, timestamp=2021-02-10 03:22:03,033}, InitializeEntry{term=2978, timestamp=2021-02-10 03:22:14,059}, InitializeEntry{term=2980, timestamp=2021-02-10 03:24:20,144}, InitializeEntry{term=2982, timestamp=2021-02-10 03:26:48,999}, InitializeEntry{term=2985, timestamp=2021-02-10 03:38:03,701}, InitializeEntry{term=2987, timestamp=2021-02-10 03:40:38,517}, InitializeEntry{term=2989, timestamp=2021-02-10 03:42:07,966}, InitializeEntry{term=2991, timestamp=2021-02-10 03:44:47,562}, InitializeEntry{term=2993, timestamp=2021-02-10 03:46:54,454}, InitializeEntry{term=2995, timestamp=2021-02-10 03:49:20,204}, InitializeEntry{term=2997, timestamp=2021-02-10 03:52:26,666}, InitializeEntry{term=2999, timestamp=2021-02-10 03:55:19,408}, InitializeEntry{term=3001, timestamp=2021-02-10 03:57:26,688}, InitializeEntry{term=3003, timestamp=2021-02-10 03:58:48,385}, InitializeEntry{term=3005, timestamp=2021-02-10 04:01:16,230}, InitializeEntry{term=3007, timestamp=2021-02-10 04:03:07,766}, InitializeEntry{term=3009, timestamp=2021-02-10 04:05:49,928}, InitializeEntry{term=3011, timestamp=2021-02-10 04:07:32,429}, InitializeEntry{term=3013, timestamp=2021-02-10 04:10:10,244}, InitializeEntry{term=3015, timestamp=2021-02-10 04:11:50,673}, InitializeEntry{term=3017, timestamp=2021-02-10 04:13:17,654}, InitializeEntry{term=3019, timestamp=2021-02-10 04:14:40,159}, InitializeEntry{term=3021, timestamp=2021-02-10 04:16:03,563}, InitializeEntry{term=3023, timestamp=2021-02-10 04:16:15,409}, InitializeEntry{term=3024, timestamp=2021-02-10 05:07:36,169}, InitializeEntry{term=3025, timestamp=2021-02-10 07:53:34,548}, InitializeEntry{term=3026, timestamp=2021-02-10 08:48:12,638}, InitializeEntry{term=3027, timestamp=2021-02-11 01:41:24,250}, InitializeEntry{term=3028, timestamp=2021-02-11 01:42:43,666}]

[zell target/ cluster: zeebe-long-running ns:zell-inconsistent-log]$ java -jar zdb-0.4.0-SNAPSHOT.jar log status -p /home/zell/playground/inconsistentLog/data-1/raft-partition/partitions/4/
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.esotericsoftware.kryo.util.UnsafeUtil (file:/home/zell/goPath/src/github.com/zelldon/zdb/zdb/target/zdb-0.4.0-SNAPSHOT.jar) to constructor java.nio.DirectByteBuffer(long,int,java.lang.Object)
WARNING: Please consider reporting this to the maintainers of com.esotericsoftware.kryo.util.UnsafeUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Scan log...
Log scanned in 1741 ms
Meta: 
	Last voted for: 0
	Persisted term: 3028
Configuration: Configuration{index=0, time=2020-11-13 11:41:28,489, members=[DefaultRaftMember{id=1, type=ACTIVE, updated=2020-11-13T10:41:28.489937Z}, DefaultRaftMember{id=0, type=ACTIVE, updated=2020-11-13T10:41:28.489937Z}, DefaultRaftMember{id=2, type=ACTIVE, updated=2020-11-13T10:41:28.489937Z}]}

Scanned entries: 315181
Maximum entry size: 28728
Minimum entry size: 14
Avg entry size: 982.4472541174754
LowestRecordPosition: 204118351
HighestRecordPosition: 204621183
HighestIndex: 123186388
LowestIndex: 122871208
InitialEntries: [InitializeEntry{term=2964, timestamp=2021-02-10 08:27:51,704}, InitializeEntry{term=2965, timestamp=2021-02-10 08:46:39,179}, InitializeEntry{term=2966, timestamp=2021-02-10 09:39:11,798}, InitializeEntry{term=2967, timestamp=2021-02-10 11:04:07,452}, InitializeEntry{term=2968, timestamp=2021-02-10 01:41:24,887}, InitializeEntry{term=2969, timestamp=2021-02-10 01:52:09,413}, InitializeEntry{term=2970, timestamp=2021-02-10 01:55:15,062}, InitializeEntry{term=2971, timestamp=2021-02-10 03:15:59,934}, InitializeEntry{term=2973, timestamp=2021-02-10 03:18:56,289}, InitializeEntry{term=2975, timestamp=2021-02-10 03:19:09,022}, InitializeEntry{term=2977, timestamp=2021-02-10 03:22:09,518}, InitializeEntry{term=2979, timestamp=2021-02-10 03:22:22,408}, InitializeEntry{term=2981, timestamp=2021-02-10 03:24:38,394}, InitializeEntry{term=2983, timestamp=2021-02-10 03:27:30,906}, InitializeEntry{term=2984, timestamp=2021-02-10 03:31:03,999}, InitializeEntry{term=2972, timestamp=2021-02-10 03:18:51,044}, InitializeEntry{term=2974, timestamp=2021-02-10 03:19:01,392}, InitializeEntry{term=2976, timestamp=2021-02-10 03:22:03,033}, InitializeEntry{term=2978, timestamp=2021-02-10 03:22:14,059}, InitializeEntry{term=2980, timestamp=2021-02-10 03:24:20,144}, InitializeEntry{term=2982, timestamp=2021-02-10 03:26:48,999}, InitializeEntry{term=2985, timestamp=2021-02-10 03:38:03,701}, InitializeEntry{term=2987, timestamp=2021-02-10 03:40:38,517}, InitializeEntry{term=2989, timestamp=2021-02-10 03:42:07,966}, InitializeEntry{term=2991, timestamp=2021-02-10 03:44:47,562}, InitializeEntry{term=2993, timestamp=2021-02-10 03:46:54,454}, InitializeEntry{term=2995, timestamp=2021-02-10 03:49:20,204}, InitializeEntry{term=2997, timestamp=2021-02-10 03:52:26,666}, InitializeEntry{term=2999, timestamp=2021-02-10 03:55:19,408}, InitializeEntry{term=3001, timestamp=2021-02-10 03:57:26,688}, InitializeEntry{term=3003, timestamp=2021-02-10 03:58:48,385}, InitializeEntry{term=3005, timestamp=2021-02-10 04:01:16,230}, InitializeEntry{term=3007, timestamp=2021-02-10 04:03:07,766}, InitializeEntry{term=3009, timestamp=2021-02-10 04:05:49,928}, InitializeEntry{term=3011, timestamp=2021-02-10 04:07:32,429}, InitializeEntry{term=3013, timestamp=2021-02-10 04:10:10,244}, InitializeEntry{term=3015, timestamp=2021-02-10 04:11:50,673}, InitializeEntry{term=3017, timestamp=2021-02-10 04:13:17,654}, InitializeEntry{term=3019, timestamp=2021-02-10 04:14:40,159}, InitializeEntry{term=3021, timestamp=2021-02-10 04:16:03,563}, InitializeEntry{term=3023, timestamp=2021-02-10 04:16:15,409}, InitializeEntry{term=3024, timestamp=2021-02-10 05:07:36,169}, InitializeEntry{term=3025, timestamp=2021-02-10 07:53:34,548}, InitializeEntry{term=3026, timestamp=2021-02-10 08:48:12,638}, InitializeEntry{term=3027, timestamp=2021-02-11 01:41:24,250}, InitializeEntry{term=3028, timestamp=2021-02-11 01:42:43,666}]

[zell target/ cluster: zeebe-long-running ns:zell-inconsistent-log]$ java -jar zdb-0.4.0-SNAPSHOT.jar log status -p /home/zell/playground/inconsistentLog/data-2/raft-partition/partitions/4/
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.esotericsoftware.kryo.util.UnsafeUtil (file:/home/zell/goPath/src/github.com/zelldon/zdb/zdb/target/zdb-0.4.0-SNAPSHOT.jar) to constructor java.nio.DirectByteBuffer(long,int,java.lang.Object)
WARNING: Please consider reporting this to the maintainers of com.esotericsoftware.kryo.util.UnsafeUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Scan log...
Log scanned in 2544 ms
Meta: 
	Last voted for: 0
	Persisted term: 3028
Configuration: Configuration{index=0, time=2020-11-13 11:41:31,465, members=[DefaultRaftMember{id=1, type=ACTIVE, updated=2020-11-13T10:41:31.465040Z}, DefaultRaftMember{id=0, type=ACTIVE, updated=2020-11-13T10:41:31.465040Z}, DefaultRaftMember{id=2, type=ACTIVE, updated=2020-11-13T10:41:31.465040Z}]}

Scanned entries: 569295
Maximum entry size: 28728
Minimum entry size: 14
Avg entry size: 836.6357898804662
LowestRecordPosition: 203712195
HighestRecordPosition: 204621183
HighestIndex: 123186388
LowestIndex: 122617094
InitialEntries: [InitializeEntry{term=2960, timestamp=2021-02-10 02:39:41,066}, InitializeEntry{term=2961, timestamp=2021-02-10 03:11:01,559}, InitializeEntry{term=2962, timestamp=2021-02-10 03:20:39,826}, InitializeEntry{term=2963, timestamp=2021-02-10 03:26:00,767}, InitializeEntry{term=2964, timestamp=2021-02-10 08:27:51,704}, InitializeEntry{term=2965, timestamp=2021-02-10 08:46:39,179}, InitializeEntry{term=2966, timestamp=2021-02-10 09:39:11,798}, InitializeEntry{term=2967, timestamp=2021-02-10 11:04:07,452}, InitializeEntry{term=2968, timestamp=2021-02-10 01:41:24,887}, InitializeEntry{term=2969, timestamp=2021-02-10 01:52:09,413}, InitializeEntry{term=2970, timestamp=2021-02-10 01:55:15,062}, InitializeEntry{term=2971, timestamp=2021-02-10 03:15:59,934}, InitializeEntry{term=2973, timestamp=2021-02-10 03:18:56,289}, InitializeEntry{term=2975, timestamp=2021-02-10 03:19:09,022}, InitializeEntry{term=2977, timestamp=2021-02-10 03:22:09,518}, InitializeEntry{term=2979, timestamp=2021-02-10 03:22:22,408}, InitializeEntry{term=2981, timestamp=2021-02-10 03:24:38,394}, InitializeEntry{term=2983, timestamp=2021-02-10 03:27:30,906}, InitializeEntry{term=2984, timestamp=2021-02-10 03:31:03,999}, InitializeEntry{term=2972, timestamp=2021-02-10 03:18:51,044}, InitializeEntry{term=2974, timestamp=2021-02-10 03:19:01,392}, InitializeEntry{term=2976, timestamp=2021-02-10 03:22:03,033}, InitializeEntry{term=2978, timestamp=2021-02-10 03:22:14,059}, InitializeEntry{term=2980, timestamp=2021-02-10 03:24:20,144}, InitializeEntry{term=2982, timestamp=2021-02-10 03:26:48,999}, InitializeEntry{term=2985, timestamp=2021-02-10 03:38:03,701}, InitializeEntry{term=2987, timestamp=2021-02-10 03:40:38,517}, InitializeEntry{term=2989, timestamp=2021-02-10 03:42:07,966}, InitializeEntry{term=2991, timestamp=2021-02-10 03:44:47,562}, InitializeEntry{term=2993, timestamp=2021-02-10 03:46:54,454}, InitializeEntry{term=2995, timestamp=2021-02-10 03:49:20,204}, InitializeEntry{term=2997, timestamp=2021-02-10 03:52:26,666}, InitializeEntry{term=2999, timestamp=2021-02-10 03:55:19,408}, InitializeEntry{term=3001, timestamp=2021-02-10 03:57:26,688}, InitializeEntry{term=3003, timestamp=2021-02-10 03:58:48,385}, InitializeEntry{term=3005, timestamp=2021-02-10 04:01:16,230}, InitializeEntry{term=3007, timestamp=2021-02-10 04:03:07,766}, InitializeEntry{term=3009, timestamp=2021-02-10 04:05:49,928}, InitializeEntry{term=3011, timestamp=2021-02-10 04:07:32,429}, InitializeEntry{term=3013, timestamp=2021-02-10 04:10:10,244}, InitializeEntry{term=3015, timestamp=2021-02-10 04:11:50,673}, InitializeEntry{term=3017, timestamp=2021-02-10 04:13:17,654}, InitializeEntry{term=3019, timestamp=2021-02-10 04:14:40,159}, InitializeEntry{term=3021, timestamp=2021-02-10 04:16:03,563}, InitializeEntry{term=3023, timestamp=2021-02-10 04:16:15,409}, InitializeEntry{term=3024, timestamp=2021-02-10 05:07:36,169}, InitializeEntry{term=3025, timestamp=2021-02-10 07:53:34,548}, InitializeEntry{term=3026, timestamp=2021-02-10 08:48:12,638}, InitializeEntry{term=3027, timestamp=2021-02-11 01:41:24,250}, InitializeEntry{term=3028, timestamp=2021-02-11 01:42:43,666}]

Inconsistency:

[zell target/ cluster: zeebe-long-running ns:zell-inconsistent-log]$ java -jar zdb-0.4.0-SNAPSHOT.jar log consistency -p /home/zell/playground/inconsistentLog/data-2/raft-partition/partitions/4/
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.esotericsoftware.kryo.util.UnsafeUtil (file:/home/zell/goPath/src/github.com/zelldon/zdb/zdb/target/zdb-0.4.0-SNAPSHOT.jar) to constructor java.nio.DirectByteBuffer(long,int,java.lang.Object)
WARNING: Please consider reporting this to the maintainers of com.esotericsoftware.kryo.util.UnsafeUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Opening log /home/zell/playground/inconsistentLog/data-2/raft-partition/partitions/4
Inconsistent log lastRecordPosition '204621051' is higher than next lowestRecordPosition '204620965' at index: '123186195'
LOG IS INCONSISTENT!
Scanned entries: 569295
Log checked in 1592 ms

[zell target/ cluster: zeebe-long-running ns:zell-inconsistent-log]$ java -jar zdb-0.4.0-SNAPSHOT.jar log consistency -p /home/zell/playground/inconsistentLog/data-1/raft-partition/partitions/4/
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.esotericsoftware.kryo.util.UnsafeUtil (file:/home/zell/goPath/src/github.com/zelldon/zdb/zdb/target/zdb-0.4.0-SNAPSHOT.jar) to constructor java.nio.DirectByteBuffer(long,int,java.lang.Object)
WARNING: Please consider reporting this to the maintainers of com.esotericsoftware.kryo.util.UnsafeUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Opening log /home/zell/playground/inconsistentLog/data-1/raft-partition/partitions/4
Inconsistent log lastRecordPosition '204621051' is higher than next lowestRecordPosition '204620965' at index: '123186195'
LOG IS INCONSISTENT!
Scanned entries: 315181
Log checked in 1129 ms

[zell target/ cluster: zeebe-long-running ns:zell-inconsistent-log]$ java -jar zdb-0.4.0-SNAPSHOT.jar log consistency -p /home/zell/playground/inconsistentLog/data-0/raft-partition/partitions/4/
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.esotericsoftware.kryo.util.UnsafeUtil (file:/home/zell/goPath/src/github.com/zelldon/zdb/zdb/target/zdb-0.4.0-SNAPSHOT.jar) to constructor java.nio.DirectByteBuffer(long,int,java.lang.Object)
WARNING: Please consider reporting this to the maintainers of com.esotericsoftware.kryo.util.UnsafeUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Opening log /home/zell/playground/inconsistentLog/data-0/raft-partition/partitions/4
Inconsistent log lastRecordPosition '204621051' is higher than next lowestRecordPosition '204620965' at index: '123186195'
LOG IS INCONSISTENT!
Scanned entries: 7489
Log checked in 64 ms
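
For reference, the consistency check zdb runs here is essentially a single pass over the log that verifies record positions never go backwards between consecutive entries. A minimal sketch of such a scan (the Entry type and method below are assumptions for illustration, not the actual zdb code):

// Minimal sketch of a position-monotonicity scan over an Atomix/Zeebe log.
final class LogConsistencyCheck {

  // Hypothetical view of an indexed log entry carrying a position range.
  record Entry(long index, long lowestPosition, long highestPosition) {}

  static void checkConsistency(final Iterable<Entry> entries) {
    long lastRecordPosition = -1;
    for (final Entry entry : entries) {
      // Record positions must strictly increase across entries; a lower lowestPosition
      // after a higher lastRecordPosition is exactly the inconsistency reported above.
      if (lastRecordPosition != -1 && entry.lowestPosition() <= lastRecordPosition) {
        throw new IllegalStateException(
            String.format(
                "Inconsistent log: lastRecordPosition '%d' is higher than next lowestRecordPosition '%d' at index '%d'",
                lastRecordPosition, entry.lowestPosition(), entry.index()));
      }
      lastRecordPosition = entry.highestPosition();
    }
  }
}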

@deepthidevaki
Contributor

The order of the initial entries doesn't make any sense. The timestamps do match the actual leader transitions, though.

@ChrisKujawa
Member

I enriched the output to also contain the index of the initial entries:

Broker 0:

[zell target/ cluster: zeebe-long-running ns:zell-inconsistent-log]$ java -jar zdb-0.4.0-SNAPSHOT.jar log  status -p /home/zell/playground/inconsistentLog/data-0/raft-partition/partitions/4/
Scan log...
Log scanned in 141 ms
Meta: 
	Last voted for: 0
	Persisted term: 3028
Configuration: Configuration{index=0, time=2020-11-13 11:41:30,995, members=[DefaultRaftMember{id=1, type=ACTIVE, updated=2020-11-13T10:41:30.995318Z}, DefaultRaftMember{id=0, type=ACTIVE, updated=2020-11-13T10:41:30.995318Z}, DefaultRaftMember{id=2, type=ACTIVE, updated=2020-11-13T10:41:30.995318Z}]}

Scanned entries: 7489
Maximum entry size: 11848
Minimum entry size: 14
Avg entry size: 773.4973961810656
LowestRecordPosition: 204608746
HighestRecordPosition: 204621183
HighestIndex: 123186388
LowestIndex: 123178900
InitialEntries: [Indexed{index=123186138, entry=InitializeEntry{term=2971, timestamp=2021-02-10 03:15:59,934}}, Indexed{index=123186139, entry=InitializeEntry{term=2973, timestamp=2021-02-10 03:18:56,289}}, Indexed{index=123186140, entry=InitializeEntry{term=2975, timestamp=2021-02-10 03:19:09,022}}, Indexed{index=123186141, entry=InitializeEntry{term=2977, timestamp=2021-02-10 03:22:09,518}}, Indexed{index=123186142, entry=InitializeEntry{term=2979, timestamp=2021-02-10 03:22:22,408}}, Indexed{index=123186143, entry=InitializeEntry{term=2981, timestamp=2021-02-10 03:24:38,394}}, Indexed{index=123186144, entry=InitializeEntry{term=2983, timestamp=2021-02-10 03:27:30,906}}, Indexed{index=123186145, entry=InitializeEntry{term=2984, timestamp=2021-02-10 03:31:03,999}}, Indexed{index=123186295, entry=InitializeEntry{term=2972, timestamp=2021-02-10 03:18:51,044}}, Indexed{index=123186297, entry=InitializeEntry{term=2974, timestamp=2021-02-10 03:19:01,392}}, Indexed{index=123186298, entry=InitializeEntry{term=2976, timestamp=2021-02-10 03:22:03,033}}, Indexed{index=123186299, entry=InitializeEntry{term=2978, timestamp=2021-02-10 03:22:14,059}}, Indexed{index=123186300, entry=InitializeEntry{term=2980, timestamp=2021-02-10 03:24:20,144}}, Indexed{index=123186301, entry=InitializeEntry{term=2982, timestamp=2021-02-10 03:26:48,999}}, Indexed{index=123186302, entry=InitializeEntry{term=2985, timestamp=2021-02-10 03:38:03,701}}, Indexed{index=123186303, entry=InitializeEntry{term=2987, timestamp=2021-02-10 03:40:38,517}}, Indexed{index=123186304, entry=InitializeEntry{term=2989, timestamp=2021-02-10 03:42:07,966}}, Indexed{index=123186305, entry=InitializeEntry{term=2991, timestamp=2021-02-10 03:44:47,562}}, Indexed{index=123186306, entry=InitializeEntry{term=2993, timestamp=2021-02-10 03:46:54,454}}, Indexed{index=123186307, entry=InitializeEntry{term=2995, timestamp=2021-02-10 03:49:20,204}}, Indexed{index=123186308, entry=InitializeEntry{term=2997, timestamp=2021-02-10 03:52:26,666}}, Indexed{index=123186309, entry=InitializeEntry{term=2999, timestamp=2021-02-10 03:55:19,408}}, Indexed{index=123186310, entry=InitializeEntry{term=3001, timestamp=2021-02-10 03:57:26,688}}, Indexed{index=123186311, entry=InitializeEntry{term=3003, timestamp=2021-02-10 03:58:48,385}}, Indexed{index=123186312, entry=InitializeEntry{term=3005, timestamp=2021-02-10 04:01:16,230}}, Indexed{index=123186313, entry=InitializeEntry{term=3007, timestamp=2021-02-10 04:03:07,766}}, Indexed{index=123186314, entry=InitializeEntry{term=3009, timestamp=2021-02-10 04:05:49,928}}, Indexed{index=123186315, entry=InitializeEntry{term=3011, timestamp=2021-02-10 04:07:32,429}}, Indexed{index=123186316, entry=InitializeEntry{term=3013, timestamp=2021-02-10 04:10:10,244}}, Indexed{index=123186317, entry=InitializeEntry{term=3015, timestamp=2021-02-10 04:11:50,673}}, Indexed{index=123186318, entry=InitializeEntry{term=3017, timestamp=2021-02-10 04:13:17,654}}, Indexed{index=123186319, entry=InitializeEntry{term=3019, timestamp=2021-02-10 04:14:40,159}}, Indexed{index=123186320, entry=InitializeEntry{term=3021, timestamp=2021-02-10 04:16:03,563}}, Indexed{index=123186321, entry=InitializeEntry{term=3023, timestamp=2021-02-10 04:16:15,409}}, Indexed{index=123186334, entry=InitializeEntry{term=3024, timestamp=2021-02-10 05:07:36,169}}, Indexed{index=123186346, entry=InitializeEntry{term=3025, timestamp=2021-02-10 07:53:34,548}}, Indexed{index=123186359, entry=InitializeEntry{term=3026, timestamp=2021-02-10 08:48:12,638}}, 
Indexed{index=123186371, entry=InitializeEntry{term=3027, timestamp=2021-02-11 01:41:24,250}}, Indexed{index=123186372, entry=InitializeEntry{term=3028, timestamp=2021-02-11 01:42:43,666}}]

Broker-1

[zell target/ cluster: zeebe-long-running ns:zell-inconsistent-log]$ java -jar zdb-0.4.0-SNAPSHOT.jar log  status -p /home/zell/playground/inconsistentLog/data-1/raft-partition/partitions/4/
Scan log...
Log scanned in 2117 ms
Meta: 
	Last voted for: 0
	Persisted term: 3028
Configuration: Configuration{index=0, time=2020-11-13 11:41:28,489, members=[DefaultRaftMember{id=1, type=ACTIVE, updated=2020-11-13T10:41:28.489937Z}, DefaultRaftMember{id=0, type=ACTIVE, updated=2020-11-13T10:41:28.489937Z}, DefaultRaftMember{id=2, type=ACTIVE, updated=2020-11-13T10:41:28.489937Z}]}

Scanned entries: 315181
Maximum entry size: 28728
Minimum entry size: 14
Avg entry size: 982.4472541174754
LowestRecordPosition: 204118351
HighestRecordPosition: 204621183
HighestIndex: 123186388
LowestIndex: 122871208
InitialEntries: [Indexed{index=122871216, entry=InitializeEntry{term=2964, timestamp=2021-02-10 08:27:51,704}}, Indexed{index=122877862, entry=InitializeEntry{term=2965, timestamp=2021-02-10 08:46:39,179}}, Indexed{index=122898043, entry=InitializeEntry{term=2966, timestamp=2021-02-10 09:39:11,798}}, Indexed{index=122939283, entry=InitializeEntry{term=2967, timestamp=2021-02-10 11:04:07,452}}, Indexed{index=123065749, entry=InitializeEntry{term=2968, timestamp=2021-02-10 01:41:24,887}}, Indexed{index=123077236, entry=InitializeEntry{term=2969, timestamp=2021-02-10 01:52:09,413}}, Indexed{index=123077239, entry=InitializeEntry{term=2970, timestamp=2021-02-10 01:55:15,062}}, Indexed{index=123186138, entry=InitializeEntry{term=2971, timestamp=2021-02-10 03:15:59,934}}, Indexed{index=123186139, entry=InitializeEntry{term=2973, timestamp=2021-02-10 03:18:56,289}}, Indexed{index=123186140, entry=InitializeEntry{term=2975, timestamp=2021-02-10 03:19:09,022}}, Indexed{index=123186141, entry=InitializeEntry{term=2977, timestamp=2021-02-10 03:22:09,518}}, Indexed{index=123186142, entry=InitializeEntry{term=2979, timestamp=2021-02-10 03:22:22,408}}, Indexed{index=123186143, entry=InitializeEntry{term=2981, timestamp=2021-02-10 03:24:38,394}}, Indexed{index=123186144, entry=InitializeEntry{term=2983, timestamp=2021-02-10 03:27:30,906}}, Indexed{index=123186145, entry=InitializeEntry{term=2984, timestamp=2021-02-10 03:31:03,999}}, Indexed{index=123186295, entry=InitializeEntry{term=2972, timestamp=2021-02-10 03:18:51,044}}, Indexed{index=123186297, entry=InitializeEntry{term=2974, timestamp=2021-02-10 03:19:01,392}}, Indexed{index=123186298, entry=InitializeEntry{term=2976, timestamp=2021-02-10 03:22:03,033}}, Indexed{index=123186299, entry=InitializeEntry{term=2978, timestamp=2021-02-10 03:22:14,059}}, Indexed{index=123186300, entry=InitializeEntry{term=2980, timestamp=2021-02-10 03:24:20,144}}, Indexed{index=123186301, entry=InitializeEntry{term=2982, timestamp=2021-02-10 03:26:48,999}}, Indexed{index=123186302, entry=InitializeEntry{term=2985, timestamp=2021-02-10 03:38:03,701}}, Indexed{index=123186303, entry=InitializeEntry{term=2987, timestamp=2021-02-10 03:40:38,517}}, Indexed{index=123186304, entry=InitializeEntry{term=2989, timestamp=2021-02-10 03:42:07,966}}, Indexed{index=123186305, entry=InitializeEntry{term=2991, timestamp=2021-02-10 03:44:47,562}}, Indexed{index=123186306, entry=InitializeEntry{term=2993, timestamp=2021-02-10 03:46:54,454}}, Indexed{index=123186307, entry=InitializeEntry{term=2995, timestamp=2021-02-10 03:49:20,204}}, Indexed{index=123186308, entry=InitializeEntry{term=2997, timestamp=2021-02-10 03:52:26,666}}, Indexed{index=123186309, entry=InitializeEntry{term=2999, timestamp=2021-02-10 03:55:19,408}}, Indexed{index=123186310, entry=InitializeEntry{term=3001, timestamp=2021-02-10 03:57:26,688}}, Indexed{index=123186311, entry=InitializeEntry{term=3003, timestamp=2021-02-10 03:58:48,385}}, Indexed{index=123186312, entry=InitializeEntry{term=3005, timestamp=2021-02-10 04:01:16,230}}, Indexed{index=123186313, entry=InitializeEntry{term=3007, timestamp=2021-02-10 04:03:07,766}}, Indexed{index=123186314, entry=InitializeEntry{term=3009, timestamp=2021-02-10 04:05:49,928}}, Indexed{index=123186315, entry=InitializeEntry{term=3011, timestamp=2021-02-10 04:07:32,429}}, Indexed{index=123186316, entry=InitializeEntry{term=3013, timestamp=2021-02-10 04:10:10,244}}, Indexed{index=123186317, entry=InitializeEntry{term=3015, timestamp=2021-02-10 04:11:50,673}}, 
Indexed{index=123186318, entry=InitializeEntry{term=3017, timestamp=2021-02-10 04:13:17,654}}, Indexed{index=123186319, entry=InitializeEntry{term=3019, timestamp=2021-02-10 04:14:40,159}}, Indexed{index=123186320, entry=InitializeEntry{term=3021, timestamp=2021-02-10 04:16:03,563}}, Indexed{index=123186321, entry=InitializeEntry{term=3023, timestamp=2021-02-10 04:16:15,409}}, Indexed{index=123186334, entry=InitializeEntry{term=3024, timestamp=2021-02-10 05:07:36,169}}, Indexed{index=123186346, entry=InitializeEntry{term=3025, timestamp=2021-02-10 07:53:34,548}}, Indexed{index=123186359, entry=InitializeEntry{term=3026, timestamp=2021-02-10 08:48:12,638}}, Indexed{index=123186371, entry=InitializeEntry{term=3027, timestamp=2021-02-11 01:41:24,250}}, Indexed{index=123186372, entry=InitializeEntry{term=3028, timestamp=2021-02-11 01:42:43,666}}]

Broker-2

[zell target/ cluster: zeebe-long-running ns:zell-inconsistent-log]$ java -jar zdb-0.4.0-SNAPSHOT.jar log  status -p /home/zell/playground/inconsistentLog/data-2/raft-partition/partitions/4/
Scan log...
Log scanned in 4932 ms
Meta: 
	Last voted for: 0
	Persisted term: 3028
Configuration: Configuration{index=0, time=2020-11-13 11:41:31,465, members=[DefaultRaftMember{id=1, type=ACTIVE, updated=2020-11-13T10:41:31.465040Z}, DefaultRaftMember{id=0, type=ACTIVE, updated=2020-11-13T10:41:31.465040Z}, DefaultRaftMember{id=2, type=ACTIVE, updated=2020-11-13T10:41:31.465040Z}]}

Scanned entries: 569295
Maximum entry size: 28728
Minimum entry size: 14
Avg entry size: 836.6357898804662
LowestRecordPosition: 203712195
HighestRecordPosition: 204621183
HighestIndex: 123186388
LowestIndex: 122617094
InitialEntries: [Indexed{index=122638057, entry=InitializeEntry{term=2960, timestamp=2021-02-10 02:39:41,066}}, Indexed{index=122675480, entry=InitializeEntry{term=2961, timestamp=2021-02-10 03:11:01,559}}, Indexed{index=122685185, entry=InitializeEntry{term=2962, timestamp=2021-02-10 03:20:39,826}}, Indexed{index=122685186, entry=InitializeEntry{term=2963, timestamp=2021-02-10 03:26:00,767}}, Indexed{index=122871216, entry=InitializeEntry{term=2964, timestamp=2021-02-10 08:27:51,704}}, Indexed{index=122877862, entry=InitializeEntry{term=2965, timestamp=2021-02-10 08:46:39,179}}, Indexed{index=122898043, entry=InitializeEntry{term=2966, timestamp=2021-02-10 09:39:11,798}}, Indexed{index=122939283, entry=InitializeEntry{term=2967, timestamp=2021-02-10 11:04:07,452}}, Indexed{index=123065749, entry=InitializeEntry{term=2968, timestamp=2021-02-10 01:41:24,887}}, Indexed{index=123077236, entry=InitializeEntry{term=2969, timestamp=2021-02-10 01:52:09,413}}, Indexed{index=123077239, entry=InitializeEntry{term=2970, timestamp=2021-02-10 01:55:15,062}}, Indexed{index=123186138, entry=InitializeEntry{term=2971, timestamp=2021-02-10 03:15:59,934}}, Indexed{index=123186139, entry=InitializeEntry{term=2973, timestamp=2021-02-10 03:18:56,289}}, Indexed{index=123186140, entry=InitializeEntry{term=2975, timestamp=2021-02-10 03:19:09,022}}, Indexed{index=123186141, entry=InitializeEntry{term=2977, timestamp=2021-02-10 03:22:09,518}}, Indexed{index=123186142, entry=InitializeEntry{term=2979, timestamp=2021-02-10 03:22:22,408}}, Indexed{index=123186143, entry=InitializeEntry{term=2981, timestamp=2021-02-10 03:24:38,394}}, Indexed{index=123186144, entry=InitializeEntry{term=2983, timestamp=2021-02-10 03:27:30,906}}, Indexed{index=123186145, entry=InitializeEntry{term=2984, timestamp=2021-02-10 03:31:03,999}}, Indexed{index=123186295, entry=InitializeEntry{term=2972, timestamp=2021-02-10 03:18:51,044}}, Indexed{index=123186297, entry=InitializeEntry{term=2974, timestamp=2021-02-10 03:19:01,392}}, Indexed{index=123186298, entry=InitializeEntry{term=2976, timestamp=2021-02-10 03:22:03,033}}, Indexed{index=123186299, entry=InitializeEntry{term=2978, timestamp=2021-02-10 03:22:14,059}}, Indexed{index=123186300, entry=InitializeEntry{term=2980, timestamp=2021-02-10 03:24:20,144}}, Indexed{index=123186301, entry=InitializeEntry{term=2982, timestamp=2021-02-10 03:26:48,999}}, Indexed{index=123186302, entry=InitializeEntry{term=2985, timestamp=2021-02-10 03:38:03,701}}, Indexed{index=123186303, entry=InitializeEntry{term=2987, timestamp=2021-02-10 03:40:38,517}}, Indexed{index=123186304, entry=InitializeEntry{term=2989, timestamp=2021-02-10 03:42:07,966}}, Indexed{index=123186305, entry=InitializeEntry{term=2991, timestamp=2021-02-10 03:44:47,562}}, Indexed{index=123186306, entry=InitializeEntry{term=2993, timestamp=2021-02-10 03:46:54,454}}, Indexed{index=123186307, entry=InitializeEntry{term=2995, timestamp=2021-02-10 03:49:20,204}}, Indexed{index=123186308, entry=InitializeEntry{term=2997, timestamp=2021-02-10 03:52:26,666}}, Indexed{index=123186309, entry=InitializeEntry{term=2999, timestamp=2021-02-10 03:55:19,408}}, Indexed{index=123186310, entry=InitializeEntry{term=3001, timestamp=2021-02-10 03:57:26,688}}, Indexed{index=123186311, entry=InitializeEntry{term=3003, timestamp=2021-02-10 03:58:48,385}}, Indexed{index=123186312, entry=InitializeEntry{term=3005, timestamp=2021-02-10 04:01:16,230}}, Indexed{index=123186313, entry=InitializeEntry{term=3007, timestamp=2021-02-10 04:03:07,766}}, 
Indexed{index=123186314, entry=InitializeEntry{term=3009, timestamp=2021-02-10 04:05:49,928}}, Indexed{index=123186315, entry=InitializeEntry{term=3011, timestamp=2021-02-10 04:07:32,429}}, Indexed{index=123186316, entry=InitializeEntry{term=3013, timestamp=2021-02-10 04:10:10,244}}, Indexed{index=123186317, entry=InitializeEntry{term=3015, timestamp=2021-02-10 04:11:50,673}}, Indexed{index=123186318, entry=InitializeEntry{term=3017, timestamp=2021-02-10 04:13:17,654}}, Indexed{index=123186319, entry=InitializeEntry{term=3019, timestamp=2021-02-10 04:14:40,159}}, Indexed{index=123186320, entry=InitializeEntry{term=3021, timestamp=2021-02-10 04:16:03,563}}, Indexed{index=123186321, entry=InitializeEntry{term=3023, timestamp=2021-02-10 04:16:15,409}}, Indexed{index=123186334, entry=InitializeEntry{term=3024, timestamp=2021-02-10 05:07:36,169}}, Indexed{index=123186346, entry=InitializeEntry{term=3025, timestamp=2021-02-10 07:53:34,548}}, Indexed{index=123186359, entry=InitializeEntry{term=3026, timestamp=2021-02-10 08:48:12,638}}, Indexed{index=123186371, entry=InitializeEntry{term=3027, timestamp=2021-02-11 01:41:24,250}}, Indexed{index=123186372, entry=InitializeEntry{term=3028, timestamp=2021-02-11 01:42:43,666}}]

@ChrisKujawa
Member

I have set up a benchmark on stable nodes with network partitioning and packet loss as described above, plus a large state (no workers to complete the jobs). It used version 0.26.0, ran over the weekend, and died after it reached a 14 GiB snapshot size.

(Screenshots attached: general, raft, snapshot)

Still, the log seems to be consistent, so I was not able to reproduce it. What we need here is more information about what actually happened in Atomix, i.e. more logs (Stackdriver), but unfortunately the Atomix log level is too high by default (warn). I will open an issue on the controller to lower the log level.

root@zell-inconsistent-log-zeebe-0:/usr/local/zeebe# java -jar zdb-0.4.0-SNAPSHOT.jar log consistency -p data/raft-partition/partitions/1/
Picked up JAVA_TOOL_OPTIONS: -XX:MaxRAMPercentage=25.0 -XX:+ExitOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/usr/local/zeebe/data -XX:ErrorFile=/usr/local/zeebe/data/zeebe_error%p.log -Xlog:gc*:file=/usr/local/zeebe/data/gc.log:time:filecount=7,filesize=8M
Opening log data/raft-partition/partitions/1
LOG IS CONSISTENT.
Scanned entries: 111213
Log checked in 4444 ms

root@zell-inconsistent-log-zeebe-0:/usr/local/zeebe# java -jar zdb-0.4.0-SNAPSHOT.jar log consistency -p data/raft-partition/partitions/2/
Picked up JAVA_TOOL_OPTIONS: -XX:MaxRAMPercentage=25.0 -XX:+ExitOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/usr/local/zeebe/data -XX:ErrorFile=/usr/local/zeebe/data/zeebe_error%p.log -Xlog:gc*:file=/usr/local/zeebe/data/gc.log:time:filecount=7,filesize=8M
Opening log data/raft-partition/partitions/2
LOG IS CONSISTENT.
Scanned entries: 810367
Log checked in 25126 ms

root@zell-inconsistent-log-zeebe-0:/usr/local/zeebe# java -jar zdb-0.4.0-SNAPSHOT.jar log consistency -p data/raft-partition/partitions/3/
Picked up JAVA_TOOL_OPTIONS: -XX:MaxRAMPercentage=25.0 -XX:+ExitOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/usr/local/zeebe/data -XX:ErrorFile=/usr/local/zeebe/data/zeebe_error%p.log -Xlog:gc*:file=/usr/local/zeebe/data/gc.log:time:filecount=7,filesize=8M
Opening log data/raft-partition/partitions/3
LOG IS CONSISTENT.
Scanned entries: 75062
Log checked in 1666 ms

@ChrisKujawa
Member

ChrisKujawa commented Feb 15, 2021

If you @npepinpe @deepthidevaki @miguelpires have any ideas, just speak up; otherwise I will stop the investigation until it happens again. I created an issue on the controller to lower the Atomix log level: https://github.com/camunda-cloud/zeebe-controller-k8s/issues/415

@ChrisKujawa ChrisKujawa removed their assignment Feb 15, 2021
@ChrisKujawa
Member

ChrisKujawa commented Feb 15, 2021

I checked the logs of node 1 from the benchmark, and it seems that one log is inconsistent 🚨

root@zell-inconsistent-log-zeebe-1:/usr/local/zeebe#  java -jar zdb-0.4.0-SNAPSHOT.jar log consistency -p data/raft-partition/partitions/3/
Picked up JAVA_TOOL_OPTIONS: -XX:MaxRAMPercentage=25.0 -XX:+ExitOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/usr/local/zeebe/data -XX:ErrorFile=/usr/local/zeebe/data/zeebe_error%p.log -Xlog:gc*:file=/usr/local/zeebe/data/gc.log:time:filecount=7,filesize=8M
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.esotericsoftware.kryo.util.UnsafeUtil (file:/usr/local/zeebe/zdb-0.4.0-SNAPSHOT.jar) to constructor java.nio.DirectByteBuffer(long,int,java.lang.Object)
WARNING: Please consider reporting this to the maintainers of com.esotericsoftware.kryo.util.UnsafeUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Opening log data/raft-partition/partitions/3
Inconsistent log lastRecordPosition '14849134' is higher than next lowestRecordPosition '0' at index: '10384145'
LOG IS INCONSISTENT!
Scanned entries: 222576
Log checked in 4849 ms

@ChrisKujawa
Member

ChrisKujawa commented Feb 15, 2021

I investigated a bit with @deepthidevaki; it looks like a different error. It seems that this entry is zeroed out but is still readable. We checked all nodes: Broker 0 seems to be consistent, but on Brokers 1 and 2 the partition is inconsistent, at different indexes. It looks like it only affects the last entry.

Broker-1

Opening log data/raft-partition/partitions/3
Inconsistent log lastRecordPosition '14849134' is higher than next lowestRecordPosition '0' at index: '10384145'
LOG IS INCONSISTENT!
Scanned entries: 222576
Log checked in 4849 ms

root@zell-inconsistent-log-zeebe-1:/usr/local/zeebe#  java -jar zdb-0.4.0-SNAPSHOT.jar log status -p data/raft-partition/partitions/3/
Picked up JAVA_TOOL_OPTIONS: -XX:MaxRAMPercentage=25.0 -XX:+ExitOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/usr/local/zeebe/data -XX:ErrorFile=/usr/local/zeebe/data/zeebe_error%p.log -Xlog:gc*:file=/usr/local/zeebe/data/gc.log:time:filecount=7,filesize=8M
Scan log...
Log scanned in 14962 ms
Meta: 
	Last voted for: 0
	Persisted term: 971
Configuration: Configuration{index=0, time=2021-02-12 01:12:45,394, members=[DefaultRaftMember{id=2, type=ACTIVE, updated=2021-02-12T13:12:45.394912Z}, DefaultRaftMember{id=1, type=ACTIVE, updated=2021-02-12T13:12:45.394912Z}, DefaultRaftMember{id=0, type=ACTIVE, updated=2021-02-12T13:12:45.394912Z}]}

Scanned entries: 222576
Maximum entry size: 154294
Minimum entry size: 14
Avg entry size: 3957.3113363525267
LowestRecordPosition: 0
HighestRecordPosition: 14849134
HighestIndex: 10384145
LowestIndex: 10161570
InitialEntries: [Indexed{index=10373257, entry=InitializeEntry{term=969, timestamp=2021-02-13 09:28:49,623}}, Indexed{index=10373258, entry=InitializeEntry{term=971, timestamp=2021-02-13 10:05:44,213}}, Indexed{index=10161745, entry=InitializeEntry{term=965, timestamp=2021-02-13 08:09:57,175}}, Indexed{index=10161746, entry=InitializeEntry{term=966, timestamp=2021-02-13 08:11:22,172}}, Indexed{index=10161747, entry=InitializeEntry{term=967, timestamp=2021-02-13 08:24:14,405}}, Indexed{index=10161748, entry=InitializeEntry{term=968, timestamp=2021-02-13 09:14:21,688}}]

Broker-2

Opening log data/raft-partition/partitions/3
Inconsistent log lastRecordPosition '17577030' is higher than next lowestRecordPosition '0' at index: '12303537'
LOG IS INCONSISTENT!
Scanned entries: 2006169
Log checked in 56234 ms

root@zell-inconsistent-log-zeebe-2:/usr/local/zeebe# java -jar zdb-0.4.0-SNAPSHOT.jar log status -p data/raft-partition/partitions/3/
Picked up JAVA_TOOL_OPTIONS: -XX:MaxRAMPercentage=25.0 -XX:+ExitOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/usr/local/zeebe/data -XX:ErrorFile=/usr/local/zeebe/data/zeebe_error%p.log -Xlog:gc*:file=/usr/local/zeebe/data/gc.log:time:filecount=7,filesize=8M
Scan log...
Log scanned in 162565 ms
Meta: 
	Last voted for: null
	Persisted term: 971
Configuration: Configuration{index=0, time=2021-02-12 01:12:45,549, members=[DefaultRaftMember{id=2, type=ACTIVE, updated=2021-02-12T13:12:45.549928Z}, DefaultRaftMember{id=1, type=ACTIVE, updated=2021-02-12T13:12:45.549928Z}, DefaultRaftMember{id=0, type=ACTIVE, updated=2021-02-12T13:12:45.549928Z}]}

Scanned entries: 2006169
Maximum entry size: 154294
Minimum entry size: 14
Avg entry size: 3882.256762017557
LowestRecordPosition: 0
HighestRecordPosition: 17577030
HighestIndex: 12303537
LowestIndex: 10297369
InitialEntries: [Indexed{index=10373257, entry=InitializeEntry{term=969, timestamp=2021-02-13 09:28:49,623}}, Indexed{index=10373258, entry=InitializeEntry{term=971, timestamp=2021-02-13 10:05:44,213}}]

Broker-0

Searching log data/raft-partition/partitions/3
Found entry with index '12303537'
Indexed Entry: Indexed{index=12303537, entry=ZeebeEntry{term=971, timestamp=1613257070438, lowestPosition=17577031, highestPosition=17577031}}
Searched log in 1347 ms

root@zell-inconsistent-log-zeebe-0:/usr/local/zeebe# java -jar zdb-0.4.0-SNAPSHOT.jar log status -p data/raft-partition/partitions/3/
Picked up JAVA_TOOL_OPTIONS: -XX:MaxRAMPercentage=25.0 -XX:+ExitOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/usr/local/zeebe/data -XX:ErrorFile=/usr/local/zeebe/data/zeebe_error%p.log -Xlog:gc*:file=/usr/local/zeebe/data/gc.log:time:filecount=7,filesize=8M
Scan log...
Log scanned in 1951 ms
Meta: 
	Last voted for: 0
	Persisted term: 971
Configuration: Configuration{index=0, time=2021-02-12 01:12:59,791, members=[DefaultRaftMember{id=2, type=ACTIVE, updated=2021-02-12T13:12:59.791917Z}, DefaultRaftMember{id=1, type=ACTIVE, updated=2021-02-12T13:12:59.791917Z}, DefaultRaftMember{id=0, type=ACTIVE, updated=2021-02-12T13:12:59.791917Z}]}

Scanned entries: 75062
Maximum entry size: 84174
Minimum entry size: 326
Avg entry size: 3853.7150089259544
LowestRecordPosition: 17470997
HighestRecordPosition: 17577452
HighestIndex: 12303838
LowestIndex: 12228777
InitialEntries: No initial entries in the log.

Checking the logs in Stackdriver, I found several errors, for example a SIGSEGV on closing the broker (@npepinpe mentioned it is known?).

I also found some errors from the log appender which show that there are problems with appending:

"java.lang.IllegalStateException: Unexpected position 14527160 was encountered after position 14527201 when appending positions <14527160, 14527175>.
	at io.atomix.raft.roles.LeaderRole.safeAppendEntry(LeaderRole.java:701) ~[atomix-cluster-0.26.0.jar:0.26.0]
	at io.atomix.raft.roles.LeaderRole.lambda$appendEntry$10(LeaderRole.java:679) ~[atomix-cluster-0.26.0.jar:0.26.0]
	at io.atomix.utils.concurrent.SingleThreadContext$WrappedRunnable.run(SingleThreadContext.java:188) [atomix-utils-0.26.0.jar:0.26.0]
	at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) [?:?]
	at java.util.concurrent.FutureTask.run(Unknown Source) [?:?]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
	at java.lang.Thread.run(Unknown Source) [?:?]

@ChrisKujawa
Member

Together with @deepthidevaki I checked the logs and the related partition data of the benchmark further, but we found no real cause for this. It seems that this happened after becoming leader and after some appends had succeeded, or at least we see no initial entry before or after. It might also be that the log was truncated because the entries hadn't been committed. We are not able to see which first position the dispatcher is initialized to, which would help here; that might be something I can add as a log statement.
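
A sketch of what such a log statement could look like (the logger and field names are assumptions, not the actual Dispatcher code):

// Sketch: emit the position the dispatcher is (re)initialized to on partition install,
// so later append/reprocessing errors can be correlated with it in Stackdriver.
LOG.debug(
    "Dispatcher of partition {} initialized with initial position {}",
    partitionId,
    initialPosition);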

In general it looks different from the error in the production system: first, the entry is prevented from being written to the log, and second, the inconsistency which zdb detects is only at the end of the log and looks like it has been zeroed out. Furthermore, only two nodes have this inconsistency, at different indexes, since the log lengths are different. It might be due to the several out-of-direct-memory and SIGSEGV errors which we can see in the log. It is not clear.

In the production system we can see that the inconsistency is really persisted in the log on all nodes, and multiple records with lower positions follow it. The position is not zero, it is just decreased. The inconsistent partition contains only one segment, so it can't be the case that segments are read in the wrong order by the reader.

There are several open questions:

  • Why did the positions decrease?
  • How could it happen that this was persisted?
  • Why wasn't it detected by the writer?
  • Why is the term much lower than the previous term?

@ChrisKujawa
Member

I checked the latest KW benchmarks and was wondering why KW-03 was failing. It seems it has the same issue:

java.lang.IllegalStateException: Expected that position '3371991128' of current event is higher then position '3372057653' of last event, but was not. Inconsistent log detected!
	at io.zeebe.engine.processing.streamprocessor.ReProcessingStateMachine.scanLog(ReProcessingStateMachine.java:180) ~[zeebe-workflow-engine-0.27.0-SNAPSHOT.jar:0.27.0-SNAPSHOT]
	at io.zeebe.engine.processing.streamprocessor.ReProcessingStateMachine.startRecover(ReProcessingStateMachine.java:150) ~[zeebe-workflow-engine-0.27.0-SNAPSHOT.jar:0.27.0-SNAPSHOT]
	at io.zeebe.engine.processing.streamprocessor.StreamProcessor.onActorStarted(StreamProcessor.java:119) ~[zeebe-workflow-engine-0.27.0-SNAPSHOT.jar:0.27.0-SNAPSHOT]
	at io.zeebe.util.sched.ActorJob.invoke(ActorJob.java:73) [zeebe-util-0.27.0-SNAPSHOT.jar:0.27.0-SNAPSHOT]
	at io.zeebe.util.sched.ActorJob.execute(ActorJob.java:39) [zeebe-util-0.27.0-SNAPSHOT.jar:0.27.0-SNAPSHOT]
	at io.zeebe.util.sched.ActorTask.execute(ActorTask.java:122) [zeebe-util-0.27.0-SNAPSHOT.jar:0.27.0-SNAPSHOT]
	at io.zeebe.util.sched.ActorThread.executeCurrentTask(ActorThread.java:94) [zeebe-util-0.27.0-SNAPSHOT.jar:0.27.0-SNAPSHOT]
	at io.zeebe.util.sched.ActorThread.doWork(ActorThread.java:78) [zeebe-util-0.27.0-SNAPSHOT.jar:0.27.0-SNAPSHOT]
	at io.zeebe.util.sched.ActorThread.run(ActorThread.java:191) [zeebe-util-0.27.0-SNAPSHOT.jar:0.27.0-SNAPSHOT]

(Screenshots attached: general, general1, snapshot, throughput, atomix)

It seems it is again related to a big state and multiple leader changes. 🤔

🔥 @deepthidevaki @npepinpe

@ChrisKujawa
Member

ChrisKujawa commented Feb 15, 2021

OK I think I have it.

TL;DR:

It seems that it can happen that the partition is not closed fast enough, which means we still have a concurrent log appender when we become leader again, and it writes an event to the stream. This is detected, but the event is still written to the stream.

Long version:

Remember, each step is closed separately and one by one, so we wait until everything is stopped. At the end of the closing we close the dispatcher, log appender, etc. The problem is: if something takes too long to close, the dispatcher/log appender is still open.

For example, in our failing benchmark we can see that the AsyncSnapshotDirector is being closed while the appender still tries to append blocks, which fails.

 D 2021-02-12T03:39:50.766004Z zell-kw-03-benchmark-zeebe-1 Closing Zeebe-Partition-2: AsyncSnapshotDirector  zell-kw-03-benchmark-zeebe-1
D 2021-02-12T03:39:50.766150Z zell-kw-03-benchmark-zeebe-1 Failed to append block with last event position 3371990537. This can happen during a leader change.  zell-kw-03-benchmark-zeebe-1
D 2021-02-12T03:39:50.766348Z zell-kw-03-benchmark-zeebe-1 Failed to append block with last event position 3371990538. This can happen during a leader change.  zell-kw-03-benchmark-zeebe-1
D 2021-02-12T03:39:50.766519Z zell-kw-03-benchmark-zeebe-1 Failed to append block with last event position 3371990539. This can happen during a leader change.  zell-kw-03-benchmark-zeebe-1
D 2021-02-12T03:39:50.766685Z zell-kw-03-benchmark-zeebe-1 Failed to append block with last event position 3371990540. This can happen during a leader change.  zell-kw-03-benchmark-zeebe-1
D 2021-02-12T03:39:50.766833Z zell-kw-03-benchmark-zeebe-1 Failed to append block with last event position 3371990541. This can happen during a leader change.  zell-kw-03-benchmark-zeebe-1
D 2021-02-12T03:39:50.766988Z zell-kw-03-benchmark-zeebe-1 Failed to append block with last event position 3371990542. This can happen during a leader change.  zell-kw-03-benchmark-zeebe-1
D 2021-02-12T03:39:50.767181Z zell-kw-03-benchmark-zeebe-1 Failed to append block with last event position 3371990543. This can happen during a leader change.  zell-kw-03-benchmark-zeebe-1

It seems that the appender has a lot of blocks to append, because this message is logged quite often.

We can also see that Broker-0 is able to reprocess on that partition, but our appender on Broker-1 is still open.

D 2021-02-12T03:40:20.808070Z zell-kw-03-benchmark-zeebe-1 Failed to append block with last event position 3371990767. This can happen during a leader change.  zell-kw-03-benchmark-zeebe-1
I 2021-02-12T03:40:48.184151Z zell-kw-03-benchmark-zeebe-0 Processor finished reprocessing at event position 3371990275  zell-kw-03-benchmark-zeebe-0
D 2021-02-12T03:40:48.186818Z zell-kw-03-benchmark-zeebe-0 Expected to have a rate limiter listener for request-2-4238496, but none found. (This can happen during fail over.)  zell-kw-03-benchmark-zeebe-0
D 2021-02-12T03:40:48.188312Z zell-kw-03-benchmark-zeebe-0 Expected to have a rate limiter listener for request-2-4238498, but none found. (This can happen during fail over.)  zell-kw-03-benchmark-zeebe-0
D 2021-02-12T03:40:48.189177Z zell-kw-03-benchmark-zeebe-0 Expected to have a rate limiter listener for request-2-4238500, but none found. (This can happen during fail over.)  zell-kw-03-benchmark-zeebe-0
D 2021-02-12T03:40:48.191104Z zell-kw-03-benchmark-zeebe-0 Expected to have a rate limiter listener for request-2-4238514, but none found. (This can happen during fail over.)  zell-kw-03-benchmark-zeebe-0
D 2021-02-12T03:40:48.191812Z zell-kw-03-benchmark-zeebe-0 Expected to have a rate limiter listener for request-2-4238515, but none found. (This can happen during fail over.)  zell-kw-03-benchmark-zeebe-0
D 2021-02-12T03:40:50.793630Z zell-kw-03-benchmark-zeebe-1 Failed to append block with last event position 3371990768. This can happen during a leader change.  zell-kw-03-benchmark-zeebe-1

At some point we see that the partition switches its role to candidate again on Broker-1; unfortunately, this is only half of the truth. Inside Raft we actually already switch to leader again, but we are not able to see that in the log. This means that our appender is now able to append to the log again.

D 2021-02-12T03:41:50.893750Z zell-kw-03-benchmark-zeebe-1 Failed to append block with last event position 3371991126. This can happen during a leader change.  zell-kw-03-benchmark-zeebe-1
D 2021-02-12T03:41:50.893889Z zell-kw-03-benchmark-zeebe-1 Failed to append block with last event position 3371991127. This can happen during a leader change.  zell-kw-03-benchmark-zeebe-1
D 2021-02-12T03:42:07.644634Z zell-kw-03-benchmark-zeebe-0 Purging snapshots older than FileBasedSnapshot{directory=/usr/local/zeebe/data/raft-partition/partitions/2/snapshots/2746487499-105-1613101326657-3372056567-3372056233, metadata=FileBasedSnapshotMetadata{index=2746487499, term=105, timestamp=2021-02-12 03:42:06,657, processedPosition=3372056567, exporterPosition=3372056233}}  zell-kw-03-benchmark-zeebe-0
D 2021-02-12T03:42:07.645652Z zell-kw-03-benchmark-zeebe-0 Deleting snapshot FileBasedSnapshot{directory=/usr/local/zeebe/data/raft-partition/partitions/2/snapshots/2746331099-103-1613100445378-3371853861-3371853518, metadata=FileBasedSnapshotMetadata{index=2746331099, term=103, timestamp=2021-02-12 03:27:25,378, processedPosition=3371853861, exporterPosition=3371853518}}  zell-kw-03-benchmark-zeebe-0
D 2021-02-12T03:42:07.843971Z zell-kw-03-benchmark-zeebe-0 Search for orphaned snapshots below oldest valid snapshot with index FileBasedSnapshotMetadata{index=2746487499, term=105, timestamp=2021-02-12 03:42:06,657, processedPosition=3372056567, exporterPosition=3372056233} in /usr/local/zeebe/data/raft-partition/partitions/2/pending  zell-kw-03-benchmark-zeebe-0
D 2021-02-12T03:42:07.844620Z zell-kw-03-benchmark-zeebe-0 Compacting Atomix log up to index 2746487499  zell-kw-03-benchmark-zeebe-0
D 2021-02-12T03:42:11.341948Z zell-kw-03-benchmark-zeebe-1 Partition role transitioning from FOLLOWER to CANDIDATE  zell-kw-03-benchmark-zeebe-1
E 2021-02-12T03:42:20.870169Z zell-kw-03-benchmark-zeebe-1 Actor Broker-1-LogAppender-2 failed in phase STARTED.  zell-kw-03-benchmark-zeebe-1

This is detected by our entry validation, which throws an exception:

java.lang.IllegalStateException: Unexpected position 3371991128 was encountered after position 3372057653 when appending positions <3371991128, 3371991128>.
	at io.atomix.raft.roles.LeaderRole.safeAppendEntry(LeaderRole.java:701) ~[atomix-cluster-0.27.0-SNAPSHOT.jar:0.27.0-SNAPSHOT]
	at io.atomix.raft.roles.LeaderRole.lambda$appendEntry$10(LeaderRole.java:679) ~[atomix-cluster-0.27.0-SNAPSHOT.jar:0.27.0-SNAPSHOT]
	at io.atomix.utils.concurrent.SingleThreadContext$WrappedRunnable.run(SingleThreadContext.java:188) ~[atomix-utils-0.27.0-SNAPSHOT.jar:0.27.0-SNAPSHOT]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
	at java.lang.Thread.run(Thread.java:834) ~[?:?]

The problem here is that it looks like this entry is still appended to the log, because during reprocessing we also find this position.

java.lang.IllegalStateException: Expected that position '3371991128' of current event is higher then position '3372057653' of last event, but was not. Inconsistent log detected!
	at io.zeebe.engine.processing.streamprocessor.ReProcessingStateMachine.scanLog(ReProcessingStateMachine.java:180) ~[zeebe-workflow-engine-0.27.0-SNAPSHOT.jar:0.27.0-SNAPSHOT]
	at io.zeebe.engine.processing.streamprocessor.ReProcessingStateMachine.startRecover(ReProcessingStateMachine.java:150) ~[zeebe-workflow-engine-0.27.0-SNAPSHOT.jar:0.27.0-SNAPSHOT]
	at io.zeebe.engine.processing.streamprocessor.StreamProcessor.onActorStarted(StreamProcessor.java:119) ~[zeebe-workflow-engine-0.27.0-SNAPSHOT.jar:0.27.0-SNAPSHOT]
	at io.zeebe.util.sched.ActorJob.invoke(ActorJob.java:73) [zeebe-util-0.27.0-SNAPSHOT.jar:0.27.0-SNAPSHOT]
	at io.zeebe.util.sched.ActorJob.execute(ActorJob.java:39) [zeebe-util-0.27.0-SNAPSHOT.jar:0.27.0-SNAPSHOT]
	at io.zeebe.util.sched.ActorTask.execute(ActorTask.java:122) [zeebe-util-0.27.0-SNAPSHOT.jar:0.27.0-SNAPSHOT]
	at io.zeebe.util.sched.ActorThread.executeCurrentTask(ActorThread.java:94) [zeebe-util-0.27.0-SNAPSHOT.jar:0.27.0-SNAPSHOT]
	at io.zeebe.util.sched.ActorThread.doWork(ActorThread.java:78) [zeebe-util-0.27.0-SNAPSHOT.jar:0.27.0-SNAPSHOT]
	at io.zeebe.util.sched.ActorThread.run(ActorThread.java:191) [zeebe-util-0.27.0-SNAPSHOT.jar:0.27.0-SNAPSHOT]

If we check the actual code we can see why this is the case.

In the ZeebeEntryValidator we check the positions https://github.com/zeebe-io/zeebe/blob/develop/logstreams/src/main/java/io/zeebe/logstreams/impl/log/ZeebeEntryValidator.java#L24-L29

      if (lastPosition != -1 && position != lastPosition + 1) {
        return ValidationResult.failure(
            String.format(
                "Unexpected position %d was encountered after position %d when appending positions <%d, %d>.",
                position, lastPosition, entry.lowestPosition(), entry.highestPosition()));
      }

In the LeaderRole we use this code to validate the entry BEFORE we append it.
https://github.com/zeebe-io/zeebe/blob/develop/atomix/cluster/src/main/java/io/atomix/raft/roles/LeaderRole.java#L567-L571

    final ValidationResult result = raft.getEntryValidator().validateEntry(lastZbEntry, entry);
    if (result.failed()) {
      appendListener.onWriteError(new IllegalStateException(result.getErrorMessage()));
      raft.transition(Role.FOLLOWER);
    }

    append(entry)

The issue here is that we are not returning, which means that after the role change to follower is done we still append the entry to the log.
If we now become leader again, which we do in our benchmark, then this entry gets committed and persisted in the log, and boom, we have a nice inconsistency.
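
Based on the snippet above, the missing piece is an early return so the invalid entry never reaches append(). A sketch of that shape (PR #6345 is titled "do not append invalid entries"; the exact patch may differ):

    final ValidationResult result = raft.getEntryValidator().validateEntry(lastZbEntry, entry);
    if (result.failed()) {
      appendListener.onWriteError(new IllegalStateException(result.getErrorMessage()));
      raft.transition(Role.FOLLOWER);
      // Returning here means the invalid entry is never handed to append() and
      // can never be committed after a later re-election.
      return;
    }

    append(entry);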

@ChrisKujawa
Member

I think it is quite likely that this is also how the log becomes inconsistent in the production environment.

We can also see these append failures in that log. We see the transition to candidate in the Zeebe partition, so it is likely that Raft had already become leader again.

Furthermore, it looks like we also replicate the entry if we successfully append it to the log, see here: https://github.com/zeebe-io/zeebe/blob/develop/atomix/cluster/src/main/java/io/atomix/raft/roles/LeaderRole.java#L589 This might cause further disturbance.

I assume this is also related to the truncation error we see later on this node, where we were not able to truncate the log: java.lang.IndexOutOfBoundsException: Cannot truncate committed index: 123186194

I will fix this and then we can see whether the issues are gone or not.

@deepthidevaki
Contributor

Good job @Zell 👍 😌

I think a part of the problem is also that we get the LeaderRole object every time we append. This means that we may be trying to append with a LeaderRole object at a newer term, while the LogStorageAppender is from a previous term.
https://github.com/zeebe-io/zeebe/blob/e76d9256080e4e3331cfdd786210780e9b00ea35/logstreams/src/main/java/io/zeebe/logstreams/storage/atomix/AtomixLogStorage.java#L49

If it were using the old LeaderRole object to append, it would not be able to append, because that role is already closed.
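
To make that idea concrete, a rough sketch only, with simplified, hypothetical interfaces (not the actual AtomixLogStorage or LeaderRole API): the appender captures the role reference it was created for once and refuses to append after that role is closed:

// Sketch: the appender keeps the role handle it was started with instead of
// resolving the current leader role on every append. Types are hypothetical.
final class TermBoundAppender {
  private final RoleHandle role; // captured once, when the appender is opened

  TermBoundAppender(final RoleHandle role) {
    this.role = role;
  }

  void append(final long lowestPosition, final long highestPosition, final byte[] data) {
    if (role.isClosed()) {
      // Our term is over; a newer LeaderRole must never receive this block.
      throw new IllegalStateException("Appender role is closed, rejecting append");
    }
    role.appendEntry(lowestPosition, highestPosition, data);
  }
}

interface RoleHandle {
  boolean isClosed();
  void appendEntry(long lowestPosition, long highestPosition, byte[] data);
}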

@ChrisKujawa
Member

Thanks @deepthidevaki, and thanks for your help investigating this 🙇‍♂️

I think you're right and we should solve this as well.
Could you please open a separate issue for this so we can solve this separately?

I created a PR to prevent appending invalid entries: #6345

@ghost ghost closed this as completed in a50f2dc Feb 16, 2021
ghost pushed a commit that referenced this issue Feb 16, 2021
6349: [Backport stable/0.26] fix(atomix): do not append invalid entries r=Zelldon a=github-actions[bot]

# Description
Backport of #6345 to `stable/0.26`.

closes #6316

Co-authored-by: Christopher Zell <[email protected]>
ghost pushed a commit that referenced this issue Feb 16, 2021
6348: [Backport stable/0.25] fix(atomix): do not append invalid entries r=Zelldon a=github-actions[bot]

# Description
Backport of #6345 to `stable/0.25`.

closes #6316

Co-authored-by: Christopher Zell <[email protected]>
This issue was closed.