
Implement draining logic for transactional flow #757

Merged

Conversation


@szymonm szymonm commented Mar 25, 2019

Pull Request Checklist

Fixes

Duplication in a transactional flow that occurs when either the stream is closed or partitions are revoked by Kafka.

Purpose

This PR introduces draining logic for TransactionalSource. Specifically, for each partition managed by the source, we keep track of the message offsets emitted by the source.
When closing the stage or revoking partitions, we make sure that all of those offsets have been committed back to Kafka by the producer.
To know that an offset was committed by the producer, we attach a CommittedMarker to each message produced by the stream. TransactionalProducerStage uses this marker to tell the source that an offset was committed.
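A rough sketch of the idea for readers of this thread; the names mirror the ones discussed in this PR (CommittedMarker, in-flight records) but the types are simplified and illustrative, not the exact code:

import org.apache.kafka.common.TopicPartition

// Sketch: the source records offsets it has emitted but that are not yet
// committed by the transactional producer.
trait CommittedMarker {
  // Called by the producer stage once the transaction containing these offsets is committed.
  def committed(offsets: Map[TopicPartition, Long]): Unit
}

final class InFlightRecords {
  private var inFlight = Map.empty[TopicPartition, Long]

  def emitted(tp: TopicPartition, offset: Long): Unit = synchronized {
    inFlight = inFlight.updated(tp, offset)
  }

  def committed(offsets: Map[TopicPartition, Long]): Unit = synchronized {
    // Forget partitions whose highest emitted offset has been committed.
    inFlight = inFlight.filterNot { case (tp, off) => offsets.get(tp).exists(_ >= off) }
  }

  def empty(): Boolean = synchronized(inFlight.isEmpty)
}

Draining then means waiting until empty() returns true (or a timeout fires) before the consumer is closed or partitions are released.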

Background Context

In a transactional stream processing partition P, we need to wait for all commits to be acknowledged by Kafka before closing the consumer.
If we don't, it is possible that another consumer is assigned P, which triggers fetching an offset that can be stale or overwritten later. This causes data duplication.

References

#758
#756

@szymonm szymonm changed the title from "Add test for draining transactional flow" to "Implement draining logic for transactional flow" on Apr 1, 2019

szymonm commented Apr 1, 2019

Some afterthoughts:
IIUC, we also need to ensure that the consumer is not closed before the producer finishes committing the transaction in the case of a stream failure.
IIUC, in this case the failure message will reach the source stage and stop the consumer. However, it will not reach TransactionalProducerStage if producer.commitTransaction is in progress (it's blocking). This can lead to race conditions where the producer commits after some other consumer is assigned the partition.

@2m 2m force-pushed the szymon-add-tests-for-draining-transactional-flow branch from c0e92db to 9c04dcf on April 4, 2019 11:15

2m commented Apr 4, 2019

Rebased on top of the latest master, and also fixed a bug where a wrong Committed message was being handled in the TransactionalSource stage.

@@ -151,9 +151,9 @@ private final class TransactionalProducerStageLogic[K, V, P](stage: Transactiona
   private def maybeCommitTransaction(beginNewTransaction: Boolean = true): Unit = {
     val awaitingConf = awaitingConfirmation.get
     batchOffsets match {
-      case batch: NonemptyTransactionBatch if awaitingConf == 0 =>
+      case batch: NonemptyTransactionBatch if awaitingConf == 0 && !hasBeenPulled(stage.in) =>
Contributor Author

Can you explain why we need this logic?

Contributor

With this PR, the transaction commit is split into two synchronous blocks in the Producer stage. That means that in between, a message could be sent to the stage and passed on to the Kafka producer even though the new transaction has not yet been started.

So this guard is needed to make sure that no message comes in between the committing of the current transaction and the beginning of the new one.

Contributor Author

Still didn't quite get it. I was hoping that suspending demand while a transaction is in progress would be enough.
Also, I'm wondering if we shouldn't just block waiting for the internal commit.
kafkaProducer.commitTransaction() is already blocking the whole stage waiting on a network call, and internalCommit is just waiting for a local memory change, so it shouldn't add much...
That would simplify the code a bit.

Contributor Author

Making internalCommit just send a message would also be possible. Seems like TransactionalProducerStage does not have to wait for the SourceActor to acknowledge getting the message, right?

Contributor

I was hoping that suspending demand while a transaction is in progress would be enough.

Calling suspendDemand does make sure that stage.in is not going to be pulled anymore from that point on, but it could already have been pulled before that.

Seems like TransactionalProducerStage does not have to wait for the SourceActor to acknowledge getting the message, right?

Yes, currently it looks like that's the case. I'll need to think about it a bit more.


szymonm commented Apr 4, 2019

Thanks, @2m

def waitForDraining(): Unit = {
  import akka.pattern.ask
  implicit val timeout = Timeout(5.seconds)
  Await.result(ask(stageActor.ref, Drain(stageActor.ref, Drained)), 10.seconds)

I'm not sure the behaviour in case of a timeout is correct.
IIUC a TimeoutException is thrown and it's caught in Kafka code by the ConsumerCoordinator, here:
here:

try {
            listener.onPartitionsRevoked(revoked);
        } catch (WakeupException | InterruptException e) {
            throw e;
        } catch (Exception e) {
            log.error("User provided listener {} failed on partition revocation", listener.getClass().getName(), e);
        }

So due to catch (Exception e) we won't fail the stream; we'll just continue as if nothing happened and the stream had been drained.

Therefore, if it takes too long to drain, we still allow messages to remain in the stream. In the meantime some other consumer might start processing data from the same partition, because we just released it.

@szymonm @2m Could you verify if my reasoning is correct?
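One way to avoid silently continuing is to catch the timeout and report failure to the caller, roughly in line with the fix 2m adds later in this thread. A sketch only: Drain, Drained, stageActor and commitTimeout are taken from the snippets in this thread, settings stands for the consumer settings, and the exact stage wiring differs.

import scala.concurrent.Await
import scala.util.control.NonFatal
import akka.pattern.ask
import akka.util.Timeout

def waitForDraining(): Boolean = {
  implicit val timeout: Timeout = Timeout(settings.commitTimeout)
  try {
    Await.result(ask(stageActor.ref, Drain(stageActor.ref, Drained)), timeout.duration)
    true
  } catch {
    case NonFatal(_) =>
      // Do not let the rebalance listener swallow the timeout: return false so the
      // caller can fail the stage instead of releasing the partitions as if drained.
      false
  }
}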

Contributor Author

You're right @jotek7. In this case we should fail the stream, starting from the producer. That is similar to what we should do generally in the failure case (see my comments).

Contributor

Yea, the stage needs to be stopped in that scenario. But I am still not sure if the stopping should start from the consumer: see #757 (comment)

case (_, Drain(ack, msg)) =>
  if (inFlightRecords.empty()) {
    log.debug("Source drained")
    ack ! msg
Contributor

This should also send an answer back to the sender, otherwise the ask in waitForDraining will not be completed.

Contributor Author

Good catch, fixed in the next commit.
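For illustration, one possible shape of that fix (a hypothetical standalone version; in the stage itself inFlightRecords and the message types already exist, and the actual commit may differ):

import akka.actor.ActorRef

final case class Drain(ack: ActorRef, msg: Any)

// Reply both to the requested target and to the asking sender,
// so the ask in waitForDraining completes.
def handleDrain(inFlightEmpty: Boolean)(envelope: (ActorRef, Any)): Unit = envelope match {
  case (sender, Drain(ack, msg)) if inFlightEmpty =>
    ack ! msg     // notify the requested target (e.g. the producer stage)
    sender ! msg  // complete the ask future of the caller
  case _ => ()
}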


2m commented Apr 9, 2019

IIUC, in this case the failure message will reach the source stage and stop the consumer. However, it will not reach TransactionalProducerStage if producer.commitTransaction is in progress (it's blocking). This can lead to race conditions where the producer commits after some other consumer is assigned the partition.

I am not sure if this is a problem. If the stream is being torn down, then after the blocking producer.commitTransaction is done, it will complete the stage with onCompletionFailure and will not commit the next transaction, which might have duplicates. Am I missing something here?


2m commented Apr 9, 2019

I added a commit where it fails the consumer stage if the draining timeout happens.

@@ -266,7 +266,7 @@ class TransactionsSpec extends SpecBase(kafkaPort = KafkaPorts.TransactionsSpec)
   "provide consistency when multiple transactional streams are being restarted" in {
     val sourcePartitions = 10
     val destinationPartitions = 4
-    val consumers = 3
+    val consumers = 1
Contributor Author

why did you need to change this here?

Contributor

I noticed that this test-case does not finish when running with 3 consumers. The stopping of the stage on drain timeout does not play nicely with how this test uses the RestartSource: the stream gets restarted all the time and the test never finishes. I also noticed that in Travis it did not finish even with 1 consumer. I need to improve that test-case.


szymonm commented Apr 10, 2019

I am not sure if this is a problem. If the stream is being torn down, then after the blocking producer.commitTransaction is done, it will complete the stage with onCompletionFailure and will not commit the next transaction, which might have duplicates. Am I missing something here?

There is still a chance that the consumer will be closed before the producer finishes producer.commitTransaction. This can cause the partition to be picked up by another consumer that will start from offsets that are not yet committed.


szymonm commented Apr 10, 2019

What are the requirements for binary compatibility? After turning the PartitionOffset case class into a normal class we have some warnings. Should we fix them all?


2m commented Apr 10, 2019

Should we fix them all?

Yes, otherwise we will not be able to include this in a patch release.


szymonm commented Apr 11, 2019

The last Travis run reports: The job exceeded the maximum time limit for jobs, and has been terminated.

Tests in TransactionsSpec take a lot of time:

[info] - must provide consistency when multiple transactional streams are being restarted (8 minutes, 18 seconds)

[info] - must drain stream on partitions rebalancing (2 minutes, 10 seconds)


szymonm commented Apr 11, 2019

Looks like the test provide consistency when multiple transactional streams are being restarted takes 8 minutes. We run it twice (once for each Scala version) and thus we increase the test time from ~30 minutes to over 50 minutes (the test for draining takes another 2 minutes per run).

Can we make the test case smaller, @2m ?


2m commented Apr 11, 2019

Yes, let's decrease the number of messages to 100k and see how long that takes.


def waitForDraining(): Boolean = {
  import akka.pattern.ask
  // Shall we use commitTimeout here?
Contributor

Yes, it should be at least the commit timeout. Possibly a bit larger as well. Can you refactor this from a hardcoded constant to a configuration value?

Contributor Author

After some thought I think using commitTimeout is just fine; I will use that.

log.debug(s"Draining partitions {}", inFlightRecords)
// TODO: This timeout should be somehow related to the committing interval.
// For instance having 10.millis makes no sense if we commit transaction every second
materializer.scheduleOnce(30.millis, new Runnable {
Contributor

Would be good to move this to configuration as well.
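For example, such an interval can be read like any other duration setting from the consumer config section. A sketch: it assumes the draining-check-interval key shown later in this PR is on the classpath via reference.conf, and the actual wiring goes through ConsumerSettings.

import java.util.concurrent.TimeUnit
import scala.concurrent.duration._
import com.typesafe.config.ConfigFactory

val consumerConfig = ConfigFactory.load().getConfig("akka.kafka.consumer")
// e.g. `draining-check-interval = 30ms` in reference.conf
val drainingCheckInterval: FiniteDuration =
  consumerConfig.getDuration("draining-check-interval", TimeUnit.MILLISECONDS).millis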


2m commented Apr 11, 2019

Two of the other tests in the TransactionalSpec started failing. Could be the new timing configuration not playing nicely with the test-cases.


szymonm commented Apr 12, 2019

These tests actually catch a valid issue now:
We keep track of offsets emitted by the source and now we are waiting for all emitted offsets. But we should not wait for an emitted offset if processing of that element caused an exception.


szymonm commented Apr 12, 2019

If the producer is closed after committing a transaction, we can safely close the consumer.


szymonm commented Apr 12, 2019

Looks like IntegrationSpec signal rebalance events to actor test is flaky.


szymonm commented Apr 12, 2019

TBH there is no reason why this test should work reliably.
It uses the ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest" config with no offset storing.

This means that after a rebalance, consumers start reading from the latest offset. If more elements are written to a topic-partition during the rebalancing, those elements are never read.


szymonm commented Apr 12, 2019

Can you run tests again?


2m commented Apr 15, 2019

We keep track of offsets emitted by the source and now we are waiting for all emitted offsets. But we should not wait for an emitted offset if processing of that element caused an exception.

Great catch. That was the reason I saw multiple "waiting for draining" messages in the logs when the stage was stopping because of a failure.

Looks like IntegrationSpec signal rebalance events to actor test is flaky.

I do not see it failing anymore, but if we see that again, let's create an issue and fix it in a separate PR.


szymonm commented Apr 15, 2019

I do not see it failing anymore, but if we see that again, let's create an issue and fix it in a separate PR.

👍


szymonm commented Apr 15, 2019

What are the next steps with the PR?


2m commented Apr 15, 2019

I am currently running the test with consumers = 3 and, after a little tweak where we let the copy flow restart a couple more times, it is passing. I am going to push that and if it looks green on Travis, this will be good to go in.

@2m 2m force-pushed the szymon-add-tests-for-draining-transactional-flow branch from 91fa31f to 488baca on April 15, 2019 13:45

2m commented Apr 16, 2019

With multiple consumers the test fails with missing messages.

@seglo seglo left a comment

I left some minor comments. I think the only missing piece might be some updates to the documentation to explain the draining configuration and its purpose during the commit process. Though maybe that work could be deferred until after the PRs for the synchronous rebalance listeners and partitioned transaction sources are merged, which will make distributed transactional workloads more robust.

I'm really happy to see this feature get used and hardened. Thanks a lot for all the hard work!

@@ -93,6 +93,10 @@ akka.kafka.consumer {
# This value is used instead of Kafka's default from `default.api.timeout.ms`
# which is 1 minute.
metadata-request-timeout = 5s

# Interval for cheking that transaction was completed before closing the consumer.
Contributor

I don't see any more documentation for the end user about this new functionality. I suggest updating the Transactions section in the docs.

Contributor

[Minor] Typo on "checking".

}(materializer.executionContext)
}

val onInternallCommitAckCb: AsyncCallback[Boolean] = {
Contributor

[Minor] Typo onInternalCommitAckCb

@@ -159,21 +190,35 @@ private final class TransactionalProducerStageLogic[K, V, P](stage: Transactiona
override def onCompletionFailure(ex: Throwable): Unit = {
log.debug("Aborting transaction due to stage failure")
abortTransaction()
batchOffsets.committingFailed()
super.onCompletionFailure(ex)
}

private def commitTransaction(batch: NonemptyTransactionBatch, beginNewTransaction: Boolean): Unit = {
val group = batch.group
log.debug("Committing transaction for consumer group '{}' with offsets: {}", group, batch.offsetMap())
@seglo seglo Apr 16, 2019

[Minor] Use the local variable offsetMap so we don't have to build it for each log line?

@ennru ennru left a comment

Great work, I left a few minor comments.

core/src/main/scala/akka/kafka/ConsumerSettings.scala (outdated, resolved)

# Interval for checking that transaction was completed before closing the consumer.
# Used in the transactional flow.
draining-check-interval = 30ms
Member

I wonder if the setting name should contain eos or something. There are so many settings now it is hard to see which belong together.

Contributor Author

tbh... eos is cryptic even for me...

Contributor Author

but we may have some distinct section for transactional processing?

core/src/main/scala/akka/kafka/ConsumerMessage.scala (outdated, resolved)

szymonm commented Apr 24, 2019

The test for the failure case is still failing, showing both duplication and data loss.

My intuition about what happens is as follows:

  1. Data loss
    Since we send messages and offsets in a transaction, the only way to lose data is to lose some records while they are in the stream and commit records with higher offsets.

  2. Duplication
    We close the consumer before the producer in a single stream. Thus another consumer starts consuming from the partition before the previous producer flushes the last offsets.


szymonm commented Apr 24, 2019

I'm also seeing some problems with the draining mechanism; specifically, it sometimes waits for a last message that is never drained.


szymonm commented Apr 24, 2019

Also, setting a higher backoff for restarts increases the probability of test success (from what I observed):

RestartSource
          .onFailuresWithBackoff(1.second, 5.seconds, 0.2)
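For context, the complete restart wrapper in the test would look roughly like this (transactionalCopyStream is a hypothetical factory standing in for the transactional source-to-sink flow under test):

import akka.NotUsed
import akka.stream.scaladsl.{RestartSource, Source}
import scala.concurrent.duration._

// Hypothetical factory creating the transactional copy stream under test.
def transactionalCopyStream(): Source[String, NotUsed] = ???

// Restart on failures only, with the higher backoff suggested above.
val restarting: Source[String, NotUsed] =
  RestartSource.onFailuresWithBackoff(
    minBackoff = 1.second,
    maxBackoff = 5.seconds,
    randomFactor = 0.2
  )(() => transactionalCopyStream())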


2m commented Apr 24, 2019

Since we send messages and offsets in a transaction, the only way to lose data is to lose some records while they are in the stream and commit records with higher offsets.

I only noticed lost messages when running that test-case with multiple consumers. Investigating that right now.

Duplication

I have not seen the test-case failing because of duplication for quite some time now. The duplication check runs first, and at least in Travis it has been passing for quite some time before the test-case fails on the missing-elements check.


2m commented Apr 26, 2019

Pushed two more commits that improve the failing test-case in a couple of ways. When it comes to duplicates, it could very well be that we get duplicated messages when a rebalance happens while consuming messages after transactional processing. For that I added offset tracking which will ignore duplicates that have the same offsets. We should only care about duplicates with different offsets, because those would come from the transactions.

Also, I moved the place where the transactional processing flow is terminated in the test stream, so the stream is always terminated with an error (from upstream) instead of a cancellation (from downstream). Let's see what Travis thinks about that when it comes to missing messages.
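Illustrative sketch of that distinction (a hypothetical test helper, not the PR's exact code): re-reads after a rebalance show up as the same partition and offset consumed twice and can be ignored, while the same value at two different offsets means a transaction actually wrote it twice.

final case class Consumed(partition: Int, offset: Long, value: String)

def transactionalDuplicates(consumed: Seq[Consumed]): Map[String, Seq[Consumed]] = {
  // Same (partition, offset) seen twice is a re-read after a rebalance; keep one copy.
  val withoutReReads = consumed.groupBy(c => (c.partition, c.offset)).map(_._2.head).toSeq
  // The same value at different offsets would be a duplicate written by a transaction.
  withoutReReads.groupBy(_.value).filter { case (_, occurrences) => occurrences.size > 1 }
}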


szymonm commented Apr 26, 2019

I'm looking into it too, and what worries me now is seeing java.lang.Error: Timeout while draining. My intuition is that with embedded Kafka and the timeout set to 15s we have plenty of time to drain.
Maybe the timeout indicates that we don't pull correctly while draining?


szymonm commented Apr 26, 2019

Also, when the timeout happens, we should make sure to close the producer before closing the consumer. This doesn't happen, right?


2m commented Apr 26, 2019

Also, when the timeout happens, we should make sure to close the producer before closing the consumer.

Not explicitly, but there is an akka.kafka.consumer.stop-timeout that delays stopping the ConsumerStage so late commits can happen when using a CommittableSource.

I see that we are overriding stopConsumerActor in TransactionalSource and not doing the delay anymore. We should probably bring the delay back so the producer gets closed first.
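For reference, the delay 2m mentions is the regular consumer stop-timeout and can be tuned in config (akka.kafka.consumer.stop-timeout) or programmatically; a minimal sketch, assuming a localhost Kafka and illustrative values:

import akka.actor.ActorSystem
import akka.kafka.ConsumerSettings
import org.apache.kafka.common.serialization.StringDeserializer
import scala.concurrent.duration._

implicit val system: ActorSystem = ActorSystem("example")

// stop-timeout delays stopping the consumer so that late commits
// (here: the producer's final transaction commit) can still complete.
val consumerSettings =
  ConsumerSettings(system, new StringDeserializer, new StringDeserializer)
    .withBootstrapServers("localhost:9092")
    .withGroupId("transactional-group")
    .withStopTimeout(30.seconds)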

Maybe the timeout indicates that we don't pull correctly while draining?

Yea, noticed those timeouts as well. But I also see Source drained. in the logs. Could it be that the reply after draining does not come back to complete the ask Future?


szymonm commented Apr 26, 2019

Yea, noticed those timeouts as well. But I also see Source drained. in the logs. Could it be that the reply after draining does not come back to complete the ask Future?

That was another idea I had!


szymonm commented Apr 26, 2019

After increasing the parallelism of sending messages to Kafka we don't have to wait for messages to be flushed. As a result we are not failing stream draining that often, which gives a very good chance of the test passing (I didn't see a single failure on my machine after changing that).

It also speeds up the test a lot!


2m commented Apr 29, 2019

Looks good. I also like the simplification in the Producer where the transaction is started sooner. That simplification means that these two commits are not needed anymore, because the transaction is committed and a new transaction is started in the same pass.

7be3450
9c04dcf


2m commented Apr 30, 2019

Pushed a revert of those two commits. I'll do a docs update and then this will be good to go.


szymonm commented Apr 30, 2019

Yeap, happy to see this simplified again!

@ennru ennru left a comment

Some nitpicks.

core/src/main/scala/akka/kafka/ConsumerMessage.scala (outdated, resolved)
docs/src/main/paradox/transactions.md (outdated, resolved)
docs/src/main/paradox/transactions.md (outdated, resolved)
@ennru ennru added this to the 1.0.2 milestone Apr 30, 2019
@2m 2m force-pushed the szymon-add-tests-for-draining-transactional-flow branch from 58046cb to 7203516 on May 2, 2019 05:28

2m commented May 2, 2019

Rebased on top of master and addressed the latest feedback.

* extract rebalancing handling and stopping consumer to separate methods
* introduce CommittedMarker and hook it up to TransactionalMessage
* add marking committed offsets logic to TransactionalProducerStage
* add draining mechanism to TransactionalSource
* add draining-check-interval to the reference config
* handle committing failure when draining the stream
* add test for transactions and multi messages
* get continuous blocks for missing messages
* distinguish duplicates that happen during counting from transactions
* update documentation
@2m 2m force-pushed the szymon-add-tests-for-draining-transactional-flow branch from 7203516 to 5101c54 on May 2, 2019 07:29
@ennru ennru merged commit 1b273e6 into akka:master May 2, 2019