Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[broker] Do not log stacktrace for 'Failed to flush mark-delete position' case #17432

Merged
merged 1 commit into from
Sep 14, 2022

Conversation

eolivelli
Copy link
Contributor

@eolivelli eolivelli commented Sep 2, 2022

Motivation

It is very common, under load, to see this stack trace, this is very annoying and it is scary for system administrator.

org.apache.bookkeeper.mledger.ManagedLedgerException: java.lang.IllegalArgumentException: Mark deleting an already mark-deleted position. Current mark-delete: 1160:823636 -- attempted mark delete: 1160:823579
Caused by: java.lang.IllegalArgumentException: Mark deleting an already mark-deleted position. Current mark-delete: 1160:823636 -- attempted mark delete: 1160:823579
        at org.apache.bookkeeper.mledger.impl.ManagedCursorImpl.setAcknowledgedPosition(ManagedCursorImpl.java:1715) ~[com.datastax.oss-managed-ledger-2.10.1.5-SNAPSHOT.jar:2.10.1.5-SNAPSHOT]
        at org.apache.bookkeeper.mledger.impl.ManagedCursorImpl.asyncMarkDelete(ManagedCursorImpl.java:1855) ~[com.datastax.oss-managed-ledger-2.10.1.5-SNAPSHOT.jar:2.10.1.5-SNAPSHOT]
        at org.apache.bookkeeper.mledger.impl.ManagedCursorImpl.flush(ManagedCursorImpl.java:3245) ~[com.datastax.oss-managed-ledger-2.10.1.5-SNAPSHOT.jar:2.10.1.5-SNAPSHOT]
        at org.apache.bookkeeper.mledger.impl.ManagedLedgerFactoryImpl.lambda$flushCursors$1(ManagedLedgerFactoryImpl.java:238) ~[com.datastax.oss-managed-ledger-2.10.1.5-SNAPSHOT.jar:2.10.1.5-SNAPSHOT]
        at java.lang.Iterable.forEach(Iterable.java:75) ~[?:?]
        at org.apache.bookkeeper.mledger.impl.ManagedLedgerFactoryImpl.lambda$flushCursors$2(ManagedLedgerFactoryImpl.java:238) ~[com.datastax.oss-managed-ledger-2.10.1.5-SNAPSHOT.jar:2.10.1.5-SNAPSHOT]
        at java.util.concurrent.ConcurrentHashMap$ValuesView.forEach(ConcurrentHashMap.java:4772) ~[?:?]
        at org.apache.bookkeeper.mledger.impl.ManagedLedgerFactoryImpl.flushCursors(ManagedLedgerFactoryImpl.java:234) ~[com.datastax.oss-managed-ledger-2.10.1.5-SNAPSHOT.jar:2.10.1.5-SNAPSHOT]
        at org.apache.pulsar.common.util.Runnables$CatchingAndLoggingRunnable.run(Runnables.java:54) ~[com.datastax.oss-pulsar-common-2.10.1.5-SNAPSHOT.jar:2.10.1.5-SNAPSHOT]
        at org.apache.bookkeeper.common.util.OrderedExecutor$TimedRunnable.run(OrderedExecutor.java:203) ~[com.datastax.oss-bookkeeper-common-4.14.5.1.0.2.jar:4.14.5.1.0.2]
        at org.apache.bookkeeper.common.util.OrderedExecutor$TimedRunnable.run(OrderedExecutor.java:203) ~[com.datastax.oss-bookkeeper-common-4.14.5.1.0.2.jar:4.14.5.1.0.2]
        at com.google.common.util.concurrent.MoreExecutors$ScheduledListeningDecorator$NeverSuccessfulListenableFutureTask.run(MoreExecutors.java:718) ~[com.google.guava-guava-31.0.1-jre.jar:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) ~[?:?]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]

Modifications

This is a simple fix that detects that case and does not print the full stacktrace.

  1. Don't print the stacktrace
  2. Do not log at WARN level but INFO

Verifying this change

This change is a trivial rework / code cleanup without any test coverage.

  • doc-not-needed

@github-actions
Copy link

github-actions bot commented Sep 2, 2022

@eolivelli Please provide a correct documentation label for your PR.
Instructions see Pulsar Documentation Label Guide.

@AnonHxy
Copy link
Contributor

AnonHxy commented Sep 2, 2022

It seems that this patch #17343 solve the same problem @hrsakai

@eolivelli eolivelli added doc-not-needed Your PR changes do not impact docs and removed doc-label-missing labels Sep 5, 2022
@eolivelli
Copy link
Contributor Author

@AnonHxy thank's for your pointer

In #17343 hiding all IllegalArgumentException is dangerous. we should only suppress that case.

I think that this approach is better

@merlimat you approved #17343, please take a look to this patch as well

@hrsakai
Copy link
Contributor

hrsakai commented Sep 5, 2022

@eolivelli
I see.

Checking for IllegalArgumentException in some logic, should we fix all of them?

e.g.

public void asyncClearBacklog(final ClearBacklogCallback callback, Object ctx) {
asyncMarkDelete(ledger.getLastPosition(), new MarkDeleteCallback() {
@Override
public void markDeleteComplete(Object ctx) {
callback.clearBacklogComplete(ctx);
}
@Override
public void markDeleteFailed(ManagedLedgerException exception, Object ctx) {
if (exception.getCause() instanceof IllegalArgumentException) {
// There could be a race condition between calling clear backlog and other mark delete operations.
// If we get an exception it means the backlog was already cleared in the meantime.
callback.clearBacklogComplete(ctx);
} else {
callback.clearBacklogFailed(exception, ctx);
}
}
}, ctx);
}


Also, should new exception extend ManagedLedgerException?

e.g.

-              if (exception.getCause() instanceof MarkDeletingMarkedPosition) {
+              if (exception instanceof MarkDeletingMarkedPosition) {
                    // this is not actually a problem, we should not log a stacktrace
                    log.info("[{}][{}] Cannot flush mark-delete position: {}", ledger.getName(),
                            name, exception.getCause().getMessage());
                } else {
                    log.warn("[{}][{}] Failed to flush mark-delete position", ledger.getName(), name, exception);
                }

@eolivelli
Copy link
Contributor Author

Checking for IllegalArgumentException in some logic, should we fix all of them?
Any other error is an unknown error and we must log it at WARN level.
This patch is handling a very specific, well-known case. The patch explicitly handles only that case.

Also, should new exception extend ManagedLedgerException?
No. ManagedLedgerException is a top level concept in Pulsar. we must not pollute it with this corner case.
Probably some day we will handle in a better way this situation and we will prevent this case from happening (but I don't see a clear way to deal with it)

@eolivelli eolivelli force-pushed the impl/reduce-log-markdelete-flush branch from 5e5d791 to 07ee9f0 Compare September 13, 2022 14:31
@eolivelli eolivelli merged commit 5a87c47 into apache:master Sep 14, 2022
@eolivelli eolivelli deleted the impl/reduce-log-markdelete-flush branch September 14, 2022 08:03
eolivelli added a commit to datastax/pulsar that referenced this pull request Sep 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/broker doc-not-needed Your PR changes do not impact docs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants