
[improvement] fix some successfully precommitted cannot be aborted lead to LabelAlreadyExists #362

Merged
JNSimba merged 18 commits into apache:master on Apr 15, 2024


JNSimba
Member

JNSimba commented on Apr 9, 2024

Proposed changes

Issue Number: close #xxx

Problem Summary:

  1. When writing to multiple tables (for example A, B, and C), tables D and E may be added between checkpoints, so the previous checkpoint holds no stream loader for them. If, in the precommit phase, table D succeeds but table E fails, the job retries from the last checkpoint. Because that checkpoint has no stream loader for table D, D's successfully precommitted transaction cannot be aborted, and the next write reports LabelAlreadyExists.

  2. When writing to multiple tables with a multi-parallelism sink, thread A may report an error after thread B's HTTP request has already finished but before B has obtained the txnId of its successful precommit. When the JobManager then cancels thread B, B's transaction may not be aborted in time, and the next write reports LabelAlreadyExists.
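A toy model of the first scenario may help. It assumes a hypothetical label format `<table>_<checkpointId>` and a failed checkpoint 107 (neither comes from the connector): recovery walks only the tables recorded in the last checkpoint, so table D's precommitted label is never aborted.

```java
import java.util.Set;
import java.util.TreeSet;

// Toy model of scenario 1 (not the connector's real code): recovery only knows
// the tables recorded in the last checkpoint. The "<table>_<chk>" label format
// is a hypothetical stand-in.
final class MultiTableRestore {
    static String label(String table, long chk) {
        return table + "_" + chk;
    }

    /** Labels that recovery aborts when it only walks the checkpointed tables. */
    static Set<String> abortedOnRestore(Set<String> checkpointTables, long failedChk) {
        Set<String> labels = new TreeSet<>();
        for (String table : checkpointTables) {
            labels.add(label(table, failedChk));
        }
        return labels;
    }

    /** Precommitted labels that recovery never aborts (later: LabelAlreadyExists). */
    static Set<String> leaked(Set<String> precommitted, Set<String> aborted) {
        Set<String> out = new TreeSet<>(precommitted);
        out.removeAll(aborted);
        return out;
    }
}
```

With A, B, and C in the checkpoint and D precommitted before E failed, only `D_107` is left behind.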

Therefore, when a writer thread exits, its transactions must be aborted by label (supported only on Doris 2.1+).
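A minimal sketch of aborting by label on exit, assuming the writer records each label before sending its stream load request. `InFlightLabels` and its `abortByLabel` callback are hypothetical names, not the connector's API; in practice the callback would issue the Doris 2.1+ abort-by-label request. Because the label is known before the HTTP call, it can be aborted even when the txnId from the response was never captured (the second scenario), and it covers tables added after the last checkpoint (the first scenario).

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Hypothetical sketch: track labels from *before* the request is sent, so a
// transaction can still be aborted by label if its txnId was never recorded.
final class InFlightLabels {
    private final Set<String> pending = ConcurrentHashMap.newKeySet();
    // Returns true if the abort was confirmed (assumed Doris 2.1+ abort-by-label).
    private final Predicate<String> abortByLabel;

    InFlightLabels(Predicate<String> abortByLabel) {
        this.abortByLabel = abortByLabel;
    }

    /** Call just before sending the stream load request that uses this label. */
    void beforeRequest(String label) {
        pending.add(label);
    }

    /** Call once the transaction for this label has been committed. */
    void committed(String label) {
        pending.remove(label);
    }

    /** Call when the writer exits: abort every label that was never committed. */
    void close() {
        // Labels whose abort fails stay in the set so the caller can log or retry them.
        pending.removeIf(abortByLabel);
    }

    Set<String> remaining() {
        return Set.copyOf(pending);
    }
}
```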

  1. In addition, the asynchronous check thread is disabled when writing to multiple tables.
  2. Aborting is currently done by txnId, so a pre-request with the same label is sent first to obtain the transaction. If aborting this pre-request's transaction fails, the label will already exist on the next write.
    For example:
    restore from ck-106
    label107 already exists, abort label107
    label108 unused (the pre-request starts a txn), abort fails
    restore from ck-106
    label107 unused, exit the abort loop
    label108 write: label already exists
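The sequence above can be reproduced with a toy model. Everything here is hypothetical (an in-memory map stands in for Doris, labels are `label<chk>`, and one abort is forced to fail); it only mirrors the txnId-based recovery described in item 2: probe the labels after the restored checkpoint with a pre-request, abort each one that already exists, and stop at the first unused label after aborting the pre-request's own transaction.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of the txnId-based recovery (not the connector's real code).
final class PreRequestRecovery {
    enum State { PRECOMMITTED, ABORTED }

    final Map<String, State> doris = new HashMap<>();  // label -> txn state
    final Set<String> failNextAbort = new HashSet<>(); // fault injection

    static String label(long chk) {
        return "label" + chk;
    }

    /** Pre-request: true means LabelAlreadyExists; otherwise a new txn is started. */
    boolean preRequest(String label) {
        if (doris.get(label) == State.PRECOMMITTED) {
            return true;
        }
        doris.put(label, State.PRECOMMITTED); // unused or aborted labels are reusable
        return false;
    }

    /** Abort the txn holding the label; returns false on a (simulated) failure. */
    boolean abort(String label) {
        if (failNextAbort.remove(label)) {
            return false; // label stays occupied
        }
        doris.put(label, State.ABORTED);
        return true;
    }

    void recover(long restoredChk) {
        for (long chk = restoredChk + 1; ; chk++) {
            String l = label(chk);
            boolean existed = preRequest(l);
            abort(l); // a failure is only logged in this scheme, nothing retries it
            if (!existed) {
                return; // first unused label: exit the abort loop
            }
        }
    }

    /** A normal write succeeds only if the label is not held by an open txn. */
    boolean write(long chk) {
        return !preRequest(label(chk));
    }
}
```

After two restores from ck-106 with the abort of `label108` failing once, `write(107)` succeeds and `write(108)` fails, matching the LabelAlreadyExists in the example.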

In this case, the connector must actively abort the label and let Flink restart.
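To abort a label directly, the writer can call the Stream Load two-phase-commit endpoint with a `label` header instead of `txn_id`. The sketch below only builds the request; it assumes the Doris 2.1+ behavior this PR relies on (abort by label on `/api/{db}/{table}/_stream_load_2pc` with `txn_operation: abort`). `AbortByLabel` is a hypothetical helper, and the host, database, table, and credentials are placeholders; check the Stream Load documentation for your Doris version before relying on the exact headers.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: builds (does not send) an abort-by-label 2PC request.
final class AbortByLabel {
    static HttpRequest build(String feHostPort, String db, String table,
                             String user, String password, String label) {
        String auth = Base64.getEncoder().encodeToString(
                (user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder()
                .uri(URI.create("http://" + feHostPort + "/api/" + db + "/" + table
                        + "/_stream_load_2pc"))
                .header("Authorization", "Basic " + auth)
                .header("label", label)            // abort by label (assumed Doris 2.1+)
                .header("txn_operation", "abort")
                .PUT(HttpRequest.BodyPublishers.noBody())
                .build();
    }
}
```

After sending it (for example with `java.net.http.HttpClient`), failing the task lets Flink restart from the last checkpoint once the label has been freed.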

Checklist (Required)

  1. Does it affect the original behavior: (Yes/No/I Don't know)
  2. Have unit tests been added: (Yes/No/No Need)
  3. Has documentation been added or modified: (Yes/No/No Need)
  4. Does it need to update dependencies: (Yes/No)
  5. Are there any changes that cannot be rolled back: (Yes/No)

Further comments

If this is a relatively large or complex change, kick off the discussion on the dev mailing list by explaining why you chose the solution you did and what alternatives you considered, etc.

JNSimba changed the title from "[improvement] add abort log" to "[improvement] fix some successfully precommitted cannot be aborted" on Apr 10, 2024
JNSimba changed the title from "[improvement] fix some successfully precommitted cannot be aborted" to "[improvement] fix some successfully precommitted cannot be aborted lead to LabelAlreadyExists" on Apr 10, 2024
Member

CalvinKirs left a comment


well done

JNSimba merged commit 616c9b2 into apache:master on Apr 15, 2024
6 checks passed
JNSimba mentioned this pull request on May 15, 2024