Handle INACTIVE partition role in gateway #6053

Closed
menski opened this issue Jan 4, 2021 · 4 comments · Fixed by #6119, #6129 or #6130
Labels: kind/bug, severity/low

Comments

@menski
Contributor

menski commented Jan 4, 2021

Describe the bug
When a partition fails to recover or becomes unhealthy, it switches to the INACTIVE role. The gateway cannot handle this role: it fails to decode it and does not show it correctly in the topology.

To Reproduce

Break a partition or corrupt a snapshot so the partition cannot recover.

Expected behavior

The gateway can handle INACTIVE partitions.
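
As a rough illustration of the expected behavior, a decoder could treat INACTIVE as a known role and degrade gracefully for anything else, instead of failing. This is only a sketch: `PartitionRoleDecoder`, `PartitionRole`, and `decode` are hypothetical names and do not mirror the actual Zeebe classes (such as `BrokerInfo`).

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical sketch, not the real Zeebe implementation.
final class PartitionRoleDecoder {

  // Assumed set of roles; INACTIVE is included so it decodes like any other role.
  enum PartitionRole {
    LEADER,
    FOLLOWER,
    INACTIVE
  }

  private PartitionRoleDecoder() {}

  // Returns the decoded role, or an empty Optional for null or unknown values,
  // so the caller can decide how to report them instead of failing.
  static Optional<PartitionRole> decode(final String rawRole) {
    if (rawRole == null) {
      return Optional.empty();
    }
    try {
      return Optional.of(PartitionRole.valueOf(rawRole.trim().toUpperCase(Locale.ROOT)));
    } catch (final IllegalArgumentException e) {
      // Enum.valueOf throws for names that are not declared constants.
      return Optional.empty();
    }
  }

  public static void main(final String[] args) {
    System.out.println(decode("INACTIVE"));
    System.out.println(decode("SOMETHING_ELSE"));
  }
}
```

Returning an `Optional` rather than throwing leaves it to the caller to log an unknown role once, or to skip it, rather than printing a warning on every topology update.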

Log/Stacktrace
If possible add the full stacktrace or Zeebe log which contains the issue.

Full Stacktrace
{
  "insertId": "nprv5kyjydx40a283",
  "jsonPayload": {
    "serviceContext": { "service": "zeebe", "version": "0.25.3" },
    "context": {
      "threadName": "Broker-0-zb-actors-0",
      "threadId": 28,
      "loggerName": "io.zeebe.protocol",
      "actor-name": "GatewayTopologyManager",
      "threadPriority": 5
    },
    "message": "Failed to decode broker info, found unknown partition role: INACTIVE",
    "logger": "io.zeebe.protocol",
    "thread": "Broker-0-zb-actors-0"
  },
  "timestamp": "2021-01-04T08:46:45.794372Z",
  "severity": "WARNING",
  "logName": "projects/camunda-cloud-240911/logs/stdout",
  "sourceLocation": {
    "file": "BrokerInfo.java",
    "line": "443",
    "function": "lambda$consumePartitions$1"
  },
  "receiveTimestamp": "2021-01-04T08:46:49.049897806Z"
}

<STACKTRACE>

Environment:

  • OS: [e.g. Linux]
  • Zeebe Version: [e.g. 0.20.0]
  • Configuration: [e.g. exporters etc.]
menski added the kind/bug, severity/low, and Impact: Usability labels Jan 4, 2021
@npepinpe
Member

npepinpe commented Jan 5, 2021

Are there any side effects, or does it just log a warning and nothing else happens? My assumption is that once a new leader is up it would effectively be identified, so the only issue is that the old leader might not be removed "fast".

@menski
Contributor Author

menski commented Jan 5, 2021

Until now we have only seen it in single-node clusters where the snapshot could not be read on recovery, so I don't know whether there are side effects, as the system is already broken. But the warnings are printed constantly, which should be fixed by handling the role gracefully.

@miguelpires miguelpires self-assigned this Jan 14, 2021
@ghost ghost closed this as completed in bdf24e8 Jan 18, 2021
@miguelpires miguelpires linked a pull request Jan 18, 2021 that will close this issue
9 tasks
@miguelpires miguelpires linked a pull request Jan 18, 2021 that will close this issue
9 tasks
ghost pushed a commit that referenced this issue Jan 18, 2021
6129: [Backport 0.26] Handle INACTIVE role in gateway and clients r=saig0 a=MiguelPires

## Description

Backports PR to handle inactive in the gateway and clients. The backport doesn't change anything about the PR, the only conflict was in `go.sum`

## Related issues

Related to #6119
closes #6053 

## Definition of Done

_Not all items need to be done depending on the issue and the pull request._

Code changes:
* [ ] The changes are backwards compatible with previous versions
* [ ] If it fixes a bug then PRs are created to [backport](https://github.com/zeebe-io/zeebe/compare/stable/0.24...develop?expand=1&template=backport_template.md&title=[Backport%200.24]) the fix to the last two minor versions. You can trigger a backport by assigning labels (e.g. `backport stable/0.25`) to the PR, in case that fails you need to create backports manually.

Testing:
* [ ] There are unit/integration tests that verify all acceptance criteria of the issue
* [ ] New tests are written to ensure backwards compatibility with further versions
* [ ] The behavior is tested manually
* [ ] The change has been verified by a QA run
* [ ] The impact of the changes is verified by a benchmark 

Documentation: 
* [ ] The documentation is updated (e.g. BPMN reference, configuration, examples, get-started guides, etc.)
* [ ] New content is added to the [release announcement](https://drive.google.com/drive/u/0/folders/1DTIeswnEEq-NggJ25rm2BsDjcCQpDape)


Co-authored-by: Miguel Pires <[email protected]>
ghost pushed a commit that referenced this issue Jan 18, 2021
6130: [Backport 0.25] Handle INACTIVE role in gateway and clients r=saig0 a=MiguelPires

## Description

Backports PR to handle inactive in the gateway and clients. There were some conflicts caused by some health topology stuff (the develop branch sends health info to the clients and 0.25 doesn't)

## Related issues

related to #6119
closes #6053 



Co-authored-by: Miguel Pires <[email protected]>
@Bec-k

Bec-k commented Mar 10, 2021

This is still not fixed in 1.0.0-alpha2

@ChrisKujawa
Member

Can you elaborate on that, @denissabramovs?

github-merge-queue bot pushed a commit that referenced this issue Mar 14, 2024
* chore: add cross-env support for playwright scripts

* test: fix process instance modification screenshot test

* docs: add screenshots for process instance migration
This issue was closed.
5 participants