[BugFix] Fix thrift rpc not reopen after failed (#49619)
## Why I'm doing:

When an RPC fails, `client->reopen` must be called. Otherwise the client is returned to the connection pool as-is, in a state where response parsing is broken, and other RPCs that pick it up fail. The client goes back to the pool automatically in the destructor:

```
~ClientConnection() {
    if (_client != nullptr) {
        _client_cache->release_client(&_client);
    }
}
```

This programming model is very bad, but this PR needs to be backported. I will refactor the client cache related logic in the next PR.

## What I'm doing:

Fixes case:

```
W0805 16:55:14.940848 1227 thrift_rpc_helper.cpp:86] call frontend service failed, address=TNetworkAddress(=****/), port=9020), reason=invalid TType
```

To reproduce, apply this patch to FE:

```
diff --git a/fe/fe-core/src/main/java/com/starrocks/common/Config.java b/fe/fe-core/src/main/java/com/starrocks/common/Config.java
index 750c9e759c..f45512f424 100644
--- a/fe/fe-core/src/main/java/com/starrocks/common/Config.java
+++ b/fe/fe-core/src/main/java/com/starrocks/common/Config.java
@@ -1303,6 +1303,9 @@ public class Config extends ConfigBase {
     @ConfField
     public static boolean enable_udf = false;
 
+    @ConfField(mutable = true)
+    public static int sleep_times = 30;
+
     @ConfField(mutable = true)
     public static boolean enable_decimal_v3 = true;
 
diff --git a/fe/fe-core/src/main/java/com/starrocks/service/FrontendServiceImpl.java b/fe/fe-core/src/main/java/com/starrocks/service/FrontendServiceImpl.java
index 5efbe38d83..af02ecbd7d 100644
--- a/fe/fe-core/src/main/java/com/starrocks/service/FrontendServiceImpl.java
+++ b/fe/fe-core/src/main/java/com/starrocks/service/FrontendServiceImpl.java
@@ -589,6 +589,13 @@ public class FrontendServiceImpl implements FrontendService.Iface {
         if (!params.isSetUser_ident()) {
             throw new TException("missed user_identity");
         }
+        if (Config.sleep_times > 0) {
+            try {
+                Thread.sleep(1000 * Config.sleep_times);
+            } catch (InterruptedException e) {
+                throw new RuntimeException(e);
+            }
+        }
         // TODO: check privilege
         UserIdentity userIdentity = UserIdentity.fromThrift(params.getUser_ident());
 
@@ -1146,6 +1153,13 @@ public class FrontendServiceImpl implements FrontendService.Iface {
     @Override
     public TReportExecStatusResult reportExecStatus(TReportExecStatusParams params) throws TException {
+        if (Config.sleep_times > 0) {
+            try {
+                Thread.sleep(1000 * Config.sleep_times);
+            } catch (InterruptedException e) {
+                throw new RuntimeException(e);
+            }
+        }
         return QeProcessorImpl.INSTANCE.reportExecStatus(params, getClientAddr());
     }
```

Then send the query multiple times. The BE log shows:

```
W20240808 21:01:52.651160 139898530641472 pipeline_driver_executor.cpp:346] [Driver] Fail to report exec state: fragment_instance_id=38ad1356-5586-11ef-a78f-c26b621cd046, status: Internal error: ReportExecStatus() to TNetworkAddress(hostname=172.17.0.1, port=8505) failed: THRIFT_EAGAIN (timed out), retry_times=1
W20240808 21:01:52.789011 139897653958208 thrift_rpc_helper.cpp:135] Rpc error: FE RPC response parsing failure, address=TNetworkAddress(hostname=172.17.0.1, port=8505).The FE may be busy, please retry later
W20240808 21:01:52.789011 139897653958208 thrift_rpc_helper.cpp:135] Rpc error: FE RPC response parsing failure, address=TNetworkAddress(hostname=172.17.0.1, port=8505).The FE may be busy, please retry later
```

Signed-off-by: stdpain <[email protected]>
(cherry picked from commit c8bd900)
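
The failure mode described under "Why I'm doing" can be sketched with a toy model. This is only an illustration under stated assumptions, not the StarRocks implementation: `FakeClient`, `FakePool`, `Connection`, and `rpc` are hypothetical names, and the "dirty" flag stands in for whatever state a timed-out Thrift call leaves on the connection. The guard returns the client to the pool in its destructor, so the only place to repair the connection is the error path, before the guard goes out of scope.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a Thrift client. After a timed-out call the
// connection is "dirty": every later call fails until reopen() is called.
class FakeClient {
public:
    std::string call(const std::string& request, bool time_out = false) {
        if (_dirty) throw std::runtime_error("response parsing failure");
        if (time_out) {
            _dirty = true;  // the late reply is still pending on this connection
            throw std::runtime_error("timed out");
        }
        return "ok:" + request;
    }
    void reopen() { _dirty = false; }  // a fresh connection has no stale state

private:
    bool _dirty = false;
};

// Hypothetical pool: hands out idle clients and creates one when none is idle.
class FakePool {
public:
    std::unique_ptr<FakeClient> acquire() {
        if (_idle.empty()) return std::make_unique<FakeClient>();
        auto client = std::move(_idle.back());
        _idle.pop_back();
        return client;
    }
    void release(std::unique_ptr<FakeClient> client) { _idle.push_back(std::move(client)); }

private:
    std::vector<std::unique_ptr<FakeClient>> _idle;
};

// RAII guard in the spirit of ClientConnection: the destructor always
// returns the client to the pool, whether or not the call succeeded.
class Connection {
public:
    explicit Connection(FakePool* pool) : _pool(pool) {
        assert(pool != nullptr);
        _client = _pool->acquire();
    }
    ~Connection() {
        if (_client != nullptr) _pool->release(std::move(_client));
    }
    FakeClient* operator->() { return _client.get(); }

private:
    FakePool* _pool;
    std::unique_ptr<FakeClient> _client;
};

// Runs one RPC and reports success. With reopen_on_error set, a failed call
// resets the connection before the guard hands it back to the pool.
bool rpc(FakePool& pool, const std::string& request, bool time_out, bool reopen_on_error) {
    Connection conn(&pool);
    try {
        conn->call(request, time_out);
        return true;
    } catch (const std::exception&) {
        if (reopen_on_error) conn->reopen();
        return false;
    }
}
```

In this model, a timed-out call followed by a normal call on the same pool fails twice without the reopen and only once with it, which matches the pattern in the BE log: one timeout followed by parsing failures on unrelated RPCs.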