increase TLS passthrough buffer size from 4k to 16K #9540
This issue is currently awaiting triage. If Ingress contributors determine this is a relevant issue, they will accept it by applying the appropriate triage label.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@scholzj, thank you very much for the details of the issue. It helps a lot.
Secondly, there is a plan to re-implement TLS passthrough and related functionality. It is to be attempted sooner rather than later. I am not certain if this will impact your use-case, but I think it likely will. I would recommend that much more information be exchanged. Please feel free to attend the community meetings; details here: https://github.com/kubernetes/community/tree/master/sig-network

16K and passthrough means someone needs to authoritatively talk about the potential of undesired payload in that 16K, and other things that are specific to TLS and the current passthrough implementation. Nginx seems to have a native streaming ability.

/remove-kind bug
/retitle increase TLS passthrough buffer size from 4k to 16K
@longwuyuan Thanks for the response. TBH, I managed to debug it up to identifying that the buffer is the problem here. But I'm not a TLS expert or an Ingress expert, so I cannot judge the consequences of increasing the buffer. And as I said, I have no idea if tomorrow someone will come and say it needs to be 32K or something like that. If you decide that making it configurable, to offload the decision to the users, is the right way, I can try to contribute it. But if a rewrite of the TLS passthrough is planned anyway, I understand that such a solution might not be desired. In the meantime, it seems that setting the Java system property …
That is ok because you have presented a genuine problem that is being worked on. Please do keep track of project changes and hopefully join a community meeting. Once the actual re-implementation of the TLS passthrough function is underway, you will see a related PR.
I don't know the ramifications of changing this, so I think a good compromise is to make it configurable by the end user and not hard-set at 4k.
Ok, it seems to make sense to make it 16k by default, but I still like the idea of making it configurable.

Byte 0 = SSL record type
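The "Byte 0 = SSL record type" note above refers to the TLS record header layout. A minimal Go sketch of the idea (not the controller's actual code; `clientHelloRecordSize` is a hypothetical helper) shows how the header alone tells you how big a buffer the whole record needs, following the standard record format where byte 0 is the content type (0x16 for handshake) and bytes 3-4 are the big-endian payload length:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// recordHeaderLen is the fixed size of a TLS record header.
const recordHeaderLen = 5

// clientHelloRecordSize inspects the first 5 bytes of a TLS stream and,
// if they describe a handshake record, returns the total number of bytes
// (header + payload) needed to buffer the whole record.
// Illustrative sketch only; this is not part of tcp.go.
func clientHelloRecordSize(header []byte) (int, error) {
	if len(header) < recordHeaderLen {
		return 0, fmt.Errorf("need %d bytes, got %d", recordHeaderLen, len(header))
	}
	// Byte 0 = SSL record type; 0x16 means "handshake".
	if header[0] != 0x16 {
		return 0, fmt.Errorf("not a TLS handshake record: type %#x", header[0])
	}
	// Bytes 3-4 = big-endian length of the record payload.
	payload := int(binary.BigEndian.Uint16(header[3:5]))
	return recordHeaderLen + payload, nil
}

func main() {
	// A handshake record header claiming a 7690-byte payload (0x1e0a),
	// as a large ClientHello with a session ticket might carry.
	hdr := []byte{0x16, 0x03, 0x01, 0x1e, 0x0a}
	n, err := clientHelloRecordSize(hdr)
	fmt.Println(n, err) // 7695 <nil>
}
```

Sizing the read from the header like this would avoid having to pick any fixed buffer size at all, at the cost of one extra small read up front.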
Thanks for fixing this! 😍
@scholzj is that on the Strimzi brokers? We've added this and still seeing the issue 😞 Is there any further change required to the Java clients as well or is it only the brokers that require this? Our config looks as below:
@nkostoulas To be honest, I do not remember it in detail anymore as it has been fixed for some time now and I did not need to use it anymore. As far as I remember, it was on the broker side, I do not think I had to do anything else on the client side. |
I see, thanks @scholzj.
What happened:
In Strimzi, we support the Ingress-Nginx controller as one of the ways to access Apache Kafka from outside of Kubernetes. Since Apache Kafka has its own binary protocol not based on HTTP, we have to use TLS passthrough for it. For a long time, this worked fine, and it is relatively popular among our users (BTW: thanks for all the effort you put into this project).
But recently, we moved from Java 11 to Java 17 as the runtime for our Apache Kafka clusters. Java 17 seems to slightly change the TLS behavior. In particular, applications using Java 17 seem to try to re-establish TLS sessions when possible, and they use a relatively big session ticket for it, which was not used with Java 11. You can see an example of such a TLS `ClientHello` here.

It looks like the Nginx Ingress controller handles the TLS passthrough itself in the `tcp.go` file, and according to `tcp.go` it seems to allocate only 4096 bytes to receive the `ClientHello`, decode it, get the TLS-SNI hostname from it, and forward the connection. But when the `ClientHello` is too big and does not fit into the 4096 bytes, it fails to decode the TLS-SNI hostname and simply forwards the connection to the default backend instead of treating it as TLS passthrough and forwarding it to the appropriate service.

In the controller logs, I can see something like this for the initial connection, which is successful because the first `ClientHello` is small:

But for the follow-up connections, which try to reuse the session and send the large `ClientHello` with the session ticket, I get only something like this when it forwards the connection to the default backend:

To double-check the idea that the 4096 bytes are the issue, I tried a custom build of the controller that changes the buffer size from 4096 bytes to 16384 bytes. And that indeed seems to address the problem.
TBH, I'm not a TLS expert. So I do not really know if the 16384 bytes work for every case or if someone would need it to be bigger. But they helped my applications (I tried 8192 as well, but that was not enough).
Would it be possible to increase the value of the buffer for the TLS passthrough feature? Or maybe make it configurable, if you are afraid of increasing it by default?
What you expected to happen:
That the TLS passthrough feature works even with Java 17 / applications that send a `ClientHello` larger than 4096 bytes.

NGINX Ingress controller version (exec into the pod and run `nginx-ingress-controller --version`): 1.5.1

Kubernetes version (use `kubectl version`): 1.26.0

Environment:
How to reproduce this issue:
If needed, I can provide a guide on how to install Strimzi and Apache Kafka to reproduce it. Or I can try to create a simpler reproducer. But I was kinda hoping that the explanation above is sufficient. Please let me know.