SIGSEGV during SSL.freeSSL on netty-tcnative-boringssl-static v2.0.61.Final #842

Open
gavinbunney opened this issue Feb 2, 2024 · 21 comments

@gavinbunney

We periodically see crashes in io.netty.internal.tcnative.SSL.freeSSL running netty-tcnative-boringssl-static v2.0.61.Final (with netty 4.1.106.Final). This appears to happen on around 8 instances in our fleet each day, without any particular noticeable repro pattern.

The invalid memory reference happens during SSL engine shutdown, in sslReadErrorResult, when freeing the SSL engine references.

full hotspot error - hs_err_pid4344.log

I have a few other hotspot error logs as well, but their thread stacks show the same information:

Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libc.so.6+0xa9af5]  free+0x25
C  [libnetty_tcnative_linux_x86_642453035842228574760.so+0x715fa]
C  [libnetty_tcnative_linux_x86_642453035842228574760.so+0x6cda1]
C  [libnetty_tcnative_linux_x86_642453035842228574760.so+0x6cfb0]
C  [libnetty_tcnative_linux_x86_642453035842228574760.so+0x35a3c]
C  [libnetty_tcnative_linux_x86_642453035842228574760.so+0x36266]
C  [libnetty_tcnative_linux_x86_642453035842228574760.so+0x2b685]
J 48895  io.netty.internal.tcnative.SSL.freeSSL(J)V (0 bytes) @ 0x00007fc7d6143171 [0x00007fc7d61430a0+0x00000000000000d1]
J 85744 c2 io.netty.handler.ssl.ReferenceCountedOpenSslEngine.sslReadErrorResult(IIII)Ljavax/net/ssl/SSLEngineResult; (35 bytes) @ 0x00007fc7d6cf3a30 [0x00007fc7d6cf3740+0x00000000000002f0]
J 82804 c2 io.netty.handler.ssl.ReferenceCountedOpenSslEngine.unwrap([Ljava/nio/ByteBuffer;II[Ljava/nio/ByteBuffer;II)Ljavax/net/ssl/SSLEngineResult; (1471 bytes) @ 0x00007fc7d8adfd44 [0x00007fc7d8ade500+0x0000000000001844]
J 44080 c2 io.netty.handler.ssl.SslHandler$SslEngineType$1.unwrap(Lio/netty/handler/ssl/SslHandler;Lio/netty/buffer/ByteBuf;ILio/netty/buffer/ByteBuf;)Ljavax/net/ssl/SSLEngineResult; (138 bytes) @ 0x00007fc7d6544164 [0x00007fc7d6543da0+0x00000000000003c4]
J 45287 c2 io.netty.handler.ssl.SslHandler.unwrap(Lio/netty/channel/ChannelHandlerContext;Lio/netty/buffer/ByteBuf;I)I (514 bytes) @ 0x00007fc7d66f7cb4 [0x00007fc7d66f7680+0x0000000000000634]
J 44707 c2 io.netty.handler.ssl.SslHandler.decode(Lio/netty/channel/ChannelHandlerContext;Lio/netty/buffer/ByteBuf;Ljava/util/List;)V (34 bytes) @ 0x00007fc7d66198dc [0x00007fc7d6619840+0x000000000000009c]
J 41203 c2 io.netty.handler.codec.ByteToMessageDecoder.callDecode(Lio/netty/channel/ChannelHandlerContext;Lio/netty/buffer/ByteBuf;Ljava/util/List;)V (167 bytes) @ 0x00007fc7d6005fb4 [0x00007fc7d6005e80+0x0000000000000134]
J 58537 c2 io.netty.handler.codec.ByteToMessageDecoder.channelRead(Lio/netty/channel/ChannelHandlerContext;Ljava/lang/Object;)V (413 bytes) @ 0x00007fc7d78babb8 [0x00007fc7d78ba7e0+0x00000000000003d8]
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libc.so.6+0xa9af5]  free+0x25
C  [libnetty_tcnative_linux_x86_6417461460711337491563.so+0x715fa]
C  [libnetty_tcnative_linux_x86_6417461460711337491563.so+0x6cda1]
C  [libnetty_tcnative_linux_x86_6417461460711337491563.so+0x6cfb0]
C  [libnetty_tcnative_linux_x86_6417461460711337491563.so+0x35a3c]
C  [libnetty_tcnative_linux_x86_6417461460711337491563.so+0x36266]
C  [libnetty_tcnative_linux_x86_6417461460711337491563.so+0x2b685]
J 46865  io.netty.internal.tcnative.SSL.freeSSL(J)V (0 bytes) @ 0x00007f03b4ffc4f1 [0x00007f03b4ffc420+0x00000000000000d1]
J 90883 c2 io.netty.handler.ssl.ReferenceCountedOpenSslEngine.sslReadErrorResult(IIII)Ljavax/net/ssl/SSLEngineResult; (35 bytes) @ 0x00007f03b57e1fb8 [0x00007f03b57e1cc0+0x00000000000002f8]
J 86006 c2 io.netty.handler.ssl.ReferenceCountedOpenSslEngine.unwrap([Ljava/nio/ByteBuffer;II[Ljava/nio/ByteBuffer;II)Ljavax/net/ssl/SSLEngineResult; (1471 bytes) @ 0x00007f03b8d7293c [0x00007f03b8d71100+0x000000000000183c]
J 40130 c2 io.netty.handler.ssl.SslHandler$SslEngineType$1.unwrap(Lio/netty/handler/ssl/SslHandler;Lio/netty/buffer/ByteBuf;ILio/netty/buffer/ByteBuf;)Ljavax/net/ssl/SSLEngineResult; (138 bytes) @ 0x00007f03b5cc08e4 [0x00007f03b5cc0520+0x00000000000003c4]
J 64025 c2 io.netty.handler.ssl.SslHandler.unwrap(Lio/netty/channel/ChannelHandlerContext;Lio/netty/buffer/ByteBuf;I)I (514 bytes) @ 0x00007f03b7d00bd0 [0x00007f03b7d00500+0x00000000000006d0]
J 43989 c2 io.netty.handler.ssl.SslHandler.decode(Lio/netty/channel/ChannelHandlerContext;Lio/netty/buffer/ByteBuf;Ljava/util/List;)V (34 bytes) @ 0x00007f03b52f855c [0x00007f03b52f84c0+0x000000000000009c]
J 41196 c2 io.netty.handler.codec.ByteToMessageDecoder.callDecode(Lio/netty/channel/ChannelHandlerContext;Lio/netty/buffer/ByteBuf;Ljava/util/List;)V (167 bytes) @ 0x00007f03b6035728 [0x00007f03b6035600+0x0000000000000128]
J 103617 c2 io.netty.handler.codec.ByteToMessageDecoder.channelRead(Lio/netty/channel/ChannelHandlerContext;Ljava/lang/Object;)V (413 bytes) @ 0x00007f03b5d06db4 [0x00007f03b5d06a20+0x0000000000000394]
@conet

conet commented Feb 18, 2024

We're also seeing this crash. It seems to occur more frequently on services that handle more SSL sessions per unit of time.

@conet

conet commented Feb 18, 2024

This seems to be the same report as issue #833.

@conet

conet commented Feb 18, 2024

hs_err_pid389199.log

Here's our hotspot error file. It happens with both Java 11 and Java 21.

@conet

conet commented Feb 18, 2024

I would like to add that we're using Netty via Vert.x, and the crash only appears when multiple server instances are deployed. When only one instance is deployed, the crash does not happen.

@conet

conet commented Feb 18, 2024

Sorry, I spoke too soon: it also happens when only one verticle is deployed.

@conet

conet commented Feb 19, 2024

Just to recap what happened yesterday: a few of our instances were overloaded (possibly because of a DDoS attack), and under that load the process with OpenSSL enabled would crash within a few seconds of starting. When we switched the same process to the JDK SSL implementation, it did not crash. I'm not sure how helpful this is, but it's a pointer in the right direction: there is a bug in the OpenSSL implementation that appears with a frequency proportional to how heavily the SSL code is used.

This also happens in:

netty-tcnative-boringssl-static v2.0.62.Final
netty-common 4.1.107.Final

@normanmaurer
Member

Let me have a look... Never saw this in prod here tho.

@normanmaurer
Member

@conet @gavinbunney would it be possible to run with: -Dio.netty.native.deleteLibAfterLoading=false and also enable core-dumps ?
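For setups where JVM flags are awkward to change, the property requested above could also be set from code. This is only a sketch: the class name `KeepNativeLib` is hypothetical, and it assumes the property is read when Netty first loads its native library, so it would have to run before any Netty class is touched. Passing `-Dio.netty.native.deleteLibAfterLoading=false` on the command line remains the safer route. Core dumps still need to be enabled separately at the OS and JVM level.

```java
public class KeepNativeLib {

    // Property name taken from the request above.
    static final String KEY = "io.netty.native.deleteLibAfterLoading";

    /**
     * Asks Netty to keep the extracted native library on disk, unless the
     * property was already supplied on the command line.
     * Assumption: must be called before Netty loads its native code.
     */
    public static void configure() {
        if (System.getProperty(KEY) == null) {
            System.setProperty(KEY, "false");
        }
    }

    public static void main(String[] args) {
        configure();
        System.out.println(KEY + "=" + System.getProperty(KEY));
    }
}
```

Keeping the library on disk should let the `libnetty_tcnative_linux_x86_64...so` offsets in the native frames be resolved against the exact file that was loaded.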

@conet

conet commented Feb 19, 2024

I will try to create a reproducer; if the number of SSL sessions is high enough, it should work.

@normanmaurer
Member

@conet thanks a lot

@normanmaurer
Member

I wonder if it might be caused by #850

@normanmaurer
Member

@conet @gavinbunney please check if this still happens with 2.0.63.Final

@conet

conet commented Feb 20, 2024

Unfortunately, it is still happening. I will try to create the reproducer.

@normanmaurer
Member

@conet ok... waiting for the reproducer then, as I can't reproduce it

@conet

conet commented Feb 21, 2024

To create the reproducer, I tried to write a client that would crash the version using 2.0.62.Final. I failed to do that no matter how I tried to overload the server, which means the public traffic that was causing this contains something that makes the crash more likely, and I could not simulate it (we only saw the crash on traffic open to the internet). It's hard to find a tool that simulates connection open/close. I tried ab (Apache Benchmark), but I think that even with high concurrency, connections are reused once they are established. That differs from the traffic causing the crash, which has a high rate of SSL connection creation and destruction. An alternative is to write code that does that, so it will take some time.
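A minimal sketch of the kind of client described above, using only the JDK: each task opens a fresh TLS connection, completes the handshake, and closes it straight away. The class name `TlsChurnClient`, the default host and port, and the connection and thread counts are all hypothetical, and nothing here is known to trigger the crash. The trust-all manager is only acceptable against a disposable test server. Note that the JDK client may resume cached TLS sessions, so this is not guaranteed to force a full handshake every time.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.security.GeneralSecurityException;
import java.security.cert.X509Certificate;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

public class TlsChurnClient {

    /** Accepts any certificate. For throwaway test servers only. */
    private static final TrustManager TRUST_ALL = new X509TrustManager() {
        @Override public void checkClientTrusted(X509Certificate[] chain, String authType) { }
        @Override public void checkServerTrusted(X509Certificate[] chain, String authType) { }
        @Override public X509Certificate[] getAcceptedIssuers() { return new X509Certificate[0]; }
    };

    /**
     * Opens `connections` TLS connections spread over `threads` workers and
     * closes each one right after the handshake.
     * Returns {succeeded, failed}.
     */
    public static int[] run(String host, int port, int connections, int threads)
            throws GeneralSecurityException, InterruptedException {
        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(null, new TrustManager[] {TRUST_ALL}, null);

        AtomicInteger succeeded = new AtomicInteger();
        AtomicInteger failed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);

        for (int i = 0; i < connections; i++) {
            pool.execute(() -> {
                // try-with-resources closes the socket as soon as the handshake is done
                try (SSLSocket socket = (SSLSocket) ctx.getSocketFactory().createSocket()) {
                    socket.connect(new InetSocketAddress(host, port), 2_000);
                    socket.setSoTimeout(2_000);
                    socket.startHandshake();
                    succeeded.incrementAndGet();
                } catch (IOException e) {
                    failed.incrementAndGet();
                }
            });
        }

        pool.shutdown();
        // Counts are partial if the run does not finish within the timeout.
        pool.awaitTermination(10, TimeUnit.MINUTES);
        return new int[] {succeeded.get(), failed.get()};
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical defaults; point this only at a server you own.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 8443;
        int[] result = run(host, port, 10_000, 64);
        System.out.println("succeeded=" + result[0] + " failed=" + result[1]);
    }
}
```

Unlike ab with reused connections, every iteration here creates and destroys one SSL connection, which is closer to the traffic pattern described above.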

@normanmaurer
Member

@conet without a reproducer it is almost impossible for me to find the root cause... I inspected the code but can't see anything wrong atm :/

@gavinbunney
Author

Thanks @normanmaurer for the changes. We are still seeing the crashes as well, with the same hotspot error running 2.0.63.Final alongside Netty 4.1.107.Final:

# C  [libc.so.6+0xa9af5]  free+0x25
...
Stack: [0x00007fa4aec00000,0x00007fa4aed00000],  sp=0x00007fa4aecfd250,  free space=1012k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libc.so.6+0xa9af5]  free+0x25
C  [libnetty_tcnative_linux_x86_648083750136349949753.so+0x70d2a]
C  [libnetty_tcnative_linux_x86_648083750136349949753.so+0x6c661]
C  [libnetty_tcnative_linux_x86_648083750136349949753.so+0x6c850]
C  [libnetty_tcnative_linux_x86_648083750136349949753.so+0x34fbc]
C  [libnetty_tcnative_linux_x86_648083750136349949753.so+0x35796]
J 46322  io.netty.internal.tcnative.SSL.freeSSL(J)V (0 bytes) @ 0x00007fbdb080c571 [0x00007fbdb080c4a0+0x00000000000000d1]
J 92389 c2 io.netty.handler.ssl.ReferenceCountedOpenSslEngine.sslReadErrorResult(IIII)Ljavax/net/ssl/SSLEngineResult; (35 bytes) @ 0x00007fbdb50496b8 [0x00007fbdb50493c0+0x00000000000002f8]
J 78633 c2 io.netty.handler.ssl.ReferenceCountedOpenSslEngine.unwrap([Ljava/nio/ByteBuffer;II[Ljava/nio/ByteBuffer;II)Ljavax/net/ssl/SSLEngineResult; (1471 bytes) @ 0x00007fbdb46e41f8 [0x00007fbdb46e2800+0x00000000000019f8]
J 40272 c2 io.netty.handler.ssl.SslHandler$SslEngineType$1.unwrap(Lio/netty/handler/ssl/SslHandler;Lio/netty/buffer/ByteBuf;ILio/netty/buffer/ByteBuf;)Ljavax/net/ssl/SSLEngineResult; (138 bytes) @ 0x00007fbdb0e3f6c0 [0x00007fbdb0e3f380+0x0000000000000340]
J 64149 c2 io.netty.handler.ssl.SslHandler.unwrap(Lio/netty/channel/ChannelHandlerContext;Lio/netty/buffer/ByteBuf;I)I (514 bytes) @ 0x00007fbdb2e52d98 [0x00007fbdb2e52800+0x0000000000000598]
J 43196 c2 io.netty.handler.ssl.SslHandler.decode(Lio/netty/channel/ChannelHandlerContext;Lio/netty/buffer/ByteBuf;Ljava/util/List;)V (34 bytes) @ 0x00007fbdb1af2640 [0x00007fbdb1af25a0+0x00000000000000a0]

crash on 2.0.63.Final - hs_err_pid4365.log

@normanmaurer
Member

Do you have a reproducer ?

@gavinbunney
Author

gavinbunney commented Feb 27, 2024

Not yet :( We do see some logs with (about 70s before the crash):

io.netty.handler.codec.DecoderException: io.netty.handler.ssl.ReferenceCountedOpenSslEngine$OpenSslException: error:100003e8:SSL routines:OPENSSL_internal:SSLV3_ALERT_CLOSE_NOTIFY
  at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:499)
  at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290)
  at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
  at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
  at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
  at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:93)
....
io.netty.handler.ssl.ReferenceCountedOpenSslEngine$OpenSslException: error:100003e8:SSL routines:OPENSSL_internal:SSLV3_ALERT_CLOSE_NOTIFY
  at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.newSSLExceptionForError(ReferenceCountedOpenSslEngine.java:1391)
  at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.shutdownWithError(ReferenceCountedOpenSslEngine.java:1103)
  at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.sslReadErrorResult(ReferenceCountedOpenSslEngine.java:1413)
  at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.unwrap(ReferenceCountedOpenSslEngine.java:1339)
  at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.unwrap(ReferenceCountedOpenSslEngine.java:1440)
  at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.unwrap(ReferenceCountedOpenSslEngine.java:1483)
  at io.netty.handler.ssl.SslHandler$SslEngineType$1.unwrap(SslHandler.java:224)
  at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1445)
  at io.netty.handler.ssl.SslHandler.decodeNonJdkCompatible(SslHandler.java:1349)
  at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1389)
  at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:529)
  at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:468)
  at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290)
  at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
  at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
  at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
  at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:93)

@normanmaurer
Member

bummer... keep me posted. I tried everything to reproduce but no luck :/

@normanmaurer
Member

Does this still happen with 2.0.66 ?
