Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue with black hole detection #1540

Closed
stormshield-guillaumed opened this issue Apr 17, 2023 · 14 comments
Closed

Issue with black hole detection #1540

stormshield-guillaumed opened this issue Apr 17, 2023 · 14 comments

Comments

@stormshield-guillaumed
Copy link

We (Stormshield) found an issue with the black hole detection. First, some context :

  • We use QUIC to tunnel network traffic
  • The traffic is captured on a TUN interface
  • It is sent using QUIC datagrams
  • The TUN is configured with a MTU of 1412
  • The QUIC initial max UDP payload size is configured at 1472

The bug was found during an iperf3 bench between two peers. After an unknown amount of time, the upload traffic reaches 0Mbits/s and never goes up again. After some digging, we found this code, which reset the MTU to 1200 when a black hole is detected, whether PLPMTUD is enabled or not.

Using #1529 fixes the issue because no black hole is detected during the iperf3 bench. We think an issue remains because, if a black hole is detected in another situation, the MTU is still reset. See the line 148 of mtud.rs of #1529 self.current_mtu = BASE_UDP_PAYLOAD_SIZE;.

It would be nice to control the reset value. If you need any additional information, you can ask.

@aochagavia
Copy link
Contributor

From an API design perspective, I see a few alternatives:

  1. Restore the original behavior of initial_max_udp_payload_size: if you get it wrong, you can cause an unrecoverable black hole. This means that we are free to use initial_max_udp_payload_size as a reset value when a black hole is detected.
  2. Keep the current behavior of initial_max_udp_payload_size and introduce a minimum_max_udp_payload_size (or something with a better name), meant to be used as MTU when a black hole is detected. Obviously, initial_max_udp_payload_size should be >= minimum_max_udp_payload_size.

I feel inclined towards 2, because IMO it separates concerns better (e.g. I can imagine someone might want to start with a high initial MTU and fall back to a lower one in case of a black hole).

@djc
Copy link
Member

djc commented Apr 17, 2023

Avoiding unrecoverable black holes definitely sounds preferable to having them. I'm still not completely sure on what the shortcoming is here. Why does resetting the MTU to 1200 cause throughput to go to 0 without discovery increasing it back to reasonable levels?

@stormshield-damiend
Copy link
Contributor

stormshield-damiend commented Apr 17, 2023

Cause our application does not pass the error SendDatagramError::TooLarge back to the TUN and thus the operating system PMTUD cannot lower the TUN MTU by itself.

Maybe later we will do it but it is not supported for now.

@djc
Copy link
Member

djc commented Apr 18, 2023

So basically you want us to implement a workaround for missing functionality in your TUN layer? I'm not entirely sure this is reasonable.

@stormshield-damiend
Copy link
Contributor

stormshield-damiend commented Apr 18, 2023

Indeed, our use case is particular, but this question is still valid : if the user set an initial_max_udp_payload_size explicitly, is it reasonable for the black hole detector to reset the MTU to a lower value than this one ?

@aochagavia
Copy link
Contributor

aochagavia commented Apr 18, 2023

Given Quinn's big user base, there is a chance other users will want to have an escape hatch now we have introduced black hole detection (e.g. similar to how the default in rustls is to enforce validating certificates, yet you can go out of your way to disable that using the dangerous_configuration feature, knowing you might cause security problems by doing so).

In an analogous case, someone in an "I know what I'm doing" scenario might want to tell Quinn that there is a guaranteed MTU that can be relied upon, even when the black hole detector thinks otherwise (this used to exist in the form of initial_max_udp_payload_size but was effectively removed by my changes).

@stormshield-guillaumed
Copy link
Author

without discovery increasing it back to reasonable levels?

It is because the MTU discovery is disabled, so, when the black hole is detected, the MTU is reset to 1200 and never changes again.

@djc
Copy link
Member

djc commented Apr 18, 2023

In an analogous case, someone in an "I know what I'm doing" scenario might want to tell Quinn that there is a guaranteed MTU that can be relied upon, even when the black hole detector thinks otherwise (this used to exist in the form of initial_max_udp_payload_size but was effectively removed by my changes).

Okay, I guess that's a more reasonable use case.

Indeed, our use case is particular, but this question is still valid : if the user set an initial_max_udp_payload_size explicitly, is it reasonable for the black hole detector to reset the MTU to a lower value than this one ?

I mean, that's how we defined the inital_max_udp_payload_size right now -- as an initial starting point, not as a minimum. Falling back to 1200 is the safest option.

Why does resetting the MTU to 1200 cause throughput to go to 0 without discovery increasing it back to reasonable levels?

Still thinking about this... Maybe I'm being dense but I think the main effect of Quinn having a smaller defined MTU for the connection is that it submits smaller packets, right, or is there something else that it does? And if this is the only effect, why does bandwidth go to 0? Submitting 1200-sized packets over a 1412-MTU tunnel should be fine?

@aochagavia
Copy link
Contributor

aochagavia commented Apr 18, 2023

Still thinking about this... Maybe I'm being dense but I think the main effect of Quinn having a smaller defined MTU for the connection is that it submits smaller packets, right, or is there something else that it does? And if this is the only effect, why does bandwidth go to 0? Submitting 1200-sized packets over a 1412-MTU tunnel should be fine?

I was also confused by this at first, because I've not worked with TUN interfaces in the past. This is my current understanding: the TUN's MTU is the MTU supported by the OS networking stack, whereas Quinn is used as the underlying layer, which will transport the packets passed to the TUN interface. In that case, the TUN interface will pass 1412-sized packets to Quinn, and they will never get through.

@stormshield-damiend Is the diagram below accurate? (I guess the arrows should go both ways, but the point is clear)

image

@djc
Copy link
Member

djc commented Apr 18, 2023

So you're implementing something where a TUN packet must result in exactly one UDP packet generated by Quinn? This also not really something Quinn is designed for...

@stormshield-damiend
Copy link
Contributor

One tun packet is send as one Quic Datagram.

@djc
Copy link
Member

djc commented Apr 18, 2023

Okay, I guess it might make sense to add a TransportConfig::min_udp_payload_size.

@Ralith
Copy link
Collaborator

Ralith commented Apr 19, 2023

I concur. Setting a hard lower bound for the search is gracefully symmetrical to the upper bound we already expose. Maybe place it alongside that in MtuDiscoveryConfig?

aochagavia added a commit to aochagavia/quinn that referenced this issue Apr 19, 2023
aochagavia added a commit to aochagavia/quinn that referenced this issue Apr 20, 2023
aochagavia added a commit to aochagavia/quinn that referenced this issue Apr 20, 2023
aochagavia added a commit to aochagavia/quinn that referenced this issue Apr 20, 2023
@stormshield-guillaumed
Copy link
Author

Fixed by #1529

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants