Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate improper TLS RSA certificates early-on #3496

Closed
wants to merge 2 commits into from

Conversation

yacovm
Copy link
Contributor

@yacovm yacovm commented Oct 24, 2024

Why this should be merged

This commit makes validation of TLS certificates with either too big RSA keys, or the wrong exponent, fail as soon as the remote node presents its TLS certificate, in contrast to after the TLS handshake.

How this works

Activates the VerifyConnection in the tls.Config, which makes the TLS handshake code in the golang TLS stack invoke the function on the TLS certificate chain presented by the remote node.

How this was tested

Added a unit test to added code.

Added unit tests for existing code, because why not?

Need to be documented in RELEASES.md?

No.

@yacovm yacovm force-pushed the TLS-RSA branch 6 times, most recently from bd1c956 to f4d5987 Compare October 24, 2024 22:38
@yacovm yacovm self-assigned this Oct 24, 2024
@yacovm yacovm added the enhancement New feature or request label Oct 24, 2024
network/peer/tls_config.go Outdated Show resolved Hide resolved
network/peer/tls_config.go Outdated Show resolved Hide resolved
expectedErr: errors.New("no certificates sent by peer"),
},
{
description: "No TLS certs given",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Description is the same as the previous test case.

network/peer/tls_config_test.go Show resolved Hide resolved
network/peer/tls_config_test.go Outdated Show resolved Hide resolved
staking/parse.go Show resolved Hide resolved
network/peer/upgrader_test.go Show resolved Hide resolved
network/peer/upgrader_test.go Show resolved Hide resolved
network/peer/upgrader_test.go Outdated Show resolved Hide resolved
Copy link
Contributor

@marun marun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Were this change deployed, would it be possible for the rsa certs of existing validators to fail this new validation? If so, what would be the impact?

@yacovm
Copy link
Contributor Author

yacovm commented Oct 25, 2024

Were this change deployed, would it be possible for the rsa certs of existing validators to fail this new validation? If so, what would be the impact?

No, because all I did was moving the code that is called after the TLS handshake, during the TLS handshake when we parse the TLS key of the client but before we use that TLS key.

input: func() tls.ConnectionState {
return tls.ConnectionState{}
},
expectedErr: errors.New("no certificates sent by peer"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any particular reason you don't want to use the error object you've already been allocating ? ( ErrNoCertsSent )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ha, I rushed and forgot that, thanks

"github.com/ava-labs/avalanchego/staking"
)

// 8192RSA.pem is used here because it's too expensive
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong feeling here, but maybe consider naming the file 8192RSA_test.pem so it would be clear that it's being used for testing purposes only ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say by putting a key in a file, implicitly it shouldn't be used in anything but tests ;-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed accordingly though

go func() {
defer wg.Done()
conn, err := listener.Accept()
require.NoError(t, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect that this would generate a race error in case an error is returned, since the go testing framework wouldn't allow you failing from an inner go-routine.
to "solve" this issue, I'd suggest using an error channel instead of the waitgroup and test it's result on the main go-routine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? Due to the waitgroup, all writes to memory the goroutine has made, would be synchronized by the main goroutine. I tested it with data race detector and no data races were observed in case of an error failure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this causes a race condition, but the way that t.FailNow() works is by calling panic - which is normally caught by the testing harness managed when running a Test or in t.Run(...) but here that panic will escape the goroutine and cause the test to crash.

I doubt this is the only place in avalanchego where this could happen... so 🤷 on if we should fix it.

Copy link
Contributor

@StephenButtolph StephenButtolph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we consider fully validating the peer certificate prior to the handshake rather than just RSA keys?

We currently require:

  1. TLS certifications are no larger than 2048 bytes.
  2. A public key is provided
  3. RSA modulus is eitther 2048 or 4096 bits
  4. RSA modulus is not even
  5. RSA exponent is 65537
  6. ECDSA keys are on the P-256 curve.
  7. Ed25519 keys are disallowed (will be allowed later)
  8. DSA keys are disallowed

This PR introduces checks 2-5, but not 1, 6-8.

Essentially should we be trying to guarantee that x509 parsing -> custom verification can never produce a cert that isn't parsable by custom parsing?

Comment on lines +56 to +58
switch rsaKey := pk.(type) {
case *rsa.PublicKey:
return staking.ValidateRSAPublicKeyIsWellFormed(rsaKey)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is definitely me just being overly defensive, but should we attempt to handle the case that rsaKey is nil here?

Test would look like:

{
	description: "nil RSA key",
	input: func() tls.ConnectionState {
		key, err := rsa.GenerateKey(rand.Reader, 2048)
		require.NoError(t, err)

		x509CertWithNilPK := makeRSACertAndKey(t, key)
		x509CertWithNilPK.cert.PublicKey = (*rsa.PublicKey)(nil)
		return tls.ConnectionState{PeerCertificates: []*x509.Certificate{&x509CertWithNilPK.cert}}
	},
	expectedErr: peer.ErrEmptyPublicKey,
},

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added test + check in ValidateRSAPublicKeyIsWellFormed.

description: "Valid TLS cert",
input: func() tls.ConnectionState {
key, err := rsa.GenerateKey(rand.Reader, 2048)
require.NoError(t, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these functions are referencing the wrong t. These would call operations on the TestValidateRSACertificate t rather than the inner t.Run(...) t.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes you're right. I moved them because of code review comments and didn't notice it.

network/peer/tls_config_test.go Show resolved Hide resolved
conn, err := tls.Dial("tcp", listener.Addr().String(), &clientConfig)
require.NoError(t, err)

require.NoError(t, conn.Handshake())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit surprised that this passes, is this just because the client signature verification is performed last?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In TLS1.3 we first negotiate the encryption, and do the authentication last. The client is the last party to send its authentication, and the implementation doesn't wait for the acknowledgement from the server, i guess for performance reasons.

@yacovm
Copy link
Contributor Author

yacovm commented Oct 27, 2024

Should we consider fully validating the peer certificate prior to the handshake rather than just RSA keys?

We currently require:

  1. TLS certifications are no larger than 2048 bytes.
  2. A public key is provided
  3. RSA modulus is eitther 2048 or 4096 bits
  4. RSA modulus is not even
  5. RSA exponent is 65537
  6. ECDSA keys are on the P-256 curve.
  7. Ed25519 keys are disallowed (will be allowed later)
  8. DSA keys are disallowed

This PR introduces checks 2-5, but not 1, 6-8.

Essentially should we be trying to guarantee that x509 parsing -> custom verification can never produce a cert that isn't parsable by custom parsing?

1. There is a check in the TLS code that the payload of each message is less than 16MB. When we invoke the method VerifyConnection, we have already read the certificate in its full size, so I'm not sure what is gained by restricting the certificate size after we have parsed it.
6. I can do that, however - can the rest of the codebase handle P384? If so, how can we be sure there aren't networks with P384 that would get locked out as a result? Do we just assume everybody use whatever is in staking/tls.go ?
7. Same question as above.
8. DSA is not available in TLS 1.3, so they cannot verify such a signature, hence the handshake would most likely fail at the cipher suite negotiation phase which is before the signature check.

I "optimistically" pushed a follow-up PR to address 6+7

@yacovm yacovm force-pushed the TLS-RSA branch 3 times, most recently from 2e73fd5 to 36ca8da Compare October 27, 2024 20:23
@yacovm
Copy link
Contributor Author

yacovm commented Oct 27, 2024

Essentially should we be trying to guarantee that x509 parsing -> custom verification can never produce a cert that isn't parsable by custom parsing?

In this PR I'm just addressing low hanging fruits that make sense to handle early-on to avoid nonsensical exotic configuration that shouldn't be used, such as an avalanchego node not using the RSA verification exponent that is built-in in Golang, or using an exceedingly large modulus of 8K bits.

Of course, the space of certificates valid by x509 is a much larger superset than what we need to accept.

This commit makes validation of TLS certificates with either too big RSA keys, or the wrong exponent, fail as soon as the remote node presents its TLS certificate, in contrast to after the TLS handshake.

Signed-off-by: Yacov Manevich <[email protected]>
Signed-off-by: Yacov Manevich <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants