[SPARK-47318] Updating documentation for AuthEngine KEX change
sweisdb committed Mar 14, 2024
1 parent 4204dd1 commit 7ace803
Showing 2 changed files with 46 additions and 25 deletions.
@@ -1,6 +1,20 @@
Forward Secure Auth Protocol
Forward Secure Auth Protocol v1.1
==============================================

Deprecation Notice
------------------
This is a bespoke key exchange protocol that was implemented before Spark supported TLS (aka SSL) for RPC
calls. Spark users should upgrade to TLS for RPC calls between Spark processes. This protocol
will be deprecated and removed in the long term.

See
the [Spark security documentation](https://github.com/apache/spark/blob/master/docs/security.md#ssl-encryption) for
more information on how to configure TLS.


Summary
-------

This file describes a forward secure authentication protocol which may be used by Spark. This
protocol is essentially ephemeral Diffie-Hellman key exchange using Curve25519, referred to as
X25519.
@@ -77,6 +91,7 @@ Now that the server has the client's ephemeral public key, it can generate its o
keypair and compute a shared secret.

sharedSecret = X25519.computeSharedSecret(clientPublicKey, serverKeyPair.privateKey())
derivedKey = HKDF(sharedSecret, salt=transcript, info="deriveKey")

With the shared secret, the server will also generate two initialization vectors to be used for
inbound and outbound streams. These IVs are not secret and will be bound to the preceding protocol
@@ -99,3 +114,13 @@ sessions. It would, however, allow impersonation of future sessions.
In the event of a pre-shared key compromise, messages would still be confidential from a passive
observer. Only active adversaries spoofing a session would be able to recover plaintext.

Security Changes & Compatibility
-------------

The original version of this protocol, retroactively called v1.0, did not apply an HKDF to `sharedSecret` and
used the encoded X coordinate directly as key material. This is atypical; standard practice is to pass that shared
coordinate through an HKDF. The current version, v1.1, adds this HKDF step to
derive `derivedKey`.

Consequently, older Spark versions using v1.0 of this protocol will not negotiate the same key as
Spark versions using v1.1 and will be **unable to send encrypted RPCs** across incompatible versions.
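
The incompatibility between v1.0 and v1.1 can be illustrated with a minimal sketch. It assumes HKDF as defined in RFC 5869 with HMAC-SHA256 (the hash function is an assumption; the text above does not name one), and it uses fixed placeholder bytes in place of a real X25519 shared secret and handshake transcript. The helper name `hkdf_sha256` is hypothetical and is not Spark's implementation.

```python
import hashlib
import hmac


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) using HMAC-SHA256: extract a PRK, then expand it."""
    hash_len = hashlib.sha256().digest_size
    if length > 255 * hash_len:
        raise ValueError("requested output is too long for HKDF")
    # Extract: an empty salt defaults to a string of hash_len zero bytes.
    prk = hmac.new(salt or b"\x00" * hash_len, ikm, hashlib.sha256).digest()
    # Expand: chain HMAC blocks until enough output bytes are produced.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


# Placeholders (assumptions): a real session would use the X25519 output
# and the actual handshake transcript.
shared_secret = bytes(range(32))
transcript = b"hypothetical-handshake-transcript"

key_v1_0 = shared_secret  # v1.0: raw shared secret used directly as the key
key_v1_1 = hkdf_sha256(shared_secret, salt=transcript, info=b"deriveKey")  # v1.1

print(key_v1_0 != key_v1_1)
```

Because the two versions derive different keys from the same shared secret, a v1.0 peer cannot decrypt traffic from a v1.1 peer, which matches the incompatibility described above.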
44 changes: 20 additions & 24 deletions docs/security.md
@@ -149,24 +149,32 @@ secret file agrees with the executors' secret file.

# Network Encryption

Spark supports two mutually exclusive forms of encryption for RPC connections.
Spark supports two mutually exclusive forms of encryption for RPC connections:

The first is an AES-based encryption which relies on a shared secret, and thus requires
RPC authentication to also be enabled.
The **preferred method** uses TLS (aka SSL) encryption via Netty's support for SSL. Enabling SSL
requires keys and certificates to be properly configured. SSL is standardized and considered more
secure than the legacy AES-based method.

The second is an SSL based encryption mechanism utilizing Netty's support for SSL. This requires
keys and certificates to be properly configured. It can be used with or without the authentication
mechanism discussed earlier.

One may prefer to use the SSL based encryption in scenarios where compliance mandates the usage
of specific protocols; or to leverage the security of a more standard encryption library. However,
the AES based encryption is simpler to configure and may be preferred if the only requirement
is that data be encrypted in transit.
The legacy method is an AES-based encryption mechanism relying on a shared secret. This requires
RPC authentication to also be enabled. This method uses a bespoke protocol and should be considered
deprecated in favor of SSL.

If both options are enabled in the configuration, the SSL based RPC encryption takes precedence
and the AES based encryption will not be used (and a warning message will be emitted).
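
The precedence rule above can be modeled with a small illustrative function. This is a sketch, not Spark's code: the function name `select_rpc_encryption` is hypothetical, and the keys `spark.ssl.rpc.enabled`, `spark.network.crypto.enabled`, and `spark.authenticate` are assumed from the Spark security documentation rather than taken from this diff.

```python
import warnings


def _enabled(conf: dict, key: str) -> bool:
    """Treat a missing key as disabled; compare case-insensitively."""
    return str(conf.get(key, "false")).lower() == "true"


def select_rpc_encryption(conf: dict) -> str:
    """Return "ssl", "aes", or "none" for a dict of Spark-style settings."""
    ssl = _enabled(conf, "spark.ssl.rpc.enabled")          # assumed key name
    aes = _enabled(conf, "spark.network.crypto.enabled")   # assumed key name
    if ssl and aes:
        # Both enabled: SSL takes precedence and a warning is emitted.
        warnings.warn("SSL RPC encryption enabled; AES-based encryption is ignored")
        return "ssl"
    if ssl:
        return "ssl"
    if aes:
        # The AES-based method requires RPC authentication to be enabled.
        if not _enabled(conf, "spark.authenticate"):
            raise ValueError("AES-based encryption requires spark.authenticate=true")
        return "aes"
    return "none"
```

For example, a configuration that sets both `spark.ssl.rpc.enabled` and `spark.network.crypto.enabled` to `true` resolves to `"ssl"` in this model, with a warning.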

## AES based Encryption
## SSL Encryption (Preferred)

Spark supports SSL based encryption for RPC connections. Please refer to the SSL Configuration
section below to understand how to configure it. The SSL settings are mostly similar across the UI
and RPC, however there are a few additional settings which are specific to the RPC implementation.
The RPC implementation uses Netty under the hood (while the UI uses Jetty), which supports a
different set of options.

Unlike the other SSL settings for the UI, the RPC SSL is *not* automatically enabled if
`spark.ssl.enabled` is set. It must be explicitly enabled, to ensure a safe migration path for users
upgrading Spark versions.

## AES-based Encryption (Legacy)

Spark supports AES-based encryption for RPC connections. For encryption to be enabled, RPC
authentication must also be enabled and properly configured. AES encryption uses the
@@ -228,18 +236,6 @@ The following table describes the different options available for configuring th
</tr>
</table>

## SSL Encryption

Spark supports SSL based encryption for RPC connections. Please refer to the SSL Configuration
section below to understand how to configure it. The SSL settings are mostly similar across the UI
and RPC, however there are a few additional settings which are specific to the RPC implementation.
The RPC implementation uses Netty under the hood (while the UI uses Jetty), which supports a
different set of options.

Unlike the other SSL settings for the UI, the RPC SSL is *not* automatically enabled if
`spark.ssl.enabled` is set. It must be explicitly enabled, to ensure a safe migration path for users
upgrading Spark versions.

# Local Storage Encryption

Spark supports encrypting temporary data written to local disks. This covers shuffle files, shuffle