Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

G1: Add proposal for true binary metadata #19

Merged
merged 3 commits into from
Aug 25, 2017
Merged
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
89 changes: 89 additions & 0 deletions G1-true-binary-metadata.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,89 @@
Title
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put in the real title.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

----
* Author(s): ctiller
* Approver: ejona
* Status: Draft
* Implemented in: n/a
* Last updated: 3/29/17
* Discussion at: https://groups.google.com/forum/#!topic/grpc-io/Ww8waYz_Nes

## Abstract

Add a HTTP2 extension to allow binary metadata (those suffixed with -bin) to be
sent without the base64 encode/decode step.

## Background

gRPC allows binary metadata to be sent by applications. These metadata elements
are keyed with a -bin suffix. When transmitted on the wire, since HTTP2 does not
allow binary headers, we base64 encode these elements and Huffman compress them.
On receipt, we transparently reverse the transformation.

This transformation is costly in terms of CPU, and there exist use cases where
this transformation can become the CPU bottleneck for gRPC.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add some data here (i.e. metadata to payload ratio, metadata size, how much CPU does base64 encoding, base64 decoding, Huffman encoding and Huffman decoding each take)? It makes a lot of difference whether we're talking about 2% or 50% of total CPU processing time


### Related Proposals:

n/a

## Proposal

### New setting

Expose a custom setting in our HTTP2 settings exchange:
GRPC_ALLOW_TRUE_BINARY_METADATA = 0xfe03.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem gRPC-specific enough to justify GRPC_ prefix.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That it only applies to -bin headers (which are a gRPC thing) makes it gRPC-specific enough (IMO)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why only -bin headers? If we're using 0x00 magic prefix, then I don't see a reason for this limitation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So gRPC uses -bin to specify that a header is binary, and says that other metadata must be ascii (per http2). That -bin has semantic meaning (foo-bin is a different header than foo).

We use that as a signal to base64 encode things right now. Under this proposal we'd just switch to an alternate encoding.

Since we can't universally make all of the HTTP2 ecosystem take binary metadata, a mechanism to allow transmitting anything and dropping the -bin isn't possible for us (we need to keep interoperating - so proxies that support this would potentially need to do the base64 conversion if one side didn't support this mechanism).

If there were an extension that allowed sending binary values in metadata, we'd still end up needing a disambiguating byte for backwards compatibility (for the gRPC -bin suffixed headers), and we'd still maintain the requirement that non -bin headers must be ascii.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you aware of any HTTP/2 proxies that would reject requests because of "true binary" encoding of the header field? Unless it's a well-known header that must be processed by the proxy, most (all?) will just pass-through headers without ever looking at the content of the header fields.

Don't get me wrong, I'm all in for allowing binary header fields, but in order get this accepted into HTTP/2 ecosystem, this proposal should be a little more than gRPC implementation detail, that's why -bin suffix feels like an artificial limitation.

I believe that standardizing on "0x00 prefix of the header field indicates binary/raw representation" is much more reasonable, and doesn't prevent gRPC from only using it in headers with -bin suffix.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To answer this myself, RFC7540, sec. 10.3. Intermediary Encapsulation Attacks says:

Similarly, HTTP/2 allows header field values that are not valid.
While most of the values that can be encoded will not alter header
field parsing, carriage return (CR, ASCII 0xd), line feed (LF, ASCII
0xa), and the zero character (NUL, ASCII 0x0) might be exploited by
an attacker if they are translated verbatim. Any request or response
that contains a character not permitted in a header field value MUST
be treated as malformed (Section 8.1.2.6). Valid characters are
defined by the "field-content" ABNF rule in Section 3.2 of [RFC7230].

I also did some digging on the http-wg mailing list and it seems that sending binary blobs in header fields was discussed during HTTP/2 standardization, but they eventually decided against it, because apparently base64 in HPACK is "good enough".


This setting is randomly chosen (to avoid conflicts with other extensions), and
within the experimental range of HTTP extensions.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest adding a link to https://tools.ietf.org/html/rfc7540#section-11.3 here to refer readers to information about the experimental range.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


The setting can have the values 0 (default) or 1. If the setting is 1, then
peers MAY use a 'true binary' encoding (described below), instead of the current
base64 encoding for -bin metadata.

Implementations SHOULD transmit this setting only once, and as part of the first

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since SETTINGS apply only to a connection and not end-to-end, what about intermediaries? Are proxies supposed to re-encode header field to base64 if they received binary header field but are talking to a backend that doesn't support it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... or they can just not advertise support for this feature.

settings frame.

### 'True binary' encoding

When transmitting metadata on a connection where the peer has specified
GRPC_ALLOW_TRUE_BINARY_METADATA, instead of encoding using base64, an
implementation MAY instead prefix a NUL byte to the metadata and transmit the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/metadata/metadata value/. Or maybe "header field value".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

data in binary form.

Since this is a HTTP2 extension and other extensions might alias this extension
id, it's possible that this becomes misconfigured. In that case, peers are
required to RST_STREAM with http error PROTOCOL_ERROR. If a binary encoding was
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe call out that this is the normal behavior as specified in RFC7540§10.3?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

attempted and such a RST_STREAM is received without any other headers,
implementations SHOULD retry the request with base64 encoding, and disable
binary encoding for future requests. Verbosely logging this condition is
encouraged.

### Examples

Suppose we wanted to send metadata element 'foo-bin: 0x01' (ie a single byte
containing '1').

Under base64, we'd send a http header 'foo-bin: AQ'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add period at end of sentence, or an additional newline.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Under binary, we'd send 'foo-bin: 0x00 0x01' (ie prefixing a NUL byte and then

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need NUL-byte here? Why can't we send binary header as foo-bin: 0x01?

I actually spent some time on Wednesday looking into this (for different reasons), and I couldn't find anything in the spec that forbids binary header values.

This was a limitation in SPDY, which used NUL-byte as a separator between multiple header values, but it seems that this "feature" was removed from HPACK.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://tools.ietf.org/html/rfc7540#section-8.1 says that https://tools.ietf.org/html/rfc7230#section-3.2 is normative, which does not allow binary headers.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...which means that foo-bin: 0x00 0x01 is also illegal.

My point is that nothing in HPACK prevents it from transmitting binary data, so we could add generic SETTINGS parameter for it (for example: SETTING_ALLOW_BINARY_HEADER_FIELD) and carry it over HPACK as any other value.

But I guess 0x00 prefix is here to differentiate binary from text? If so, do we really need -bin prefix in header name?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-bin is not going away so long as there are clients, servers, and intermediaries that do not support binary headers.

sending the binary metadata value)

## Rationale

Binary metadata transmission performance is critical for a number of
applications.

Various workarounds were considered:
1. Switching to base16 - this would require a backwards incompatible protocol
change for gRPC, and has the disadvantage of bloating wire size, which
additionally interacts badly with hpack.
2. Adding a new suffix (-raw): this leaks further implementation details to
application developers, and likely would still need a base64 workaround

## Implementation

A trial implementation in gRPC C core is underway and expected to be ready in
coming weeks.

## Open issues (if applicable)

n/a