Add support for unknown fields #2
I'm interested in this. In order to make this work, unknown fields have to be stored in the structs somehow, something along the lines of adding an This would be an API break, since existing struct instantiations would break because of not setting this field. A seemingly sensible way to make this sane is to recommend everyone to instantiate structs like this:

syntax = "proto3";
message Person {
  string name = 1;
}

Person { name: "Per".to_string(), ..Default::default() };

This would work if all generated structs would For completeness, the parsing code would probably have to be updated to at least not discard groups, even if it's not made capable of actually parsing them. I don't know if known fields with unexpected type should go into the unknown fields set or not. Does this make sense?
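The instantiation pattern recommended above can be sketched as follows. This is a minimal illustration, not prost output: the `Person` struct is written by hand, and the `unknown_fields` member and the `make_person` helper are hypothetical names introduced only for this example.

```rust
// Hand-written stand-in for a generated message struct (not real prost output).
#[derive(Debug, Default, Clone, PartialEq)]
pub struct Person {
    pub name: String,
    // Hypothetical field holding the raw bytes of fields the schema does not know.
    pub unknown_fields: Vec<u8>,
}

// Hypothetical helper showing the recommended construction style.
pub fn make_person() -> Person {
    // Struct update syntax takes every field not listed here from `Default`,
    // so this expression keeps compiling when further fields are added.
    Person {
        name: "Per".to_string(),
        ..Default::default()
    }
}

fn main() {
    let person = make_person();
    println!("{:?}", person);
}
```

Note that struct update syntax only compiles when every field of the struct is visible at the construction site, so this pattern assumes the added field is public (or that construction happens in the defining module).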
I have started hacking on this. There are some ugly corner cases that I'm not entirely sure how to deal with. For example the well-known types such as BoolValue when represented as their Rust native types obviously won't be able to keep unknown values. The same applies to maps.
@per-gron thanks for the PR. Could you explain the motivation behind the unknown fields feature? I've read through the upstream docs linked in the first comment, but I don't really understand why it's become such a priority for Google to support this feature. My take is that unknown fields are full of subtle footguns, and the described use cases are either not a good fit for protobuf (RMW) or better done by preserving the original encoded message (intermediary servers).
@danburkert I honestly don't know exactly why either, but I can speculate a bit: It seems like the fact that proto3 is different from proto2 in this regard has made it difficult for a lot of internal projects to adopt proto3. It's an API break that simply makes it scary to upgrade when you have a huge code base, and the difference is not important enough to justify a change of behavior.

I agree that this feature is footgun-prone; yet I think it is better to behave as similarly as possible to the other protobuf libraries than to not have it at all. (It is of course possible to provide some kind of Prost setting to disable the feature for projects that don't care about having the same behavior across languages.)

For me personally, I don't really care much about this particular feature or lack of it, the reason I wanted to work on this is that it seems like Prost is a very nice Rust Protobuf library, and I think it would be good for Rust to have a more "officially endorsed" version that people can trust will work as expected, including across languages. When working with large projects and across different languages, uniformity really helps a lot, even with really small details.

(Note: Even though I work at Google I don't have any special power to "officialize" any particular Rust Protobuf library, but I hope that fixing details like this one so that its behavior is very close to the main Protobuf libraries along with having people from the Rust community agree that Prost's struct-based API is nice will go a long way.)
Just to set expectations, it's never been a goal of mine to have
The upstream project has done a great job publishing conformance tests which As far as adding features for the sake of parity, there's no way I can feasibly do that as a single maintainer, nor do I think I could even maintain such a library if the features were contributed by others.
Thanks for the information about your priorities with Prost. (I've been on vacation, hence my slow reply.) The reasons that I wrote this PR are 1) there was an open issue about it written by you, 2) unlike JSON/groups/reflection this is something that is or will soon be required by the spec so this is not quite the same as those other features. Given help with maintenance, would you be interested in expanding the scope of this library to include features that make it possible to get broader adoption? If not, would you support a fork with that goal?
I happen to work at Google (not on protobufs!) and know folks who really pushed for this (reversing the proto 2->3 decision to drop unknown field handling). Those two use cases are the reason why. I'm not sure why you say protobuf is not a good fit for RMW; it's overwhelmingly common for databases to be full of protobuf values which servers do RMW on. It generally works well...but most teams are still using proto 2 (with no plans to adopt proto 3). Some that started using proto 3 got nasty surprises; thus the push.

I see your point that intermediary servers could preserve the original encoded message, but there are also reasons it's easier to work with a message field embedded in a message field rather than a bytes field embedded in a message field. Think ease of writing an ASCII message (to check in as test data or to send an RPC by hand on the command line while debugging), ease of understanding debug representations (inverse of the above), and not having to explicitly have code do additional serialization/deserialization steps / possibly mess up the type of the contained proto. The
I'm also interested in the answer here (though I can't commit to significant help myself). I appreciate the hard work you've done on this project. Nonetheless, it's a bit frustrating that there are at least three apparently-commonly-used Rust proto implementations, with none being a superset of the others (much less matching Google's official C++ protobuf implementation). I'd love a path to resolve that, and that probably starts with knowing if the base should be prost or one of the others.
Hi, what's the current status for supporting unknown fields? I see multiple attempts over the past couple of years but no progress. It would be great to
Unknown fields still need to be supported in edition 2023 onwards. I can think of two reasonable ways to support this:

In-message state

Add a private field to every generated message to be exposed via an

#[derive(prost::Message)]
pub struct MyMessage {
    __prost_unknown_fields: Vec<prost::UnknownField>,
    #[prost(string, tag = "1")]
    pub a_field: String,
}

This will effectively make every message non-exhaustive for matching and impossible to construct with struct expressions, so it will need builders like in #901. These can be considered good features with regard to protobuf versioning semantics. This change can be introduced as a build option to preserve backward compatibility. One disadvantage is an increase in struct size for every message, regardless of need.

Out-of-band data for encode/decode

Add methods to

pub trait Message {
    // ...
    fn decode_with_unknown(
        mut buf: impl Buf,
    ) -> Result<(Self, UnknownFields), DecodeError>
    where
        Self: Default,
    {
        // ...
    }

    fn encode_with_unknown(
        &self,
        unknown_fields: &UnknownFields,
        buf: &mut impl BufMut
    ) -> Result<(), EncodeError>
    where
        Self: Sized,
    {
        // ...
    }
}
I think this could be added in a backward-compatible way by providing a kind of
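To make the out-of-band idea above more concrete, here is a rough sketch of how a decoder could separate the records it recognizes from the ones it does not, working directly on the protobuf wire format. It does not use prost; `read_varint` and `split_unknown` are hypothetical helpers written for this illustration, and groups (wire types 3 and 4) are simply rejected.

```rust
/// Reads a base-128 varint starting at `*pos`, advancing `*pos` past it.
fn read_varint(buf: &[u8], pos: &mut usize) -> Option<u64> {
    let mut value = 0u64;
    for i in 0..10u32 {
        let byte = *buf.get(*pos)?;
        *pos += 1;
        value |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return Some(value);
        }
    }
    None // a varint never spans more than 10 bytes
}

/// Splits an encoded message into (known records, unknown records),
/// where `known` lists the field numbers the schema defines.
pub fn split_unknown(buf: &[u8], known: &[u32]) -> Option<(Vec<u8>, Vec<u8>)> {
    let (mut kept, mut unknown) = (Vec::new(), Vec::new());
    let mut pos = 0usize;
    while pos < buf.len() {
        let start = pos;
        let key = read_varint(buf, &mut pos)?;
        let field_number = (key >> 3) as u32;
        match key & 7 {
            // Varint payload.
            0 => {
                read_varint(buf, &mut pos)?;
            }
            // Fixed 64-bit payload.
            1 => pos = pos.checked_add(8)?,
            // Length-delimited payload.
            2 => {
                let len = read_varint(buf, &mut pos)? as usize;
                pos = pos.checked_add(len)?;
            }
            // Fixed 32-bit payload.
            5 => pos = pos.checked_add(4)?,
            // Groups and invalid wire types are not handled in this sketch.
            _ => return None,
        }
        // `get` returns None if the payload runs past the end of the buffer.
        let record = buf.get(start..pos)?;
        if known.contains(&field_number) {
            kept.extend_from_slice(record);
        } else {
            unknown.extend_from_slice(record);
        }
    }
    Some((kept, unknown))
}

fn main() {
    // Field 1 (string "Per") followed by field 2 (varint 150).
    let msg = [0x0a, 3, b'P', b'e', b'r', 0x10, 0x96, 0x01];
    let (kept, unknown) = split_unknown(&msg, &[1]).expect("valid message");
    println!("known: {:?}, unknown: {:?}", kept, unknown);
}
```

Under this scheme the caller keeps the unknown bytes next to the decoded message and appends them again when re-encoding, which is what preserves them across a read-modify-write cycle.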
I like this approach. It guarantees that the unknown fields are part of the message and it doesn't have to break existing APIs.
It is not necessary to hide this field. We can make it a publicly accessible field as long as it doesn't have a conflicting name. The public field also prevents the struct from becoming non-exhaustive.
I think saving unknown fields should be an optional feature. When people are interested, they can enable it in the code generator with a self-chosen name. #574 takes a similar approach to the one I describe here.
Only if it is opted in by matcher settings in
OK, with a configurable name they can avoid conflicts.
A developer who wants their generated Rust bindings to adhere to Protobuf semver semantics would actually want the message structs to be non-exhaustive, so that future field additions do not become breaking changes. But this can be done by other means, such as the I expect the added field to be a nuisance to any user who doesn't care about unknown fields, so they'll match it with
One problem with this is that it's yet another setting affecting the generated API that users might disagree on. In an ideal world, a set of proto packages is represented by a single Rust library crate that everybody else can depend on, rather than generating sets of incompatible types for the same thing (à la prost-types vs. pbjson-types vs...). If the in-struct baggage for unknown fields proves as unpopular as I expect, developers of such "designated binding" crates would be discouraged from enabling it, meaning that it won't be generally available for binding consumers. A feature gate guarding the struct members is not usually a good solution, because an API should not be breakable by enabling feature gates.
Instead, prost-build could generate separate struct variants augmented with unknown fields, in an added-on module, and provide conversions between the two:

#[derive(prost::Message)]
pub struct Foo {
    #[prost(message, optional, tag = "1")]
    pub bar: Option<Bar>,
}

#[derive(prost::Message)]
pub struct Bar {}

#[cfg(feature = "unknown_fields")]
pub mod with_unknown_fields {
    #[derive(prost::Message, prost::UnknownFields)]
    pub struct Foo {
        #[prost(unknown_fields)]
        __prost_unknown_fields: Vec<prost::UnknownField>,
        #[prost(message, optional, tag = "1")]
        pub bar: Option<Bar>,
    }

    #[derive(prost::Message, prost::UnknownFields)]
    pub struct Bar {
        #[prost(unknown_fields)]
        __prost_unknown_fields: Vec<prost::UnknownField>,
    }

    // Convert to a `Foo` message struct not augmented with information on unknown fields.
    impl From<Foo> for super::Foo {
        fn from(full_message: Foo) -> Self {
            todo!()
        }
    }

    // Convert from a `Foo` message struct carrying no unknown fields.
    impl From<super::Foo> for Foo {
        fn from(foo: super::Foo) -> Self {
            todo!()
        }
    }

    impl prost::WithUnknownFields for super::Foo {
        type FullMessage = Foo;

        fn with_unknown_fields(
            self,
            unknown_fields: impl Into<Vec<prost::UnknownField>>,
        ) -> Result<Foo, prost::UnknownFieldError> {
            todo!("check if the unknown field tags are not found among the known fields")
        }
    }
}

So the simple users get their simple structs, and proxy use cases and other sophisticates can do something like
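The conversions left as `todo!()` in the proposal above could look roughly like the following. This is a standalone sketch that does not depend on prost: `UnknownField`, the `attach` function, the `KNOWN_TAGS` constant, and the `name` field are all hypothetical stand-ins chosen for illustration, and the error type is simplified to the clashing tag number.

```rust
// Hypothetical record for one unrecognized field; not a prost type.
#[derive(Debug, Clone, PartialEq)]
pub struct UnknownField {
    pub tag: u32,
    pub data: Vec<u8>,
}

// The plain struct that simple users work with.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Foo {
    pub name: String,
}

pub mod with_unknown_fields {
    use super::UnknownField;

    // The augmented variant that also carries unknown fields.
    #[derive(Debug, Clone, Default, PartialEq)]
    pub struct Foo {
        pub name: String,
        unknown_fields: Vec<UnknownField>,
    }

    impl Foo {
        pub fn unknown_fields(&self) -> &[UnknownField] {
            &self.unknown_fields
        }
    }

    // Dropping the unknown fields is lossy but always succeeds.
    impl From<Foo> for super::Foo {
        fn from(full: Foo) -> Self {
            super::Foo { name: full.name }
        }
    }

    // A plain message converts to one with an empty unknown field set.
    impl From<super::Foo> for Foo {
        fn from(foo: super::Foo) -> Self {
            Foo { name: foo.name, unknown_fields: Vec::new() }
        }
    }

    // Tags already taken by known fields of `Foo` in this sketch.
    const KNOWN_TAGS: &[u32] = &[1];

    /// Attaches unknown fields, rejecting any whose tag is actually known.
    /// On failure the clashing tag is returned.
    pub fn attach(foo: super::Foo, unknown: Vec<UnknownField>) -> Result<Foo, u32> {
        if let Some(clash) = unknown.iter().find(|f| KNOWN_TAGS.contains(&f.tag)) {
            return Err(clash.tag);
        }
        Ok(Foo { name: foo.name, unknown_fields: unknown })
    }
}

fn main() {
    let plain = Foo { name: "a".to_string() };
    let extra = vec![UnknownField { tag: 7, data: vec![1] }];
    let full = with_unknown_fields::attach(plain, extra).expect("no tag clash");
    println!("{:?}", full);
}
```

Keeping the unknown field list private in the augmented struct means the tag check in `attach` cannot be bypassed by constructing the struct directly from outside the module.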
See https://docs.google.com/document/d/1KMRX-G91Aa-Y2FkEaHeeviLRRNblgIahbsk4wA14gRk/view and protocolbuffers/protobuf#272.