Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make it easy to write forward-compatible code: struct builders? #399

Open
wchargin opened this issue Dec 2, 2020 · 9 comments
Open

Make it easy to write forward-compatible code: struct builders? #399

wchargin opened this issue Dec 2, 2020 · 9 comments

Comments

@wchargin
Copy link
Contributor

wchargin commented Dec 2, 2020

In protobuf, adding a field to a message is considered a backward
compatible change.
It’s compatible at the wire format level and also
at the source level for many sets of bindings, including the standard
bindings for Python, Go, C++, and Java.

Prost makes it easy to directly initialize structs of message type:

// message Person { string name = 1; string email = 2; }
let person = proto::Person {
    name: "A. U. Thor".into(),
    email: "[email protected]".into(),
};

But then if a new type is added to the message, the Rust code fails to
compile, because it’s no longer initializing all the fields.

One way to achieve compatibility is to spread in ..Default::default():

let person = proto::Person {
    name: "A. U. Thor".into(),
    email: "[email protected]".into(),
    ..Default::default() // more fields may be added later
};

This works, but (a) you have to remember to do it every time, and (b) it
can get a bit spammy when there are multiple layers of nesting:

            ..Default::default()
        },
        ..Default::default()
    },
    ..Default::default()
}

It would be nice to have a way to guarantee better compatibility. One
approach would be some kind of optional builder API:

let person = proto::Person::builder()
    .name("A. U. Thor".into())
    .email("[email protected]".into())
    .build();

I don’t know what exactly the right ergonomics are here, especially
around nested fields. (I half-wonder if the builder should take a
closure, but I think that that might get a bit hairy.)

A nice side-effect of such a builder pattern would be that the methods
could take impl Into<FieldType>, so you could pass &str literals
rather than having to write .to_string()/.into() everywhere.

An option to set #[non_exhaustive] on structs doesn’t sound quite
right to me, since (a) that doesn’t do anything inside the same crate
and (b) then you can’t use functional record updates (..default()).

It’s fine with me if the struct fields remain public; we can have a
style recommendation that says “always use builders when building
messages of a type that you don’t control”, or something. Though being
able to control that via a codegen option would be nice, too.

This is an important consideration for using Prost in shared or large
codebases, especially monorepos. It’s common to have a shared protobuf
definition with autogenerated bindings that many teams depend on.
Enabling the proto owners to make forward-compatible changes without
breaking other teams’ builds would be a requirement to use Prost.

@danburkert
Copy link
Collaborator

I don't agree that adding a field to a message being a backwards compatible wire change is sufficient motivation to also guarantee that it's a source-compatible change. There are plenty of other guaranteed wire-compatible changes that typically are not source compatible in any (typed) language bindings, e.g. widening an integer, moving a field into a oneof, renaming a field, removing an optional field, etc.

I do think #[non_exhaustive] is the way forward for users who want to expose a prost-generated type in a public crate API that needs stability guarantees. I don't personally think this is a wise thing to do, but a lot of people seem to want to do it 🤷 . For workspace-style private codebase I find that the compiler pointing out where code needs to get updated is extremely helpful.

(b) then you can’t use functional record updates (..default())

This appears to work just fine: https://gist.github.com/a64fe9b1dd55d89ed8687381395f02d4

@danburkert
Copy link
Collaborator

This appears to work just fine

Maybe that's an artifact of the type being in the same crate? I haven't used #[non_exaustive] in anger since it's a relatively new feature, so I'm not exactly sure what its limitations are.

@wchargin
Copy link
Contributor Author

wchargin commented Dec 2, 2020

Hi! Thanks for your response.

I don't agree that adding a field to a message being a backwards compatible wire change is sufficient motivation to also guarantee that it's a source-compatible change. There are plenty of other guaranteed wire-compatible changes that typically are not source compatible in any (typed) language bindings, e.g. widening an integer, moving a field into a oneof, renaming a field, removing an optional field, etc.

Agreed: not all wire-compatible changes are source-compatible. But
adding a field specifically is meant to be both wire-compatible and
source-compatible. The guidelines (also linked above) say that
fields “may be added to existing APIs in the same version”, and also
say a single version “must” be source- as well as wire- and
semantic-compatible, which implies the conclusion.

Correspondingly, even though Go, Java, and C++ are all typed, their
language bindings treat this as a source-compatible change. Java and C++
only expose getters and setters (and/or builders). Go has structs like
Prost does, but omitted fields in Go struct literals are implicitly set
to zero values.

For workspace-style private codebase I find that the compiler pointing out where code needs to get updated is extremely helpful.

Absolutely; I’m a huge fan of this workflow as well (“change one type,
fix all the errors, it works first try”). To quote from Abseil’s “Use
Exhaustive switch Statements Responsibly”
(h/t @nfelt), for
this to work, you need one of three things:

  1. The owner of the enum guarantees no new enumerators will be
    added,
  2. The owner of the enum is willing and able to fix our code when new
    enumerators are added (e.g. the enum definition is part of the
    same project),
  3. The owner of the enum will not be blocked by breaking our build
    (e.g. their code is in a separate source control repository), and
    we’re willing to be forced to update our switch statements when
    updating to the latest version of the enum-owner’s code.

The “change one type, fix all the errors, it works first try” flow is
basically picking option (2), but for published protos this doesn’t
work. For protos that aren’t set in stone, this leaves (3), which may be
impossible (monorepo) or undesirable (if you want to take the proto
semantics).

This appears to work just fine

Maybe that's an artifact of the type being in the same crate?

Yeah, that’s right:

$ cat sub.rs
#[non_exhaustive]
#[derive(Debug, Default)]
pub struct S {
    pub a: i32,
    pub b: i32,
}
$ cat main.rs
extern crate sub;
fn main() {
    let s = sub::S {
        a: 1,
        b: 2,
        ..Default::default()
    };
    dbg!(s);
}
$ rustc sub.rs --crate-type=lib
$ rustc main.rs -L.
error[E0639]: cannot create non-exhaustive struct using struct expression
 --> main.rs:3:13
  |
3 |       let s = sub::S {
  |  _____________^
4 | |         a: 1,
5 | |         b: 2,
6 | |         ..Default::default()
7 | |     };
  | |_____^

error: aborting due to previous error

For more information about this error, try `rustc --explain E0639`.

By the way, I played with sketching out an API for this, and was pleased
to see that if you put the builder-related methods in traits,
then the whole thing works fine even if the proto happens to have a
field called build or something else pathological. (Maybe obvious to
you, but it’s not too often that I see the tangible benefits of trait
methods being namespaced as they are.)

@danburkert
Copy link
Collaborator

The guidelines (also linked above) say that
fields “may be added to existing APIs in the same version”, and also
says a single version “must” be source- as well as wire- and
semantic-compatible.

That link appears to be to Google Cloud Client guidelines, that's not really relevant to the design of a general purpose protobuf library. I haven't seen any protobuf docs or guidelines that say anything about source compatibility of generated code in any context, and in practice there is no such guarantee as I pointed out.

WRT #[non_exhaustive] that's pretty unfortunate, it makes it a little less useful of a tool. It's still possible to use #[non_exhaustive] without a builder API though, doing something like:

let mut msg = MyMessage::default();
msg.foo = "bar".to_owned();
...

@danburkert
Copy link
Collaborator

Also note that you could in theory attach a general purpose builder derive macro to prost generated types, e.g. https://crates.io/crates/derive_builder. I haven't tried this so maybe there's some sticking points, but I believe all the hooks are available in prost-build to make it possible.

@Kixunil
Copy link

Kixunil commented Jul 6, 2021

Combination of #[non_exhaustive] and builder sounds like a best option to me since then it's possible to match on the struct and access the fields directly without creating compatibility hazard.

I kinda like the idea of general-purpose builder although I'm not sure if there's a reason why prost-specific could be better. One thing that comes to my mind is avoiding naming conflicts (if someone funny names protobuf messages Foo and FooBuilder).

My use case is a library for quite large gRPC protocol that I don't have the time to write cleaner bindings for. (I wish I had.)

@damelLP
Copy link
Contributor

damelLP commented Dec 14, 2022

https://lib.rs/crates/proto-builder-trait

This seems to do the job?
Maybe it could be upstreamed?

@mzabaluev
Copy link
Contributor

mzabaluev commented Apr 13, 2023

derive_builder does away with a lot of the boilerplate and is very flexible, but I think a code generator using specific knowledge of the protobuf type system would be able to provide a better API out of the box.

Consider repeated fields:

message HelloRequest {
    repeated string flags = 2;
}

The builder for HelloRequest could provide this setter method:

impl HelloRequestBuilder {
    pub fn flags<T>(mut self, flags: impl IntoIterator<Item = T>) -> Self
    where T: Into<String>,
    {
        self.inner.flags = flags.into_iter().map(Into::into).collect();
        self
    }
}

Stylistically, I don't like the builder type becoming a part of the API that you have to use by name. It's better to add an associated fn to the message type, and have the builder type tucked away in a submodule:

impl HelloRequest {
    pub fn builder() -> hello_request::HelloRequestBuilder {
        todo!()
    }
}

@mzabaluev
Copy link
Contributor

mzabaluev commented Sep 14, 2023

As realized in #901, a protobuf-specific generator can provide much more ergonomic API than a general purpose derive.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants