Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend prost-build to generate message builders #901

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

mzabaluev
Copy link
Contributor

@mzabaluev mzabaluev commented Aug 28, 2023

First crack at #399

Based on #1011

Advantages over generic macro-generated builders like derive_builder:

  • Type system knowledge from protobuf is applied to make the API more ergonomic:
    • The setter of a field with message type Foo does not need to be specially configured to not use Option<Foo> as the argument type, we assume the setter is only used when the field is meant to be set so it's impl Into<Foo>.
    • Similarly, setters of oneof fields take the oneof's enum type argument rather than Option, but no generics with the Into bound here as that could obfuscate call sites more than necessary (maybe it's a wrong assumption and the argument conventions should be made uniform).
    • Setters of repeated fields are generic with an impl IntoIterator argument.
    • Setters of enum fields use the domain-checked enum typed argument rather than the i32 type that the corresponding struct field is generated with.
  • The builder API is explicitly emitted in generated code which can be revision-controlled, rather than magicked in by a separate macro.
    • Faux-derive macros that do not actually derive a trait implementation, but inject additional named API into the scope, can be considered bad style.

@mzabaluev
Copy link
Contributor Author

mzabaluev commented Aug 29, 2023

Notes on the present implementation:

  • Builders for matching messages are configured with a PathMap.
  • Each matching message type gets an associated function to start off the builder, with the name given by the map value (converted to snake_case). This will conflict with a field getter method generated by the Message derive macro in the rare case when a message has an optional scalar or an enum field with the same name.
  • The builder type for a message is generated in the message-named module alongside the nested types of the message defined in the protobuf. The type's name is given by the configuration map value.
  • The final method of the builder is named build, but I intend to make it a configurable default.
  • The field setter methods of the builder use Into and IntoIterator conversions on the generic argument.
    • For primitive types, this allows passing any values as long as the conversion is lossless, as per the convention on From.
    • For boxed fields, impl Into<Box<Foo>> provides the convenience of being able to pass Foo or Box<Foo>.
    • For repeated fields, the IntoIterator bound allows population from any iterable object. For example, a repeated string field can be initialized from an array of string literals.
    • For message fields, the conversion can be useful if the application provides a Rust-idiomatic "domain type" with a From conversion into the generated message struct.
    • Enum and oneof field setters do not use Into generics, for clarity at the call sites and better code completion experience.

Copy link
Collaborator

@caspermeijn caspermeijn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this PR is difficult to review. Could you do these things to make it easier to review?

  • Can you explain why this APi is better then a generic builder? For example: https://crates.io/crates/derive_builder

  • Add documentation about how to use this builder.

  • Reorganize the commits so that they change a single thing with a description of what and why you do that. If that commit generally improves the codebase, consider opening a separate PR for just that commit. This allows me to review each commit individually.

  • I am not sure about this one: Is it beneficial to split the build code generator into a separate source file?

@mzabaluev
Copy link
Contributor Author

Thanks for the feedback @caspermeijn! I have rebased the branch and squashed the changes into a few topical commits.

Can you explain why this APi is better then a generic builder? For example: https://crates.io/crates/derive_builder

I have added an explanation in the description comment.

Add documentation about how to use this builder.

I mean to do it, but first I'd like to settle on a more flexible way to configure the builders.
What do you think of a way to selectively generate the builders for matching paths, and be able to configure the name to satisfy author's stylistic preferences and avoid name conflicts?

prost_buid::Config::new()
    // .mypackage.Foo gets foo::Builder and the associated Foo::builder() function
    .builders(".mypackage", "Builder")
    // ...

I am not sure about this one: Is it beneficial to split the build code generator into a separate source file?

I was hesitant to make a separate module for this, as hardly any functionality used in code_generator.rs had been broken out at the time, and there is some code reuse and similarities with other generator functions. But I can reorganize if you think it's better.

Also, I feel that the builders should be a core feature, because as discussed in #399, this together with #[non_exhaustive] brings prost-build into compliance with Protobuf conventions on evolving message definitions, where adding a field should not result in breaking code that uses the generated bindings.

@mzabaluev
Copy link
Contributor Author

@caspermeijn

Reorganize the commits so that they change a single thing with a description of what and why you do that. If that commit generally improves the codebase, consider opening a separate PR for just that commit.

I could try to split off a PR with the refactoring changes in code_generator.rs that I needed to reuse data in the builder code generation, if that makes reviewing easier for you.

@mzabaluev
Copy link
Contributor Author

I could try to split off a PR with the refactoring changes in code_generator.rs that I needed to reuse data in the builder code generation, if that makes reviewing easier for you.

Done in #1011 and rebased on top.

mzabaluev added 5 commits May 9, 2024 15:29
For each message, optionally generate a struct to provide the
forward-compatible builder API, with inherent methods
for field setters and the .build() method to produce the fully
initialized message struct.

If generation of builders is enabled with a name `Builder` for a message
named `Foo`, an impl block with the associated fn `Foo::builder()` is
appended to the top-level file scope, and the definition of the builder
struct `Builder` and its methods is appended to the module `foo`.

Reuse the helpers that generate field definitions for a message,
but use a mode to output code for the builder fields,
which must be private and devoid of docs and prost annotations.

The added Config::builder method populates a PathMap
with values given in the second argument to configure the builder name.
Adorn a struct with #[non_exhaustive] to showcase and test how
the builder API (and Default) can be used to forward-compatibly
construct values of generated message types,
while the struct initializer syntax is forbidden.

Some of the doc tests showcase working around name conflicts between
the associated function generated for the builder API and a getter
method generated by the Message derive for a field named `builder`.
Also verify the no-conflict case where the field does not have a getter
generated for it, so the `builder` associated fn is fine.

The test crates in tests-2015 and tests-no-std are uniformly named
"tests" to simplify boilerplate in the doc tests.
This allows plugging a message builder into other builders'
setters for fields of the message type without the need to
call .build().
The setter for field named `test` in proto must be named `r#test`.
@jregistr
Copy link

jregistr commented Jul 3, 2024

What else does this PR need to be merged?

self.buf.push_str(" = ");
if repeated {
self.buf
.push_str("value.into_iter().map(Into::into).collect();\n");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A repeated type is a Vec, but this setter doesn't allow the direct setting of a Vec. It will reallocate. I think the setter should just use Vec as its input type.

Copy link
Contributor Author

@mzabaluev mzabaluev Nov 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I consider the setter a convenience method. The field is public, so there is always the option of assigning to it directly rather than going through FromIterator.

When specialization is stabilized in future Rust, the setter can be specialized for Vec without breaking the API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An impl Into<Vec<T>> argument looks like a reasonable compromise: apart from Vec<T> with zero copying, it accepts &[T], &[T; N], and [T; N], so it should cover the most common cases without extra conversions. To set from an iterator, the caller would have to call .collect::<Vec<_>>()

self.buf
.push_str("value.into_iter().map(Into::into).collect();\n");
} else if optional {
self.buf.push_str("Some(value.into());\n")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The setter for optional fields should accept a Option. For "normal" fields I think it is good to forgo the Option.

Copy link
Contributor Author

@mzabaluev mzabaluev Nov 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I applied the same rationale as for repeated fieds: the setter is complementary and allows shorthand code, while there is the option of assigning the field directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking of the most common use case here, where the application has a valid value to set. It should be rare to need to explicitly set a field to None, that is not addressed by Default impls on the struct and the Option field type.

Copy link
Contributor Author

@mzabaluev mzabaluev Nov 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the other hand, maybe the case of an explicitly optional field (as opposed to a "normal" field which must have a default for the uninitialized case) is special enough that the developers meant it to be used this way. I will change to it to Option.

/// **`path`** - a path matching any number of types. It works the same way as in
/// [`btree_map`](#method.btree_map), just with the field name omitted.
///
/// **`builder_name`** - A name for the builder type. The struct with this name
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The path can match multiple messages. I think the builder_name should be a suffix so that it doesn't generate multiple builders with the same name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The builder is generated inside the submodule named after each message, so a collision could only occur with the message's nested types and enums. This should be rare enough that it can be solved by a specific matcher for messages where it occurs.

Stylistically, I prefer my_message::Builder to my_message::MyMessageBuilder or self::MyMessageBuilder.

@@ -11,6 +11,7 @@ edition = "2015"
build = "../tests/src/build.rs"

[lib]
name = "tests"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed? It is not related to builders.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed so that the doctests I have added to the tests crate also work in these shim crates.

@@ -0,0 +1,139 @@
//!#[doc(hidden)]

/// # Doc tests
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you use hidden doc tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are compile_fail test cases and other tests of generator behavior and corner cases that are of no interest to a normal library user.

Since the tests crate is publish = false, this should not matter much though.

@@ -24,6 +24,7 @@ cfg_if! {
}
}

pub mod builders;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to see an additional test: generate a builder for TestAllTypesProto3 and write that to file. This has two purposes:

  • It adds a example of all posible setters to the PR
  • It forces changes to builder API to be visible in future PRs.

The repo has examples for a test writing to file and asserting nothing changes: test_generate_no_empty_outputs

@caspermeijn
Copy link
Collaborator

What else does this PR need to be merged?

@jregistr Could you provide an additional review of the PR and the generated code?

@@ -11,6 +11,7 @@ edition = "2015"
build = "../tests/src/build.rs"

[lib]
name = "tests"
doctest = false
Copy link
Contributor Author

@mzabaluev mzabaluev Nov 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is doctest = false in both crates (also in tests-2018 which is not included in the workspace for some reason), but doctests are compiled by cargo test even without the --doc option.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants