Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Depend on syn #302

Closed
wants to merge 2 commits into from
Closed

Depend on syn #302

wants to merge 2 commits into from

Conversation

nbdd0121
Copy link
Member

Add syn and others as dependencies of macros crate, and use cargo
to build macros crate. Only host-facing libmacros.so is built this
way, not any of the other crates that are compiled for the target. This
cargo invocation has all crate versions locked and has --offline
option specified, so it won't access Internet during the build.

The current module! crate already shows it tedious and limited for
writing proc macros without syn. Pinning and vtable handling could
likely be drastically improved with the help of proc macros, and they
will require parsing Rust struct/trait/impl blocks. A dependency on
syn is highly desirable to avoid us essentially reinventing the wheel
when building our procedural macros.

A make rust-fetch-deps command is added and listed in the
documentation as a build requirement.

Related #278

@ksquirrel

This comment has been minimized.

Renaming to libmacros allows us to potentially have more procedural
macros in the future without introducing additional crates.

Signed-off-by: Gary Guo <[email protected]>
@ksquirrel

This comment has been minimized.

@ksquirrel

This comment has been minimized.

add `syn` and others as dependencies of `macros` crate, and use cargo
to build `macros` crate. Only host-facing `libmacros.so` is built this
way, not any of the other crates that are compiled for the target. This
cargo invocation has all crate versions locked and has `--offline`
option specified, so it won't access Internet during the build.

The current `module!` crate already shows it tedious and limited for
writing proc macros without `syn`. Pinning and vtable handling could
likely be drastically improved with the help of proc macros, and they
will require parsing Rust struct/trait/impl blocks. A dependency on
`syn` is highly desirable to avoid us essentially reinventing the wheel
when building our procedural macros.

A `make rust-fetch-deps` command is added and listed in the
documentation as a build requirement.

Signed-off-by: Gary Guo <[email protected]>
@ksquirrel
Copy link
Member

Review of 9098c6987309:

  • ✔️ Commit 03ecf73: Looks fine!
  • ✔️ Commit 9098c69: Looks fine!

@bjorn3
Copy link
Member

bjorn3 commented May 26, 2021

Should syn be vendored? cargo vendor helps with this.

@nbdd0121
Copy link
Member Author

Should syn be vendored? cargo vendor helps with this.

I don't think it should be checked into the kernel's git tree, and kernel don't use submodules. We could let make rust-fetch-deps call cargo vendor instead of cargo fetch though. Are there any benefits for using cargo vendor for this approach?

@wedsonaf
Copy link

I think we need more discussion around this.

In general, I think there is consensus that we need to limit dependency on third-party libraries to just what is absolutely necessary.

I'm not convinced syn falls in this category. I also think we should not encourage the use of proc macros.

@nbdd0121
Copy link
Member Author

In general, I think there is consensus that we need to limit dependency on third-party libraries to just what is absolutely necessary.

I agree, but proc macros are different from normal "library" dependencies. Because they only run at compile time, and themselves are not compiled into the binary. It's more similar to our dependency on bindgen. Rustc itself depends on syn: https://github.com/rust-lang/rust/blob/9814e830942dcc65e69a0d077cb2e013b5948795/Cargo.lock#L5010-L5011.

I also think we should not encourage the use of proc macros.

Why? Proc macros are such a powerful tool, and I think we should encourage use of proc macros if it can help with safety or ergonomics, and sometimes it may the only solution feasible.

  • module! macro is not feasible with declarative macros
  • #[pin_init]/#[pin_project] can eliminate the need of any unsafe when creating and using pinned types.
  • A hypothetical #[vtable] macro can be used to simplify our handling of vtable declaration, e.g. file_operations. It can be similar to what you suggested in this email. The current approach requires one declarative macro for each different type of vtable (see Rust NetDevice and NetDeviceOperationsVtable struct #208).

And these are ones that I think we need right now. I envision that as more bindings are added we will need more proc macros (attributes and derives are only possible with proc macros).

If procedural macros can improve safety, reduce boilerplate and improve ergonomics, I think there shouldn't be any reason not to use it. And safety and ergonomics are the key reasons that we want to use Rust in the kernel, right? With proc macros I can eliminate all unsafes used in samples/rust, and it will really show the people the power of Rust.

I'm not convinced syn falls in this category.

I think this really depend on your attitude towards proc macros only. If we need proc macros then we couldn't avoid syn -- it's just not feasible to do all parsing on our own. syn also has much better error reporting, which would certainly help driver developers.

@wedsonaf
Copy link

I agree, but proc macros are different from normal "library" dependencies. Because they only run at compile time, and themselves are not compiled into the binary. It's more similar to our dependency on bindgen. Rustc itself depends on syn: https://github.com/rust-lang/rust/blob/9814e830942dcc65e69a0d077cb2e013b5948795/Cargo.lock#L5010-L5011.

Thanks for the explanation. I am aware of how proc macros work though.

Regardless of how/when it is used, it is a dependency nonetheless: we'd depend on it to build Rust code in the kernel, bugs in it would have the potential to affect the final binary. (In case it isn't clear, bindgen is also a dependency I wish we could avoid, it just so happens that we have no alternative option that is practical.)

I also think we should not encourage the use of proc macros.

Why? Proc macros are such a powerful tool, and I think we should encourage use of proc macros if it can help with safety or ergonomics, and sometimes it may the only solution feasible.

Because proc macros, in my opinion, should be our last resort: we should only use them when no other reasonable option is available. (If you don't agree with this, please let me know and I can expand on it.)

Assuming we agree on the above, the reason to not encourage proc macros is to avoid inviting people (myself included) to go straight to proc macros and not take the time to explore alternatives: you, for example, started the CStr overhaul with proc macros and eventually (and thankfully) noticed that you didn't need them.

  • module! macro is not feasible with declarative macros
  • #[pin_init]/#[pin_project] can eliminate the need of any unsafe when creating and using pinned types.
  • A hypothetical #[vtable] macro can be used to simplify our handling of vtable declaration, e.g. file_operations. It can be similar to what you suggested in this email. The current approach requires one declarative macro for each different type of vtable (see Rust NetDevice and NetDeviceOperationsVtable struct #208).

And these are ones that I think we need right now. I envision that as more bindings are added we will need more proc macros (attributes and derives are only possible with proc macros).

If procedural macros can improve safety, reduce boilerplate and improve ergonomics, I think there shouldn't be any reason not to use it. And safety and ergonomics are the key reasons that we want to use Rust in the kernel, right? With proc macros I can eliminate all unsafes used in samples/rust, and it will really show the people the power of Rust.

When we eliminate unsafety via Rust constructs, we have a framework to reason about how safe abstractions can emerge from unsafe blocks. This is not the case for proc macros: we have a lot of code that is generating other code that we believe (hope?) provides safe abstractions.

Of course, one can claim that we already depend on the compiler, so why would additional compiler-like code be bad? My thinking here is that we should leave compiler-like tasks to the compiler folks; and with that, the support and testing that is needed for this sort of thing. In this project, we are the business of writing kernel code.

That said, I am not of the opinion that proc macros should be banned outright. I am, however, of the opinion that our use of it should be judicious.

I'm not convinced syn falls in this category.

I think this really depend on your attitude towards proc macros only. If we need proc macros then we couldn't avoid syn -- it's just not feasible to do all parsing on our own. syn also has much better error reporting, which would certainly help driver developers.

The module! macro doesn't use syn. Of course it is feasible to write proc macros without it.

@bjorn3
Copy link
Member

bjorn3 commented May 27, 2021

The module! macro doesn't have to parse rust code, only it's own made up syntax. Once you start parsing arbitrary type definitions (for #[pin_init]/#[pin_project]) you need to basically re-invent syn. syn is battle tested and can correctly parse pretty much all rust code. In addition when writing your own rust parser it is pretty easy to accidentally (or deliberately) depend on the (non-)existence of for example whitespace, trailing commas or none delimited groups in the compiler generated tokenstreams. However rustc provides no guarantees for this except that it can be parsed correctly. Crates have broken in the past as they avoided syn and wrote their own buggy parser. To avoid breaking stuff, there have been times where a pretty printer change has been disabled for crates with specific names until they got fixed. (For example rust-lang/rust#85387)

@wedsonaf
Copy link

The module! macro doesn't have to parse rust code, only it's own made up syntax. Once you start parsing arbitrary type definitions (for #[pin_init]/#[pin_project]) you need to basically re-invent syn. syn is battle tested and can correctly parse pretty much all rust code. In addition when writing your own rust parser it is pretty easy to accidentally (or deliberately) depend on the (non-)existence of for example whitespace, trailing commas or none delimited groups in the compiler generated tokenstreams. However rustc provides no guarantees for this except that it can be parsed correctly. Crates have broken in the past as they avoided syn and wrote their own buggy parser. To avoid breaking stuff, there have been times where a pretty printer change has been disabled for crates with specific names until they got fixed. (For example rust-lang/rust#85387)

I'm not arguing for the reinvention of syn. I'm arguing against complex proc macros.

@nbdd0121
Copy link
Member Author

Regardless of how/when it is used, it is a dependency nonetheless: we'd depend on it to build Rust code in the kernel, bugs in it would have the potential to affect the final binary.

That's why I mentioned that syn is used by rustc as well -- so any bug will affect rustc anyway. Cargo.lock and --locked --offline is used to ensure that syn wouldn't be updated unless we choose to, and we can certainly just use the verison of syn that our MSRV (Minimum Supported Rust Version) uses.

Because proc macros, in my opinion, should be our last resort: we should only use them when no other reasonable option is available. (If you don't agree with this, please let me know and I can expand on it.)

If a simple declarative macro is sufficient, sure. Declarative macros can also do complex parsing, but in that case proc macro is even better -- code will be more readable and error reporting would be much better.

My opinion is: use whatever method that is more readable and maintainable.

Assuming we agree on the above, the reason to not encourage proc macros is to avoid inviting people (myself included) to go straight to proc macros and not take the time to explore alternatives: you, for example, started the CStr overhaul with proc macros and eventually (and thankfully) noticed that you didn't need them.

The proc macro version I proposed earlier also accept c_str!(b"blah") though, which isn't possible now. The current version also requires a manual while loop instead of just .contains(&0) call just to make it const fn. For this c_str!, I don't really prefer proc macro or const eval, but all the other use cases I listed couldn't be done with const eval.

When we eliminate unsafety via Rust constructs, we have a framework to reason about how safe abstractions can emerge from unsafe blocks. This is not the case for proc macros: we have a lot of code that is generating other code that we believe (hope?) provides safe abstractions.

We can require proc macros to justify for all unsafe code it generates. It's Rust code that gets generated after all.

Of course, one can claim that we already depend on the compiler, so why would additional compiler-like code be bad? My thinking here is that we should leave compiler-like tasks to the compiler folks; and with that, the support and testing that is needed for this sort of thing. In this project, we are the business of writing kernel code.

That's exactly the reason to introduce syn. Leave compiler-like tasks to compiler folks, so we focus on the actual code getting generated, not how to parse Rust. With syn dealing with the complexity of Rust code parsing, the rest is not complex any more.

The module! macro doesn't use syn. Of course it is feasible to write proc macros without it.

module! macro is quite messy in the current state IMO with all string operation. syn and quote can improve the readability and maintainability by a order of a magnitude.

@nbdd0121
Copy link
Member Author

nbdd0121 commented May 27, 2021

This is the proc macro part of pin-init for example: https://github.com/nbdd0121/pin-init/blob/27fdfd409df4b17f49340cb095c3f9bb918dab63/pin-init-internal/src/pin_init.rs

It's just 300+ lines, but would help us get rid of most unsafe for initializing pinned structures. And the majority is the code generated, not parsing. It is not complex.

@ojeda
Copy link
Member

ojeda commented May 27, 2021

A few general points:

  • NAK on the dependency on Cargo and crates.io.
  • As discussed elsewhere, we will likely want to use syn and quote, but we have been avoiding adding dumps of third-party code until we get into mainline (as well as, in general, making things more complex than needed before that).
  • Off-topic, but worth mentioning: quote and syn (or similar) should be part of the Rust standard library. Is there any work on something like that?
  • I agree with Wedson that proc macros should be last resort. Macros, in general, are.

@ojeda
Copy link
Member

ojeda commented May 27, 2021

With proc ​macros I can eliminate all unsafes used in samples/rust, and it will really show the people the power of Rust.

Please note this is not about showing people the "power of Rust". In fact, showing too much "power" is likely a mistake. Even "simple" things like RAII and the ? operator already raise questions in the kernel community.

Furthermore, proc macros are cool in that they allow to do code parsing/generation without the need of extra build steps, tooling, etc. But they are code parsing/generation in the end, with all its pros & cons.

In fact, it is easy for someone to see macros as a lack of power in the "base" language: macros to imitate variadic functions, macros as bad const generics, macros to improve the ergonomics of pinning, etc.

So let's be careful about thinking people will automatically go "wow, so clever!". Instead, they may go "ugh... this is Rust's solution for this?".

@ojeda
Copy link
Member

ojeda commented May 27, 2021

And to be extra clear (since my messages are sometimes understood as being 100% against something :-P), I am not saying we should not use procedural macros, just that 1) we should weigh carefully their usage and that 2) we need to be particularly careful in the beginning (and even more so before we are in mainline).

So the question is not whether proc macros can help with pinning and other bits, but whether we think it is worth "paying the price" right now. Not just in terms of adding syn etc., but also in terms of the kernel community reading the code and having to understand pinning, plus Rust code transformations via proc macros, etc.

@nbdd0121
Copy link
Member Author

  • NAK on the dependency on Cargo and crates.io.

Why is Cargo and crates.io banned outright?

I don't understand why Cargo is banned entirely. I understand and agree that we shouldn't use it for kernel's core and module crates, but proc macros are self-contained. Their only artefact is the .so file.

When talking about external dependencies like crates.io, people are concerned about trustworthiness of code downloaded from Internet. That's why I ensured that the build process won't access the Internet via --offline and dependencies are locked with Cargo.lock and protected by checksums, so there should no security worries. The download process is listed a prerequisite for building, just like bindgen.

  • Off-topic, but worth mentioning: quote and syn (or similar) should be part of the Rust standard library. Is there any work on something like that?

quote is indeed in the process of being added to proc_macro, but I don't think syn will anytime soon.

In fact, it is easy for someone to see macros as a lack of power in the "base" language: macros to imitate variadic functions, macros as bad const generics, macros to improve the ergonomics of pinning, etc.

I thought macros are always pitched as one of the best features of Rust, and in Rust community people are using it quite extensively (like serde/thiserror etc). But you are right, I haven't considered these opinions from people who might not be familiar about Rust.

So let's be careful about thinking people will automatically go "wow, so clever!". Instead, they may go "ugh... this is Rust's solution for this?".

It's inevitable that some people will ugh. But it's really between them:

  • seeing procedural macros and ugh about the lack of feature in the base language; or
  • seeing even simplest thing like constructing Mutexes need unsafe, and say complain about the lack of expressiveness with safe Rust, with safety being the most often pitched aspect?

I understand all the concerns, but in my opinion, better ergonomics are the best approach to get people accept and get involved in using Rust in the kernel.

@wedsonaf
Copy link

Miguel's points about public perception (especially from a kernel C programmer's point of view) are spot on in my opinion (and a concern I've had but hadn't managed to articulate here).

@wedsonaf
Copy link

  • seeing procedural macros and ugh about the lack of feature in the base language; or
  • seeing even simplest thing like constructing Mutexes need unsafe, and say complain about the lack of expressiveness with safe Rust, with safety being the most often pitched aspect?

I understand all the concerns, but in my opinion, better ergonomics are the best approach to get people accept and get involved in using Rust in the kernel.

To be clear, no one is happy with the amount of unsafety involved in pinning.

I am of the opinion that this needs to be addressed by the language, see for example #145 (comment) -- my intent is to engage with them to improve this. But since it's going to be a while before we get something, it's lower in my priority list.

@ojeda
Copy link
Member

ojeda commented May 28, 2021

I don't understand why Cargo is banned entirely. I understand and agree that we shouldn't use it for kernel's core and module crates, but proc macros are self-contained. Their only artefact is the .so file.

It is not "banned", it is overkill for what we need, which is a single call to rustc. We are not using the downloading facilities, nor the dependency managing, etc.

If we wanted to use crates.io and download syn automatically and its dependencies and etc. etc., then sure, we would want to use Cargo. But that is not the case.

When talking about external dependencies like crates.io, people are concerned about trustworthiness of code downloaded from Internet. That's why I ensured that the build process won't access the Internet via --offline and dependencies are locked with Cargo.lock and protected by checksums, so there should no security worries.

I am aware of that, but security is not the issue. Library dependencies are not handled like that. Plus going through crates.io on top of that is completely unneeded.

The download process is listed a prerequisite for building, just like bindgen.

bindgen (and the rest of the toolchain) will be eventually provided by distributions. We do not want users to need to build bindgen on their own, nor use rustup, nor depend on crates.io, nor on GitHub, etc. etc. Thus, the same way, we do not want people to need to download third-party code.

quote is indeed in the process of being added to proc_macro, but I don't think syn will anytime soon.

That is great to hear! Then I would suggest we push for syn to happen too. After all, if people is expected to use syn for proc macros, it should be in the standard library.

Of course, this does not mean we should not start using syn on our own, but it is even better if we know it will be in the standard library at some point.

I thought macros are always pitched as one of the best features of Rust, and in Rust community people are using it quite extensively (like serde/thiserror etc). But you are right, I haven't considered these opinions from people who might not be familiar about Rust.

Yeah, Rust macros are great and enable many things that otherwise would be cumbersome. But we need to be careful when showing it to other people, specially because it does not look "like Rust".

For instance, take C++ template metaprogramming: specially before the modern standards came, things looked particularly horrendous and completely different than "normal C++ code". Yet it was pitched everywhere as a way to show "how powerful C++ is". But, in my opinion, that was a bad way to try to get more people into the language. For instance, some people would show C programmers things like Boost Spirit that lets you write a parser like:

        xml =
                start_tag                   [at_c<0>(_val) = _1]
            >>  *node                       [push_back(at_c<1>(_val), _1)]
            >>  end_tag(at_c<0>(_val))
        ;

The things inside [] in the right-side are "actions" that are triggered when the input matches the left-side grammar. It is clever, neat, and extremely powerful (not even macros involved -- pure C++!). Yet most people do not want to write C++ like that (and much less implement or maintain things like Boost Spirit to make that work).

@bjorn3
Copy link
Member

bjorn3 commented May 28, 2021

Off-topic, but worth mentioning: quote and syn (or similar) should be part of the Rust standard library. Is there any work on something like that?

Syn is deliberately not in the standard library as some syntax changes need breaking changes to syn. This is not allowed for the standard library.

@ojeda
Copy link
Member

ojeda commented May 28, 2021

Syn is deliberately not in the standard library as some syntax changes need breaking changes to syn. This is not allowed for the standard library.

I am not sure I follow. The standard library syn should parse whatever the compiler does and always go in lockstep.

@bjorn3
Copy link
Member

bjorn3 commented May 28, 2021

As syn gets updated to parse new syntax, it sometimes needs to break it's api. For example RFC 1506 made the following possible:

struct Foo(u8, u8);

fn main() {
    Foo {
        0: 0,
        1: 42,
    };
}

If this was stabilized after syn was created this would have required switching from using Ident directly as field name to the current Member which is an enum that can either be an Ident or a numerical index. There is also a proposal to make Foo { a: 0, .. } acceptable as expression where .. would expand to ..Default::default(). This would require changing the rest field of ExprStruct to allow .. without a following expression. These are just a couple of examples where adding syntax would require breaking changes to the api.

Some rationale on the RFC that introduced proc macros:

https://rust-lang.github.io/rfcs/1566-proc-macros.html#alternatives

We could have an AST-based (rather than token-based) system. This has major backwards compatibility issues.

rust-lang/rfcs#1566 (comment)

The goal with a token-based API is that we can add new syntax to Rust without changing the API for macros. E.g., although ? for try adds new syntactic forms, it does not add new tokens. Parsing tokens into AST should be provided by external (possibly 3rd party) crates. They would be free to choose from different future-proofing strategies.

@ojeda
Copy link
Member

ojeda commented May 28, 2021

That is only a problem for those that want to access the very latest syntax through the API. For them, it is a given that you need to update your proc macros as you update your compiler.

@ojeda
Copy link
Member

ojeda commented May 28, 2021

i.e. for many use cases, having access to the very latest syntax does not matter (whether accepted or not by the parser).

e.g. one approach I have seen (Python bindings for Clang) is having "unknown" nodes in the AST for anything that cannot be expressed in the current API.

Another approach would be providing "syntax editions" and keeping support for them, much like the compiler does for Rust editions.

@bjorn3
Copy link
Member

bjorn3 commented May 28, 2021

That is only a problem for those that want to access the very latest syntax through the API. For them, it is a given that you need to update your proc macros as you update your compiler.

The existing struct definitions need to get updated for the examples I gave. No new struct definitions (or other definitions) are introduced.

@ojeda
Copy link
Member

ojeda commented May 28, 2021

The existing struct definitions need to get updated for the examples I gave. No new struct definitions (or other definitions) are introduced.

No, they do not need to get updated.

@Manishearth
Copy link

I think it's worth pointing out that the Rust standard library is extremely conservative, so even if adding syn were possible to do in a way that is stable and pleasant, it would still be very unlikely to happen.

I think the chances of this happening are near zero and it is not worth putting any hopes into that happening when considering tradeoffs here. Not that that means syn should be merged into the Linux deps, just that it will be easier to assume syn-in-std is not and will never be a part of the equation for that decision.

@nbdd0121
Copy link
Member Author

Closed in favour of #910

@nbdd0121 nbdd0121 closed this Oct 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

6 participants