Crate local state for procedural macros? #44034
Comments
Statics and thread-locals should both be safe to use as the proc-macro crate is loaded dynamically and remains resident for the duration of the macro-expansion pass for the current crate (each crate gets its own compiler invocation). This is not necessarily stable as eventually we want to load proc-macros as child processes instead of dynamic libraries, but I don't see why they wouldn't be kept alive for the duration of the crate's compilation run anyway. |
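A minimal sketch of what this resident-dylib behavior enables: an ordinary static observed across calls within one process. This is plain Rust rather than a real proc-macro crate, and `next_id` is an illustrative name, not an API.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical helper inside a proc-macro crate: because the dylib stays
// loaded for the whole expansion pass of one crate, this static persists
// across macro invocations within that compilation.
static COUNTER: AtomicUsize = AtomicUsize::new(0);

fn next_id() -> usize {
    COUNTER.fetch_add(1, Ordering::Relaxed)
}

fn main() {
    // Two "invocations" observe the same shared state.
    let a = next_id();
    let b = next_id();
    println!("ids: {a} then {b}");
}
```

Note that, as the following comments point out, nothing guarantees all invocations happen in one process or in one order, which is exactly the fragility under discussion.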
@abonander I don't think this is reliable for two reasons:
|
**Addressing ordering**

Declare dependencies between macros to enable delaying of macro execution. In practical terms, think of:

```rust
#[proc_macro_derive(Foo)]
pub fn foo(input: TokenStream) -> TokenStream {
    ...
}

#[proc_macro_derive(Bar, depends_on(Foo))]
pub fn bar(input: TokenStream) -> TokenStream {
    ...
}
```

**Addressing incremental compilation**

A persistent storage, maybe a web-like "local storage", per proc-macro crate? This would store and load a byte array, which the user could (de)serialize with e.g. serde.

**Emerging questions**
|
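The `depends_on` syntax above is hypothetical, but the ordering it implies is an ordinary topological sort. A minimal sketch of how a compiler might compute an expansion order from declared dependencies (all names illustrative; no cycle detection):

```rust
use std::collections::{HashMap, HashSet};

// Returns macro names in an order where every macro runs after everything
// it depends on. Depth-first post-order over the dependency map.
fn expansion_order<'a>(deps: &HashMap<&'a str, Vec<&'a str>>) -> Vec<String> {
    fn visit<'a>(
        name: &'a str,
        deps: &HashMap<&'a str, Vec<&'a str>>,
        done: &mut HashSet<&'a str>,
        order: &mut Vec<String>,
    ) {
        if !done.insert(name) {
            return; // already scheduled (cycles are not detected here)
        }
        if let Some(ds) = deps.get(name) {
            for &d in ds {
                visit(d, deps, done, order);
            }
        }
        order.push(name.to_string());
    }

    let mut names: Vec<&str> = deps.keys().copied().collect();
    names.sort(); // deterministic order among unconstrained macros
    let mut done = HashSet::new();
    let mut order = Vec::new();
    for name in names {
        visit(name, deps, &mut done, &mut order);
    }
    order
}

fn main() {
    let mut deps = HashMap::new();
    deps.insert("Foo", vec![]);
    deps.insert("Bar", vec!["Foo"]);
    println!("{:?}", expansion_order(&deps)); // Foo is scheduled before Bar
}
```

This only orders macro *kinds*; as noted below, invocations of the same macro would still be unordered relative to each other.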
Has there been any movement on this issue? |
The problem with such a scheme is that invocations of the same macro are still not ordered, so you can easily end up in a situation where the order of invocation changes the result. If an ordering scheme between proc macros is implemented, this could be revisited. |
Perhaps I'm missing some nuance or information, but what if, when you defined local storage, you also had to declare all the files affected by it? On recompilation, the compiler would then scan those files for changes and recompile as appropriate. While I'm all for automating where possible, the advantage is that you get a clear list of the files involved, and it would provide working functionality for what this issue is trying to solve. Even if it's not perfect, as long as it's reasonably ergonomic (despite having to list the files), it should be good enough. Yes, you have to declare each affected file, but no doubt there is a way to automate even that. |
I would love to have this feature to solve the PyO3 `add_wrapped` requirement. |
A different approach. What do people think?

**Stateful macros**

(I need better names, syntax, etc., but the idea should be there.) Have a "stateful macro", which is a compile-time struct. Things run in three main steps:

```rust
// (May need to be tagged for compiler parsing reasons)
const db: postgresql::DBMacro = postgresql::new!(version => 42, schema => "cats.sql");

fn count_cats(con: DBConnection, cuteness: u64) -> u64 {
    // sql! is able to read from data parsed in from new!
    // e.g., it could type-check the database query.
    // If a field in db is a set, then it can be added to, but not read from;
    // otherwise it can be read from, but not written to.
    db.sql!(con, select count(*) from cats where cuteness >= $cuteness)
}

fn main() {
    let con = postgresql::Connection::new("10.42.42.42");
    // all_queries is generated from a function that runs in a later stage
    // of the compilation, and gets linked in like normal.
    // It must have a fixed type signature (can change based on params to new!).
    // The function is able to read from all of db's variables, but write to none.
    con.prepare(db.all_queries!);
    // expanded to: con.prepare(magic_crate_db::all_queries);

    // use the preprocessed query
    println!("There are {} extra cute cats!", count_cats(con, 4242));
}
```

Note: changes to the values of delayed symbols don't require recompilation of the crate using them.

Another example: the sql! macro may want a unique id by which it can reference this query to the db. The sql! macro could insert a symbol whose content is the position the query got inserted into the set (evaluated in stage 3). The sql! macro would not be able to see the contents of this symbol, but can inject it into the code.

Compiler-wise, crates are compiled independently like normal until just before they need to be linked. The compiler would then:
**Use cases**

**Addressing points raised**

**Other points of note**
|
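The staged write-then-read discipline described above can be simulated at runtime. This is a sketch with illustrative names (`QuerySet`, `register`, `finalize`), not any real API: stage 1 may only append and receives an opaque id; only a later stage may read the collected set.

```rust
// Simulation of the proposal's phase separation: writers get ids but
// cannot inspect the set; the set is only readable once frozen.
struct QuerySet {
    queries: Vec<String>,
}

impl QuerySet {
    fn new() -> Self {
        QuerySet { queries: Vec::new() }
    }

    // Stage 1: write-only access. Returns the delayed "symbol" (here an
    // index) that a macro could splice into generated code but never read.
    fn register(&mut self, query: &str) -> usize {
        self.queries.push(query.to_string());
        self.queries.len() - 1
    }

    // Stage 3: read-only access after every invocation has run. Consuming
    // self means no further registrations are possible, mirroring the
    // write-then-freeze phase separation.
    fn finalize(self) -> Vec<String> {
        self.queries
    }
}

fn main() {
    let mut db = QuerySet::new();
    let id = db.register("select count(*) from cats");
    println!("query id: {id}");
    println!("all queries: {:?}", db.finalize());
}
```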
To me, it feels like:
|
@LukasKalbertodt If the configuration lived in Cargo.toml metadata, it could look like:

```toml
# rest of Cargo.toml

[package.metadata.foo]
config-item = "hello"
```

One potential issue is ensuring that macros are re-expanded whenever the metadata key changes, which I'm not sure how to accomplish. |
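As a sketch of how a proc macro might consume such metadata: a real implementation would locate Cargo.toml via the `CARGO_MANIFEST_DIR` environment variable (which Cargo sets during compilation) and use a proper TOML parser; the tiny line-based lookup below, and the names `metadata_value` and `package.metadata.foo`, are illustrative only.

```rust
// Naive stand-in for a TOML parser: find `key` inside `[table]`.
// Only handles the flat `key = "value"` shape used in the example.
fn metadata_value(toml_text: &str, table: &str, key: &str) -> Option<String> {
    let mut in_table = false;
    for line in toml_text.lines() {
        let line = line.trim();
        if line.starts_with('[') {
            in_table = line == format!("[{table}]");
        } else if in_table {
            if let Some((k, v)) = line.split_once('=') {
                if k.trim() == key {
                    return Some(v.trim().trim_matches('"').to_string());
                }
            }
        }
    }
    None
}

fn main() {
    let toml = r#"
[package]
name = "demo"

[package.metadata.foo]
config-item = "hello"
"#;
    println!("{:?}", metadata_value(toml, "package.metadata.foo", "config-item"));
}
```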
FYI I implemented a scheme very similar to this in my macro_state crate. So far it seems to avoid most of the pitfalls. If not, pull requests welcome! |
Would it be possible to mark a proc macro as always-dirty, or that it shouldn't be cached? This might be necessary for stateful proc macros. It's currently unclear which parts of the toolchain cache macro invocations, if any, and which might in the future. Something like `#[proc_macro(cache = false)]`. |
Not caching in the incremental compilation cache is not enough for stateful macros. You need to throw away the state every time and make sure state for different crates isn't mixed. For rust-analyzer this requires restarting the proc-macro server every time anything changes, and prevents reusing the same proc-macro server for multiple crates. There is no way to restart the proc-macro server and re-expand the whole crate on every keystroke (inside a macro expansion) fast enough to avoid adding noticeable latency. Depending on the number of macro invocations this could take multiple seconds, but even just restarting the proc-macro server on Windows would be noticeable, I think. |
@bjorn3 Could you not provide a way to mark a macro as being stateful vs. not? Then in the macro server you allow only those marked as such to be constantly re-expanded? |
Constantly re-expanding would be bad for latency in an IDE. You only have like 100ms after typing to compute autocompletion suggestions without it feeling sluggish. Re-expanding all macros of a specific kind and invalidating all necessary caches on every keystroke may well cost a significant part of that budget. And that still doesn't solve the issue of ordering. Rustc may currently run everything in source order, but rust-analyzer will lazily expand out of order. |
One of the hardest things in programming, Cache-Invalidation le-sigh |
I believe the approach proposed by David Tolnay in his crate linkme solves a huge part of this topic's problem without any side-effect pitfalls for the current compiler's workflow (without any harm to incremental compilation and/or IDE plugins in particular). Instead of introducing global state in the macro expansion stage, or life-before-main in the runtime stage, the crate utilizes linker capabilities to collect related metadata across the compilation module. Under the hood the linkme crate introduces two macros: the first assigns a common linker section name to Rust statics, which makes the linker concatenate all such static values into a single slice during the linking stage, and the second associates another static with this link section (which will point to the slice assembled by the linker). With this approach you can implement a macro that "writes" data into a common shared slice in unspecified order, but you aren't able to "read" this data from the macro code (during macro expansion); you have to manage this slice at runtime. This solution solves the problem of this issue only partially: |
Hopefully it should be enough to solve most of the problems where one practically needs macro intercommunication (e.g. a plugin system of some kind). One drawback of the current linkme implementation is that it is platform-dependent. Linkers have slightly different interfaces depending on the target platform, and the wasm target in particular doesn't have a linker at all. David's idea to address this is to raise an API similar to linkme's to the level of Rust syntax, such that one could use linking capabilities for statics without writing platform-dependent code. I believe this proposal needs more visibility from the Rust community. |
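For illustration, here is a portable, runtime-only simulation of the write-many/read-at-runtime shape that linkme provides. The real crate uses linker sections and needs no explicit `register` calls; the names below are illustrative, not linkme's API.

```rust
use std::sync::Mutex;

// Stand-in for a linkme distributed slice: many independent "writers"
// contribute entries in unspecified order, and the combined collection is
// only readable at runtime, never during macro expansion.
static HANDLERS: Mutex<Vec<&'static str>> = Mutex::new(Vec::new());

fn register(name: &'static str) {
    HANDLERS.lock().unwrap().push(name);
}

fn all_handlers() -> Vec<&'static str> {
    HANDLERS.lock().unwrap().clone()
}

fn main() {
    // In the linkme version these would be statics scattered across
    // modules/crates, concatenated by the linker with no runtime call.
    register("plugin_a");
    register("plugin_b");
    println!("{:?}", all_handlers());
}
```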
In my experience, having a build number that increments with every new build and/or a const timestamp that you generate right at the start of a build, and throwing out everything that is older than that is actually sufficient and works in most cases, but is entirely dependent on the coincidence that right now the compiler processes files top to bottom one at a time. For needier use-cases, you generate a random u64 and throw out anything not built with that particular u64, resulting in that always-dirty behavior. You can also achieve something similar using mutexes in your proc-macro crate. This isn't the best solution obviously, but it's definitely not an insurmountable problem, especially if we are making tweaks to the actual build process and not just trying to hack around this limitation in current stable. If we can hack it to work 99% of the time in stable, we could definitely tweak the build process to close that last 1%. for a really cursed implementation of this that I don't endorse at all see https://github.com/sam0x17/macro_state/blob/main/macros/src/macros.rs that said, I think going forward something like the linkme approach is probably the way to go, but would be nice if this wasn't dependent on platform-specific nuances |
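The build-id invalidation trick described above can be sketched as follows, with illustrative names (not macro_state's real API): each state entry is tagged with the current build's id, and reads treat entries tagged with any other id as absent, giving the always-dirty behavior.

```rust
use std::collections::HashMap;

// In-memory stand-in for on-disk macro state: value plus the id of the
// build that wrote it.
struct BuildState {
    build_id: u64,
    entries: HashMap<String, (u64, String)>,
}

impl BuildState {
    fn new(build_id: u64) -> Self {
        BuildState { build_id, entries: HashMap::new() }
    }

    fn write(&mut self, key: &str, value: &str) {
        self.entries
            .insert(key.to_string(), (self.build_id, value.to_string()));
    }

    // Entries from a previous build id are stale and treated as absent.
    fn read(&self, key: &str) -> Option<&str> {
        match self.entries.get(key) {
            Some((id, v)) if *id == self.build_id => Some(v),
            _ => None,
        }
    }
}

fn main() {
    let mut state = BuildState::new(42);
    state.write("traits", "Foo,Bar");
    println!("{:?}", state.read("traits"));
    // Simulate the next build: same entries on disk, fresh random id.
    state.build_id = 43;
    println!("{:?}", state.read("traits")); // stale entry ignored
}
```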
That doesn't work for rust-analyzer. Rust-analyzer handles proc-macros for all crates in a single process and doesn't restart this process between rebuilds. Only when the proc-macro itself got rebuilt. Rust-analyzer will cache all macro expansions between edits. In the past it was possible for macro expansion to get evicted from the cache, but this is no longer the case as non-determinism in some macros caused rust-analyzer to crash after recomputing an evicted macro expansion. Rust-analyzer doesn't have any issues with the linkme approach however. |
I am glad to see such a crate for storing macro state. The core code in the 1st macro is:

```rust
if proc_has_state(&trait_name_str) {
    proc_clear_state(&trait_name_str).unwrap();
}
proc_append_state(&trait_name_str, &caller_path).unwrap();
...
```

And it is fetched from the 2nd macro like this:

```rust
let trait_infos = proc_read_state_vec(&trait_name_str);
...
```

But finally, it still panics in the 2nd macro. (By the way, I found another approach in the crate enum_dispatch, using only raw global vars. But I didn't get it.) |
btw a much cleaner way of doing this is with the outer macro pattern |
For my case, there are 3 reasons why I want to use local state to replace the inner declarative macro in the 1st macro.

By the way, just for my specific case, I realize it could be done without using local state. |
so to be clear, an example of the outer macro pattern would be:

```rust
#[my_outer_attribute]
mod my_mod {
    #[attribute_1]
    pub fn some_fn() {
        #[attribute_2]
        let some_expr = 33;
    }

    #[attribute_1]
    pub fn some_other_fn() {
        println!("hello world");
    }
}
```

In the parsing code for `my_outer_attribute` you have access to the whole module. All of those inner macros, in turn, need not even be defined as real macros, since during your visitor pattern you will simply be finding and removing/replacing their invocations. For convenience and doc reasons it is common to provide stubs for these that compile-error if used, allowing you to attach rust docs to them (in a way that will be picked up by rust-analyzer) and to force them to only be valid in the context of your outer macro pattern. They then become very similar to derive helpers. A very full example of this is the pallet syntax in substrate: https://github.com/paritytech/polkadot-sdk/blob/master/substrate/frame/balances/src/lib.rs#L191-L1168

The main caveat with this approach is that it must be on a non-empty module, and there is no (easy) way to spread it out over multiple files. Everything has to be in the same file that you want global access to.

For more exotic things where you want to legally access the tokens of some foreign item in a proc macro, you can use my macro_magic pattern: https://crates.io/crates/macro_magic. |
@sam0x17, thanks for your detailed suggestion; since supertrait is quite bizarre, I am still chewing on it. |
A macro doesn't have to have state to aggregate data. The macro function could return data in addition to the token stream. |
@aikixd, as far as I know, a procedural macro can only return a `TokenStream`. |
Currently it must return the stream only, but that could be extended, with an additional macro, an enum, or some other way. My line of thought is the following: most of the cases where state is required are to aggregate some data (perhaps rewriting the code along the way) and then reduce it to something. A common example would be marking handlers in web services:

```rust
#[get("/a")]
fn handlerA(context: &mut ctx) { ... }

#[get("/b")]
fn handlerB(context: &mut ctx) { ... }

fn main() {
    HttpServer::new().with_handlers(get_handlers!());
}
```

This can be implemented like so (I'm using an enum approach):

```rust
#[proc_macro_attribute]
pub fn get(...) -> proc_macro::MacroResult {
    ...
    proc_macro::MacroResult::Data(my_data)
}

#[proc_macro(depends_on: get)]
pub fn get_handlers(item: TokenStream, aggr: HashMap<&str, &[MyDataType]>) -> ... { ... }
```

The aggregate would contain the data from all the dependents. I'm not sure how to reconcile the data type; the presented signature wouldn't suffice. A lot of problems can be described in terms of reduction, so this would cover a lot of ground. Also, this approach doesn't clash with caching, but works in harmony with it: the rules for updating the user data are the same as for updating the token stream. The analyzer would need to be updated accordingly. |
@aikixd the standard way of doing this (especially for things like URL handler attributes, which is one I've specifically implemented this way before) is to use the outer macro pattern. The short explanation is you will need to put all your routes in a `mod` annotated with your outer attribute macro. In this way you can aggregate state info across the whole module while you parse it, instead of being limited to just a single function definition like you usually are. This is the outer macro pattern.

so basically this:

```rust
#[build_routes]
pub mod routes {
    #[get("/a")]
    fn handlerA(context: &mut ctx) { ... }

    #[get("/b")]
    fn handlerB(context: &mut ctx) { ... }
}

fn main() {
    HttpServer::new().with_handlers(routes);
}
```

A talk I did a while ago covers this fully here: https://youtu.be/aEWbZxNCH0A?si=ToRhOiM26FkBJK8P&t=1989

side note that if custom inner attributes ever stabilize, you can use this approach for entire files. Right now it has to be a `mod`. |
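The aggregation step an outer macro performs can be sketched as follows, with module items modeled as plain data instead of `syn` items (all names here are illustrative): walk the items of the annotated module, collect every `#[get(...)]` route, and emit one combined table.

```rust
// Simplified model of a module item as the visitor would see it.
struct Item {
    fn_name: &'static str,
    get_path: Option<&'static str>, // Some(path) if #[get(path)] is present
}

// The "aggregate across the whole module" step: one pass, one route table.
fn collect_routes(items: &[Item]) -> Vec<(String, String)> {
    items
        .iter()
        .filter_map(|it| {
            it.get_path
                .map(|p| (p.to_string(), it.fn_name.to_string()))
        })
        .collect()
}

fn main() {
    let module = [
        Item { fn_name: "handlerA", get_path: Some("/a") },
        Item { fn_name: "handlerB", get_path: Some("/b") },
        Item { fn_name: "helper", get_path: None },
    ];
    println!("{:?}", collect_routes(&module));
}
```

The real macro would then generate code from this table (e.g. a `with_handlers` call) instead of printing it.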
Another more exotic way you can do this is using the macro_magic approach. This would look something like:

```rust
#[get("/a")]
fn handlerA(context: &mut ctx) { ... }

#[get("/b")]
fn handlerB(context: &mut ctx) { ... }

fn main() {
    HttpServer::new().with_handlers(routes![handlerA, handlerB]);
}
```

but note the one thing you cannot do with this approach is know the full list of handlers -- they have to be written out in main. See https://crates.io/crates/macro_magic for some relevant examples. Under the hood, the way this works is by exporting the tokens of the annotated items so that other macros can access them later. |
Before this patch, `thin_delegate` used a global variable in proc macro to transfer definitions of traits and structs/enums to where it generates impl of a trait. This method is problematic. For example, it's not ensured that the same process is used to invoke proc macros, even if it's in the same compilation unit (crate). E.g. incremental builds like rust-analyzer. See e.g. [1], [2], and [3] for more details. We resolve this problem by using declarative macros to transfer the data. This CL updates fundamental structures of this crate and updates tests as much as possible. Remaining tests will be updated in the next patches. [1] rust-lang/rust#44034 [2] https://users.rust-lang.org/t/simple-state-in-procedural-macro/68204 [3] https://stackoverflow.com/questions/52910783/is-it-possible-to-store-state-within-rusts-procedural-macros
I'm tinkering a bit with procedural macros and encountered a problem that can be solved by keeping state in between proc macro invocations.

Example from my real application: assume my proc-macro crate exposes two macros: `config! {}` and `do_it! {}`. The user of my lib is supposed to call `config! {}` only once, but may call `do_it! {}` multiple times. But `do_it! {}` needs data from the `config! {}` invocation.

Another example: we want to write a `macro_unique_id!()` macro returning a `u64` by counting internally.

How am I supposed to solve those problems? I know that somewhat-global state is usually bad. But I do see applications for crate-local state for proc macros.