Vision for cleaner substrate-node's developer-facing interface #5
(Once I get some initial feedback, unless this is totally crazy, I would like to post this in the Parity forum so that other eco-dev related departments can also see it.)
I don't see much reason to have custom subcommands.
Why not? I mean, why impose that limit on the builders?
It is all about trade-offs. If you believe supporting custom RPCs has more advantages than disadvantages, sure, list them out and we are good. Please keep in mind that Substrate is so flexible that it is a pain to use. Have we still not learned our lesson from this?
Yeah, custom RPCs are not really needed. This simple kind of interface should make onboarding people really easy. There would be an advanced mode where you can still add your custom RPCs and whatever. So, there will be a "hard mode" and an "easy mode". I already spoke with @kianenigma and I support this for parachains: paritytech/cumulus#2181
Yes, but doing this properly will require quite a lot of work.
Ok, yeah... makes sense.
This issue has been mentioned on Polkadot Forum. There might be relevant details there:
@expenses and @gnunicorn were also keen on this happening.
Strong disagree on any

There's a lot of stuff baked into consensus that doesn't usually get considered:

Consensus is usually not just one task but many things stitched together. In my view, we should separate the infrastructure that Substrate provides (interfaces to libp2p, a block database, maybe things like sync or tx-gossip) from everything else. These should be provided as context to long-running futures that have absolute freedom in how they use these things.

Also, I don't see why the runtime should factor into this at all. Once the native runtime is gone, the runtime should be encapsulated entirely by the chain spec, right? The chain spec is a CLI flag and can be handled internally.

Another goal is that we should not only abstract complexity at the very outer layer, but also make extending these functionalities very simple. The simplest way is just letting users spawn futures. For example: starting with an Aura parachain, but then adding a bunch of off-chain logic and networking protocols for, e.g., offchain storage or oracles. This should be an easy tweak to the templates we provide.

My ideal here would be something like:

```rust
// Some "mega object" that gives you access to:
// - Registering new network protocols
// - Adding block import hooks and notifications
// - Spawning tasks
// - Importing blocks
// - Finalizing blocks
// - Generating storage proofs
// - Calling runtime APIs
// - Setting the best block
// - Adding RPCs
// - Reading blocks and headers
// - Reading and writing arbitrary items in the DB
// - Reading state
// - Executing Wasm
// - etc.
//
// It's not intended to be `Clone`d, but will give shared ownership of handles for all these things to avoid leaking a
// god-object everywhere.
let raw_node: RawNode<Block> = RawNode::new(cli_params);
```

The Raw Node does nothing on its own. It's just a bag of all the raw components you need to do anything interesting. This makes starting consensus logic simple. Here are some examples (which could be library functions exposed by top-level Substrate/Cumulus crates).

A single function for running a standalone Babe/Grandpa node:

```rust
// Example for a standalone Babe/Grandpa node
fn vanilla_substrate(cli_params: CliParams) {
    let raw_node = RawNode::new(cli_params);
    // Under the hood, this is setting up new networking protocols, block import hooks, new DB
    spawn(run_grandpa(&raw_node));
    spawn(run_babe(&raw_node));
    // wait...
}
```

How it might be used in Polkadot or other complex systems with extra subsystems added:

```rust
fn polkadot(cli_params: CliParams) {
    let raw_node = RawNode::new(cli_params);
    // Under the hood, this is setting up new networking protocols, block import hooks, new DB
    spawn(run_grandpa(&raw_node));
    spawn(run_babe(&raw_node));
    spawn(run_parachain_consensus(&raw_node));
    // wait...
}
```

How it might be used for a one-size-fits-all parachain with Aura:

```rust
fn aura_parachain(cli_params: CliParams) {
    // This may be starting a Polkadot raw node underneath. Or not.
    let relay_chain_interface = make_relay_chain_interface(&cli_params);
    let parachain_raw_node = RawNode::new(cli_params);
    // Under the hood, this is setting up new networking protocols, block import hooks, new DB
    spawn(run_parachain_general(&parachain_raw_node, &relay_chain_interface));
    spawn(run_parachain_aura(&parachain_raw_node, &relay_chain_interface));
    // wait...
}
```

This is pretty dead simple and would hide all the complexity of registering network protocols, starting services, etc. inside specialized crates that aren't exposed to the user except through a simple one- or two-parameter function. I could see things getting a little more complex, e.g. if we let people swap in

We need to make the easy stuff easy while keeping the hard stuff possible.

From above:
It's not clear to me that
@bkchr I am interested in sketching out some issues related to the above refactoring approach, if it's an agreeable direction. I am coming off of a larger project and would like to spend some time making Substrate more fun to work with for end users, and this is one of the higher-impact things I feel I can reason well about. Furthermore, many of these issues are the result of my own decisions in 2018/19, and I feel some personal responsibility to improve the situation as a result. Some initial suggestions with the goal of creating the
While we haven't done that much work on the node refactoring front, we have collected ideas etc. here: https://www.notion.so/paritytechnologies/Node-Refactoring-937b5770d14c494991903a4b7ce52012
Yes, this is already going in the right direction, though I think we should go even a step further. Do we really need a
The The job of the import queue is to import blocks and not to verify the integrity of consensus seals etc.
Makes sense.
We really want to get rid of the
💯 and we have done some great work on this. Thanks to @altonen and his team, sync is almost a "free-standing" protocol, and all the other protocols also got moved out of

We have developed Substrate with Polkadot in mind, which made sense and was a totally reasonable way to approach the problem we had ahead of us. However, now we need to break the chain ;) between Polkadot and Substrate to make Substrate the generic framework it should be (still within some bounds, but you hopefully get what I mean).

Now to this issue: as I already said above, this idea of having a builder interface for creating a node is something we have also thought about. I think this will also be crucial, as I want to have this

On top of this "expert mode" we would then provide this builder interface. The builder interface would give users a very simple way to build their own node. However, at some point it would stop its support and would require the user to switch to the "expert mode". This builder interface will also make proper versioning quite easy, as we only need to version the functions that the builder exposes; the rest should be hidden in the implementation.

I'm still in favor of starting with this builder-pattern-like approach for parachains. We don't need to "waste" time on standalone blockchains now, because that isn't our focus.
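A minimal sketch of the "builder on top of expert mode" layering described above, assuming the builder is just a thin wrapper around a fully public low-level node description. `ExpertNode`, `ParachainBuilder`, `with_aura`, `build` and `into_expert` are hypothetical names. The point is that only the builder's few methods would need versioning, and `into_expert` is where the builder "stops its support".

```rust
/// Hypothetical low-level ("expert mode") node description: everything is public.
#[derive(Debug, Default)]
pub struct ExpertNode {
    pub consensus: Option<String>,
    pub extra_tasks: Vec<String>,
}

/// Hypothetical high-level builder: a small surface covering common cases only.
/// Only these methods would need to be versioned.
#[derive(Debug, Default)]
pub struct ParachainBuilder {
    inner: ExpertNode,
}

impl ParachainBuilder {
    pub fn new() -> Self {
        Self::default()
    }

    /// Choose Aura explicitly (it is also the default).
    pub fn with_aura(mut self) -> Self {
        self.inner.consensus = Some("aura".to_string());
        self
    }

    /// Finish using the supported defaults.
    pub fn build(mut self) -> ExpertNode {
        if self.inner.consensus.is_none() {
            self.inner.consensus = Some("aura".to_string());
        }
        self.inner
    }

    /// Escape hatch: the builder stops here and the user keeps
    /// configuring the low-level node directly.
    pub fn into_expert(self) -> ExpertNode {
        self.inner
    }
}
```

Because `into_expert` hands back the same type the builder fills in, moving from "easy mode" to "expert mode" does not require rewriting what was already configured.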
Sure, there's not much difference between a
The job of the import queue is 100% to verify the integrity of consensus seals, we just do it in a half-baked way. In particular, it's theoretically important when performing a full sync from genesis. Doing full sync quickly requires parallel verification of headers which are pending import. Every Ethereum node does this, including parity-ethereum from 2016. There are two phases to import: things which can be done in parallel (PoW checks, signature checks), and things which must be done sequentially. While no Substrate consensus logic currently makes use of this, I don't see a good motivation for removing the possibility. One day, someone will be tasked with making full sync fast.
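The two-phase import described above (context-free checks in parallel, then sequential import) might look like the following sketch. Everything here is hypothetical: `Header` is a toy type with `u64` hashes, `check_seal` is a cheap stand-in for a PoW or signature check, and `verify_parallel`/`import_sequential` are illustrative names. It uses `std::thread::scope` (Rust 1.63+) so worker threads can borrow the batch.

```rust
use std::thread;

/// Toy header; hashes are plain u64s and `seal` stands in for a PoW/signature seal.
#[derive(Clone, Debug, PartialEq)]
pub struct Header {
    pub number: u64,
    pub parent: u64,
    pub hash: u64,
    pub seal: u64,
}

/// Stand-in for an expensive check that needs only the header itself,
/// so it can run in any order and in parallel.
pub fn check_seal(header: &Header) -> bool {
    header.seal == header.hash.wrapping_mul(31)
}

/// Phase 1: verify seals for a whole batch, split across `workers` threads.
pub fn verify_parallel(headers: &[Header], workers: usize) -> bool {
    let workers = workers.max(1);
    let chunk_len = ((headers.len() + workers - 1) / workers).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = headers
            .chunks(chunk_len)
            .map(|chunk| s.spawn(move || chunk.iter().all(check_seal)))
            .collect();
        // Any thread not joined here is joined automatically when the scope ends.
        handles.into_iter().all(|h| h.join().unwrap())
    })
}

/// Phase 2: import in order; each header must build on the current tip.
/// Returns the number of the first header that does not fit.
pub fn import_sequential(chain: &mut Vec<Header>, headers: &[Header]) -> Result<(), u64> {
    for header in headers {
        if chain.last().map(|tip| tip.hash) != Some(header.parent) {
            return Err(header.number);
        }
        chain.push(header.clone());
    }
    Ok(())
}
```

Only phase 2 needs chain state; phase 1 can run on batches fetched during a full sync before their parents are imported.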
Is it that much of a god object? It combines database access and code execution with some more superfluous stuff like fork choice rules. We might destructure it somewhat but the ability for combining those two things is definitely needed, at the very least in a read-only way.
I'm fine with a builder-like interface as long as it doesn't try to prescribe things like consensus and we expect that things like GRANDPA will have a

"builder pattern" vs "expert mode" feels like a false dichotomy. I believe a builder pattern could easily get more complicated than a well-written API for "expert mode" (the

i.e. when you spawn GRANDPA you want:
The builder pattern (without a
I meant this on the BABE/Grandpa crate level. There we probably don't need this trait. You can still introduce a trait on a higher level when needed.
Yes, I know that we don't support this. Still, I don't see how parallel verification is related to block import. You get new headers or even blocks from the network. You verify these in parallel and then you pass the blocks into the import queue. The main thing that changes here is that we split block import from verification. By doing this we achieve exactly what you want: verification can run in parallel, and block import can then continue sequentially. However, as I would also like to move the seal removal into the runtime, we would not gain that much from doing it in parallel beforehand. But someone could argue that you pass in a "seal already checked" argument to
The main problem with the current model is that you bring too many assumptions into the process. Too many weird branches etc. to support all the small differences. This works for something like ETH, where you have a specific implementation of a blockchain. However, it doesn't work for a framework that wants to be generic for all its users. There are tons of "hacks" we added to support custom implementations of X and Y.

The fork choice rule, for example, would be something that lives in the database write task. This task is the only task that can give you a transactional view onto the database, as it is the one that writes to it. If you need to know exactly which block you want to build on, you need to call into this database task (it would be a channel that communicates with the object). However, things like RPC don't need that strong guarantee and are mostly happy with a good-enough guarantee that the block they are getting is the best block.

One main problem we also have is that the database is aware of what the best or finalized block is. The database should not know this. The database should store blocks and give you an interface to query them. The fork choice rule should be the one deciding what the best block is. This would also solve things like the informant showing a wrong best block for the relay chain when we have a dispute, until the new fork is seen as the best chain. The finalized block also isn't interesting to the database; the database just needs to expose some interface
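A sketch of the "database write task" idea, assuming a single thread owns the block store and everything else talks to it over a channel. The store only records blocks (here, just each block's height) and has no notion of a best or finalized block. Fork choice (`longest_chain`, a hypothetical longest-chain rule with a deterministic tie-break) runs inside the task, so its answer is consistent with every import processed before the query. `DbCommand`, `BlockStore` and `spawn_db_task` are illustrative names, not existing Substrate APIs.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Sender};
use std::thread::{self, JoinHandle};

/// Messages accepted by the hypothetical database write task.
pub enum DbCommand {
    /// Store a block; ignored if its parent is unknown.
    Import { hash: u64, parent: u64 },
    /// Ask which block to build on; the answer is sent back on the channel.
    BestBlock(Sender<Option<u64>>),
}

/// The store only records blocks (here: hash -> height).
/// It has no notion of a "best" or "finalized" block.
#[derive(Default)]
struct BlockStore {
    heights: HashMap<u64, u64>,
}

/// Fork choice lives outside the store: the highest block wins,
/// ties broken by the smaller hash so the result is deterministic.
fn longest_chain(store: &BlockStore) -> Option<u64> {
    store
        .heights
        .iter()
        .max_by_key(|(hash, height)| (**height, std::cmp::Reverse(**hash)))
        .map(|(hash, _)| *hash)
}

/// Spawn the single writer. Commands are handled one at a time, so every
/// `BestBlock` answer reflects all imports received before it.
pub fn spawn_db_task(genesis: u64) -> (Sender<DbCommand>, JoinHandle<()>) {
    let (tx, rx) = channel();
    let handle = thread::spawn(move || {
        let mut store = BlockStore::default();
        store.heights.insert(genesis, 0);
        // The loop ends once every Sender has been dropped.
        for cmd in rx {
            match cmd {
                DbCommand::Import { hash, parent } => {
                    if let Some(&parent_height) = store.heights.get(&parent) {
                        store.heights.insert(hash, parent_height + 1);
                    }
                }
                DbCommand::BestBlock(reply) => {
                    // The requester may have gone away; that is not an error here.
                    let _ = reply.send(longest_chain(&store));
                }
            }
        }
    });
    (tx, handle)
}
```

A component like RPC that only needs a "good enough" answer could read a periodically published copy of the result instead of sending `BestBlock` requests.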
I would expect that the builder has functions like
The implementation of that function would look pretty close to this:

```rust
fn run_grandpa_and_aura(node: &mut RawNodeForSpawningTasksAndAddingHooksAndMetricsAndWhatever) {
    grandpa::run(node);
    aura::run(node);
}
```

where those crates handle setting up block import hooks, network protocols, whatever. This does not look too difficult for the average programmer. Making things simple doesn't require anything more than exposing the raw functionality that tasks need to run. What I mean is that I don't understand the justification for adding a builder pattern alongside the general refactoring, when just refactoring node startup already makes things easy for the end user. Basically, I see
Okay, yes, this is reasonable. There are some complications around runtime upgrades but that can mostly be worked around at higher levels.
Fork choice rules should just be background tasks like anything else. We should not make any assumptions that a fork choice rule is a function that only takes the current known blocks as input (see https://github.com/paritytech/polkadot-sdk/blob/master/polkadot/node/core/chain-selection/src/lib.rs).
I never said that this is an assumption. It is just about the synchronization point between looking into the DB and ensuring that the DB doesn't change while you determine the best block. That you need access to all blocks and more information is clear to me ;)
General refactoring and the builder pattern are mostly orthogonal things. I still think that you assume too much knowledge on the user side. I'm looking more at the people who want to get something up fast for experimentation. However, I also don't have any hard requirements on this stuff as long as we go in the direction I tried to outline above. The things you have proposed are also mostly fine; it is just about small details, and we could probably make both work.
The only thing a fork choice rule really needs is that blocks don't disappear from the DB. There might be some necessary CAS logic but that is a lower level concern where I think we can break any coupling (this seems good).
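The "CAS logic" mentioned above could be as small as a compare-and-swap on a shared best-block pointer: a fork-choice task reads the current best block, computes its choice, and only publishes it if nobody changed the pointer in the meantime. `BestBlock` and `try_set` are hypothetical names; the sketch uses `AtomicU64::compare_exchange` from the standard library, with block hashes simplified to `u64`.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical shared best-block pointer (block hashes simplified to u64).
pub struct BestBlock(AtomicU64);

impl BestBlock {
    pub fn new(genesis: u64) -> Self {
        BestBlock(AtomicU64::new(genesis))
    }

    pub fn get(&self) -> u64 {
        self.0.load(Ordering::Acquire)
    }

    /// Publish `new` only if the best block is still `expected`, i.e. no other
    /// task changed it while this one was computing. On failure, returns the
    /// current value so the caller can recompute its choice.
    pub fn try_set(&self, expected: u64, new: u64) -> Result<u64, u64> {
        self.0
            .compare_exchange(expected, new, Ordering::AcqRel, Ordering::Acquire)
    }
}
```

This only works because of the guarantee stated above: if blocks could disappear from the DB between the read and the swap, a successful CAS could still publish a block that no longer exists.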
I don't think I'm assuming any knowledge - some preconfigured templates and single function calls would make things really easy for anyone. It's just an API design philosophy, "make the easy things easy and the hard things possible". The stack I'm imagining is:
If you really want to target the audience of people who are looking to experiment, we can't just write code targeting level (4) and call it a day. Those users will then ask questions like "how do I add a custom RPC?" or "how can I add extra background tasks to the node?". It should be easy to answer those questions without pushing them into level (1). That is, it needs to be easy to move from writing code at level (4) to writing at level (3), (2), or even (1). A good programmer would be able to figure this out from the docs, as long as the code is explicit enough. They'd see that the

My main concern is that shifting the "paradigm" between level 4 and level 3 will make it hard for the developer community to build anything beyond basic demos.
Yeah. In general I want to prevent stuff like this. This
I'm with you 100%, or rather 99.99% ;) I think the main difference is where things are exposed, etc. Nothing that is really blocking IMO.
This issue has been mentioned on Polkadot Forum. There might be relevant details there: https://forum.polkadot.network/t/developer-experience-must-be-our-1-priority/3957/8
This issue has been mentioned on Polkadot Forum. There might be relevant details there: https://forum.polkadot.network/t/developer-experience-must-be-our-1-priority/3957/12
This issue has been mentioned on Polkadot Forum. There might be relevant details there: https://forum.polkadot.network/t/developer-experience-must-be-our-1-priority/3957/47
This issue has been mentioned on Polkadot Forum. There might be relevant details there: https://forum.polkadot.network/t/developer-experience-must-be-our-1-priority/3957/48
Need to make sure this is addressed: #2499
This issue has been mentioned on Polkadot Forum. There might be relevant details there: https://forum.polkadot.network/t/polkadot-parachain-omni-node-gathering-ideas-and-feedback/7823/1 |
Related to: #5

A couple of cosmetics and improvements related to `polkadot-parachain-bin`:

- Adding some convenience traits in order to avoid declaring long duplicate bounds
- Specifically check if the runtime exposes `AuraApi` when executing `start_lookahead_aura_consensus()`
- Some fixes for the `RelayChainCli`. Details in the commits description
…h#4666) Related to: paritytech#5

A couple of cosmetics and improvements related to `polkadot-parachain-bin`:

- Adding some convenience traits in order to avoid declaring long duplicate bounds
- Specifically check if the runtime exposes `AuraApi` when executing `start_lookahead_aura_consensus()`
- Some fixes for the `RelayChainCli`. Details in the commits description

* Integrate storage monitor
* Fix clippy
# Description

This is a continuation of #5666 that finally fixes #5333. This should allow developers to create custom syncing strategies or even the whole syncing engine if they so desire. It also moved syncing engine creation and the addition of the corresponding protocol outside the `build_network_advanced` method, which is something Bastian expressed as desired in #5 (comment)

Here I replaced strategy-specific types and methods in the `SyncingStrategy` trait with generic ones. Specifically, `SyncingAction` is now used by all strategies instead of strategy-specific types with conversions. `StrategyKey` was an enum with a fixed set of options and is now replaced with an opaque type that strategies create privately and send to upper layers. Requests and responses are now handled in a generic way regardless of the strategy, which reduced and simplified the strategy API.

`PolkadotSyncingStrategy` now lives in its dedicated module (had to edit .gitignore for this) like other strategies.

`build_network_advanced` takes a generic `SyncingService` as an argument alongside a few other low-level types (that can probably be extracted in the future as well) without any notion of the specifics of how syncing is actually done. All the protocols and tasks are created outside and are not part of the network anymore. It still adds a bunch of protocols, like for light client and some others, that should eventually be restructured, making `build_network_advanced` just build a generic network rather than handle application-specific protocols.

## Integration

Just like #5666 introduced `build_polkadot_syncing_strategy`, this PR introduces `build_default_block_downloader`, but for convenience and to avoid typical boilerplate, a simpler high-level function `build_default_syncing_engine` is added that will take care of creating a typical block downloader, syncing strategy and syncing engine, which is what most users will be using going forward.

`build_network` towards the end of the PR was renamed to `build_network_advanced` and `build_network`'s API was reverted to pre-#5666, so most users will not see much of a difference during upgrade unless they opt in to use the new API.

## Review Notes

For `StrategyKey` I was thinking about using something like a private type and then storing a `TypeId` inside instead of a static string; let me know if that would be preferred.

The biggest change happened to requests that different strategies make and how their responses are handled. The most annoying thing here is that block response decoding, in contrast to all other responses, depends on the request. This meant the request had to be sent throughout the system. While originally `Response` was `Vec<u8>`, I didn't want to re-encode/decode request and response just to fit into that API, so I ended up with `Box<dyn Any + Send>`. This allows responses to be truly generic, and each strategy will know how to downcast it back to the concrete type when handling the response.

Import queue refactoring was needed to move `SyncingEngine` construction out of `build_network` that awkwardly implemented for `SyncingService`, but due to `&mut self` wasn't usable on `Arc<SyncingService>` for no good reason. `Arc<SyncingService>` itself is of course useless, but refactoring to replace it with just `SyncingService` was unfortunately rejected in #5454

As usual, I recommend reviewing this PR as a series of commits instead of as the final diff; it'll make more sense that way.

# Checklist

* [x] My PR includes a detailed description as outlined in the "Description" and its two subsections above.
* [x] My PR follows the [labeling requirements](https://github.com/paritytech/polkadot-sdk/blob/master/docs/contributor/CONTRIBUTING.md#Process) of this project (at minimum one label for `T` required)
    * External contributors: ask maintainers to put the right label on your PR.
* [x] I have made corresponding changes to the documentation (if applicable)
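The `Box<dyn Any + Send>` approach from the review notes can be illustrated with the standard library alone. In this sketch the generic layers only ever see an opaque `Response`, and the strategy that issued the request downcasts it back to its own concrete type. `StrategyKey` here is a local stand-in holding a static string (as the notes describe the current version), and `BlockResponse`, `BLOCK_STRATEGY` and `on_response` are hypothetical; this is not the actual syncing code from the PR.

```rust
use std::any::Any;

/// Opaque key a strategy creates privately (local stand-in, static-string based).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct StrategyKey(&'static str);

impl StrategyKey {
    pub const fn new(name: &'static str) -> Self {
        StrategyKey(name)
    }
}

/// Type-erased response, as passed around by the generic layers.
pub type Response = Box<dyn Any + Send>;

/// Concrete response type known only to one strategy.
#[derive(Debug, PartialEq)]
pub struct BlockResponse {
    pub blocks: Vec<u64>,
}

const BLOCK_STRATEGY: StrategyKey = StrategyKey::new("block-downloader");

/// Strategy side: accept only responses addressed to this strategy and
/// downcast them back to the concrete type it asked for.
pub fn on_response(key: StrategyKey, response: Response) -> Option<Vec<u64>> {
    if key != BLOCK_STRATEGY {
        return None;
    }
    response.downcast::<BlockResponse>().ok().map(|boxed| {
        let BlockResponse { blocks } = *boxed;
        blocks
    })
}
```

The trade-off is that type mismatches surface at runtime (the downcast returns `Err`) rather than at compile time, which is the price of keeping the upper layers free of strategy-specific types.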
related to #186 #1337
Vision 1
As it stands now, we take it for granted that one builds a runtime, then pulls the substrate-node-template (or some similar repository of code), integrates it in a "blind" fashion (probably using nasty trial and error, without really knowing what they are doing) and never looks at the node-side code ever again.
Taking the same mindset that I talked about for a "substrate-node-builder-cli" here, imagine an alternative like this: a `main.rs` that looks like:

This is basically what a "create-substrate-app" CLI would do for you, but done via code. Once we have this, the CLI would be rather trivial to build, and would merely be something that converts a YAML/JSON file to the above piece of code.
Vision 2
I have little clue about what the blockers to reaching the above are. Historically, I knew that the code in `service.rs` has been quite a difficult one to play with. But, looking at a very simple node-template with manual seal, I think I can wrap my head around it and conclude that the above is feasible.

But, if the above is not possible, what I think would be a good improvement to the status quo, especially from a DevEx perspective, is to replace the need to clone any code for the node software with pure binaries.
That is, once #62 is done, and there is less (hopefully no!) strict dependency between the node and the runtime, the process to learn FRAME and run a basic chain would be:
`node` and run `./node --runtime path/to/wasm.wasm <rest of substrate cli opts>`, and the rest should work.

I am not sure if this is actually much simpler than the above, as it would require some checks, like seeing which runtime APIs/RPCs are available, to become runtime (as opposed to the current compile-time) checks.
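One way to picture turning those compile-time checks into runtime checks: a generic node binary reads the list of APIs the loaded runtime advertises and enables only the RPCs whose backing API is present. Substrate runtimes do advertise (8-byte API id, version) pairs in their runtime version, but the sketch below uses local stand-in types rather than the real `sp_version` crate. The byte-string ids in `RPC_TABLE` are placeholders (real ids are derived from hashes of the API trait names), and the two RPC method names are used only as labels.

```rust
/// Stand-in for what a runtime advertises about itself:
/// (8-byte API identifier, version) pairs.
pub struct RuntimeApis(pub Vec<([u8; 8], u32)>);

impl RuntimeApis {
    /// True if the runtime exposes `id` at `min_version` or newer.
    pub fn has_api(&self, id: [u8; 8], min_version: u32) -> bool {
        self.0.iter().any(|(i, v)| *i == id && *v >= min_version)
    }
}

/// Hypothetical table: RPC method -> (placeholder API id, minimum version) it needs.
pub const RPC_TABLE: &[(&str, [u8; 8], u32)] = &[
    ("system_accountNextIndex", *b"ACCTNONC", 1),
    ("payment_queryInfo", *b"TXPAYMNT", 1),
];

/// Decide at startup (instead of at compile time) which RPCs to enable.
pub fn enabled_rpcs(apis: &RuntimeApis) -> Vec<&'static str> {
    RPC_TABLE
        .iter()
        .filter(|(_, id, ver)| apis.has_api(*id, *ver))
        .map(|(name, _, _)| *name)
        .collect()
}
```

With this shape, a runtime upgrade that adds or removes an API changes which RPCs the same binary offers, without rebuilding the node.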