-
Notifications
You must be signed in to change notification settings - Fork 382
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(gno.land): make gno store cache respect rollbacks #2319
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #2319 +/- ##
==========================================
+ Coverage 60.24% 60.43% +0.19%
==========================================
Files 562 563 +1
Lines 75031 75149 +118
==========================================
+ Hits 45200 45419 +219
+ Misses 26458 26342 -116
- Partials 3373 3388 +15
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
A current update with the state of the code: I'm trying to work "transaction" logic in the GnoVM store. The current store is not "transactional" (as in the database significance - ie. a set of operations that are performed together, by "committing" them or rolling them back) because if a transaction is rolled back, the "cache" fields of the GnoVM store are not. The end goal is to make it have transaction logic, plugging it into this existing mechanism: Lines 829 to 843 in 8aafb6c
ie. have it as one of the Stores in the MultiStore, so that the BaseApp only calls Write to "commit" the changes to the database if all the messages in the transaction have passed. So I've been trying to get the store out of the vm keeper, and into its own store which is plugged into other places. At the time being, I've done the changes to the GnoVM store needed to accomodate committing and rolling back to a parent store. Right now, trying to run a gnoland node quickly leads to a consensus failure. My hunch is that this comes from the fact that the current The GnoVM tests are succeeding |
gno.TypeCheckMemPackage() is used by the keeper to check the type first w/ MsgAddPackage, but during type checking Go's type checker will import some mempackages, but the mempackage is first run by the import getter function. The gno.Store's types are not cleared after a DeliverTx, nor are the nodes. I think the easiest way to solve this problem right now is to change the implementation of the usage of the type checker so that it doesn't call store.GetMemPackage, but only reads the mempackage through ReadMemPackage(). The type checker doesn't care about gno state, I think it only cares about the source code, so there's no need to run "GetMemPackage" which also runs the mempackage. The hard way to solve this is to complete the implementation of the store and fix everything so that types and nodes can be cleared, and restored correctly lazily. Lazily loading a package where the nodes were cleared, would also involve preprocessing the loaded mempackage, but we wouldn't need to run it since it was already run before. I think a hacky way to solve this is to insert a gno.Write() below: // Validate Gno syntax and type check.
if err := gno.TypeCheckMemPackage(memPkg, gnostore); err != nil {
gnostore.Write() // <--- hacky solution
return ErrTypeCheck(err)
} Then the values constructed from store.GetMemPackage() will be stored, and a stdlib that was once initialized won't be re-initialized. Since the package value of "encoding/base64" wasn't saved the first time upon a type check (or something like that), store.GetMemPackage is called a second time in a later transaction for the same "encoding/base64", but it fails because the type is found from the store the second time around: // in preprocess.go
case *DeclaredType:
// if store has this type, use that.
tid := DeclaredTypeID(lastpn.PkgPath, n.Name)
exists := false
if dt := store.GetTypeSafe(tid); dt != nil {
// XXX SECOND PASS HERE XXX
dst = dt.(*DeclaredType)
last.GetValueRef(store, n.Name, true).SetType(dst)
exists = true
}
if !exists {
// XXX FIRST PASS HERE XXX
// otherwise construct new *DeclaredType.
// NOTE: this is where declared types are
// actually instantiated, not in
// machine.go:runDeclaration().
dt2 := declareWith(lastpn.PkgPath, n.Name, tmp)
// if !n.IsAlias { // not sure why this was here.
dt2.Seal()
// }
*dst = *dt2
} Another solution might be to not use the store in the preprocessor for this purpose but I'm not sure what other problems will be encountered with this approach. Either way for now I think the easiest solution is to not use GetMemPackage but ReadMemPackage for the type checker. (We shouldn't be running init() for the same package redundantly anwyays, so unless we're going to do the hacky gnostore.Write(), just not doing any initialization seems better.) The way to do that is, I think, instead of passing in a mere packageGetter callback to the gno.Store, we pass in an object that implements two methods separately; GetPackage() and GetMemPackage(). That way the type checker can call GetMemPackage, while during preprocessing of any code that imports a yet-unloaded stdlib, the store would end up calling GetPackage() which actually performs the initialization. |
Maybe the solution being worked on here will work, but I'm not sure. |
I read this this morning and got thinking to how this could be done -- ie., how I came up with #2504 :) Since all stdlibs are eagerly loaded at startup, there is no needed for the package getter on-chain; so the scenario which is trying to be fixed here. The issue is not fully solved, I'm sure, which is why I want to keep this PR and the underlying idea open. Ie., if we don't roll-back together with the transaction, a succeeding message can modify the GnoVM store (and its cache), to then be followed by a failing transaction; this sequence changes the GnoVM store, but not the underlying baseStore/iavlStore.
Yes, I tried going for that in a first approach (making the GnoVM store one which doesn't actually "store" anything, and make the word "cache" representative of what the maps are actually doing). But then I realised that it was kind of a more herculean task For Another Time, due to the amount of data structures involved in the serialization of BlockNodes.
Trust me, I would have loved to make it all Unluckily, cloning the blocknodes is slow on a decently large store (enough to be very noticeable); so I decided against it. It was one of the things I wanted to "permanently" fix with this PR. |
Agreed. Maybe we need a tag "RevisitLater" and close a bunch of PRs/issues for our peace of mind. |
Fixes #2283 (with a different approach from #2319) This PR loads all the standard libraries into the database when vm.Initialize is called. It doesn't fully fix the problems that #2319 is trying to address, but it does fix the most immediate bug of not being able to publish certain packages on portal loop. With these changes, we don't need a PackageGetter on-chain anymore. All packages are already loaded into the store, thus solving the problem @jaekwon was talking about [here](#2319 (comment)) at its root. This PR has a problem, which is that adding loading the entire stdlibs in the VM initialization step brings a huge overhead computationally; this is not a problem on the node, but it is when testing as it needs to happen very often. This translates to 2x slower txtar tests, compared to master. On my PC, it adds a 2-3 second overhead whenever running Initialize. I tried working out on a system which could save the data to quickly recover it, at least for some cases where we need to Initialize often; [see this diff](https://gist.github.com/thehowl/cb1ee79e63cf77d3f323730580eb2d18). But I didn't get it to work so far; after copying the DB, attempting initialization crashes because [ParseMemPackage is being handed a nil package, for some reason](https://gist.github.com/thehowl/d1efa51858d865fb5beb9c3a9cb0eeef). @gfanton, any tips?? :)) I'd want to avoid lazy loading in the node, as that's what got us here in the first place. <details><summary>Contributors' checklist...</summary> - [ ] Added new tests, or not needed, or not feasible - [x] Provided an example (e.g. screenshot) to aid review or the PR is self-explanatory - [x] Updated the official documentation or not needed - [x] No breaking changes were made, or a `BREAKING CHANGE: xxx` message was included in the description - [x] Added references to related issues and PRs - [x] Provided any useful hints for running manual tests - [x] Added new benchmarks to [generated graphs](https://gnoland.github.io/benchmarks), if any. More info [here](https://github.com/gnolang/gno/blob/master/.benchmarks/README.md). </details> --------- Co-authored-by: Manfred Touron <[email protected]>
I'm going to continue working on this in an effort to have a fix for #2605 |
…ang#2504) Fixes gnolang#2283 (with a different approach from gnolang#2319) This PR loads all the standard libraries into the database when vm.Initialize is called. It doesn't fully fix the problems that gnolang#2319 is trying to address, but it does fix the most immediate bug of not being able to publish certain packages on portal loop. With these changes, we don't need a PackageGetter on-chain anymore. All packages are already loaded into the store, thus solving the problem @jaekwon was talking about [here](gnolang#2319 (comment)) at its root. This PR has a problem, which is that adding loading the entire stdlibs in the VM initialization step brings a huge overhead computationally; this is not a problem on the node, but it is when testing as it needs to happen very often. This translates to 2x slower txtar tests, compared to master. On my PC, it adds a 2-3 second overhead whenever running Initialize. I tried working out on a system which could save the data to quickly recover it, at least for some cases where we need to Initialize often; [see this diff](https://gist.github.com/thehowl/cb1ee79e63cf77d3f323730580eb2d18). But I didn't get it to work so far; after copying the DB, attempting initialization crashes because [ParseMemPackage is being handed a nil package, for some reason](https://gist.github.com/thehowl/d1efa51858d865fb5beb9c3a9cb0eeef). @gfanton, any tips?? :)) I'd want to avoid lazy loading in the node, as that's what got us here in the first place. <details><summary>Contributors' checklist...</summary> - [ ] Added new tests, or not needed, or not feasible - [x] Provided an example (e.g. screenshot) to aid review or the PR is self-explanatory - [x] Updated the official documentation or not needed - [x] No breaking changes were made, or a `BREAKING CHANGE: xxx` message was included in the description - [x] Added references to related issues and PRs - [x] Provided any useful hints for running manual tests - [x] Added new benchmarks to [generated graphs](https://gnoland.github.io/benchmarks), if any. More info [here](https://github.com/gnolang/gno/blob/master/.benchmarks/README.md). </details> --------- Co-authored-by: Manfred Touron <[email protected]>
There is some cleanup to be done; tests are failing and need to be adjusted. But an empirical start of the chain and restart now runs successfully. 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 |
Okay, the coverage is up to ~78%, and I'm well beyond the point where I'm adding tests on areas that are barely touched by this PR; so I'll stop here. I addressed the issue that Milos brought up, which was a relatively simple change. @zivkovicmilos, if you have a minute, take a second look. If it looks good to you I think we can likely push it for a final check with Jae. @ltzmaxwell, if you have a minute, take a look at this: #2319 (comment) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code seems as good as it could be respecting the actual architecture. Waiting for @zivkovicmilos comments to be addressed to approve.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for showing this bug that the core team's got hands too :)
Merging for now, as this PR is important to move us forward. @jaekwon, feel free to review later. As I told you, I'd be happy if most of the work undertaken here could be simplified, eventually, by (a) making the caching work on a msg level rather than being permanent from chain start and (b) possibly have caching where it makes sense (ie. for BlockNodes and such) happen in a properly managed in-memory cache rather than being an append-only cache which would potentially end up using a lot of memory. |
}) | ||
baseApp.SetEndTxHook(func(ctx sdk.Context, result sdk.Result) { | ||
if result.IsOK() { | ||
vmk.CommitGnoTransactionStore(ctx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if this should be named 'commit.' Here, it only writes to the store, while the actual commit happens in Commit(), not in EndBlock().
dirty map[K]deletable[V] // pending writes on source | ||
} | ||
|
||
func (b *txLog[K, V]) Commit() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if this should be named 'Commit.' It does not commit anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're welcome to make a PR to propose a rename to Write
- or anything else
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(My reasoning here for the name comes from the terminology also used in SQL; which uses BEGIN, ROLLBACK and COMMIT)
"The tm2 BaseApp has the new "hooks" BeginTxHook and EndTxHook. The former is called right before starting a transaction, and the latter is called right after finishing it, together with the transaction result. Instead of creating an App.BeginTxHook and App.EndBlockHook, to keep ABCI clean, what do you think about implementing these at the module level as the VM keeper's BeginBlocker and EndBlocker, which are called within BaseApp.BeginBlock and BaseApp.EndBlocker? |
It's important we do rollbacks precisely as they are done for the underlying store. The store rollbacks ALL changes from ALL messages in a transaction, but only in the failing transaction. Scoping this at the block level would mean that the "entire block fails". |
Thank you for the explanation, that makes sense. Here are two questions.
|
Remember, if any message in a transaction fails, then the entire transaction should fail. If we tackle this at the level of
In general, yes, but the underlying changes are a bit more complex, involve discussion and were out of scope here. My current thinking is that:
Of course the result of Preprocess itself can be cached, so in a real-life scenario we don't have to run it often, especially for standard libraries and very common packages. But it means also that we're not bounded by the nodes' RAM in how many packages we can have on-chain. |
This pull request modifies the current
defaultStore
to support transactionality in the same way as our tm2 stores. A few new concepts are introduced:TransactionStore
interface is added, which extends thegnolang.Store
interface to support a Write() method.BaseApp
has the new "hooks"BeginTxHook
andEndTxHook
. The former is called right before starting a transaction, and the latter is called right after finishing it, together with the transaction result.TransactionalStore
in thesdk.Context
through theBeginTxHook
; and commit the result, if successful, in theEndTxHook
.Overview of changes
gno.land
pkg/gnoland/app.go
: the InitChainer is now additionally responsible of loading standard libraries. To separate the options related to app startup globally, and those to genesis, the InitChainer is now a method of its config struct,InitChainerConfig
, embedded into theAppOptions
.pkg/gnoland/app.go
:NewAppOptions
is only used inNewApp
, where most of its fields were modified, anyway. I replaced it by changing thevalidate()
method to write default values.pkg/gnoland/node_inmemory.go
,pkg/integration/testing_integration.go
: these changes were made necessary to supportgnoland restart
in our txtars, and supporting fast reloading of standard libraries (LoadStdlibCached
).pkg/sdk/vm/keeper.go
: removed all the code to load standard libraries on Initialize, as it's now done in the InitChainer. The hack introduced in fix(sdk/vm): re-load iavl store from backup if empty #2568 is no longer necessary as well. Other changes show how the Gno Store is injected and retrieved from thesdk.Context
.gnovm/pkg/gnolang/store.go
TransactionalStore
; the "transaction-scoped" fields of the defaultStore are swapped with "transactional" versions (including the allocator, cacheObjects and cacheTypes/cacheNodes - the latter write to atxLogMap
).Map
interface intxlog
allows us to have atxLog
data type stacked on top of another. This is useful for the cases where we useBeginTransaction
in preprocess. (We previously usedFork
.) See later in the "open questions" section.txlog.Map
interface - this will be compatible with RangeFunc, once we switch over to go1.23.tm2/pkg/sdk
gnovm/pkg/gnolang/machine.go
has a one-line fix for a bug printing the machine, whereby it would panic if len(blocks) == 0.Open questions / notes
txLog
; it's still ingnolang/internal/txlog/txlog_test.go
where it also has benchmarks. See 1347c5f for the solution which usesbufferedTxMap
, without using thetxlog.Map
interface. The main problem with this approach is that it does not support stacking bufferedTxMaps, thus there is a different method to be used in preprocess -preprocessFork()
.