txtar for test4 cacheTypes issue #2605
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## master #2605 +/- ##
==========================================
- Coverage 55.03% 55.02% -0.02%
==========================================
Files 595 595
Lines 79665 79644 -21
==========================================
- Hits 43843 43822 -21
+ Misses 32509 32508 -1
- Partials 3313 3314 +1
Nice job diagnosing the issue 💪. Regarding the solutions, could we maybe do one of the following:
I know this won't fix test4, but it seems like a sane solution going forward. I made a short POC implementation here -- https://github.com/gnolang/gno/compare/master...deelawn:gno:fix/transient-type-cache?expand=1 Let me know what you think, or if I'm missing something, why it won't work 😅 |
I think this is not just a problem for cacheTypes, but also for cacheNodes. And, even then, the code in tm2 commits a tx to store when all of its messages succeed; while your approach would instead commit to store on a "per-message" basis. Which is why I think it's better to do it with something like #2319's approach. |
Right, thanks. I forgot this is happening for each message, not each tx. |
I will react, naively, to the 3 solutions, @thehowl.
Solution 2 is dicey, agreed. If we set gas calculations aside for a moment, solution 1 seems to be the ideal one, right? In what way does solution 1 affect gas calculations? Is it like:
If so, before ditching solution 1... in #2319, Jae talked about a short-term solution for rollback issues (you explored a different solution in #2504, consisting of preloading all packages at start):
Couldn't this work and help here? For the sake of the discussion, here is the txtar in this PR (truncated after the 80th column, as we don't need the source code):
|
Well, the txtar is actually already a good example of how fixing this bug breaks gas calculations. This is the output:
These two transactions are identical, except for the gas wanted. The first panics because it exceeds the 10M gas limit. The second one succeeds, but note that the gas used is now 7851998, i.e. less than 10M. In fact, you get the same bug if, instead of having tx1 and tx2, you only use tx1 (but re-sign it to have a different account-number): the first time around it fails, the second time it succeeds.
This is because the second time there's already data in cacheTypes about the objects it's trying to save to the store. So it hits the store less, and consequently it hits the gas counter less.
Keeping test4 as is while fixing the bug and maintaining the same gas used for all replayed transactions is hard. Which is why we're now looking at doing 1 (with #2319) and probably doing a chain reset / new chain.
The changes have already been merged in #2504 and are already part of the test4 binary (the bug is closely related, but with #2504 we at least made it impossible for a failing transaction to make a stdlib package "unimportable"). This is another problem, relating to the lack of rollback functionality in cacheTypes (and probably cacheNodes as well). |
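The gas effect described here can be illustrated with a toy model. This is only a sketch under stated assumptions: `gasStore`, `writeCost`, and `Replay` are hypothetical, the flat per-write cost is invented, and the real gas schedule in tm2 is more involved. The point is just that an identical transaction consumes less gas once its types are already sitting in the cache.

```go
package main

import "fmt"

// writeCost is an invented flat gas cost per store write.
const writeCost = 1000

// gasStore is a hypothetical store that charges gas only when a type
// is not already present in the shared cache.
type gasStore struct {
	cache   map[string]bool // shared across transactions, like cacheTypes
	gasUsed int             // gas consumed by the current transaction
}

// SetType skips the store write (and the gas charge) on a cache hit.
func (s *gasStore) SetType(id string) {
	if s.cache[id] {
		return
	}
	s.gasUsed += writeCost
	s.cache[id] = true
}

// Replay runs the same sequence of SetType calls with a fresh gas counter
// against the given shared cache and returns the gas consumed.
func Replay(cache map[string]bool, types []string) int {
	s := &gasStore{cache: cache}
	for _, id := range types {
		s.SetType(id)
	}
	return s.gasUsed
}

func main() {
	cache := map[string]bool{}
	types := []string{"Collection", "Task", "Project"}

	first := Replay(cache, types)  // cold cache: every write is charged
	second := Replay(cache, types) // warm cache: writes are skipped
	fmt.Println("first run gas:", first)
	fmt.Println("second run gas:", second)
}
```

In this model the first run is charged for three writes and the second for none, which mirrors why a replayed transaction can fit under a gas limit that it exceeded the first time.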
Merging to #2319 |
Draft, to share txtar for an issue on test4.
There is a new gnoland restart command to try to verify restarting a node. It doesn't seem to work properly in a "successful" scenario (it panics further down the line); but at least as far as this issue is concerned, it shows the correct stacktrace we are encountering in production.
My theory on the panic: tx1 fails because of an out-of-gas error, creating one of those inconsistencies between the cache* fields in the gno.Store and the underlying database. The type gno.land/p/g17ernafy6ctpcz6uepfsq2js8x2vz0wladh5yc3/zentasktic.Collection is stored in cacheTypes, but not in the store. gno.land/p/g17ernafy6ctpcz6uepfsq2js8x2vz0wladh5yc3/zentasktic was then redeployed, in tx2. This time, it worked; however, when saving the package, SetType is called and is a no-op, because the type is considered as being already in the store (as it exists in the cache).
Thus, already running nodes are fine (they have the zentasktic.Collection from the failing transaction, still in the cache). However, when restarting a node, the type exists in neither cacheTypes nor the underlying database, which generates the panic.
Ideas for a solution:
RunFiles in PreprocessAllFilesAndSaveBlockNodes, so we can have the types eventually saved again in the cache/store.
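The theory above can be reproduced with a small self-contained model. This is a sketch, not the gno.Store code: `toyStore`, its `pending` buffer, and `Scenario` are hypothetical, and the real store is far more elaborate. It only shows how a cache that is not rolled back, combined with a SetType that trusts the cache, loses a type across a restart.

```go
package main

import "fmt"

// toyStore is a hypothetical stand-in for gno.Store: an in-memory type
// cache in front of a persistent database.
type toyStore struct {
	cacheTypes map[string]string // kept after failed txs, lost on restart
	db         map[string]string // persistent; written only on commit
	pending    map[string]string // writes of the transaction in flight
}

func newToyStore(db map[string]string) *toyStore {
	return &toyStore{
		cacheTypes: map[string]string{},
		db:         db,
		pending:    map[string]string{},
	}
}

// SetType mimics the no-op described above: a cached type is assumed to be
// persisted already, so nothing is written.
func (s *toyStore) SetType(id, def string) {
	if _, ok := s.cacheTypes[id]; ok {
		return
	}
	s.cacheTypes[id] = def
	s.pending[id] = def
}

// Commit flushes the pending writes to the database.
func (s *toyStore) Commit() {
	for id, def := range s.pending {
		s.db[id] = def
	}
	s.pending = map[string]string{}
}

// Rollback drops the pending writes but, as in the bug, leaves cacheTypes
// untouched.
func (s *toyStore) Rollback() { s.pending = map[string]string{} }

// HasType checks the cache first, then the database.
func (s *toyStore) HasType(id string) bool {
	if _, ok := s.cacheTypes[id]; ok {
		return true
	}
	_, ok := s.db[id]
	return ok
}

// Scenario replays tx1 (fails), tx2 (succeeds), then a node restart, and
// reports whether the type is visible before and after the restart.
func Scenario() (beforeRestart, afterRestart bool) {
	const id = "zentasktic.Collection"
	db := map[string]string{}
	s := newToyStore(db)

	s.SetType(id, "struct{...}") // tx1 defines the type...
	s.Rollback()                 // ...then runs out of gas

	s.SetType(id, "struct{...}") // tx2: no-op, the type is already cached
	s.Commit()                   // nothing pending, so nothing is persisted

	beforeRestart = s.HasType(id)

	restarted := newToyStore(db) // restart: empty cache, same database
	afterRestart = restarted.HasType(id)
	return beforeRestart, afterRestart
}

func main() {
	before, after := Scenario()
	fmt.Println("type visible on running node:", before)
	fmt.Println("type visible after restart:", after)
}
```

In this model the running node still sees the type through its stale cache, while the restarted node finds it nowhere, matching the behaviour reported on test4.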