fix: consolidate vm gas consumption #1430

piux2 · 2023-12-11T11:18:21Z

Contributors' checklist...

Added new tests
Provided an example (e.g. screenshot) to aid review or the PR is self-explanatory
Updated the official documentation or not needed
No breaking changes were made
Added references to related issues and PRs
Provided any useful hints for running manual tests
Added new benchmarks to generated graphs, if any. More info here.

Summary

The current gno.land node, optimized for development purposes, has a simplified verification process and gas meter implementation. To transition the gno.land node to a production-ready state, it is necessary to implement a comprehensive gas metering system that accurately accounts for VM gas consumption. This includes refining the gas fee structure to encompass all relevant costs, ensuring robust transaction validation, and calculating gas consumption based on actual computational load.

This PR aims to address these limitations by introducing a complete gas meter and validation flow, laying the groundwork for further gas meter profiling and configuration.

Problem Definition

Current State and Limitations for Production:

VM Gas Consumption Not Accounted in Gas Meter: The current gas meter fails to calculate VM gas consumption, potentially allowing heavy contract loads without corresponding gas meter deductions. A refined system should measure and charge for VM gas usage accurately.
Gas Fee Structure: Presently, the gas fee structure only includes storage access, transaction size, and signature verification. VM gas fees are levied as a separate, flat fee, which might lead to confusion among users expecting the total fee to match the amount specified in the 'gas-fee' argument. For improved transparency and precision, the gas fee structure should integrate all these aspects.
Transaction Validation: The system currently validates basic information for VM msg_addpkg and msg_call. However, gas consumption cannot be determined before fully executing these messages against the VM. Consequently, VM transactions are placed in the mempool and propagated to other nodes, even if they may not meet the gas fee requirements to execute these transactions. This undermines the purpose of using gas fees to prevent VM spamming.

Solution: ( Updated )

This is a high-level description of the implemented features:

~~Added an anteHandler in VM to monitor gas consumption~~
~~Implemented chained VM anteHandler in auth.anteHandler~~

Consume gas to verify account, signature and tx size in CheckTx
Consume VM gas in DeliverTx
Accumulated VM CPU cycles, memory allocation, store access, transaction size, and signature verification into a single gas meter.
Enabled local node checks of VM resource usage. The VM message is only aborted if it runs out of gas in basic CheckTx. However, the message is still propagated to other nodes if execution fails to prevent censorship
Introduced a structured format for logging gas consumption for profiling and metrics.
Introduced a gas factor linking gas to vm CPU cycles and memory allocation to balance between vm gas consumption with the rest.

Trade-offs and Future Optimization: ( Updated )

~~The current implementation processes messages against the VM to check gas consumption in abci.CheckTx() before inclusion in the mempool and propagation to other nodes.~~

Messages lacking sufficient gas-wanted will be dropped, preventing abuse without adequate gas fees. However, the trade-off is that for each message with enough gas, the VM executes the transaction twice: once in CheckTx() and once in DeliverTx(). As these occur in separate execution contexts and are not in synchronized sequence, the performance impact is currently a secondary concern.

We moved the VM gas check from CheckTx to DeliverTx for the following reasons:

We only know the VM gas consumption after the messages have been processed.
Running VM execution for many CheckTx requests from the peers could overload the mempool that is executing CheckTx.
This could slow down the propagation of transactions across the entire network.

By moving the VM gas check from CheckTx to DeliverTx, we are able to reduce the load on the mempool of a node and allow transactions to propagate through the network faster.

In the future, we may use a predicted median value instead of the exact value from transaction execution for efficiency.

What's Next:

Add a minimum gas price flag and configuration for node operation.
Provide a user-friendly fee input interface, offering 'gas-wanted' and 'gas price' as alternatives to the current 'gas-wanted' and 'gas-fee' inputs.
Tune the gas factor based on VM CPU and Memory Profiling. The current factor is 1:1 between gas and VM CPU cycles and memory allocation.

codecov · 2023-12-11T11:23:06Z

Codecov Report

Attention: Patch coverage is 48.86364% with 45 lines in your changes are missing coverage. Please review.

Project coverage is 46.27%. Comparing base (0ba95bf) to head (0677d5d).

Files	Patch %	Lines
gno.land/pkg/sdk/vm/keeper.go	45.56%	42 Missing and 1 partial ⚠️
gno.land/pkg/sdk/vm/handler.go	33.33%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1430      +/-   ##
==========================================
+ Coverage   46.25%   46.27%   +0.02%     
==========================================
  Files         483      483              
  Lines       68837    68851      +14     
==========================================
+ Hits        31838    31861      +23     
+ Misses      34368    34357      -11     
- Partials     2631     2633       +2

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

gno.land/pkg/sdk/vm/ante.go

gno.land/pkg/sdk/vm/gas.go

gno.land/pkg/sdk/vm/keeper.go

tbruyelle · 2023-12-11T14:26:49Z

A very minimal change for such a big improvement, that is impressive!

MichaelFrazzy

The anteHandler will be super helpful, thanks for taking the time to knock all this out @piux2. Really amazing work.

To fill you in on where we're at for the mechanics (would love your feedback/thoughts here, maybe a meeting soon before sims start):

Currently it seems like we'll largely be mirroring EIP 1559. So as you probably already know in that case... we'd have a target block size, actual block size, max amount gas can change per block, and real block size threshold. If the threshold is 200%, then when the actual block size is 200% larger than the target then gas would increase by the max amount, 12.5% in ETH's case.

From there we can feed the resulting gas value into a final fee equation that also takes into account CPU cycles and/or bytes required for a specific transaction. To account for the resources needed we'd need to assign a fixed value to each byte stored and each CPU cycle to multiply by the required amounts. CPU cycles would be hardware specific though, right?

I also still like the idea of a logistic or expo curve that kicks in after a certain level, as a circuit breaker spam protection method.

At the same time the resource parameters present a game-able situation where people can create less efficient/optimized code to earn additional fees. But considering these submissions will often undergo manual review early on and even with us planning to give a % of all fees earned to the realm's builder, we should have some counter measures to play with.

MichaelFrazzy

Handful of small nitpicks, nothing major

gno.land/pkg/sdk/vm/ante_test.go

gno.land/pkg/sdk/vm/ante.go

gno.land/pkg/sdk/vm/ante_test.go

gno.land/pkg/sdk/vm/gas.go

gno.land/pkg/sdk/vm/keeper.go

gno.land/pkg/sdk/vm/ante.go

gno.land/pkg/sdk/vm/keeper.go

deelawn

Just left a couple comments on style and consolidating code, but the overall approach seems good to me. Great job 🎉

Co-authored-by: Thomas Bruyelle <[email protected]>

thehowl

Me & Ray went through the PR to discuss changes and optimisations. From Ray's benchmarking, it seems that a considerable amount of the lost optimisations are due to the overflow checks which are now in place in the gas package.

To address the benchmarking issues, I see two future improvements which can be made:

Changing gnolang/overflow to use the OF/CF flags instead, checking for overflow directly on hardware rather than with a software solution. This is tracked in gnolang/overflow: optimise package for common architectures #1844.
Eventually moving to having one single system to track resource consumption, rather than the double system of Gas meter + MaxCycles. (For now, it makes sense to keep the latter as a "dumb fallback".)

Another thing to note, is that the keeper code is painfully repetitive, but it's not this PR's fault. Some refactor should aim to try to make it not repeat itself, while trying not to harm readability.

After resolving outstanding comments, I'd say good to merge.

jaekwon · 2024-03-27T23:19:22Z

still reviewing plz hold off on merging just yet.

gno.land/pkg/sdk/vm/gas_test.go

gno.land/pkg/sdk/vm/handler.go

gno.land/pkg/sdk/vm/keeper.go

gnovm/pkg/gnolang/machine.go

jaekwon · 2024-03-28T01:43:44Z

it seems that a considerable amount of the lost optimisations are due to the overflow checks which are now in place in the gas package.

Are we sure about this?
Try benchmarking with overflow.*() usage removed.
It could be something else, like, maybe if there are too much logic in the switch statement (in the machine for the opcode) it becomes slower... subtle things like that. If this is the case, then it might make sense to first look up the CPU cost per opcode, and then incrCPU() once before the switch statement... (I have no idea if this is true, I'm just spitballing here; my point is fine tuning optimizations like this can be very tricky, and sometimes you have to look at the resulting pseudo-assembly itself to understand what is happening).

Another example of something tricky... struct fields that are later in field order take longer to access. (this is actually true, or was true when I was first developing GNO). If a struct field is used often, such as the gas meter, it is better to declare it higher, rather than how it is how, where it is among the last of fields.

Another fact is that interface methods take much longer to call than concrete implementation methods. We probably want to use the tm2.GasMeter interface, but read the next point:

If it really is the call to incrCPU that is slow, there is something else we can do... we can batch increment CPUs... e.g. run 10, 20, or 100 GNO OpCode operations, accumulate the CPU cycles used, and then call incrCPU just once with the accumulated figure. We just need to remember to call incrCPU at the end of Machine.Run() before the batch call to incrCPU happens again for any remainder CPU cycles. But I would only do this once we are absolutely certain that it is the call to gasMeter that is slow.

We could implement a concrete BatchGasMeter and have the Machine hold a reference to *BatchGasMeter rather than an interface, and BatchGasMeter could "flush" the batch to the underlying tm2.GasMeter every 10/20/100 runs; this way the more expensive call to an interface can be amortized. And then we can call BatchGasMeter.Flush() to flush any remainders to the underlying tm2.GasMeter upon completion of Machine.Run. <-- probably will help.

But how much would this actually help in performance?

piux2 · 2024-04-02T09:18:23Z

it seems that a considerable amount of the lost optimisations are due to the overflow checks which are now in place in the gas package.

Are we sure about this? Try benchmarking with overflow.*() usage removed. It could be something else, like, maybe if there are too much logic in the switch statement (in the machine for the opcode) it becomes slower... subtle things like that. If this is the case, then it might make sense to first look up the CPU cost per opcode, and then incrCPU() once before the switch statement... (I have no idea if this is true, I'm just spitballing here; my point is fine tuning optimizations like this can be very tricky, and sometimes you have to look at the resulting pseudo-assembly itself to understand what is happening).

Another example of something tricky... struct fields that are later in field order take longer to access. (this is actually true, or was true when I was first developing GNO). If a struct field is used often, such as the gas meter, it is better to declare it higher, rather than how it is how, where it is among the last of fields.

Another fact is that interface methods take much longer to call than concrete implementation methods. We probably want to use the tm2.GasMeter interface, but read the next point:

If it really is the call to incrCPU that is slow, there is something else we can do... we can batch increment CPUs... e.g. run 10, 20, or 100 GNO OpCode operations, accumulate the CPU cycles used, and then call incrCPU just once with the accumulated figure. We just need to remember to call incrCPU at the end of Machine.Run() before the batch call to incrCPU happens again for any remainder CPU cycles. But I would only do this once we are absolutely certain that it is the call to gasMeter that is slow.

We could implement a concrete BatchGasMeter and have the Machine hold a reference to *BatchGasMeter rather than an interface, and BatchGasMeter could "flush" the batch to the underlying tm2.GasMeter every 10/20/100 runs; this way the more expensive call to an interface can be amortized. And then we can call BatchGasMeter.Flush() to flush any remainders to the underlying tm2.GasMeter upon completion of Machine.Run. <-- probably will help.

But how much would this actually help in performance?

The phrase 'a considerable amount of' is based solely on limited benchmarking of the incrCPU test. The overflow is accounted for 27% of the gas consumption logic in incrCPU.

The benchmarking method and results discussed here focus on a Gno contract, emphasizing nested for-range loops and two-dimensional slice access, without considering I/O or any other factors. More information can be found here: https://github.com/gnolang/gno/blob/master/gnovm/pkg/gnolang/benchdata/matrix.gno

After conducting several more rounds of benchmark tests using the same method, there was a 7% increase observed, instead of the previously mentioned 20%. All of this increase can be attributed to the incrCPU() function. Of this 7%, 27% is from overflow checking, and the remainder is from the gas consumption function.

ns/op	master	fix_gasmeter	fix_gasmeter.GasConsumption.NoOverFlowCheck	fix_gasmeter.NoGasConsumption
BenchmarkBenchdata/fib.gno_param:4-8	6019	6350	6321	6013
BenchmarkBenchdata/fib.gno_param:8-8	43752	46529	46045	43868
BenchmarkBenchdata/fib.gno_param:16-8	2087269	2220203	2188264	2084855
BenchmarkBenchdata/loop.gno-8	156.4	173.1	170.2	156.4
BenchmarkBenchdata/matrix.gno_param:3-8	132421	140119	138349	130790
BenchmarkBenchdata/matrix.gno_param:4-8	369017	393032	387500	366743
BenchmarkBenchdata/matrix.gno_param:5-8	1347503	1440471	1418249	1340457
BenchmarkBenchdata/matrix.gno_param:6-8	6890003	7380312	7248520	6880262
		7.12%	5.20%	-0.14%

All of these are good points for us to consider during performance optimizations. How about we tackle performance by prioritizing IO > VM > GasMeter after we complete the basic functionalities and fixing critical issues?

consolidate vm gas consumption with the rest

ba5cac7

piux2 requested review from jaekwon, moul and a team as code owners December 11, 2023 11:18

github-actions bot added the 📦 🌐 tendermint v2 Issues or PRs tm2 related label Dec 11, 2023

github-actions bot assigned piux2 Dec 11, 2023

github-actions bot added the 📦 ⛰️ gno.land Issues or PRs gno.land package related label Dec 11, 2023

piux2 changed the title ~~consolidate vm gas consumption with the rest~~ fix: consolidate vm gas consumption Dec 11, 2023

tbruyelle reviewed Dec 11, 2023

View reviewed changes

gno.land/pkg/sdk/vm/ante.go Outdated Show resolved Hide resolved

tbruyelle reviewed Dec 11, 2023

View reviewed changes

gno.land/pkg/sdk/vm/ante.go Outdated Show resolved Hide resolved

tbruyelle reviewed Dec 11, 2023

View reviewed changes

gno.land/pkg/sdk/vm/gas.go Outdated Show resolved Hide resolved

tbruyelle reviewed Dec 11, 2023

View reviewed changes

gno.land/pkg/sdk/vm/gas.go Outdated Show resolved Hide resolved

tbruyelle reviewed Dec 11, 2023

View reviewed changes

gno.land/pkg/sdk/vm/gas.go Outdated Show resolved Hide resolved

tbruyelle reviewed Dec 11, 2023

View reviewed changes

gno.land/pkg/sdk/vm/keeper.go Outdated Show resolved Hide resolved

tbruyelle reviewed Dec 11, 2023

View reviewed changes

gno.land/pkg/sdk/vm/keeper.go Outdated Show resolved Hide resolved

zivkovicmilos requested review from MichaelFrazzy and zivkovicmilos December 11, 2023 15:52

MichaelFrazzy approved these changes Dec 11, 2023

View reviewed changes

fix defer() in vm keeper

1d513ce

MichaelFrazzy suggested changes Dec 12, 2023

View reviewed changes

michelleellen mentioned this pull request Dec 13, 2023

Minutes: Public Dev Calls [every Wednesday] gnolang/meetings#37

Open

deelawn reviewed Dec 13, 2023

View reviewed changes

gno.land/pkg/sdk/vm/ante.go Outdated Show resolved Hide resolved

deelawn reviewed Dec 13, 2023

View reviewed changes

gno.land/pkg/sdk/vm/keeper.go Outdated Show resolved Hide resolved

deelawn approved these changes Dec 13, 2023

View reviewed changes

piux2 and others added 2 commits December 13, 2023 10:06

Update gno.land/pkg/sdk/vm/gas.go

25e4fdf

Co-authored-by: Thomas Bruyelle <[email protected]>

Update gno.land/pkg/sdk/vm/ante.go

ce5c061

Co-authored-by: Thomas Bruyelle <[email protected]>

adr-sk mentioned this pull request Dec 20, 2023

META: GnoSwap gnolang/hackerspace#49

Open

6 tasks

moul mentioned this pull request Jan 9, 2024

Determine the gas consumption of opcodes #1281

Open

zivkovicmilos added this to the 🏗4️⃣ test4.gno.land milestone Mar 25, 2024

piux2 mentioned this pull request Mar 26, 2024

msg run command's main package source code are hidden #1843

Open

thehowl approved these changes Mar 26, 2024

View reviewed changes

thehowl mentioned this pull request Mar 26, 2024

gnolang/overflow: optimise package for common architectures #1844

Open

thehowl approved these changes Mar 26, 2024

View reviewed changes

jaekwon reviewed Mar 27, 2024

View reviewed changes

gno.land/pkg/sdk/vm/gas_test.go Outdated Show resolved Hide resolved

jaekwon reviewed Mar 27, 2024

View reviewed changes

gno.land/pkg/sdk/vm/gas_test.go Outdated Show resolved Hide resolved