Validated EVM Contracts #2348
---
eip: 2348
title: Validated EVM Contracts
author: Danno Ferrin (@shemnon)
discussions-to: https://ethereum-magicians.org/t/eip-2348-validated-evm-contracts/3756
status: Draft
type: Standards Track
category: Core
created: 2019-11-01
requires: 1702, 2327
---

## Simple Summary

Make minor changes to EVM contract layout and add validation rules to a subset of those contracts.

## Abstract

A set of contract markers and validation rules relating to those markers is proposed. These
validation rules enable forward-compatible evolution of EVM contracts and provide some assurances
to Ethereum clients, allowing them to disable some runtime verification steps by moving these
validations to the deployment phase.

## Motivation

There are two major motivations and one minor one. The first is the need to make the EVM easier to
evolve, and the second is to provide validations that allow clients to optimize their EVM
execution.

First, there is the issue of an evolvable EVM. With the current state of EVM contracts, literally
any sequence of bytes can be deployed to the blockchain. Some tools take advantage of this
situation and add metadata to the end of their contract deployment. The real impact is that this
precludes the addition of new multi-byte instructions (such as the `PUSHn` series) because the new
instructions could hide a previously valid `JUMPDEST` when evaluated as a new opcode set. To
prevent this, account versioning will be used so that contracts can be deployed in a way that is
demonstrably validated.

Second, there is the issue of improving runtime execution. One example is `JUMPDEST` evaluation.
Because each jump must "land" on a `JUMPDEST`, each client needs to validate that the destination
is a valid opcode location. Clients either need to do the analysis and store the results, or
re-evaluate the contract on each execution. Stronger deployment validation will allow clients to
presume jumps are valid in certain circumstances.

The minor, third motivation is to prepare the way for easily JIT-compilable contracts. While the
current EVM can be JIT compiled, certain analyses need to be performed to prevent some
pathological or uncompilable cases from being compiled, or to accommodate them. With stricter
rules, these cases can be detected and rejected at deploy time, allowing EVM clients to make better
assumptions about the contract being compiled.

## Specification

There are three interlocking portions specified in this EIP and two portions from other active EIPs
included in this validation. [EIP-1702] (Generalized Account Versioning Scheme) and [EIP-2327]
(`BEGINDATA` opcode) are specified in their published locations. The portions specified in this EIP
are a versioning header (similar to what was in [EIP-1707]), invalid opcode validation (similar to
[EIP-1712]), and static jump analysis.

### EVM Account Versioning

Starting at `BLOCKNUM` (TBD), [EIP-1702] will be activated, `LATEST_VERSION` will be set to `1`,
and all new and updated accounts will have the account version `1`. The validation phase will apply
the rules described in the Version Header, `BEGINDATA`, Invalid Opcode Validation, and Static Jump
Validations sections.

These sections apply to contracts stored, or in the process of being stored, in accounts with
version `1`. This EIP never applies to contracts stored, or in the process of being stored, in
accounts at version `0`. For initcode being executed for `CREATE` and `CREATE2` operations, this
EIP applies if the contract invoking the opcode is version `1`. If the calling contract was stored
in an account with version `0`, this EIP does not apply.

Future EIPs may increase the set of contract versions this EIP applies to.
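
The applicability rules in this section reduce to a single predicate. The following is a minimal
sketch, not part of the specification: `validation_applies`, its parameters, and
`VALIDATED_VERSIONS` are hypothetical names, and only the two cases described above (code stored in
an account, and initcode run by `CREATE`/`CREATE2`) are modeled.

```python
from typing import Optional

# Account versions this EIP applies to (future EIPs may extend this set).
VALIDATED_VERSIONS = frozenset({1})


def validation_applies(account_version: Optional[int] = None,
                       create_caller_version: Optional[int] = None) -> bool:
    """Hypothetical helper: does EIP-2348 validation apply to this code?

    - Code stored (or being stored) in an account: pass `account_version`.
    - Initcode executed by CREATE/CREATE2: pass the version of the contract
      invoking the opcode as `create_caller_version`.
    """
    if create_caller_version is not None:
        # Initcode is judged by the version of the contract that invoked CREATE/CREATE2.
        return create_caller_version in VALIDATED_VERSIONS
    if account_version is not None:
        return account_version in VALIDATED_VERSIONS
    raise ValueError("pass either account_version or create_caller_version")
```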

### Version Header

For contracts whose first byte is not `0xef`, or whose total length is less than 4 bytes, the
contract is treated exactly as though it had been deployed to an account with version `0`. For
these contracts none of the other subsections in this EIP apply.

When deploying a contract, if the contract starts with `0xef` and has a length of 4 bytes or
greater, the first four bytes form a version header. If a version header is not recognized by the
EVM, the contract deployment transaction fails with out-of-gas.

When executing a contract with a header, execution should start at `PC=4`, corresponding to the
first byte of the contract that is not part of the header.

EVM implementations could model this as a 4-byte, zero-gas no-op that can only occur at the zeroth
index of a contract. However, they would need to take care that the byte `0xef` is invalid if it
occurs in the code segment at any location other than the zeroth byte.

For this EIP the header byte sequence [`0xef`, `0x65`, `0x76`, `0x6d`] (corresponding to the
ISO/IEC 8859-1 string `'ïevm'`) is specified. This version indicates that the validations in the
following sections are applied to the content of the contract, keeping all other semantics of the
current "version 0" EVM contracts, including the same gas schedule.

Future EIPs may expand on the valid set of headers. No other header sequences are defined in this
EIP.
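
A minimal sketch of how a client might classify a contract by its first four bytes and choose the
starting program counter. The names `classify_header`, `initial_pc`, and `HeaderKind` are
hypothetical; the byte values and the three outcomes come from the rules above.

```python
from enum import Enum

# Header defined by this EIP: 0xef followed by ASCII "evm" ("ïevm" in ISO/IEC 8859-1).
EVM1_HEADER = bytes([0xEF, 0x65, 0x76, 0x6D])
HEADER_LENGTH = 4


class HeaderKind(Enum):
    LEGACY = "legacy"              # treated exactly as version 0 code
    RECOGNIZED = "recognized"      # validated code; execution starts at PC=4
    UNRECOGNIZED = "unrecognized"  # deployment fails with out-of-gas


def classify_header(code: bytes) -> HeaderKind:
    """Classify a contract by its first four bytes."""
    if len(code) < HEADER_LENGTH or code[0] != 0xEF:
        return HeaderKind.LEGACY
    if code[:HEADER_LENGTH] == EVM1_HEADER:
        return HeaderKind.RECOGNIZED
    return HeaderKind.UNRECOGNIZED


def initial_pc(code: bytes) -> int:
    """Program counter at which execution begins (skips a recognized header)."""
    return HEADER_LENGTH if classify_header(code) is HeaderKind.RECOGNIZED else 0
```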

### `BEGINDATA` operation

As described in [EIP-2327], a new opcode `BEGINDATA` (`0xb6`) is added that indicates the remainder
of the contract should not be considered executable code.

If the EVM attempts to execute the `BEGINDATA` operation, it should be treated as attempting to
execute an invalid operation. Similarly, jumping to any location after the `BEGINDATA` operation is
an invalid operation, even if the byte jumped to corresponds to the `JUMPDEST` opcode.

### Code Segment Size Limit

With the introduction of the `BEGINDATA` opcode, the contract can now be conceptually split into a
code segment and a data segment. The code segment corresponds to all the bytes prior to and
including the `BEGINDATA` opcode, or the entire contract if no `BEGINDATA` opcode is present. All
bytes after the code segment are referred to as the data segment. If there is no `BEGINDATA`
operation, there are no bytes in the data segment.
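
The split can be sketched as a linear scan. This sketch assumes the contract already begins with the
recognized 4-byte header (so scanning starts at offset 4), and it reads "`BEGINDATA` opcode" as an
instruction position: a `0xb6` byte that is immediate data of a `PUSHn` does not start the data
segment. The function names are hypothetical.

```python
from typing import Tuple

BEGINDATA = 0xB6
PUSH1, PUSH32 = 0x60, 0x7F
HEADER_LENGTH = 4


def immediate_size(opcode: int) -> int:
    """Immediate bytes after an opcode; only PUSHn has any as of Istanbul."""
    return opcode - PUSH1 + 1 if PUSH1 <= opcode <= PUSH32 else 0


def split_segments(contract: bytes) -> Tuple[bytes, bytes]:
    """Split a headered contract into (code segment, data segment).

    The code segment keeps the header bytes and the BEGINDATA byte itself.
    """
    pc = HEADER_LENGTH
    while pc < len(contract):
        op = contract[pc]
        if op == BEGINDATA:
            return contract[: pc + 1], contract[pc + 1 :]
        pc += 1 + immediate_size(op)  # PUSHn data is skipped, never treated as an opcode
    return contract, b""  # no BEGINDATA: the whole contract is code
```

For example, with the header followed by `PUSH1 0xb6`, `STOP`, `BEGINDATA`, `0xde`, `0xad`, the
first `0xb6` is push data, so only `0xde 0xad` ends up in the data segment.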

In [EIP-170](https://eips.ethereum.org/EIPS/eip-170) a contract code size limit was introduced. The
size of the code segment, including the header bytes and the `BEGINDATA` operation (if present),
must be equal to or less than the chain's specified contract code size limit, which is currently
24 KiB (24,576 bytes) for mainnet.

For the code deployed by contract creation transactions and by `CREATE` and `CREATE2` operations,
this limit is already enforced on the entire size of the contract, including both the code and data
segments. For the initialization code of a `CREATE` or `CREATE2` operation there is no specified
limit, so the code segment length will need to be enforced separately in those instances. The
combined code and data segment size for init code in `CREATE` and `CREATE2` operations is out of
scope for this EIP.

### Invalid Opcode Validation

All data between the Version Header and either the `BEGINDATA` marker or, if `BEGINDATA` is not
present, the end of the contract must represent a valid EVM program at all points of the data.
Invalid opcode validation consists of the following process:

- Iterate over the code bytes one by one, starting after the header bytes.
- If the code byte is a multi-byte operation, skip the appropriate number of bytes and continue.
- If the code byte is a valid opcode or the designated invalid instruction (`0xfe`), continue.
- If the code byte is the `BEGINDATA` operation (`0xb6`), stop iterating and consider the contract
  valid.
- If more bytes than the contract code size limit would be validated, the contract is invalid and
  the operation fails.
- Otherwise, the contract is invalid and the operation fails.

As of the Istanbul upgrade, the only multi-byte operations are the `PUSHn` series of operations,
from `0x60` to `0x7f`. Future upgrades may add more multi-byte operations.

As of the Istanbul upgrade, the invalid opcodes are `0x0c` to `0x0f`, `0x1e`, `0x1f`, `0x21` to
`0x2f`, `0x48` to `0x4f`, `0x5c` to `0x5f`, `0xa5` to `0xef`, `0xf6` to `0xf9`, `0xfb`, `0xfc`,
and `0xfe`. Future upgrades may remove items from this list. Note that `0xb6` is referenced in this
spec as the `BEGINDATA` marker, but is not part of any deployed upgrade. Also note that `0xfe`
would remain as a reserved 'invalid instruction' that will still be permitted.
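
The procedure above can be sketched in Python as follows. This is an illustrative sketch, not a
reference implementation: the valid-opcode table is built from the opcodes assigned as of Istanbul
(including `CHAINID` at `0x46` and `SELFBALANCE` at `0x47`) plus the permitted `0xfe`, the size
limit defaults to the EIP-170 mainnet value, and the helper names are hypothetical. The text does
not say what happens when a trailing `PUSHn` runs past the end of the contract; this sketch simply
accepts it.

```python
BEGINDATA = 0xB6
PUSH1, PUSH32 = 0x60, 0x7F
HEADER_LENGTH = 4
CODE_SIZE_LIMIT = 0x6000  # 24,576 bytes, the EIP-170 mainnet limit


def _span(first: int, last: int) -> range:
    return range(first, last + 1)


# Opcodes assigned as of Istanbul, plus the designated INVALID instruction 0xfe.
VALID_OPCODES = frozenset([
    *_span(0x00, 0x0B),  # STOP .. SIGNEXTEND
    *_span(0x10, 0x1D),  # LT .. SAR
    0x20,                # SHA3
    *_span(0x30, 0x3F),  # ADDRESS .. EXTCODEHASH
    *_span(0x40, 0x47),  # BLOCKHASH .. SELFBALANCE
    *_span(0x50, 0x5B),  # POP .. JUMPDEST
    *_span(0x60, 0x7F),  # PUSH1 .. PUSH32
    *_span(0x80, 0x8F),  # DUP1 .. DUP16
    *_span(0x90, 0x9F),  # SWAP1 .. SWAP16
    *_span(0xA0, 0xA4),  # LOG0 .. LOG4
    *_span(0xF0, 0xF5),  # CREATE, CALL, CALLCODE, RETURN, DELEGATECALL, CREATE2
    0xFA, 0xFD, 0xFF,    # STATICCALL, REVERT, SELFDESTRUCT
    0xFE,                # designated INVALID instruction (explicitly permitted)
])


def immediate_size(opcode: int) -> int:
    """Immediate bytes after an opcode; only PUSHn has any as of Istanbul."""
    return opcode - PUSH1 + 1 if PUSH1 <= opcode <= PUSH32 else 0


def validate_code(contract: bytes, size_limit: int = CODE_SIZE_LIMIT) -> bool:
    """Invalid opcode validation for a contract that starts with a recognized header."""
    pc = HEADER_LENGTH  # start right after the 4 header bytes
    while pc < len(contract):
        if pc >= size_limit:
            return False  # the code segment would exceed the size limit
        op = contract[pc]
        if op == BEGINDATA:
            return True   # the rest is data and is not validated
        if op not in VALID_OPCODES:
            return False  # unassigned opcode (including a stray 0xef) in the code segment
        pc += 1 + immediate_size(op)  # skip over PUSHn immediate data
    # End of contract reached without BEGINDATA, so the whole contract is code.
    # A PUSHn truncated by the end of the contract is accepted (an assumption).
    return len(contract) <= size_limit
```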

### Static Jump Validations

For every jump operation preceded by a `PUSHn` instruction, the value pushed onto the stack by the
`PUSHn` operation must point to a valid `JUMPDEST` operation. If this validation fails, the
contract creation fails with out-of-gas.

As of the Istanbul upgrade, the jump operations are `JUMP` (`0x56`) and `JUMPI` (`0x57`). Future
upgrades may add more jump operations.

As a client optimization, this check may be performed during invalid opcode validation, or it may be
performed separately at contract deployment time.
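
A sketch of the static jump check, performed in a single pass over the code segment. Assumptions not
stated in the text: jump targets are absolute offsets from the first byte of the contract (so the
header occupies offsets 0 to 3, consistent with execution starting at `PC=4`), only a `PUSHn`
immediately before `JUMP`/`JUMPI` makes a jump static, and all other jumps are left to runtime
checks. The function name is hypothetical.

```python
BEGINDATA = 0xB6
JUMPDEST = 0x5B
JUMP_OPS = frozenset({0x56, 0x57})  # JUMP, JUMPI (as of Istanbul)
PUSH1, PUSH32 = 0x60, 0x7F
HEADER_LENGTH = 4


def static_jumps_valid(contract: bytes) -> bool:
    """Check that every PUSHn-then-JUMP/JUMPI pair targets a JUMPDEST in the code segment."""
    jumpdests = set()
    static_targets = []
    last_push_value = None  # value pushed by the instruction just before the current one
    pc = HEADER_LENGTH
    while pc < len(contract):
        op = contract[pc]
        if op == BEGINDATA:
            break  # bytes after BEGINDATA are never valid jump destinations
        if op == JUMPDEST:
            jumpdests.add(pc)
        if op in JUMP_OPS and last_push_value is not None:
            static_targets.append(last_push_value)
        if PUSH1 <= op <= PUSH32:
            n = op - PUSH1 + 1
            # Immediate data is big-endian; it is skipped, so a 0x5b inside it is not a JUMPDEST.
            last_push_value = int.from_bytes(contract[pc + 1 : pc + 1 + n], "big")
            pc += 1 + n
        else:
            last_push_value = None
            pc += 1
    return all(target in jumpdests for target in static_targets)
```

Collecting targets first and checking them at the end handles forward jumps, whose `JUMPDEST` has not
been seen yet when the jump is encountered.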

## Rationale

The choice of `0xef` as the first byte of the header was first recommended in
[issue 154](https://github.com/ethereum/EIPs/issues/154) of the EIP repository. It also maps to an
unused opcode in the version 0 spec and packs next to the `0xf0` series of create and call
instructions, and the `evm` part mirrors what WASM has done. `0x00` was not chosen as the first
byte because the header could then be confused with a nonsensical but correct contract: one that
starts with `STOP` followed by `PUSH6` if lowercase letters were used, or `STOP` `GASLIMIT` `JUMP`
`<invalid 0x4d>` if capital letters were used. A header that was always invalid in the prior EVM
specs was seen as desirable.
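
The decodings mentioned above can be checked with a toy disassembler. Only the handful of opcodes
needed here are named; `disassemble` is a hypothetical helper, not a general-purpose tool.

```python
PUSH1, PUSH32 = 0x60, 0x7F
# Only the opcodes needed for this demonstration (Istanbul meanings).
NAMES = {0x00: "STOP", 0x45: "GASLIMIT", 0x56: "JUMP",
         0x4D: "<invalid 0x4d>", 0xEF: "<invalid 0xef>"}


def disassemble(code: bytes) -> list:
    ops, pc = [], 0
    while pc < len(code):
        op = code[pc]
        if PUSH1 <= op <= PUSH32:
            n = op - PUSH1 + 1
            # Immediate bytes past the end of the code are simply absent here.
            ops.append(f"PUSH{n} 0x{code[pc + 1 : pc + 1 + n].hex()}")
            pc += 1 + n
        else:
            ops.append(NAMES.get(op, f"0x{op:02x}"))
            pc += 1
    return ops


print(disassemble(b"\x00evm"))  # ['STOP', 'PUSH6 0x766d']
print(disassemble(b"\x00EVM"))  # ['STOP', 'GASLIMIT', 'JUMP', '<invalid 0x4d>']
print(disassemble(b"\xefevm"))  # ['<invalid 0xef>', 'PUSH6 0x766d']
```

Lowercase `e` is `0x65`, which is `PUSH6` (`0x60` is `PUSH1`), while the chosen `0xef` first byte
makes any version 0 interpretation fail on the very first instruction.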

The first major validation is invalid opcode validation. If a contract contains an invalid opcode
that a later upgrade turns into a multi-byte opcode, and that opcode is followed by a `JUMPDEST`
marker, the contract would become invalid after the upgrade because the destination marker would
become part of the new multi-byte instruction, as described in the [EIP-663 discussion]. If no
invalid opcodes can be deployed, then the possibility of a `JUMPDEST` being absorbed by new
multi-byte instructions is eliminated.
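
To illustrate the hazard, the sketch below computes `JUMPDEST` positions for the same bytes under two
opcode tables: the Istanbul table, and a hypothetical future table in which the currently unassigned
`0xb3` takes a one-byte immediate. The future table and the helper name are assumptions for
illustration only.

```python
JUMPDEST = 0x5B


def jumpdest_positions(code: bytes, immediate_sizes: dict) -> set:
    """Offsets of JUMPDEST opcodes, skipping the immediate data of multi-byte opcodes."""
    positions, pc = set(), 0
    while pc < len(code):
        op = code[pc]
        if op == JUMPDEST:
            positions.add(pc)
        pc += 1 + immediate_sizes.get(op, 0)
    return positions


# Istanbul: only PUSH1..PUSH32 carry immediates (1..32 bytes).
ISTANBUL = {op: op - 0x5F for op in range(0x60, 0x80)}
# Hypothetical future fork in which 0xb3 takes a one-byte immediate.
FUTURE = {**ISTANBUL, 0xB3: 1}

code = bytes([0xB3, 0x5B, 0x00])  # <invalid 0xb3>, JUMPDEST, STOP
print(jumpdest_positions(code, ISTANBUL))  # {1}
print(jumpdest_positions(code, FUTURE))    # set()
```

Rejecting `0xb3` at deployment time means a contract like this can never exist, so the change of
table cannot silently remove a jump destination.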

One complication is that current versions of Solidity, in some instances, append the Swarm hash of
the contract's metadata (which references its source code) to the end of the generated EVM bytecode.
That is what motivated the addition of the `BEGINDATA` opcode. Solidity can add a fairly simple
wrapper function to its existing EVM generation. This option was chosen for its simplicity over
other options, such as encoding the data in uncalled `PUSHn` instructions.

`JUMPDEST` validation is present to eliminate repeated validation calls for contracts and to reduce
the data storage required for cached validation. For example, if a client notices that a contract
contains only static jumps, it could store a cached validation flag indicating that no jump
analysis needs to be performed; alternatively, it could defer the analysis until the first dynamic
jump is encountered.
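
A client could compute such a flag once per contract. The sketch below assumes a headered contract
and treats a jump as static only when the instruction immediately before it is a `PUSHn`; the
function name is hypothetical, and how the flag is cached (for example, keyed by code hash) is left
to the client.

```python
BEGINDATA = 0xB6
JUMP_OPS = frozenset({0x56, 0x57})  # JUMP, JUMPI (as of Istanbul)
PUSH1, PUSH32 = 0x60, 0x7F
HEADER_LENGTH = 4


def has_only_static_jumps(contract: bytes) -> bool:
    """True if every JUMP/JUMPI in the code segment directly follows a PUSHn."""
    prev_was_push = False
    pc = HEADER_LENGTH
    while pc < len(contract):
        op = contract[pc]
        if op == BEGINDATA:
            break  # bytes after BEGINDATA are data, not instructions
        if op in JUMP_OPS and not prev_was_push:
            return False  # dynamic jump: its target is only known at runtime
        prev_was_push = PUSH1 <= op <= PUSH32
        pc += 1 + (op - PUSH1 + 1 if prev_was_push else 0)  # skip PUSHn immediates
    return True
```

When this returns `True` and static jump validation has passed at deployment, no runtime jump
analysis is needed for that contract; otherwise the client can fall back to its usual analysis.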

## Backwards Compatibility

Almost all existing contract deployments will be able to be deployed with no client changes. The
one exception is contract deployments that start with `0xef` and are at least 4 bytes long. This
should have no impact on existing contract execution, because `0xef` does not map to any existing
opcode: any contract with `0xef` in the first position would immediately fail with an invalid
operation, so the utility and value of those contracts is minimal at best.

Except for the validation rules and versioning header, all other semantics of the EVM are the same.
Gas schedules and opcode tables would be the same between account versions, regardless of whether
the contract was deployed with a header. Future EIPs may add opcodes that are only valid in a
contract that is deployed with a version header. Because of the version header and its validation
rules, contracts using new multi-byte opcodes can be deployed.

Existing compilers (such as Solidity) can provide support for headers by prepending `0xef`, `0x65`,
`0x76`, `0x6d` to their output stream and appending `0xb6` prior to any non-code data added as part
of the contract.
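
A sketch of that post-processing step. `wrap_compiler_output` is a hypothetical helper, not part of
any compiler's API.

```python
EVM1_HEADER = bytes([0xEF, 0x65, 0x76, 0x6D])  # "ïevm" in ISO/IEC 8859-1
BEGINDATA = bytes([0xB6])


def wrap_compiler_output(code: bytes, trailing_data: bytes = b"") -> bytes:
    """Prepend the version header and, if there is non-code data, a BEGINDATA marker."""
    if not trailing_data:
        return EVM1_HEADER + code
    return EVM1_HEADER + code + BEGINDATA + trailing_data
```

Two caveats, under the assumption that jump targets are absolute offsets that include the header:
prepending four bytes shifts every code offset by 4, so the wrapping has to happen before jump
targets (and `CODECOPY` offsets) are finalized rather than as a byte-level pass afterwards. The code
passed in must also end on an instruction boundary; otherwise the appended `0xb6` would be absorbed
as `PUSHn` data instead of acting as the `BEGINDATA` marker.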

## Forwards Compatibility

This spec provides forward compatibility in at least two ways.

First, the sets of multi-byte opcodes and validated jump opcodes can be expanded in future
upgrades. Contracts that are valid under the new rules could be rejected under the old rules, but
all older contracts would still be valid under the new rules. Any newly deployed opcodes would be
disabled unless the code is appropriately validated.

Second, the versioning header can be extended to allow for stricter validations in future upgrades
while keeping the EVM evaluation semantics the same. Such possible stricter validations could
include prohibiting dynamic jumps.

## Test Cases

This is an incomplete list, but it provides insight into the scope of the required testing. Each
test would need to be written three times: once for normal contract deployment, once for `CREATE`,
and once again for `CREATE2`.

- Positive
  - no header and invalid opcodes
    - including the case where a `JUMPDEST` gets consumed by a proposed multi-byte operation
  - no header and all valid opcodes
    - includes static jump to invalid destination
  - header and all valid opcodes
    - includes static jump to valid destination
  - header, all valid opcodes, and `BEGINDATA`
  - header, all valid opcodes, `BEGINDATA`, and invalid opcodes in data
  - three-byte program, starts with zero
  - four-byte program, header only
  - header and `BEGINDATA` only
  - validated code in `CREATE` and `CREATE2` init code with proper code segment size and total size
    greater than the code segment limit
- Negative
  - contract with otherwise valid program that starts with zero, 5 bytes or more
  - contract with header and invalid opcodes
  - contract with header, `BEGINDATA`, and invalid opcodes in the middle
  - contract with header, and static jump to bad place
  - contract with unrecognized header
  - contract with a static jump into code in `BEGINDATA`
  - contract with a static jump outside of all data
  - header, and contract code+header too large by less than 4 bytes
  - header, and contract code+header too large by more than 4 bytes
  - header, contract code, `BEGINDATA`, data, and the whole thing is too large
  - one test for each invalid opcode: no header, with header, and with header and `BEGINDATA`
  - code segment size violations
    - In a contract creation transaction
    - In `CREATE` and `CREATE2` init code
    - In `CREATE` and `CREATE2` created contracts

## Implementation

No implementation yet.

## Copyright

Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).

[eip-615]: https://eips.ethereum.org/EIPS/eip-615
[eip-1702]: https://eips.ethereum.org/EIPS/eip-1702
[eip-1707]: https://github.com/ethereum/EIPs/pull/1707
[eip-1712]: https://github.com/ethereum/EIPs/pull/1712
[eip-2327]: https://github.com/ethereum/EIPs/pull/2327
[eip-663 discussion]: https://ethereum-magicians.org/t/eip-663-unlimited-swap-and-dup-instructions/3346/11?u=shemnon