Update EIP-4788: initial stab at v2 #7456

Merged on Aug 24, 2023 (19 commits). Changes shown are from 2 commits.
207 changes: 149 additions & 58 deletions EIPS/eip-4788.md
@@ -2,19 +2,20 @@
eip: 4788
title: Beacon block root in the EVM
description: Expose beacon chain roots in the EVM
author: Alex Stokes (@ralexstokes), Ansgar Dietrichs (@adietrichs), Danny Ryan (@djrtwo)
author: Alex Stokes (@ralexstokes), Ansgar Dietrichs (@adietrichs), Danny Ryan (@djrtwo), lightclient (@lightclient)
discussions-to: https://ethereum-magicians.org/t/eip-4788-beacon-root-in-evm/8281
status: Draft
type: Standards Track
category: Core
created: 2022-02-10
requires: 1559
---

## Abstract

Commit to the hash tree root of each beacon chain block in the corresponding execution payload header.

Store each of these roots in a stateful precompile.
Store each of these roots in a smart contract.

## Motivation

@@ -25,18 +26,19 @@ restaking constructions, smart contract bridges, MEV mitigations and more.

## Specification

| constants | value | units
|--- |--- |---
| constants | value |
|--- |--- |
| `FORK_TIMESTAMP` | TBD |
| `HISTORY_STORAGE_ADDRESS` | `Bytes20(0xB)` |
| `G_beacon_root` | 4200 | gas
| `HISTORICAL_ROOTS_MODULUS` | 98304 |
| `HISTORICAL_ROOTS_MODULUS` | `98304` |
| `SYSTEM_ADDRESS` | `0xfffffffffffffffffffffffffffffffffffffffe` |
| `BEACON_ROOTS_ADDRESS` | `0x502E02F5d91024A9AF0aB81fbF0a47Eb99a013aE` |

### Background

The high-level idea is that each execution block contains the parent beacon block root. Even in the event of missed slots since the previous block root does not change,
The high-level idea is that each execution block contains the parent beacon block's root. Even in the event of missed slots, since the previous block root does not change,
we only need a constant amount of space to represent this "oracle" in each execution block. To improve the usability of this oracle, a small history of block roots
are stored in a stateful precompile.
is stored in the contract.

To bound the amount of storage this construction consumes, a ring buffer is used that mirrors a block root accumulator on the consensus layer.
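
As a rough illustration of why the ring buffer bounds storage, the following Python sketch (not part of the EIP text) maps timestamps onto buffer indices. `ring_index` is a hypothetical helper name, the modulus comes from the constants table, and the 12-second slot time with timestamps aligned to it is an assumption based on current mainnet parameters.

```python
# Illustrative sketch only; `ring_index` is a hypothetical helper, not an EIP-defined name.
HISTORICAL_ROOTS_MODULUS = 98304  # value from the constants table

# Assumption: 12-second slots with aligned timestamps, as on mainnet today.
SECONDS_PER_SLOT = 12


def ring_index(timestamp: int) -> int:
    """Reduce a block timestamp to its position in the ring buffer."""
    return timestamp % HISTORICAL_ROOTS_MODULUS


# Under the slot-time assumption, at most this many indices are ever in use.
max_live_entries = HISTORICAL_ROOTS_MODULUS // SECONDS_PER_SLOT

# Two timestamps exactly one modulus apart share an index,
# so the newer entry overwrites the older one.
t = 1_700_000_000
collides = ring_index(t) == ring_index(t + HISTORICAL_ROOTS_MODULUS)
```

Because entries are overwritten in place, storage stays constant in size no matter how long the chain runs.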

### Block structure and validity
@@ -77,76 +79,165 @@ When verifying a block, execution clients **MUST** ensure the root value in the

For a genesis block with no existing parent beacon block root the 32 zero bytes are used as a root placeholder.

### EVM changes
#### Beacon roots contract

#### Block processing
The beacon roots contract has two operations: `get` and `set`. The input itself is not used to determine which function to execute; the result of `caller` is used instead. If `caller` is equal to `SYSTEM_ADDRESS`, the operation to perform is `set`. Otherwise, it is `get`.
Contributor

oh, funny. seems okay though

Member

This somewhat seems to imply that this contract has an ABI (which it does not, at least not in the current assembly). I would really like to do the ABI-like approach as in lightclient/4788asm#5 since this will also be very helpful for Solidity users (you can now call `BeaconRootContract(address).get(timest)`).

Member Author

I think it is a bad idea to enshrine Solidity ABI behavior. It is ubiquitous today, but in the future I expect other languages will have different calling conventions. Solidity should directly support this contract like it does for ecrecover.


At the start of processing any execution block where `block.timestamp >= FORK_TIMESTAMP` (i.e. before processing any transactions),
write the parent beacon root provided in the block header into the storage of the contract at `HISTORY_STORAGE_ADDRESS`.
##### `get`

In order to bound the storage used by this precompile, two ring buffers are used: one to track the latest timestamp at a given index in the ring buffer and another to track
the latest root at a given index.
* If `caller` is equal to `SYSTEM_ADDRESS`, the contract must revert.
* Callers provide the `timestamp` they are querying encoded as 32 bytes in big-endian format.
* If the input is not exactly 32 bytes, the contract must revert.
* Given `timestamp`, the contract computes the storage index in which the timestamp is stored by computing the modulo `timestamp % HISTORICAL_ROOTS_MODULUS` and reads the value.
* If the `timestamp` does not match, the contract must revert.
* Finally, the beacon root associated with the timestamp is accessed and returned to the user. It is stored at `timestamp % HISTORICAL_ROOTS_MODULUS + HISTORICAL_ROOTS_MODULUS`.
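
The input encoding and slot arithmetic for `get` can be sketched in Python as follows. This is an illustration under stated assumptions, not the contract itself: `encode_timestamp` and `get_slots` are hypothetical helper names, and the `ValueError` merely stands in for an EVM revert.

```python
# Illustrative sketch; helper names are hypothetical, not defined by the EIP.
HISTORICAL_ROOTS_MODULUS = 98304  # value from the constants table


def encode_timestamp(timestamp: int) -> bytes:
    """Encode a timestamp as the 32-byte big-endian input that `get` expects."""
    return timestamp.to_bytes(32, "big")


def get_slots(calldata: bytes) -> tuple[int, int]:
    """Return the (timestamp_slot, root_slot) pair a `get` call would read."""
    if len(calldata) != 32:
        # Stands in for the revert on malformed input.
        raise ValueError("input must be exactly 32 bytes")
    timestamp = int.from_bytes(calldata, "big")
    timestamp_slot = timestamp % HISTORICAL_ROOTS_MODULUS
    root_slot = timestamp_slot + HISTORICAL_ROOTS_MODULUS
    return timestamp_slot, root_slot
```

For example, `get_slots(encode_timestamp(1_700_000_000))` yields the two storage keys consulted for that timestamp: the first holds the recorded timestamp, the second the root.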

To derive the index `timestamp_index` into the timestamp ring buffer, the timestamp (a 64-bit unsigned integer value) is reduced modulo `HISTORICAL_ROOTS_MODULUS`.
To derive the index `root_index` into the root ring buffer, add `HISTORICAL_ROOTS_MODULUS` to the index into the timestamp ring buffer.
Both resulting 64-bit unsigned integers should be encoded as 32 bytes in big-endian format when writing to the storage.
##### `set`

The timestamp from the header, encoded as 32 bytes in big-endian format, is the value to write behind the `timestamp_index`.
The 32 bytes of the `parent_beacon_block_root` (as provided) are the value to write behind the `root_index`.
* If `caller` is not equal to `SYSTEM_ADDRESS`, the contract must revert.
* Caller provides the parent beacon block root as calldata to the contract.
* Set the storage value at `header.timestamp % HISTORICAL_ROOTS_MODULUS` to be `header.timestamp`
Contributor

I kind of want to make the case for avoiding a system-TX and just setting these storage values at the top of the block. Then the code is just get and we don't have to test/specify how to do system-TXs

weaker held opinion than the deploy method

Member Author

I think if we're going to use a standard contract we should embrace it and allow it to have storing functionality.

Contributor

Wrote this already in discord, posting here too for visibility.

With the spec being 'system call', it means a reference client can just do a simple call. A client implementor can still choose not to, and instead bypass the whole call and update the values directly.

If the eip standardizes 'direct update', however, and omits the system-update path from the contract, then we remove all optionality, and force more special-case code into clients

Therefore I think the 'system call' approach is the best way forward.

Contributor

I think the EIP should mention that "direct update" is a valid/possible client implementation/optimization detail.

Contributor (@g11tech, Aug 7, 2023)

system-tx would be a good generic functionality to have in long term imo

* Set the storage value at `header.timestamp % HISTORICAL_ROOTS_MODULUS + HISTORICAL_ROOTS_MODULUS` to be `calldata[0:32]`
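
To make the caller-based dispatch and the two ring buffers concrete, here is a minimal executable model in Python. It is a sketch under stated assumptions rather than the contract: `BeaconRootsModel`, `Revert`, and the dict-backed storage are hypothetical stand-ins, and unset slots are assumed to read as the zero word, as in the EVM.

```python
# Minimal model of the get/set behavior described above; all class and
# method names are hypothetical and exist only for illustration.
HISTORICAL_ROOTS_MODULUS = 98304
SYSTEM_ADDRESS = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFE
ZERO_WORD = bytes(32)


class Revert(Exception):
    """Stands in for an EVM revert."""


class BeaconRootsModel:
    def __init__(self) -> None:
        self.storage: dict[int, bytes] = {}

    def _sload(self, slot: int) -> bytes:
        # Unset slots read as zero, mirroring EVM storage semantics.
        return self.storage.get(slot, ZERO_WORD)

    def call(self, caller: int, calldata: bytes, block_timestamp: int) -> bytes:
        # The caller, not the input, selects the operation.
        if caller == SYSTEM_ADDRESS:
            self._set(calldata, block_timestamp)
            return b""
        return self._get(calldata)

    def _set(self, calldata: bytes, block_timestamp: int) -> None:
        idx = block_timestamp % HISTORICAL_ROOTS_MODULUS
        self.storage[idx] = block_timestamp.to_bytes(32, "big")
        self.storage[idx + HISTORICAL_ROOTS_MODULUS] = calldata[:32]

    def _get(self, calldata: bytes) -> bytes:
        if len(calldata) != 32:
            raise Revert("input must be exactly 32 bytes")
        idx = int.from_bytes(calldata, "big") % HISTORICAL_ROOTS_MODULUS
        if self._sload(idx) != calldata:
            raise Revert("timestamp not present in the buffer")
        return self._sload(idx + HISTORICAL_ROOTS_MODULUS)


contract = BeaconRootsModel()
root = bytes.fromhex("11" * 32)
contract.call(SYSTEM_ADDRESS, root, block_timestamp=1_700_000_000)
queried = contract.call(0xABC, (1_700_000_000).to_bytes(32, "big"), block_timestamp=1_700_000_012)
```

Storing the timestamp next to the root is what lets `get` detect that a slot has since been overwritten by a later block with the same index.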

In Python pseudocode:
##### Pseudocode

```python
timestamp_reduced = block_header.timestamp % HISTORICAL_ROOTS_MODULUS
timestamp_extended = timestamp_reduced + HISTORICAL_ROOTS_MODULUS
timestamp_index = to_uint256_be(timestamp_reduced)
root_index = to_uint256_be(timestamp_extended)
if evm.caller == SYSTEM_ADDRESS:
    set()
else:
    get()

timestamp_as_uint256 = to_uint256_be(block_header.timestamp)
parent_beacon_block_root = block_header.parent_beacon_block_root
def get():
    if len(evm.calldata) != 32:
        evm.revert()

sstore(HISTORY_STORAGE_ADDRESS, timestamp_index, timestamp_as_uint256)
sstore(HISTORY_STORAGE_ADDRESS, root_index, parent_beacon_block_root)
```
    timestamp_idx = to_uint256_be(evm.calldata) % HISTORICAL_ROOTS_MODULUS
    timestamp = storage.get(timestamp_idx)

#### New stateful precompile
    if timestamp != evm.calldata:
        evm.revert()

Beginning at the execution timestamp `FORK_TIMESTAMP`, a "stateful" precompile is deployed at `HISTORY_STORAGE_ADDRESS`.
    root_idx = timestamp_idx + HISTORICAL_ROOTS_MODULUS
    root = storage.get(root_idx)

    evm.return(root)

Callers of the precompile should provide the `timestamp` they are querying encoded as 32 bytes in big-endian format.
Clients **MUST** sanitize this input call data to the precompile.
If the input is _more_ than 32 bytes, the precompile only takes the first 32 bytes of the input buffer and ignores the rest.
If the input is _less_ than 32 bytes, the precompile should revert.
def set():
    timestamp_idx = to_uint256_be(evm.timestamp) % HISTORICAL_ROOTS_MODULUS
    root_idx = timestamp_idx + HISTORICAL_ROOTS_MODULUS

Given this input, the precompile reduces the `timestamp` in the same way during the write routine and first checks if
the `timestamp` recorded in the ring buffer matches the one supplied by the caller.
    storage.set(timestamp_idx, evm.timestamp)
    storage.set(root_idx, evm.calldata)
```

If the `timestamp` **does NOT** match, the client **MUST** return the "zero" word -- the 32-byte value where each byte is `0x00`.
##### Bytecode

The exact initcode to deploy is shared below.

```asm
push1 0x5a
dup1
push1 0x09
push0
codecopy
push0
return

caller
push20 0xfffffffffffffffffffffffffffffffffffffffe
eq
push1 0x42
jumpi

push1 0x20
calldatasize
eq
push1 0x24
jumpi

push0
push0
revert

jumpdest
push3 0x018000
push0
calldataload
mod
dup1
sload
push0
calldataload
eq
iszero
push1 0x3d
jumpi

push3 0x018000
add
sload
push0
mstore

jumpdest
push1 0x20
push0
return

jumpdest
timestamp
push3 0x018000
timestamp
mod
sstore
push0
calldataload
push3 0x018000
timestamp
mod
push3 0x018000
add
sstore
stop
```

If the `timestamp` **does** match, the client **MUST** read the root from the contract storage and return those 32 bytes in the caller's return buffer.
#### Deployment

The beacon roots contract is deployed like any other smart contract. A special synthetic address is generated
by working backwards from the desired deployment transaction:

```json
{
"type": "0x2",
"chainId": "0x1",
"nonce": "0x0",
"to": null,
"gas": "0xd4f8",
"gasPrice": null,
"maxPriorityFeePerGas": "0x9c7652400",
"maxFeePerGas": "0xe8d4a51000",
Member

where do these gas values come from?

to echo what others have said, it seems fragile to enshrine a "synthetic transaction"

easier to just drop byte code a la EIP-1011

Member Author

Just arbitrary values that would be included on mainnet. It isn't really enshrining the tx. I would think more of it as two steps: 1) deploy contract 2) set address in clients to call as part of pre block processing. A synthetic tx is just one way of deploying the contract. I think it is best because it is clear, transparent, and permissionless. We could just as easily have someone hand deploy the contract before the fork (now even, if the byte code is ready) and then set the address in client configurations.

"value": "0x0",
"input": "0x605a8060095f395ff33373fffffffffffffffffffffffffffffffffffffffe14604257602036146024575f5ffd5b620180005f350680545f351415603d576201800001545f525b60205ff35b42620180004206555f3562018000420662018000015500",
Contributor

I might put many of these values behind a VAR -- e.g. input and the stubbed r, s

"accessList": [],
"v": "0x0",
"r": "0x539",
Contributor

I don't love this method with a fake signature and stuff...

Can we not just drop bytecode X at address Y at the fork?
e.g. like 1011 proposed -- https://eips.ethereum.org/EIPS/eip-1011#deploying-casper-contract

I think it's more straightforward to just place it exactly where we want it, rather than trying to use a TX of sorts but without gas or signature verification which requires a lot more exceptional logic

Contributor

Although this style of contract creation is not tied to any specific initcode like create2 is, the synthetic address is cryptographically bound to the input data of the transaction (e.g. the initcode).

and then we don't have to think about things like this.

Member Author

rather than trying to use a TX of sorts but without gas or signature verification which requires a lot more exceptional logic

To be clear, this tx does still abide by gas and signature verification. The signature is constructed arbitrarily and therefore the sender is a one-time sender address (pk is not known).

--

In an effort to minimize protocol baggage, I think it is better to deploy the contract using standard methods. Once we are satisfied with the bytecode we'd like to deploy, it could be deployed by anyone. We simply have to agree on the address at which it is deployed. It doesn't have to be a synthetic tx, although I do slightly prefer this method as it is permissionless (anyone can fund the account and submit the tx).

"s": "0x1337",
"hash": "0x8ecfe5753922d27aa737597d946f638a15b7e3b5f74fef9ef8cf1b510a1af1cc"
}
```

In pseudocode:
The sender of the transaction can be calculated as `0x01d0610058aC7AEF1887d8877ee7f04B7645Dc95`. The address of the first contract deployed from the account is the last 20 bytes of `keccak256(rlp([sender, 0]))`, which equals `0x502E02F5d91024A9AF0aB81fbF0a47Eb99a013aE`. Although this style of contract creation is not tied to any specific initcode like create2 is, the synthetic address is cryptographically bound to the input data of the transaction (e.g. the initcode).

```python
timestamp = evm.calldata[:32]
if len(timestamp) != 32:
    evm.revert()
    return
### Block processing

timestamp_reduced = to_uint64_be(timestamp) % HISTORICAL_ROOTS_MODULUS
timestamp_index = to_uint256_be(timestamp_reduced)
At the start of processing any execution block where `block.timestamp >= FORK_TIMESTAMP` (i.e. before processing any transactions), call `BEACON_ROOTS_ADDRESS` as `SYSTEM_ADDRESS` with the 32-byte input of `header.parent_beacon_block_root`. This will trigger the `set()` routine of the beacon roots contract. This is a system operation and therefore:

recorded_timestamp = sload(HISTORY_STORAGE_ADDRESS, timestamp_index)
if recorded_timestamp != timestamp:
    evm.returndata[:32].set(uint256(0))
else:
    timestamp_extended = timestamp_reduced + HISTORICAL_ROOTS_MODULUS
    root_index = to_uint256_be(timestamp_extended)
    root = sload(HISTORY_STORAGE_ADDRESS, root_index)
    evm.returndata[:32].set(root)
```
* the call must execute to completion, therefore the available gas can be considered as infinite
Contributor

Why?
I would rather specify some hard limit, like 30M gas, as suggested by @yperbasis. I suspect most actual implementations will need to set a limit anyway.

* the call does not count against the block's gas limit
* the call does not follow the [EIP-1559](./eip-1559.md) burn semantics - no value should be transferred as part of the call
* if no code exists at `BEACON_ROOTS_ADDRESS`, the call must fail silently
Contributor

Assuming the prior section specifies when to deploy the bytecode, this bullet will become unnecessary

Member Author

As mentioned in other thread, I don't think it is important to specify the deployment. But I suppose the bullet is unnecessary because that is the semantics of an evm call anyways.

Contributor

This bullet is unnecessary, other than as clarification: any non-value call to account with non-existing code has no discernible effect on state.
If you prefix the bullet with Note: or Clarification:, it becomes apparent that this bullet does not intend to add any behavioural changes.


@holiman But shouldn't it then rather say:

  • if no code exists at BEACON_ROOTS_ADDRESS, the call must succeed as with any other account without code

I find "fail" a bit misleading here.


I agree: fail or succeed depends on your point of view. Whole bullet is moot and should not exist


The precompile costs `G_beacon_root` gas to reflect the two (2) implicit `SLOAD`s from the precompile's state.
Clients may decide to omit an explicit EVM call and directly set the storage values.
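
The equivalence between the system call and a direct storage update can be sketched in Python as below. This is an illustration only: `Header`, `contract_set`, `pre_block_system_call`, and `pre_block_direct_update` are hypothetical names, the dict stands in for the contract's storage, and the `FORK_TIMESTAMP` value is a placeholder because the spec still lists it as TBD.

```python
# Illustrative sketch of pre-block processing; all names are hypothetical.
from dataclasses import dataclass

HISTORICAL_ROOTS_MODULUS = 98304
FORK_TIMESTAMP = 1_700_000_000  # placeholder assumption; TBD in the spec


@dataclass
class Header:
    timestamp: int
    parent_beacon_block_root: bytes


def contract_set(storage: dict, block_timestamp: int, calldata: bytes) -> None:
    """What the contract's `set` routine does when called by SYSTEM_ADDRESS."""
    idx = block_timestamp % HISTORICAL_ROOTS_MODULUS
    storage[idx] = block_timestamp.to_bytes(32, "big")
    storage[idx + HISTORICAL_ROOTS_MODULUS] = calldata[:32]


def pre_block_system_call(storage: dict, header: Header) -> None:
    """Reference path: invoke the contract before any transaction runs."""
    if header.timestamp >= FORK_TIMESTAMP:
        contract_set(storage, header.timestamp, header.parent_beacon_block_root)


def pre_block_direct_update(storage: dict, header: Header) -> None:
    """Optimized path: write the same two slots without an EVM call."""
    if header.timestamp >= FORK_TIMESTAMP:
        idx = header.timestamp % HISTORICAL_ROOTS_MODULUS
        storage[idx] = header.timestamp.to_bytes(32, "big")
        storage[idx + HISTORICAL_ROOTS_MODULUS] = header.parent_beacon_block_root
```

The two paths agree only while the deployed code matches the expected bytecode, which is the caveat raised in the review discussion about non-mainnet scenarios.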

I think that we have to decide whether we want to do the EVM call or to directly set the storage values. I understand that when we set the value directly and the account is empty we do remove the account at the end of the block, but there is the edge case where someone puts some eth into the account, which means it is not empty anymore.
I would prefer to set the values directly, as that is 10x faster and we do not require all clients to implement the system operation.

Contributor (@holiman, Aug 11, 2023)

I think that we have to decide whether we want to do the EVM call

The consensus-definition is that an EVM call is made. A client implementor may choose to optimize this, and do a direct update instead. This choice may cause incompatibilities in non-mainnet scenarios, e.g. if the code is replaced with different code.

I understand that when we set the value directly and the account is empty we do remove the account at the end of the block, but there is the edge case where someone puts some eth into the account, which means it is not empty anymore.

No, there is now code on the account, so it would not be removed, since it is not empty, according to the definition of empty.

I would prefer to set the values directly, as that is 10x faster and we do not require all clients to implement the system operation.

You are at liberty to do so, since "we do not require all clients to implement the system operation" -- however, the consensus-correct behaviour, in any given scenario, is determined by how the system-call would execute

IMO this is a pretty good middle ground.


## Rationale
