
Betanet release at March 11 2020 #2247

Merged
merged 41 commits into from
Mar 11, 2020

Conversation

ailisp
Member

@ailisp ailisp commented Mar 11, 2020

This is the version currently running at rpc.devnet.nearprotcol.com

  • applayer tests pass
  • nightly isn't ready, so it won't be tested for this release

k06a and others added 30 commits February 4, 2020 08:22
* Enable floats but prohibit some CPU architectures
* Merge branch 'staging' into enable_floats
* Merge refs/heads/staging into enable_floats
* Avoid overflowing u32 during contract preparation (#1946)
* Update runtime/near-vm-runner/src/runner.rs

Co-Authored-By: Evgeny Kuzyakov <[email protected]>
* Merge branch 'staging' into enable_floats
* Nit

Co-authored-by: Maksym Zavershynskyi <[email protected]>
* Move sysinfo fix from staging
* Introduce keccak256 and keccak512 native support

* modify genesis

Co-authored-by: Bowen Wang <[email protected]>
* Create code-of-conduct.md

* Move code of conduct to proper name

Co-authored-by: Illia Polosukhin <[email protected]>
* Fix #2067: querying contract state

* Change format to {key, value, proof} for state view. Proofs are empty for now
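The new state-view item shape described above can be sketched as follows (a simplified stand-in, not the actual nearcore type; field names follow the `{key, value, proof}` description, with `proof` empty for now):

```rust
// Sketch of the described view-state response item; simplified types.
#[derive(Debug, PartialEq)]
struct StateItem {
    key: Vec<u8>,
    value: Vec<u8>,
    proof: Vec<Vec<u8>>, // empty for now, per the change description
}

fn view_state_item(key: &[u8], value: &[u8]) -> StateItem {
    StateItem {
        key: key.to_vec(),
        value: value.to_vec(),
        proof: Vec::new(),
    }
}
```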
Instead of dumping state in a JSON file, which is slow, this PR changes it to dump state in a binary format. Fixes #2070.

Test plan
---------
Manually test this with a local node to make sure that the state is preserved correctly after state dumping.
* Add json state dump for debugging

* state_dump.json

* fix
Fixes: #2183

# Test plan:
- Extracted the code into utils crate and added unit tests.
- Revert binary state dump
- #1922 for separating config and records
- scripts/new-genesis-from-existing-state.sh that dumps state, calculates the new genesis hash, and uploads it to s3
- testnet genesis records separate from the near binary
- download the testnet genesis from s3 in python start_testnet
- check the genesis hash when running testnet
 
Co-authored-by: Evgeny Kuzyakov <[email protected]>
Co-authored-by: Bo Yao <[email protected]>

Test Plans
-------------
- It can run `near init --chain-id=testnet --genesis-config near/res/testnet_genesis_config.json --genesis-records <records download from s3> --genesis-hash <expected-hash>` to initialize the testnet config from external genesis records
- When running `near run` with an incorrect `~/.near/genesis_hash`, it panics
- It can start testnet with the updated start_testnet.py, which downloads genesis_records from s3
- After stopping a node, we can call `state-viewer dump_genesis` to dump genesis_records, config, and genesis_hash.
- If no state changes happened since the genesis block, loading the dumped genesis_records/config gives the same genesis_hash, and dumping again generates the same genesis_records, config & genesis_hash.
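The last item is the key invariant: dump → load → dump must be a fixed point. A minimal sketch (with a stand-in hash function; the real genesis hash is a cryptographic digest computed differently):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in for the real genesis hash computation; this only
// illustrates the round-trip invariant described above.
fn genesis_hash(records: &str, config: &str) -> u64 {
    let mut h = DefaultHasher::new();
    records.hash(&mut h);
    config.hash(&mut h);
    h.finish()
}

// dump -> load -> dump must be a fixed point when no state changed:
// reloaded records/config produce the same hash, and dumping again
// produces the same records, config, and hash.
fn dump(records: &str, config: &str) -> (String, String, u64) {
    (
        records.to_string(),
        config.to_string(),
        genesis_hash(records, config),
    )
}
```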
Some of our tests don't have actix running. @Kouprin found that when an assert in such tests failed, it wasn't possible to see which test failed, only `thread panicked while panicking`. Checking whether actix is running, and only shutting it down if so, fixes this.

Test Plan
---------
```
#[test]
fn test_assert() {
    init_stop_on_panic();
    assert!(false);
}
```
should report the assertion failure instead of `thread panicked while panicking`.
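A minimal sketch of the described guard, assuming a flag that tracks whether an actix System was started (the names here are hypothetical; the actual nearcore helper differs, and the actix call is elided to keep the sketch dependency-free):

```rust
use std::panic;
use std::sync::atomic::{AtomicBool, Ordering};

// Hypothetical flag; set to true once the actix System has been started.
static ACTIX_RUNNING: AtomicBool = AtomicBool::new(false);

fn init_stop_on_panic() {
    let default_hook = panic::take_hook();
    panic::set_hook(Box::new(move |info| {
        // Always print the real panic message first, so the failing
        // test is visible.
        default_hook(info);
        // Only attempt to stop actix if a System is actually running;
        // stopping a non-existent System panics inside the hook,
        // which is what produced "thread panicked while panicking".
        if ACTIX_RUNNING.load(Ordering::SeqCst) {
            // Stop the actix System here, e.g. actix::System::current().stop();
            // elided so this sketch has no non-std dependencies.
        }
    }));
}
```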
If an invalid chunk made it into a block, the invalid incoming receipts might be forwarded to different shards. Ideally, the chunk should be challenged and the block reverted, but until that happens we need to handle invalid receipts in the Runtime.

Another possibility is to create invalid receipts in the state directly and then create a challenge on this invalid state, so any field in the state can potentially contain an invalid value. Therefore, if an invalid receipt is present in the delayed receipts state, we consider it a StorageError.
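The treatment described above can be sketched like this (hypothetical, heavily simplified types; the real nearcore Runtime and its validation are far richer):

```rust
// Hypothetical, simplified types for illustration only.
#[derive(Debug, PartialEq)]
enum RuntimeError {
    StorageError(String),
}

#[derive(Debug)]
struct Receipt {
    receiver_id: String,
}

// Stand-in validity check.
fn validate_receipt(r: &Receipt) -> bool {
    !r.receiver_id.is_empty()
}

// Receipts read back from the delayed-receipts state are part of the
// merkelized state: if one is invalid, the state itself is corrupt or
// crafted, so surface it as a StorageError instead of panicking.
fn pop_delayed_receipt(r: Receipt) -> Result<Receipt, RuntimeError> {
    if validate_receipt(&r) {
        Ok(r)
    } else {
        Err(RuntimeError::StorageError(
            "invalid delayed receipt in state".to_string(),
        ))
    }
}
```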

Fixes: #1850

# Test plan

- Added 2 tests to cover handling of invalid receipts in the Runtime
- Filed an issue to handle error in NightshadeRuntime #2152
* `block_production.py` started failing because blocks are now
produced faster, and by the first time it checked the heights, they
exceeded 2, causing a poorly stated assert to trigger
* `staking*.py` were never adjusted for yocto-near
* `state_sync*.py` both header sync and state sync were broken
Header sync: we were cleaning up mapping from heights to headers during
GC, which is necessary for header sync.
State sync: we changed the state sync to always happen at the epoch
boundary, but did it in the wrong way: the hash we stored locally was
pre-pushing to the epoch boundary, and thus the node was rejecting any
incoming state responses
* `state_sync1.py` was also failing because it was relying on one out of
two validating nodes being able to produce blocks, which it cannot,
because of Doomslug
* `skip_epoch.py` was expecting one validator to be producing blocks,
which with doomslug requires such validator to have more than half the
stake. Also was not updated for yocto-near
* `one_val.py` similarly was expecting one validator to be producing
blocks, and thus needed that validator's stake bumped to work with DS
* `lightclnt.py` with doomslug, blocks are produced faster, and by the
time the test started, a few blocks were already produced. Made the
test expect this behavior
* `block_sync.py` was failing because Doomslug requires
`max_block_production_delay` to be at least 2x
`min_block_production_delay`.

Also disabling `network_stress`, because the current runner crashes
trying to launch it and skips all the consecutive tests

Test Plan
---------
All the above python tests pass.
Similar to #1985, but now we changed back to logging to stderr

Test Plans
-------------
Ran pytest locally; it no longer has the `AssertionError: node dirs: 0 num_nodes: 4 num_observers: 0` error
…tly failures (#2204)

There were several issues in the test infra:
1. The peer info in the client test infra was the largest height and the
largest score ever observed. If a block with a higher score but lower
height than the previous tip was created, it would report incorrect peer
info, and peers would attempt header sync believing the peer has higher
header head height, and such header sync would fail.
2. The tests tamper with the FG, and the last final block could be way
more than 5 epochs in the past. That makes creating light client blocks
potentially require blocks from 5 epoch lengths ago. I'm just making all
nodes in cross_shard_tx archival. In practice if one epoch has been
lasting for five epoch lengths, we have bigger problems.
3. We historically see cross_shard_tx tests fail with
`InvalidBlockHeight` error when a block is more than epoch length ahead
of the previous block. Since that check is a heuristic anyway, I'm
doubling the distance, to reduce the flakiness of the test.

Separately, increasing the timeouts for NFG tests, they take more than
15 minutes.

Also bumping timeouts for the `test_all_chunks_accepted_1000*` tests,
it's clear that they need at least 2000 / 4000 / 1000 seconds to
complete, I set the timeouts to 3600 / 7200 / 1800 for some extra room.
Also the one that requires 7200 (`*_slow`) seems to provide no value
compared to the base test, and is the slowest test in our entire suite,
so I completely disable it.

Separately, fixing the issue with state sync tests, where the transition
to state sync happens before the log tracker is initialized, and the
later check in the log for the transition fails.

Slightly bumping block production time in
`test_catchup_sanity_blocks_produced`, it works on local machine, but on
the gcloud runner doesn't keep up.

Test plan
---------
All cross_shard_tx* tests passed at least three runs.
If they are flaky, nightly will catch that.
…rds via RPC

Resolves #2007 and #2025

We needed to make Near config query-able through node RPC. Specifically,
one of our clients wanted to know how many blocks remain until an
account is going to be evicted due to rent. This information can be
derived from the account balance and the config.

In this commit, two new RPC endpoints are exposed:
EXPERIMENTAL_genesis_config and EXPERIMENTAL_genesis_records. Learn more
in PR #2109.
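A hypothetical client-side request body for one of these endpoints, framed as standard JSON-RPC 2.0 (the endpoint name comes from this PR; the helper function here is purely illustrative):

```rust
// Build a JSON-RPC 2.0 request body for the new genesis-config
// endpoint. String-formatted by hand to stay dependency-free;
// a real client would use a JSON library.
fn genesis_config_request(id: u32) -> String {
    format!(
        "{{\"jsonrpc\":\"2.0\",\"id\":{},\"method\":\"EXPERIMENTAL_genesis_config\",\"params\":[]}}",
        id
    )
}
```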

# Test plan

Added tests to query the endpoints with a happy path and also invalid
parameters.
Many timeout tweaks and small typo fixes.
`cross_shard_tx_with_validator_rotation` with 150 block time has
extremely long forks (doomslug is disabled), and takes lots of time per
iteration. I split it into two: one with 150ms block time, but only 16
iterations, and one with 400ms block time, but all 64 iterations. Both
locally take ~45 minutes, will see how long it takes on gcloud.
Fixing the following issues:
1. `ThreadNode` was not properly setting its state on kill
2. `ThreadNode` doesn't properly free up its port, but instead of
figuring out why, I just replaced it with `ProcessNode` in the tests
that are affected
3. `test_4_20_kill1` wasn't accounting for fees. Disabling fees.
4. In the same test, the same chunk producer was always mapped to the
same block producer, and thus killing the second node was making the
0-th chunk producer (who happens to be attached to the 2nd BP) to not be
able to have their transactions included. Address it by having 17 seats.
Also split the test in two, with one shard and with two shards
5. In multiple places we were using the wrong node to get the access key

Test plan
---------
Locally `test_4_20_kill1` and `test_*_multiple_nodes` pass, let's see
how the next nightly looks
Disabling old tests that fail due to the runtime cache until they are
fixed.

Changing the `cross_shard_tx_with_validator_rotation` slightly based on
its performance on gcloud.
(increasing the block prod time speeds up test, because it results in
fewer forks, so the 150->200 change is to make more iterations fit.
With 150ms it fits ~6 iterations into one hour)
The actual pipeline definition is saved in the Buildkite UI; this way, it is shared between stable, beta, and master. There'll be some master-only builds (nightly release)

Test Plans
--------------
Buildkite CI should pass
Remove doctest that was running incorrectly.
The test was not updated after some config parameters were renamed.
Also because of #2195, tx status of lost transactions times out, so
adding a workaround into the test for now

Disabling the version of stress that messes up with network, because the
nightly runner is currently not configured to support the utility I use
to stop network between processes

Couple other changes:
1. Made `node_restart` worker not restart the node if no blocks were
produced in the meantime. It is needed because with only two nodes after
a long restart doomslug can take a while to recover (it is equivalent to
half the network shutting down and restarting after some delay).
Block production worker will fail the test if the block production
actually stalls
2. For the same reason increased the tolerated delays for block
production
3. Limited how many txs are sent per iteration of tx worker, since due
to #2195 it takes one second to query one transaction, and if the test
finished in the middle of querying, the allowed one minute for workers
to stop is not sufficient for the tx worker.
4. Also generally increasing the allowance from 1m to 2m at the end,
since the tx worker at the end of the test might take some time before
it even starts querying the transaction outcomes
* Try setting up GitPod

* Try pre-building nearcore in Docker image

* Remove rustup command  (nightly should already be available)

* Update location for nearcore prebuild

* Fix the way cargo is executed in Dockerfile

* Do cargo test as well when building docker image
* Fix fishermen unstake

* fix kickout set

* add test
Equivalent coverage and Docker image release from GitLab CI, except the release is a Docker image release; an S3 release will also be added in the future, since gcloud storage requires a Google login, which is inconvenient for use in scripts

Test Plans
--------------
Coverage only in the master, beta, and stable branches, and Docker release only in the master branch (for test purposes also this cov-release branch, but it will be removed). Badge updated. Docker release in the beta/stable branches will be added but is not ready yet, as tests in beta/stable are more strict
Kouprin and others added 11 commits March 5, 2020 18:36
* Fix validator kickout set

* fix stake change
Some fixes, refactoring and comments.

Test plan
---------
Run existing tests
Add more asserts
Further investigation of `stress.py` failures in nightly shows that all
the workers appear to work as expected, but frequent restarts cause
block production to be delayed more than the current tolerance.
Locally, sporadically more than 50% of transactions also get lost if the
node gets restarted too frequently.
Increasing both tolerances.

Also making the prints unbuffered, otherwise several workers in the
nightly runs don't flush their outputs.

Finally, fixing a typo in the nightly.txt

Test plan
---------
Locally `stress.py` doesn't fail, so there's no easy way to test whether
the new tolerances would be sufficient.
It also doesn't completely fix all the known issues, there are some
other failures in stress.py that I haven't gotten to yet
Add doc test runs in CI

Test Plan
------------
Should see the doc test log at the beginning of the other tests
Fixes #2226 
The current behavior of a view call is to prohibit state changes. This change moves the check from the end of the state viewer to VMLogic by prohibiting state-changing functions during view calls.
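The guard described above can be sketched like this (hypothetical, heavily simplified; real `VMLogic` host functions such as `storage_write` take keys, values, and register arguments):

```rust
// Simplified sketch: state-mutating host functions check a view-call
// flag up front, instead of a post-hoc check in the state viewer.
#[derive(Debug, PartialEq)]
enum HostError {
    ProhibitedInView(String),
}

struct VMLogic {
    is_view: bool,
}

impl VMLogic {
    // Reject a state-mutating method when executing a view call.
    fn forbid_in_view(&self, method: &str) -> Result<(), HostError> {
        if self.is_view {
            return Err(HostError::ProhibitedInView(method.to_string()));
        }
        Ok(())
    }

    fn storage_write(&mut self /* key, value, ... */) -> Result<(), HostError> {
        self.forbid_in_view("storage_write")?;
        // ... perform the actual write ...
        Ok(())
    }
}
```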

## Test plan:
- Fixed state change tests. Previously the test didn't verify the error type; the error was `MethodNotFound`, and the alice account was also wrong.
- Added unit tests for the new prohibited methods.
* Bump Borsh version

* Nit

* Bump Borsh versions

* Bump borsh more
scripts for binary release

Test Plan
------------
Downloaded the uploaded binary in a few popular Linux VMs and verified that it works
Two people running 100 nodes together is often blocked by the firewall-rule limit and the static IP address limit (the recent failure by @mfornet is this case). Making the nodes not reserve IP addresses and instead use organization-wide global firewall rules fixes this.

Test Plans
--------------
The same setting has been proven to work when creating devnet nodes.
Resolves: #2034 and #2048

The changes RPC API was broken implementation-wise (#2048), and design-wise (#2034).

This version has a meaningful API design for all the exposed data, and it is also tested better. This PR is massive since, initially, we missed the point of exposing deserialized internal data (e.g. account info and access keys used to be returned as Borsh-serialized blobs, which is useless for API users, as they don't have easy access to the schema of those structures).

## Test plan

I did not succeed in writing Rust tests since we use a mocked runtime there. Thus, I extended the end-to-end tests (pytest) with:

* Test for account changes on account creation
* Test for access key changes on account creation and access key removal
* Test for code changes
* Test for several transactions on the same block/chunk
@codecov

codecov bot commented Mar 11, 2020

Codecov Report

Merging #2247 into beta will decrease coverage by 0.59%.
The diff coverage is 78.33%.


@@           Coverage Diff           @@
##            beta   #2247     +/-   ##
=======================================
- Coverage   87.2%   86.6%   -0.6%     
=======================================
  Files        183     184      +1     
  Lines      34977   35260    +283     
=======================================
+ Hits       30502   30538     +36     
- Misses      4475    4722    +247
Impacted Files Coverage Δ
core/primitives/src/block.rs 96.33% <ø> (+0.22%) ⬆️
core/primitives/src/telemetry.rs 100% <ø> (ø) ⬆️
core/primitives/src/merkle.rs 100% <ø> (ø) ⬆️
core/primitives/src/transaction.rs 81.81% <ø> (ø) ⬆️
test-utils/testlib/src/node/mod.rs 100% <ø> (ø) ⬆️
core/primitives/src/challenge.rs 85.71% <ø> (ø) ⬆️
core/primitives/src/lib.rs 100% <ø> (ø) ⬆️
core/primitives/src/sharding.rs 99.47% <ø> (ø) ⬆️
chain/chain/src/doomslug.rs 99.52% <ø> (ø) ⬆️
core/primitives/src/network.rs 91.83% <ø> (ø) ⬆️
... and 131 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d4a07fa...6d07a2d.

@ailisp ailisp merged commit 2e5f367 into beta Mar 11, 2020
@ailisp ailisp deleted the betanet-release branch March 11, 2020 23:04