[hotfix] Fix GPU reg bug. bad indent #883

Merged: 94 commits, merged on Aug 17, 2022.

Commits:
92fcfd7
added cuda solver
camfairchild Apr 6, 2022
a97fbf3
boost versions to fix pip error
camfairchild May 17, 2022
61b21fc
allow choosing device id
camfairchild May 17, 2022
a55d554
fix solution check to use keccak
camfairchild May 17, 2022
752913d
adds params for cuda and dev_id to register
camfairchild May 18, 2022
99a9f45
list devices by name during selection
camfairchild May 18, 2022
e6adbba
add block number logging
camfairchild May 18, 2022
b788fb4
fix calculation of hashrate
camfairchild Jun 2, 2022
14e1293
fix update interval default
camfairchild Jun 2, 2022
17d945f
add --TPB arg to register
camfairchild Jun 3, 2022
32ffc0c
add update_interval flag
camfairchild Jun 3, 2022
7723972
switch back to old looping/work structure
camfairchild Jun 20, 2022
6166829
change typing
camfairchild Jun 21, 2022
66c2365
device count is a function
camfairchild Jun 24, 2022
816bcd7
stop early if wallet registered
camfairchild Jul 11, 2022
e55fc61
add update interval and num proc flag
camfairchild Jul 25, 2022
9b4637d
add better number output
camfairchild Jul 25, 2022
4befcb5
optimize multiproc cpu reg
camfairchild Jul 25, 2022
e963b4b
fix test
camfairchild Jul 25, 2022
5a707c2
Merge branch 'feature/cpu_register_faster' into feature/cuda_solver
camfairchild Jul 25, 2022
e324d9e
change import to cubit
camfairchild Aug 4, 2022
2224427
fix import and default
camfairchild Aug 4, 2022
d124a86
up default
camfairchild Aug 4, 2022
7f7c043
add comments about params
camfairchild Aug 4, 2022
91e613e
fix config var access
camfairchild Aug 5, 2022
be3a7af
add cubit as extra
camfairchild Aug 5, 2022
42e528a
handle stale pow differently
camfairchild Aug 9, 2022
6166a55
Merge remote-tracking branch 'origin/nobunaga' into feature/cuda_solver
camfairchild Aug 10, 2022
4288c3a
[feature] cpu register faster (#854)
Aug 11, 2022
dc9e7f1
Merge branch 'nobunaga' into feature/cuda_solver
Aug 11, 2022
1c49e4c
restrict number of processes for integration test
camfairchild Aug 11, 2022
34a82e0
fix stale check
camfairchild Aug 11, 2022
bfb1662
use wallet.is_registered instead
camfairchild Aug 11, 2022
5263574
attempt to fix test issue
camfairchild Aug 11, 2022
c6b2ca9
fix my test
camfairchild Aug 11, 2022
1035498
oops typo
camfairchild Aug 11, 2022
d38c6d3
typo again ugh
camfairchild Aug 11, 2022
e9845ce
remove print out
camfairchild Aug 11, 2022
010f859
fix partly reg test
camfairchild Aug 11, 2022
6a95ada
fix if solution None
camfairchild Aug 11, 2022
378b241
fix test?
camfairchild Aug 11, 2022
47ef70c
fix patch
camfairchild Aug 11, 2022
4629b82
[hotfix] fix flags for multiproc register limit (#876)
Aug 12, 2022
3ade241
Merge remote-tracking branch 'origin/master' into nobunaga
Eugene-hu Aug 12, 2022
a9f5040
Merge remote-tracking branch 'origin/nobunaga' into feature/cuda_solver
camfairchild Aug 14, 2022
3d018a7
add args for cuda to subtensor
camfairchild Aug 14, 2022
5951f7e
add cuda args to reregister call
camfairchild Aug 14, 2022
78ba033
add to wallet register the cuda args
camfairchild Aug 14, 2022
405aeba
fix refs and tests
camfairchild Aug 14, 2022
49a04fb
add for val test also
camfairchild Aug 14, 2022
cefc1dd
fix tests with rereg
camfairchild Aug 14, 2022
224719c
Fix/diff unpack bit shift (#878)
Aug 15, 2022
0dfbcd6
Merge remote-tracking branch 'origin/nobunaga' into feature/cuda_solver
camfairchild Aug 15, 2022
6be528b
fix patch for tests
camfairchild Aug 15, 2022
1feec11
add mock_register to subtensor passed instead
camfairchild Aug 15, 2022
94417bd
move register under the check for isregistered
camfairchild Aug 15, 2022
c911475
use patch obj instead
camfairchild Aug 15, 2022
16e66ae
fit patch object
camfairchild Aug 15, 2022
c546d3f
[Feature] [cubit] CUDA registration solver (#868)
Aug 15, 2022
fb6be7c
Fix/move overview args to cli (#867)
Aug 15, 2022
fdb77ec
Merge remote-tracking branch 'origin/nobunaga' into feature/cuda_solver
camfairchild Aug 15, 2022
e633f44
fix prompt
camfairchild Aug 15, 2022
850cb14
remove unneeded if
camfairchild Aug 15, 2022
8ed2b0c
modify POW submit to use rolling submit again
camfairchild Aug 15, 2022
65e001d
add backoff to block get from network
camfairchild Aug 15, 2022
13b08c7
add test for backoff get block
camfairchild Aug 15, 2022
98196ff
suppress the dev id flag if not set
camfairchild Aug 15, 2022
c829809
remove dest so it uses first arg
camfairchild Aug 15, 2022
dcaae9f
fix pow submit loop
camfairchild Aug 15, 2022
3d534b3
move registration status with
camfairchild Aug 15, 2022
25814da
fix max attempts check
camfairchild Aug 15, 2022
2f21bc2
remove status in subtensor.register
camfairchild Aug 15, 2022
321eda9
add submit status
camfairchild Aug 15, 2022
40b4648
change to neuron get instead
camfairchild Aug 16, 2022
c1da384
fix count
camfairchild Aug 16, 2022
782e244
try to patch live display
camfairchild Aug 16, 2022
d0bca31
fix patch
camfairchild Aug 16, 2022
75d3a9d
.
camfairchild Aug 16, 2022
364fa89
separate test cases
camfairchild Aug 16, 2022
fdae5ba
add POWNotStale and tests
camfairchild Aug 16, 2022
7c5a6d0
add more test cases for block get with retry
camfairchild Aug 16, 2022
e4f90c7
fix return to None
camfairchild Aug 16, 2022
0f9b8be
fix arg order
camfairchild Aug 16, 2022
458e7df
Merge remote-tracking branch 'origin/master' into feature/cuda_solver
camfairchild Aug 17, 2022
986e7a0
fix indent
camfairchild Aug 17, 2022
edf635c
add test to verify solution is submitted
camfairchild Aug 17, 2022
5b4a13b
fix mock call
camfairchild Aug 17, 2022
497edeb
patch hex bytes instead
camfairchild Aug 17, 2022
85c8ae4
typo :/
camfairchild Aug 17, 2022
0b066f9
fix print out for unstake
camfairchild Aug 17, 2022
dd77820
fix indexing into mock call
camfairchild Aug 17, 2022
751d9a2
call indexing
camfairchild Aug 17, 2022
a2339d7
access dict not with dot
camfairchild Aug 17, 2022
a9d4f51
fix other indent
camfairchild Aug 17, 2022
bittensor/_cli/cli_impl.py: 2 changes (1 addition, 1 deletion)
@@ -309,7 +309,7 @@ def unstake( self ):
         if not self.config.no_prompt:
             if not Confirm.ask("Do you want to unstake from the following keys:\n" + \
                     "".join([
-                        f" [bold white]- {wallet.hotkey_str}: {amount.tao}𝜏[/bold white]\n" for wallet, amount in zip(final_wallets, final_amounts)
+                        f" [bold white]- {wallet.hotkey_str}: {amount}𝜏[/bold white]\n" for wallet, amount in zip(final_wallets, final_amounts)
                     ])
                 ):
                 return None
bittensor/_subtensor/subtensor_impl.py: 131 changes (66 additions, 65 deletions)
@@ -501,73 +501,74 @@ def register (
else:
pow_result = bittensor.utils.create_pow( self, wallet, num_processes=num_processes, update_interval=update_interval)

# pow failed
if not pow_result:
# might be registered already
if (wallet.is_registered( self )):
bittensor.__console__.print(":white_heavy_check_mark: [green]Registered[/green]")
return True

# pow successful, proceed to submit pow to chain for registration
else:
with bittensor.__console__.status(":satellite: Submitting POW..."):
# check if pow result is still valid
while bittensor.utils.POWNotStale(self, pow_result):
with self.substrate as substrate:
# create extrinsic call
call = substrate.compose_call(
call_module='SubtensorModule',
call_function='register',
call_params={
'block_number': pow_result['block_number'],
'nonce': pow_result['nonce'],
'work': bittensor.utils.hex_bytes_to_u8_list( pow_result['work'] ),
'hotkey': wallet.hotkey.ss58_address,
'coldkey': wallet.coldkeypub.ss58_address
}
)
extrinsic = substrate.create_signed_extrinsic( call = call, keypair = wallet.hotkey )
response = substrate.submit_extrinsic( extrinsic, wait_for_inclusion=wait_for_inclusion, wait_for_finalization=wait_for_finalization )

# We only wait here if we expect finalization.
if not wait_for_finalization and not wait_for_inclusion:
bittensor.__console__.print(":white_heavy_check_mark: [green]Sent[/green]")
# pow failed
if not pow_result:
# might be registered already
if (wallet.is_registered( self )):
bittensor.__console__.print(":white_heavy_check_mark: [green]Registered[/green]")
return True

# pow successful, proceed to submit pow to chain for registration
else:
with bittensor.__console__.status(":satellite: Submitting POW..."):
# check if pow result is still valid
while bittensor.utils.POWNotStale(self, pow_result):
with self.substrate as substrate:
# create extrinsic call
call = substrate.compose_call(
call_module='SubtensorModule',
call_function='register',
call_params={
'block_number': pow_result['block_number'],
'nonce': pow_result['nonce'],
'work': bittensor.utils.hex_bytes_to_u8_list( pow_result['work'] ),
'hotkey': wallet.hotkey.ss58_address,
'coldkey': wallet.coldkeypub.ss58_address
}
)
extrinsic = substrate.create_signed_extrinsic( call = call, keypair = wallet.hotkey )
response = substrate.submit_extrinsic( extrinsic, wait_for_inclusion=wait_for_inclusion, wait_for_finalization=wait_for_finalization )

# We only wait here if we expect finalization.
if not wait_for_finalization and not wait_for_inclusion:
bittensor.__console__.print(":white_heavy_check_mark: [green]Sent[/green]")
return True

# process if registration successful, try again if pow is still valid
response.process_events()
if not response.is_success:
if 'key is already registered' in response.error_message:
# Error meant that the key is already registered.
bittensor.__console__.print(":white_heavy_check_mark: [green]Already Registered[/green]")
return True

bittensor.__console__.print(":cross_mark: [red]Failed[/red]: error:{}".format(response.error_message))
time.sleep(0.5)

# Successful registration, final check for neuron and pubkey
else:
bittensor.__console__.print(":satellite: Checking Balance...")
neuron = self.neuron_for_pubkey( wallet.hotkey.ss58_address )
if not neuron.is_null:
bittensor.__console__.print(":white_heavy_check_mark: [green]Registered[/green]")
return True

# process if registration successful, try again if pow is still valid
response.process_events()
if not response.is_success:
if 'key is already registered' in response.error_message:
# Error meant that the key is already registered.
bittensor.__console__.print(":white_heavy_check_mark: [green]Already Registered[/green]")
return True

bittensor.__console__.print(":cross_mark: [red]Failed[/red]: error:{}".format(response.error_message))
time.sleep(0.5)

# Successful registration, final check for neuron and pubkey
else:
bittensor.__console__.print(":satellite: Checking Balance...")
neuron = self.neuron_for_pubkey( wallet.hotkey.ss58_address )
if not neuron.is_null:
bittensor.__console__.print(":white_heavy_check_mark: [green]Registered[/green]")
return True
else:
# neuron not found, try again
bittensor.__console__.print(":cross_mark: [red]Unknown error. Neuron not found.[/red]")
continue
else:
# Exited loop because pow is no longer valid.
bittensor.__console__.print( "[red]POW is stale.[/red]" )
return False
if attempts < max_allowed_attempts:
#Failed registration, retry pow
attempts += 1
bittensor.__console__.print( ":satellite: Failed registration, retrying pow ...({}/{})".format(attempts, max_allowed_attempts))
else:
# Failed to register after max attempts.
bittensor.__console__.print( "[red]No more attempts.[/red]" )
return False
# neuron not found, try again
bittensor.__console__.print(":cross_mark: [red]Unknown error. Neuron not found.[/red]")
continue
else:
# Exited loop because pow is no longer valid.
bittensor.__console__.print( "[red]POW is stale.[/red]" )
return False

if attempts < max_allowed_attempts:
#Failed registration, retry pow
attempts += 1
bittensor.__console__.print( ":satellite: Failed registration, retrying pow ...({}/{})".format(attempts, max_allowed_attempts))
else:
# Failed to register after max attempts.
bittensor.__console__.print( "[red]No more attempts.[/red]" )
return False

def serve (
self,
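The PR title attributes the GPU registration failure to a bad indent in `subtensor.register`, and the hunk above appears to re-indent the whole POW-submission block that follows the `else:` branch calling the CPU `create_pow`. A minimal sketch of that failure mode, assuming the submission block had ended up nested under the CPU `else:`: the CUDA path then solves the POW but never reaches submission. `solve_pow`, `register_buggy`, `register_fixed` and the `submit` callback are hypothetical stand-ins, not bittensor APIs.

```python
from typing import Callable, List, Optional


def solve_pow(device: str) -> Optional[dict]:
    """Hypothetical solver: pretend both backends always find a solution."""
    return {"block_number": 1, "nonce": 42, "device": device}


def register_buggy(cuda: bool, submit: Callable[[dict], None]) -> Optional[dict]:
    if cuda:
        pow_result = solve_pow("cuda")
    else:
        pow_result = solve_pow("cpu")

        # BUG (assumed): the submit step sits under the CPU `else:` branch,
        # so it never runs when cuda=True.
        if pow_result:
            submit(pow_result)
    return pow_result


def register_fixed(cuda: bool, submit: Callable[[dict], None]) -> Optional[dict]:
    if cuda:
        pow_result = solve_pow("cuda")
    else:
        pow_result = solve_pow("cpu")

    # FIX: dedented to the level of `if cuda:`, so both backends submit.
    if pow_result:
        submit(pow_result)
    return pow_result


if __name__ == "__main__":
    for fn in (register_buggy, register_fixed):
        submitted: List[dict] = []
        fn(cuda=True, submit=submitted.append)
        print(fn.__name__, "submitted:", len(submitted))
```

Running the file prints `register_buggy submitted: 0` and `register_fixed submitted: 1`. That is the same property `test_pow_called_for_cuda` checks against the real code by asserting that `compose_call` is reached with `cuda=True`.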
tests/unit_tests/bittensor_tests/utils/test_utils.py: 52 changes (52 additions, 0 deletions)
@@ -10,6 +10,7 @@
import random
import torch
import multiprocessing
from types import SimpleNamespace

from sys import platform
from substrateinterface.base import Keypair
@@ -346,6 +347,57 @@ def test_pow_not_stale_diff_block_number_too_old(self):

        assert not bittensor.utils.POWNotStale(mock_subtensor, mock_solution)

def test_pow_called_for_cuda():
    class MockException(Exception):
        pass
    mock_compose_call = MagicMock(side_effect=MockException)

    mock_subtensor = bittensor.subtensor(_mock=True)
    mock_subtensor.neuron_for_pubkey=MagicMock(is_null=True)
    mock_subtensor.substrate = MagicMock(
        __enter__= MagicMock(return_value=MagicMock(
            compose_call=mock_compose_call
        )),
        __exit__ = MagicMock(return_value=None),
    )

    mock_wallet = SimpleNamespace(
        hotkey=SimpleNamespace(
            ss58_address=''
        ),
        coldkeypub=SimpleNamespace(
            ss58_address=''
        )
    )

    mock_result = {
        "block_number": 1,
        'nonce': random.randint(0, pow(2, 32)),
        'work': b'\x00' * 64,
    }

    with patch('bittensor.utils.POWNotStale', return_value=True) as mock_pow_not_stale:
        with patch('torch.cuda.is_available', return_value=True) as mock_cuda_available:
            with patch('bittensor.utils.create_pow', return_value=mock_result) as mock_create_pow:
                with patch('bittensor.utils.hex_bytes_to_u8_list', return_value=b''):

                    # Should exit early
                    with pytest.raises(MockException):
                        mock_subtensor.register(mock_wallet, cuda=True, prompt=False)

                    mock_pow_not_stale.assert_called_once()
                    mock_create_pow.assert_called_once()
                    mock_cuda_available.assert_called_once()

                    call0 = mock_pow_not_stale.call_args
                    assert call0[0][0] == mock_subtensor
                    assert call0[0][1] == mock_result

                    mock_compose_call.assert_called_once()
                    call1 = mock_compose_call.call_args
                    assert call1[1]['call_function'] == 'register'
                    call_params = call1[1]['call_params']
                    assert call_params['nonce'] == mock_result['nonce']

if __name__ == "__main__":
    test_solve_for_difficulty_fast_registered_already()
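`test_pow_called_for_cuda` stops `register` right at the extrinsic-composition step: `compose_call` is mocked with a `side_effect` exception, the substrate client is mocked as a context manager through `__enter__`/`__exit__`, and the recorded `call_args` are inspected afterwards. A minimal standalone sketch of the same pattern with the standard `unittest.mock` module; `register` and `StopHere` here are hypothetical stand-ins, not bittensor code.

```python
from unittest.mock import MagicMock


class StopHere(Exception):
    """Raised by the mock to abort the code under test at a known point."""


def register(chain, nonce: int) -> bool:
    # Hypothetical code under test: opens the client as a context manager,
    # then composes a call, loosely mirroring the shape of subtensor.register.
    with chain as substrate:
        substrate.compose_call(
            call_module="SubtensorModule",
            call_function="register",
            call_params={"nonce": nonce},
        )
    return True


mock_compose_call = MagicMock(side_effect=StopHere)
mock_chain = MagicMock()
# `with mock_chain as substrate` binds whatever __enter__ returns.
mock_chain.__enter__.return_value = MagicMock(compose_call=mock_compose_call)
mock_chain.__exit__.return_value = None  # falsy, so the exception propagates

try:
    register(mock_chain, nonce=7)
except StopHere:
    pass  # expected: the mock aborted register() inside compose_call
else:
    raise AssertionError("compose_call was never reached")

mock_compose_call.assert_called_once()
kwargs = mock_compose_call.call_args[1]  # keyword arguments of the recorded call
assert kwargs["call_function"] == "register"
assert kwargs["call_params"]["nonce"] == 7
```

Because `__exit__` returns a falsy value, the exception raised inside the `with` block propagates out of `register`. This is what lets the real test use `pytest.raises(MockException)` as its early exit.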