
[feature] [CUDA solver] Add multi-GPU and ask for CUDA during btcli run #893

Merged · 105 commits · Sep 9, 2022

Commits
92fcfd7
added cuda solver
camfairchild Apr 6, 2022
a97fbf3
boost versions to fix pip error
camfairchild May 17, 2022
61b21fc
allow choosing device id
camfairchild May 17, 2022
a55d554
fix solution check to use keccak
camfairchild May 17, 2022
752913d
adds params for cuda and dev_id to register
camfairchild May 18, 2022
99a9f45
list devices by name during selection
camfairchild May 18, 2022
e6adbba
add block number logging
camfairchild May 18, 2022
b788fb4
fix calculation of hashrate
camfairchild Jun 2, 2022
14e1293
fix update interval default
camfairchild Jun 2, 2022
17d945f
add --TPB arg to register
camfairchild Jun 3, 2022
32ffc0c
add update_interval flag
camfairchild Jun 3, 2022
7723972
switch back to old looping/work structure
camfairchild Jun 20, 2022
6166829
change typing
camfairchild Jun 21, 2022
66c2365
device count is a function
camfairchild Jun 24, 2022
816bcd7
stop early if wallet registered
camfairchild Jul 11, 2022
e55fc61
add update interval and num proc flag
camfairchild Jul 25, 2022
9b4637d
add better number output
camfairchild Jul 25, 2022
4befcb5
optimize multiproc cpu reg
camfairchild Jul 25, 2022
e963b4b
fix test
camfairchild Jul 25, 2022
5a707c2
Merge branch 'feature/cpu_register_faster' into feature/cuda_solver
camfairchild Jul 25, 2022
e324d9e
change import to cubit
camfairchild Aug 4, 2022
2224427
fix import and default
camfairchild Aug 4, 2022
d124a86
up default
camfairchild Aug 4, 2022
7f7c043
add comments about params
camfairchild Aug 4, 2022
91e613e
fix config var access
camfairchild Aug 5, 2022
be3a7af
add cubit as extra
camfairchild Aug 5, 2022
42e528a
handle stale pow differently
camfairchild Aug 9, 2022
6166a55
Merge remote-tracking branch 'origin/nobunaga' into feature/cuda_solver
camfairchild Aug 10, 2022
dc9e7f1
Merge branch 'nobunaga' into feature/cuda_solver
Aug 11, 2022
1c49e4c
restrict number of processes for integration test
camfairchild Aug 11, 2022
34a82e0
fix stale check
camfairchild Aug 11, 2022
bfb1662
use wallet.is_registered instead
camfairchild Aug 11, 2022
5263574
attempt to fix test issue
camfairchild Aug 11, 2022
c6b2ca9
fix my test
camfairchild Aug 11, 2022
1035498
oops typo
camfairchild Aug 11, 2022
d38c6d3
typo again ugh
camfairchild Aug 11, 2022
e9845ce
remove print out
camfairchild Aug 11, 2022
010f859
fix partly reg test
camfairchild Aug 11, 2022
6a95ada
fix if solution None
camfairchild Aug 11, 2022
378b241
fix test?
camfairchild Aug 11, 2022
47ef70c
fix patch
camfairchild Aug 11, 2022
a9f5040
Merge remote-tracking branch 'origin/nobunaga' into feature/cuda_solver
camfairchild Aug 14, 2022
3d018a7
add args for cuda to subtensor
camfairchild Aug 14, 2022
5951f7e
add cuda args to reregister call
camfairchild Aug 14, 2022
78ba033
add to wallet register the cuda args
camfairchild Aug 14, 2022
405aeba
fix refs and tests
camfairchild Aug 14, 2022
49a04fb
add for val test also
camfairchild Aug 14, 2022
cefc1dd
fix tests with rereg
camfairchild Aug 14, 2022
0dfbcd6
Merge remote-tracking branch 'origin/nobunaga' into feature/cuda_solver
camfairchild Aug 15, 2022
6be528b
fix patch for tests
camfairchild Aug 15, 2022
1feec11
add mock_register to subtensor passed instead
camfairchild Aug 15, 2022
94417bd
move register under the check for isregistered
camfairchild Aug 15, 2022
c911475
use patch obj instead
camfairchild Aug 15, 2022
16e66ae
fit patch object
camfairchild Aug 15, 2022
fdb77ec
Merge remote-tracking branch 'origin/nobunaga' into feature/cuda_solver
camfairchild Aug 15, 2022
e633f44
fix prompt
camfairchild Aug 15, 2022
850cb14
remove unneeded if
camfairchild Aug 15, 2022
8ed2b0c
modify POW submit to use rolling submit again
camfairchild Aug 15, 2022
65e001d
add backoff to block get from network
camfairchild Aug 15, 2022
13b08c7
add test for backoff get block
camfairchild Aug 15, 2022
98196ff
suppress the dev id flag if not set
camfairchild Aug 15, 2022
c829809
remove dest so it uses first arg
camfairchild Aug 15, 2022
dcaae9f
fix pow submit loop
camfairchild Aug 15, 2022
3d534b3
move registration status with
camfairchild Aug 15, 2022
25814da
fix max attempts check
camfairchild Aug 15, 2022
2f21bc2
remove status in subtensor.register
camfairchild Aug 15, 2022
321eda9
add submit status
camfairchild Aug 15, 2022
40b4648
change to neuron get instead
camfairchild Aug 16, 2022
c1da384
fix count
camfairchild Aug 16, 2022
782e244
try to patch live display
camfairchild Aug 16, 2022
d0bca31
fix patch
camfairchild Aug 16, 2022
75d3a9d
.
camfairchild Aug 16, 2022
364fa89
separate test cases
camfairchild Aug 16, 2022
fdae5ba
add POWNotStale and tests
camfairchild Aug 16, 2022
7c5a6d0
add more test cases for block get with retry
camfairchild Aug 16, 2022
e4f90c7
fix return to None
camfairchild Aug 16, 2022
0f9b8be
fix arg order
camfairchild Aug 16, 2022
458e7df
Merge remote-tracking branch 'origin/master' into feature/cuda_solver
camfairchild Aug 17, 2022
986e7a0
fix indent
camfairchild Aug 17, 2022
edf635c
add test to verify solution is submitted
camfairchild Aug 17, 2022
5b4a13b
fix mock call
camfairchild Aug 17, 2022
497edeb
patch hex bytes instead
camfairchild Aug 17, 2022
85c8ae4
typo :/
camfairchild Aug 17, 2022
0b066f9
fix print out for unstake
camfairchild Aug 17, 2022
dd77820
fix indexing into mock call
camfairchild Aug 17, 2022
751d9a2
call indexing
camfairchild Aug 17, 2022
a2339d7
access dict not with dot
camfairchild Aug 17, 2022
a9d4f51
fix other indent
camfairchild Aug 17, 2022
1592a7b
add CUDAException for cubit
camfairchild Aug 29, 2022
c1bdc8a
up cubit version
camfairchild Aug 29, 2022
ac687d4
[Feature] ask cuda during btcli run (#890)
Aug 30, 2022
fe8ee18
[Feature] [cuda solver] multi gpu (#891)
Aug 31, 2022
ee985e0
Feature/cuda solver multi gpu (#892)
Aug 31, 2022
bea3a04
Merge branch 'nobunaga' into feature/cuda_solver
camfairchild Aug 31, 2022
2b28fd3
Merge branch 'nobunaga' into feature/cuda_solver
Sep 1, 2022
e760c62
continue trying reg after Stale
camfairchild Sep 2, 2022
20e8d2d
Merge branch 'nobunaga' into feature/cuda_solver
Sep 2, 2022
791b166
Merge branch 'nobunaga' into feature/cuda_solver
Sep 2, 2022
85d9a8d
catch for OSX
camfairchild Sep 2, 2022
307af9e
dont use qsize
camfairchild Sep 2, 2022
1117d87
Merge branch 'nobunaga' into feature/cuda_solver
Sep 5, 2022
69aa247
add test for continue after being stale
camfairchild Sep 6, 2022
4adf9fa
Merge branch 'nobunaga' into feature/cuda_solver
Sep 7, 2022
e4f81f6
patch get_nowait instead of qsize
camfairchild Sep 7, 2022
e15ad87
Merge branch 'nobunaga' into feature/cuda_solver
Sep 7, 2022
69 changes: 47 additions & 22 deletions bittensor/_cli/__init__.py
@@ -26,7 +26,7 @@

import bittensor
import torch
from rich.prompt import Confirm, Prompt
from rich.prompt import Confirm, Prompt, PromptBase

from . import cli_impl

@@ -823,6 +823,36 @@ def check_overview_config( config: 'bittensor.Config' ):
wallet_name = Prompt.ask("Enter wallet name", default = bittensor.defaults.wallet.name)
config.wallet.name = str(wallet_name)

def _check_for_cuda_reg_config( config: 'bittensor.Config' ) -> None:
"""Checks, when CUDA is available, if the user would like to register with their CUDA device."""
if torch.cuda.is_available():
if config.subtensor.register.cuda.get('use_cuda') is None:
# Ask about cuda registration only if a CUDA device is available.
cuda = Confirm.ask("Detected CUDA device, use CUDA for registration?\n")
config.subtensor.register.cuda.use_cuda = cuda

# Only ask about which CUDA device if the user has more than one CUDA device.
if config.subtensor.register.cuda.use_cuda and config.subtensor.register.cuda.get('dev_id') is None and torch.cuda.device_count() > 0:
devices: List[str] = [str(x) for x in range(torch.cuda.device_count())]
device_names: List[str] = [torch.cuda.get_device_name(x) for x in range(torch.cuda.device_count())]
console.print("Available CUDA devices:")
choices_str: str = ""
for i, device in enumerate(devices):
choices_str += (" {}: {}\n".format(device, device_names[i]))
console.print(choices_str)
dev_id = IntListPrompt.ask("Which GPU(s) would you like to use? Please list one, or comma-separated", choices=devices, default='All')
if dev_id == 'All':
dev_id = list(range(torch.cuda.device_count()))
else:
try:
# replace the commas with spaces then split over whitespace,
# then strip the whitespace and convert to ints.
dev_id = [int(dev_id.strip()) for dev_id in dev_id.replace(',', ' ').split()]
except ValueError:
console.error(":cross_mark:[red]Invalid GPU device[/red] [bold white]{}[/bold white]\nAvailable CUDA devices:{}".format(dev_id, choices_str))
sys.exit(1)
config.subtensor.register.cuda.dev_id = dev_id

def check_register_config( config: 'bittensor.Config' ):
if config.subtensor.get('network') == bittensor.defaults.subtensor.network and not config.no_prompt:
config.subtensor.network = Prompt.ask("Enter subtensor network", choices=bittensor.__networks__, default = bittensor.defaults.subtensor.network)
@@ -835,27 +865,8 @@ def check_register_config( config: 'bittensor.Config' ):
hotkey = Prompt.ask("Enter hotkey name", default = bittensor.defaults.wallet.hotkey)
config.wallet.hotkey = str(hotkey)

if not config.no_prompt and config.subtensor.register.cuda.use_cuda == bittensor.defaults.subtensor.register.cuda.use_cuda:
# Ask about cuda registration only if a CUDA device is available.
if torch.cuda.is_available():
cuda = Confirm.ask("Detected CUDA device, use CUDA for registration?\n")
config.subtensor.register.cuda.use_cuda = cuda
# Only ask about which CUDA device if the user has more than one CUDA device.
if cuda and config.subtensor.register.cuda.get('dev_id') is None and torch.cuda.device_count() > 0:
devices: List[str] = [str(x) for x in range(torch.cuda.device_count())]
device_names: List[str] = [torch.cuda.get_device_name(x) for x in range(torch.cuda.device_count())]
console.print("Available CUDA devices:")
choices_str: str = ""
for i, device in enumerate(devices):
choices_str += (" {}: {}\n".format(device, device_names[i]))
console.print(choices_str)
dev_id = Prompt.ask("Which GPU would you like to use?", choices=devices, default=str(bittensor.defaults.subtensor.register.cuda.dev_id))
try:
dev_id = int(dev_id)
except ValueError:
console.error(":cross_mark:[red]Invalid GPU device[/red] [bold white]{}[/bold white]\nAvailable CUDA devices:{}".format(dev_id, choices_str))
sys.exit(1)
config.subtensor.register.cuda.dev_id = dev_id
if not config.no_prompt:
cli._check_for_cuda_reg_config(config)

def check_new_coldkey_config( config: 'bittensor.Config' ):
if config.wallet.get('name') == bittensor.defaults.wallet.name and not config.no_prompt:
@@ -931,6 +942,10 @@ def check_run_config( config: 'bittensor.Config' ):
if 'server' in config.model and not config.no_prompt:
synapse = Prompt.ask('Enter synapse', choices = list(bittensor.synapse.__synapses_types__), default = 'All')
config.synapse = synapse

# Don't need to ask about registration if they don't want to reregister the wallet.
if config.wallet.get('reregister', bittensor.defaults.wallet.reregister) and not config.no_prompt:
cli._check_for_cuda_reg_config(config)

def check_help_config( config: 'bittensor.Config'):
if config.model == 'None':
@@ -941,3 +956,13 @@ def check_update_config( config: 'bittensor.Config'):
if not config.no_prompt:
answer = Prompt.ask('This will update the local bittensor package', choices = ['Y','N'], default = 'Y')
config.answer = answer

class IntListPrompt(PromptBase):
""" Prompt for a list of integers. """

def check_choice( self, value: str ) -> bool:
assert self.choices is not None
# check if value is a valid choice or all the values in a list of ints are valid choices
return value == "All" or \
value in self.choices or \
all( val.strip() in self.choices for val in value.replace(',', ' ').split( ))
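The parsing added above accepts either `All` or a comma/space-separated list of device ids, and `IntListPrompt.check_choice` validates the raw answer before it is parsed. A minimal standalone sketch of that logic (the helper names `parse_dev_id` and `check_choice` are illustrative, not part of the bittensor API; the real code runs inside `_check_for_cuda_reg_config` against `torch.cuda.device_count()`):

```python
def parse_dev_id(answer: str, device_count: int):
    """Turn a prompt answer like 'All' or '0, 2' into a list of device ids."""
    if answer == 'All':
        return list(range(device_count))
    try:
        # replace the commas with spaces then split over whitespace,
        # then strip the whitespace and convert to ints.
        return [int(tok.strip()) for tok in answer.replace(',', ' ').split()]
    except ValueError:
        raise SystemExit("Invalid GPU device: {}".format(answer))

def check_choice(value: str, choices):
    """Mirrors IntListPrompt.check_choice: accept 'All', a single valid
    choice, or a comma/space-separated list of valid choices."""
    return value == "All" or \
        value in choices or \
        all(tok.strip() in choices for tok in value.replace(',', ' ').split())
```

For example, with three visible devices, `parse_dev_id('All', 3)` yields `[0, 1, 2]` and `parse_dev_id('0, 2', 3)` yields `[0, 2]`, which is the list shape the new `nargs='+'` `--cuda.dev_id` argument also produces.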
2 changes: 1 addition & 1 deletion bittensor/_cli/cli_impl.py
@@ -309,7 +309,7 @@ def unstake( self ):
if not self.config.no_prompt:
if not Confirm.ask("Do you want to unstake from the following keys:\n" + \
"".join([
f" [bold white]- {wallet.hotkey_str}: {amount.tao}𝜏[/bold white]\n" for wallet, amount in zip(final_wallets, final_amounts)
f" [bold white]- {wallet.hotkey_str}: {amount}𝜏[/bold white]\n" for wallet, amount in zip(final_wallets, final_amounts)
])
):
return None
33 changes: 20 additions & 13 deletions bittensor/_subtensor/__init__.py
@@ -15,22 +15,16 @@
# OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
# DEALINGS IN THE SOFTWARE.
import argparse
import copy
import os

import random
import time
import psutil
import subprocess
from sys import platform

import bittensor
import copy
from loguru import logger
from substrateinterface import SubstrateInterface
from torch.cuda import is_available as is_cuda_available

from . import subtensor_impl
from . import subtensor_mock
from . import subtensor_impl, subtensor_mock

from loguru import logger
logger = logger.opt(colors=True)

__type_registery__ = {
@@ -188,8 +182,9 @@ def add_args(cls, parser: argparse.ArgumentParser, prefix: str = None ):
parser.add_argument('--' + prefix_str + 'subtensor.register.num_processes', '-n', dest='subtensor.register.num_processes', help="Number of processors to use for registration", type=int, default=bittensor.defaults.subtensor.register.num_processes)
parser.add_argument('--' + prefix_str + 'subtensor.register.update_interval', '--' + prefix_str + 'subtensor.register.cuda.update_interval', '--' + prefix_str + 'cuda.update_interval', '-u', help="The number of nonces to process before checking for next block during registration", type=int, default=bittensor.defaults.subtensor.register.update_interval)
# registration args. Used for register and re-register and anything that calls register.
parser.add_argument( '--' + prefix_str + 'subtensor.register.cuda.use_cuda', '--' + prefix_str + 'cuda', '--' + prefix_str + 'cuda.use_cuda', default=bittensor.defaults.subtensor.register.cuda.use_cuda, help='''Set true to use CUDA.''', action='store_true', required=False )
parser.add_argument( '--' + prefix_str + 'subtensor.register.cuda.dev_id', '--' + prefix_str + 'cuda.dev_id', type=int, default=argparse.SUPPRESS, help='''Set the CUDA device id. Goes by the order of speed. (i.e. 0 is the fastest).''', required=False )
parser.add_argument( '--' + prefix_str + 'subtensor.register.cuda.use_cuda', '--' + prefix_str + 'cuda', '--' + prefix_str + 'cuda.use_cuda', default=argparse.SUPPRESS, help='''Set true to use CUDA.''', action='store_true', required=False )
parser.add_argument( '--' + prefix_str + 'subtensor.register.cuda.dev_id', '--' + prefix_str + 'cuda.dev_id', type=int, nargs='+', default=argparse.SUPPRESS, help='''Set the CUDA device id(s). Goes by the order of speed. (i.e. 0 is the fastest).''', required=False )

parser.add_argument( '--' + prefix_str + 'subtensor.register.cuda.TPB', '--' + prefix_str + 'cuda.TPB', type=int, default=bittensor.defaults.subtensor.register.cuda.TPB, help='''Set the number of Threads Per Block for CUDA.''', required=False )

except argparse.ArgumentError:
@@ -210,14 +205,26 @@ def add_defaults(cls, defaults ):
defaults.subtensor.register.update_interval = os.getenv('BT_SUBTENSOR_REGISTER_UPDATE_INTERVAL') if os.getenv('BT_SUBTENSOR_REGISTER_UPDATE_INTERVAL') != None else 50_000

defaults.subtensor.register.cuda = bittensor.Config()
defaults.subtensor.register.cuda.dev_id = 0
defaults.subtensor.register.cuda.dev_id = [0]
defaults.subtensor.register.cuda.use_cuda = False
defaults.subtensor.register.cuda.TPB = 256

@staticmethod
def check_config( config: 'bittensor.Config' ):
assert config.subtensor
#assert config.subtensor.network != None
if config.subtensor.get('register') and config.subtensor.register.get('cuda'):
assert all((isinstance(x, int) or isinstance(x, str) and x.isnumeric() ) for x in config.subtensor.register.cuda.get('dev_id', []))

if config.subtensor.register.cuda.get('use_cuda', False):
try:
import cubit
except ImportError:
raise ImportError('CUDA registration is enabled but cubit is not installed. Please install cubit.')

if not is_cuda_available():
raise RuntimeError('CUDA registration is enabled but no CUDA devices are detected.')


@staticmethod
def determine_chain_endpoint(network: str):
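The new `check_config` body validates the `dev_id` list and fails fast when CUDA registration is requested but unusable. A standalone sketch of that validation (the helper name `validate_cuda_config` and the plain-dict config are illustrative; the real code operates on a `bittensor.Config` and probes `import cubit` and `torch.cuda.is_available()` directly):

```python
def validate_cuda_config(cuda_cfg: dict, cuda_available: bool, cubit_installed: bool) -> None:
    """Mirror of the CUDA checks added to Subtensor.check_config."""
    # dev_id entries must be ints or numeric strings (e.g. [0, 1] or ['0', '1']).
    assert all(isinstance(x, int) or (isinstance(x, str) and x.isnumeric())
               for x in cuda_cfg.get('dev_id', []))
    if cuda_cfg.get('use_cuda', False):
        if not cubit_installed:
            raise ImportError('CUDA registration is enabled but cubit is not installed. '
                              'Please install cubit.')
        if not cuda_available:
            raise RuntimeError('CUDA registration is enabled but no CUDA devices are detected.')
```

Note the ordering: the cubit import check runs before the device check, so a missing optional dependency is reported even on a machine with a working GPU.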