[hotfix][3.6.3] Merging back hotfix into finney (#1065)

* Update README.md
* Hotfix/3.6.2/validator logit parameters (#1057)
* additional parameters
* fixed naming to logit divergence
* versioning and fixes
* typo fixes
* bug fixes
* Tests cli fixes (#1058)
* fix btcli list with wallet.path (#1036)
  fix path join
* remove mock subtensor and replace with mock calls
* additional fixes
* mock wallet

Co-authored-by: Cameron Fairchild <[email protected]>

* Log prune_len and logits_divergence
* Always get latest prune_len

Co-authored-by: Cameron Fairchild <[email protected]>
Co-authored-by: opentaco <[email protected]>

* fixing no_version_checking error
* updating version to 3.6.3

---------

Co-authored-by: Unconst <[email protected]>
Co-authored-by: Eugene-hu <[email protected]>
Co-authored-by: Cameron Fairchild <[email protected]>
Co-authored-by: opentaco <[email protected]>
Co-authored-by: Eugene <[email protected]>
opentensor · Feb 2, 2023 · ceb29a2
1 parent c04403c
Showing 4 changed files with 46 additions and 21 deletions.
diff --git a/README.md b/README.md
@@ -13,7 +13,13 @@
 
 </div>
 
-At Bittensor, we are creating an open, decentralized, peer-to-peer network that functions as a market system for the development of artificial intelligence. Our purpose is not only to accelerate the development of AI by creating an environment optimally condusive to its evolution, but to democratize the global production and use of this valuable commodity. Our aim is to disrupt the status quo: a system that is centrally controlled, inefficient and unsustainable. In developing the Bittensor API, we are allowing engineers to monetize their work, gain access to machine intelligence and join our community of creative, forward-thinking individuals. For more info, read our [paper](https://drive.google.com/file/d/1VnsobL6lIAAqcA1_Tbm8AYIQscfJV4KU/view).
+This repository contains Bittensor's Python API, which can be used to 1) query the Bittensor network as a [client](#31-client), 2) run and build Bittensor miners and validators for [mining TAO](#43-running-a-template-miner), 3) pull network [state information](#3-using-bittensor), and 4) manage [TAO wallets](#41-cli), balances, transfers, etc.
+
+Bittensor is a mining network (like Bitcoin) with built-in incentives designed to drive miners to provide value. In our network, miners provide that value by hosting trained or training machine learning models, which clients can query for inference over inputs (e.g. text generation, or numerical embeddings from a large foundation model like GPT-NeoX-20B).
+
+Token-based incentives are built in by design, both to drive the network's growth and to distribute the value generated by the network directly to the individuals producing that value, without intermediaries. The network is open to those who participate, and no individual or group has full control over what it learns, who can profit from it, or who can access it.
+
+To learn more about Bittensor, read our [paper](https://drive.google.com/file/d/1VnsobL6lIAAqcA1_Tbm8AYIQscfJV4KU/view).
 
 - [1. Documentation](#1-documentation)
 - [2. Install](#2-install)
@@ -26,11 +32,9 @@ At Bittensor, we are creating an open, decentralized, peer-to-peer network that
   - [4.2. Selecting the network to join](#42-selecting-the-network-to-join)
   - [4.3. Running a template miner](#43-running-a-template-miner)
   - [4.4. Running a template server](#44-running-a-template-server)
-  - [4.5. Subscription to the network](#45-subscription-to-the-network)
-  - [4.6. Syncing with the chain/ Finding the ranks/stake/uids of other nodes](#46-syncing-with-the-chain-finding-the-ranksstakeuids-of-other-nodes)
-  - [4.7. Finding and creating the endpoints for other nodes in the network](#47-finding-and-creating-the-endpoints-for-other-nodes-in-the-network)
-  - [4.8. Querying others in the network](#48-querying-others-in-the-network)
-  - [4.9. Creating a Priority Thread Pool for the axon](#49-creating-a-priority-thread-pool-for-the-axon)
+  - [4.5. Syncing with the chain/ Finding the ranks/stake/uids of other nodes](#46-syncing-with-the-chain-finding-the-ranksstakeuids-of-other-nodes)
+  - [4.6. Finding and creating the endpoints for other nodes in the network](#47-finding-and-creating-the-endpoints-for-other-nodes-in-the-network)
+  - [4.7. Querying others in the network](#48-querying-others-in-the-network)
 - [5. Release](#5-release)
 - [6. License](#6-license)
 - [7. Acknowledgments](#7-acknowledgments)

diff --git a/VERSION b/VERSION
@@ -1 +1 @@
-3.6.1
+3.6.3
diff --git a/bittensor/_neuron/text/core_validator/__init__.py b/bittensor/_neuron/text/core_validator/__init__.py
@@ -148,7 +148,7 @@ def __init__(
         self.device = torch.device ( device = self.config.neuron.device )    
         self.nucleus = nucleus ( config = self.config, device = self.device, subtensor = self.subtensor, vlogger = self.vlogger ).to( self.device )
         self.dataset = (bittensor.dataset(config=self.config, batch_size=self.subtensor.validator_batch_size(self.config.netuid),
-                                          block_size=self.subtensor.validator_sequence_length(self.config.netuid) + self.config.neuron.validation_len + self.config.neuron.prune_len)
+                                          block_size=self.subtensor.validator_sequence_length(self.config.netuid) + self.config.neuron.validation_len +  self.subtensor.validator_prune_len(netuid=self.config.netuid))
                         if dataset is None else dataset)
         self.optimizer = torch.optim.SGD(
             self.nucleus.parameters(), lr=self.config.neuron.learning_rate, momentum=self.config.neuron.momentum
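
Note on the hunk above: the dataset block size is the sum of three token counts. Because this commit changes the `--neuron.prune_len` default to `-1`, reading the config value directly here would presumably shrink the block by one token, which seems to be why the chain value is used instead. A minimal sketch of the arithmetic, with a hypothetical helper name and illustrative values that are not taken from the repository:

```python
def dataset_block_size(sequence_length: int, validation_len: int, prune_len: int) -> int:
    """Tokens per dataset sample: context + validation holdout + tokens to prune.

    Hypothetical helper for illustration; the commit computes this sum inline.
    """
    return sequence_length + validation_len + prune_len


# Illustrative values only; the real ones come from the chain and the CLI config.
chain_prune_len = 1      # assumed on-chain ValidatorPruneLen
sentinel_prune_len = -1  # new CLI default, meaning "pull from subtensor"

expected = dataset_block_size(256, 8, chain_prune_len)
too_short = dataset_block_size(256, 8, sentinel_prune_len)
```

With these assumed values, the chain-based sum is 265 tokens, while using the `-1` sentinel directly would give 263.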
@@ -205,7 +205,7 @@ def add_args( cls, parser ):
         parser.add_argument('--neuron.blocks_per_epoch', type=int, help='Blocks per epoch, -1 value means we use the chain value.', default = -1 )
         parser.add_argument('--neuron.epochs_until_reset', type=int, help='Number of epochs before weights are reset.', default = -1 )
         parser.add_argument('--neuron.validation_len', type=int, help='Number of tokens to holdout for phrase validation beyond sequence context.', default=8)
-        parser.add_argument('--neuron.prune_len', type=int, help='Number of tokens to prune from each validation input sequence.', default=1)
+        parser.add_argument('--neuron.prune_len', type=int, help='Number of tokens to prune from each validation input sequence.  (default value: -1, pulling from subtensor directly)', default=-1)
         parser.add_argument('--neuron.device', type=str, help='miner default training device cpu/cuda', default=("cuda" if torch.cuda.is_available() else "cpu"))
         parser.add_argument('--neuron.clip_gradients', type=float, help='Implement gradient clipping to avoid exploding loss on smaller architectures.', default=1.0 )
         parser.add_argument('--neuron.track_hotkey_changes', action='store_true', help='If True, track hotkey changes.', default=False)
@@ -375,7 +375,9 @@ def run_epoch( self ):
         batch_size = self.subtensor.validator_batch_size(netuid=self.config.netuid)
         sequence_length = self.subtensor.validator_sequence_length(netuid=self.config.netuid)
         validation_len = self.config.neuron.validation_len  # Number of tokens to holdout for phrase validation beyond sequence context
-        prune_len = self.config.neuron.prune_len  # Number of tokens to holdout for phrase validation beyond sequence context
+        # Number of tokens to prune for phrase validation beyond sequence context
+        prune_len = self.config.neuron.prune_len = self.subtensor.validator_prune_len(netuid=self.config.netuid)
+        self.config.nucleus.logits_divergence = self.subtensor.validator_logits_divergence(netuid=self.config.netuid)
         min_allowed_weights = self.subtensor.min_allowed_weights(netuid=self.config.netuid)
         max_weight_limit = self.subtensor.max_weight_limit(netuid=self.config.netuid)
         blocks_per_epoch = self.subtensor.validator_epoch_length(netuid=self.config.netuid) if self.config.neuron.blocks_per_epoch == -1 else self.config.neuron.blocks_per_epoch
@@ -657,7 +659,7 @@ def neuron_stats_update(self, neuron_stats: Dict[int, Dict[str, Any]]):
 
                 if 'logits_excess_nxt' in stats:
                     # penalize by logits divergence excess
-                    extra_stats['shapley_values_nxt'] /= 1 + stats['logits_excess_nxt']
+                    extra_stats['shapley_values_nxt'] /= 1 + self.config.nucleus.logits_divergence * stats['logits_excess_nxt']
 
             # === EMA zeroing update ===
             # Push zero into EMA for synapse_keys to exponentially decay weighting keys if neuron non-responsive
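
Note on the hunk above: the Shapley value is now divided by `1 + logits_divergence * logits_excess` rather than `1 + logits_excess`, so the chain parameter scales how strongly an anomalous response is penalized. A minimal sketch of that formula on plain floats (the function name is hypothetical; the validator applies it in place to `extra_stats['shapley_values_nxt']`):

```python
def penalized_shapley(shapley_value: float, logits_excess: float, logits_divergence: float) -> float:
    """Scale down a Shapley value by the weighted logits divergence excess.

    Hypothetical standalone version of the in-place division in the hunk above.
    """
    return shapley_value / (1 + logits_divergence * logits_excess)


# A divergence weight of 0 disables the penalty; a weight of 1 matches the old formula.
no_penalty = penalized_shapley(1.0, logits_excess=3.0, logits_divergence=0.0)
old_behaviour = penalized_shapley(1.0, logits_excess=3.0, logits_divergence=1.0)
half_weight = penalized_shapley(1.0, logits_excess=3.0, logits_divergence=0.5)
```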
@@ -750,6 +752,7 @@ def __init__( self, config, device, subtensor, vlogger ):
         super(nucleus, self).__init__()
         self.config = config
         self.vlogger = vlogger
+        self.config.nucleus.logits_divergence = subtensor.validator_logits_divergence(netuid=self.config.netuid) if self.config.nucleus.logits_divergence == -1 else self.config.nucleus.logits_divergence
         self.config.nucleus.scaling_law_power = subtensor.scaling_law_power(netuid=self.config.netuid) if self.config.nucleus.scaling_law_power == -1 else self.config.nucleus.scaling_law_power
         self.config.nucleus.synergy_scaling_law_power = subtensor.synergy_scaling_law_power(netuid=self.config.netuid) if self.config.nucleus.synergy_scaling_law_power == -1 else self.config.nucleus.synergy_scaling_law_power
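
Note on the hunk above: the three `nucleus` parameters share one pattern, where a CLI value of `-1` is a sentinel meaning "use the chain value". A minimal sketch of that pattern, assuming a hypothetical `fetch_from_chain` callable in place of the `subtensor` query:

```python
from typing import Callable


def resolve_hyperparameter(cli_value: float, fetch_from_chain: Callable[[], float]) -> float:
    """Return the CLI override unless it is the -1 sentinel, else query the chain.

    `fetch_from_chain` is a stand-in for calls such as
    subtensor.validator_logits_divergence(netuid=...).
    """
    return fetch_from_chain() if cli_value == -1 else cli_value


from_chain = resolve_hyperparameter(-1, lambda: 0.2)   # sentinel: chain value wins
overridden = resolve_hyperparameter(0.5, lambda: 0.2)  # explicit CLI value wins
```

The `run_epoch` hunk in this same commit assigns `logits_divergence` from the chain unconditionally, so a CLI override appears to apply only until the first epoch starts.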
 
@@ -799,6 +802,7 @@ def add_args( cls, parser ):
         parser.add_argument('--nucleus.no_dendrite_backward', action='store_true', help='Pass backward request to the server side or not', default=False )
         parser.add_argument('--nucleus.scaling_law_power', type=float, help='Power for modified scaling law, powered down to improve dynamic range, e.g. 3 → 6 nats for 0.5. (default value: -1, pulling from subtensor directly)', default=-1)
         parser.add_argument('--nucleus.synergy_scaling_law_power', type=float, help='Power for synergy modified scaling law, powered down to improve dynamic range, e.g. 3 → 6 nats for 0.5. (default value: -1, pulling from subtensor directly)', default=-1)
+        parser.add_argument('--nucleus.logits_divergence', type=float, help='The divergence value for logit anomaly detection (default value: -1, pulling from subtensor directly)', default=-1)
 
     @classmethod
     def config ( cls ):
@@ -910,7 +914,7 @@ def forward(
         num_endpoints = len(random_endpoints)  # in case len(self.permute_uids) < num_endpoints during random_uids select
 
         logger.info(f'Forward \t| Routing forward <dim>[{time.time() - start_time:.3g}s]</dim>')
-        logger.info(f'Dendrite \t| Request {num_endpoints} x {list(inputs_seq.shape)}')
+        logger.info(f'Dendrite \t| Request {num_endpoints} x {list(inputs_seq.shape)} (prune_len={prune_len})')
         request_start_time = time.time()
 
         # === Define which synapse we want to use ===
@@ -951,6 +955,7 @@ def forward(
                     f'<dim>[{time.time() - request_start_time:.3g}s]</dim>')
 
         # === Prepare validation parameter set ===
+        console_width = self.config.get('width', None)  # console width for rich table displays of synapse measures
         validation_params = {
             'uids': random_uids, 
             'query_responses': query_responses, 
@@ -960,6 +965,7 @@ def forward(
             'inputs': inputs, 
             'validation_len': val_len, 
             'loss_fct': self.loss_fct,
+            'logits_divergence_penalty':self.config.nucleus.logits_divergence,
             'scaling_law_power': self.config.nucleus.scaling_law_power, 
             'synergy_scaling_law_power': self.config.nucleus.synergy_scaling_law_power,
             'vlogger': self.vlogger,
@@ -991,9 +997,9 @@ def scaling_law_loss_to_params(loss):
 
 def textcausallm(uids: torch.Tensor, query_responses: List[List[torch.FloatTensor]], return_ops: List[torch.LongTensor],
                  times: List[torch.FloatTensor], routing_score: torch.FloatTensor,
-                 inputs: torch.FloatTensor, validation_len: int, loss_fct: Callable,
+                 inputs: torch.FloatTensor, validation_len: int, loss_fct: Callable,                 
                  scaling_law_power: float, synergy_scaling_law_power: float, vlogger: ValidatorLogger,
-                 logging, synapse: 'bittensor.TextCausalLM' = None, index_s: int = 0
+                 logits_divergence_penalty: float,logging, synapse: 'bittensor.TextCausalLM' = None, index_s: int = 0
                  ) -> Tuple[torch.FloatTensor, Dict]:
     r"""
     Calculate Shapley values and neuron response validation measure statistics, given TextCausalLM synapse responses.
@@ -1019,6 +1025,8 @@ def textcausallm(uids: torch.Tensor, query_responses: List[List[torch.FloatTenso
                 Power for modified scaling law, powered down to improve dynamic range, e.g. 3 → 6 nats for 0.5.
             synergy_scaling_law_power (:obj:`float`, `required`):
                 Power for synergy modified scaling law, powered down to improve dynamic range, e.g. 3 → 6 nats for 0.5.
+            logits_divergence_penalty (:obj:`float`, `required`):
+                Penalty scaling for logits divergence.
             vlogger (:obj:`ValidatorLogger`, `required`):
                 Logger for validator.
             logging (:obj:`bool`, `required`):
@@ -1069,7 +1077,7 @@ def _synergy(first, second, target, _ext):
     loss, stats, unsuccessful = shapley_base(uids, query_responses, return_ops, times, routing_score,
                                              _base_params, index_s, ext='')
 
-    logger.info(f'{str(synapse)} \t| Shapley base values (power={scaling_law_power:.1f})'
+    logger.info(f'{str(synapse)} \t| Shapley base values (power={scaling_law_power:.1f}) '
                 f'<dim>[{time.time() - shapley_start_time:.3g}s]</dim>')
 
     synergy_start_time = time.time()
@@ -1096,7 +1104,7 @@ def _synergy(first, second, target, _ext):
             if hasattr(s[key], 'item'):
                 s[key] = s[key].item()
 
-    logger.info(f'{str(synapse)} \t| Shapley synergy values (power={synergy_scaling_law_power:.1f})'
+    logger.info(f'{str(synapse)} \t| Shapley synergy values (power={synergy_scaling_law_power:.1f}) '
                 f'<dim>[{time.time() - synergy_start_time:.3g}s]</dim>')
 
     if logging:
@@ -1117,9 +1125,9 @@ def _synergy(first, second, target, _ext):
 
 def textcausallmnext(uids: torch.Tensor, query_responses: List[List[torch.FloatTensor]], return_ops: List[torch.LongTensor],
                      times: List[torch.FloatTensor], routing_score: torch.FloatTensor,
-                     inputs: torch.FloatTensor, validation_len: int, loss_fct: Callable,
+                     inputs: torch.FloatTensor, validation_len: int, loss_fct: Callable,                     
                      scaling_law_power: float, synergy_scaling_law_power: float, vlogger:ValidatorLogger,
-                     logging, synapse: 'bittensor.TextCausalLMNext' = None, index_s: int = 0
+                     logits_divergence_penalty: float,logging, synapse: 'bittensor.TextCausalLMNext' = None, index_s: int = 0
                      ) -> Tuple[torch.FloatTensor, Dict]:
     r"""
     Calculate Shapley values and neuron response validation measure statistics, given TextCausalLMNext synapse responses.
@@ -1145,6 +1153,8 @@ def textcausallmnext(uids: torch.Tensor, query_responses: List[List[torch.FloatT
                 Power for modified scaling law, powered down to improve dynamic range, e.g. 3 → 6 nats for 0.5.
             synergy_scaling_law_power (:obj:`float`, `required`):
                 Power for synergy modified scaling law, powered down to improve dynamic range, e.g. 3 → 6 nats for 0.5.
+            logits_divergence_penalty (:obj:`float`, `required`):
+                Penalty scaling for logits divergence.
             vlogger (:obj:`ValidatorLogger`, `required`):
                 Logger for validator.
             logging (:obj:`bool`, `required`):
@@ -1183,17 +1193,18 @@ def _synergy(first, second, target, ext):
     shapley_start_time = time.time()
     loss, stats, unsuccessful = shapley_base(uids, query_responses, return_ops, times, routing_score,
                                              _base_params, index_s, ext='_nxt')
-    logger.info(f'{str(synapse)} \t| Shapley base values (power={scaling_law_power:.1f})'
+    logger.info(f'{str(synapse)} \t| Shapley base values (power={scaling_law_power:.1f}) '
                 f'<dim>[{time.time() - shapley_start_time:.3g}s]</dim>')
 
     divergence_start_time = time.time()
     with torch.no_grad():
         logits_divergence(stats, uids, query_responses, return_ops, times, index_s, ext='_nxt')
-    logger.info(f'{str(synapse)} \t| Logits divergences <dim>[{time.time() - divergence_start_time:.3g}s]</dim>')
+    logger.info(f'{str(synapse)} \t| Logits divergences (penalty={logits_divergence_penalty}) '
+                f'<dim>[{time.time() - divergence_start_time:.3g}s]</dim>')
 
     synergy_start_time = time.time()
     syn_loss_diff = shapley_synergy(stats, _synergy, '_nxt', scaling_law_power=synergy_scaling_law_power)
-    logger.info(f'{str(synapse)} \t| Shapley synergy values (power={synergy_scaling_law_power:.1f})'
+    logger.info(f'{str(synapse)} \t| Shapley synergy values (power={synergy_scaling_law_power:.1f}) '
                 f'<dim>[{time.time() - synergy_start_time:.3g}s]</dim>')
 
     # === Shapley value combination ===

diff --git a/bittensor/_subtensor/subtensor_impl.py b/bittensor/_subtensor/subtensor_impl.py
@@ -355,6 +355,16 @@ def validator_batch_size (self, netuid: int, block: Optional[int] = None ) -> Op
         if not self.subnet_exists( netuid ): return None
         return self.query_paratensor("ValidatorBatchSize", block, [netuid] ).value
 
+    """ Returns network ValidatorPruneLen hyper parameter """
+    def validator_prune_len (self, netuid: int, block: Optional[int] = None ) -> Optional[int]:
+        if not self.subnet_exists( netuid ): return None
+        return self.query_paratensor("ValidatorPruneLen", block, [netuid] ).value
+
+    """ Returns network ValidatorLogitsDivergence hyper parameter """
+    def validator_logits_divergence (self, netuid: int, block: Optional[int] = None ) -> Optional[float]:
+        if not self.subnet_exists( netuid ): return None
+        return self.query_paratensor("ValidatorLogitsDivergence", block, [netuid] ).value/U64_MAX
+
     """ Returns network ValidatorSequenceLength hyper parameter """
     def validator_sequence_length (self, netuid: int, block: Optional[int] = None ) -> Optional[int]:
         if not self.subnet_exists( netuid ): return None
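
Note on `validator_logits_divergence` above: the chain stores the parameter as an integer, and dividing by `U64_MAX` maps it to a float. A minimal sketch of that normalization, assuming `U64_MAX` is the largest unsigned 64-bit integer (the helper name is hypothetical):

```python
U64_MAX = 2**64 - 1  # assumed to match the U64_MAX constant used in the diff


def normalize_u64(raw: int) -> float:
    """Map a raw u64 chain value onto the interval [0, 1]."""
    return raw / U64_MAX


lowest = normalize_u64(0)
highest = normalize_u64(U64_MAX)
midpoint = normalize_u64(U64_MAX // 2)  # approximately 0.5
```

Since the result is a float, callers should not rely on the `int` return annotation that the method carries in this commit.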