Add draft text (current version) for methods and results/discussion of F&F section (#149)

* Add experiment with baseline SNN with 250Hz input + usual delays allowed + num_classes = 6
* Add methods and results/discussion drafts of F&F section

1 parent a8c542e, commit 1f7a515

Showing 3 changed files with 60 additions and 3 deletions.
# Introduction

Studies of biologically plausible computation in ANNs have typically focused on network-level constraints, such as feedback connectivity and learning rules (Lee et al., 2015; Lillicrap et al., 2016; Nøkland, 2016; Bengio et al., 2016). However, recent research points towards the critical importance of finer-grained biological properties, such as membrane time constants and cell morphologies (e.g., Perez-Nieves et al., 2021). Here, inspired by foundational work on synaptic integration in single neurons, we investigate the contributions of dendritic filtering to temporal processing in a spiking neural network (SNN) model.

In the brain, synaptic currents flow through a neuron's dendrites before reaching the soma, fundamentally shaping the postsynaptic potentials (PSPs) that drive action potential output. Dendritic properties such as diameter, length, branching patterns and ion channel distributions lead to diverse filtering effects. Moreover, input location in the dendritic tree influences the degree of attenuation and delay of PSPs -- distal inputs lead to smaller and broader voltage waveforms at the soma than proximal inputs. Dendritic filtering thus endows single biological neurons with processing capabilities far beyond those of the simplified `point neurons' that dominate in network models. A wealth of experimental and theoretical research indeed supports the importance of dendrites for neural computation (London \& Häusser, 2005; Bicknell \& Häusser, 2021), including in auditory coincidence detection for sound localisation in mammals and birds (Agmon-Snir et al., 1998).

The SNN model in our study is trained to detect and exploit small phase differences between two concurrently received signals, representing sound waves arriving at different ears. While the computation can be performed if the network is seeded with random axonal delays, we hypothesized that this requirement could be discarded in the presence of dendritic filtering. Because synaptic input at different locations on a dendritic tree produces different PSP waveforms, randomly located synapses could serve to seed the network with a broad range of signal delays. Further, inspired by the `Filter-and-Fire' neuron model of Beniaguev et al. (2022), and assuming multiple synaptic connections between pairs of neurons (as in the brain), we test whether such synaptic signal delays can also be effectively tuned simply by adjusting synaptic weights.
# Methods [WIP]

We followed an incremental methodology to introduce our adaptation of the F&F neuron model into the baseline SNN: we devised three Experiments of increasing complexity. In all Experiments, we endowed single neurons in the hidden layer of the SNN with synaptic dynamics (rise and decay time constants), in order to obtain synaptic currents from the input spike trains that were originally fed directly into the baseline SNN.

The synaptic current to a single hidden neuron (e.g., the $j$-th hidden neuron) receiving inputs from $N$ presynaptic (input) neurons, with $M$ connections between each input neuron and the $j$-th hidden neuron, at timestep $t$ of the simulation, is described in Equation (1):

$$g_{j}(t) = \sum_{i=1}^{N} \sum_{k=1}^{M} \left(x_{i} \ast kernel_{i,k,j}\right)(t) \; W_{i,k,j}$$

where:

- $x_{i}$ is the spike train of the $i$-th input neuron, and $W_{i,k,j}$ is the weight of the corresponding synapse.
- $M = 1$ in **Experiment 1** and **Experiment 2**; $M$ took discrete values in $\{2, 3, 4, 5\}$ in **Experiment 3**.
- $kernel_{i,k,j}$ is the synaptic kernel of the synapse formed by the $k$-th connection from the $i$-th input neuron to the $j$-th hidden neuron.
- Implementation note: the kernels were computed and stored for each synapse before the filtering, and the weight matrix between the input and hidden layers was a $(N \cdot M \cdot H, H)$ matrix, where $N \cdot M \cdot H$ is the total number of synapses across the $N$ input neurons and $H$ hidden neurons.

The synaptic kernel for a synapse $syn$ is defined in Equation (2) as the difference of two exponentials (adapted from Roth & van Rossum, 2009):

$$kernel_{syn}(t) = e^{-t/\tau_{decay}} - e^{-t/\tau_{rise}}$$

where $t$ ranges over the kernel window $window\_size$, which spanned the entire simulation ($1000$ timesteps of $DT = 0.1$ ms).

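For illustration, a minimal NumPy sketch of this kernel (not the repository's implementation; the timestep and window length follow Table 1) might look like:

```python
import numpy as np

DT = 0.1        # simulation timestep in ms (Table 1)
N_STEPS = 1000  # kernel window: the full simulation window

def synaptic_kernel(tau_rise, tau_decay, dt=DT, n_steps=N_STEPS):
    """Difference-of-exponentials kernel: exp(-t/tau_decay) - exp(-t/tau_rise)."""
    t = np.arange(n_steps) * dt  # time since the presynaptic spike, in ms
    return np.exp(-t / tau_decay) - np.exp(-t / tau_rise)

# Example: one of the fixed kernels explored in Experiment 1
kernel = synaptic_kernel(tau_rise=0.2, tau_decay=0.4)
```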
In **Experiment 1**, $\tau_{decay}$ and $\tau_{rise}$ were fixed across all synapses (the same kernel was used for every synapse), with $\tau_{decay}$ ranging from 0.4 to 2 ms and $\tau_{rise}$ ranging from 0.2 to 1.8 ms. These ranges were empirically motivated by a diverse literature on auditory modelling and experimental research in animals (e.g., see the review in Trussell, 1997). In **Experiments 2** and **3**, $\tau_{decay}$ and $\tau_{rise}$ were randomly sampled for each synapse such that the kernel peak times were roughly uniformly distributed in a fixed real-valued range of $[0, 1]$ ms: this approximate range is motivated by the physiological range of human interaural time differences (ITDs) for a single frequency of $250$ Hz. The peak time of a synaptic kernel (i.e., the time at which the synaptic conductance peaks) is given by Equation (3) (Roth & van Rossum, 2009):

$$t_{peak} = t_{0} + \frac{\tau_{decay}\,\tau_{rise}}{\tau_{decay} - \tau_{rise}} \ln\!\left(\frac{\tau_{decay}}{\tau_{rise}}\right)$$

where $t_{0}$ is the time of the presynaptic spike.

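One simple way to realise this sampling (a hypothetical sketch; the repository may use a different procedure) is to draw target peak times uniformly in the desired range and solve Equation (3) for $\tau_{decay}$:

```python
import numpy as np
from scipy.optimize import brentq

def peak_time(tau_rise, tau_decay):
    """Kernel peak time relative to the presynaptic spike (Equation (3))."""
    return (tau_decay * tau_rise / (tau_decay - tau_rise)) * np.log(tau_decay / tau_rise)

def sample_taus(n_synapses, t_peak_max=1.0, rng=None):
    """Draw (tau_rise, tau_decay) pairs with peak times roughly uniform in (0, t_peak_max] ms.

    Hypothetical scheme: sample a target peak time, sample tau_rise below it,
    then solve peak_time(tau_rise, tau_decay) = target for tau_decay.
    """
    rng = np.random.default_rng() if rng is None else rng
    taus = np.empty((n_synapses, 2))
    for n in range(n_synapses):
        target = rng.uniform(0.05, t_peak_max)     # desired peak time (ms)
        tau_rise = rng.uniform(0.2, 0.9) * target  # peak time always exceeds tau_rise
        f = lambda tau_decay: peak_time(tau_rise, tau_decay) - target
        taus[n] = (tau_rise, brentq(f, tau_rise * 1.001, 1000.0))
    return taus
```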
**Experiment 2** therefore acts as an extension of **Experiment 1**, implementing dendritic filtering with heterogeneous synaptic dynamics across synapses. Importantly, **Experiment 3** extends the dendritic filtering implemented in **Experiment 2** by allowing each hidden F&F neuron to be contacted multiple ($M > 1$) times by each input neuron.

The operations above assume a batch size of 1 (i.e., only one training sample is considered). During training, we used minibatch stochastic gradient descent ($batch\_size = 64$), so these computations were vectorised across all examples of a given batch.

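For reference, a minimal, unbatched NumPy sketch of the filtering in Equation (1) (illustrative only; the actual implementation vectorises these loops and processes whole minibatches) might look like:

```python
import numpy as np

def synaptic_current(x, kernels, W):
    """Compute g_j(t) for one hidden neuron j (Equation (1)), one sample at a time.

    x       : (N, T) array of binary input spike trains
    kernels : (N, M, T) array of precomputed synaptic kernels for neuron j
    W       : (N, M) array of synaptic weights for neuron j
    Returns a (T,) array holding the synaptic current at each timestep.
    """
    N, T = x.shape
    _, M, _ = kernels.shape
    g = np.zeros(T)
    for i in range(N):          # presynaptic (input) neurons
        for k in range(M):      # connections per axon
            # causal convolution of the spike train with this synapse's kernel
            filtered = np.convolve(x[i], kernels[i, k])[:T]
            g += W[i, k] * filtered
    return g
```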
The main hyperparameters are reported in Table 1.

| **Hyperparameter** | **Name** | **Value** |
|---------------------|----------|-----------|
| sound stimulus duration | duration | 100 ms |
| sound stimulus frequency | F | 250 Hz |
| timestep | DT | 0.1 ms |
| number of input neurons | input\_size | 200 (100 neurons per "ear") |
| number of hidden neurons | num\_hidden | 5 |
| number of output neurons (bins of IPDs) | num\_classes | 6 |
| membrane time constant (hidden neurons) | tau\_hidden | 2 ms |
| membrane time constant (output neurons) | tau\_output | 20 ms |
| connection drop probability (Exp 3 only) | drop\_probability | [0, 0.2, 0.5, 0.8] |
| connections per axon between each input neuron and hidden neuron | connections\_per\_axon | 1 for Exps 1-2; [2, 3, 4, 5] for Exp 3 |
| batch size | batch\_size | 64 |
| number of training batches | n\_training\_batches | 64 |
| number of testing batches | n\_testing\_batches | 64 |
| number of training epochs | num\_epochs | 50 |
| learning rate | lr | [0.0001, 0.0004, 0.0007, 0.001, 0.004, 0.007, 0.01, 0.04, 0.07, 0.1] for Exps 1-2; [0.1, 0.01, 0.001, 0.0001] for Exp 3 |
| hyperparameter for surrogate gradient descent | BETA | 5 |

**Table 1:** Main Hyperparameters

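For convenience, the same settings can be gathered in a plain Python dictionary (names follow Table 1; this is an illustrative summary, not the repository's configuration file):

```python
# Hyperparameters from Table 1 (times in ms); list-valued entries are the
# grids searched over, not single runs.
config = {
    "duration": 100.0,                      # sound stimulus duration
    "F": 250,                               # sound stimulus frequency (Hz)
    "DT": 0.1,                              # simulation timestep
    "input_size": 200,                      # 100 input neurons per "ear"
    "num_hidden": 5,
    "num_classes": 6,                       # IPD bins
    "tau_hidden": 2.0,
    "tau_output": 20.0,
    "drop_probability": [0, 0.2, 0.5, 0.8],         # Experiment 3 only
    "connections_per_axon": [2, 3, 4, 5],           # 1 in Experiments 1-2
    "batch_size": 64,
    "n_training_batches": 64,
    "n_testing_batches": 64,
    "num_epochs": 50,
    "lr": [1e-4, 4e-4, 7e-4, 1e-3, 4e-3, 7e-3, 1e-2, 4e-2, 7e-2, 1e-1],  # Exps 1-2
    "BETA": 5,                              # surrogate gradient hyperparameter
}
```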
paper/sections/filter_and_fire_model/results_and_discussion.md (8 changes: 7 additions & 1 deletion)

# Results & Discussion [WIP]

First, different hyperparameter configurations were investigated for the **baseline SNN model** in order to establish a baseline performance for comparison. The baseline SNN was trained and tested in three main settings: **(1)** phase delays are used at input spike generation for both training and testing; **(2)** no phase delays are used at input spike generation for either training or testing; **(3)** the model is trained on input with phase delays but tested on input without phase delays. The third setting is particularly interesting, as it provides a means of checking whether the model makes use of the available delay information to solve the task. Accuracy results for settings **(1)**, **(2)** and **(3)** are shown in Table X.

In the original setting, **setting (1)**, the performance of the SNN is consistently above chance level and generalizable (provided a sensible learning rate), with the model exhibiting high classification accuracy on both training and test sets. The same observation can be made in **setting (2)** for certain values of the training learning rate, suggesting that the baseline SNN can learn to solve the task without information about the phase delays between the sound signals arriving at the two ears (left ear - right ear). Does such a model make use of that information at all to solve the task? Results from **setting (3)** confirm that, when available, information about phase delays is used by the model to solve the task: the drop in accuracy from training to testing is consistently larger in this setting than in the previous ones. We reasoned that this drop in performance might become less noticeable, or even disappear, as the network's capacity increases. To test this hypothesis (**setting (4)**), we ran the same configuration as in **setting (3)** but with a varying number of hidden neurons ($num\_hidden$; see results for **setting (4)** in Table X). These results suggest that increasing the capacity of the network does not substantially improve its generalisability when the network has been trained with delay information that is unavailable at test time.

**Experiment 1** revealed that the high classification accuracy of the baseline SNN can be maintained in certain configurations (e.g., depending on the learning rate or membrane time constants) when simple synaptic dynamics are introduced. That is, in this fixed synaptic dynamics setup, specific rise and decay time constant pairings led to no drop in task accuracy compared to the best performance reported for the baseline SNN. Importantly, this observation was made in two separate settings: **(1)** where the phase delays present in the input spike trains are kept (see Figure X; e.g., $\tau_{rise} = 1.4$, $\tau_{decay} = 1.6$); and **(2)** where those phase delays were completely ignored during input dataset generation (see Figure X; e.g., $\tau_{rise} = 1.0$, $\tau_{decay} = 1.6$). Still, an overall decrease in training and test classification accuracy across rise and decay time constant pairings can be noted in the absence of delay information.

We hypothesized that the setup introduced in **Experiment 2** could prevent this decrease in performance in the absence of delay information, thanks to the temporal dynamics introduced by the heterogeneity of the rise and decay time constants across "synapses" (i.e., connections in the neural network). Table X summarizes the training and test accuracy results obtained for 5 random seeds per experiment; an experiment is defined by the learning rate used (10 values tested: see Table 1) and by whether the delays are used during both training and testing (Boolean value). The network's performance varied substantially across random seeds for the same configuration. The choice of learning rate significantly affected the seed-averaged classification accuracies, and its impact on task performance also depended on whether delays were allowed during training and test set generation. Overall, in both delay settings, the performance of the network was lower than in previous experiments, including **Experiment 1** without delays and the **baseline SNN experiments**. Still, we observed that the network's performance with such heterogeneous synaptic dynamics did not consistently or strongly change with the removal of delay information. This observation suggests that **(1)** the temporal dynamics introduced with synaptic filtering may partly compensate for the temporal information conveyed by signal delays; and **(2)** the overall drop in performance in **Experiment 2** is due to the loss of temporal precision in the information propagated through the network (given the synaptic filtering applied).