diff --git a/.gitignore b/.gitignore index 79830d6..1c6d0f0 100644 --- a/.gitignore +++ b/.gitignore @@ -3,6 +3,8 @@ __pycache__/ *.py[cod] *$py.class +*/.DS_Store + # C extensions *.so diff --git a/README.md b/README.md index eb2ae47..dab2ebc 100644 --- a/README.md +++ b/README.md @@ -47,14 +47,15 @@ Reinforcement Learning (Henderson et al., 2018) and Computer Vision (Borji, 2017 To help mitigate this problem, this package supplies fully-tested re-implementations of useful functions for significance testing: -* Statistical Significance tests such as Almost Stochastic Order (Dror et al., 2019), bootstrap (Efron & Tibshirani, 1994) and - permutation-randomization (Noreen, 1989). +* Statistical Significance tests such as Almost Stochastic Order (del Barrio et al, 2017; Dror et al., 2019), + bootstrap (Efron & Tibshirani, 1994) and permutation-randomization (Noreen, 1989). * Bonferroni correction methods for multiplicity in datasets (Bonferroni, 1936). * Bootstrap power analysis (Yuan & Hayashi, 2003) and other functions to determine the right sample size. All functions are fully tested and also compatible with common deep learning data structures, such as PyTorch / Tensorflow tensors as well as NumPy and Jax arrays. For examples about the usage, consult the documentation -[here](https://deep-significance.readthedocs.io/en/latest/) or the scenarios in the section [Examples](#examples). +[here](https://deep-significance.readthedocs.io/en/latest/) , the scenarios in the section [Examples](#examples) or +the [demo Jupyter notebook](https://github.com/Kaleidophon/deep-significance/tree/main/paper/deep-significance%20demo.ipynb). ## :inbox_tray: Installation @@ -74,46 +75,51 @@ Another option is to clone the repository and install the package locally: --- **tl;dr**: Use `aso()` to compare scores for two models. If the returned `eps_min < 0.5`, A is better than B. The lower -`eps_min`, the more confident the result. +`eps_min`, the more confident the result (we recommend to check `eps_min < 0.2` and record `eps_min` alongside +experimental results). :warning: Testing models with only one set of hyperparameters and only one test set will be able to guarantee superiority in all settings. See [General Recommendations & other notes](#general-recommendations). --- -In the following, I will lay out three scenarios that describe common use cases for ML practitioners and how to apply +In the following, we will lay out three scenarios that describe common use cases for ML practitioners and how to apply the methods implemented in this package accordingly. For an introduction into statistical hypothesis testing, please refer to resources such as [this blog post](https://machinelearningmastery.com/statistical-hypothesis-tests/) for a general overview or [Dror et al. (2018)](https://www.aclweb.org/anthology/P18-1128.pdf) for a NLP-specific point of view. -In general, in statistical significance testing, we usually compare two algorithms and on a dataset using -some evaluation metric (we assume a higher = better). The difference between the two algorithms on the -data is then defined as - -
- -where is our test statistic. We then test the following **null hypothesis**: - - - -Thus, we assume our algorithm A to be equally as good or worse than algorithm B and reject the null hypothesis if A -is better than B (what we actually would like to see). Most statistical significance tests operate using -*p-values*, which define the probability that under the null-hypothesis, the expected by the test is larger than or -equal to the observed difference (that is, for a one-sided test, i.e. we assume A to be better than B): - - - -We can interpret this equation as follows: Assuming that A is *not* better than B, the test assumes a corresponding distribution -of differences that is drawn from. How does our actually observed difference fit in there? -This is what the p-value is expressing: If this probability is high, is in line with what we expected under -the null hypothesis, so we conclude A not to better than B. If the -probability is low, that means that is quite unlikely under the null hypothesis and that the reverse -case is more likely - i.e. that it is -likely *larger* than - and we conclude that A is indeed better than B. Note that **the p-value does not -express whether the null hypothesis is true**. - -To decide when we trust A to be better than B, we set a threshold that will determine when the p-value is small enough -for us to reject the null hypothesis, this is called the significance level and it is often set to be 0.05. +We assume that we have two sets of scores we would like to compare, and , +for instance obtained by running two models and multiple times with a different random seed. +We can then define a one-sided test statistic based on the gathered observations. +An example of such test statistics is for instance the difference in observation means. We then formulate the following null-hypothesis: + + + +That means that we actually assume the opposite of our desired case, namely that is not better than , +but equally as good or worse, as indicated by the value of the test statistic. +Usually, the goal becomes to reject this null hypothesis using the SST. +*p*-value testing is a frequentist method in the realm of SST. +It introduces the notion of data that *could have been observed* if we were to repeat our experiment again using +the same conditions, which we will write with superscript in order to distinguish them from our actually +observed scores (Gelman et al., 2021). +We then define the *p*-value as the probability that, under the null hypothesis, the test statistic using replicated +observation is larger than or equal to the *observed* test statistic: + + + +We can interpret this expression as follows: Assuming that is not better than , the test +assumes a corresponding distribution of statistics that is drawn from. So how does the observed test statistic + fit in here? This is what the -value expresses: When the +probability is high, is in line with what we expected under the +null hypothesis, so we can *not* reject the null hypothesis, or in other words, we \emph{cannot} conclude + to be better than . If the probability is low, that means that the observed + is quite unlikely under the null hypothesis and that the reverse case is +more likely - i.e. that it is likely larger than - and we conclude that is indeed better than +. Note that **the -value does not express whether the null hypothesis is true**. 
To make our decision +about whether or not to reject the null hypothesis, we typically determine a threshold - the significance level +, often set to 0.05 - that the *p*-value has to fall below. However, it has been argued that a better practice +involves reporting the *p*-value alongside the results without a pidgeonholing of results into significant and non-significant +(Wasserstein et al., 2019). ### Intermezzo: Almost Stochastic Order - a better significance test for Deep Neural Networks @@ -121,8 +127,8 @@ for us to reject the null hypothesis, this is called the significance level , we know the algorithm A to score higher. However, in practice these cases are rarely so clear-cut (imagine e.g. two normal distributions with the same mean but different variances). -For this reason, Dror et al. (2019) consider the notion of *almost stochastic dominance* by quantifying the extent to -which stochastic order is being violated (red area): +For this reason, del Barrio et al. (2017) and Dror et al. (2019) consider the notion of *almost stochastic dominance* +by quantifying the extent to which stochastic order is being violated (red area): ![](img/aso.png) -ASO returns a value , which expresses the amount of violation of stochastic order. If -, A is stochastically dominant over B in more cases than vice versa, then the corresponding algorithm can be declared as +ASO returns a value , which expresses (an upper bound to) the amount of violation of stochastic order. If + (where \tau is 0.5 or less), A is stochastically dominant over B in more cases than vice versa, then the corresponding algorithm can be declared as superior. We can also interpret as a *confidence score*. The lower it is, the more sure we can be that A is better than B. Note: **ASO does not compute p-values.** Instead, the null hypothesis formulated as - + -If we want to be more confident about the result of ASO, we can also set the rejection threshold to be lower than 0.5. +If we want to be more confident about the result of ASO, we can also set the rejection threshold to be lower than 0.5 +(see the discussion in [this section](#general-recommendations)). Furthermore, the significance level is determined as an input argument when running ASO and actively influence the resulting . @@ -159,12 +166,15 @@ We can now simply apply the ASO test: import numpy as np from deepsig import aso +seed = 1234 +np.random.seed(seed) + # Simulate scores N = 5 # Number of random seeds my_model_scores = np.random.normal(loc=0.9, scale=0.8, size=N) baseline_scores = np.random.normal(loc=0, scale=1, size=N) -min_eps = aso(my_model_scores, baseline_scores) # min_eps = 0.0, so A is better +min_eps = aso(my_model_scores, baseline_scores, seed=seed) # min_eps = 0.225, so A is better ``` Note that ASO **does not make any assumptions about the distributions of the scores**. 
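To make the *p*-value definition from the beginning of this section more concrete, here is a minimal plain-NumPy sketch (deliberately not using `deepsig`'s own permutation-randomization test, whose exact interface we do not assume here) that computes a one-sided permutation *p*-value for the difference-in-means statistic:

```python
# Plain-NumPy sketch, for illustration only: a one-sided permutation p-value for the
# difference-in-means statistic. The p-value is the share of permutations whose
# replicated statistic is at least as large as the observed one.
import numpy as np

rng = np.random.default_rng(1234)
my_model_scores = rng.normal(loc=0.9, scale=0.8, size=5)
baseline_scores = rng.normal(loc=0.0, scale=1.0, size=5)

observed = my_model_scores.mean() - baseline_scores.mean()
pooled = np.concatenate([my_model_scores, baseline_scores])
n_a = len(my_model_scores)

num_resamples = 10_000
replicated = np.empty(num_resamples)
for i in range(num_resamples):
    permuted = rng.permutation(pooled)
    replicated[i] = permuted[:n_a].mean() - permuted[n_a:].mean()

p_value = np.mean(replicated >= observed)  # Small p-value -> evidence against H0: A is not better than B
```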
@@ -185,6 +195,9 @@ which corresponds to the Bonferroni correction (Bonferroni et al., 1936): import numpy as np from deepsig import aso +seed = 1234 +np.random.seed(seed) + # Simulate scores for three datasets M = 3 # Number of datasets N = 5 # Number of random seeds @@ -192,8 +205,8 @@ my_model_scores_per_dataset = [np.random.normal(loc=0.3, scale=0.8, size=N) for baseline_scores_per_dataset = [np.random.normal(loc=0, scale=1, size=N) for _ in range(M)] # epsilon_min values with Bonferroni correction -eps_min = [aso(a, b, confidence_level=0.05 / M) for a, b in zip(my_model_scores_per_dataset, baseline_scores_per_dataset)] -# eps_min = [0.1565800030782686, 1, 0.0] +eps_min = [aso(a, b, confidence_level=0.95, num_comparisons=M, seed=seed) for a, b in zip(my_model_scores_per_dataset, baseline_scores_per_dataset)] +# eps_min = [0.006370113450148568, 0.6534772728574852, 0.0] ``` ### Scenario 3 - Comparing sample-level scores @@ -212,6 +225,9 @@ from itertools import product import numpy as np from deepsig import aso +seed = 1234 +np.random.seed(seed) + # Simulate scores for three datasets M = 40 # Number of data points N = 3 # Number of random seeds @@ -220,7 +236,9 @@ baseline_scored_samples_per_run = [np.random.normal(loc=0, scale=1, size=M) for pairs = list(product(my_model_scored_samples_per_run, baseline_scored_samples_per_run)) # epsilon_min values with Bonferroni correction -eps_min = [aso(a, b, confidence_level=0.05 / len(pairs)) for a, b in pairs] +eps_min = [aso(a, b, confidence_level=0.95, num_comparisons=len(pairs), seed=seed) for a, b in pairs] +# eps_min = [0.3831678636198528, 0.07194780234194881, 0.9152792807128325, 0.5273463008857844, 0.14946944524461184, 1.0, +# 0.6099543280369378, 0.22387448804041898, 1.0] ``` ### Scenario 4 - Comparing more than two models @@ -249,6 +267,9 @@ Let's look at an example: ```python import numpy as np from deepsig import multi_aso + +seed = 1234 +np.random.seed(seed) N = 5 # Number of random seeds M = 3 # Number of different models / algorithms @@ -257,20 +278,19 @@ M = 3 # Number of different models / algorithms # Here, we will sample from N(0.1, 0.8), N(0.15, 0.8), N(0.2, 0.8) my_models_scores = np.array([np.random.normal(loc=loc, scale=0.8, size=N) for loc in np.arange(0.1, 0.1 + 0.05 * M, step=0.05)]) -eps_min = multi_aso(my_models_scores, confidence_level=0.05) +eps_min = multi_aso(my_models_scores, confidence_level=0.95, seed=seed) # eps_min = -# array([[1., 1., 1.], -# [0., 1., 1.], -# [0., 0., 1.]]) +# array([[1. , 0.92621655, 1. ], +# [1. , 1. , 1. ], +# [0.82081635, 0.73048716, 1. ]]) ``` In the example, `eps_min` is now a matrix, containing the score between all pairs of models (for the same model, it set to 1 by default). The matrix is always to be read as ASO(row, column). The function applies the bonferroni correction for multiple comparisons by -default, but this can be turned off by using `use_bonferroni=False`. In order to save compute, the above symmetry -property is used as well, but this can also be disabled by `use_symmetry=False`. +default, but this can be turned off by using `use_bonferroni=False`. 
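If you want to turn such an `eps_min` matrix into a list of "wins", a small helper along the following lines could be used. This is a hypothetical convenience function, not part of `deepsig`; the threshold `tau=0.2` follows the recommendation in the general recommendations section below, and the matrix values are the (rounded) ones from the example above.

```python
# Hypothetical helper (not part of deepsig): list the pairs where the row model is
# almost stochastically dominant over the column model, i.e. eps_min[row, column] < tau.
import numpy as np

def dominant_pairs(eps_min: np.ndarray, names, tau: float = 0.2):
    wins = []
    for i, row_name in enumerate(names):
        for j, col_name in enumerate(names):
            if i != j and eps_min[i, j] < tau:
                wins.append((row_name, col_name, float(eps_min[i, j])))
    return wins

eps_min = np.array([
    [1.0, 0.926, 1.0],
    [1.0, 1.0, 1.0],
    [0.821, 0.730, 1.0],
])  # Rounded matrix from the example above
print(dominant_pairs(eps_min, ["model 1", "model 2", "model 3"]))  # [] -> no model is declared dominant here
```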
Lastly, when the `scores` argument is a dictionary and the function is called with `return_df=True`, the resulting matrix is given as a `pandas.DataFrame` for increased readability: @@ -278,6 +298,9 @@ given as a `pandas.DataFrame` for increased readability: ```python import numpy as np from deepsig import multi_aso + +seed = 1234 +np.random.seed(seed) N = 5 # Number of random seeds M = 3 # Number of different models / algorithms @@ -294,14 +317,14 @@ my_models_scores = { # ... # } -eps_min = multi_aso(my_models_scores, confidence_level=0.05, return_df=True) +eps_min = multi_aso(my_models_scores, confidence_level=0.95, return_df=True, seed=seed) # This is now a DataFrame! # eps_min = -# model 1 model 2 model 3 -# model 1 1.0 1.0 1.0 -# model 2 0.0 1.0 1.0 -# model 3 1.0 0.0 1.0 +# model 1 model 2 model 3 +# model 1 1.000000 0.926217 1.0 +# model 2 1.000000 1.000000 1.0 +# model 3 0.820816 0.730487 1.0 ``` @@ -315,7 +338,7 @@ score. Below lists some example snippets reporting the results of scenarios 1 an We compared all pairs of models based on five random seeds each using ASO with a confidence level of $\alpha = 0.05$ (before adjusting for all pair-wise comparisons using the Bonferroni correction). Almost stochastic - dominance ($\epsilon_\text{min} < 0.5)$ is indicated in table X. + dominance ($\epsilon_\text{min} < \tau$ with $\tau = 0.2$) is indicated in table X. ### :control_knobs: Sample size @@ -384,11 +407,11 @@ from deepsig import aso import numpy as np from timeit import timeit -a = np.random.normal(size=5) -b = np.random.normal(size=5) +a = np.random.normal(size=1000) +b = np.random.normal(size=1000) -print(timeit(lambda: aso(a, b, num_jobs=1, show_progress=False), number=5)) # 146.6909574989986 -print(timeit(lambda: aso(a, b, num_jobs=4, show_progress=False), number=5)) # 50.416724971000804 +print(timeit(lambda: aso(a, b, num_jobs=1, show_progress=False), number=5)) # 393.6318126 +print(timeit(lambda: aso(a, b, num_jobs=4, show_progress=False), number=5)) # 139.73514621799995n ``` #### :electric_plug: Compatibility with PyTorch, Tensorflow, Jax & Numpy @@ -438,11 +461,15 @@ as many scores as possible should be collected, especially if the variance betwe Because this is usually infeasible in practice, Bouthilier et al. (2020) recommend to **vary all other sources of variation** between runs to obtain the most trustworthy estimate of the "true" performance, such as data shuffling, weight initialization etc. -* `num_samples` and `num_bootstrap_iterations` can be reduced to increase the speed of `aso()`. However, this is not +* `num_bootstrap_iterations` can be reduced to increase the speed of `aso()`. However, this is not recommended as the result of the test will also become less accurate. Technically, is a upper bound that becomes tighter with the number of samples and bootstrap iterations (del Barrio et al., 2017). Thus, increasing the number of jobs with `num_jobs` instead is always preferred. +* While we could declare a model stochastically dominant with , we found this to have a comparatively high +Type I error (false positives). Tests [in our paper](https://arxiv.org/pdf/2204.06815.pdf) have shown that a more useful threshold that trades of Type I and + Type II error between different scenarios might be . + * Bootstrap and permutation-randomization are all non-parametric tests, i.e. they don't make any assumptions about the distribution of our test metric. 
Nevertheless, they differ in their *statistical power*, which is defined as the probability that the null hypothesis is being rejected given that there is a difference between A and B. In other words, the more powerful @@ -454,7 +481,17 @@ the distribution of our test metric. Nevertheless, they differ in their *statist ### :mortar_board: Cite -If you use the ASO test via `aso()`, please cite the original work: +Using this package in general, please cite the following: + + @article{ulmer2022deep, + title={deep-significance-Easy and Meaningful Statistical Significance Testing in the Age of Neural Networks}, + author={Ulmer, Dennis and Hardmeier, Christian and Frellsen, Jes}, + journal={arXiv preprint arXiv:2204.06815}, + year={2022} + } + + +If you use the ASO test via `aso()` or `multi_aso, please cite the original works: @inproceedings{dror2019deep, author = {Rotem Dror and @@ -475,21 +512,20 @@ If you use the ASO test via `aso()`, please cite the original work: timestamp = {Tue, 28 Jan 2020 10:27:52 +0100}, } -Using this package in general, please cite the following: - - @software{dennis_ulmer_2021_4638709, - author = {Dennis Ulmer}, - title = {{deep-significance: Easy and Better Significance - Testing for Deep Neural Networks}}, - month = mar, - year = 2021, - note = {https://github.com/Kaleidophon/deep-significance}, - publisher = {Zenodo}, - version = {v1.0.0a}, - doi = {10.5281/zenodo.4638709}, - url = {https://doi.org/10.5281/zenodo.4638709} + @incollection{del2018optimal, + title={An optimal transportation approach for assessing almost stochastic order}, + author={Del Barrio, Eustasio and Cuesta-Albertos, Juan A and Matr{\'a}n, Carlos}, + booktitle={The Mathematics of the Uncertain}, + pages={33--44}, + year={2018}, + publisher={Springer} } +For instance, you can write + + In order to compare models, we use the Almost Stochastic Order test \citep{del2018optimal, dror2019deep} as + implemented by \citet{ulmer2022deep}. + ### :medal_sports: Acknowledgements This package was created out of discussions of the [NLPnorth group](https://nlpnorth.github.io/) at the IT University @@ -526,6 +562,9 @@ Dror, Rotem, Shlomov, Segev, and Reichart, Roi. "Deep dominance-how to properly Efron, Bradley, and Robert J. Tibshirani. "An introduction to the bootstrap." CRC press, 1994. +Andrew Gelman, John B Carlin, Hal S Stern, David B Dunson, Aki Vehtari, Donald B Rubin, John +Carlin, Hal Stern, Donald Rubin, and David Dunson. Bayesian data analysis third edition, 2021. + Henderson, Peter, et al. "Deep reinforcement learning that matters." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 32. No. 1. 2018. Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer, Tom Goldstein. "Visualizing the Loss Landscape of Neural Nets." NeurIPS 2018: 6391-6401 @@ -534,4 +573,7 @@ Narang, Sharan, et al. "Do Transformer Modifications Transfer Across Implementat Noreen, Eric W. "Computer intensive methods for hypothesis testing: An introduction." Wiley, New York (1989). +Ronald L Wasserstein, Allen L Schirm, and Nicole A Lazar. Moving to a world beyond “p< 0.05”, +2019 + Yuan, Ke‐Hai, and Kentaro Hayashi. "Bootstrap approach to inference and power analysis based on three test statistics for covariance structure models." British Journal of Mathematical and Statistical Psychology 56.1 (2003): 93-110. 
\ No newline at end of file diff --git a/README_RAW.md b/README_RAW.md index bd8908d..abdbe96 100644 --- a/README_RAW.md +++ b/README_RAW.md @@ -47,14 +47,15 @@ Reinforcement Learning (Henderson et al., 2018) and Computer Vision (Borji, 2017 To help mitigate this problem, this package supplies fully-tested re-implementations of useful functions for significance testing: -* Statistical Significance tests such as Almost Stochastic Order (Dror et al., 2019), bootstrap (Efron & Tibshirani, 1994) and - permutation-randomization (Noreen, 1989). +* Statistical Significance tests such as Almost Stochastic Order (del Barrio et al, 2017; Dror et al., 2019), + bootstrap (Efron & Tibshirani, 1994) and permutation-randomization (Noreen, 1989). * Bonferroni correction methods for multiplicity in datasets (Bonferroni, 1936). * Bootstrap power analysis (Yuan & Hayashi, 2003) and other functions to determine the right sample size. All functions are fully tested and also compatible with common deep learning data structures, such as PyTorch / Tensorflow tensors as well as NumPy and Jax arrays. For examples about the usage, consult the documentation -[here](https://deep-significance.readthedocs.io/en/latest/) or the scenarios in the section [Examples](#examples). +[here](https://deep-significance.readthedocs.io/en/latest/) , the scenarios in the section [Examples](#examples) or +the [demo Jupyter notebook](https://github.com/Kaleidophon/deep-significance/tree/main/paper/deep-significance%20demo.ipynb). ## :inbox_tray: Installation @@ -74,52 +75,55 @@ Another option is to clone the repository and install the package locally: --- **tl;dr**: Use `aso()` to compare scores for two models. If the returned `eps_min < 0.5`, A is better than B. The lower -`eps_min`, the more confident the result. +`eps_min`, the more confident the result (we recommend to check `eps_min < 0.2` and record `eps_min` alongside +experimental results). :warning: Testing models with only one set of hyperparameters and only one test set will be able to guarantee superiority in all settings. See [General Recommendations & other notes](#general-recommendations). --- -In the following, I will lay out three scenarios that describe common use cases for ML practitioners and how to apply +In the following, we will lay out three scenarios that describe common use cases for ML practitioners and how to apply the methods implemented in this package accordingly. For an introduction into statistical hypothesis testing, please refer to resources such as [this blog post](https://machinelearningmastery.com/statistical-hypothesis-tests/) for a general overview or [Dror et al. (2018)](https://www.aclweb.org/anthology/P18-1128.pdf) for a NLP-specific point of view. -In general, in statistical significance testing, we usually compare two algorithms $A$ and $B$ on a dataset $X$ using -some evaluation metric $\mathcal{M}$ (we assume a higher = better). The difference between the two algorithms on the -data is then defined as +We assume that we have two sets of scores we would like to compare, $\mathbb{S}_\mathbb{A}$ and $\mathbb{S}_\mathbb{B}$, +for instance obtained by running two models $\mathbb{A}$ and $\mathbb{B}$ multiple times with a different random seed. +We can then define a one-sided test statistic $\delta(\mathbb{S}_\mathbb{A}, \mathbb{S}_\mathbb{B})$ based on the gathered observations. +An example of such test statistics is for instance the difference in observation means. 
We then formulate the following null-hypothesis: $$ -\delta(X) = \mathcal{M}(A, X) - \mathcal{M}(B, X) +H_0: \delta(\mathbb{S}_\mathbb{A}, \mathbb{S}_\mathbb{B}) \le 0 $$ -where $\delta(X)$ is our test statistic. We then test the following **null hypothesis**: +That means that we actually assume the opposite of our desired case, namely that $\mathbb{A}$ is not better than $\mathbb{B}$, +but equally as good or worse, as indicated by the value of the test statistic. +Usually, the goal becomes to reject this null hypothesis using the SST. +*p*-value testing is a frequentist method in the realm of SST. +It introduces the notion of data that *could have been observed* if we were to repeat our experiment again using +the same conditions, which we will write with superscript $\text{rep}$ in order to distinguish them from our actually +observed scores (Gelman et al., 2021). +We then define the *p*-value as the probability that, under the null hypothesis, the test statistic using replicated +observation is larger than or equal to the *observed* test statistic: $$ -H_0: \delta(X) \le 0 +p(\delta(\mathbb{S}_\mathbb{A}^\text{rep}, \mathbb{S}_\mathbb{B}^\text{rep}) \ge \delta(\mathbb{S}_\mathbb{A}, \mathbb{S}_\mathbb{B})|H_0) $$ -Thus, we assume our algorithm A to be equally as good or worse than algorithm B and reject the null hypothesis if A -is better than B (what we actually would like to see). Most statistical significance tests operate using -*p-values*, which define the probability that under the null-hypothesis, the $\delta(X)$ expected by the test is larger than or -equal to the observed difference $\delta_{\text{obs}}$ (that is, for a one-sided test, i.e. we assume A to be better than B): - -$$ -P(\delta(X) \ge \delta_\text{obs}| H_0) -$$ - -We can interpret this equation as follows: Assuming that A is *not* better than B, the test assumes a corresponding distribution -of differences that $\delta(X)$ is drawn from. How does our actually observed difference $\delta_\text{obs}$ fit in there? -This is what the p-value is expressing: If this probability is high, $\delta_\text{obs}$ is in line with what we expected under -the null hypothesis, so we conclude A not to better than B. If the -probability is low, that means that $\delta_\text{obs}$ is quite unlikely under the null hypothesis and that the reverse -case is more likely - i.e. that it is -likely *larger* than $\delta(X)$ - and we conclude that A is indeed better than B. Note that **the p-value does not -express whether the null hypothesis is true**. - -To decide when we trust A to be better than B, we set a threshold that will determine when the p-value is small enough -for us to reject the null hypothesis, this is called the significance level $\alpha$ and it is often set to be 0.05. +We can interpret this expression as follows: Assuming that $\mathbb{A}$ is not better than $\mathbb{B}$, the test +assumes a corresponding distribution of statistics that $\delta$ is drawn from. So how does the observed test statistic +$\delta(\mathbb{S}_\mathbb{A}, \mathbb{S}_\mathbb{B})$ fit in here? This is what the $p$-value expresses: When the +probability is high, $\delta(\mathbb{S}_\mathbb{A}, \mathbb{S}_\mathbb{B})$ is in line with what we expected under the +null hypothesis, so we can *not* reject the null hypothesis, or in other words, we \emph{cannot} conclude +$\mathbb{A}$ to be better than $\mathbb{B}$. 
If the probability is low, that means that the observed +$\delta(\mathbb{S}_\mathbb{A}, \mathbb{S}_\mathbb{B})$ is quite unlikely under the null hypothesis and that the reverse case is +more likely - i.e. that it is likely larger than $\delta(\mathbb{S}_\mathbb{A}^\text{rep}, \mathbb{S}_\mathbb{B}^\text{rep})$ - and we conclude that $\mathbb{A}$ is indeed better than +$\mathbb{B}$. Note that **the $p$-value does not express whether the null hypothesis is true**. To make our decision +about whether or not to reject the null hypothesis, we typically determine a threshold - the significance level +$\alpha$, often set to 0.05 - that the *p*-value has to fall below. However, it has been argued that a better practice +involves reporting the *p*-value alongside the results without a pigeonholing of results into significant and non-significant +(Wasserstein et al., 2019). ### Intermezzo: Almost Stochastic Order - a better significance test for Deep Neural Networks @@ -127,8 +131,8 @@ Deep neural networks are highly non-linear models, having their performance highly dependent on hyperparameters, random seeds and other (stochastic) factors. Therefore, comparing the means of two models across several runs might not be enough to decide if a model A is better than B. In fact, **even aggregating more statistics like standard deviation, minimum -or maximum might not be enough** to make a decision. For this reason, Dror et al. (2019) introduced *Almost Stochastic -Order* (ASO), a test to compare two score distributions. +or maximum might not be enough** to make a decision. For this reason, del Barrio et al. (2017) and Dror et al. (2019) +introduced *Almost Stochastic Order* (ASO), a test to compare two score distributions. It builds on the concept of *stochastic order*: We can compare two distributions and declare one as *stochastically dominant* by comparing their cumulative distribution functions: @@ -138,21 +142,22 @@ by comparing their cumulative distribution functions: ![](img/aso.png) Here, the CDF of A is given in red and in green for B. If the CDF of A is lower than B for every $x$, we know the algorithm A to score higher. However, in practice these cases are rarely so clear-cut (imagine e.g. two normal distributions with the same mean but different variances). -For this reason, Dror et al. (2019) consider the notion of *almost stochastic dominance* by quantifying the extent to -which stochastic order is being violated (red area): +For this reason, del Barrio et al. (2017) and Dror et al. (2019) consider the notion of *almost stochastic dominance* +by quantifying the extent to which stochastic order is being violated (red area): ![](img/aso.png) -ASO returns a value $\epsilon_\text{min}$, which expresses the amount of violation of stochastic order. If -$\epsilon_\text{min} < 0.5$, A is stochastically dominant over B in more cases than vice versa, then the corresponding algorithm can be declared as +ASO returns a value $\epsilon_\text{min}$, which expresses (an upper bound to) the amount of violation of stochastic order. If +$\epsilon_\text{min} < \tau$ (where $\tau$ is 0.5 or less), A is stochastically dominant over B in more cases than vice versa, and the corresponding algorithm can be declared as superior. We can also interpret $\epsilon_\text{min}$ as a *confidence score*. The lower it is, the more sure we can be that A is better than B.
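As a point of reference for what is being estimated here - a sketch that follows the implementation of `compute_violation_ratio()` in `deepsig/aso.py` further down in this diff, with $F^{-1}$ and $G^{-1}$ denoting the empirical quantile functions of the scores of A and B - the violation ratio is

$$
e_{W_2}(F, G) = \frac{\int_0^1 \big(G^{-1}(t) - F^{-1}(t)\big)_+^2 \, dt}{\int_0^1 \big(G^{-1}(t) - F^{-1}(t)\big)^2 \, dt},
$$

so that $e_{W_2} = 0$ corresponds to A fully dominating B and $e_{W_2} = 1$ to the reverse; $\epsilon_\text{min}$ is the bootstrap-based upper confidence bound that `aso()` places on this quantity.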
Note: **ASO does not compute p-values.** Instead, the null hypothesis formulated as $$ -H_0: \epsilon_\text{min} \ge 0.5 +H_0: \epsilon_\text{min} \ge \tau $$ -If we want to be more confident about the result of ASO, we can also set the rejection threshold to be lower than 0.5. +If we want to be more confident about the result of ASO, we can also set the rejection threshold to be lower than 0.5 +(see the discussion in [this section](#general-recommendations)). Furthermore, the significance level $\alpha$ is determined as an input argument when running ASO and actively influence the resulting $\epsilon_\text{min}$. @@ -167,12 +172,15 @@ We can now simply apply the ASO test: import numpy as np from deepsig import aso +seed = 1234 +np.random.seed(seed) + # Simulate scores N = 5 # Number of random seeds my_model_scores = np.random.normal(loc=0.9, scale=0.8, size=N) baseline_scores = np.random.normal(loc=0, scale=1, size=N) -min_eps = aso(my_model_scores, baseline_scores) # min_eps = 0.0, so A is better +min_eps = aso(my_model_scores, baseline_scores, seed=seed) # min_eps = 0.225, so A is better ``` Note that ASO **does not make any assumptions about the distributions of the scores**. @@ -193,6 +201,9 @@ which corresponds to the Bonferroni correction (Bonferroni et al., 1936): import numpy as np from deepsig import aso +seed = 1234 +np.random.seed(seed) + # Simulate scores for three datasets M = 3 # Number of datasets N = 5 # Number of random seeds @@ -200,8 +211,8 @@ my_model_scores_per_dataset = [np.random.normal(loc=0.3, scale=0.8, size=N) for baseline_scores_per_dataset = [np.random.normal(loc=0, scale=1, size=N) for _ in range(M)] # epsilon_min values with Bonferroni correction -eps_min = [aso(a, b, confidence_level=0.05 / M) for a, b in zip(my_model_scores_per_dataset, baseline_scores_per_dataset)] -# eps_min = [0.1565800030782686, 1, 0.0] +eps_min = [aso(a, b, confidence_level=0.95, num_comparisons=M, seed=seed) for a, b in zip(my_model_scores_per_dataset, baseline_scores_per_dataset)] +# eps_min = [0.006370113450148568, 0.6534772728574852, 0.0] ``` ### Scenario 3 - Comparing sample-level scores @@ -220,6 +231,9 @@ from itertools import product import numpy as np from deepsig import aso +seed = 1234 +np.random.seed(seed) + # Simulate scores for three datasets M = 40 # Number of data points N = 3 # Number of random seeds @@ -228,7 +242,9 @@ baseline_scored_samples_per_run = [np.random.normal(loc=0, scale=1, size=M) for pairs = list(product(my_model_scored_samples_per_run, baseline_scored_samples_per_run)) # epsilon_min values with Bonferroni correction -eps_min = [aso(a, b, confidence_level=0.05 / len(pairs)) for a, b in pairs] +eps_min = [aso(a, b, confidence_level=0.95, num_comparisons=len(pairs), seed=seed) for a, b in pairs] +# eps_min = [0.3831678636198528, 0.07194780234194881, 0.9152792807128325, 0.5273463008857844, 0.14946944524461184, 1.0, +# 0.6099543280369378, 0.22387448804041898, 1.0] ``` ### Scenario 4 - Comparing more than two models @@ -259,6 +275,9 @@ Let's look at an example: ```python import numpy as np from deepsig import multi_aso + +seed = 1234 +np.random.seed(seed) N = 5 # Number of random seeds M = 3 # Number of different models / algorithms @@ -267,20 +286,19 @@ M = 3 # Number of different models / algorithms # Here, we will sample from N(0.1, 0.8), N(0.15, 0.8), N(0.2, 0.8) my_models_scores = np.array([np.random.normal(loc=loc, scale=0.8, size=N) for loc in np.arange(0.1, 0.1 + 0.05 * M, step=0.05)]) -eps_min = multi_aso(my_models_scores, confidence_level=0.05) 
+eps_min = multi_aso(my_models_scores, confidence_level=0.95, seed=seed) # eps_min = -# array([[1., 1., 1.], -# [0., 1., 1.], -# [0., 0., 1.]]) +# array([[1. , 0.92621655, 1. ], +# [1. , 1. , 1. ], +# [0.82081635, 0.73048716, 1. ]]) ``` In the example, `eps_min` is now a matrix, containing the $\epsilon_\text{min}$ score between all pairs of models (for the same model, it set to 1 by default). The matrix is always to be read as ASO(row, column). The function applies the bonferroni correction for multiple comparisons by -default, but this can be turned off by using `use_bonferroni=False`. In order to save compute, the above symmetry -property is used as well, but this can also be disabled by `use_symmetry=False`. +default, but this can be turned off by using `use_bonferroni=False`. Lastly, when the `scores` argument is a dictionary and the function is called with `return_df=True`, the resulting matrix is given as a `pandas.DataFrame` for increased readability: @@ -288,6 +306,9 @@ given as a `pandas.DataFrame` for increased readability: ```python import numpy as np from deepsig import multi_aso + +seed = 1234 +np.random.seed(seed) N = 5 # Number of random seeds M = 3 # Number of different models / algorithms @@ -304,14 +325,14 @@ my_models_scores = { # ... # } -eps_min = multi_aso(my_models_scores, confidence_level=0.05, return_df=True) +eps_min = multi_aso(my_models_scores, confidence_level=0.95, return_df=True, seed=seed) # This is now a DataFrame! # eps_min = -# model 1 model 2 model 3 -# model 1 1.0 1.0 1.0 -# model 2 0.0 1.0 1.0 -# model 3 1.0 0.0 1.0 +# model 1 model 2 model 3 +# model 1 1.000000 0.926217 1.0 +# model 2 1.000000 1.000000 1.0 +# model 3 0.820816 0.730487 1.0 ``` @@ -325,7 +346,7 @@ score. Below lists some example snippets reporting the results of scenarios 1 an We compared all pairs of models based on five random seeds each using ASO with a confidence level of $\alpha = 0.05$ (before adjusting for all pair-wise comparisons using the Bonferroni correction). Almost stochastic - dominance ($\epsilon_\text{min} < 0.5)$ is indicated in table X. + dominance ($\epsilon_\text{min} < \tau$ with $\tau = 0.2$) is indicated in table X. ### :control_knobs: Sample size @@ -394,11 +415,11 @@ from deepsig import aso import numpy as np from timeit import timeit -a = np.random.normal(size=5) -b = np.random.normal(size=5) +a = np.random.normal(size=1000) +b = np.random.normal(size=1000) -print(timeit(lambda: aso(a, b, num_jobs=1, show_progress=False), number=5)) # 146.6909574989986 -print(timeit(lambda: aso(a, b, num_jobs=4, show_progress=False), number=5)) # 50.416724971000804 +print(timeit(lambda: aso(a, b, num_jobs=1, show_progress=False), number=5)) # 393.6318126 +print(timeit(lambda: aso(a, b, num_jobs=4, show_progress=False), number=5)) # 139.73514621799995n ``` #### :electric_plug: Compatibility with PyTorch, Tensorflow, Jax & Numpy @@ -448,11 +469,15 @@ as many scores as possible should be collected, especially if the variance betwe Because this is usually infeasible in practice, Bouthilier et al. (2020) recommend to **vary all other sources of variation** between runs to obtain the most trustworthy estimate of the "true" performance, such as data shuffling, weight initialization etc. -* `num_samples` and `num_bootstrap_iterations` can be reduced to increase the speed of `aso()`. However, this is not +* `num_bootstrap_iterations` can be reduced to increase the speed of `aso()`. However, this is not recommended as the result of the test will also become less accurate. 
Technically, $\epsilon_\text{min}$ is a upper bound that becomes tighter with the number of samples and bootstrap iterations (del Barrio et al., 2017). Thus, increasing the number of jobs with `num_jobs` instead is always preferred. +* While we could declare a model stochastically dominant with $\epsilon_\text{min} < 0.5$, we found this to have a comparatively high +Type I error (false positives). Tests [in our paper](https://arxiv.org/pdf/2204.06815.pdf) have shown that a more useful threshold that trades of Type I and + Type II error between different scenarios might be $\tau = 0.2$. + * Bootstrap and permutation-randomization are all non-parametric tests, i.e. they don't make any assumptions about the distribution of our test metric. Nevertheless, they differ in their *statistical power*, which is defined as the probability that the null hypothesis is being rejected given that there is a difference between A and B. In other words, the more powerful @@ -464,7 +489,17 @@ the distribution of our test metric. Nevertheless, they differ in their *statist ### :mortar_board: Cite -If you use the ASO test via `aso()`, please cite the original work: +Using this package in general, please cite the following: + + @article{ulmer2022deep, + title={deep-significance-Easy and Meaningful Statistical Significance Testing in the Age of Neural Networks}, + author={Ulmer, Dennis and Hardmeier, Christian and Frellsen, Jes}, + journal={arXiv preprint arXiv:2204.06815}, + year={2022} + } + + +If you use the ASO test via `aso()` or `multi_aso, please cite the original works: @inproceedings{dror2019deep, author = {Rotem Dror and @@ -485,21 +520,20 @@ If you use the ASO test via `aso()`, please cite the original work: timestamp = {Tue, 28 Jan 2020 10:27:52 +0100}, } -Using this package in general, please cite the following: - - @software{dennis_ulmer_2021_4638709, - author = {Dennis Ulmer}, - title = {{deep-significance: Easy and Better Significance - Testing for Deep Neural Networks}}, - month = mar, - year = 2021, - note = {https://github.com/Kaleidophon/deep-significance}, - publisher = {Zenodo}, - version = {v1.0.0a}, - doi = {10.5281/zenodo.4638709}, - url = {https://doi.org/10.5281/zenodo.4638709} + @incollection{del2018optimal, + title={An optimal transportation approach for assessing almost stochastic order}, + author={Del Barrio, Eustasio and Cuesta-Albertos, Juan A and Matr{\'a}n, Carlos}, + booktitle={The Mathematics of the Uncertain}, + pages={33--44}, + year={2018}, + publisher={Springer} } +For instance, you can write + + In order to compare models, we use the Almost Stochastic Order test \citep{del2018optimal, dror2019deep} as + implemented by \citet{ulmer2022deep}. + ### :medal_sports: Acknowledgements This package was created out of discussions of the [NLPnorth group](https://nlpnorth.github.io/) at the IT University @@ -536,6 +570,9 @@ Dror, Rotem, Shlomov, Segev, and Reichart, Roi. "Deep dominance-how to properly Efron, Bradley, and Robert J. Tibshirani. "An introduction to the bootstrap." CRC press, 1994. +Andrew Gelman, John B Carlin, Hal S Stern, David B Dunson, Aki Vehtari, Donald B Rubin, John +Carlin, Hal Stern, Donald Rubin, and David Dunson. Bayesian data analysis third edition, 2021. + Henderson, Peter, et al. "Deep reinforcement learning that matters." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 32. No. 1. 2018. Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer, Tom Goldstein. "Visualizing the Loss Landscape of Neural Nets." 
NeurIPS 2018: 6391-6401 @@ -544,4 +581,7 @@ Narang, Sharan, et al. "Do Transformer Modifications Transfer Across Implementat Noreen, Eric W. "Computer intensive methods for hypothesis testing: An introduction." Wiley, New York (1989). +Ronald L Wasserstein, Allen L Schirm, and Nicole A Lazar. Moving to a world beyond “p< 0.05”, +2019 + Yuan, Ke‐Hai, and Kentaro Hayashi. "Bootstrap approach to inference and power analysis based on three test statistics for covariance structure models." British Journal of Mathematical and Statistical Psychology 56.1 (2003): 93-110. \ No newline at end of file diff --git a/deepsig/__init__.py b/deepsig/__init__.py index d64b5ce..be84c06 100644 --- a/deepsig/__init__.py +++ b/deepsig/__init__.py @@ -5,5 +5,5 @@ from deepsig.permutation import permutation_test from deepsig.sample_size import aso_uncertainty_reduction, bootstrap_power_analysis -__version__ = "1.2.3" +__version__ = "1.2.5" __author__ = "Dennis Ulmer" diff --git a/deepsig/aso.py b/deepsig/aso.py index a92708d..c67a51e 100644 --- a/deepsig/aso.py +++ b/deepsig/aso.py @@ -8,7 +8,7 @@ from warnings import warn # EXT -from joblib import Parallel, delayed +from joblib import Parallel, delayed, wrap_non_picklable_objects from joblib.externals.loky import set_loky_pickler import numpy as np import pandas as pd @@ -20,9 +20,8 @@ ArrayLike, ScoreCollection, score_pair_conversion, - ALLOWED_TYPES, - CONVERSIONS, ) +from deepsig.utils import _progress_iter, _get_num_models # MISC set_loky_pickler("dill") # Avoid weird joblib error with multi_aso @@ -32,7 +31,8 @@ def aso( scores_a: ArrayLike, scores_b: ArrayLike, - confidence_level: float = 0.05, + confidence_level: float = 0.95, + num_comparisons: int = 1, num_samples: int = 1000, num_bootstrap_iterations: int = 1000, dt: float = 0.005, @@ -60,7 +60,9 @@ def aso( scores_b: List[float] Scores of algorithm B. confidence_level: float - Desired confidence level of test. Set to 0.05 by default. + Desired confidence level of test. Set to 0.95 by default. + num_comparisons: int + Number of comparisons that the test is being used for. Is used to perform a Bonferroni correction. num_samples: int Number of samples from the score distributions during every bootstrap iteration when estimating sigma. num_bootstrap_iterations: int @@ -84,15 +86,15 @@ def aso( assert ( len(scores_a) > 0 and len(scores_b) > 0 ), "Both lists of scores must be non-empty." - assert num_samples > 0, "num_samples must be positive, {} found.".format( - num_samples - ) assert ( num_bootstrap_iterations > 0 ), "num_samples must be positive, {} found.".format(num_bootstrap_iterations) assert num_jobs > 0, "Number of jobs has to be at least 1, {} found.".format( num_jobs ) + assert ( + num_comparisons > 0 + ), "Number of comparisons has to be at least 1, {} found.".format(num_comparisons) # TODO: Remove in future version if num_samples != 1000: @@ -101,83 +103,47 @@ def aso( DeprecationWarning, ) - violation_ratio = compute_violation_ratio(scores_a, scores_b, dt) + # TODO: Remove in future version + if confidence_level < 0.95: + warn( + "'confidence_level' was refactored in version 1.2.4 to be more intuitive and usually should be in the .95 -" + f".99 range, but {confidence_level} was found. 
If you tried to adjust the confidence level for multiple " + f"comparisons, try the new num_comparisons argument instead.", + UserWarning, + ) + + if num_comparisons > 1: + confidence_level += (1 - confidence_level) / num_comparisons + + violation_ratio = compute_violation_ratio( + scores_a=scores_a, scores_b=scores_b, dt=dt + ) # Based on the actual number of samples quantile_func_a = get_quantile_function(scores_a) quantile_func_b = get_quantile_function(scores_b) - def _progress_iter(high: int, progress_bar: tqdm): - """ - This function is used when a shared progress bar is passed from multi_aso() - every time the iterator yields an - element, the progress bar is updated by one. It essentially behaves like a simplified range() function. - - Parameters - ---------- - high: int - Number of elements in iterator. - progress_bar: tqdm - Shared progress bar. - """ - current = 0 - - while current < high: - yield current - current += 1 - progress_bar.update(1) - - # Add progress bar if applicable - if show_progress and _progress_bar is None: - iters = tqdm(range(num_bootstrap_iterations), desc="Bootstrap iterations") - - # Shared progress bar when called from multi_aso() - elif _progress_bar is not None: - iters = _progress_iter(num_bootstrap_iterations, _progress_bar) - - else: - iters = range(num_bootstrap_iterations) - - # Set seeds for different jobs if applicable - # "Sub-seeds" for jobs are just seed argument + job index - seeds = ( - [None] * num_bootstrap_iterations - if seed is None - else [seed + offset for offset in range(1, num_bootstrap_iterations + 1)] + samples = get_bootstrapped_violation_ratios( + scores_a, + scores_b, + quantile_func_a, + quantile_func_b, + num_bootstrap_iterations, + dt, + num_jobs, + show_progress, + seed, + _progress_bar, ) - - def _bootstrap_iter(seed: Optional[int] = None): - """ - One bootstrap iteration. Wrapped in a function so it can be handed to joblib.Parallel. 
- """ - # When running multiple jobs, these modules have to be re-imported for some reason to avoid an error - # Use dir() to check whether module is available in local scope: - # https://stackoverflow.com/questions/30483246/how-to-check-if-a-module-has-been-imported - if "numpy" not in dir() or "deepsig" not in dir(): - import numpy as np - from deepsig.aso import compute_violation_ratio - - if seed is not None: - np.random.seed(seed) - - sampled_scores_a = quantile_func_a(np.random.uniform(0, 1, len(scores_a))) - sampled_scores_b = quantile_func_b(np.random.uniform(0, 1, len(scores_b))) - sample = compute_violation_ratio( - sampled_scores_a, - sampled_scores_b, - dt, - ) - - return sample - - # Initialize worker pool and start iterations - parallel = Parallel(n_jobs=num_jobs) - samples = parallel(delayed(_bootstrap_iter)(seed) for seed, _ in zip(seeds, iters)) + samples = np.array(samples) const = np.sqrt(len(scores_a) * len(scores_b) / (len(scores_a) + len(scores_b))) sigma_hat = np.std(const * (samples - violation_ratio)) # Compute eps_min and make sure it stays in [0, 1] min_epsilon = np.clip( - violation_ratio - (1 / const) * sigma_hat * normal.ppf(confidence_level), 0, 1 + violation_ratio - (1 / const) * sigma_hat * normal.ppf(1 - confidence_level), + 0, + 1, ) return min_epsilon @@ -185,7 +151,7 @@ def _bootstrap_iter(seed: Optional[int] = None): def multi_aso( scores: ScoreCollection, - confidence_level: float = 0.05, + confidence_level: float = 0.95, use_bonferroni: bool = True, use_symmetry: bool = True, num_samples: int = 1000, @@ -207,7 +173,7 @@ def multi_aso( Collection of model scores. Should be either dictionary of model name to model scores, nested Python list, 2D numpy or Jax array, or 2D Tensorflow or PyTorch tensor. confidence_level: float - Desired confidence level of test. Set to 0.05 by default. + Desired confidence level of test. Set to 0.95 by default. use_bonferroni: bool Indicate whether Bonferroni correction should be applied to confidence level in order to adjust for the number of comparisons. Default is True. 
@@ -243,12 +209,28 @@ def multi_aso( DeprecationWarning, ) + # TODO: Remove in future version + if not use_symmetry: + warn( + "'use_symmetry' argument is being ignored in the current version and will be deprecated in version 1.3!", + DeprecationWarning, + ) + + # TODO: Remove in future version + if confidence_level < 0.95: + warn( + "'confidence_level' was refactored in version 1.2.4 to be more intuitive and usually should be in the .95 -" + f".99 range, but {confidence_level} was found.", + UserWarning, + ) + num_models = _get_num_models(scores) num_comparisons = num_models * (num_models - 1) / 2 eps_min = np.eye(num_models) # Initialize score matrix if use_bonferroni: - confidence_level /= num_comparisons + # Increase the confidence level based in oder to mitigate the multiple comparisons problem + confidence_level += (1 - confidence_level) / num_comparisons # Iterate over simple indices or dictionary keys depending on type of scores argument indices = list(range(num_models)) if type(scores) != dict else list(scores.keys()) @@ -266,38 +248,57 @@ def multi_aso( for i, key_i in enumerate(indices): for j, key_j in enumerate(indices[(i + 1) :], start=i + 1): scores_a, scores_b = scores[key_i], scores[key_j] + quantile_func_a = get_quantile_function(scores_a) + quantile_func_b = get_quantile_function(scores_b) + const = np.sqrt( + len(scores_a) * len(scores_b) / (len(scores_a) + len(scores_b)) + ) - eps_min[i, j] = aso( + violation_ratio_ab = compute_violation_ratio( + dt=dt, + quantile_func_a=quantile_func_a, + quantile_func_b=quantile_func_b, + ) + violation_ratio_ba = ( + 1 - violation_ratio_ab + ) # Exploit symmetry of violation ratio here + samples_ab = get_bootstrapped_violation_ratios( scores_a, scores_b, - confidence_level=confidence_level, - num_samples=1000, # TODO: Avoid double warning, remove in future version - num_bootstrap_iterations=num_bootstrap_iterations, - dt=dt, - num_jobs=num_jobs, - show_progress=False, - seed=seed, - _progress_bar=progress_bar, + quantile_func_a, + quantile_func_b, + num_bootstrap_iterations, + dt, + num_jobs, + show_progress, + seed, + progress_bar, + ) + samples_ab = np.array(samples_ab) + + # This quantity is the same for both, so we only have to compute it once, see + # (samples_ab - violation_ratio_ab) + # = (1 - samples_ba - 1 + violation_ratio_ba) + # = (samples_ba - violation_ratio_ba) + sigma_hat = np.std(const * (samples_ab - violation_ratio_ab)) + + # Compute eps_min and make sure it stays in [0, 1] + min_epsilon_ab = np.clip( + violation_ratio_ab + - (1 / const) * sigma_hat * normal.ppf(1 - confidence_level), + 0, + 1, + ) + min_epsilon_ba = np.clip( + violation_ratio_ba + - (1 / const) * sigma_hat * normal.ppf(1 - confidence_level), + 0, + 1, ) - # Use ASO(A, B, alpha) = 1 - ASO(B, A, alpha) - if use_symmetry: - eps_min[j, i] = 1 - eps_min[i, j] - - # Compute ASO(B, A, alpha) separately - else: - eps_min[i, j] = aso( - scores_b, - scores_a, - confidence_level=confidence_level, - num_samples=1000, # TODO: Avoid double warning, remove in future version - num_bootstrap_iterations=num_bootstrap_iterations, - dt=dt, - num_jobs=num_jobs, - show_progress=False, - seed=seed, - _progress_bar=progress_bar, - ) + # Set values + eps_min[i, j] = min_epsilon_ab + eps_min[j, i] = min_epsilon_ba if type(scores) == dict and return_df: eps_min = pd.DataFrame(data=eps_min, index=list(scores.keys())) @@ -306,37 +307,61 @@ def multi_aso( return eps_min -def compute_violation_ratio(scores_a: np.array, scores_b: np.array, dt: float) -> float: +def 
compute_violation_ratio(
+    scores_a: Optional[np.array] = None,
+    scores_b: Optional[np.array] = None,
+    quantile_func_a: Optional[Callable] = None,
+    quantile_func_b: Optional[Callable] = None,
+    dt: float = 0.001,
+) -> float:
    """
    Compute the violation ratio e_W2 (equation 4 + 5).

    Parameters
    ----------
-    scores_a: List[float]
+    scores_a: Optional[np.array]
        Scores of algorithm A.
-    scores_b: List[float]
+    scores_b: Optional[np.array]
        Scores of algorithm B.
    dt: float
        Differential for t during integral calculation.
+    quantile_func_a: Optional[Callable]
+        Quantile function based on the first set of scores.
+    quantile_func_b: Optional[Callable]
+        Quantile function based on the second set of scores.

    Returns
    -------
    float
        Return violation ratio.
    """
-    squared_wasserstein_dist = 0
-    int_violation_set = 0  # Integral over violation set A_X
-    quantile_func_a = get_quantile_function(scores_a)
-    quantile_func_b = get_quantile_function(scores_b)
+    assert (
+        scores_a is not None or quantile_func_a is not None
+    ), "Either scores or quantile function are required for the first sample, neither found."
+
+    assert (
+        scores_b is not None or quantile_func_b is not None
+    ), "Either scores or quantile function are required for the second sample, neither found."
+
+    if quantile_func_a is None:
+        quantile_func_a = get_quantile_function(scores_a)
+
+    if quantile_func_b is None:
+        quantile_func_b = get_quantile_function(scores_b)
+
+    t = np.arange(dt, 1, dt)  # Points we integrate over
+    f = quantile_func_a(t)  # F^-1(t)
+    g = quantile_func_b(t)  # G^-1(t)
+    diff = g - f
+    squared_wasserstein_dist = np.sum(diff ** 2 * dt)

-    for p in np.arange(0, 1, dt):
-        diff = quantile_func_b(p) - quantile_func_a(p)
-        squared_wasserstein_dist += (diff ** 2) * dt
-        int_violation_set += (max(diff, 0) ** 2) * dt
+    # Now only consider points where stochastic order is being violated and set the rest to 0
+    diff[f >= g] = 0
+    int_violation_set = np.sum(diff[1:] ** 2 * dt)  # Ignore t = 0 since t in (0, 1)

    if squared_wasserstein_dist == 0:
        warn("Division by zero encountered in violation ratio.")
-        violation_ratio = 0
+        violation_ratio = 0.5

    else:
        violation_ratio = int_violation_set / squared_wasserstein_dist

@@ -361,7 +386,7 @@ def get_quantile_function(scores: np.array) -> Callable:
    # When running multiple jobs via joblib, numpy has to be re-imported for some reason to avoid an error
    # Use dir() to check whether module is available in local scope:
    # https://stackoverflow.com/questions/30483246/how-to-check-if-a-module-has-been-imported
-    if "numpy" not in dir():
+    if "np" not in dir():
        import numpy as np

    def _quantile_function(p: float) -> float:
@@ -369,54 +394,100 @@ def _quantile_function(p: float) -> float:
        num = len(scores)
        index = int(np.ceil(num * p))

-        return cdf[min(num - 1, max(0, index - 1))]
+        return cdf[np.clip(index - 1, 0, num - 1)]

    return np.vectorize(_quantile_function)


-def _get_num_models(scores: ScoreCollection) -> int:
+def get_bootstrapped_violation_ratios(
+    scores_a: ArrayLike,
+    scores_b: ArrayLike,
+    quantile_func_a: Callable,
+    quantile_func_b: Callable,
+    num_bootstrap_iterations: int,
+    dt: float,
+    num_jobs: int,
+    show_progress: bool,
+    seed: Optional[int],
+    _progress_bar: Optional[tqdm],
+) -> List[float]:
    """
-    Retrieve the number of models from a ScoreCollection for multi_aso().
+    Retrieve violation ratios computed based on a number of bootstrap samples.

    Parameters
    ----------
-    scores: ScoreCollection
-        Collection of model scores. Should be either dictionary of model name to model scores, nested Python list,
-        2D numpy or Jax array, or 2D Tensorflow or PyTorch tensor.
+    scores_a: List[float]
+        Scores of algorithm A.
+    scores_b: List[float]
+        Scores of algorithm B.
+    quantile_func_a: Callable
+        Quantile function based on the first set of scores.
+    quantile_func_b: Callable
+        Quantile function based on the second set of scores.
+    num_bootstrap_iterations: int
+        Number of bootstrap iterations when estimating sigma.
+    dt: float
+        Differential for t during integral calculation.
+    num_jobs: int
+        Number of threads that bootstrap iterations are divided among.
+    show_progress: bool
+        Show progress bar. Default is True.
+    seed: Optional[int]
+        Set seed for reproducibility purposes. Default is None (meaning no seed is used).
+    _progress_bar: Optional[tqdm]
+        Hands over a progress bar object when called by multi_aso(). Only for internal use.

    Returns
    -------
-    int
-        Number of models.
+    List[float]
+        Bootstrapped violation ratios.
    """
-    # Python dictionary
-    if isinstance(scores, dict):
-        if len(scores) < 2:
-            raise ValueError(
-                "'scores' argument should contain at least two sets of scores, but only {} found.".format(
-                    len(scores)
-                )
-            )
+    # Add progress bar if applicable
+    if show_progress and _progress_bar is None:
+        iters = tqdm(range(num_bootstrap_iterations), desc="Bootstrap iterations")

-        return len(scores)
+    # Shared progress bar when called from multi_aso()
+    elif _progress_bar is not None:
+        iters = _progress_iter(num_bootstrap_iterations, _progress_bar)

-    # (Nested) python list
-    elif isinstance(scores, list):
-        if not isinstance(scores[0], list):
-            raise TypeError(
-                "'scores' argument must be nested list of scores when Python lists are used, but elements of type {} "
-                "found".format(type(scores[0]).__name__)
-            )
+    else:
+        iters = range(num_bootstrap_iterations)

-        return len(scores)
+    # Set seeds for different jobs if applicable
+    # "Sub-seeds" for jobs are just seed argument + job index
+    seeds = (
+        [None] * num_bootstrap_iterations
+        if seed is None
+        else [seed + offset for offset in range(1, num_bootstrap_iterations + 1)]
+    )

-    # Numpy / Jax arrays, Tensorflow / PyTorch tensor
-    elif type(scores) in ALLOWED_TYPES:
-        scores = CONVERSIONS[type(scores)](scores)  # Convert to numpy array
+    @wrap_non_picklable_objects
+    def _bootstrap_iter(seed: Optional[int] = None):
+        """
+        One bootstrap iteration. Wrapped in a function so it can be handed to joblib.Parallel.
+        """
+        # When running multiple jobs, these modules have to be re-imported for some reason to avoid an error
+        # Use dir() to check whether module is available in local scope:
+        # https://stackoverflow.com/questions/30483246/how-to-check-if-a-module-has-been-imported
+        if "numpy" not in dir() or "deepsig" not in dir():
+            import numpy as np
+            from deepsig.aso import compute_violation_ratio

-        return scores.shape[0]
+        if seed is not None:
+            np.random.seed(seed)

-    raise TypeError(
-        "Invalid type for 'scores', should be nested Python list, dict, Jax / Numpy array or Tensorflow / PyTorch "
-        "tensor, '{}' found.".format(type(scores).__name__)
-    )
+        sampled_scores_a = quantile_func_a(np.random.uniform(0, 1, len(scores_a)))
+        sampled_scores_b = quantile_func_b(np.random.uniform(0, 1, len(scores_b)))
+        sample = compute_violation_ratio(
+            scores_a=sampled_scores_a,
+            scores_b=sampled_scores_b,
+            dt=dt,
+        )
+
+        return sample
+
+    # Initialize worker pool and start iterations
+    parallel = Parallel(n_jobs=num_jobs)
+    samples = parallel(delayed(_bootstrap_iter)(seed) for seed, _ in zip(seeds, iters))
+
+    return samples
diff --git a/deepsig/bootstrap.py b/deepsig/bootstrap.py
index aec8459..578d0ba 100644
--- a/deepsig/bootstrap.py
+++ b/deepsig/bootstrap.py
@@ -3,7 +3,11 @@
`(Efron & Tibshirani, 1994)
Although Deep Learning has undergone spectacular growth in the recent decade, a large portion of experimental evidence is not supported by statistical hypothesis tests. Instead,
@@ -194,16 +194,17 @@
-Statistical Significance tests such as Almost Stochastic Order (Dror et al., 2019), bootstrap (Efron & Tibshirani, 1994) and
-permutation-randomization (Noreen, 1989).
+Statistical Significance tests such as Almost Stochastic Order (del Barrio et al., 2017; Dror et al., 2019),
+bootstrap (Efron & Tibshirani, 1994) and permutation-randomization (Noreen, 1989).
Bonferroni correction methods for multiplicity in datasets (Bonferroni, 1936).
Bootstrap power analysis (Yuan & Hayashi, 2003) and other functions to determine the right sample size.
All functions are fully tested and also compatible with common deep learning data structures, such as PyTorch / Tensorflow tensors as well as NumPy and Jax arrays. For usage examples, consult the documentation here or the scenarios in the section Examples.
-The package can simply be installed using pip
by running
pip3 install deepsig
@@ -216,62 +217,72 @@ |:inbox_tray:| Installation
+
+
|:bookmark:| Examples
tl;dr: Use aso() to compare scores for two models. If the returned eps_min < 0.5, A is better than B. The lower
-eps_min, the more confident the result.
+eps_min, the more confident the result (we recommend checking eps_min < 0.2 and recording eps_min alongside
+experimental results).
|:warning:| Testing models with only one set of hyperparameters and only one test set will not be able to guarantee superiority
in all settings. See General Recommendations & other notes.
-In the following, I will lay out three scenarios that describe common use cases for ML practitioners and how to apply
+In the following, we will lay out three scenarios that describe common use cases for ML practitioners and how to apply
the methods implemented in this package accordingly. For an introduction into statistical hypothesis testing, please
refer to resources such as this blog post for a general
overview or Dror et al. (2018) for a NLP-specific point of view.
-In general, in statistical significance testing, we usually compare two algorithms and on a dataset using
-some evaluation metric (we assume a higher = better). The difference between the two algorithms on the
-data is then defined as
-where is our test statistic. We then test the following null hypothesis:
-Thus, we assume our algorithm A to be equally as good or worse than algorithm B and reject the null hypothesis if A
-is better than B (what we actually would like to see). Most statistical significance tests operate using
-p-values, which define the probability that under the null-hypothesis, the expected by the test is larger than or
-equal to the observed difference (that is, for a one-sided test, i.e. we assume A to be better than B):
-We can interpret this equation as follows: Assuming that A is not better than B, the test assumes a corresponding distribution
-of differences that is drawn from. How does our actually observed difference fit in there?
-This is what the p-value is expressing: If this probability is high, is in line with what we expected under
-the null hypothesis, so we conclude A not to better than B. If the
-probability is low, that means that is quite unlikely under the null hypothesis and that the reverse
-case is more likely - i.e. that it is
-likely larger than - and we conclude that A is indeed better than B. Note that the p-value does not
-express whether the null hypothesis is true.
-To decide when we trust A to be better than B, we set a threshold that will determine when the p-value is small enough
-for us to reject the null hypothesis, this is called the significance level and it is often set to be 0.05.
-
+We assume that we have two sets of scores we would like to compare, $S_A$ and $S_B$,
+for instance obtained by running two models A and B multiple times with a different random seed.
+We can then define a one-sided test statistic $\delta(S_A, S_B)$ based on the gathered observations.
+An example of such a test statistic is the difference in observation means. We then formulate the following null hypothesis:
+
+$H_0: \delta(S_A, S_B) \leq 0$
+
+That means that we actually assume the opposite of our desired case, namely that A is not better than B,
+but equally as good or worse, as indicated by the value of the test statistic.
+Usually, the goal becomes to reject this null hypothesis using the SST.
+p-value testing is a frequentist method in the realm of SST.
+It introduces the notion of data that could have been observed if we were to repeat our experiment again using
+the same conditions, which we will write with superscript $\text{rep}$ in order to distinguish them from our actually
+observed scores (Gelman et al., 2021).
+We then define the p-value as the probability that, under the null hypothesis, the test statistic using replicated
+observations is larger than or equal to the observed test statistic:
+
+$p(\delta(S_A^\text{rep}, S_B^\text{rep}) \geq \delta(S_A, S_B) \mid H_0)$
+
+We can interpret this expression as follows: Assuming that A is not better than B, the test
+assumes a corresponding distribution of statistics that $\delta$ is drawn from. So how does the observed test statistic
+$\delta(S_A, S_B)$ fit in here? This is what the p-value expresses: When the
+probability is high, $\delta(S_A, S_B)$ is in line with what we expected under the
+null hypothesis, so we can not reject the null hypothesis, or in other words, we cannot conclude
+A to be better than B. If the probability is low, that means that the observed
+$\delta(S_A, S_B)$ is quite unlikely under the null hypothesis and that the reverse case is
+more likely - i.e. that it is likely larger than $\delta(S_A^\text{rep}, S_B^\text{rep})$ - and we conclude that A is indeed better than
+B. Note that the p-value does not express whether the null hypothesis is true. To make our decision
+about whether or not to reject the null hypothesis, we typically determine a threshold - the significance level
+$\alpha$, often set to 0.05 - that the p-value has to fall below. However, it has been argued that a better practice
+involves reporting the p-value alongside the results without a pigeonholing of results into significant and non-significant
+(Wasserstein et al., 2019).
+
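To make this concrete, here is a minimal, self-contained sketch of a one-sided permutation test that estimates such a p-value for the difference in means. It uses plain NumPy and is not part of the deepsig API; `permutation_p_value` is a hypothetical helper name.

```python
import numpy as np


def permutation_p_value(scores_a, scores_b, num_permutations=10000, seed=123):
    """One-sided permutation test for delta = mean(A) - mean(B) (illustrative only)."""
    rng = np.random.default_rng(seed)
    scores_a, scores_b = np.asarray(scores_a), np.asarray(scores_b)
    observed = scores_a.mean() - scores_b.mean()
    pooled = np.concatenate([scores_a, scores_b])
    count = 0
    for _ in range(num_permutations):
        permuted = rng.permutation(pooled)  # "replicated" data under the null hypothesis
        delta_rep = permuted[: len(scores_a)].mean() - permuted[len(scores_a):].mean()
        count += delta_rep >= observed
    return (count + 1) / (num_permutations + 1)  # add-one smoothing avoids p = 0


scores_a = np.random.normal(loc=0.9, scale=0.8, size=5)
scores_b = np.random.normal(loc=0.0, scale=1.0, size=5)
print(permutation_p_value(scores_a, scores_b))
```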
Intermezzo: Almost Stochastic Order - a better significance test for Deep Neural Networks
Deep neural networks are highly non-linear models whose performance depends heavily on hyperparameters, random
seeds and other (stochastic) factors. Therefore, comparing the means of two models across several runs might not be
enough to decide if a model A is better than B. In fact, even aggregating more statistics like standard deviation, minimum
-or maximum might not be enough to make a decision. For this reason, Dror et al. (2019) introduced Almost Stochastic
-Order (ASO), a test to compare two score distributions.
+or maximum might not be enough to make a decision. For this reason, del Barrio et al. (2017) and Dror et al. (2019)
+introduced Almost Stochastic Order (ASO), a test to compare two score distributions.
It builds on the concept of stochastic order: We can compare two distributions and declare one as stochastically dominant
by comparing their cumulative distribution functions:
Here, the CDF of A is given in red and the CDF of B in green. If the CDF of A is lower than that of B for every $x$, we know the
algorithm A to score higher. However, in practice these cases are rarely so clear-cut (imagine e.g. two normal
distributions with the same mean but different variances).
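As a small illustration of this idea (a NumPy-only sketch, independent of the package), one can compare empirical CDFs on a shared grid and check where stochastic order is violated:

```python
import numpy as np

rng = np.random.default_rng(0)
scores_a = rng.normal(loc=0.9, scale=0.8, size=1000)  # algorithm A
scores_b = rng.normal(loc=0.0, scale=1.0, size=1000)  # algorithm B

grid = np.linspace(
    min(scores_a.min(), scores_b.min()), max(scores_a.max(), scores_b.max()), num=200
)
cdf_a = np.searchsorted(np.sort(scores_a), grid, side="right") / len(scores_a)
cdf_b = np.searchsorted(np.sort(scores_b), grid, side="right") / len(scores_b)

print(np.all(cdf_a <= cdf_b))  # strict stochastic dominance of A: rarely exactly True
print(np.mean(cdf_a > cdf_b))  # fraction of the grid where the order is violated
```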
-For this reason, Dror et al. (2019) consider the notion of almost stochastic dominance by quantifying the extent to
-which stochastic order is being violated (red area):
+For this reason, del Barrio et al. (2017) and Dror et al. (2019) consider the notion of almost stochastic dominance
+by quantifying the extent to which stochastic order is being violated (red area):
-ASO returns a value , which expresses the amount of violation of stochastic order. If
-, A is stochastically dominant over B in more cases than vice versa, then the corresponding algorithm can be declared as
+ASO returns a value $\epsilon_\text{min}$, which expresses (an upper bound to) the amount of violation of stochastic order. If
+$\epsilon_\text{min} < \tau$ (where $\tau$ is 0.5 or less), A is stochastically dominant over B in more cases than vice versa, and the corresponding algorithm can be declared as
superior. We can also interpret $\epsilon_\text{min}$ as a confidence score. The lower it is, the more sure we can be
that A is better than B. Note: ASO does not compute p-values. Instead, the null hypothesis is formulated as
$H_0: \epsilon_\text{min} \geq \tau$.
-If we want to be more confident about the result of ASO, we can also set the rejection threshold to be lower than 0.5.
+If we want to be more confident about the result of ASO, we can also set the rejection threshold $\tau$ to be lower than 0.5
+(see the discussion in this section).
Furthermore, the significance level $\alpha$ is determined as an input argument when running ASO and actively influences
the resulting $\epsilon_\text{min}$.
-
-
+
+
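The violation ratio itself can be written down compactly. Below is a self-contained sketch of the integral approximation, mirroring the vectorized computation in the aso.py diff above but using NumPy's empirical quantiles instead of the package's own quantile function, so the numbers will differ slightly. Note that this is only the raw violation ratio; aso() additionally uses bootstrapped violation ratios (see get_bootstrapped_violation_ratios above) to turn it into the reported eps_min upper bound.

```python
import numpy as np


def violation_ratio_sketch(scores_a, scores_b, dt=0.001):
    """Approximate e_W2: the share of the squared Wasserstein distance between the
    two score distributions that lies in the region violating stochastic order of A over B."""
    t = np.arange(dt, 1, dt)        # integration grid over (0, 1)
    f = np.quantile(scores_a, t)    # empirical F^-1(t) for algorithm A
    g = np.quantile(scores_b, t)    # empirical G^-1(t) for algorithm B
    diff = g - f
    squared_wasserstein = np.sum(diff ** 2 * dt)
    if squared_wasserstein == 0:
        return 0.5                  # same convention as in the diff above
    violation = np.where(f >= g, 0.0, diff)  # keep only points where order is violated
    return np.sum(violation ** 2 * dt) / squared_wasserstein


rng = np.random.default_rng(1234)
print(violation_ratio_sketch(rng.normal(0.9, 0.8, 500), rng.normal(0.0, 1.0, 500)))
```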
Scenario 1 - Comparing multiple runs of two models
In the simplest scenario, we have retrieved a set of scores from a model A and a baseline B on a dataset, stemming from
various model runs with different seeds. We want to test whether our model A is better than B (higher scores = better).
@@ -279,12 +290,15 @@ Scenario 1 - Comparing multiple runs of two models
import numpy as np
from deepsig import aso
+seed = 1234
+np.random.seed(seed)
+
# Simulate scores
N = 5 # Number of random seeds
my_model_scores = np.random.normal(loc=0.9, scale=0.8, size=N)
baseline_scores = np.random.normal(loc=0, scale=1, size=N)
-min_eps = aso(my_model_scores, baseline_scores) # min_eps = 0.0, so A is better
+min_eps = aso(my_model_scores, baseline_scores, seed=seed) # min_eps = 0.225, so A is better
Note that ASO does not make any assumptions about the distributions of the scores.
@@ -292,8 +306,8 @@ Scenario 1 - Comparing multiple runs of two models
-
When comparing models across datasets, we formulate one null hypothesis per dataset. However, we have to make sure not to fall prey to the multiple comparisons problem: In short,
@@ -303,6 +317,9 @@
import numpy as np
from deepsig import aso
+seed = 1234
+np.random.seed(seed)
+
# Simulate scores for three datasets
M = 3 # Number of datasets
N = 5 # Number of random seeds
@@ -310,12 +327,12 @@ Scenario 2 - Comparing multiple runs across datasets
baseline_scores_per_dataset = [np.random.normal(loc=0, scale=1, size=N) for _ in range(M)]
# epsilon_min values with Bonferroni correction
-eps_min = [aso(a, b, confidence_level=0.05 / M) for a, b in zip(my_model_scores_per_dataset, baseline_scores_per_dataset)]
-# eps_min = [0.1565800030782686, 1, 0.0]
+eps_min = [aso(a, b, confidence_level=0.95, num_comparisons=M, seed=seed) for a, b in zip(my_model_scores_per_dataset, baseline_scores_per_dataset)]
+# eps_min = [0.006370113450148568, 0.6534772728574852, 0.0]
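For reference, the classic Bonferroni adjustment that motivates the num_comparisons argument looks as follows (a conceptual sketch, not the package's internal code):

```python
alpha = 0.05               # family-wise significance level
num_comparisons = 3        # one comparison per dataset
alpha_corrected = alpha / num_comparisons  # Bonferroni: each test uses the stricter level
print(alpha_corrected)     # 0.016666666666666666
```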
In previous examples, we have assumed that we compare two algorithms A and B based on their performance per run, i.e. we run each algorithm once per random seed and obtain exactly one score on our test set. In some cases however,
@@ -328,6 +345,9 @@
Similarly, when comparing multiple models (now again on a per-seed basis), we can use an approach similar to the previous example. For instance, for three models, we can create a $3 \times 3$ matrix and fill the entries
@@ -358,6 +380,9 @@
import numpy as np
from deepsig import multi_aso
+
+seed = 1234
+np.random.seed(seed)
N = 5 # Number of random seeds
M = 3 # Number of different models / algorithms
@@ -366,23 +391,25 @@ Scenario 4 - Comparing more than two models
# Here, we will sample from N(0.1, 0.8), N(0.15, 0.8), N(0.2, 0.8)
my_models_scores = np.array([np.random.normal(loc=loc, scale=0.8, size=N) for loc in np.arange(0.1, 0.1 + 0.05 * M, step=0.05)])
-eps_min = multi_aso(my_models_scores, confidence_level=0.05)
+eps_min = multi_aso(my_models_scores, confidence_level=0.95, seed=seed)
# eps_min =
-# array([[1., 1., 1.],
-# [0., 1., 1.],
-# [0., 0., 1.]])
+# array([[1. , 0.92621655, 1. ],
+# [1. , 1. , 1. ],
+# [0.82081635, 0.73048716, 1. ]])
In the example, eps_min is now a matrix, containing the $\epsilon_\text{min}$ score between all pairs of models (for
the same model, it is set to 1 by default). The matrix is always to be read as ASO(row, column).
-The function applies the Bonferroni correction for multiple comparisons by default, but this can be turned off by using use_bonferroni=False. In order to save compute, the above symmetry property is used as well, but this can also be disabled by use_symmetry=False.
+The function applies the Bonferroni correction for multiple comparisons by default, but this can be turned off by using use_bonferroni=False.
Lastly, when the scores argument is a dictionary and the function is called with return_df=True, the resulting matrix is given as a pandas.DataFrame for increased readability:
import numpy as np
from deepsig import multi_aso
+
+seed = 1234
+np.random.seed(seed)
N = 5 # Number of random seeds
M = 3 # Number of different models / algorithms
@@ -399,18 +426,18 @@ Scenario 4 - Comparing more than two models
# ...
# }
-eps_min = multi_aso(my_models_scores, confidence_level=0.05, return_df=True)
+eps_min = multi_aso(my_models_scores, confidence_level=0.95, return_df=True, seed=seed)
# This is now a DataFrame!
# eps_min =
-# model 1 model 2 model 3
-# model 1 1.0 1.0 1.0
-# model 2 0.0 1.0 1.0
-# model 3 1.0 0.0 1.0
+# model 1 model 2 model 3
+# model 1 1.000000 0.926217 1.0
+# model 2 1.000000 1.000000 1.0
+# model 3 0.820816 0.730487 1.0
When ASO is used, two important details have to be reported, namely the confidence level $\alpha$ and the $\epsilon_\text{min}$ score. Below are some example snippets reporting the results of scenarios 1 and 4:
@@ -419,11 +446,11 @@
It can be hard to determine whether the currently collected set of scores is large enough to allow for reliable
significance testing or whether more scores are required. For this reason, deep-significance
also implements functions to aid the decision of whether to
@@ -469,10 +496,10 @@
Waiting for all the bootstrap iterations to finish can feel tedious, especially when doing many comparisons. Therefore,
ASO supports multithreading using joblib
@@ -481,15 +508,15 @@
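For example (a sketch; num_jobs and seed are the arguments shown elsewhere in this diff, all other defaults are left untouched):

```python
import numpy as np
from deepsig import aso

scores_a = np.random.normal(loc=0.9, scale=0.8, size=5)
scores_b = np.random.normal(loc=0.0, scale=1.0, size=5)

# Spread the bootstrap iterations of a single comparison over four joblib workers
min_eps = aso(scores_a, scores_b, num_jobs=4, seed=1234)
```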
All tests implemented in this package can also take PyTorch / Tensorflow tensors and Jax or NumPy arrays as arguments:
from deepsig import aso
@@ -501,13 +528,13 @@ |:electric_plug:| Compatibility with PyTorch, Tensorflow, Jax & Numpy
aso(a, b)  # It just works!
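The definitions of a and b are elided by the diff above; the following is a hedged sketch of what such a call can look like with PyTorch tensors, one of the supported input types (converted to NumPy arrays internally via the conversion logic visible in the aso.py diff):

```python
import torch
from deepsig import aso

a = torch.randn(5) + 1  # scores of model A as a PyTorch tensor
b = torch.randn(5)      # scores of baseline B as a PyTorch tensor

min_eps = aso(a, b, seed=1234)  # tensors are converted to NumPy arrays internally
```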
In order to ensure replicability, both aso() and multi_aso() supply a seed argument. This even works when multiple jobs are used!
Should you be suspicious of ASO and want to revert to the good old faithful tests, this package also implements the paired bootstrap as well as the permutation-randomization test. Note that as discussed in the next section, these
@@ -523,9 +550,9 @@
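A hedged usage sketch, assuming the two tests are exposed as bootstrap_test and permutation_test and follow the same (scores_a, scores_b) calling convention as aso(); check the API reference for the exact names and arguments:

```python
import numpy as np
from deepsig import bootstrap_test, permutation_test  # assumed export names

scores_a = np.random.normal(loc=0.9, scale=0.8, size=5)
scores_b = np.random.normal(loc=0.0, scale=1.0, size=5)

# Both tests return a p-value for the one-sided comparison of A against B
p_boot = bootstrap_test(scores_a, scores_b)
p_perm = permutation_test(scores_a, scores_b)
```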
Naturally, the CDFs built from scores_a and scores_b can only be approximations of the true distributions. Therefore,
@@ -534,10 +561,13 @@
num_bootstrap_iterations can be reduced to increase the speed of aso(). However, this is not
recommended as the result of the test will also become less accurate. Technically, $\epsilon_\text{min}$ is an upper bound
that becomes tighter with the number of samples and bootstrap iterations (del Barrio et al., 2017). Thus, increasing
the number of jobs with num_jobs instead is always preferred.
While we could declare a model stochastically dominant with $\epsilon_\text{min} < 0.5$, we found this to have a comparatively high Type I error (false positives). Tests in our paper have shown that a more useful threshold that trades off Type I and Type II error between different scenarios might be $\tau = 0.2$.
Bootstrap and permutation-randomization are both non-parametric tests, i.e. they don't make any assumptions about the distribution of our test metric. Nevertheless, they differ in their statistical power, which is defined as the probability that the null hypothesis is being rejected given that there is a difference between A and B. In other words, the more powerful
@@ -547,10 +577,19 @@
-If you use the ASO test via aso(), please cite the original work:
+Using this package in general, please cite the following:
+@article{ulmer2022deep,
+ title={deep-significance - Easy and Meaningful Statistical Significance Testing in the Age of Neural Networks},
+ author={Ulmer, Dennis and Hardmeier, Christian and Frellsen, Jes},
+ journal={arXiv preprint arXiv:2204.06815},
+ year={2022}
+}
+
If you use the ASO test via aso() or multi_aso(), please cite the original works:
@inproceedings{dror2019deep,
author = {Rotem Dror and
Segev Shlomov and
@@ -569,25 +608,24 @@ |:mortar_board:| Cite
doi = {10.18653/v1/p19-1266},
timestamp = {Tue, 28 Jan 2020 10:27:52 +0100},
}
-
Using this package in general, please cite the following:
-@software{dennis_ulmer_2021_4638709,
- author = {Dennis Ulmer},
- title = {{deep-significance: Easy and Better Significance
- Testing for Deep Neural Networks}},
- month = mar,
- year = 2021,
- note = {https://github.com/Kaleidophon/deep-significance},
- publisher = {Zenodo},
- version = {v1.0.0a},
- doi = {10.5281/zenodo.4638709},
- url = {https://doi.org/10.5281/zenodo.4638709}
+
+@incollection{del2018optimal,
+ title={An optimal transportation approach for assessing almost stochastic order},
+ author={Del Barrio, Eustasio and Cuesta-Albertos, Juan A and Matr{\'a}n, Carlos},
+ booktitle={The Mathematics of the Uncertain},
+ pages={33--44},
+ year={2018},
+ publisher={Springer}
}
For instance, you can write
+In order to compare models, we use the Almost Stochastic Order test \citep{del2018optimal, dror2019deep} as
+implemented by \citet{ulmer2022deep}.
+
This package was created out of discussions of the NLPnorth group at the IT University Copenhagen, whose members I want to thank for their feedback. The code in this repository is in multiple places based on @@ -597,8 +635,8 @@
In this last section of the readme, I would like to refer to works already using deep-significance
. Open an issue or
pull request if you would like to see your work added here!
Del Barrio, Eustasio, Juan A. Cuesta-Albertos, and Carlos Matrán. “An optimal transportation approach for assessing almost stochastic order.” The Mathematics of the Uncertain. Springer, Cham, 2018. 33-44.
Bonferroni, Carlo. “Teoria statistica delle classi e calcolo delle probabilita.” Pubblicazioni del R Istituto Superiore di Scienze Economiche e Commericiali di Firenze 8 (1936): 3-62.
@@ -616,14 +654,18 @@Dror, Rotem, Shlomov, Segev, and Reichart, Roi. “Deep dominance-how to properly compare deep neural models.” Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019.
Efron, Bradley, and Robert J. Tibshirani. “An introduction to the bootstrap.” CRC press, 1994.
Gelman, Andrew, John B. Carlin, Hal S. Stern, David B. Dunson, Aki Vehtari, and Donald B. Rubin. "Bayesian data analysis." Third edition, 2021.
Henderson, Peter, et al. “Deep reinforcement learning that matters.” Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 32. No. 1. 2018.
Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer, Tom Goldstein. “Visualizing the Loss Landscape of Neural Nets.” NeurIPS 2018: 6391-6401
Narang, Sharan, et al. “Do Transformer Modifications Transfer Across Implementations and Applications?.” arXiv preprint arXiv:2102.11972 (2021).
Noreen, Eric W. “Computer intensive methods for hypothesis testing: An introduction.” Wiley, New York (1989).
Wasserstein, Ronald L., Allen L. Schirm, and Nicole A. Lazar. "Moving to a world beyond 'p < 0.05'." The American Statistician 73.sup1 (2019): 1-19.
Yuan, Ke‐Hai, and Kentaro Hayashi. “Bootstrap approach to inference and power analysis based on three test statistics for covariance structure models.” British Journal of Mathematical and Statistical Psychology 56.1 (2003): 93-110.
-