bigdatagenomics · gunjanbaid · Jul 27, 2017 · Jul 27, 2017 · Jul 27, 2017 · Jul 27, 2017
diff --git a/docs/source/70_algorithms.md b/docs/source/70_algorithms.md
@@ -1,4 +1,4 @@
-# Core Algorithms {#algorithms}
+# Core Algorithms
 
 ## Read Preprocessing Algorithms
 
@@ -43,7 +43,7 @@ as well as the total number of bases within the covariate that do not match the
 reference genome. From this data, we apply a correction by estimating the error
 probability for each set of covariates under a beta-binomial model with uniform
 prior. We have validated the concordance of our BQSR implementation against the
-GATK. Across both tools, only 5000 of the 180B bases ($<0.0001\%$) in the
+GATK. Across both tools, only 5000 of the 180B bases (`$<0.0001\%$`) in the
 high-coverage NA12878 genome dataset differ. After investigating this
 discrepancy, we have determined that this is due to an error in the GATK, where
 paired-end reads are mishandled if the two reads in the pair overlap.
@@ -70,7 +70,7 @@ variants; see DePristo et al [@depristo11].
 In the rest of this section, we discuss the high level implementations of these
 algorithms.
 
-### BQSR Implementation {#bqsr}
+### BQSR Implementation
 
 Base quality score recalibration seeks to identify and correct correlated
 errors in base quality score estimates. At a high level, this is done by 
@@ -141,15 +141,15 @@ estimate the observed base quality using the below equation. This represents a
 Bayesian model of the mismatch probability with Binomial likelihood and a
 Beta(1, 1) prior.
 
-$$
+```math
 \mathbf{E}(P_{err}|{cov}) = \frac{\text{\#errors}(cov) + 1}{\text{\#observations}(cov) + 2}
-$$
+```
 
 After these probabilities are estimated, we go back across the input read
 dataset and reconstruct the quality scores of the read by using the covariate
 assigned to the read to look into the covariate table.
 
-### Indel Realignment Implementation {#realignment}
+### Indel Realignment Implementation
 
 Although global alignment will frequently succeed at aligning reads to the
 proper region of the genome, the local alignment of the read may be incorrect.
@@ -172,15 +172,17 @@ region covered by that read.
 #### Convex-Hull Finding
 
 Once we have identified the target realignment regions, we must then find the
-maximal convex hulls across the set of regions. For a set $R$ of regions, we
-define a maximal convex hull as the largest region $\hat{r}$ that satisfies the
+maximal convex hulls across the set of regions. For a set `$R$` of regions, we
+define a maximal convex hull as the largest region `$\hat{r}$` that satisfies the
 following properties:
 
+```math
 \begin{align}
 \hat{r} &= \cup_{r_i \in \hat{R}} r_i \\
 \hat{r} \cap r_i &\ne \emptyset, \forall r_i \in \hat{R} \\
 \hat{R} &\subset R
 \end{align}
+```
 
 In our problem, we seek to find all of the maximal convex hulls, given a set of
 regions. For genomics, the convexity constraint described by equation
@@ -241,7 +243,7 @@ tail-call recursive `mergeTargetSets` function that is described in Algorithm
 The set returned by this function is used as an index for mapping reads
 directly to realignment targets.
 
-#### Candidate Generation and Realignment {#consensus-model}
+#### Candidate Generation and Realignment
 
 Once we have generated the target set, we map across all the reads and check to
 see if the read overlaps a realignment target. We then group together all reads
@@ -269,26 +271,28 @@ take each read and compute the quality score weighted Hamming edit distance of
 the read placed at each site in the consensus sequence. We then take the
 minimum quality score weighted edit versus the consensus sequence and the
 reference genome. We aggregate these scores together for all reads against this
-consensus sequence. Given a consensus sequence $c$, a reference sequence $R$,
-and a set of reads $\mathbf{r}$, we calculate this score using the equation 
+consensus sequence. Given a consensus sequence `$c$`, a reference sequence `$R$`,
+and a set of reads `$\mathbf{r}$`, we calculate this score using the equation 
 below.
 
+```math
 \begin{align}
 q_{i, j} &= \sum_{k = 0}^{l_{r_i}} Q_k I[r_I(k) = c(j + k)] \forall r_i \in \mathbf{R}, j \in \{0, \dots, l_c - l_{r_i}\} \\
 q_{i, R} &= \sum_{k = 0}^{l_{r_i}} Q_k I[r_I(k) = c(j + k)] \forall r_i \in \mathbf{R}, j = \text{pos}(r_i | R) \\
 q_i &= \min(q_{i, R}, \min_{j \in \{0, \dots, l_c - l_{r_i}\}} q_{i, j}) \\
 q_c &= \sum_{r_i \in \mathbf{r}} q_i
 \end{align}
+```
 
-In the above equation, $s(i)$ denotes the base at position $i$ of sequence $s$,
-and $l_s$ denotes the length of sequence $s$. We pick the consensus sequence
-that minimizes the $q_c$ value. If the chosen consensus has a log-odds ratio
-(LOD) that is greater than $5.0$ with respect to the reference, we realign the
+In the above equation, `$s(i)$` denotes the base at position `$i$` of sequence `$s$`,
+and `$l_s$` denotes the length of sequence `$s$`. We pick the consensus sequence
+that minimizes the `$q_c$` value. If the chosen consensus has a log-odds ratio
+(LOD) that is greater than `$5.0$` with respect to the reference, we realign the
 reads. This is done by recomputing the CIGAR and MDTag for each new alignment.
 Realigned reads have their mapping quality score increased by 10 in the Phred
 scale.
 
-### Duplicate Marking Implementation {#duplicate-marking}
+### Duplicate Marking Implementation
 
 Reads may be duplicated during sequencing, either due to clonal duplication
 via PCR before sequencing, or due to optical duplication while on the