bigdatagenomics · gunjanbaid · Jul 27, 2017
diff --git a/docs/source/70_algorithms.md b/docs/source/70_algorithms.md
@@ -43,7 +43,7 @@ as well as the total number of bases within the covariate that do not match the
 reference genome. From this data, we apply a correction by estimating the error
 probability for each set of covariates under a beta-binomial model with uniform
 prior. We have validated the concordance of our BQSR implementation against the
-GATK. Across both tools, only 5000 of the 180B bases ($<0.0001\%$) in the
+GATK. Across both tools, only 5000 of the 180B bases (`$<0.0001\%$`) in the
 high-coverage NA12878 genome dataset differ. After investigating this
 discrepancy, we have determined that this is due to an error in the GATK, where
 paired-end reads are mishandled if the two reads in the pair overlap.
@@ -141,9 +141,9 @@ estimate the observed base quality using the below equation. This represents a
 Bayesian model of the mismatch probability with Binomial likelihood and a
 Beta(1, 1) prior.
 
-$$
+```math
 \mathbf{E}(P_{err}|{cov}) = \frac{\text{\#errors}(cov) + 1}{\text{\#observations}(cov) + 2}
-$$
+```
 
 After these probabilities are estimated, we go back across the input read
 dataset and reconstruct the quality scores of the read by using the covariate
@@ -172,15 +172,17 @@ region covered by that read.
 #### Convex-Hull Finding
 
 Once we have identified the target realignment regions, we must then find the
-maximal convex hulls across the set of regions. For a set $R$ of regions, we
-define a maximal convex hull as the largest region $\hat{r}$ that satisfies the
+maximal convex hulls across the set of regions. For a set `$R$` of regions, we
+define a maximal convex hull as the largest region `$\hat{r}$` that satisfies the
 following properties:
 
+```math
 \begin{align}
 \hat{r} &= \cup_{r_i \in \hat{R}} r_i \\
 \hat{r} \cap r_i &\ne \emptyset, \forall r_i \in \hat{R} \\
 \hat{R} &\subset R
 \end{align}
+```
 
 In our problem, we seek to find all of the maximal convex hulls, given a set of
 regions. For genomics, the convexity constraint described by equation
@@ -269,21 +271,23 @@ take each read and compute the quality score weighted Hamming edit distance of
 the read placed at each site in the consensus sequence. We then take the
 minimum quality score weighted edit versus the consensus sequence and the
 reference genome. We aggregate these scores together for all reads against this
-consensus sequence. Given a consensus sequence $c$, a reference sequence $R$,
-and a set of reads $\mathbf{r}$, we calculate this score using the equation 
+consensus sequence. Given a consensus sequence `$c$`, a reference sequence `$R$`,
+and a set of reads `$\mathbf{r}$`, we calculate this score using the equation 
 below.
 
+```math
 \begin{align}
 q_{i, j} &= \sum_{k = 0}^{l_{r_i}} Q_k I[r_I(k) = c(j + k)] \forall r_i \in \mathbf{R}, j \in \{0, \dots, l_c - l_{r_i}\} \\
 q_{i, R} &= \sum_{k = 0}^{l_{r_i}} Q_k I[r_I(k) = c(j + k)] \forall r_i \in \mathbf{R}, j = \text{pos}(r_i | R) \\
 q_i &= \min(q_{i, R}, \min_{j \in \{0, \dots, l_c - l_{r_i}\}} q_{i, j}) \\
 q_c &= \sum_{r_i \in \mathbf{r}} q_i
 \end{align}
+```
 
-In the above equation, $s(i)$ denotes the base at position $i$ of sequence $s$,
-and $l_s$ denotes the length of sequence $s$. We pick the consensus sequence
-that minimizes the $q_c$ value. If the chosen consensus has a log-odds ratio
-(LOD) that is greater than $5.0$ with respect to the reference, we realign the
+In the above equation, `$s(i)$` denotes the base at position `$i$` of sequence `$s$`,
+and `$l_s$` denotes the length of sequence `$s$`. We pick the consensus sequence
+that minimizes the `$q_c$` value. If the chosen consensus has a log-odds ratio
+(LOD) that is greater than `$5.0$` with respect to the reference, we realign the
 reads. This is done by recomputing the CIGAR and MDTag for each new alignment.
 Realigned reads have their mapping quality score increased by 10 in the Phred
 scale.
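
The consensus scoring described in the last hunk can be illustrated with a small Python sketch. It is not taken from the ADAM code base: the function names, the `(bases, quals, ref_pos)` tuple layout, and the toy sequences are all hypothetical. It also assumes, following the prose ("quality score weighted Hamming edit distance"), that a position contributes its quality score when the read base differs from the target base, and that the reference placement is scored against the reference sequence. The LOD threshold and the CIGAR/MDTag rewrite are out of scope here.

```python
def weighted_mismatch(read, quals, target, offset):
    """Sum the quality scores at positions where the read, placed at
    `offset` in `target`, disagrees with the target base."""
    return sum(
        q
        for k, (base, q) in enumerate(zip(read, quals))
        if base != target[offset + k]
    )


def read_score(read, quals, consensus, reference, ref_pos):
    """Score one read: the smaller of its score at its original reference
    position and its best score over all placements in the consensus.
    Assumes the consensus is at least as long as the read."""
    q_ref = weighted_mismatch(read, quals, reference, ref_pos)
    # Offsets 0 .. l_c - l_r inclusive, matching the range of j in the text.
    offsets = range(len(consensus) - len(read) + 1)
    q_cons = min(weighted_mismatch(read, quals, consensus, j) for j in offsets)
    return min(q_ref, q_cons)


def consensus_score(reads, consensus, reference):
    """Aggregate the per-read scores for one candidate consensus.
    `reads` is an iterable of (bases, quals, ref_pos) tuples."""
    return sum(read_score(b, q, consensus, reference, p) for b, q, p in reads)


# Toy data: the consensus carries a one-base insertion relative to the reference.
reference = "ACGTACGT"
consensus = "ACGTTACGT"
reads = [
    ("GTTAC", [30, 30, 30, 30, 30], 2),  # spans the insertion
    ("ACGA", [20, 20, 20, 10], 0),       # one low-quality mismatch either way
]
total = consensus_score(reads, consensus, reference)
```

Under these assumptions, the first read scores 90 against the reference but 0 against the consensus, which is the kind of improvement that would favor this consensus when candidates are compared by their aggregate score.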