---
knit: 'bookdown::render_book("index.Rmd", "tufte_html_book")'
title: 'Automatic Differentiation Handbook'
author: 'Bob Carpenter, Adam Haber, and Charles Margossian'
date: '2020'
bibliography: all.bib
# biblio-style: "acm"
link-citations: true
output:
  bookdown::tufte_html_book:
    split_by: chapter
    toc_depth: 2
  bookdown::tufte_book:
    latex_engine: "pdflatex"
    includes:
      in_header: header.tex
    toc:
      collapse: section
fontsize: 11pt
---

# Preface {-}

The goal of this book is to introduce the basic algorithms for
automatic differentiation along with an encyclopedic collection of
automatic differentiation rules for popular mathematical and
statistical functions.

## Overview of this book {-}

Automatic differentiation is a general technique for converting a
function that computes values into one that also computes derivatives.
The derivative computations add only a constant overhead to each
operation used to compute the function value, so that the
differentiated function has the same order of complexity as the
original function.

After describing the standard forms of automatic differentiation, this
book supplies an encyclopedic collection of tangent and adjoint rules
for forward-mode and reverse-mode automatic differentiation, covering
the most widely used scalar, vector, matrix, and probability functions.
The appendix contains working example code for forward-mode,
reverse-mode, and mixed-mode automatic differentiation.
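
To make the idea of lifting concrete before the detailed chapters, here
is a minimal sketch using the JAX library, which we use here purely for
illustration; it is not the code developed in this book's appendix.

```python
import jax
import jax.numpy as jnp

# A function that only computes a value.
def f(x):
    return jnp.sum(jnp.sin(x) * x)

# Lift it into a function that computes the value and its gradient.
f_with_grad = jax.value_and_grad(f)

x = jnp.array([1.0, 2.0, 3.0])
value, grad = f_with_grad(x)  # value = f(x), grad[n] = d f / d x_n
```

The lifted function returns both the value and the gradient at a cost
that is a small constant multiple of the cost of evaluating `f` alone.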

## Why derivatives? {-}

Applications of derivatives, which measure the change in one quantity
relative to a change in another quantity, are ubiquitous in applied
mathematics. Although derivatives can be computed mechanically by
hand in many cases, the process is painstaking and error prone.

The present text is motivated by the fact that most state-of-the-art
statistical inference algorithms are based on first- or higher-order
derivatives. Examples include

* maximum likelihood and maximum a posteriori estimation with
  quasi-Newton or gradient descent methods (first-order derivatives),
* Bayesian posterior sampling with Hamiltonian Monte Carlo
  (first-order derivatives of log density functions for Euclidean
  geometry; third-order for Riemannian),
* standard error or posterior covariance estimation with Laplace
  approximations (second-order derivatives of log density functions),
* maximum marginal likelihood or maximum marginal a posteriori
  estimation with Monte Carlo expectation maximization (first-order
  derivatives of expectations computed via Monte Carlo samples), and
* variational inference with stochastic gradient descent based
  on nested expectations (first-order derivatives of
  expectations computed via Monte Carlo samples).

Automatic differentiation is also useful for computing the sensitivity
of one quantity to another, expressed as a derivative. Examples in
statistical inference include first- and second-order derivatives
to characterize

* data sensitivity---the sensitivity of a maximum likelihood or
  Bayesian parameter estimate to one or more data points,
* hyperparameter sensitivity---the sensitivity of a maximum
  likelihood or Bayesian parameter estimate to one or more
  model hyperparameters, and
* parameter sensitivity---the sensitivity of a posterior log density
  or a log likelihood to the model parameters for a given data set.

Sensitivities are also of interest in many numerical analysis
applications, including

* the sensitivity of the solution to a system of algebraic or
  differential equations to its initial conditions or parameters.

## Why automatic differentiation? {-}

Deriving derivatives analytically by hand is not only painstaking and
error prone under the best of circumstances, but also difficult to do
efficiently for the iterative functions involved in matrix
factorization or differential equation solving. As its name implies,
automatic differentiation automatically lifts a function computing a
value to one computing the value and its derivatives. Perhaps more
surprisingly, this can be done in full generality with both efficiency
and high arithmetic precision.

## Overview of automatic differentiation techniques {-}

Forward-mode automatic differentiation employs the tangent method of
propagating the chain rule forward from the independent (input)
variables. In a single pass, forward mode computes the derivatives of
all outputs of a function $f: \mathbb{R} \rightarrow \mathbb{R}^M$
with respect to its input. Forward mode can also be used to
efficiently compute directional derivatives or, more generally,
gradient-vector products.
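
For concreteness, the following sketch (again using JAX purely for
illustration) runs one tangent pass through a function with a single
input and several outputs, and one through a scalar function to obtain
a gradient-vector product.

```python
import jax
import jax.numpy as jnp

# f : R -> R^M; one forward pass yields the derivative of every output.
def f(x):
    return jnp.array([x ** 2, jnp.sin(x), jnp.exp(x)])

x = jnp.asarray(1.5)
value, tangent = jax.jvp(f, (x,), (jnp.asarray(1.0),))

# g : R^N -> R; a forward pass with tangent v yields the
# gradient-vector product (a directional derivative) grad g(x) . v.
def g(x):
    return jnp.sum(x ** 2)

xs = jnp.array([1.0, 2.0, 3.0])
v = jnp.array([1.0, 0.0, 0.0])
_, gvp = jax.jvp(g, (xs,), (v,))
```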

Reverse-mode automatic differentiation employs the adjoint method of
propagating the chain rule backward from a dependent (output)
variable. In a single pass, reverse mode computes the gradient of a
function, that is, the derivatives of the output of a function $f :
\mathbb{R}^N \rightarrow \mathbb{R}$ with respect to each of its
inputs.
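
A reverse-mode pass can be sketched in the same illustrative style;
here `jax.grad` performs the backward adjoint sweep and returns the
full gradient in one pass.

```python
import jax
import jax.numpy as jnp

# f : R^N -> R
def f(x):
    return jnp.sum(jnp.log(x) + x ** 2)

x = jnp.array([1.0, 2.0, 3.0])

# One reverse-mode pass propagates adjoints backward from the output,
# yielding the derivative with respect to every input at once.
grad = jax.grad(f)(x)  # elementwise 1 / x + 2 x

# Equivalently, in explicit adjoint (vector-Jacobian product) form:
value, vjp_fn = jax.vjp(f, x)
(grad_again,) = vjp_fn(jnp.ones_like(value))  # seed the output adjoint
```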

Either forward-mode or reverse-mode automatic differentiation may be
used to compute Jacobians, that is, the $M \times N$ matrix of all
first-order derivatives of a function $f:\mathbb{R}^N \rightarrow
\mathbb{R}^M.$ Computing a full Jacobian requires $N$ passes of
forward-mode or $M$ passes of reverse-mode automatic differentiation.
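
The trade-off between the two modes can be seen with JAX's `jacfwd` and
`jacrev`, which assemble the Jacobian from forward and reverse passes
respectively (again, shown only as an illustration).

```python
import jax
import jax.numpy as jnp

# f : R^3 -> R^2, so the Jacobian is a 2 x 3 matrix.
def f(x):
    return jnp.array([x[0] * x[1], jnp.sin(x[2])])

x = jnp.array([1.0, 2.0, 3.0])

J_fwd = jax.jacfwd(f)(x)  # built column by column: N = 3 forward passes
J_rev = jax.jacrev(f)(x)  # built row by row: M = 2 reverse passes
```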

Nesting reverse-mode automatic differentiation within forward mode
provides efficient calculation of Hessians, that is, the $N \times N$
matrix of all second-order derivatives of a function $f:\mathbb{R}^N
\rightarrow \mathbb{R}.$ Computing a full Hessian requires $N$ passes
of reverse mode nested in forward mode. Just as forward-mode automatic
differentiation may be used to efficiently calculate gradient-vector
products, reverse mode nested in forward mode can be used to compute
Hessian-vector products efficiently.
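
A forward-over-reverse sketch in the same illustrative style shows both
the full Hessian and a Hessian-vector product that never forms the
Hessian explicitly.

```python
import jax
import jax.numpy as jnp

# f : R^N -> R
def f(x):
    return jnp.sum(x ** 3)

x = jnp.array([1.0, 2.0, 3.0])
v = jnp.array([1.0, 0.0, 0.0])

# Full Hessian: forward mode applied to the reverse-mode gradient,
# one forward pass per input dimension (N passes in total).
H = jax.jacfwd(jax.grad(f))(x)

# Hessian-vector product: a single forward pass, with tangent v, over
# the reverse-mode gradient, without forming H explicitly.
_, hvp = jax.jvp(jax.grad(f), (x,), (v,))
```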