-
Notifications
You must be signed in to change notification settings - Fork 0
/
2.5-dataflow-formalization.tex
838 lines (782 loc) · 42.4 KB
/
2.5-dataflow-formalization.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
% !TEX root = main.tex
\section{Formalization of Unsafe Dataflow with dIMP}
In this section, we are going to formalize what we mean by a program having unsafe dataflow,
give a algorithm specification for computing if a program has unsafe dataflow and
prove that any algorithm conforming to that algorithm specification is sound.
We use the toy language IMP as presented in~\cite{sat}.
The lecture notes themselves heavily borrow from~\cite{fsopl} for the
presentation and semantics of IMP.
We call an algorithm that detects if a program has unsafe dataflow also a dataflow algorithm.
Dataflow is unsafe, if (user-controlled) values flow, without modification,
from a set of sources to a set of sinks.
\subsection{Syntax of dIMP}
We extend the syntax of IMP a bit
and call the resulting language dIMP (for dataflow IMP):
\begin{align*}
n \in \: & \Z\\
\flaggedn{} \in \: & \StoreDomain = \{\flagged{} \mid n \in \Z\}\\
X \in \: & \Loc = \{x, y, z, sum, \ldots\}\\
a \Coloneqq \: &\literalint{} \mid X \mid \defaultsource \mid a_0+a_1 \mid \ldots\\
t \Coloneqq \: & \btrue \mid \bfalse \\
b \Coloneqq \: & t \mid a_0 = a_1 \mid \neg b_0 \mid b_0 \land b_1 \mid \ldots \\
c \Coloneqq \: &\skipcmd \mid \defaultstore \mid \defaultseq \mid \defaultif \mid \defaultwhile \mid \defaultsink\\
f \Coloneqq \: & \textbf{c} \mid \textbf{t}\\
\end{align*}
Exactly which arithmetic and Boolean expressions are available in the language
is left unspecified, as it doesn't matter to the dataflow algorithm.
Stores are denoted with $\store \in \Sigma = \Loc \to \StoreDomain$.
Stores are function from locations to the set of tagged integer
values.
The intuition here is that for all $n \in \Z$,
each value is tagged by either \textbf{c} for clean or \textbf{t} for being tracked
as having unsafe dataflow.
If the tag is irrelevant in the current context, we write $\flaggedn{}$ for a tagged
value.
A tracked value originates from a source, and is preserved by value-preserving operations.
The command $\defaultsink$ is a special marker for the dataflow sink.
It aborts the program if a tracked value reaches the sink, thus indicating a policy
violation.
In real programs, the dataflow source would likely be a procedure reading input,
i.e.\ from a terminal, and the dataflow sink would likely be a procedure
printing the data to a terminal, or otherwise communicating with the outside world.
\iffalse
There is a new judgement of the form
\fbox{$\bigstep{a}{\sigma}{\flaggedn{}}$}:
\begin{align*}
&\bsrule{EA-Num}{}{\bigstep{\literalint}{\store}{\clean}} \qquad
\bsrule{EA-Loc}{}{\bigstep{X}{\store}{\store(X)}}\qquad
\bsrule{EA-Plus}
{\bigstep{a_0}{\store}{\flagged[n_0][0]}
\qquad \bigstep{a_1}{\store}{\flagged[n_1][1]}}
{\bigstep{a_0+a_1}{\store}{\clean[(n_0+n_1)]}}\\
&\bsrule{EA-Minus}
{\bigstep{a_0}{\store}{\flagged[n_0][0]}
\qquad \bigstep{a_1}{\store}{\flagged[n_1][1]}}
{\bigstep{a_0-a_1}{\store}{\clean[(n_0-n_1)]}} \qquad
\bsrule{EA-Times}
{\bigstep{a_0}{\store}{\flagged[n_0][0]}
\qquad \bigstep{a_1}{\store}{\flagged[n_1][1]}}
{\bigstep{a_0 \times a_1}{\store}{\clean[(n_0 \times n_1)]}}\\
&\bsrule{EA-Source}{\bigstep{a}{\store}{\flagged}}
{\bigstep{\source{a}}{\store}{\tracked}}
\end{align*}
The Judgement $\langle c, \sigma \rangle \downarrow \sigma'$ is replaced by
$\bigstep{c}{\sigma}{\excstore'}$
where $\excstore{}: (\Loc \to \StoreDomain) \cup \{\exception\}$.
The state $\exception$ indicates that the program was aborted due to a policy violation.
\\
Judgement \fbox{$\bigstep{c}{\sigma}{\excstore'}$}:
\begin{align*}
&\bsrule{EC-Skip}{}{\bigstep{\skipcmd}{\store}{\store}}\qquad
\bsrule{EC-Assign}{\bigstep{a}{\store}{\flaggedn{}}}
{\bigstep{\defaultstore}{\store}{\store{}[X \mapsto \flaggedn{}]}}\\
&\bsrule{EC-Seq}{\bigstep{c_0}{\sigma}{\store''} \qquad \bigstep{c_1}{\store''}{\excstore'}}
{\bigstep{\defaultseq}{\store}{\excstore'}}\qquad
\bsrule{EC-SeqAbrt}{\bigstep{c_0}{\sigma}{\exception}}
{\bigstep{\defaultseq}{\store}{\exception}} \\
&\bsrule{EC-IfT}{\bigstep{b}{\store}{\btrue}\qquad \bigstep{c_0}{\store}{\excstore'}}
{\bigstep{\defaultif}{\store}{\excstore'}} \qquad
\bsrule{EC-IfF}{\bigstep{b}{\store}{\bfalse}\qquad \bigstep{c_1}{\store}{\excstore'}}
{\bigstep{\defaultif}{\store}{\excstore'}}\\
&\bsrule{EC-WhileF}{\bigstep{b}{\sigma}{\bfalse}}
{\bigstep{\defaultwhile}{\store}{\store}} \\
&\bsrule{EC-WhileT}{\bigstep{b}{\sigma}{\btrue} \qquad \bigstep{c_0}{\store}{\store''}\qquad
\bigstep{\defaultwhile}{\store''}{\excstore'}}
{\bigstep{\defaultwhile}{\store}{\excstore'}}\\
&\bsrule{EC-WhileTAbrt}{\bigstep{b}{\sigma}{\btrue} \qquad \bigstep{c_0}{\store}{\exception}}
{\bigstep{\defaultwhile}{\store}{\exception}}\\
&\bsrule{EC-Sink}{\bigstep{a}{\store}{\clean{}}}
{\bigstep{\defaultsink}{\store}{\store}} \qquad
\bsrule{EC-SinkAbrt}{\bigstep{a}{\store}{\tracked{}}}
{\bigstep{\defaultsink}{\store}{\exception}}\\
\end{align*}
The judgement operations for Boolean expressions are modified to ignore the tracked-flag and work
just as in IMP. This is analogous to how $\textsc{EA-Plus}$ etc.\ work.
\fi
\subsection{Syntax and Semantics of SSA-dIMP}
To facilitate the dataflow analysis, we describe an SSA form on dIMP with a concrete
syntax and semantics.
We do not describe how to transform a dIMP program to an SSA-dIMP program, but
refer the reader to the literature, where different SSA construction algorithms are
described. Some even come with computer-verified correctness and minimality guarantees,
we refer the reader to~\cite{verifiedssa}.
The syntax for Boolean and arithmetic expressions in SSA-dIMP is the same as in dIMP.
However, the command syntax changes.
We now denote commands by $s$ (short for statement, or SSA-command) to highlight the differences
from the commands denoted by $c$ in dIMP.
\begin{align*}
s \Coloneqq &\:\skipcmd \mid \defaultstore \mid \defaultssaseq \mid \defaultssaif\\
&\mid \defaultssawhile \mid \defaultsink \\
\philist \Coloneqq &\: \defaultphilistn\\
\end{align*}
At every point where $\varphi$-nodes can placed in the control flow graph
of a dIMP program, the syntax of SSA-dIMP has an explicit list of $\varphi$-nodes.
\subsubsection*{The SSA Type System}
% TODO possible boolean type?
A program in SSA-dIMP is only valid if it conforms to the SSA properties.
To enforce that, we describe a simple type system.
Every well-typed expression has type $\tint$, thus the set of all types is $\Tau = \{ \tint \}$.
Let $\Gamma, \Delta: \Loc \partialf \Tau$ be partial maps with finite domain.
These are the type contexts.
$\Gamma$ keeps tracks of the type of the variables defined in the store before a
statement is executed, and $\Delta$ keeps track of the types of the
newly defined variables while executing that statement.
We will use $\dunion$ to be the union of maps,
where we assert that the domains of the maps are disjoint.
After giving the typing rules,
we state that type contexts $\Gamma$ and $\Delta$ are enough to prove that the program is indeed in SSA form.
The easiest way to get them is to run an SSA construction algorithm on a dIMP program.
It is a gap in the provided proofs that we do not show that every dIMP program
can be transformed in a valid SSA-dIMP program (especially, we omit showing that
$\varphi$-nodes will be placed exactly at the places they are allowed in the syntax
of SSA-dIMP).
The type system has the following judgments:
\\
Judgement \fbox{$\typestep{\Gamma}{a}{\tint}$}:
\begin{align*}
&\trule{SSA-Num}{}{\typestep{\Gamma}{\overline{n}}{\tint}}\qquad
\trule{SSA-Loc}{}{\typestep{\Gamma}{X}{\Gamma(X)}}\\
&\trule{SSA-Source}{\typestep{\Gamma}{a}{\tint}}{\typestep{\Gamma}{\source{a}}{\tint}}\\
&\trule{SSA-Plus}{\typestep{\Gamma}{a_0}{\tint}\qquad\typestep{\Gamma}{a_1}{\tint}}
{\typestep{\Gamma}{a_0+a_1}{\tint}}\\
&\ldots
\end{align*}
Judgement \fbox{$\typestmt{\Gamma}{b}$}:
\begin{align*}
&\trule{SSA-Bool}{}{\typestmt{\Gamma}{t}}\qquad
\trule{SSA-Eq}{\typestep{\Gamma}{a_0}{\tint}\qquad \typestep{\Gamma}{a_1}{\tint}}
{\typestmt{\Gamma}{a_0 = a_1}}\\
&\trule{SSA-Neg}{\typestmt{\Gamma}{b}}
{\typestmt{\Gamma}{\neg b}}\qquad
\trule{SSA-And}{\typestmt{\Gamma}{b_0}\qquad\typestmt{\Gamma}{b_1}}
{\typestmt{\Gamma}{b_0 \land b_1}}\\
&\ldots
\end{align*}
Judgement \fbox{$\typestep{\Gamma;\Delta_0;\Delta_1}{\philist}{\Delta}$}:
\begin{align*}
&\trule{SSA-$\varphi$}{}{\typestep{\Gamma;\Delta_0;\Delta_1}{\defaultphilistn}{[X_1 \mapsto \type_1, \ldots, X_n \mapsto \type_n]}}\\
&\text{(if for $i \in \{1, \ldots, n\}: X_i \notin \dom\Gamma \land \type_i = (\Gamma \dunion \Delta_0)(Y_i) = (\Gamma \dunion \Delta_1)(Z_i)
\land \forall j \neq i: X_j \neq X_i$)}
\end{align*}
Note that this implies that this rule is only applicable if $Y_i$, $Z_i$ exist in the respective domains
of the partial functions.
\\
Judgement \fbox{$\typestep{\Gamma}{s}{\Delta}$}:
\begin{align*}
&\trule{SSA-Skip}{}{\typestep{\Gamma}{\skipcmd}{\emptyset}}\qquad
\trule{SSA-Assign}{\typestep{\Gamma}{a}{\tint}}{\typestep{\Gamma}{\defaultstore}{[X \mapsto \tint]}} \text{($X \notin \dom\Gamma$)}\\
&\trule{SSA-Seq}{\typestep{\Gamma}{s_0}{\Delta_0}\qquad
\typestep{\Gamma \dunion \Delta_0}{s_1}{\Delta_1}}{\typestep{\Gamma}{\defaultssaseq}{\Delta_0 \dunion \Delta_1}}\\
&\trule{SSA-If}{\typestmt{\Gamma}{b}\qquad\typestep{\Gamma}{s_0}{\Delta_0}\qquad \typestep{\Gamma}{s_1}{\Delta_1}\qquad
\typestep{\Gamma;\Delta_0;\Delta_1}{\philist}{\Delta}}
{\typestep{\Gamma}{\defaultssaif}{\Delta}}\\
&\trule{SSA-While}{\typestep{\Gamma;\emptyset;\Delta_1}{\philist}{\Delta}\qquad
\typestmt{\Gamma \dunion \Delta}{b}\qquad\typestep{\Gamma \dunion \Delta}{s_0}{\Delta_1}}
{\typestep{\Gamma}{\defaultssawhile}{\Delta}}\\
&\trule{SSA-Sink}{\typestep{\Gamma}{a}{\tint}}{\typestep{\Gamma}{\sink{a}}{\emptyset}}\\
\end{align*}
Note that by this definition, the first argument of $\varphi$-nodes for \textbf{if} commands
corresponds to the variable after executing the \btrue{} branch, and the second argument to that
of the variable after executing the \bfalse{} branch.
For \textbf{while} commands, the first argument to the $\varphi$-node corresponds to the value
before executing $s_0$ the first time, and the second argument corresponds to the value
after executing the loop body.
The finite maps $\Gamma$ and $\Delta$ are natural outputs of an algorithm that converts a
dIMP program into SSA-dIMP form.
They prove that a (not necessarily minimal) SSA form was indeed achieved.
Inputs to the program are provided in the initially provided store, and $\Gamma$
has to contain types for all variables accessed by the program that don't
have a definition in the program.
\subsubsection*{The Semantics of SSA-dIMP}
Stores are now partial functions $\store : \Sigma \partialf \Loc$ to indicate that
variables have to be defined before they can be read.
When updating a store, we use the syntax
$\store \dunion [X \mapsto \flagged{}]$ to indicate that the store $\store$ does not
contain the variable $X$.
\autoref{thm:ssa-gamma-delta-disjoint} in combination with ~\autoref{thm:ssa-progress}
justify this notation.
Let $\excstore{}$ denote stores that represent an \exception{}
state or a partial function, i.e.\ $\excstore{} \in (\Loc \partialf \StoreDomain) \union \{\exception\}$
The state $\exception$ indicates that the program was aborted due to a policy violation.
This semantics is not necessarily meant to mirror what actually happens at runtime, i.e.\
in Java values don't have a flag attached whether a variable is tracked or not.
However, this does not mean that the semantics has only theoretical value - for example
Perl has a taint-checking feature, where user input is marked at runtime as tainted,
and if unchecked user input ends up in sinks such as SQL statements, the program aborts.
This is very similar to the semantics given here.
For $\varphi$-nodes, we introduce two judgements --- one that evaluates all the variable definitions
to the first argument, and one that evaluates them all to the second argument.
Both judgements are provided with a filter $\filter \subseteq \dom\sigma$ that specifies which variables
from the outer store should be kept.
The symbol $\store\restrict{\filter}$ denotes the restriction of the map $\store$
to the domain $\filter$.
In order to correctly model the execution of $\varphi$-nodes for \textbf{while} commands,
a second judgement is set up.
The regular command judgement only populates the variables defined by the $\varphi$-nodes
and then defers to a special while-command judgement that evaluates all loop executions,
evaluating the $\varphi$-nodes with the store after the loop body execution.
\\
Judgment \fbox{$\bigstep{a}{\sigma}{\flaggedn{}}$}:
\begin{align*}
&\bsrule{EA-Num}{}{\bigstep{\literalint}{\store}{\clean}} \qquad
\bsrule{EA-Loc}{}{\bigstep{X}{\store}{\store(X)}}\\
&\bsrule{EA-Source}{\bigstep{a}{\store}{\flagged}}
{\bigstep{\source{a}}{\store}{\tracked}}\\
&\bsrule{EA-Plus}
{\bigstep{a_0}{\store}{\flagged[n_0][0]}
\qquad \bigstep{a_1}{\store}{\flagged[n_1][1]}}
{\bigstep{a_0+a_1}{\store}{\clean[(n_0+n_1)]}}\\
&\ldots\\
\end{align*}
\\
Judgement \fbox{$\bigsteppl{\philist}{\filter}{\store}{\store'}$}:
\begin{align*}
&\bsrule{E$\Phi_1$-Assign}{}
{\bigsteppl{\defaultphilistn}{\filter}{\store}{\storeassign{\store\restrict{\filter}}{X_1 \mapsto \store(Y_1),\ldots, X_n \mapsto \store(Y_n)}}}\\
\end{align*}
Judgement \fbox{$\bigsteppr{\philist}{\filter}{\store}{\store'}$}:
\begin{align*}
&\bsrule{E$\Phi_2$-Assign}{}
{\bigsteppr{\defaultphilistn}{\filter}{\store}{\storeassign{\store\restrict{\filter}}{X_1 \mapsto \store(Z_1),\ldots, X_n \mapsto \store(Z_n)}}}\\
\end{align*}
\\
Judgement \fbox{$\bigstepw{s}{\filter}{\store}{\excstore'{}}$}:
\begin{align*}
&\bsrule{EW-WhileF}{\bigstep{b}{\store}{\bfalse}}
{\bigstepw{\defaultssawhile}{\filter}{\store}{\store}}\qquad\\
&\bsrule{EW-WhileTAbrt}{\bigstep{b}{\store}{\btrue} \qquad
\bigstep{s_0}{\store}{\exception}}
{\bigstepw{\defaultssawhile}{\filter}{\store}{\exception}}\\
&\bsrule{EW-WhileT}{\bigstep{b}{\store}{\btrue} \enskip
\bigstep{s_0}{\store}{\store''} \enskip
\bigsteppr{\philist}{\filter}{\store''}{\store'''} \enskip
\bigstepw{\defaultssawhile}{\filter}{\store'''}{\excstore'}}
{\bigstepw{\defaultssawhile}{\filter}{\store}{\excstore'}}
\end{align*}
\\
Judgement \fbox{$\bigstep{s}{\store}{\excstore{}}$}:
\begin{align*}
&\bsrule{EC-Skip}{}{\bigstep{\skipcmd}{\store}{\store}}\qquad
\bsrule{EC-Assign}{\bigstep{a}{\store}{\flaggedn{}}}
{\bigstep{\defaultstore}{\store}{\storeassign{\store}{X \mapsto \flaggedn{}}}} \\
&\bsrule{EC-Seq}{\bigstep{s_0}{\sigma}{\store''} \qquad \bigstep{s_1}{\store''}{\excstore'}}
{\bigstep{\defaultssaseq}{\store}{\excstore'}}\qquad
\bsrule{EC-SeqAbrt}{\bigstep{s_0}{\sigma}{\exception}}
{\bigstep{\defaultssaseq}{\store}{\exception}} \\
&\bsrule{EC-IfT}{\bigstep{b}{\store}{\btrue}\qquad \bigstep{s_0}{\store}{\store''}
\qquad \bigsteppl{\philist}{\dom\store}{\store''}{\store'}}
{\bigstep{\defaultssaif}{\store}{\store'}}\\
&\bsrule{EC-IfF}{\bigstep{b}{\store}{\bfalse}\qquad \bigstep{s_1}{\store}{\store''}
\qquad \bigsteppr{\philist}{\dom\store}{\store''}{\store'}}
{\bigstep{\defaultssaif}{\store}{\store'}}\\
&\bsrule{EC-IfTAbrt}{\bigstep{b}{\store}{\btrue}\qquad \bigstep{s_0}{\store}{\exception}}
{\bigstep{\defaultssaif}{\store}{\exception}} \qquad
\bsrule{EC-IfFAbrt}{\bigstep{b}{\store}{\bfalse}\qquad \bigstep{s_1}{\store}{\exception}}
{\bigstep{\defaultssaif}{\store}{\exception}} \\
&\bsrule{EC-While}{\bigsteppl{\philist}{\dom\store}{\store}{\store''} \qquad
\bigstepw{\defaultssawhile}{\dom\store}{\store''}{\excstore'}}
{\bigstep{\defaultssawhile}{\store}{\excstore'}}\\
&\bsrule{EC-Sink}{\bigstep{a}{\store}{\clean{}}}
{\bigstep{\defaultsink}{\store}{\store}}\qquad
\bsrule{EC-SinkAbrt}{\bigstep{a}{\store}{\tracked{}}}
{\bigstep{\defaultsink}{\store}{\exception}}\\
\end{align*}
The judgement operations for Boolean expressions are modified to ignore the tracked-flag and work
just as in IMP. This is analogous to how $\textsc{EA-Plus}$ etc.\ work.
\subsubsection*{From dIMP to SSA-dIMP}
We show by example how the SSA construction works.
We consider the dIMP program shown in ~\autoref{fig:dimp-example}.
\begin{figure}[h]
\begin{minted}[linenos]{pascal}
I := X;
Z := 1;
while I > 0 do
I := I - 1;
if Z < 100 then
Z := source Z * X
else
Z := Z
;
sink Z
\end{minted}
\caption{The example dIMP program}
\label{fig:dimp-example}
\end{figure}
After transformation, an equivalent SSA-dIMP program is shown in~\autoref{fig:ssa-dimp-example}.
%TODO same figure for both
\begin{figure}[h]
\begin{minted}[escapeinside=||,mathescape=true,linenos]{pascal}
I0 := X;
Z0 := 1;
while [Z3 := |$\varphi$|(Z0, Z2), I2 := |$\varphi$|(I0, I1)]; I2 > 0 do
I1 := I2 - 1;
if Z3 < 100 then
Z1 := source Z3 * X
else
Z1 := Z3
;[Z2 := |$\varphi$|(Z1, Z1)]
;
sink Z3
\end{minted}
\caption{The example program transformed to SSA-dIMP}
\label{fig:ssa-dimp-example}
\end{figure}
The maps $\Gamma$ and $\Delta$ can be easily inferred, except for the while loop,
where the maps do not immediately can be inferred by using the rules.
Thus, as part of the transformation, we give the maps $\Delta$ and
$\Delta_1$ as in rule $\textsc{SSA-While}$ for the while loop in the example program:
$\Delta = [Z_3 \mapsto \tint, I_2 \mapsto \tint]$ and $\Delta_1 = [I_3 \mapsto \tint, Z_2 \mapsto \tint]$.
This SSA form is obviously not unique, as the variable naming is not canonical.
Furthermore, the number of $\varphi$-nodes in the example program is minimal,
but no such assertions is made by the typing rules.
\subsubsection*{Correctness Results}
We state that an SSA-dIMP program together with the SSA type derivation
is in fact in SSA form --- each variable is defined only once.
Furthermore, the structure of the bigstep derivation trees imply that
for every bigstep derivation, all variables are defined before they are used.
Proofs are omitted.
The first theorem is very easy to prove, and the second theorem is proved later in greater generality.
\begin{theorem}
\label{thm:ssa-gamma-delta-disjoint}
If $\typestep{\Gamma}{s}{\Delta}$ holds, then $\Gamma \cap \Delta = \emptyset$.
\end{theorem}
\begin{theorem}
\label{thm:ssa-progress}
If $\typestep{\Gamma}{s}{\Delta}$, $\dom\store = \Gamma$ and
$\bigstep{s}{\store}{\store'}$, then $\store' = \store \dunion \store_0$
and $\dom{\store_0} = \Delta$.
\end{theorem}
\subsection{Definition of Unsafe Dataflow}
\begin{definition}[Program]
A \emph{program} in SSA-dIMP is a command $s$ with finite maps $\Gamma, \Delta$
with $\typestep{\Gamma}{s}{\Delta}$.
Complex programs are expressed by using the recursive nature of the definition of statements.
\end{definition}
\begin{definition}[Initial Store]
An \emph{initial store} is a store $\store$ such that $\initialstore{}$ holds, with
\begin{equation*}
\initialstore{} \iff \forall X: \exists n: \store(X) = \clean[n]
\end{equation*}
This means that in an initial store all values are tagged as being clean.
\end{definition}
\begin{definition}[Unsafe Dataflow]
A tuple $(s, \store, \Gamma, \Delta)$ of a program $s$ and initial store $\store{}$
with $\dom{\store{}} = \Gamma$ and $\initialstore$
has \emph{(unsafe) dataflow from a source to a sink} if
$\bigstep{s}{\store{}}{\exception}$ holds.
This means that the program may abort.
\end{definition}
\begin{definition}[Dataflow Algorithm]
A \emph{dataflow algorithm} $\A(s, \Gamma, \Delta)$ computes, given a program $s$
and an SSA type derivation $\typestep{\Gamma}{s}{\Delta}$,
whether there exists a store $\store{}$
such that $\initialstore$ and $(s, \store, \Gamma, \Delta)$ has unsafe dataflow from a source to a sink.
We require $\A(s, \Gamma, \Delta) \in \{\text{POSSIBLE\_FLOW}, \text{NO\_FLOW}\}$.
\end{definition}
\begin{definition}[Soundness]
A dataflow algorithm is \emph{sound} if for all tuples $(s, \store, \Gamma, \Delta)$ with $\initialstore$ that
have unsafe dataflow it holds that $\A(s, \Gamma, \Delta) = \text{POSSIBLE\_FLOW}$.
\end{definition}
\begin{definition}[Completeness]
A dataflow algorithm is \emph{complete} if for all tuples $(s, \store, \Gamma, \Delta)$ with $\initialstore$ that
do not have unsafe dataflow it holds that $\A(s, \Gamma, \Delta) = \text{NO\_FLOW}$.
\end{definition}
\begin{definition}[False Positive]
A program $s$ for which there exists no store $\store$ such that
$(s, \initialstore, \Gamma, \Delta)$ with $\initialstore{}$ has unsafe dataflow,
but for which $\A(s, \Gamma, \Delta) = \text{POSSIBLE\_FLOW}$ holds
is called a \emph{false positive} of the algorithm.
\end{definition}
\begin{remark}
In general, it is impossible to construct a dataflow algorithm that is both
sound and complete.
In practice, a dataflow algorithm may be neither sound nor complete.
However, in the theoretical setting of this chapter, we are interested in
sound dataflow algorithms.
The \emph{trivially sound dataflow algorithm} $\A_0(s, \Gamma, \Delta) = \text{POSSIBLE\_FLOW}$
is sound by definition, but not very interesting.
We will not consider it further, but it is interesting to keep in mind,
because it shows that just proving that a dataflow algorithm is sound does not
mean it is useful.
\end{remark}
\subsection{A General Setting For Flow Analysis}
\label{sec:df-theory}
In this section, we refine the type system presented for SSA to be viable as a
general-purpose constraint system.
This constraint system can then be solved via fixed-point iteration.
The system is designed in a way that it can be easily adapted to different analyses.
We instantiate this system to determine if programs have unsafe dataflow.
Let $\lattice$ be a poset.
Let $\Gamma, \Delta: \Loc \partialf \lattice$ with finite domain.
For dataflow analysis, we have $\lattice = \{\lclean, \ltracked, \lunknown\}$.
Every variable encountered in the program is typed with an annotated type from \lattice.
Thus, it is either typed with $\lclean{}$ (the variable
is clean, i.e.\ does not contain a value originating from a dataflow source), $\lunknown{}$
(it is unknown whether the variable contains a value originating from a source) or $\ltracked{}$
(the variable contains a tracked value from a dataflow source).
We define the following poset structure:
\begin{figure}[h]
\centering
\begin{tikzpicture}
\node (u) at (0, 1) {$\lunknown$};
\node (c) at (-1, 0) {$\lclean$};
\node (t) at (1, 0) {$\ltracked$};
\draw[->] (c) -- (u);
\draw[->] (t) -- (u);
\end{tikzpicture}
\end{figure}
We introduce the following judgements:
\\
Judgement \fbox{$\typestep{\Gamma}{a}{L}$}:
\begin{align*}
&\trule{DF-Num}{}{\typestep{\Gamma}{\overline{n}}{\lclean}}\qquad
\trule{DF-Loc}{}{\typestep{\Gamma}{X}{\Gamma(X)}}\\
&\trule{DF-Source}{\typestep{\Gamma}{a}{L}}
{\typestep{\Gamma}{\source{a}}{\ltracked}}\\
&\trule{DF-Plus}{\typestep{\Gamma}{a_0}{L_0}\qquad\typestep{\Gamma}{a_1}{L_1}}
{\typestep{\Gamma}{a_0+a_1}{\lclean}}\\
&\ldots
\end{align*}
Judgement \fbox{$\typestmt{\Gamma}{b}$}:
\begin{align*}
&\trule{DF-Bool}{}{\typestmt{\Gamma}{t}}\qquad
\trule{DF-Eq}{\typestep{\Gamma}{a_0}{L_0}\qquad \typestep{\Gamma}{a_1}{L_1}}
{\typestmt{\Gamma}{a_0 = a_1}}\\
&\trule{DF-Neg}{\typestmt{\Gamma}{b}}
{\typestmt{\Gamma}{\neg b}}\qquad
\trule{DF-And}{\typestmt{\Gamma}{b_0}\qquad\typestmt{\Gamma}{b_1}}
{\typestmt{\Gamma}{b_0 \land b_1}}\\
&\ldots
\end{align*}
\\
Judgement \fbox{$\typestep{\Gamma;\Delta_0;\Delta_1}{\philist}{\Delta}$}:
\begin{align*}
&\trule{DF-$\varphi$}{}{\typestep{\Gamma;\Delta_0;\Delta_1}{\defaultphilistn}{[X_1 \mapsto L_1, \ldots, X_n \mapsto L_n]}}\\
&\text{(if for $i \in \{1, \ldots, n\} \exists L_i: X_i \notin \dom\Gamma \land (\Gamma \dunion \Delta_0)(Y_i) \sqsubseteq L_i \land (\Gamma \dunion \Delta_1)(Z_i) \sqsubseteq L_i
\land \forall j \neq i: X_j \neq X_i$)}
\end{align*}
Note that this implies that this rule is only applicable if $Y_i$, $Z_i$ exist in the respective
domains of the partial functions, and if $L_i$ satisfying the constraints exists as well.\\
Judgement \fbox{$\typestep{\Gamma}{s}{\Delta}$}:
\begin{align*}
&\trule{DF-Skip}{}{\typestep{\Gamma}{\skipcmd}{[]}}\qquad
\trule{DF-Assign}{\typestep{\Gamma}{a}{L}}
{\typestep{\Gamma}{\defaultstore}{[X \mapsto L]}} \text{($X \notin \dom\Gamma$)}\\
&\trule{DF-Seq}{\typestep{\Gamma}{s_0}{\Delta_0}\qquad
\typestep{\Gamma \cup \Delta_0}{s_1}{\Delta_1}}{\typestep{\Gamma}{\defaultssaseq}{\Delta_0 \cup \Delta_1}}\\
&\trule{DF-If}{\typestmt{\Gamma}{b}\qquad\typestep{\Gamma}{s_0}{\Delta_0}\qquad \typestep{\Gamma}{s_1}{\Delta_1}\qquad
\typestep{\Gamma;\Delta_0;\Delta_1}{\philist}{\Delta}}
{\typestep{\Gamma}{\defaultssaif}{\Delta}}\\
&\trule{DF-While}{\typestep{\Gamma;\emptyset;\Delta_1}{\philist}{\Delta}\qquad
\typestmt{\Gamma \cup \Delta}{b}\qquad\typestep{\Gamma \cup \Delta}{s_0}{\Delta_1}}
{\typestep{\Gamma}{\defaultssawhile}{\Delta}}\\
&\trule{DF-Sink}{\typestep{\Gamma}{a}{L}}{\typestep{\Gamma}{\sink{a}}{[]}}\\
\end{align*}
\begin{definition}[Concretization Function, ~\cite{abstractinterpretation}]
Let $\gamma:\lattice \to \P(\StoreDomain)$ be the concretization function in the abstract interpretation framework,
as presented in~\cite{abstractinterpretation}.
The concretization function is defined by
\begin{align*}
&\gamma:\lattice \to \P(\StoreDomain)\\
&\gamma(L) = \begin{cases}
\{\flagged{} \mid \flagged{} \in \StoreDomain \land f = \textbf{t}\} & \text{if }L = \ltracked\\
\{\flagged{} \mid \flagged{} \in \StoreDomain \land f = \textbf{c}\} & \text{if }L = \lclean\\
\StoreDomain & \text{if }L = \lunknown\\
\end{cases}
\end{align*}
\end{definition}
\begin{definition}[Store-Matching]
A store $\store$ \emph{matches a description} $\Gamma: \Loc \partialf \lattice$ if it holds that
\begin{equation*}
\matches{\store}{\Gamma} \longeq \dom\store = \dom\Gamma \land \forall X: \sigma(X) \in \gamma(\Gamma(X))
\end{equation*}
\end{definition}
First, we justify why the semantics contains disjoint unions when updating stores.
Then we prove some lemmas needed to establish the soundness of our dataflow analysis.
We omit the easy proofs.
\begin{theorem}
\label{thm:gamma-delta-disjoint}
If $\typestep{\Gamma}{s}{\Delta}$ holds, then $\dom\Gamma \cap \dom\Delta = \emptyset$.
\end{theorem}
\begin{lemma}
\label{lem:gamma-arithm}
Let $\E$ be a derivation of $\bigstep{a}{\sigma}{\flaggedn{}}$.
If $\typestep{\Gamma}{a}{L}$ and $\matches{\store}{\Gamma}$, then $\flaggedn{} \in \gamma(L)$.
\end{lemma}
The following lemma and theorem are very similar to the classic preservation lemma
used to prove the soundness of type systems.
\begin{lemma}
\label{thm:preservation-phi}
Let $\bigsteppl{\Phi}{\Theta}{\sigma''}{\sigma'}$ (or $\bigsteppr{\Phi}{\Theta}{\sigma''}{\sigma'}$),
such that $\store'' = \store \dunion \store_0''\dunion \store{\_}$, $\matches{\store}{\Gamma}$,
$\matches{\store_0''}{\Delta_1}$ (or $\matches{\store_0''}{\Delta_2}$) and $\Theta = \dom\store$.
Let $\typestep{\Gamma;\Delta_0;\Delta_1}{\Phi}{\Delta}$ by the DF rules.
Then there exists a $\store_0$ such that $\store' = \store \dunion \store_0$
and $\matches{\store_0}{\Delta}$.
\end{lemma}
\begin{theorem}
\label{thm:preservation}
Suppose $\bigstep{s}{\store}{\store'}$ by $\E$, $\matches{\store}{\Gamma}$,
and let $\typestep{\Gamma}{s}{\Delta}$ by $\D$ and the DF rules.
Then there exists a $\store_0$ such that $\store' = \store \dunion \store_0$
and $\matches{\store_0}{\Delta}$.
\end{theorem}
\begin{proof}
We prove the theorem by induction over the derivation $\E$.\\
\textbf{Case 1:}
\textsc{EC-Skip}: Trivial, as $\Delta = \emptyset$.\\
\textbf{Case 2:}
\textsc{EC-Sink}: Trivial, as $\Delta = \emptyset$.\\
\textbf{Case 3:}
\begin{align*}
&\E = \bsrule{EC-Seq}{\overset{\E_0}{\bigstep{s_0}{\sigma}{\store''}}
\qquad \overset{\E_1}{\bigstep{s_1}{\store''}{\store'}}}
{\bigstep{\defaultssaseq}{\store}{\store'}}\\
&\D = \trule{DF-Seq}{\overset{\D_0}{\typestep{\Gamma}{s_0}{\Delta_0}}\qquad
\overset{\D_1}{\typestep{\Gamma \cup \Delta_0}{s_1}{\Delta_1}}}
{\typestep{\Gamma}{\defaultssaseq}{\Delta_0 \cup \Delta_1}}
\end{align*}
By assumption we have that $\matches{\store}{\Gamma}$.
Thus, we can apply the IH on $\E_0$ with $\D_0$.
With the IH we get $\store'' = \store \dunion \store''_0$ and $\matches{\store''_0}{\Delta_0}$
with $\matches{\store''}{\Gamma \union \Delta_0}$.
Thus, we can apply the IH on $\E_1$ with $\D_1$.
Then we get that $\store' = \store'' \dunion \store'_0$ and $\matches{\store'_0}{\Delta_1}$.
With that, we have $\store' = \store \dunion \store''_0 \dunion \store'_0$.
We can set $\store_0 = \store''_0 \dunion \store'_0$, and we obviously
have that $\matches{\store_0}{\Delta_0 \union \Delta_1}$.\\
\textbf{Case 4:}
\begin{align*}
&\E = \bsrule{EC-IfT}{
\overset{}{\bigstep{b}{\store}{\btrue}}\qquad
\overset{\E_0}{\bigstep{s_0}{\store}{\store''}}
\qquad
\overset{\E_1}{\bigsteppl{\philist}{\dom\store}{\store''}{\store'}}}
{\bigstep{\defaultssaif}{\store}{\store'}}\\
&\D = \trule{DF-If}{
\overset{}{\typestmt{\Gamma}{b}}\qquad
\overset{\D_0}{\typestep{\Gamma}{s_0}{\Delta_0}}\qquad \typestep{\Gamma}{s_1}{\Delta_1}\qquad
\overset{\D_1}{\typestep{\Gamma;\Delta_0;\Delta_1}{\philist}{\Delta}}}
{\typestep{\Gamma}{\defaultssaif}{\Delta}}
\end{align*}
By IH on $\E_0$ with $\D_0$, we get that $\store'' = \store \dunion \store''_0$ and
$\matches{\store''_0}{\Delta_0}$.
The result then follows from~\autoref{thm:preservation-phi} on $\E_1$ with $\D_1$.\\
\textbf{Case 5:}
$\E$ uses \textsc{EC-IfF}: analogous.\\
\textbf{Case 6:}
\begin{align*}
&\E = \bsrule{EC-Assign}{\overset{\E_0}{\bigstep{a}{\store}{\flaggedn{}}}}
{\bigstep{\defaultstore}{\store}{\storeassign{\store}{X \mapsto \flaggedn{}}}}\\
&\D = \trule{DF-Assign}{\overset{\D_0}{\typestep{\Gamma}{a}{L}}}
{\typestep{\Gamma}{\defaultstore}{[X \mapsto L]}}
\end{align*}
We apply~\autoref{lem:gamma-arithm} on $\E_0$ with $\D_0$,
and as $\flaggedn{} \in \gamma(L)$, we have that $\matches{[X \mapsto \flaggedn{}]}{[X \mapsto L]}$.
The side-condition of \textsc{DF-Assign} implies that the union in the \textsc{EC-Assign}
rule is actually disjoint.
\\
\textbf{Case 7:}\\
\begin{align*}
&\E = \bsrule{EC-While}{\overset{\E_0}{\bigsteppl{\philist}{\dom\store}{\store}{\store''}}\qquad
\overset{\E_2}{\bigstepw{\defaultssawhile}{\dom\store}{\store''}{\excstore'}}}
{\bigstep{\defaultssawhile}{\store}{\store'}}\\
&\D = \trule{DF-While}{\overset{\D_0}{\typestep{\Gamma;\emptyset;\Delta_1}{\philist}{\Delta}}\qquad
\overset{\D_1}{\typestmt{\Gamma \cup \Delta}{b}}\qquad
\overset{\D_2}{\typestep{\Gamma \cup \Delta}{s_0}{\Delta_1}}}
{\typestep{\Gamma}{\defaultssawhile}{\Delta}}
\end{align*}
By applying~\autoref{thm:preservation-phi} on $\E_0$ with $\D_0$, we get that
$\store'' = \store \dunion \store_0$ and $\matches{\store_0}{\Delta}$.
Thus, $\matches{\store''}{\Gamma \union \Delta}$.
We prove the statement by an inner induction over the while derivation $\E^w$:
\begin{claim}
Let $\bigstepw{\defaultwhile}{\filter}{\store_w}{\store_w'}$ by $\E^w$,
$\store_w = \store_w^0 \dunion \store_w^1$
such that $\matches{\store_w^0}{\Gamma}$, $\Theta = \dom\store^0_w$, $\matches{\store_w^1}{\Delta}$
and let $\typestep{\Gamma}{\defaultwhile}{\Delta}$ by $\D$ and the DF rules.
Then we have that there exists a $\store_0$ such that $\store_w' = \store_w^0 \dunion \store_0$,
and $\matches{\store_0}{\Delta}$.
\end{claim}
\begin{claimproof}
\emph{Subcase 1:}
\begin{align*}
\E^w = \bsrule{EW-WhileF}{
\overset{}{\bigstep{b}{\store_w}{\bfalse}}}
{\bigstepw{\defaultssawhile}{\filter}{\store_w}{\store_w}}
\end{align*}
Trivial, taking $\store_0 = \store_w^1$.\\
\emph{Subcase 2:}
\begin{align*}
\E^w = \bsrule{EW-WhileT}{\overset{}{\bigstep{b}{\store_w}{\btrue}} \enskip
\overset{\E^w_0}{\bigstep{s_0}{\store_w}{\store''}} \enskip
\overset{\E^w_1}{\bigsteppr{\philist}{\filter}{\store''}{\store'''}} \enskip
\overset{\E^w_2}{\bigstepw{\defaultssawhile}{\filter}{\store'''}{\excstore'}}}
{\bigstepw{\defaultssawhile}{\filter}{\store_w}{\store_w'}}
\end{align*}
By the outer IH on $\E^w_0$ with $\D_2$, we get that
$\sigma'' = \sigma_w \dunion \sigma_w'' = \sigma_w^0 \dunion \sigma_w^1 \dunion \sigma_w''$
and $\matches{\sigma_w''}{\Delta_1}$.
By~\autoref{thm:preservation-phi} we get that $\store''' = \store_w^0 \dunion \store_0'''$ and
$\matches{\store_0'''}{\Delta}$.
With that we can apply the inner IH on $\E^2_w$ with $\D$ to get the result on $\sigma_w'$.
\end{claimproof}
We finish the proof by applying the claim on $\E^2$ with $\D$.
\end{proof}
\begin{definition}[Safe Sink]
Let $s, \Gamma, \Delta$ be such that $\typestep{\Gamma}{s}{\Delta}$ (by $\D$).
A sink in $\D$ is \emph{safe}, if the type of its argument is $\lclean$.
\end{definition}
We then have the following (general) soundness property of the dataflow analysis:
\begin{theorem}[Soundness of the DF rules]
\label{thm:df-soundness}
Let $s$ be a program in SSA form, $\typestep{\Gamma}{s}{\Delta}$ (by $\D$)
and let $\store$ be a store with $\matches{\store}{\Gamma}$.
Let $\E$ be the bigstep derivation of $\bigstep{s}{\store}{\excstore{}'}$.
Assume that all sinks in $\D$ are safe.
Then $\excstore{}' \neq \exception$.
\end{theorem}
\begin{proof}
Proof by induction over the derivation $\E$.
The cases \textsc{EC-Skip}, \textsc{EC-Sink}, \textsc{EC-Assign}, \textsc{EC-IfT} and \textsc{EC-IfF}
are trivially clear, as they never evaluate to \exception.\\
\textbf{Case 1:}
\begin{align*}
\E = \bsrule{EC-SinkAbrt}{\overset{}{\bigstep{a}{\store}{\tracked{}}}}
{\bigstep{\defaultsink}{\store}{\exception}}\\
\D = \trule{DF-Sink}{\overset{}{\typestep{\Gamma}{a}{L}}}
{\typestep{\Gamma}{\sink{a}}{\emptyset}}
\end{align*}
However, because all sinks are safe, we have $L = \lclean$.
Furthermore, as $\store : \Gamma$ this is a contradiction to ~\autoref{lem:gamma-arithm},
so this case cannot happen.\\
\textbf{Case 2:}
\begin{align*}
&\E = \bsrule{EC-SeqAbrt}{\overset{\E_0}{\bigstep{s_0}{\sigma}{\exception}}}
{\bigstep{\defaultssaseq}{\store}{\exception}}\\
&\D = \trule{DF-Seq}{\overset{\D_0}{\typestep{\Gamma}{s_0}{\Delta_0}}\qquad
\overset{}{\typestep{\Gamma \cup \Delta_0}{s_1}{\Delta_1}}}
{\typestep{\Gamma}{\defaultssaseq}{\Delta_0 \cup \Delta_1}}
\end{align*}
By IH on $\E_0$ with $\D_0$ we get that in $\bigstep{s_0}{\store}{\excstore{}'}$
we have that $\excstore{}' \neq \exception$ holds, so this case cannot happen.
The cases for \textsc{EC-IfTAbrt} and \textsc{EC-IfFAbrt} are proved analogous.\\
\textbf{Case 3:}
\begin{align*}
&\E = \bsrule{EC-Seq}{\overset{\E_0}{\bigstep{s_0}{\sigma}{\store''}} \qquad
\overset{\E_1}{\bigstep{s_1}{\store''}{\excstore'}}}
{\bigstep{\defaultssaseq}{\store}{\excstore'}}\\
&\D = \trule{DF-Seq}{\overset{\D_0}{\typestep{\Gamma}{s_0}{\Delta_0}}\qquad
\overset{\D_1}{\typestep{\Gamma \cup \Delta_0}{s_1}{\Delta_1}}}
{\typestep{\Gamma}{\defaultssaseq}{\Delta_0 \cup \Delta_1}}
\end{align*}
By~\autoref{thm:preservation} on $\E_0$ with $\D_0$ we get that $\matches{\store''}{\Gamma \cup \Delta_0}$.
Then we apply the IH on $\E_1$ with $\D_1$ and get that $\excstore{}' \neq \exception$.\\
\textbf{Case 4:}
\begin{align*}
&\E = \bsrule{EC-While}{\bigsteppl{\philist}{\dom\store}{\store}{\store''} \qquad
\overset{\E_0}{\bigstepw{\defaultssawhile}{\dom\store}{\store''}{\excstore'}}}
{\bigstep{\defaultssawhile}{\store}{\excstore'}}\\
&\D = \trule{DF-While}{\overset{\D_0}{\typestep{\Gamma;\emptyset;\Delta_1}{\philist}{\Delta}}\qquad
\typestmt{\Gamma \cup \Delta}{b}\qquad
\overset{\D_1}{\typestep{\Gamma \cup \Delta}{s_0}{\Delta_1}}}
{\typestep{\Gamma}{\defaultssawhile}{\Delta}}
\end{align*}
\begin{claim}
Let $\store_w = \store_{w0} \dunion \store_{w1}$ with
$\matches{\store_{w0}}{\Gamma}$ and $\matches{\store_{w1}}{\Delta}$,
$\dom\Gamma = \filter$ and
$\bigstepw{\defaultssawhile}{\filter}{\store_w}{\excstore{}'_w}$ by $\E^w$.
Let $\typestep{\Gamma \union \Delta}{s_0}{\Delta_1}$ by $\D^w_0$
and $\typestep{\Gamma;\emptyset;\Delta_1}{\philist}{\Delta}$ by $\D^w_1$.
Then if all sinks in $\E^w$ are safe, $\excstore{}'_w \neq \exception$.
\end{claim}
\begin{claimproof}
The case \textsc{EW-WhileF} is trivial, as it never evaluates to \exception.\\
\emph{Subcase 1:}
\begin{align*}
\E^w = \bsrule{EW-WhileTAbrt}{\bigstep{b}{\store_w}{\btrue} \qquad
\overset{\E_0}{\bigstep{s_0}{\store_w}{\exception}}}
{\bigstepw{\defaultssawhile}{\filter}{\store_w}{\exception}}
\end{align*}
By applying the outer IH on $\E_0$ with $\D^w_0$ we get that for
$\bigstep{s_0}{\store_w}{\excstore{}'}$ it actually holds that $\excstore{}' \neq \exception$,
so this case cannot occur.\\
\emph{Subcase 2:}
\begin{align*}
\E^w = \bsrule{EW-WhileT}{\overset{}{\bigstep{b}{\store_w}{\btrue}} \enskip
\overset{\E^w_0}{\bigstep{s_0}{\store_w}{\store''}} \enskip
\overset{\E^w_1}{\bigsteppr{\philist}{\filter}{\store''}{\store'''}} \enskip
\overset{\E^w_2}{\bigstepw{\defaultssawhile}{\filter}{\store'''}{\excstore'_w}}}
{\bigstepw{\defaultssawhile}{\filter}{\store_w}{\store_w'}}
\end{align*}
By~\autoref{thm:preservation} on $\E^w_0$ with $\D^w_0$, we get that $\store'' = \store_w \dunion \store''_0$
and $\matches{\store''_0}{\Delta_1}$, thus $\matches{\store''}{\Gamma \union \Delta \dunion \Delta_1}$.
By~\autoref{thm:preservation-phi} on $\E^w_1$ with $\D_1^w$ it follows that $\store''' = \store_{w0} \dunion \store'''_0$
with $\matches{\store'''_0}{\Delta}$.
Thus, $\matches{\store'''}{\Gamma \union \Delta}$ and we can apply the inner IH on $\E^w_2$.
That implies that $\excstore{}'_w \neq \exception$.
\end{claimproof}
We apply the claim on $\E_0$ with $\D_0$ and $\D_1$.
\end{proof}
\iffalse
\begin{corollary}
Let $s$ be a program in SSA form with
$\typestep{\Gamma}{s}{\Delta}$ (by $\D$ with the DF rules).
Let $\sigma$ be a store with $\initialstore{}$
and $\matches{\store}{\Gamma}$.
Let $\E$ be the bigstep derivation of
$\bigstep{s}{\store}{\excstore{}'}$.
If all sinks in $\D$ are safe, then $\excstore{}' \neq \exception$.
\end{corollary}
\fi
We now describe a sound dataflow algorithm schema $\A(s, \Gamma_\text{ssa}, \Delta_\text{ssa})$.
This algorithm schema can be seen as a general specification for sound dataflow algorithms.
As soon as an algorithm satisfies this schema, it is sound.
In the next section, we will see an instantiation of this algorithm schema in QL.
First, define $\Gamma$ by $\Gamma(X) = \lclean$ for all $X \in \dom\Gamma_\text{ssa}$.
Then, compute a finite map $\Delta$ and a derivation tree $\D$
such that $\typestep{\Gamma}{s}{\Delta}$ (by $\D$) holds.
Output \text{POSSIBLE\_FLOW} if $\D$ contains a derivation of form
\begin{equation*}
\trule{DF-Sink}{\typestep{\Gamma}{a}{\ltracked}}{\typestep{\Gamma}{\sink{a}}{\emptyset}}
\end{equation*}
or
\begin{equation*}
\trule{DF-Sink}{\typestep{\Gamma}{a}{\lunknown}}{\typestep{\Gamma}{\sink{a}}{\emptyset}}
\end{equation*}
In the absence of these derivations, output \text{NO\_FLOW}.
\begin{corollary}
\label{cor:soundness-df}
The algorithm schema $\A(s, \Gamma_\text{ssa}, \Delta_\text{ssa})$ describes a sound dataflow algorithm.
\end{corollary}
\begin{proof}
We prove the corollary by contraposition.
If the algorithm outputs \text{POSSIBLE\_FLOW}, there is nothing to prove.
If the algorithm outputs \text{NO\_FLOW}, we have to show that no store $\sigma$
with $\initialstore{}$ exists such that the program aborts.
As we have $\matches{\store}{\Gamma}$ by construction of $\Gamma$,
we can apply~\autoref{thm:df-soundness}.
Thus, in this case the program cannot program, and the algorithm schema
is indeed sound.
\end{proof}
The difference between $\ltracked$ and $\lunknown$ at an unsafe sink
is not reflected in the soundness theorem,
but it can be used to tailor the warnings given to the user.
If a sink is marked with $\ltracked$, that sink will abort the program, if it is ever
executed. Such a sink almost always constitutes an error of the programmer.
A sink that is marked with $\lunknown$ might, or might not, abort the program depending on control flow.
There is also the possibility that no control flow path exists at which the sink aborts
the program. In this case, marking the sink as unsafe happens only because the analysis
is not fine-grained enough to detect this.
Thus, the annotated types can be used to show sinks with $\ltracked$ with a higher
priority than sinks with $\lunknown$.
With a different operational semantics where a sink would output the value,
as well as its flag, we could formally make a distinction
if a sink will always get a tainted value, or if that only might happen in some cases.
With that, we could add \text{HAS\_FLOW} to the possible outputs of the algorithm,
and make a distinction between $\ltracked$ and $\lunknown$ at sinks.
This would allow for a stronger soundness theorem.
Currently, both $\ltracked$ and $\lunknown$ get mapped to the algorithm output POSSIBLE\_FLOW.
Furthermore, algorithmic improvements could reduce the number of sinks marked with
$\lunknown$, but not the number of sinks marked with $\ltracked$.