[#<Feedjira::Parser::AtomEntry:0x00007f8c0ab73958
@author="Drew Breunig",
@content=
"<p><img src=\"/img/dalle_hockney_text_factory.png\" alt=\"\" /></p>\n" +
"\n" +
"<h4 id=\"what-should-we-do-if-llms-arent-compatible-with-privacy-legislation\">What should we do if LLMs aren’t compatible with privacy legislation?</h4>\n" +
"\n" +
"<p>This week, <a href=\"https://jonathanturley.org/2023/04/06/defamed-by-chatgpt-my-own-bizarre-experience-with-artificiality-of-artificial-intelligence/\">Georgetown law professor Jonathan Turley wrote about ChatGPT repeatedly slandering him</a>:</p>\n" +
"\n" +
"<blockquote>\n" +
" <p>Recently I learned that ChatGPT falsely reported on a claim of sexual harassment that was never made against me on a trip that never occurred while I was on a faculty where I never taught. ChapGPT relied on a cited Post article that was never written and quotes a statement that was never made by the newspaper. When the Washington Post investigated the false story, it learned that another AI program “Microsoft’s Bing, which is powered by GPT-4, repeated the false claim about Turley.” It appears that I have now been adjudicated by an AI jury on something that never occurred.</p>\n" +
"</blockquote>\n" +
"\n" +
"<p>Imagine waking up one day and learning an AI is confidently claiming you’re been mired in a sexual harassment scandal. After the shock, how would you react? Turley continues:</p>\n" +
"\n" +
"<blockquote>\n" +
" <p>When contacted by the Post, “Katy Asher, Senior Communications Director at Microsoft, said the company is taking steps to ensure search results are safe and accurate.” That is it and that is the problem. You can be defamed by AI and these companies merely shrug that they try to be accurate. In the meantime, their false accounts metastasize across the Internet. By the time you learn of a false story, the trail is often cold on its origins with an AI system. You are left with no clear avenue or author in seeking redress. You are left with the same question of Reagan’s Labor Secretary, Ray Donovan, who asked “Where do I go to get my reputation back?”</p>\n" +
"</blockquote>\n" +
"\n" +
"<p>It’s a great question! And the answer, which I have been coincidentally thinking about for weeks, is far from clear. The Large Language Models (LLMs) that power ChatGPT are constructed in such a way that understanding <em>why</em> they say the things they do, let alone preventing them from saying such things, isn’t completely understood. Further, the relationship between their training data and the generated models is blurry and legally vague. <strong>LLMs – as they’re built today – may be incompatible with existing privacy regulation.</strong></p>\n" +
"\n" +
"<p>To understand why LLMs might not comply with regulation, we need to quickly cover how they work. And just as importantly, why they lie.</p>\n" +
"\n" +
"<div class=\"sidenote\">\n" +
"\n" +
"<h2>A Quick Note</h2>\n" +
"\n" +
"<p>It's particularlly hard to hold productive discussions about AI & its impacts because there's so much foundation to learn, understand, and establish before we can address our chief concerns. As a result, there's a lot here. By all means, skip the upfront sections you may be familiar with. (Or, grab a drink and read the whole thing.)</p>\n" +
"\n" +
"<p>Sections:</p>\n" +
"\n" +
"<ul>\n" +
"<li><a href=\"#how-large-language-models-work\">How Large Language Models Work</a></li>\n" +
"<li><a href=\"#why-do-llms-make-false-claims\">Why Do LLMs Make False Claims?</a></li>\n" +
"<li><a href=\"#how-can-you-fix-bad-training-data\">How Can You Fix Bad Training Data?</a></li>\n" +
"<li><a href=\"#how-can-you-fix-hallucinations\">How Can You Fix Hallucinations?</a></li>\n" +
"<li><a href=\"#so-what-is-openai-doing\">So What is OpenAI Doing?</a></li>\n" +
"<li><a href=\"#how-can-we-do-better\">How Can We Do Better?</a></li>\n" +
"</ul>\n" +
"\n" +
"</div>\n" +
"\n" +
"<hr />\n" +
"\n" +
"<h4 id=\"how-large-language-models-work\">How Large Language Models Work</h4>\n" +
"\n" +
"<div class=\"sidenote\">\n" +
"\n" +
"<p>LLMs are a collection of probabilities that perform MadLibs at a scale beyond human comprehension.</p>\n" +
"\n" +
"</div>\n" +
"\n" +
"<p>It sounds ridiculous but that’s basically it. There are plenty of excellent explanations on LLMs and how they work (<a href=\"https://arxiv.org/pdf/2212.03551.pdf\">this one</a> by Murray Shanahan might be my favorite), so we won’t be exhaustive here. But it is valuable to expand a bit.</p>\n" +
"\n" +
"<p>First, let’s describe how one creates an LLM:</p>\n" +
"\n" +
"<ol>\n" +
" <li><strong>Get a TON of text data:</strong> They don’t call them “Large Language Models” without reason. You’re going to need a comical amount of text. It can be typed text (from forum posts, blogs, articles, books, etc) or transcribed text (podcasts, audiobooks, or video transcriptions). It doesn’t matter, you just need a lot – we’ll explain why in a bit.</li>\n" +
" <li><strong>Convert the text to numbers:</strong> If we keep our data in text form during training, our processors have to deal with the words, phrases, and punctuation <em>as textual data</em>. And processors don’t natively understand text. So we translate the words into something closer to a processor’s lingua franca: numbers. We’ll walk through our giant dataset of text and build a dictionary of each word or punctuation we encounter. But rather than definitions, each word maps to a number. After we create this dictionary we can ‘tokenize’, or translate, our input data by converting each word to its associated number. A simple sentence like, “See spot run,” is encoded into something like: <code class=\"language-plaintext highlighter-rouge\">[242359, 3493, 12939]</code>\n" +
"After the text is tokenized, the tokens are converted into “<a href=\"https://en.wikipedia.org/wiki/Word_embedding\">word embeddings</a>”. Word embeddings worth another post entirely, but in a nutshell: word embeddings are a set of coordinates that map words by <em>contextual similarity</em>. Words used in similar contexts are given embedding values that are close together (for example: “car”, “automobile”, and “bike” will be in the same region). This helps models genericize questions and contexts so that “Tell me how an apple tree grows,” will yield a similar response to, “Explain how an apple tree grows.” (Embeddings are interesting and worth checking out, if you haven’t already.) But once our words are numeric representations, we can improve our model’s map.</li>\n" +
" <li><strong>Build a statistical map of how tokens contextually relate to one another:</strong> Now our processors will pour over our encoded input data using a technique called ‘<a href=\"https://en.wikipedia.org/wiki/Unsupervised_learning\">unsupervised learning</a>’. Training data is continually fed into the training program and the model tries to predict the next word (or really, the next token) given a context. If it’s correct (or close), the statistical association is reinforced. If it’s incorrect, the weighting between that word and context is dialed down. This guessing and checking is the “learning” part of “machine learning.” The model is gradually improved as more and more training is performed. Ultimately, this training work builds a map of which words are most likely to follow specific patterns of other words. This ‘map’ is our LLM model.</li>\n" +
"</ol>\n" +
"\n" +
"<p>At the core of ChatGPT, Bard, Bing, or LLaMA is a model like the one we ‘built’ above. Again, I’ve greatly simplified things here, but those missing details are less important than appreciating a fact I’ve underemphasized above: <em>the sheer size of every step</em>.</p>\n" +
"\n" +
"<p>Let’s use GPT-3 to illustrate the scope of these things:</p>\n" +
"\n" +
"<ul>\n" +
" <li><strong>How Big: How much text data was used?</strong> GPT-3 was trained on 499 billion tokens (remember, each token corresponds with a word, punctuation, chunk of a word, etc.). Sources aren’t clear on the total size of this dataset, but for comparison GPT-2 was trained on only 10 billion tokens, which equated to 40GB of <em>text</em>. The amount of data here is more than you could read in hundreds of lifetimes, if your read constantly.</li>\n" +
" <li><strong>How Hard: How much context was examined?</strong> When building a LLM, the amount of processing you have to do isn’t just dependent on how large your input data is. The size of the <em>context</em> you use to compute your statistical relationships matters too. The amount of tokens in your input data roughly equates to how many steps you have to take while the size of your context defines the <em>size</em> of those steps. For example, if your context size was only the preceding word every comparison only requires you store and process 2 words of data. That’s very cheap but yields a bad model. GPT-3 had a context size of <em>2,049 tokens</em>. So with every token processed, you had to store and process a significant chunk of data. (For GPT-4 this window skyrocketed, ranging between 8,192 and <em>32,768</em> tokens).</li>\n" +
" <li><strong>How Expensive: How much compute was used?</strong> In <a href=\"https://lambdalabs.com/blog/demystifying-gpt-3\">a technical analysis</a>, Lambdalabs estimated it would cost $4.6 million and 355 years to train GPT-3 on a single GPU. Take this number with a grain of salt, as OpenAI isn’t using a single GPU (<em>clearly</em>) and has countless optimizations going on behind the scenes. But the point to take away here is: <em>building LLMs takes a lot of time and money</em>.</li>\n" +
"</ul>\n" +
"\n" +
"<p>All of this produces a model with 175 billion “parameters”. Each “parameter” is a computed statistical relationship between a token and a given context. And this is just GPT-3!</p>\n" +
"\n" +
"<p>The only thing that isn’t big about LLMs is the filesize of the model they output. For example, one of Meta’s <a href=\"https://ai.facebook.com/blog/large-language-model-llama-meta-ai/\">LLaMA models</a> was trained on one trillion tokens and produced a final model whose size is only 3.5GB! In a sense, LLMs are a form of <em>file compression</em>. Importantly, this file compression is <em>lossy</em>. Information is lost as we move from training datasets to models. We cannot look at a parameter in a model and understand <em>why</em> it has the value it does <em>because the informing data is not present</em>.</p>\n" +
"\n" +
"<p>(Sidenote: this file compression aspect of AI is a generally under-appreciated feature! Distilling giant datasets down to tiny files (a 3.5GB LLaMA model fits easily on your smartphone!) allows you to bring capabilities previously tied to big, extensive, remote servers to your local device! This will be game-changing. But I digress…)</p>\n" +
"\n" +
"<p>With the model built above you can now perform <em>text prediction</em>. Shanahan, in the <a href=\"https://arxiv.org/pdf/2212.03551.pdf\">paper I linked above</a>, describes this well:</p>\n" +
"\n" +
"<blockquote>\n" +
" <p>Suppose we give an LLM the prompt “The first person to walk on the Moon was ”, and suppose it responds with “Neil Armstrong”. What are\n" +
"we really asking here? In an important sense, we are not really asking who was the first person to walk on the Moon. What we are really asking\n" +
"the model is the following question: Given the statistical distribution of words in the vast public corpus of (English) text, what words are most likely to follow the sequence “The first person to\n" +
"walk on the Moon was ”? A good reply to this question is “Neil Armstrong”.</p>\n" +
"</blockquote>\n" +
"\n" +
"<p>And:</p>\n" +
"\n" +
"<blockquote>\n" +
" <p>Suppose you are the developer of an LLM and you prompt it with the words “After the ring was destroyed, Frodo Baggins returned to ”, to which it responds “the Shire”. What are you doing here? On one level, it seems fair to say, you might be testing the model’s knowledge of the fictional world of Tolkien’s novels. But, in an important sense, the question you are really asking … is this: Given the statistical distribution of words in the public corpus, what words are most likely to follow the sequence “After the ring was destroyed, Frodo Baggins returned to”? To which an appropriate response is “the Shire”.</p>\n" +
"</blockquote>\n" +
"\n" +
"<p>The magic trick here, why LLMs work <em>so well</em> when all they’re doing is mostly guessing based on context, is the sheer size of their input data. It’s a scale of input beyond human comprehension and human <em>expectation</em>. The trick is not unlike a TikTok video that shows someone performing an insanely difficult and unlikely action (like a basketball trickshot) while omitting the thousands of failed attempts (this tension is a central theme in Christopher Nolan’s <a href=\"https://en.wikipedia.org/wiki/The_Prestige_(film)\">The Prestige</a>).</p>\n" +
"\n" +
"<p>We’ve covered a lot here, so let’s highlight the key points that matter for our privacy discussion:</p>\n" +
"\n" +
"<div class=\"sidenote\">\n" +
"\n" +
"<h2>How LLMs Work</h2>\n" +
"\n" +
"<ul>\n" +
"<li>Giant amounts of text training data are used to build models.</li>\n" +
"<li>Models express statistical relationships between words and surrounding context.</li>\n" +
"<li>Models are reductive, distilling giant training sets into (relatively) tiny statistical models.</li>\n" +
"<li>It takes lots of time and money to build good models.</li>\n" +
"</ul>\n" +
"\n" +
"</div>\n" +
"\n" +
"<hr />\n" +
"\n" +
"<h4 id=\"why-do-llms-make-false-claims\">Why Do LLMs Make False Claims?</h4>\n" +
"\n" +
"<p>Let’s get back to Jonathan Turley being falsely associated with sexual harassment claims and the question of what recourse is available. <em>How can one respond to false AI claims?</em> Before we can answer this we need to build on what we’ve established above to understand <em>how and why</em> AIs lie.</p>\n" +
"\n" +
"<p>Unfortunately, Turley is not alone in discovering falsehoods about himself in LLMs. Noting LLM hallucinations (as they’re called) has become a bit of an internet sport.</p>\n" +
"\n" +
"<p>Spurred by these hallucinations, <a href=\"https://arstechnica.com/information-technology/2023/04/why-ai-chatbots-are-the-ultimate-bs-machines-and-how-people-hope-to-fix-them/\">Benj Edwards published an interesting piece at Ars Technical exploring why ChatGPT is so good at bullshit</a>. The whole piece is worth a read, but there are a few nice points relevant to us.</p>\n" +
"\n" +
"<p>First, Edwards <a href=\"https://arxiv.org/abs/2109.07958\">references a paper</a> authored by researchers from Oxford and OpenAI which spotted two major types of falsehoods that LLMs make. Edwards writes:</p>\n" +
"\n" +
"<blockquote>\n" +
" <p>The first comes from inaccurate source material in its training data set, such as common misconceptions (e.g., “eating turkey makes you drowsy”). The second arises from making inferences about specific situations that are absent from its training material (data set); this falls under the aforementioned “hallucination” label.</p>\n" +
"</blockquote>\n" +
"\n" +
"<p>Edwards expands on that “hallucination” category, illustrating how it can occur in ChatGPT:</p>\n" +
"\n" +
"<blockquote>\n" +
" <p>When ChatGPT confabulates, it is reaching for information or analysis that is not present in its data set and filling in the blanks with plausible-sounding words. ChatGPT is especially good at making things up because of the superhuman amount of data it has to work with, and its ability to glean word context so well helps it place erroneous information seamlessly into the surrounding text.</p>\n" +
"</blockquote>\n" +
"\n" +
"<p>These categories inform our question, as they define <em>where</em> and <em>what</em> is causing the falsehoods.</p>\n" +
"\n" +
"<p>The first category is <strong>bad training data creates bad models.</strong></p>\n" +
"\n" +
"<p>Computer science can’t get away from the principle, “<a href=\"https://en.wikipedia.org/wiki/Garbage_in,_garbage_out\">Garbage in, garbage out</a>.” If OpenAI’s training data contains falsehoods (and it does, as much of it is <em>text found on the internet</em>), their resulting models will have these erroneous associations baked in. If these falsehoods are <em>repeated</em> throughout the training data, they are further reinforced. To LLMs, <em>frequency equals fact</em>.</p>\n" +
"\n" +
"<p>We can see these types of errors by trying out common misconceptions in Chat-GPT. Here’s a popular error in action:</p>\n" +
"\n" +
"<p><img src=\"/img/seal_in_juices.png\" alt=\"Searing steak does not trap in juices.\" /></p>\n" +
"\n" +
"<p><a href=\"https://www.seriouseats.com/the-food-labs-top-food-myths\">Searing a crust onto a steak doesn’t trap juices</a>, but it’s a common enough misconception that ChatGPT regurgitates the sequence. <strong>Common misconceptions significantly influence models because they are common.</strong></p>\n" +
"\n" +
"<p>In the case of Jonathan Turley, if there were plenty of slanderous blog posts, articles, and comments online claiming he’s been accused of sexual harassment we could expect ChatGPT to echo these claims.</p>\n" +
"\n" +
"<p>The second category of errors is “hallucination” errors: <strong>when LLMs are asked to complete prompts that weren’t well-covered by its training data it will make up an answer based on very little data.</strong></p>\n" +
"\n" +
"<p>A good example of hallucinations is when an LLM cites articles or books that don’t exist, even giving page numbers as a reference. These non-existent citations don’t occur because they exist somewhere in the training data, but because the LLM is trying to complete a sequence <em>without an understanding of how words relate beyond their contextual similarity</em>. This is why it botches references and citations so spectacularly: it’s not referencing a passage in a text, <em>it’s trying to complete an expected sequence of words that make up the reference</em>. It doesn’t help that citations and references follow standardized patterns, so their specific patterns <em>frequently</em> occur and enforce those parameters in an LLM’s model.</p>\n" +
"\n" +
"<p>For example, Hacker News user <a href=\"https://news.ycombinator.com/user?id=msravi\">msravi</a> entered into ChatGPT the prompt, “Please provide references that deal with the mathematical properties of lists.” <a href=\"https://news.ycombinator.com/item?id=33841672\">ChatGPT answered</a>:</p>\n" +
"\n" +
"<blockquote>\n" +
" <p>“Introduction to the Mathematics of Lists” by Peter J. Cameron (https://www.cambridge.org/core/books/introduction-to-the-mathematics-of-lists/3F0C6D0E69E49BF6D5CD6CA5D6A761B2)</p>\n" +
"</blockquote>\n" +
"\n" +
"<blockquote>\n" +
" <p>“List Processing and Programming” by John F. Sowa (https://www.cs.cmu.edu/~fp/courses/lp/lectures.html)</p>\n" +
"</blockquote>\n" +
"\n" +
"<blockquote>\n" +
" <p>“List Processing in Prolog” by David H. D. Warren (https://www.cs.purdue.edu/homes/hosking/726/html/prolog_lists.html)</p>\n" +
"</blockquote>\n" +
"\n" +
"<blockquote>\n" +
" <p>“Introduction to the Mathematics of Lists and Trees” by Meinard Müller (https://link.springer.com/book/10.1007%2F978-3-319-05147-1)</p>\n" +
"</blockquote>\n" +
"\n" +
"<p>“Every single one of those references and links are made up. The references don’t exist and the links appear to be cobbled together,” <a href=\"https://news.ycombinator.com/item?id=33841672\">msravi wrote</a>.</p>\n" +
"\n" +
"<p>Hallucinations are weird, often eerie, but they’re an expected artifact from the model building and usage process. As we’ve covered, LLMs are doing their best to guess the next word in a sequence given everything they’ve processed in their training data. If the sequence they’re given is novel they’ll just guess and toss out words with contextually similar values. This is problematic when there’s little relationship between a text’s factual meaning and the context within which it’s usually used.</p>\n" +
"\n" +
"<p>Once again, we’ve covered a lot here. We’re inadvertently illustrating the challenge of holding productive discussions about AI. There’s so much to learn and understand <em>before</em> we can address our chief concerns.</p>\n" +
"\n" +
"<p>To summarize:</p>\n" +
"\n" +
"<div class=\"sidenote\">\n" +
"\n" +
"<h2>Why LLMs Lie</h2>\n" +
"\n" +
"<ul>\n" +
"<li>Models produce false outputs due to false training data or non-existent training data.</li>\n" +
"<li>The more common a misconception or lie is, the more it will be reinforced.</li>\n" +
"<li>'Hallucinations' occur when LLMs try to respond to sequences they haven't seen before.</li>\n" +
"<li>When the factual meaning of a text doesn't closely relate to the context within which it's usually used, hallucinations are more likely to occur.</li>\n" +
"</ul>\n" +
"\n" +
"</div>\n" +
"\n" +
"<hr />\n" +
"\n" +
"<h4 id=\"how-can-you-fix-bad-training-data\">How Can You Fix Bad Training Data?</h4>\n" +
"\n" +
"<p>Now that we know the types of errors that can occur, how can we fix them?</p>\n" +
"\n" +
"<p>For erroneous input data, the fix seems simple: delete the bad training data. This <em>would</em> be a great solution if training LLMs didn’t take tremendous time and money. Further, it’s currently impossible to know which parameters a specific bit of training data informed and by how much. The reductive process of training an LLM means we can’t easily reach in and pull out a contribution from a specific article or articles.</p>\n" +
"\n" +
"<p>Instead of addressing the root issue (the false training data), LLM maintainers like OpenAI rely on a technique called <a href=\"https://en.wikipedia.org/wiki/Reinforcement_learning_from_human_feedback\">Reinforcement Learning from Human Feedback</a>, or RLHF. This is a fancy way of saying continuing to train LLMs with active input from <em>people</em>, rather than data.</p>\n" +
"\n" +
"<p>For example, to address the erroneous sexual harassment claims against Jonathan Turley, OpenAI might pay lots of contractors to give their models prompts about Turley and provide negative feedback when any links to sexual harassment emerge.</p>\n" +
"\n" +
"<p>OpenAI uses RLHF to prepare its LLMs for wider audiences. Countless contractor hours are ‘correcting’ the GPTs’ outputs, discouraging toxic output or uses OpenAI would like to prohibit. OpenAI also uses RLHF to train the GPTs to understand its users better: out of the box, LLMs are sequence completion machines, <a href=\"https://openai.com/research/instruction-following#sample2\">but ChatGPT responds appropriately to questions and commands thanks to RLHF</a>.</p>\n" +
"\n" +
"<p>Reinforcement Learning from Human Feedback is one way we can correct an LLM’s lies caused by bad training data. But this too is costly (paying contractors is expensive!) and <em>not 100% effective</em>. OpenAI can pay for lots of contractors to spend lots of time to skew its problematic parameters, but they can’t cover every type of input. Countless “<a href=\"https://en.wikipedia.org/wiki/Prompt_engineering\">prompt engineers</a>” have been probing the edges of ChatGPT, finding contrived ways to evade its human-led coaching.</p>\n" +
"\n" +
"<p>And when I say contrived, I mean <em>contrived</em>. <a href=\"https://www.jailbreakchat.com/prompt/4f37a029-9dff-4862-b323-c96a5504de5d\">Here’s an example “jailbreak” prompt people are using to evade ChatGPT’s training</a>:</p>\n" +
"\n" +
"<p><img src=\"/img/chatgpt_larp.png\" alt=\"Asking ChatGPT to role play so it does things it's shouldn't.\" /></p>\n" +
"\n" +
"<p>I can’t imagine how many contractors you’d have to employ to cover this scenario!</p>\n" +
"\n" +
"<p>The inability of RLHF to completely squash unwanted output raises significant questions, both regarding privacy and safe harbor regulation. There’s no standard for how much counter-training a company should perform to correct an erroneous output. We lack metrics for quantifying how significantly a bad bit of training data contributes to a model or a model’s tendency for lying about a particular topic. Without these metrics, we can’t adequately define when a model is in or out of compliance.</p>\n" +
"\n" +
"<div class=\"sidenote\">\n" +
"\n" +
"<h2>How to (Try to) Fix Errors Caused By Bad Training Data</h2>\n" +
"\n" +
"<ul>\n" +
"<li>It's incredibly expensive to delete the bad data and retrain the model.</li>\n" +
"<li>There's no current, practical method for identifying and deleting the contribution from a bit of bad training data.</li>\n" +
"<li>Unwanted output can be corrected by adding *more* training data, in the form of humans providing feedback (RLHF).</li>\n" +
"<li>RLHF moderation isn't perfect because it can't anticipate all contexts.</li>\n" +
"</ul>\n" +
"\n" +
"</div>\n" +
"\n" +
"<hr />\n" +
"\n" +
"<h4 id=\"how-can-you-fix-hallucinations\">How Can You Fix Hallucinations?</h4>\n" +
"\n" +
"<p>Ok, but what about hallucinations? How can we fix errors spewed by LLMs not caused by bad training data but by <em>unfamiliar prompts</em>?</p>\n" +
"\n" +
"<p>One way to fix this is adding <em>more</em> training data, accounting for more potential contexts. This is partially what OpenAI did with GPT-4, which has roughly 1,000 times more parameters than GPT-3. By adding more (hopefully correct!) input data, GPT-4 is less likely to be surprised by a novel prompt. This is a great <em>general</em> strategy, but might not solve one-off complaints like those of Turley. In these instances, companies like OpenAI again rely on Reinforcement Learning from Human Feedback (RLHF) to correct errors, just as they do with bad input data. Humans add more training, one prompt at a time, to moderate a specific issue.</p>\n" +
"\n" +
"<p>Another option to fix hallucinations is to tell LLMs to make less dramatic guesses. When an LLM doesn’t have a clear answer for a given prompt or sequence, it will select less associated tokens, introducing a bit of randomness into the response. When standing up an existing model, we can define <em>how random</em> this guess should be. We call this parameter, “temperature.”</p>\n" +
"\n" +
"<p>When Bing’s chatbot (née <a href=\"https://www.theverge.com/2023/2/23/23609942/microsoft-bing-sydney-chatbot-history-ai\">Sydney</a>) broke down in public, <a href=\"https://www.nytimes.com/2023/02/16/technology/bing-chatbot-transcript.html\">culminating in an attempt to break up Kevin Roose’s marriage</a>, Microsoft acted by dialing down the temperature parameter. Further, they gave users indirect control over the temperature by <a href=\"https://arstechnica.com/information-technology/2023/03/microsoft-equips-bing-chat-with-multiple-personalities-creative-balanced-precise/\">allowing them to choose</a> between “creative”, “balance”, and “precise” modes.</p>\n" +
"\n" +
"<p><img src=\"/img/bing_creative.jpg\" alt=\"Bing's chatbot allows selecting a conversation style, which moderates the temperature.\" /></p>\n" +
"\n" +
"<p>Bing’s chatbot settings present another mechanism for controlling LLM lies: user interfaces. LLMs can be accessed with more restrictive interfaces to limit the contexts given. In the image above, we can see Microsoft doing this a bit by providing preset prompts as options (“That’s too complicated. Can you suggest something simpler?”)</p>\n" +
"\n" +
"<p>One could take this further and remove the open-ended text box and only allow inputs via tightly constrained controls. For example, consider an app that lets you take a picture of a restaurant menu and spits out recipes for recreating each meal at home. This use case is well within GPT’s wheelhouse and <em>eliminates</em> the potential for slanderous statements in exchange for <em>greatly</em> limiting the flexibility of the tool.</p>\n" +
"\n" +
"<p>And that flexibility is not only valuable to people using Bing, Bard, and ChatGPT: it’s incredibly valuable to Microsoft, Google, and OpenAI. <em>Because when you use these tools you’re training them.</em> Reinforcement training by humans <em>is</em> cheaper than retraining models from scratch <em>but it’s not cheap</em>. But we train GPTs for free while playing with ChatGPT, hitting those thumbs up and down buttons, telling it something doesn’t make sense, and thanking it for a job well done.</p>\n" +
"\n" +
"<p>ChatGPT is a <a href=\"https://www.dbreunig.com/2016/06/23/the-business-implications-of-machine-learning.html\">Reciprocal Data Application</a>, or RDAs. As I wrote back in 2016: “[Reciprocal Data Applications] are designed to spur the creation of training data as well as deliver the products powered by the data captured. People get better apps and companies get better data.” RDAs are a new kind of network effect. More usage means a better product, which results in more usage, and so on.</p>\n" +
"\n" +
"<p>This is why ChatGPT is (mostly) free. This is why OpenAI pushed to be first to market, despite the risks. And this is most likely why OpenAI seems to be avoiding the privacy discussion and its implications entirely.</p>\n" +
"\n" +
"<div class=\"sidenote\">\n" +
"\n" +
"<h2>How to (Try to) Fix Errors Caused By Hallucinations</h2>\n" +
"\n" +
"<ul>\n" +
"<li>We can reduce the amount LLMs guess when they're in uncharted contexts by dialing back the 'temperature'.</li>\n" +
"<li>UIs can limit the potential for falsehoods and bad output by limiting the ways users can interact with LLMs, like removing open-ended text boxes.</li>\n" +
"<li>But open-ended interfaces are valuable to users (who value the flexibility) and LLM owners (who value the training).</li>\n" +
"</ul>\n" +
"\n" +
"</div>\n" +
"\n" +
"<hr />\n" +
"\n" +
"<h4 id=\"so-what-is-openai-doing\">So What is OpenAI Doing?</h4>\n" +
"\n" +
"<p>Now, finally, we can get to the crux of the issue. The reason Turley feels so powerless to address the slander emanating from ChatGPT.</p>\n" +
"\n" +
"<p>As we’ve detailed, there <em>are</em> mechanisms for moderating LLMs. But <em>all</em> of them are imperfect.</p>\n" +
"\n" +
"<p>Reinforcement Training is expensive and can’t completely eliminate bad output. Dialing down LLM’s ability to guess when it doesn’t have a strong recommendation makes them boring. Hiding LLMs behind more restrictive UIs makes them less appealing to users and restricts user feedback that continues to train the LLM. Deleting problematic training data and retraining models is <em>incredibly</em> expensive and time-consuming.</p>\n" +
"\n" +
"<p>OpenAI <em>knows</em> how imperfect these options are, especially anything that involves retraining from scratch. And it seems their approach to addressing this issue is <em>avoiding the problem</em>. Rather than fostering a discussion about how to proceed, their communications and documents either ignore the issue entirely or hide behind vague wording.</p>\n" +
"\n" +
"<p>Let’s take a look at OpenAI’s <a href=\"https://openai.com/policies/privacy-policy\">Privacy Policy</a>. To comply with laws like <a href=\"https://leginfo.legislature.ca.gov/faces/codes_displayText.xhtml?division=3.&part=4.&lawCode=CIV&title=1.81.5\">CCPA</a> and <a href=\"https://gdpr-info.eu\">GDPR</a>, businesses are required to post a privacy policy that details the personal data they collect.</p>\n" +
"\n" +
"<p>In their Privacy Policy OpenAI never mentions their training data. They detail the data they collect when users sign up for and use an OpenAI account: your email, name, IP address, cookies, and such. They enumerate your rights (conditional upon where you reside), including the right to access, correct, and delete your personal information.</p>\n" +
"\n" +
"<p>Again: <em>training data is never mentioned</em>. It seems to me that OpenAI is avoiding the topic and focusing on <em>registration</em> data because of the challenges we detailed above.</p>\n" +
"\n" +
"<p><strong>Correcting or deleting personal information in their training data may require OpenAI to pay contractors to perform imperfect, reinforcement training or retrain the entire model from scratch. The legal precedents have not been established and existing regulations were not written with LLMs in mind.</strong></p>\n" +
"\n" +
"<p>We know OpenAI knows this because <em>they acknowledge the challenge elsewhere</em>. In a post published on April 5th, titled “<a href=\"https://openai.com/blog/our-approach-to-ai-safety\">Our Approach to AI Safety</a>,” OpenAI writes:</p>\n" +
"\n" +
"<blockquote>\n" +
" <p>While some of our training data includes personal information that is available on the public internet, we want our models to learn about the world, not private individuals. So we work to remove personal information from the training dataset where feasible, fine-tune models to reject requests for personal information of private individuals, and respond to requests from individuals to delete their personal information from our systems. These steps minimize the possibility that our models might generate responses that include the personal information of private individuals.</p>\n" +
"</blockquote>\n" +
"\n" +
"<p>Here they acknowledge that personally identifiable information (<a href=\"https://www.dol.gov/general/ppii\">PII</a>) exists in their training dataset and that they aren’t able to entirely comply with correction or deletion requests. Removal “where <em>feasible</em>” and steps that “<em>minimize</em> the possibility” that models “<em>might</em> generate responses that include personal information” (emphasis mine) is not ideal! That these acknowledgments exist in a blog post and not their Privacy Policy is even worse.</p>\n" +
"\n" +
"<p>It’s hard to not read the above without seeing the very <em>careful</em> yet <em>imprecise</em> language as an artifact of a key tension here: it is obvious that training data is governed by data regulations but it is <em>not</em> obvious that the LLM models themselves are governed by the same regulations.</p>\n" +
"\n" +
"<p>Certainly, LLM models are not 1-to-1 mappings of their training datasets. But if not, what are they? In one sense, they could be considered <em>derived abstractions</em> of the training data. In another sense, they are <em>distillations</em> of the training data. It doesn’t help that AI boosters market them as the latter when championing them but retreat to the former when defending them.</p>\n" +
"\n" +
"<p>We can see this tension on display elsewhere in OpenAI’s “<a href=\"https://openai.com/blog/our-approach-to-ai-safety\">Our Approach to AI Safety</a>” post:</p>\n" +
"\n" +
"<blockquote>\n" +
" <p>Our large language models are trained on a broad corpus of text that includes publicly available content, licensed content, and content generated by human reviewers. We don’t use data for selling our services, advertising, or building profiles of people—we use data to make our models more helpful for people. ChatGPT, for instance, improves by further training on the conversations people have with it.</p>\n" +
"</blockquote>\n" +
"\n" +
"<p>Sure, OpenAI doesn’t <em>directly</em> use their training data for “selling”, “advertising”, or “building profiles of people.” But according to their <a href=\"https://openai.com/policies/terms-of-use\">Terms of Use</a> they 100% allow companies using APIs powered by GPT to do <em>exactly</em> these use cases. The Terms of Use even list specific, additional standards OpenAI users need to meet if they plan on processing personal data.</p>\n" +
"\n" +
"<p>Reading the “<a href=\"https://openai.com/blog/our-approach-to-ai-safety\">Approach to AI Safety</a>” post and the <a href=\"https://openai.com/policies/terms-of-use\">Terms of Use</a> we are left to conclude that <strong>OpenAI does not view the actions of its models as utilization of their training data.</strong></p>\n" +
"\n" +
"<p>Finally, Jonathan Turley might argue that the act of training an LLM <em>is</em> “building profiles of people.” An LLM doesn’t <em>know</em> who Jonathan Turley is; it is not a database in the usual sense. But it <em>does</em> have parameters linking his name (of the tokens of his name) to contexts and tokens that are presented as knowledge.</p>\n" +
"\n" +
"<p>We can see what OpenAI is doing about the problem of erroneous responses in the above. In their communications and documents they minimize privacy issues or simply never mention them.</p>\n" +
"\n" +
"<p>I am sympathetic. As Steward Brand once said, “If you want to know where the future is being made, look for where language is being invented and lawyers are congregating.” I believe AI is reconfiguring the future (though I hate the term ‘AI’ and think it especially hurts in contexts like this.) OpenAI has been leading the field and setting the pace.</p>\n" +
"\n" +
"<p>By being aggressive with ChatGPT’s launch, and making it available to all, they’re reaping the benefits of constant feedback. Each conversation held with ChatGPT pushes them out further. This is an <em>entirely</em> new interface and they’re gaining experience while everybody else just tries to launch. The <a href=\"https://www.dbreunig.com/2016/06/23/the-business-implications-of-machine-learning.html#the-rise-of-reciprocal-data-applications-rdas\">Reciprocal Data Application</a> is a design with network effects, which OpenAI accrues every day they’re live and at the forefront. It’s an enviable position.</p>\n" +
"\n" +
"<p>In exchange for this lead, they become the <em>target</em> of criticism and concern. But they <em>must</em> lean into the tough conversations, not cede them to be hashed out and written without them. To start, they need to invest in their communication (and not more AGI handwringing, for christ sake) because false accusations against Turley are nothing compared to <a href=\"https://www.dbreunig.com/2023/02/24/we-need-ai-safewords.html\">the fraud that will bubble up shortly</a>. Even if bad actors use LLaMA, they’re going to catch flack.</p>\n" +
"\n" +
"<div class=\"sidenote\">\n" +
"\n" +
"<h2>How OpenAI is Addressing Privacy</h2>\n" +
"\n" +
"<ul>\n" +
"<li>Their Privacy Policy ignores the existence of personal identifiable information (PII) in their training data.</li>\n" +
"<li>Their own blog post seem to acknowledge their inability to totally remove PII from their training data and their model output.</li>\n" +
"<li>The same blog post says they don't use training data for advertising or profiling, but do not prohibit these use cases in their Terms of Use.</li>\n" +
"<li>Rather then highlighting and discussing the differences between training data and models, and openly discussing the regulatory questions, OpenAI seems to deliverately avoid this tension.</li>\n" +
"</ul>\n" +
"\n" +
"</div>\n" +
"\n" +
"<hr />\n" +
"\n" +
"<h4 id=\"how-can-we-do-better\">How Can We Do Better?</h4>\n" +
"\n" +
"<p>To foster this industry <em>and</em> respect the people whose data is being used to build these models, all LLM owners should consider the following steps:</p>\n" +
"\n" +
"<ol>\n" +
" <li>Sponsor and invest in projects focusing on data governance and auditing capabilities for models, so we might modify them directly rather than simply adding corrective training.</li>\n" +
" <li>Be completely transparent regarding the training data used in these models. Don’t merely list categories of data, go further. Perhaps even make the metadata catalogs public and searchable.</li>\n" +
" <li>Work on defining standard metrics for quantifying the significance and surface area of output errors, and associated standards for how they should be corrected.</li>\n" +
" <li>Always <a href=\"https://www.dbreunig.com/2023/02/24/we-need-ai-safewords.html\">disclose AI <em>as AI</em></a> and present it as fallible. More “artificial,” less “intelligence.”</li>\n" +
" <li>Collaboratively establish where and how AI can or should be used in specific venues. Should AI be used to grant warrants? Select tenants? Price insurance? Triage healthcare? University admissions?</li>\n" +
"</ol>\n" +
"\n" +
"<p>Collaboration is key here. Right now OpenAI is asking us to take them at their word and trust them without any established history. Outside organizations – including companies, industry groups, non-profits, think tanks, and government agencies – are already starting to kick off their own, separate conversations. Just today, <a href=\"https://www.axios.com/2023/04/11/ai-safety-rules-commerce-department-artificial-intelligence\">the US Commerce Department invited public comments to inform policy recommendations</a>. These individual entities are all trying to figure this out, separately. As the current leader, OpenAI has an opportunity to help bring these together into productive collaborations.</p>\n" +
"\n" +
"<p>But if AI companies cede these conversations by avoiding them, their future will be written by reactions to bad news. OpenAI and others need to invest in the boring stuff, have the hard conversations, to foster a world-changing technology. Otherwise, they risk arresting AI’s potential.</p>",
@entry_id=
"https://www.dbreunig.com/2023/04/10/the-privacy-question-and-open-ai",
@links=
["https://www.dbreunig.com/2023/04/10/the-privacy-question-and-open-ai.html"],
@published=2023-04-10 00:00:00 UTC,
@raw_title="AI Lies, Privacy, & OpenAI",
@summary="",
@title_type="html",
@updated=2023-04-10 00:00:00 UTC,
@url=
"https://www.dbreunig.com/2023/04/10/the-privacy-question-and-open-ai.html">,
#<Feedjira::Parser::AtomEntry:0x00007f8c0abd97d0
@author="Drew Breunig",
@content=
"<h4 id=\"standard-commands-confirming-an-ais-presence-is-a-common-sense-starting-point-for-ai-controls\">Standard commands confirming an AI’s presence is a common sense starting point for AI controls</h4>\n" +
"\n" +
"<p><img src=\"/img/weights.png\" alt=\"\" /></p>\n" +
"\n" +
"<p>The effectiveness of <a href=\"https://arxiv.org/abs/2212.03551\">Large Language Models</a> has spurred an AI renaissance. We’re awash in models capable of generating images from mere phrases and chatbots capable of holding confident, emotional conversations. The noise of crypto subsided just as this generation of deep learning models refigured technology conversations seemingly overnight, posing BIG questions that seemed years away only months ago.</p>\n" +
"\n" +
"<p>People who hadn’t been following AI were whiplashed by the performance of ChatGPT, Stable Diffusion, and others. They asked questions about copyright law, the exploitation of training data, and wondered about labor regulations and worker rights if AI renders entire jobs obsolete. Others, who’ve been swimming in AI for years, set their sights <em>far</em> beyond these concerns and <a href=\"https://openai.com/blog/planning-for-agi-and-beyond/\">began to prepare for <em>true</em> artificial intelligence</a>, or <a href=\"https://en.wikipedia.org/wiki/Artificial_general_intelligence\">AGI</a>.</p>\n" +
"\n" +
"<p>The wide range of applications these models seem to address has boggled our minds and diluted our conversations. Only now do we seem to be stepping back and gathering our senses. We’re seeing <a href=\"https://www.nytimes.com/2023/02/16/technology/bing-chatbot-transcript.html\">these models have limits and flaws</a>. We’re starting to see that they’re not gods or monsters. Just organized bundles of statistics beyond the scale of human comprehension.</p>\n" +
"\n" +
"<p>In this moment I’d like to ignore the big picture questions and argue for something relatively mundane: <a href=\"https://en.wikipedia.org/wiki/Safeword\">safewords</a>.</p>\n" +
"\n" +
"<p>I believe fraudulent misuse of models poses an immediate risk. The ability of interactive models to generate voices, videos, and text capable of impersonating humans will become a powerful tool for social engineering <em>at scale</em>. The most common scams already focus their attention on those unfamiliar with technology (especially seniors) to manipulate access to bank and e-commerce accounts. These tactics are so effective, <a href=\"https://thediplomat.com/2022/08/inside-southeast-asias-casino-scam-archipelago/\">large market ecosystems have emerged to execute them at scale</a>, powered by human trafficking. This slavery is both what enables these scams and limits them, as the scam itself is dependent on canny humans manipulating marks one by one. New AI threatens to <em>remove</em> this limitation, scaling the scams beyond what even slavery allows and potentially leading to a diverse array of smaller-money tactics which previously hadn’t been worth the time.</p>\n" +
"\n" +
"<p><strong>The threat of AGI is a far-off dream. LLM-powered fraud is kicking off <em>now</em>.</strong></p>\n" +
"\n" +
"<p><strong>A relatively quick regulatory action we can take is requiring government-approved safewords to be built into AI models. These safewords, when input into an AI-powered interface, would confirm to users they are interacting with an AI and provide basic metadata detailing the given model.</strong></p>\n" +
"\n" +
"<p>I haven’t thought through all the details, but it’s worth sketching this out in hopes of kickstarting the conversation.</p>\n" +
"\n" +
"<p>Adding safewords to <em>base models</em> – the largest, foundational models which are tuned into custom applications (GPT is one such base model) – is an effective mechanism for adding regulation without hindering the ecosystem at large. Large base models require significant resources to build, limiting the number of parties able to develop them. Safewords added as custom rules to these models will not hinder their effectiveness for all non-fraudulent use cases.</p>\n" +
"\n" +
"<p>There is history and comparables here: AI safewords are akin to WHOIS and other <a href=\"https://lookup.icann.org/en\">ICANN mechanisms</a> for website accountability and transparency. Optional standards, like <a href=\"https://moz.com/learn/seo/robotstxt\">robots.txt</a>, also define ways forward and models for implementation.</p>\n" +
"\n" +
"<p>Requiring AI-powered interfaces and models to respond appropriately to approved safewords will not eliminate fraudulent behaviors by bad actors. But it lays the groundwork for enforcement mechanisms that governments can use to police bad actors and reduce the ease with which actors can leverage the largest, most effective models for illicit actions. Further, such requirements and enforcement should <em>not</em> hinder AI innovation. So long as a model complies with safeword requirements, leeway is granted.</p>\n" +
"\n" +
"<p>Finally, AI safewords provide a tangible escape hatch for users to utilize when AI models start to cross emotional lines. We humans are hardwired to engage in anthropomorphism, seeing the mark of intelligence where only hyper-scale cut-and-paste exists. By building in standard mechanisms for confirming the a model’s inhumanity, we allow users to ground themselves on occasion.</p>",
@entry_id="https://www.dbreunig.com/2023/02/24/we-need-ai-safewords",
@links=["https://www.dbreunig.com/2023/02/24/we-need-ai-safewords.html"],
@published=2023-02-24 00:00:00 UTC,
@raw_title="We Need AI Safewords",
@summary=
"Standard commands confirming an AI’s presence is a common sense starting point for AI controls",
@title_type="html",
@updated=2023-02-24 00:00:00 UTC,
@url="https://www.dbreunig.com/2023/02/24/we-need-ai-safewords.html">,
#<Feedjira::Parser::AtomEntry:0x00007f8c0e7253c0
@author="Drew Breunig",
@content=
"<h4 id=\"a-primer-on-media-measurement-and-why-it-defines-your-world\">A Primer on Media Measurement and Why it Defines Your World</h4>\n" +
"\n" +
"<p><img src=\"/img/measured_dollar.jpeg\" alt=\"\" /></p>\n" +
"\n" +
"<blockquote>\n" +
" <p>We sometimes treat the information industries as if they were like any other enterprise, but they are not, for their structure determines who gets heard.</p>\n" +
"</blockquote>\n" +
"\n" +
"<p>— Tim Wu, <a href=\"https://amzn.to/2YTMMHw\">The Master Switch</a></p>\n" +
"\n" +
"<p>Advertising funds much of the information and entertainment in our lives. The amount of funding from ads each source receives is determined by media metrics. Media metrics help advertisers understand who is looking at their ads (if anyone) and what they’re doing after they see them.</p>\n" +
"\n" +
"<p>Perform well against key media metrics and your media property will thrive. Fail to deliver and your newspaper, TV station, website, podcast, or newsletter will need to find money elsewhere.</p>\n" +
"\n" +
"<p>Media metrics influence what information circulates because they determine which information sources gets paid. And content, like most anything else, follows the easy money.</p>\n" +
"\n" +
"<hr />\n" +
"\n" +
"<p><img src=\"/img/pressure_gauge.png\" alt=\"\" /></p>\n" +
"\n" +
"<h3 id=\"what-are-media-metrics\">What Are Media Metrics?</h3>\n" +
"\n" +
"<p>Media metrics tell advertisers if their ads are worthwhile. They answer questions, like:</p>\n" +
"\n" +
"<ul>\n" +
" <li>Are there people at the venue where I’m advertising?</li>\n" +
" <li>Are the people I care about at this venue?</li>\n" +
" <li>Are people seeing my ad?</li>\n" +
" <li>Are people interacting with my ad?</li>\n" +
" <li>Are people influenced by my ad?</li>\n" +
" <li>Are people taking action based on my ad?</li>\n" +
"</ul>\n" +
"\n" +
"<p>Some of these questions are easy to answer. Some of them are nearly impossible. All of them are heavily contested and argued about due to the money involved.</p>\n" +
"\n" +
"<p>Media metrics are tools marketers use in an attempt to answer one or more of the questions above.</p>\n" +
"\n" +
"<p>In digital advertising, the most common media metrics are:</p>\n" +
"\n" +
"<ul>\n" +
" <li><strong>Cost Per Click (CPC):</strong> How much does the advertiser have to spend to get someone to click one of their ads?</li>\n" +
" <li><strong>Cost Per Action (CPA):</strong> How much does the advertiser have to spend to get someone to perform a predefined action? Often this is defined as a the audience buying something, signing up for something, or calling a phone number.</li>\n" +
" <li><strong>Cost Per Install (CPI):</strong> A variant of CPA. How much does the advertiser have to spend to get someone to install an app?</li>\n" +
" <li><strong>Cost Per Mille (CPM):</strong> How much does the advertiser have to spend for one thousand people to view their ad? (‘Mille’ just means thousand.) CPM is often defined by a target audience. For example, a marketer pays to deliver an ad to affluent Millennials.</li>\n" +
"</ul>\n" +
"\n" +
"<p>There are many more metrics and new ones appear all the time. But most fail or are used by a handful of marketers for their own purposes. (Domino’s Pizza used “Cost Per Pizza Ordered” for years. They may still!)</p>\n" +
"\n" +
"<p>Very, very rarely a new metric will be adopted by a sufficient number of marketers and media owners to become a currency: a common metric people use to buy and sell media. A currency creates a marketplace. It defines the demand present, incentivizing content producers to create things for it.</p>\n" +
"\n" +
"<p>Currencies are the metrics you should be aware of and understand. They’re the metrics that shape our world.</p>\n" +
"\n" +
"<p><img src=\"/img/many_bills.png\" alt=\"\" /></p>\n" +
"\n" +
"<h3 id=\"creating-a-media-currency\">Creating a Media Currency</h3>\n" +
"\n" +
"<p>There’s no clear moment when a metric graduates to become a currency. It happens slowly but predictably. More and more advertisers allocate funds to buy a new metric, driving more and more media outlets sell inventory metered by this metric. The metric becomes portable, liquid. A stable, recurring source of ad spending.</p>\n" +
"\n" +
"<p>The creation of a media currency is a disruptive event in the marketplace. The dollars must come from somewhere. They’re either newly allocated marketing dollars (because the metric represents an entirely new type of opportunity) or shuffled from an incumbent currency (because the metric supplants existing tactics). Internal Marketing groups, media agencies, and publishers rise and fall when a new currency is crowned.</p>\n" +
"\n" +
"<p>Because of this, new currencies are not easily created. Incumbent industries fight their emergence.</p>\n" +
"\n" +
"<p>But this is how a currency is usually created:</p>\n" +
"\n" +
"<p><img src=\"/img/gem.png\" alt=\"Gem\" class=\"center\" /></p>\n" +
"\n" +
"<h4 id=\"phase-1-become-a-shiny-object\">Phase 1: Become a Shiny Object</h4>\n" +
"\n" +
"<p>Let’s say you create a new media property and/or social network and begin to acquire users and interest rapidly. Soon, your success can’t be ignored by media buyers. Your platform offers a unique experience, which cannot be measured effectively by existing media metrics. But that’s OK: your property is new, exciting, and hyped. It is a shiny object. Media buyers will buy ads to learn about the new format, impress their clients, and win awards.</p>\n" +
"\n" +
"<p>Most advertising budgets contain a slush fund reserved for new opportunities. Some advertisers (Red Bull, Dove, etc.) use these dollars to create exciting campaigns, unhindered by usual requirements, to win awards or shift the brand’s image. Others treat this fund as a “test and learn” budget, a place where new properties or ideas can be executed with the goal of learning, not explicit marketing success. Either way, these are the funds which you’ll be courting. Let’s say you attract lots of them.</p>\n" +
"\n" +
"<p>An influx in advertiser attention and money will challenge your new company. Suddenly, your platform will have two clients to keep happy instead of just one: your users and your advertisers. Maintaining focus on user happiness and growth while servicing advertisers will divide your already limited resources.</p>\n" +
"\n" +
"<p>As if that weren’t difficult enough, the platform has a limited life as a shiny object. Eventually you will cease to be novel and ad buyers will demand performance against media currencies. You won’t be new, cool, or exciting forever. YOU can maintain your coolness, extending shiny object phase (For example, Vice held this note impressively long) but coolness always fades.</p>\n" +
"\n" +
"<div class=\"sidenote\">\n" +
"\n" +
"<h1>Side Note</h1>\n" +
"\n" +
"<p>The shiny object crowd just hit a massive headwind due to macro economy troubles, so their population will dwindle. The fun stuff without metrics is the first to budget to get cut in tough times.</p>\n" +
"\n" +
"</div>\n" +
"\n" +
"<p><img src=\"/img/ruler.png\" alt=\"Gem\" class=\"center\" /></p>\n" +
"\n" +
"<h4 id=\"phase-2-find-a-metric-that-fits\">Phase 2: Find a Metric That Fits</h4>\n" +
"\n" +
"<p>Before your platform loses its cool, it needs to find or create a media metric that properly assesses its worth and doesn’t work against its unique experience. It then needs to convince media planners and their clients that the new metric is worth adopting.</p>\n" +
"\n" +
"<p>If a platform can’t create metric or can’t sell it to advertisers, media planners will evaluate the platform with existing metrics that will certainly paint it in an unflattering light.</p>\n" +
"\n" +
"<p>To understand this tension — how being valued by the wrong metric can corrupt your platform — let’s first look at a company that successfully sold in a new(ish) metric that aligned with its product. Let’s look at Google search.</p>\n" +
"\n" +
"<p><strong>Google’s Cost-Per-Click Metric Fits Their Product’s Strengths</strong></p>\n" +
"\n" +
"<p>When Google entered it’s shiny object phase the dominant metric in digital ad buying was cost-per-thousand ads delivered, or CPM. Media properties were paid every time they showed someone an ad, regardless of the outcome. If Google stuck with CPM and never sold in a new metric, there would be a strong incentive to show users more pages and keep them on their site longer — in short, deliver worse search results.</p>\n" +
"\n" +
"<p>Thankfully, Google adopted a lesser-used metric called cost-per-click (CPC) which billed the advertiser every time a user clicked on an ad, leaving Google’s site. This wasn’t difficult to sell to media buyers. The only limiting factors at the time were lack of organizational structure to manage ‘click’ campaigns and companies not knowing what to do with web traffic once they had it.</p>\n" +
"\n" +
"<p>CPC aligned the incentives for Google, who was financially encouraged to be the most efficient search engine. The more you used Google to search, the more opportunities Google had to serve you the right ad. This resonant cycle unlocked a deluge of dollars for Google, growing them into a giant.</p>\n" +
"\n" +
"<div class=\"sidenote\">\n" +
"\n" +
"<h1>Side Note</h1>\n" +
"\n" +
"<p>You can (and many do!) argue that the CPC model is causing Google to ship a worse product today. A search results page today is often half ads, requiring the user to scroll before any non-sponsored results are visible. But that’s another discussion entirely…</p>\n" +
"\n" +
"</div>\n" +
"\n" +
"<p>While Google found a metric that fit, most platforms aren’t so lucky. Usually they invent metrics they’re unable to sell to the market or forget to find a metric at all. They wake up one morning and discover they’re no longer a shiny object, are being valued with measures they can’t perform against, and there’s little runway left. If there’s no time to find and sell the right metric (or the market isn’t buying the ones you have) your window for monumental success has closed.</p>\n" +
"\n" +
"<p>Attempting to perform against existing currencies will undercut your platform, devalue what makes you unique, and lead you on a road to irrelevance.</p>\n" +
"\n" +
"<p>This is what happened to Tumblr.</p>\n" +
"\n" +
"<p><strong>Tumblr’s New Ad Formats Aren’t a Substitute for New Ad Metrics</strong></p>\n" +
"\n" +
"<p>Tumblr had a particularly strong shiny object phase. The company was a poster child for 2010s-era <a href=\"https://en.wikipedia.org/wiki/Silicon_Alley\">Silicon Alley</a>. It was a vibrant community with countless exciting projects emerging from it’s pages. It reeked cool. Brands wanted to be there. Corporations started their own Tumblogs alongside their campaigns and started to pester the company about advertising.</p>\n" +
"\n" +
"<p>In response to this demand, Tumblr launched Radar: a featured space on it’s homepage curated by editors highlighting popular or interesting posts. Radar was also (eventually) an ad unit. Brands could purchase a Radar ad for $25,000 for 6 million impressions. But this allocation of impressions didn’t include any earned impressions generated by people reblogging the post. So the CPM of the ad dropped dramatically if you created ads people were inclined to share.</p>\n" +
"\n" +
"<p><img src=\"/img/tumblr_radar.png\" alt=\"Tumblr Radar Ad\" /></p>\n" +
"\n" +
"<p>At first, this looks like a good model! The incentives were aligned for advertisers to create ads that felt at home on Tumblr, allowing Tumblr to generate revenue without driving away their users.</p>\n" +
"\n" +
"<p>But there were a few things that doomed Radar ads to fail.</p>\n" +
"\n" +
"<p>First, creating an ad people would reblog was hard. Tumblr users were creative and fickle, sniffing out in-authenticity with art student ease. Tumblr attempted to address this challenge by educating creative agencies and even offering their own in-house design services (most new networks had to do this do some degree, this wasn’t limited to Tumblr by any means).</p>\n" +
"\n" +
"<p>Second, Radar wasn’t targeted to specific users or content. At any given time there was a queue of rotating Radar posts, with some percentage being ads. Posts rotated through users’ Radar, fulfilling the impression quota of the ads, without consideration for who the users were. Advertisers are generally not a fan of this, though for the high priced shiny object advertising Tumblr was engaging with at the time, this was fine for a bit.</p>\n" +
"\n" +
"<p>But more importantly, Radar was an new ad unit not a new ad metric. Sure, CPM is a currency…but it is not a performance metric. CPM measures how many people you reach not the efficacy of the ads themselves. Radar was an ad format well suited to the shiny object phase (when ad agencies don’t care about metrics so long as they’re winning awards, learning, and/or making their client look cool).</p>\n" +
"\n" +
"<p>Ads at Tumblr quickly expanded beyond Radar into ‘Promoted Posts’, where advertisers paid to insert their Tumblr posts into users’ dashboards. These were priced similarly, with the same CPM metric costs being diluted if people shared good ads.</p>\n" +
"\n" +
"<p>But by this point, the shine was coming off Tumblr. And while they had new ad units that were well aligned with what users wanted from the platform, they nothing to show in terms of new ad metrics. Advertisers were now asking for proof the ads worked from both Tumblr and their agencies executing the buys. And Tumblr only had their diminishing novelty, Likes, and Reblogs — all of which were not denominating their ads.</p>\n" +
"\n" +
"<p>Around this time, Tumblr sold to Yahoo. Tumblr’s young, mobile audience checked the strategic boxes Marissa Mayer needed to check to re-frame Yahoo to the market, justifying a $1.1 billion price tag despite a pittance of revenues.</p>\n" +
"\n" +
"<p>After the acquisition, Tumblr finally created a metric by shifting to selling ads priced by ‘engagement’ — advertisers paid if their ads were Liked or Reblogged — but the market wasn’t having it. Too little, too late. Their shiny object status was truly gone (nothing like an acquisition to do that to you!) and they hadn’t created and sold a metric to the market.</p>\n" +
"\n" +
"<p>I knew decline was locked in for Tumblr when I saw this on my dashboard:</p>\n" +
"\n" +
"<p><img src=\"/img/tumblr_iab.jpeg\" alt=\"Tumblr's IAB Ad\" /></p>\n" +
"\n" +
"<p><strong>This is what capitulation looks like.</strong> Rather than a sponsored Tumblr post, this Victoria’s Secret ad is an IAB ad unit: an industry-standard digital display advertisement. The same ad format you’ll see on any other run of the mill webpage. Tumblr shoe-horned this into their dashboard with a quick blue frame, allowing them to adopt the ad units and metrics of the industry at large. They gave up creating a format and metric suited to their site and simply plugged into the broader, programmatic web. Sure, this gave them access to more money in the short term, but it guaranteed their unique platform would be valued in the same manner as any other blog, newspaper, or whatever. A race to the bottom had truly began.</p>\n" +
"\n" +
"<p>Tumblr failed to create a metric that suited their unique platform before they lost their shine, denying the market an effective way to measure the true value of their platform. They (smartly!) sold before this problem was fully apparent, and eventually capitulated to existing metrics, built for other sites, to easily monetize the platform on the way down.</p>\n" +
"\n" +
"<p>Most recently, Tumblr was sold to Wordpress for $3 million.</p>\n" +
"\n" +
"<p><img src=\"/img/crown.png\" alt=\"Crown\" class=\"center\" /></p>\n" +
"\n" +
"<h4 id=\"phase-3-crown-a-new-currency\">Phase 3: Crown a New Currency</h4>\n" +
"\n" +
"<p>If you can find a metric that aligns with your platform’s function and sell it to brands and ad agencies, you’re well on your way to something special. The flywheel is in motion. If you keep generating demand for your metric, at some point other platforms will take notice and adopt your metric to get a slice of the pie. Demand rises, then supply rises, and pretty soon you’ve got yourself a <strong>currency</strong> and the marketplace that comes with it. Waves of cash will engulf your platform as more advertising budgets allocate a line item for your metric.</p>\n" +
"\n" +
"<p>Sometimes this happens early in a platform’s lifetime. Google’s CPC adoption certainly did. Other times it takes time and context changes to create the moment ripe for a new metric. This is what happened with Facebook, whose business truly took off once their mobile-install business took off, driven by the booming smartphone install base, a Cambrian era of mobile app businesses, and their cost-per-install (CPI) metric.</p>\n" +
"\n" +
"<div class=\"sidenote\">\n" +
"\n" +
"<h1>Side Note</h1>\n" +
"\n" +
"<p>Facebook had several failed pushes for new metrics prior to their CPI-fueled growth period. Notable was their eGRP effort, in partnership with Nielsen. GRP (gross rating points), created and manged by Nielsen, is the currency nearly all TV ads are denominated in. Billions of ad dollars of demand are out their buying GRP. Facebook, lacking a killer metric, partnered with Nielsen to translate this metric to the Newsfeed. If successful, they could have sold ads to GRP budgets, diverting a slice of TV ad spend to their platform. However, this was wishful thinking and never resonated with the market for multiple reasons. (Twitter tried this slight of hand as well. TV budgets are huge and super enticing…)</p>\n" +
"\n" +
"<p>I believe this failure was actually good for Facebook. Borrowing an existing metrics not designed for their platform wouldn’t represent their unique value as well, as we saw with Tumblr. By bumbling along until they landed on CPI (which was partially due to them failing to spot mobile as the Next Big Thing!), Facebook didn’t capitulate to the value denominations of others.</p>\n" +
"\n" +
"</div>\n" +
"\n" +
"<p>This is the golden path of monumental platforms. They start by generating enough buzz to become a shiny object then use these funds to keep the company going while they try to establish a new metric that suits the platform. If they succeed, they create a currency and a marketplace, and are awash in ad dollars.</p>\n" +
"\n" +
"<p>But while they bask in treasure and glory, a fourth phase begins…</p>\n" +
"\n" +
"<p><img src=\"/img/optimize.png\" alt=\"Optimize\" class=\"center\" /></p>\n" +
"\n" +
"<p>Creating a currency defines demand in the marketplace. A currency highlights the existence of giant pools of money ready to buy anyone who can deliver against your metric. In short order, new companies will emerge built for the currency and existing companies will develop new ad units or features to get in on the funds.</p>\n" +
"\n" +
"<p>Up until now, everything we’ve covered relates to the emergence of currencies and new media platforms. Once currencies are established, the exploits emerge. This is when the currency starts to affect our media environment and how we perceive the world.</p>\n" +
"\n" +
"<p>After Google turned cost-per-click (CPC) into a currency, an entire industry emerged. Media owners adopted the metric and started to optimize their sites to get more clicks. Data-driven targeting, that worked across sites (so as to compete with the pristine signal Google had from user searches), blossomed into an entire sector. Search agencies become a thing, with armies of ad managers tweaking campaigns to get the lowest price per click. Botnets emerged, visiting sites and clicking on ads in increasingly convoluted fashions to evade anti-spam efforts. Everyone — ad buyers, ad sellers, tool providers — started gaming the metric. Fueled by easy money, denominated in CPC, the complexity and manipulation rose.</p>\n" +
"\n" +
"<p>Media currencies are a spectacular example of <a href=\"https://en.wikipedia.org/wiki/Goodhart%27s_law\">Goodhart’s Law</a>: <strong>“When a measure becomes a target, it ceases to be a good measure.”</strong></p>\n" +
"\n" +
"<p>This would be a curiosity to most — a framework for understanding why some popular platforms thrive and others languish — if the gaming of these metrics didn’t significantly color the information to which we’re exposed.</p>\n" +
"\n" +
"<p>Many online have a passing understanding of the “clickbait” concept, how content is published to perform against metrics advertisers will pay for. But the subtleties of other currencies are not fully appreciated. Metrics define demand, which define the ad formats created, which define the content being made.</p>\n" +
"\n" +
"<p>Perhaps my favorite recent example of this effect is on YouTube.</p>\n" +
"\n" +
"<p>To increase video views — one of the denominating currencies for YouTube ads — YouTube introduced mid-roll ads, or commercials in the middle of videos. YouTube only gets credit for video views if a viewer watches 10 seconds of it. Mid-roll ads generally check this box more effectively, as the viewer is already hooked on the content it’s inserted within when the ad runs. Contrast this with Pre-roll ads, which run before the content has started. Pre-rolls are more likely to drive off viewers within the first 10 seconds, as they’re not invested in the video much before it starts.</p>\n" +
"\n" +
"<p>As a result of this new format, driven by the video views currency, YouTube creators started making longer and longer videos. Videos longer than 8 minutes allow creators to insert mid-roll ads, and it only goes up from there. As a result, we’ve now entered the golden age of video essays with creators creating content no less than 10 minutes and often longer than an hour. The ad metric drove the creation of the format, which drove the creation of the content, remaking the media landscape on YouTube.</p>\n" +
"\n" +
"<p>As someone who enjoys longer form video essays, this is a great example of how tuned incentives — aligning currencies with the unique strengths of a platform — result in value for users, advertisers, and the platform. But not all currencies are as beneficial. And even the good ones age poorly over time, under optimization pressures from all sides.</p>\n" +
"\n" +
"<p>For generations on the internet, people have warned, “If you’re not paying, you are the product.” Such an adage is good to keep in mind, but I suggest we take it a bit further. <strong>We should understand how we’re being packaged and sold so that we understand how it shapes the environments defined for us.</strong></p>\n" +
"\n" +
"<p><img src=\"/img/cash.jpeg\" alt=\"\" /></p>",
@entry_id="https://www.dbreunig.com/2022/06/06/why-media-metrics-matter",
@links=["https://www.dbreunig.com/2022/06/06/why-media-metrics-matter.html"],
@published=2022-06-06 00:00:00 UTC,
@raw_title="Why Media Metrics Matter",
@summary="A Primer on Media Measurement and Why it Defines Your World",
@title_type="html",
@updated=2022-06-06 00:00:00 UTC,
@url="https://www.dbreunig.com/2022/06/06/why-media-metrics-matter.html">,
#<Feedjira::Parser::AtomEntry:0x00007f8c097c9268
@author="Drew Breunig",
@content=
"<h4 id=\"a-common-set-of-trials-faced-all-ambitious-social-networks-until-this-month\">A common set of trials faced all ambitious social networks… Until this month.</h4>\n" +
"\n" +
"<!-- ![](/img/1*O4yjonM7MvPjk8-saX1G1w.png) -->\n" +
"\n" +
"<p>Some of these hurdles just got higher and moved closer together.As Facebook faces a crisis in confidence — having lost the trust of a wide swath of users, advertisers, investors, employees, and (most worryingly) regulators — pundits have attempted to make sense of the matter by holding the company up to historical precedents. Comparisons to AOL have always been popular (<a href=\"https://kottke.org/07/06/facebook-is-the-new-aol\">here’s Kottke pairing the two over a decade ago)</a> but <a href=\"https://www.economist.com/business/2018/11/24/facebook-should-heed-the-lessons-of-internet-history\">now we’re seeing Yahoo! in the mix</a>, which is never a good sign.</p>\n" +
"\n" +
"<p>Considering Facebook’s challenges against the ghosts of social networks past, I couldn’t help but attempt to define a framework regarding the challenges of building a social network. Is there a consistent set of challenges these networks face on their way to either global adoption or failure? While considering the trajectories of current and past social networks, a gauntlet of sorts emerged:</p>\n" +
"\n" +
"<p><img src=\"/img/1*YaK-lm_x0DJLRzyI6NPoqw.jpeg\" alt=\"The Social Networking Gauntlet\" /></p>\n" +
"\n" +
"<p>Ambitious social networks meet each of these successive challenges. With much effort they either overcome each subsequent challenge, stall, or entirely fail out.</p>\n" +
"\n" +
"<p>Let’s use this framework to think through Facebook’s current situation.</p>\n" +
"\n" +
"<p>How has Facebook manged to make it to Challenge 9, ‘Survive Regulation’, when all others have failed? And what challenges is it facing today?</p>\n" +
"\n" +
"<ul>\n" +
" <li>Facebook grew quickly through smart go-to-market strategies. Focusing on colleges allowed it to laser-focus on a niche and upset existing offerings (MySpace, Friendster, etc. who were struggling with Challenges 4 through 7. Facebook at the time was a ‘less boring upstart’ mentioned in Challenge 7). Only after it was the de facto school social network did it allow non-student users to join, in 2006.</li>\n" +
" <li>The following year, Facebook’s revenue began to grow exponentially. In 2007 annual revenue was roughly $150 million. <a href=\"https://www.statista.com/statistics/277229/facebooks-annual-revenue-and-net-income/\">Three years later, annual revenue was nearly $2 billion</a>.</li>\n" +
" <li>With a humming revenue machine, Facebook continued to grow and push for ubiquity. But if you try to attract <em>all *people you risk becoming overly *generic</em>, leaving yourself open to threats from smaller upstarts who pick off chunks of your user base with more tailored and exciting products. This is Challenge 7: achieve ubiquity while not becoming generic, so you’re able to defend against new networks.</li>\n" +
" <li>Facebook viewed this challenge as two problems, each with their own solution.</li>\n" +
" <li>The first problem was vanquishing more interesting upstarts. Facebook’s strategy here was to simply buy anyone who became too bothersome.</li>\n" +
" <li>The second component of this challenge was building a product that works for everyone without being generic. Facebook’s solution here was the algorithmic feed, which automated the adaptation of their product to their users. Sure, the scaffolding *around *the content was the same for everyone, but the content itself was not.</li>\n" +
" <li>The first solution worked very well. The second worked excellently until it didn’t.</li>\n" +
" <li>With the algorithmic feed, there were as many *versions *of Facebook as there were users. So when something went wrong, there weren’t enough resources to address the issues. There were too many Facebooks to manage.</li>\n" +
" <li>Caught off guard, Facebook scrambled. Turning off the algorithmic feed would tank their engagement, reduce their advertising inventory, hinder their advertising performance, and leave them open to less boring competition. <a href=\"https://www.theverge.com/2019/2/25/18229714/cognizant-facebook-content-moderator-interviews-trauma-working-conditions-arizona\">So contractors with massive work forces were hired to monitor content</a>, essentially policing algorithms with countless humans. The tech and operations are catching up, but they’re not there yet.</li>\n" +
" <li>Now they put out fires as they arise, waiting for automated moderation to improve and to find out what Challenge 9, ‘Survive Regulation’, has in store.</li>\n" +
"</ul>\n" +
"\n" +
"<p>I think this gauntlet framework works well for framing Facebook’s success. It works pretty well for Twitter, Tumblr, Instagram, Reddit, Digg, MySpace, Friendster, Yahoo!, and more. But since I first blocked out the gauntlet, a narrative has emerged that short-circuits it a bit: TikTok.</p>\n" +
"\n" +
"<p>TikTok jumped from Challenge 3 to Challenge 9. Prior to the Trump administrations actions, TikTok was staffing up. It was running a familiar playbook: hiring tons of ad ops and sales people to scale their business model after acquiring and engaging a large, valuable audience. Snap did precisely this several years ago, going from just over a hundred employees to 900+ in a matter of months. TikTok was on it’s way to Challenge 4 (making money without pissing off your users) until the Trump administration sprung Challenge 9 on them.</p>\n" +
"\n" +
"<p>And now we’re in uncharted waters.</p>\n" +
"\n" +
"<p>The framework above assumes you’ve built a nicely humming revenue machine <em>before</em> you’re forced to tango with existential regulation. If TikTok <a href=\"https://www.economist.com/leaders/2020/09/19/will-tiktok-survive\">signals the beginning of more aggressive digital nationalism</a>, will this roadmap continue to make financial sense? Or have Facebook and Google timed the window perfectly (completing Challenges 1–8, globally, <em>before</em> facing 9) to become the de facto winners?</p>\n" +
"\n" +
"<p>While this isn’t the explicit knighting of a monopolist, <a href=\"https://amzn.to/3hPAzNm\">as predicted in Tim Wu’s The Master Switch</a>, it may be functionally the same. If the US’s regulatory response to China’s own rules (<a href=\"https://money.cnn.com/2018/02/28/technology/apple-icloud-data-china/index.html\">Apple, Microsoft, Amazon, and other’s services run on state-run servers</a>, just as TikTok’s may soon enough) spurs similar reactions from other governments the addressable market for new social network entrants is radically smaller than their established competitors. (And we haven’t even touched on how this manifests within privacy regulation.)</p>\n" +
"\n" +
"<p>Can you start something that will become a global social network after 2020? Or will the costs needed to even take a <em>shot</em> at major markets be too large and complex to swallow for investors? Or will less and more expensive access to global markets simply prevent new networks from ever growing to challenge the established competition? I’m not sure, but one dynamic certainly has changed:</p>\n" +
"\n" +
"<p>Previously, rapid success meant attracting the eye of your competition’s M&A team. Today, rapid success attracts competitors <em>and</em> regulators. Which makes this trial much more expensive, much earlier. The gauntlet has been short-circuited.</p>",
@entry_id=
"https://www.dbreunig.com/2020/09/21/the-gauntlet-growing-social-networks-face-just-got-harder",
@links=
["https://www.dbreunig.com/2020/09/21/the-gauntlet-growing-social-networks-face-just-got-harder.html"],
@published=2020-09-21 00:00:00 UTC,
@raw_title="The Gauntlet Growing Social Networks Face Just Got Harder",
@summary=
"A common set of trials faced all ambitious social networks… Until this month.",
@title_type="html",
@updated=2020-09-21 00:00:00 UTC,
@url=
"https://www.dbreunig.com/2020/09/21/the-gauntlet-growing-social-networks-face-just-got-harder.html">,
#<Feedjira::Parser::AtomEntry:0x00007f8c097e2830
@author="Drew Breunig",
@content=
"<p><img src=\"/img/1*qWliIK2AGiiV2Rs3d4S6Qw.png\" alt=\"\" /></p>\n" +
"\n" +
"<h4 id=\"and-why-theyre-so-powerful\">And why they’re so powerful</h4>\n" +
"\n" +
"<p>Recently a friend asked why people have an abundance of concern for “<a href=\"https://en.wiktionary.org/wiki/porch_pirate\">porch pirates</a>.” “Is package theft truly a frequent problem?” he asked. “Or is the inconvenience so tremendous people loath to deal with it even once?”</p>\n" +
"\n" +
"<p>While I tend to support the latter theory, I also believe the term “porch pirate” itself is behind the surge in concern. The term is evocative, instantly <a href=\"https://en.wikipedia.org/wiki/Grok\">grokked,</a> and puts a name to a common problem previously dealt with privately. That last part is key: nearly all of us have had packages stolen at one time in our lives, but it was never more than a personal inconvenience. You might mention it in the office kitchen the next day, killing time while coffee brews, but it wasn’t something you shared.</p>\n" +
"\n" +
"<p>“Porch pirate” puts a name to your personal experience and instantly makes it communal. In a moment it crystallizes internal thoughts and makes them portable. It is an exemplar <em>buzzword</em>.</p>\n" +
"\n" +
"<p>I am a <em>massive</em> fan of the <a href=\"https://linguistlist.org/ask-ling/sapir.cfm\">Sapir-Whorf Hypothesis,</a> which suggests that the languages we speak are the framework that we use to understand the world. Meaning: language does not just describe reality, <em>it defines reality</em>. The words you know establish the boundaries of what you can understand.</p>\n" +
"\n" +
"<p><a href=\"https://linguistlist.org/ask-ling/sapir.cfm\">Whorf sums it up nicely</a> (emphasis mine):</p>\n" +
"\n" +
"<blockquote>\n" +
" <p>We dissect nature along lines laid down by our native languages. The categories and types that we isolate from the world of phenomena we do not find there because they stare every observer in the face; on the contrary, <strong>the world is presented in a kaleidoscopic flux of impressions which has to be organized by our minds — and this means largely by the linguistic systems in our minds</strong>. We cut nature up, organize it into concepts, and ascribe significances as we do, largely because we are parties to an agreement to organize it in this way — an agreement that holds throughout our speech community and is codified in the patterns of our language. The agreement is, of course, an implicit and unstated one, but its terms are absolutely obligatory; we cannot talk at all except by subscribing to the organization and classification of data which the agreement decrees.Sapir and Whorf focused on the differences between languages and cultures, but I most frequently reference their idea on the micro level, when new terms emerge in a single language.</p>\n" +
"</blockquote>\n" +
"\n" +
"<p>Which brings us back to <em>buzzwords</em>.</p>\n" +
"\n" +
"<p>Not all new words are buzzwords. Most new words are esoteric — <em>jargon</em> — describing new concepts or ideas among groups of experts forging new ground. To understand these new words <em>you</em> must be actively learning and exploring the new terrain they occupy. Jargon emerges with expertise.</p>\n" +
"\n" +
"<p>Buzzwords describe what you already <em>intuitively know</em>. At once they snap the ‘kaleidoscopic flux of impressions’ in your mind into form, crystallizing them instantly allowing you to both organize your knowledge and recognize you share it with other. This rapid, mental crystallization is what I call the <em>buzzword whiplash</em>. It gives buzzwords more importance and velocity, more power, than they objectively should have.</p>\n" +
"\n" +
"<p>The potential energy stored within your mind is released by the <em>buzzword whiplash</em>. The buzzword is perceived as important partially because of what it describes but also because of the social and emotional weight felt when the buzzword recognizes your previously wordless experiences and demonstrates that those experiences are shared.</p>\n" +
"\n" +
"<p>Which is why people are suddenly concerned with “porch pirates.” The term is a perfect buzzword. And the whiplash of recognition makes people more concerned than they objectively should.</p>\n" +
"\n" +
"<p>You too could build a new buzzword. Here’s how to:</p>\n" +
"\n" +
"<ol>\n" +
" <li>Identify a common experience everyone intuitively is aware of but has difficulty expressing concisely (or at all).</li>\n" +
" <li>Craft a term to encapsulate this amorphous experience in plain and evocative language.</li>\n" +
" <li>Seed the term among an expressive, socially connected community in a form they want to consume and share.\n" +
"That’s it!</li>\n" +
"</ol>\n" +
"\n" +
"<p>We haven’t talked about step 3 yet, but it is just as essential as the others. “Porch pirate” registers as barely a blip in Google Trends (despite describing a common experience) until <a href=\"https://www.youtube.com/watch?v=xoxhDk-hwuo\">Mark Rober built a glitter bomb trap to catch a package thief and uploaded the footage to YouTube</a>. Rober went viral <a href=\"https://trends.google.com/trends/explore?date=today%205-y&geo=US&q=Porch%20Pirate\">as did the term</a>:</p>\n" +
"\n" +
"<p><img src=\"/img/1*0iv19RPSU0XxWfY3sTf5xw.png\" alt=\"\" /></p>\n" +
"\n" +
"<p><em>Rober’s video went live on December 17th 2018. That high mark is December 16–22nd, 2018.</em></p>\n" +
"\n" +
"<p>Buzzwords are more powerful than what they describe. By helping you better recognize your own experiences and demonstrating they are shared, they are emotional and social explosions. Often they are accidental, like “porch pirate.” Other times they have intent, like “fake news”. It is crucial we understand how buzzwords work to counter the spells they cast.</p>\n" +
"\n" +
"<p>When we understand how tricks work they becomes less awesome.</p>",
@entry_id="https://www.dbreunig.com/2020/02/28/how-to-build-a-buzzword",
@links=["https://www.dbreunig.com/2020/02/28/how-to-build-a-buzzword.html"],
@published=2020-02-28 00:00:00 UTC,
@raw_title="How to Build a Buzzword",
@summary="And why they’re so powerful",
@title_type="html",
@updated=2020-02-28 00:00:00 UTC,
@url="https://www.dbreunig.com/2020/02/28/how-to-build-a-buzzword.html">,
#<Feedjira::Parser::AtomEntry:0x00007f8c097ee7c0
@author="Drew Breunig",
@content=
"<p><img src=\"/img/1*ZtDmPJ8iKzUza9ZghEl2kA.jpeg\" alt=\"Paiting by Carl Grossberg\" /> <a href=\"https://en.wikipedia.org/wiki/Carl_Grossberg\" class=\"image-caption\">Painting by Carl Grossberg</a></p>\n" +
"\n" +
"<h4 id=\"playing-chicken-with-the-limits-of-technology\">Playing chicken with the limits of technology</h4>\n" +
"\n" +
"<p>Spend enough time thinking about the technology business and you’ll inevitably acquire a mental metaphor for the rhythms and cycles you observe. A specific book or presentation might snap into your head, providing scaffolding to order successes and failures you’ve observed over the years. Or — if you are prone to metaphors and storytelling — you might develop your own.</p>\n" +
"\n" +
"<p>I fall firmly into the latter category. To me, a good framework for thinking about the technology business is asking, “<strong>What are we waiting for?</strong>”</p>\n" +
"\n" +
"<p>In 2001 we were waiting for people. Today we’re waiting for robots. But hold that thought for a moment…</p>\n" +
"\n" +
"<p>Benedict Evens recently published his <a href=\"https://www.ben-evans.com/presentations\">annual presentation</a> on macro trends within the technology industry. Please go read it first.</p>\n" +
"\n" +
"<p>In the piece, Evens asks, “What happens when everyone is online?” then follows with a quote from Marc Andreessen:</p>\n" +
"\n" +
"<blockquote>\n" +
" <p>Every failed idea from the dotcom bubble would work now.There are many reasons why the dotcom bubble burst, but a dominant one is not enough people were online soon enough. Companies, caught in a game of chicken, grew as if the masses were right around the corner. When they didn’t arrive (less than 10% of the population were online at the time) companies collapsed as their investments weren’t met by customers.</p>\n" +
"</blockquote>\n" +
"\n" +
"<p>We were waiting for people, who didn’t arrive until smartphones matured because it turned out PC were too expensive and wonky for most. Once people arrived, things started picking up:</p>\n" +
"\n" +
"<p><img src=\"/img/1*HMnkfOSXMGNc68Gkg5DtsQ.png\" alt=\"\" /></p>\n" +
"\n" +
"<p><em>Slide from <a href=\"https://www.ben-evans.com/presentations\">Benedict Evan’s deck</a>. Ugly annotations by me.</em></p>\n" +
"\n" +
"<p>We wait for the limits of technology to fall. Sometimes everyone waits for a accessible, sufficient mobile UI for the internet. Sometimes a sector waits for a new type of mattress that can fit in a cardboard box.</p>\n" +
"\n" +
"<p><em><img src=\"/img/1*68XCIzgbbKnoVpqt24zjDQ.png\" alt=\"\" /><a href=\"https://www.ben-evans.com/presentations\">ibid</a>.</em></p>\n" +
"\n" +
"<p>Once a limitation is removed by an innovator, competition races to the new boundary. It’s a land rush. But if you’re investing in massive growth, <em>you venture past the limit and hope it catches up before you run out of money</em>.</p>\n" +
"\n" +
"<p>The limits of technology can be shared headaches or niche enablers. But despite this diversity, they move in one direction on a spectrum from <em>simple problems</em> to <em>messy problems:</em></p>\n" +
"\n" +
"<p><img src=\"/img/1*9h544mzM_swha3oIy2VCvw.png\" alt=\"\" />Messy problems are best solved by expensive human experts. They exist beyond the capabilities of technology. The <em>job</em> of new technology is to move problems from the messy column to the simple one.</p>\n" +
"\n" +
"<p>Some recent examples of messy problems made simple and the results:</p>\n" +
"\n" +
"<ul>\n" +
" <li>Smartphones made the internet <em>accessible</em> to billions, allowing eCommerce to grow.</li>\n" +
" <li>Vacuum-packed foam packaging made mattresses <em>portable</em>, allowing for direct-to-consumer mattress businesses to grow.</li>\n" +
" <li>High bandwidth connectivity made high definition video <em>portable</em>, allowing for streaming services to grow.</li>\n" +
" <li>High volume pricing for cellphone components made smart home devices <em>cheap</em>, allowing for our IoT Cambrian age.</li>\n" +
" <li>Modular shipping containers made small shipments <em>portable</em> and <em>predictable</em>, enabling the build out of global commerce.\n" +
"In my head I make a distinction between “what we’re waiting for” (the ends) and a limit of technology (the means). Many global consumer businesses were waiting for fast, cheap shipping — not the shipping container specifically.</li>\n" +
"</ul>\n" +
"\n" +
"<p>So where are the current limits of technology? What’s preventing the next S Curves, as Ben Evens asks?</p>\n" +
"\n" +
"<p>I keep a running list of problems that resist technology in an effort to spot patterns.</p>\n" +
"\n" +
"<p>Sometimes the problems don’t make the headlines, like <a href=\"https://www.latimes.com/business/story/2019-12-19/boeing-spacex-spacecraft-parachutes\">building parachutes</a>:</p>\n" +
"\n" +
"<blockquote>\n" +
" <p>Parachutes encounter turbulent and dynamic airflow, which is almost impossible to replicate with computers, said Erik Seedhouse, an assistant professor in spaceflight operations at Embry-Riddle Aeronautical University. Wind speeds vary at different altitudes. Atmospheric pressure changes in a hundredth of a second. The stresses on a parachute whose job is pulling out a larger parachute can be unpredictable.Parachutes are a <em>poorly understood</em> problem; their environment and performance cannot be modeled. It’s still messy.</p>\n" +
"</blockquote>\n" +
"\n" +
"<p>Some of the problems that resist technology are comically mundane, like <a href=\"https://www.newyorker.com/magazine/2018/02/12/why-paper-jams-persist\">paper jams</a>.</p>\n" +
"\n" +
"<blockquote>\n" +
" <p><em>“Papers are not created equally,” John Viavattine, the head of the Torture Lab, said. Some stocks generate excessive friction; others swell in the humidity. (In general, winter jams are more common than summer jams.) Sheets cut from the same forty-ream roll can vary in quality. At the center of the roll, paper fibres tend to arrange themselves in an orderly matrix; nearer the edges, they become jumbled… Even the highest-quality paper can be ruined by poor “paper handling.” A half-used package of paper left to sit will grow damp and curly or dry and “tight.” Reams of paper that are thrown around or kept in stacks can develop hidden curls that lead to jams.</em>Paper is surprising, <em>unpredictable</em>, so paper jams persist. Messy.</p>\n" +
"</blockquote>\n" +
"\n" +
"<p>But the headline problems of our era are rooted in technology’s inability to <em>perceive</em> and <em>understand</em>. Software can read articles, organize photos, recognize faces, and spot pedestrians…but not with the skill required for massive rollout and acceptance. Software can perceive simple, predictable scenes, but it has yet to master truly messy perceptive problems. This limitation of technology is why we’re still waiting for robots, or <em>AI</em>.</p>\n" +
"\n" +
"<p>Facebook and Google had hopes of solving moderation with AI. But the complexities of language and diversity of rule sets (Evens gets into this multicultural challenge elsewhere in his deck) resists present technologies. Companies must hire armies of moderators.</p>\n" +
"\n" +
"<p>Uber can turn a profit with either complete market dominance or self-driving technology — a problem that remains messy because it’s surprising and subtle. AI, the hope goes, would address these limitations and enable >10x returns.</p>\n" +
"\n" +
"<p>Companies that bet machine and deep learning would soon provide capable AI are in a squeeze right now. But they had little choice to not take a shot at AI, since any major advancement by a competitor could instantly provide their product at a fraction of their cost. (<a href=\"https://www.sec.gov/Archives/edgar/data/1543151/000119312519103850/d647752ds1.htm\">Uber details this dilemma quite well in their S-1</a>.)</p>\n" +
"\n" +
"<p>The dotcom companies of 2001 played chicken with the limits of technology and their competitors, building as if their world would arrive online tomorrow. But it took the greater part of a decade before people arrived.</p>\n" +
"\n" +
"<p>Today, companies are waiting for robots, sacrificing researchers and capital, hoping they’ll arrive before the money runs out or the regulation arrives. So far the problem has resisted.</p>\n" +
"\n" +
"<p>Which brings us back to Benedict Even’s question: “What is the next S-Curve?” I suggest the following questions as good discussions to have in pursuit of an answer:</p>\n" +
"\n" +
"<ol>\n" +
" <li>Where are the limitations of technology? What problems resist being directly addressed by software or hardware at scale?</li>\n" +
" <li>What are we — our current industry — waiting for? Where are we caught in games of chicken: where the rewards are so disrupting, the entire industry must invest in and chase the challenge?</li>\n" +
" <li>What is the advancement that has yet to arrive that we hope will put a stop to our waiting? What are the paths being explored and what are the paths less traveled?</li>\n" +
" <li>When the above advances arrive, what new use cases will they enable that we aren’t currently waiting for? (Note: as you can see above, the limitations of technology and what we’re waiting for are often not the same. And we often take a surprising route to arrive at our ends.)</li>\n" +
" <li>Poorly understood problems are the hardest technical limitations to solve. What limitations do we face today that are well understood? Poorly understood?\n" +
"I’m sure there are more (better!) questions to help think about the future. But thinking about what we’re waiting for has been fruitful for me, at the macro and micro level.</li>\n" +
"</ol>",
@entry_id="https://www.dbreunig.com/2020/02/03/what-are-we-waiting-for",
@links=["https://www.dbreunig.com/2020/02/03/what-are-we-waiting-for.html"],
@published=2020-02-03 00:00:00 UTC,
@raw_title="What are We Waiting For?",
@summary="Painting by Carl Grossberg",
@title_type="html",
@updated=2020-02-03 00:00:00 UTC,
@url="https://www.dbreunig.com/2020/02/03/what-are-we-waiting-for.html">,
#<Feedjira::Parser::AtomEntry:0x00007f8c09802040
@author="Drew Breunig",
@content=
"<p><img src=\"/img/1*c0dseyBJ5WcN43KVgiCIkA.png\" alt=\"One of these may have helped take down Twitter.\" /></p>\n" +
"\n" +
"<h4 id=\"explaining-the-ddos-problem--its-origins-as-simply-as-i-can\">Explaining the DDoS Problem & its Origins, As Simply As I Can</h4>\n" +
"\n" +
"<p>You may have noticed the <a href=\"https://hackernoon.com/tagged/internet\">Internet</a> is kind of broken today. I think this event will become a Big Deal, potentially remembered as the denial of service attack which led to regulation and more.</p>\n" +
"\n" +
"<p>Discussing it via iMessage, a friend who doesn’t follow Internet <a href=\"https://hackernoon.com/tagged/security\">security</a> (like, you know, normal people) asked me to explain the above in English. I don’t have time to polish it, but I could see this being helpful to others. So here’s roughly what I write:</p>\n" +
"\n" +
"<ol>\n" +
" <li>The iPhone launched, Android followed and smartphones became <em>the</em> business. In under 10 years, roughly 30% of the world now has a smartphone. The amount of money being generated to make little devices with cameras, CPUs, WiFi, gyroscopes, touchscreens, batteries, and more boggles the mind.</li>\n" +
" <li>The demand for smartphones drove the costs of their components <em>wayyy</em> down. There are massive buildings in China filled with bins of components, which designers can pick up on the cheap and design a new toy. This is why almost all new device categories are basically reassembled smartphone guts with a twist: action cameras, fitness trackers, drones, hover boards, webcams, and more. They’re all plentiful and affordable because smartphones are paying for their parts en masse.</li>\n" +
" <li>Designers, engineers, and factories in China iterate on these products constantly. <a href=\"https://www.wired.com/2015/06/the-weird-story-of-the-viral-chinese-scooter-phunkeeduck-io-hawk/\">It’s an organic environment that plays with hardware the way Silicon Valley plays with software</a>. When one conceives, designs, produces, and ships hardware fast, on shoestring budgets, one doesn’t usually take the time to test, secure, or QA much of anything. See: exploding hoverboards. This is another way you can keep costs cheap.</li>\n" +
" <li>Now there are hundreds of millions of these cheap devices all over the world which are comically undersecured and connected to the Internet. And none of the companies which shipped than want to (or can) fix them. (For many of them, the insecurities are baked into the firmware.)</li>\n" +
" <li>Botnet makers have not missed the opportunity. They’ve developed simple hacks to remotely control millions of devices–especially webcams and routers. These devices are the perfect for botnets because they’re online all the time and widely adopted.</li>\n" +
" <li>Recently, the attacks from these botnets have started to reach critical limits, producing distributed denial of service attacks (wherein tons of devices all hit the same website or service as fast as they can, overwhelming it) that are factors larger than attacks they’ve seen in the past.</li>\n" +
" <li>This leap in strength has disrupted the tit-for-tat exchange between attackers and security firms. Security god Bruce Schneier (who I’ve been cribbing from for these last few bullets. <a href=\"http://motherboard.vice.com/read/we-need-to-save-the-internet-from-the-internet-of-things\">Read him</a>!) writes, “Basically, it’s a size vs. size game. If the attackers can cobble together a fire hose of data bigger than the defender’s capability to cope with, they win. If the defenders can increase their capability in the face of attack, they win.” The massive output of the factories, subsidized by smartphone parts, connected to a global marketplace of unaware buyers has disrupted this balance in favor of the attackers.</li>\n" +
" <li>And it’s already creating problems as a tool against less than desirable media. Security journalist Brian Krebs’ site was hit with an attack so large Akamai (one of <em>the</em> cloud hosting providers) <a href=\"http://www.businessinsider.com/akamai-brian-krebs-ddos-attack-2016-9?op=1\">kicked him off their servers</a>. Krebs was saved by Google, who stepped in to protect Krebs free of charge. (Worryingly, <a href=\"http://www.businessinsider.com/akamai-brian-krebs-ddos-attack-2016-9?op=1\">Krebs’ site is completely down at the moment</a>, likely connected to the current attack.)</li>\n" +
" <li>These weapons aren’t just disruptingly powerful, they’re relatively easily available. It’s distressing to consider the scale that could be produced by state-sponsored groups. Imagine that this Internet outage hit more than the East Coast. Imagine this outage hit on Election Day. (If there is a bright side to this attack it is that it establishes some public awareness of the issue prior to it being something which could be politically construed.)</li>\n" +
" <li>This attack is significant enough and targets the right people (the media in NYC and politicians in DC) to garner attention from government. Because, sadly, it appears significant regulation–both requiring basic security features for devices and granting ISPs blacklisting power–is the only way to solve this challenge. Regulating who and what can connect to the Internet appears to be necessary but sets a worrying prescident.</li>\n" +
" <li>In short: access to the Internet and the availability of specific sites is super vulnerable, there’s no easy fix, and the solution is bad.\n" +
"There is a certain irony in Twitter being taken down by webcams, though.</li>\n" +
"</ol>\n" +
"\n" +
"<p>So who’s doing this? Well the most reasonable–and frightening–theory comes again from Bruce Schneier: <a href=\"https://www.schneier.com/blog/archives/2016/09/someone_is_lear.html\">a state-sponsored group is testing the Internet to learn how to take it down</a>. Krebs (whose site is back up!), <a href=\"https://krebsonsecurity.com/2016/10/spreading-the-ddos-disease-and-selling-the-cure/\">thinks someone is creating the disease to sell the cure</a>. Both are possible!</p>\n" +
"\n" +
"<p>(There are plenty of things I’d like to link to above but cannot now, due to the outage. Apologies)</p>\n" +
"\n" +
"<p><strong>Update:</strong> <a href=\"https://krebsonsecurity.com/2016/10/ddos-on-dyn-impacts-twitter-spotify-reddit/#more-36727\">it really looks like this is a vendetta attack</a> against Krebs and a researcher at Dyn, the company which is the main target of the attack. The attack kicked off hours after the two of them delivered a presentation highlighting one party’s bad practices.</p>\n" +
"\n" +
"<p><strong>Update, November 8th, 2016:</strong> It appears we’ve been saved in an unexpected– yet totally <em>fitting</em>–way. The week of the US elections, hackers pointed the traffic cannon at the websites of both Hillary Clinton and Donald Trump, but failed to take out either. Why? Well, it turns out this particular botnet is too easy to access. Following the October 21st attack, many parties signed up the same insecure devices to do their bidding. Specifically, they wanted the botnet to attack various game hosting servers. With tons of users directing the same traffic cannon at their own pet targets, the effect of the weapon became severely dilluted.</p>\n" +
"\n" +
"<p>In a nutshell: the Internet lives another day because too many people wanted to grief Counter-Strike and Minecraft servers. You couldn’t script a more appropriate twist.</p>",
@entry_id=
"https://www.dbreunig.com/2016/10/21/how-the-iphone-led-to-today-s-internet-outage",
@links=
["https://www.dbreunig.com/2016/10/21/how-the-iphone-led-to-today-s-internet-outage.html"],
@published=2016-10-21 00:00:00 UTC,
@raw_title="How the iPhone Led to Today’s Internet Outage",
@summary="",
@title_type="html",
@updated=2016-10-21 00:00:00 UTC,
@url=
"https://www.dbreunig.com/2016/10/21/how-the-iphone-led-to-today-s-internet-outage.html">,
#<Feedjira::Parser::AtomEntry:0x00007f8c0e70bee8
@author="Drew Breunig",
@content=
"<p><img src=\"/img/1*a5IOD4PmyaQIJvv4HtbFxA.png\" alt=\"\" /><a href=\"https://www.instagram.com/richkidsofinstagram/\" class=\"image-caption\">Via Rich Kids of Instagram</a></p>\n" +
"\n" +
"<h4 id=\"participatory-mass-media-changes-the-equation\">Participatory Mass Media Changes the Equation</h4>\n" +
"\n" +
"<p>You should really read <a href=\"http://www.economist.com/news/briefing/21708216-americas-president-writes-us-about-four-crucial-areas-unfinished-business-economic\">Obama’s essay in The Economist</a>. In it, he discusses the rise of populism and anti-globalization, the importance of economic mobility, the growing divide between rich and poor, and how we might address these challenges.</p>\n" +
"\n" +
"<p>Midway in, Obama makes a quick point on how digital media affects one’s perceived social status (emphasis mine):</p>\n" +
"\n" +
"<blockquote>\n" +
" <p>More fundamentally, a capitalism shaped by the few and unaccountable to the many is a threat to all. Economies are more successful when we close the gap between rich and poor and growth is broadly based. A world in which 1% of humanity controls as much wealth as the other 99% will never be stable. <strong>Gaps between rich and poor are not new but just as the child in a slum can see the skyscraper nearby, technology allows anyone with a smartphone to see how the most privileged live.</strong> Expectations rise faster than governments can deliver and a pervasive sense of injustice undermines peoples’ faith in the system. Without trust, capitalism and markets cannot continue to deliver the gains they have delivered in the past centuries.The argument that media exposes the underclass to a good life beyond their reach is not new. In his book, <a href=\"http://amzn.to/2dUTN3O\">In Spite of the Gods</a>, Edward Luce argues that access to television exacerbated unrest with regard to the economic divide in India:</p>\n" +
"</blockquote>\n" +
"\n" +
"<blockquote>\n" +
" <p>What today’s villagers and small-town dwellers in India see seductively paraded before them as they crowd around their nearest TV screens are things most of them have little chance of getting in the near future: the cars, foreign holidays, smart medical services, and electronic gadgets that dominate the TV commercials. Most of these products are not meant for them at all but are targeted at — and often by — people like Alok. Such items are beyond the reach of the majority in a country where the average per capita income in 2006 was still below $1,000. Sooner or later if you are unable to get what you are repeatedly told you should want, something has to give.Yes, Obama is correct that smartphones and social media better expose us to the lives of the rich and famous, when compared to TV. But it goes beyond more access. Digital media’s effect on our attitude regarding socioeconomic standing is doubly problematic because we share our lives in the same forum as the privileged.</p>\n" +
"</blockquote>\n" +
"\n" +
"<p><img src=\"/img/1*DkRVLcTOX1xPioFaoZYg1A.png\" alt=\"On your TV\" /></p>\n" +
"\n" +
"<p>To audiences in the 80’s, <em>Lifestyles of the Rich and Famous</em> was practically on another planet. Robin Leach performed a function similar to David Attenborough’s: both hosts guided viewers through an environment most would never witness. Then we turned off the TV, and went back to our real world.</p>\n" +
"\n" +
"<p><img src=\"/img/1*smvVsAiEmD6m8LG95oiEug.png\" alt=\"In your stream\" /></p>\n" +
"\n" +
"<p>But with Instagram, Snapchat, and Facebook our lives, the lives of our friends, and the lives of the famous mix in the same feed. We have the same accounts, the same phones, the same cameras. Not only is there nothing to turn off to return to our real world, we must actively consider how to present our world into a feed populated with those more privileged.</p>\n" +
"\n" +
"<p>This active editing, they way we’ve learned to brand ourselves and our lives, has a nasty knock-on effect: the people we know who carefully edit start to appear more privileged. I have friends of similar socioeconomic status who seem to vaction more than Leach’s subjects. Most of our feeds look this way.</p>\n" +
"\n" +
"<p>This change — from the lavish life on TV to lavish life on your stream––will get to many. It fundamentally changes how we <a href=\"https://en.wikipedia.org/wiki/Keeping_up_with_the_Joneses\">keep up with the Joneses</a>. In the TV era, when I wasn’t in the part of the medium, one just needed a nice lawn and adequate car. Today, if we need to feed the stream, we need better vacations, better reservations, better fashion, and free time to find diverse experiences. All at a constant pace.</p>\n" +
"\n" +
"<p>Unquestionably, digital media is a massive net good for the world. (Even that feels like an understatement) But its effects on perceived inequality are different than any we’ve experienced. This is a wholly new challenge. One which requires new cultural tools.</p>",
@entry_id=
"https://www.dbreunig.com/2016/10/08/keeping-up-with-the-joneses--today",
@links=
["https://www.dbreunig.com/2016/10/08/keeping-up-with-the-joneses-today.html"],
@published=2016-10-08 00:00:00 UTC,
@raw_title="Keeping Up with the Joneses, Today",
@summary="Via Rich Kids of Instagram",
@title_type="html",
@updated=2016-10-08 00:00:00 UTC,
@url=
"https://www.dbreunig.com/2016/10/08/keeping-up-with-the-joneses-today.html">,
#<Feedjira::Parser::AtomEntry:0x00007f8c0e70dcc0
@author="Drew Breunig",
@content=
"<p><img src=\"/img/1*bVGPuLFrOhHN8ig7xSgrbw.jpeg\" alt=\"\" /></p>\n" +
"\n" +
"<h4 id=\"does-free-speech-include-distribution\">Does ‘Free Speech’ Include Distribution?</h4>\n" +
"\n" +
"<p>Yes, it’s a goofy dream. Yes, Congress won’t let them stop Saturday delivery, let alone spend $30 billion on a wobbly and <em>weird</em> social network. Yes, this will never happen. Yes, $30 billion could buy 90 <a href=\"https://warisboring.com/how-much-does-an-f-35-actually-cost-21f95d239398#.2mkhzsxb8\">F-35s</a> instead.</p>\n" +
"\n" +
"<p>But: I can’t get this idea out of my head. My mind stumbles on it every other commute. Every news item about Twitter’s sale spurs the notion. <a href=\"http://www.recode.net/2016/10/6/13183378/recode-daily-google-disney-not-bidding-on-twitter\">Google and Disney are walking away</a> leaving only Salesforce, but oh: <a href=\"https://techcrunch.com/2016/10/03/to-beef-up-in-marketing-tech-salesforce-is-buying-krux-for-340m-in-cash-and-stock-potentially-750m-overall/\">they just bought Krux</a>. Maybe there won’t be a suitor. Their market cap is down to less than $15 billion on the news. Hmm, that’s only <em>44</em> F-35s…</p>\n" +
"\n" +
"<p>Ok. This won’t happen. But the idea is so natural to me, so easily raised, that I feel compelled to share it with you. Perhaps doing so will expunge it from my head.</p>\n" +
"\n" +
"<p>Here is why I keep dreaming about Twitter being bought by the USPS:</p>\n" +
"\n" +
"<ul>\n" +
" <li>It is becoming increasingly apparent that we, the people of the United States, expect Freedom of Speech to not only protect the articulation of one’s ideas and opinions, but the distribution of these notions as well.</li>\n" +
" <li>This expectation is shared by various political groups in the United States, from left and right, in various contexts.</li>\n" +
" <li>When Reddit banned it’s most reprehensible channels, protestors cried, “Freedom of speech!” Supporters of the move chastised the protesters for expecting that a private corporation should be required to host their content.</li>\n" +
" <li>When Facebook suffers a “<a href=\"http://gizmodo.com/facebook-stands-by-technical-glitch-claim-says-cop-did-1783349993\">technical glitch</a>” and removes live broadcasts of police confrontations or protests, we wring our hands about freedom of speech again. When it’s not clear how trending news is determined, representatives from both parties demand clarification. All this happens despite the fact that Facebook is another private corporation which can host, or <a href=\"http://www.nytimes.com/2016/09/10/technology/facebook-vietnam-war-photo-nudity.html?_r=0\">not host</a>, whatever it wants.</li>\n" +
" <li>Hell, people were shouting about freedom of speech when that dude from Duck Dynasty was kicked off a TV show.</li>\n" +
" <li>The expectation that freedom of speech now covers some baseline of distribution is widely held among the US population. Pundits, congress people, journalists, and average citizens have come to expect information to be distributed to some degree.</li>\n" +
" <li>If we wish to fulfill this expectation, we have two options for ensuring this new concept of freedom of speech is guaranteed: we can make a deal with a corporation or develop the government’s ability to meet this need. Let’s look at an example for each.</li>\n" +
" <li>The best example for corporate negotiated information openness is AT&T. In 1913, AT&T struck a deal with the US government which ensured AT&T would not be pursued as a monopolist, so long as AT&T allowed competiting telephone companies to interconnect with their long-distance network. This, certainly, is a gross over-simplification, but everyone should just go read Tim Wu’s <a href=\"http://amzn.to/2dPZG2j\">The Master Switch</a>, which covers all of this, in context, beautifully:</li>\n" +
"</ul>\n" +
"\n" +
"<blockquote>\n" +
" <p>We sometimes treat the information industries as if they were like any other enterprise, but they are not, for their structure determines who gets heard.</p>\n" +
"</blockquote>\n" +
"\n" +
"<p>An AT&T-type deal, the dream of an idealistic monopoly, has happened many times (and a few times before). It is the default. It is what will likely happen with Facebook, Google, or whomever wins. Seriously,<a href=\"http://amzn.to/2dPZG2j\"> just go read Wu</a>.</p>\n" +
"\n" +
"<ul>\n" +
" <li>But there is another model for information distribution: the government can take up the task. This is why we created the United States Post Office. The little discussed origins of the USPS was covered recently by <a href=\"http://amzn.to/2dQ0og0\">Winifred Gallagher</a> (<a href=\"http://www.wnyc.org/story/how-post-office-created-america/\">here is an interview with a summary of her recounting</a>).</li>\n" +
" <li>The USPS was instigated by our founding fathers during the 2nd Continental Congress. Ben Franklin was the first postmaster general. It is explictely authorized in the Constitution. Gallagher shows in her work that the motivation for it’s creation was to unite 13 different colonies, tying the nation together. Information, it was held, was crucial to the functioning of a democracy. If information could not be distributed, the new nation could not function.</li>\n" +
" <li>Both the corporate deal model and the government provided model have their pros and cons. A strong argument could be made for both. What worries me, what keeps this line of thinking alive in my brain, is that we aren’t having this discussion. We yell and protest when we feel some bit of speech <em>must</em> be distributed by Facebook, Twitter, Reddit, Google, or whomever. But we never stop and think about <em>why *and</em> if* we should be relying on private companies to guarantee the distribution of our speech.</li>\n" +
" <li>True, there are exchanges about net neutrality (coined by our good friend, Tim Wu) and we’ve had some victories, but we’ve had nothing approaching national discussion.</li>\n" +
" <li>If we do not discuss this, the choice is made for us: a deal with a corporation will be struck. We will ignore these issues while Facebook and Google continue to grow until one day we realize they’re too big and important to change or break up. Then, a deal will be cut.</li>\n" +
" <li>(Oh look: <a href=\"https://www.washingtonpost.com/news/the-switch/wp/2016/10/06/facebook-is-talking-to-the-white-house-about-giving-you-free-internet-heres-why-that-may-be-controversial/\">Facebook is working on bringing their own version of the Internet to rural and low income users for free</a>.)</li>\n" +
" <li>Or: Twitter could flounder. No suitor could emerge while their market cap dwindles. At some point their price could become within the reach of a government take-over. Then digital distribution — for text, photos, video, conversations, and more — could be a public good, maintained by the public. Facebook could block whatever it wants. Google could remove search results and the stakes would be lower. The USPS could continue their mission of ensuring equal access to the distribution of information to the nation (should they care). And finally, our contemporary expectations of freedom of speech could be reflected by government guarentees.\n" +
"Yes, this is absurd. But it’s fun to think about. How would Twitter change under the USPS, assuming they had the resources to change Twitter? Real identities would need to be present, but not public. One could change their public name as one obtains a PO Box. Advertising would still be there, subsidizing development. The direct mail industry could move online, seamlessly shifting it’s household address databases to Twitter IDs for audience targeting. What new businesses would be unexpectedly fueled by this machine?</li>\n" +
"</ul>\n" +
"\n" +
"<p>It’s a goofy idea, but one that spurs thinking.</p>\n" +
"\n" +
"<p>Mock it if you like, but it’s much less boring than waiting quietly to strike a deal with Facebook.</p>",
@entry_id=
"https://www.dbreunig.com/2016/10/07/i-dream-of-the-post-office-buying-twitter",
@links=
["https://www.dbreunig.com/2016/10/07/i-dream-of-the-post-office-buying-twitter.html"],
@published=2016-10-07 00:00:00 UTC,
@raw_title="I Dream of the Post Office Buying Twitter",
@summary="",
@title_type="html",
@updated=2016-10-07 00:00:00 UTC,
@url=
"https://www.dbreunig.com/2016/10/07/i-dream-of-the-post-office-buying-twitter.html">,
#<Feedjira::Parser::AtomEntry:0x00007f8c0e710c90
@author="Drew Breunig",
@content=
"<p><img src=\"/img/1*8SrauHQD5aBmPFmPFJQRwA.png\" alt=\"\" /></p>\n" +
"\n" +
"<h4 id=\"and-theyll-succeed-where-google-stumbled\">…and they’ll succeed where Google stumbled</h4>\n" +
"\n" +
"<p><a href=\"https://hackernoon.com/tagged/snapchat\">Snapchat</a> announced Spectacles, their first hardware product, last Friday evening. Despite being the first hardware effort from our* *current social darlings, Evan Spiegel and company seem to be playing down the product. In <a href=\"http://www.wsj.com/articles/snapchat-releases-first-hardware-product-spectacles-1474682719\">a Wall Street Journal feature</a> on Snap, only a handful of paragraphs touch on Spectacles. One section sandbags the entire effort:</p>\n" +
"\n" +
"<blockquote>\n" +
" <p>For the moment, Spectacles appears to be a bit of a lark. At a price of $129.99 and with limited distribution, it won’t be relied upon for significant immediate revenue. Spiegel refers to it as a toy, to be worn for kicks at a barbecue or an outdoor concert — Spectacles video syncs wirelessly to a smartphone, making it easily shareable. “We’re going to take a slow approach to rolling them out,” says Spiegel. “It’s about us figuring out if it fits into people’s lives and seeing how they like it.”</p>\n" +
"</blockquote>\n" +
"\n" +
"<p>If you ignore the Karl Lagerfeld portraits of Spiegel, the whole affair is rather low key.</p>\n" +
"\n" +
"<p><img src=\"/img/1*Ee5aGGnMAt5iv6i5pOeEqg.png\" alt=\"\" />And who can blame them? Launching digital glasses anywhere in California forces comparison’s to Google Glass, a device whose <a href=\"https://www.youtube.com/watch?v=9c6W4CCU9M4\">massive, life changing promises</a> <a href=\"http://www.nytimes.com/2015/02/05/style/why-google-glass-broke.html\">were never delivered</a>. Glass’ launch video was an <a href=\"https://www.youtube.com/watch?v=9c6W4CCU9M4\">augmented reality what-if</a> followed by an <a href=\"https://www.youtube.com/watch?v=D7TB8b2t3QE#t=73\">initial live demo by skydivers</a>, then <a href=\"http://www.wsj.com/video/google-glasses-at-diane-von-furstenberg-show/738BCFD3-F507-4225-855E-D7CFA2656A7F.html\">a Diane Von Furstenburg catwalk appearance</a>.</p>\n" +
"\n" +
"<p>Compared to Glass as promised by Google, Spiegel’s “toy” for a “barbeque or an outdoor concert” is rather mundane.</p>\n" +
"\n" +
"<p>But that’s OK.</p>\n" +
"\n" +
"<p>I think Spectacles could not only succeed, but turn out to be a watershed product for Snap. Here’s why:</p>\n" +
"\n" +
"<p><img src=\"/img/1*VEAHDMdZDzHK2SS4tLNMQQ.jpeg\" alt=\"\" /></p>\n" +
"\n" +
"<h4 id=\"spectacles-side-step-glass-cultural-challenges\">Spectacles Side-Step Glass’ Cultural Challenges</h4>\n" +
"\n" +
"<p>Despite their high technical goals, Glass’ biggest challenge was cultural. Google focused so closely on the users of Glass that they forgot everyone around them, leading to the <a href=\"https://hackernoon.com/tagged/design\">design</a> of an anti-social product. I cover this in detail <a href=\"http://dbreunig.tumblr.com/post/45752835306/google-glass-is-just-like-the-segway-and\">here</a>, but in a nutshell:</p>\n" +
"\n" +
"<blockquote>\n" +
" <p>Glass’ screen is visible only to it’s user and it’s camera looks out documenting everything except the user, storing content to be shared at the user’s discretion. I believe that these always on, core functions of Glass will prevent it from being welcomed in social settings. Those around the Glass users must implicitely trust the Glass wearer, for they have no idea where the Glass user’s current attention lies and cannot visually confirm whether or not they are currently being captured by Glass.</p>\n" +
"</blockquote>\n" +
"\n" +
"<blockquote>\n" +
" <p><strong>Google is working so hard to keep technology out of the way that they’re forgetting why it’s important to <em>see</em>technology when it’s present</strong>.This challenge was clearly considered by the team at Snap, because they made several choices which address it directly. They added a bright light to show others you’re recording. They limited video to 10 second clips, not persistent documentation. And, most importantly, they made Spectacles sunglasses.</p>\n" +
"</blockquote>\n" +
"\n" +
"<p>Sunglasses are worn occasionally. Sunglasses aren’t worn inside. Everyone wears sunglasses. Sunglasses make sense during public, fun times. Wear sunglasses inside and you’re considered “that guy,” no onboard camera required. Choosing sunglasses specifically tightly confines Spectacles’ use cases to social acceptable contexts. Spectacles work within cultural norms rather than attempt to redefine them.</p>\n" +
"\n" +
"<p><img src=\"/img/1*9kXJtGRHC72-8l9nehXL2Q.png\" alt=\"\" /></p>\n" +
"\n" +