-
Notifications
You must be signed in to change notification settings - Fork 76
/
02-SimpleWorld.Rmd
609 lines (499 loc) · 26.1 KB
/
02-SimpleWorld.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
# SimpleWorld: A worked example of spatial microsimulation {#SimpleWorld}
This chapter focuses on the minimum input datasets needed for the classical type
of microsimulation. We will use input data on
the inhabitants of an imaginary world
(geographical individual level) called SimpleWorld to demonstrate
the basic concepts and techniques to perform spatial microsimulation with R.
This is the
first practical chapter and it aims to acquaint you with language R
and the R development program RStudio.
Fluency with R and RStudio, in combination with well-organised project
management and clear, commented code, will enable fast and efficient work.
In short, this chapter aims to teach good 'workflow' in preparation for navigating
the subsequent chapters.
A secondary aim is to illustrate key
concepts in spatial microsimulation with reference to a practical example, called
'SimpleWorld' for its simplicity.
```{r, echo=FALSE}
# TODO: do we talk enough about RStudio projects?
```
The chapter also serves to highlight the links between the methodology
presented in-depth later in the book (see Chapter 5)
and the various real-world applications presented in Chapter 4.
For R 'newbies' it also provides a chance
for the reader to do some basic programming in R:
the unwritten
subtitle of this book is "a *practical* introduction" for a reason!
```{r, echo=FALSE}
# TODO (RL): Update the above if we get a subtitle, mention in preface
```
\pagebreak
This chapter is ordered as follows:
- *Getting setup with the RStudio environment* (\@ref(rstudioUpSpeed)) contains
the basic explanation of RStudio to enable you to code the examples of this
book on your own computer.
- *SimpleWorld data* (\@ref(SimpleWorldData)) describes the context and the
data of this little illustrative example.
- *Generating a weight matrix* (\@ref(weight-matrix)) contains the code and the description of the role of a weight matrix in a spatial microsimulation.
-*Spatial microdata* (\@ref(spatial-microdata)) shows the typically useful output
of a spatial microsimulation.
-*SimpleWorld in context* (\@ref(SimpleWorldContext)) mentions the kind of
context where the output could be used.
## Getting setup with the RStudio environment {#rstudioUpSpeed}
Before progressing further, it is important to ensure that R is setup correctly
and working on your computer, so you can follow the practical examples.
This section will not go into much detail. Further resources are provided in
the Appendix at the end of the book to provide insight into how R
works as a programming language.
The majority of the help that you need to install and setup R on your computer
can be found online, so it is recommended that you consult this section with
reference to online searches and help pages. As with all open source software
projects R evolves over time so it is important to keep up-to-date with
any changes, which may not be accounted for in this book.
### Installing R
The practical examples in this book require a recent version of R to be installed
on your computer. This will allow you to type test the code as you learn the methods.
To install R, we recommend you refer to documentation provided
online, on the Comprehensive R Archive Network (CRAN):
- For Windows, see https://cran.r-project.org/bin/windows/base/
- For Mac, see https://cran.r-project.org/bin/macosx/
- For Linux, see https://cran.r-project.org/doc/manuals/r-release/R-admin.html
```{r, echo=FALSE}
# The linked way:
# - For Windows, see [cran.r-project.org/bin/windows/base](https://cran.r-project.org/bin/windows/base/)
# - For Mac, see [cran.r-project.org/bin/macosx/](https://cran.r-project.org/bin/macosx/)
# - For Linux, see [cran.r-project.org/doc/manuals/r-release/R-admin.html](https://cran.r-project.org/doc/manuals/r-release/R-admin.html)
```
On Debian-based systems such as Ubuntu, the following bash command should be sufficient to install R.
```
sudo apt-get install r-base
```
Once R is installed, new packages are easy to install from within R using
`install.packages("package_name")`. To install the **mipfp** package, for example,
we would use:
```{r, eval=FALSE}
install.packages("mipfp")
```
### RStudio
RStudio is an Integrated Development Environment (IDE) for R.
It provides an advance Graphical User Interface (GUI) that makes it easier not only to write R code, but to install packages, manage R's graphical outputs and organise complex projects.
All R code presented in this book will run in other environments, such as Linux's bash shell, in which R can be initiated by the following command:
```{r, engine='bash', eval=FALSE}
$ R # enter the R command line
```
It is highly recommended that you use RStudio for the work presented in this book, however, especially if you an R beginner.
RStudio will enable you to spend less time worrying about how to write R code correctly and more time focussed on spatial microsimulation.
One example of how RStudio saves time is its tab autocompletion functionality.
To test this functionality, type the beginning of any R object or function stored in R's environment (such as `plo`, short for `plot`) and
hit the `Tab` button. The full name of various options
should auto-complete, as illustrated in Figure 2.1.
Pressing the down arrow allows the user to select the correct input.
Auto-completion allows fast typing, easy recollection of function/object names.
It also prevents errors cause by typos in your code.
```{r rstudio, fig.cap="Autocompletion in RStudio", fig.scap="Autocompletion in RStudio", echo=FALSE, message=FALSE}
# todo: create tutorial on installing R and RStudio
grid.raster(readPNG("figures/rstudio-autocomplete.png"))
```
To test this functionality, download and install RStudio if you have not already.
The installation process is simple on Linux, Mac and Windows platforms:
simply follow the instructions provided here at rstudio.com/products/rstudio/download/.
There are many other advantages of using
RStudio.^[For
more information about RStudio, please see the RStudio website:
https://www.rstudio.com/products/rstudio/features/ and search for
other on-line resources.
There is even an entire book dedicated to the subject
[@verzani2011getting].
]
The next section focuses on one particular advantage that we recommend you
make use of whilst working through the book: RStudio projects.
### Projects
RStudio provides a neat system for creating, managing and even sharing your projects.
This functionality merits a section of its own, because spatial microsimulation
projects are likely to be large and complex, requiring collaboration with
other people. If you are not careful the projects can become over-complex
and unmanageable. RStudio projects can prevent this from happening and
lead to improved workflow.
To setup a new project click on the drop-down menu in the top-right
section of RStudio. If you are working on a project, the name of the
project will appear in this dropdown menu (see Figure 2.2).
When a project is loaded, the following things happen:
- The files open in the script panel (top left) last time you worked on the project will appear.
- R's working directory (set and checked with `setwd()` and `getwd()`, respectively) will change.
- Any R objects saved in the `.RData` file in the project will be automatically loaded, saving your work.
- If you are using `git` to manage your project's development and sharing, you'll have options to commit code and 'push' new work, e.g. to GitHub.
In fact, this entire book was written as an RStudio project.
This allowed us to work from the same system and share code easily, by pushing our
code frequently to the book's online repository: https://github.com/Robinlovelace/spatial-microsim-book .
This online repository will allow you to access all of the example code and data for the book.
It will also allow you to contribute to the book as an evolving project.
\pagebreak
The next section explains how to download and use the data stored in the `spatial-microsim-book`
GitHub repository to access the code and data resources associated with the book.
### Downloading data for the book
To simply download the code and data, associated with this book, first navigate to
https://github.com/Robinlovelace/spatial-microsim-book in your browser. From this page, click on the
'Download ZIP' button to the right and extract the folder into a
sensible place on your computer, such as the
Desktop.^[An
alternative way to get the project files onto your computer is to 'clone' the repository.
For further details on how to fork, clone
and potentially contribute to the book project, see the GitHub website, or
refer to the growing literature how GitHub can be used in research [@lima_coding_2014].
]
With the files safely stored on the computer, the next step is to
open the folder as an RStudio project. To do this, open the folder in a file browser
and double click on `spatial-microsim-book.Rproj`.
Alternatively, start RStudio and open the file by clicking on 'Open Project'
from the dropdown menu above the top-right panel,
or by clicking with File > Open.
This will cause RStudio to
open the project. All the input data files should now be easily accessible to R
through *relative file paths*. The files should also be visible in the Files tab in
bottom right panel.
\index{RStudio}
```{r studio, fig.cap="The RStudio interface with the spatial-microsim-book project loaded and objects loaded in the Environment after running SimpleWorld.R",fig.scap="The RStudio interface",fig.pos='!h', echo=FALSE, message=FALSE}
# todo: create tutorial on installing R and RStudio
grid.raster(readPNG("figures/rstudio-environment.png"))
```
To check if the project has been downloaded and opened correctly,
try running the following code, by typing it and hitting `Ctl-Enter`:
```{r, message=FALSE}
source("code/SimpleWorld.R")
```
You should see some output in red, beginning `Attaching package: ‘dplyr’`.
Some new objects should also appear in the Environment tab in the top-right panel (Figure 2.2).
This is the result of running the code located in a script file called `SimpleWorld.R`, using the `source()` function.
Now that you have an understanding of RStudio and how to load the book project, it's time to
'get your hands dirty' and run some code!
In the subsequent section we will type and run some of the contents of the
`SimpleWorld.R` script line by line.
## SimpleWorld data {#SimpleWorldData}
SimpleWorld is a small world, consisting of 33 persons split across 3 zones, as illustrated in Figure 2.3.
We have two sources of information about these people available to us:
1. aggregate counts of persons by age and sex in each zone (from the SimpleWorld Census)
2. survey microdata recording more detailed information (age, sex and income), for five of the world’s residents.
```{r simpleworld, fig.cap="The SimpleWorld environment, consisting of individuals living in 3 zones.", fig.scap="The SimpleWorld environment", fig.pos='!h', echo=FALSE, message=FALSE, fig.width=4}
# Code to create SimpleWorld
# Builds on this vignette: http://cran.r-project.org/web/packages/sp/vignettes/over.pdf
library(sp)
library(ggplot2)
xpol <- c(-180, -60, -60, -180, -180)
ypol <- c(-70, -70, 70, 70, -70)
pol = SpatialPolygons(list(
Polygons(list(Polygon(cbind(xpol, ypol))), ID="x1"),
Polygons(list(Polygon(cbind(xpol + 120, ypol))), ID="x2"),
Polygons(list(Polygon(cbind(xpol + 240, ypol))), ID="x3")
))
# plot(pol)
proj4string(pol) <- CRS("+init=epsg:4326")
pol1 <- fortify(pol)
theme_space_map <- theme_bw() +
theme(
# rect = element_blank(),
axis.title = element_blank(),
axis.text = element_blank(),
axis.ticks = element_blank(),
panel.grid.major = element_line(size = 3)
)
#
# ggplot(pol1) + geom_path(aes(long, lat, group, fill = group)) +
# coord_map("ortho", orientation=c(41, -74, 52)) +
# theme_space_map
b <- matrix(c(-180, 180, 180, -180, -180, 90, 90, -90, -90, 90), ncol = 2)
library(sp)
l <- Line(coords = b)
l <- Lines(list(l), ID = "a")
l <- SpatialLines(list(l))
library(rgeos)
p <- gPolygonize(l)
plot(p)
l1 <- Line(matrix(c(-60, -60, 90, -90), ncol = 2))
l1 <- Lines(list(l1), "a")
l1 <- SpatialLines(list(l1))
lines(l1)
l2 <- Line(matrix(c(60, 60, 90, -90), ncol = 2))
l2 <- Lines(list(l2), "a")
l2 <- SpatialLines(list(l2))
lines(l2)
zt <- paste0("Zone ", 1:3)
text(seq(-120, 120, length.out = 3), 0, zt)
```
Unfortunately the survey data lack geographical information, and include only
a small subset of the population (5 out of 33). To infer further information
about SimpleWorld --- and more importantly to be able to model its
inhabitants --- we need a methodology. This is precisely the kind of situation
where spatial microsimulation is useful. After an explanation of this
'starting simple' approach, we describe the input data in detail, demonstrate
what spatial microdata look like and, finally, place the example in its wider
context.
This SimpleWorld example is analogous to the
'cartoon world' used by professor @mackay2008sustainable to simplify the
complexities of sustainable energy. The same principle works here:
we will generate a synthetic population for the imaginary SimpleWorld to
illustrate how spatial microsimulation works and how it can be useful.
The SimpleWorld planet, a 2 dimensional
plane split into 3 zones, is illustrated in Figure 2.3.
SimpleWorld is inhabited by 12, 10 and 11 individuals of its alien inhabitants
in zones 1 to 3, respectively: a planetary population of 33.
From the SimpleWorld Census, we know
how many young (strictly under 50 space years old) and old (50 and over)
residents live in each
zone, as well their genders: male and female.
This information is displayed in the tables below.
Table: Aggregate level age counts for SimpleWorld.
|zone | 0-49 yrs| 50 + yrs|
|:--|-----:|-----:|
|1 | 8| 4|
|2 | 2| 8|
|3 | 7| 4|
Table: Aggregate sex counts for SimpleWorld.
|Zone | m| f|
|:--|--:|--:|
|1 | 6| 6|
|2 | 4| 6|
|3 | 3| 8|
The following commands load-in this data.
```{r}
# Load the constraint data
con_age <- read.csv("data/SimpleWorld/age.csv")
con_sex <- read.csv("data/SimpleWorld/sex.csv")
cons <- cbind(con_age, con_sex)
```
We recommend R beginners
interact with the R objects just loaded: try printing
them to screen, plotting them and even-subsetting them.
If you cannot, it may be worth consulting the Appendix
on using R, or referring to R's copious online documentation.
The data contained in the `cons` object presented below
refer to the characteristics of inhabitants in different SimpleWorld zones.
```{r fig = "Mercator maps of the zones in SimpleWorld", echo=FALSE, message=FALSE}
# library(knitr)
# kable(con_age, row.names = T)
# kable(con_sex, row.names = T)
pol <- SpatialPolygonsDataFrame(pol, cons, match.ID = F)
# pol@data
pol$p_young <- pol$a0.49 / (pol$a.50. + pol$a0.49) * 100
pol$p_male <- pol$m / (pol$f + pol$m) * 100
pol$id <- c("x1", "x2", "x3")
library(dplyr)
pol1 <- inner_join(pol1, pol@data)
pol1$Name <- paste("Zone", 1:3, sep = " ")
pol1$xpos <- seq(-120, 120, length.out = 3)
pol1$ypos <- 0
# ggplot(pol1) +
# geom_polygon(aes(long, lat, group, fill = p_young)) +
# geom_path(aes(long, lat, group, fill = p_young)) +
# geom_text(aes(xpos, ypos, label = Name)) +
# theme_bw() +
# scale_fill_continuous(low = "black", high = "white", limits = c(0, 100),
# name = "% Young") +
# coord_map()
#
# ggplot(pol1) +
# geom_polygon(aes(long, lat, group, fill = p_male)) +
# geom_path(aes(long, lat, group, fill = p_male)) +
# geom_text(aes(xpos, ypos, label = Name)) +
# theme_bw() +
# scale_fill_continuous(low = "black", high = "white", limits = c(0, 100),
# name = "% Male") +
# coord_map()
```
```{r}
cons # print the constraint data to screen
```
Each row represents a new zone (see also Tables 2.1 and 2.1).
The planet's entire
population is represented in the counts in these tables.
From these *constraint tables* we know the marginal distributions
of the two categorical variables, age and sex.
The data does not tell us about the contingency table (or cross-tabulation)
that links age and sex. This means that we have per-zone information on
the number of young and old people and the number of males and females.
But we currently lack information about the number
of young females, young males, and so on. Also note that we have no information
about other important variables such as income.
Spatial microsimulation can be used to better understand the
population of SimpleWorld. To do this, we need some additional information:
an individual level *microdataset*.
This is provided in an individuals level dataset on 5 of SimpleWorld's
inhabitants (Table 2.3).
Note that the individual level data has different dimensions than the
aggregate data presented above.
Table: Individual level survey data from SimpleWorld.
| id| age|sex | income|
|--:|---:|:---|------:|
| 1| 59|m | 2868|
| 2| 54|m | 2474|
| 3| 35|m | 2231|
| 4| 73|f | 3152|
| 5| 49|f | 2473|
The individual level *microdata* survey has one row per individual whereas
the *constraints* have one row per zone.
This individual level data includes two variables that *link* it to the
aggregate level data described above: age and sex. The individual level
data also provides information about a *target variable* not included
in the geographical constraints: income. To load the individual level data,
enter the following:
```{r}
ind <- read.csv("data/SimpleWorld/ind-full.csv")
```
```{r, echo=FALSE}
# ind <- read.csv("data/SimpleWorld/ind.csv")
# ind$income <- round(rnorm(n = nrow(ind), mean = 1000, sd = 100))
# ind$income <- ind$income + 30 * ind$age
# ind$income[ ind$age == "f"] <- ind$income + 1000
# write.csv(ind, "data/SimpleWorld/ind-full.csv", row.names = F)
# kable(ind)
```
Note that although the microdataset contains additional information about the
inhabitants of SimpleWorld, it lacks geographical information about where each
inhabitant lives or even which zone they are from. This is typical of
individual level survey data. Spatial microsimulation tackles this issue by
allocating individuals from a non-geographical dataset to geographical zones in
another.
## Generating a weight matrix {#weight-matrix}
The process to generate spatial microdata is allocating *weights*
to each individual for each zone. The higher the weight for a particular
individual-zone combination, the more representative the individual is of that
zone. This information can be represented as a *weight matrix*.
The subsequent code block uses the **mipfp** package to convert these
inputs into a matrix of weights. Just type it in and observe the
result. If it works, congratulations! You have generated your first
weight matrix using Iterative Proportional Fitting (IPF) (see code output). The
details of this process are described in subsequent chapters.
```{r, message=FALSE}
target <- list(as.matrix(con_age[1,]), as.matrix(con_sex[1,]))
descript <- list(1, 2)
ind$age_cat <- cut(ind$age, breaks = c(0, 50, 100))
seed <- table(ind[c("age_cat", "sex")])
library(mipfp) # install.packages("mipfp") is a prerequisite
res <- Ipfp(seed, descript, target)
res$x.hat # the result for the first zone
```
\pagebreak
Note that the output of the above command is only for the first zone.
It suggests that in zone 1, younger females are far more (nearly 4.5 times more)
numerous than would be expected based on the individual level data alone.
This makes sense because `con_age[1,]` contains many young people and `ind`
contains only one young female. At this stage you may be wondering: what just
happened? This is explained in detail in the subsequent chapters.
An alternative approach to arriving at a similar result, for all zones,
is demonstrated in the code chunk below using another package: **ipfp**.
The result of this code is illustated in Table 2.4.
Again, the reader is not expected to understand what has just happened.
This will be explained in subsequent chapters.
```{r}
A <- t(cbind(model.matrix(~ ind$age_cat - 1),
model.matrix(~ ind$sex - 1)[, c(2, 1)]))
cons <- apply(cons, 2, as.numeric) # convert to numeric data
library(ipfp) # install.packages("ipfp") is a prerequisite
weights <- apply(cons, 1, function(x) ipfp(x, A, x0 = rep(1, 5)))
weights[,1] # result for the first zone
```
Table: A 'weight matrix' linking the microdata (rows) to the zones (columns)
of SimpleWorld.
| Individual| Zone 1 | Zone 2 | Zone 3 |
|---:|----:|----:|-----:|
| 1| 1.23| 1.73| 0.725|
| 2| 1.23| 1.73| 0.725|
| 3| 3.54| 0.55| 1.550|
| 4| 1.54| 4.55| 2.550|
| 5| 4.46| 1.45| 5.450|
The highest value (5.450) is located, to use R's notation, in cell
`weights[5,3]`, the 5th row and 3rd column in the matrix `weights`. This means
that individual number 5 is considered to be highly representative of Zone 3,
given the input data in SimpleWorld. This makes sense because there are many
(7) young people and many (8) females in Zone 3, relative to the input
microdataset (which contains only 1 young female). The lowest value (0.55) is
found in cell `[3,2]`. Again this makes sense: individual 3 from the
microdataset is a young male yet there are only 2 young people and 4 males in
zone 2. A special feature of the weight matrix above is that each of the column
sums is equal to the total population in each zone.
Note that the first method gives weights to categories (for example young female),
whereas the second method gives weights to each individual. This is normal and
equivalent, we can easily transform one type of output in the other type. The details
are explained further.
We will discover different ways of generating such weight matrices
in subsequent chapters.
For now it is sufficient to know
that the matrices link individual level data to geographically aggregated
data and that there are multiple techniques to generate them. The techniques are sometimes
referred to as *reweighting algorithms* in the literature (Tanton et al. 2011).
These include deterministic methods such as Iterative Proportional
Fitting (IPF) and
probabilistic methods that rely on a pseudo-random number generator such as
simulated annealing (SA). These and other reweighting algorithms are described
in detail in Chapter 5.
## Spatial microdata {#spatial-microdata}
A useful output from spatial microsimulation is what we refer to as
*spatial microdata*. This is a dataset that contains a single row per individual
(as with the input microdata) but also an additional variable on geographical location.
The ideal spatial microdataset
selects individuals representative of the aggregate constraints for each zone,
while containing the diversity of information present in the individual level
non-spatial input population. In Chapter 4 we will explore all the steps needed
to produce a spatial microdataset for SimpleWorld. A subset of such spatial microdataset
is presented in Table 2.5, where each row represents an individual taken from the
individual level sample and the 'zone' column represents the
geographical zone in which they reside.
| id| zone| age|sex | income|
|--:|----:|---:|:---|------:|
| 1| 2| 59|m | 2868|
| 2| 2| 54|m | 2474|
| 4| 2| 73|f | 3152|
| 4| 2| 73|f | 3152|
| 4| 2| 73|f | 3152|
| 4| 2| 73|f | 3152|
| 5| 2| 49|f | 2473|
| 4| 2| 73|f | 3152|
| 5| 2| 49|f | 2473|
| 2| 2| 54|m | 2474|
Table: Spatial microdata generated for SimpleWorld zone 2.
Table 2.5 is a reasonable approximation of the inhabitants of zone 2: older
females dominate in both the aggregate (which contains 8 older people and 6
females) and the simulated spatial microdata (which contains 8 older people and
6 females). Note that in addition to the constraint variables, we also have
an estimate of the income distribution in SimpleWorld's second zone.
By the end of Chapter 5, you should learn how to generate
this table and understand each of the steps involved.
The remainder of this section
considers how the outputs of spatial microsimulation, in the context of
SimpleWorld, can be useful before progressing to the practicalities.
```{r, echo=FALSE}
# Add section link here!
```
## SimpleWorld in context {#SimpleWorldContext}
Even though the datasets are tiny in SimpleWorld, we have already generated some
useful output. We can estimate, for example, the average income in each zone.
Furthermore, we could create an estimate of the *distribution* of income in each
area. Although these estimates are unlikely to be very accurate due to the
paucity of data, the methods could be very useful if performed on larger
datasets from 'RealWorld' (planet Earth). The spatial microdata
presented in the above table could be used as an input into an agent-based model
(ABM). Assuming the inhabitants of SimpleWorld are more predictable than those
of RealWorld, the outputs from such a model could be very useful indeed, for
example for predicting future outcomes of current patterns of behaviour.
In addition to clarifying the advantages of spatial microsimulation, the above
example also flags some limitations of the methodology. Spatial microsimulation
will only yield useful results if the input datasets are representative of
the population as a whole, and for each region. If the relationship between age and
sex is markedly different in one zone compared with what we assume to be the
global averages of the input data, for example, our estimates could be way off
Using such a small sample, one could rightly argue, how could the diversity of
33 inhabitants of SimpleWorld be represented by our simulated spatial microdata?
This question is equally applicable to larger simulations. These issues are
important and will be tackled in
[](#svalidation).
## Chapter summary
In this chapter we have learned how to setup the R/RStudio environment for running
the example code provided in this book. This involved installing the software,
understanding projects in RStudio and setting up the `spatial-microsim-book`
project so that data can be accessed easily.
This was the first practical chapter and it involved basic commands for loading
data and creating a weight matrix. The process of running and playing with the
code should have helped get acquainted with the 'workflow' associated with
developing and running spatial microsimulation models in R.
Discussion of the results, which represent the inhabitants of the imaginary
system *SimpleWorld*, create the backdrop for the rest of the book, which
we will return to in Chapter 12.