Skip to content

Commit

Permalink
update session 1
Browse files Browse the repository at this point in the history
  • Loading branch information
stemangiola committed May 20, 2024
1 parent 9c48a0c commit 4a02cba
Showing 1 changed file with 58 additions and 39 deletions.
97 changes: 58 additions & 39 deletions vignettes/Session_1_sequencing_assays.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -193,7 +193,7 @@ Understanding the spatial arrangement of your data is fundamental. We'll demonst

We will use `ggpavis` package to visualise the data.

```{r, fig.width=7, fig.height=8}
```{r, fig.width=3, fig.height=3}
# image data
imgData(spatial_data)
Expand All @@ -207,7 +207,7 @@ Enhance the visualisation by annotating the plots to provide more context within

Layers = L1-6, white matter = WM

```{r, fig.width=7, fig.height=8}
```{r, fig.width=3, fig.height=3}
# plot spots with annotation
ggspavis::plotSpots(
Expand All @@ -219,13 +219,13 @@ ggspavis::plotSpots(

Explore additional visualisation features offered by the Visium platform, exposing the H&E (hematoxylin and eosin) image.

```{r, fig.width=7, fig.height=8}
```{r, fig.width=3, fig.height=3}
ggspavis::plotVisium(spatial_data)
```

This visualisation focuses on specific tissue features within the dataset, emphasising areas of interest.

```{r, fig.width=7, fig.height=8}
```{r, fig.width=3, fig.height=3}
ggspavis::plotVisium(
spatial_data,
annotate = "spatialLIBD",
Expand Down Expand Up @@ -285,7 +285,7 @@ table(qc_mitochondrial_transcription)

After applying the QC metrics, it’s crucial to visually assess their impact. This step involves checking the spatial pattern of the spots removed based on high mitochondrial transcription, helping us understand the distribution and quality of the remaining dataset.

```{r, fig.width=7, fig.height=8}
```{r, fig.width=3, fig.height=3}
## Add threshold in colData
colData(spatial_data)$qc_mitochondrial_transcription <- qc_mitochondrial_transcription
Expand All @@ -303,7 +303,7 @@ plotSpotQC(

This analysis focuses on examining the distribution of library sizes across different spots. It uses a histogram and density plot to visualise the range and commonality of library sizes in the dataset.

```{r, fig.width=7, fig.height=8}
```{r, fig.width=3, fig.height=3}
## Density and histogram of library sizes
data.frame(colData(spatial_data)) |>
ggplot(aes(x = sum)) +
Expand All @@ -327,7 +327,7 @@ table(qc_total_counts)

Incorporating Library Size Threshold in Dataset: This step involves adding the library size threshold to the dataset’s metadata and examining the spatial pattern of the spots that have been removed based on this criterion.

```{r, fig.width=7, fig.height=8}
```{r, fig.width=3, fig.height=3}
## Add threshold in colData
colData(spatial_data)$qc_total_counts <- qc_total_counts
Expand All @@ -347,7 +347,7 @@ plotSpotQC(

This analysis examines how many genes are expressed per spot, using a histogram and density plot to visualise the distribution of gene counts across the dataset.

```{r, fig.width=7, fig.height=8}
```{r, fig.width=3, fig.height=3}
## Density and histogram of library sizes
data.frame(colData(spatial_data) ) |>
ggplot(aes(x = detected)) +
Expand All @@ -371,7 +371,7 @@ table(qc_detected_genes)

Incorporating Gene Expression Threshold in Dataset: After setting the gene expression threshold, it is added to the dataset's metadata. The spatial pattern of spots removed based on this threshold is then examined.

```{r, fig.width=7, fig.height=8}
```{r, fig.width=3, fig.height=3}
## Add threshold in colData
colData(spatial_data)$qc_detected_genes <- qc_detected_genes
Expand All @@ -387,7 +387,7 @@ plotSpotQC(

Exploring the Relationship Between Library Size and Number of Genes Detected: This analysis explores the correlation between library size and the number of genes detected in each spot, providing insights into data quality and sequencing depth.

```{r, fig.width=7, fig.height=8}
```{r, fig.width=3, fig.height=3}
## Density and histogram of library sizes
data.frame(colData(spatial_data)) |>
ggplot(aes(sum, detected)) +
Expand All @@ -403,7 +403,7 @@ data.frame(colData(spatial_data)) |>

After applying all QC filters, this block combines them and stores the results in the dataset. The spatial patterns of all discarded spots are then reviewed to ensure comprehensive quality control.

```{r, fig.width=7, fig.height=8}
```{r, fig.width=3, fig.height=3}
## Store the set in the object
colData(spatial_data)$discard <- qc_total_counts | qc_detected_genes | qc_mitochondrial_transcription
Expand All @@ -430,7 +430,7 @@ Dimensionality reduction is essential in spatial transcriptomics due to the high

#### Variable gene identification

```{r, message=FALSE, warning=FALSE, fig.width=7, fig.height=8}
```{r, message=FALSE, warning=FALSE, fig.width=3, fig.height=3}
genes <- !grepl(pattern = "^Rp[l|s]|Mt", x = rownames(spatial_data))
dec = scran::modelGeneVar(spatial_data, subset.row = genes)
Expand All @@ -451,7 +451,7 @@ rowData(spatial_data[head(hvg),])[,c("gene_id", "gene_name")]

With the highly variable genes, we perform PCA to reduce dimensionality, followed by UMAP to visualise the data in a lower-dimensional space, enhancing our ability to observe clustering and patterns in the data.

```{r, fig.width=7, fig.height=8}
```{r, fig.width=3, fig.height=3}
spatial_data <-
spatial_data |>
scuttle::logNormCounts() |>
Expand All @@ -463,9 +463,9 @@ reducedDimNames(spatial_data)
reducedDim(spatial_data, "PCA")[1:5, 1:5]
```

:::: note
::: note
As for single-cell data, we need to verify that there is not significant batch effect. If so we need to adjust for it (a.k.a. integration) before calculating principal component. Many adjustment methods to output adjusted principal components directly.
::::
:::


#### UMAP
Expand All @@ -474,22 +474,22 @@ You can appreciate that, in this case, selecting within-sample variable genes, w

We can appreciate that there are no major batch effects across samples, and we don't see grouping driven by sample_id.

```{r, fig.width=7, fig.height=8}
```{r, fig.width=3, fig.height=3}
set.seed(42)
spatial_data <- scater::runUMAP(spatial_data, dimred = "PCA")
scater::plotUMAP(spatial_data, colour_by = "sample_id", point_size = 0.2)
```

:::: note
::: note
**Exercise 1.1**

Visualise where the two macro clusters are located spatially. We will take a very pragmatic approach and get cluster label from splitting the UMAP coordinated in two (`colData()` and `reducedDim()` will help us, see above), and then visualise it with `ggspavis`.

- Modify the `SpatialExperiment` object based on the UMAP1 dimension so to divide those 2 cluster
- Visualise the UMAP colouring by the new labelling
- Visualise the Visium slide colouring by the new labelling
::::
:::

### 7. Clustering

Expand Down Expand Up @@ -522,15 +522,15 @@ Applying Clustering Labels and Visualising Results: After determining clusters,

We can appreciate here that we get two main pixel clusters.

```{r, fig.width=7, fig.height=8}
```{r, fig.width=3, fig.height=3}
colLabels(spatial_data) <- factor(clus)
scater::plotUMAP(spatial_data, colour_by = "label") + scale_color_brewer(palette = "Paired")
```

Those two clusters group the white matter from the rest of the layers.

```{r, fig.width=7, fig.height=8}
```{r, fig.width=3, fig.height=3}
## Plot in tissue map
ggspavis::plotSpots(spatial_data, annotate = "label") +
facet_wrap(~sample_id) +
Expand All @@ -539,7 +539,7 @@ ggspavis::plotSpots(spatial_data, annotate = "label") +

As for comparison, we show the manually annotated regions. We can see that while the single cell style clustering catchers, the overall tissue, architecture, a lot of details are not retrieved. We clusters cannot faithfully recapitulate the tissue morphology. However, they might represent specific cell types within morphological regions.

```{r, fig.width=7, fig.height=8}
```{r, fig.width=3, fig.height=3}
## Plot ground truth in tissue map
ggspavis::plotSpots(spatial_data, annotate = "spatialLIBD") +
facet_wrap(~sample_id) +
Expand All @@ -551,7 +551,7 @@ ggspavis::plotSpots(spatial_data, annotate = "spatialLIBD") +

To cluster spatial regions (i.e. tissue domain) rather than single-cell types, the clustering algorithms need to take spatial context into account. For example what is the transcriptional profile of the neighbouring pixels or neighbouring cells.

BANKSY combines molecular and spatial information. BANKSY leverages the fact that a cell’s state can be more fully represented by considering both its own transcriptome "nd that of its local m"croenvironment.This algorithm embeds cells within a combined space that incorporates their own transcriptome and that of their locell'svironment, representing both the cell state and the surrounding microenvironment.
BANKSY combines molecular and spatial information. BANKSY leverages the fact that a cell’s state can be more fully represented by considering both its own transcriptome "nd that of its local microenvironment.This algorithm embeds cells within a combined space that incorporates their own transcriptome and that of their locell'svironment, representing both the cell state and the surrounding microenvironment.

Overview of the algorithm

Expand Down Expand Up @@ -631,9 +631,9 @@ spe_joint <- do.call(cbind, spatial_data_list)

Here, we perform PCA using the BANKSY algorithm on the joint dataset. The group argument specifies how to treat different samples, ensuring that features are scaled separately per sample group to account for variations among them.

:::: note
::: note
Note: this step takes long time
::::
:::

```{r, eval=TRUE, message=FALSE, warning=FALSE}
spe_joint <- runBanksyPCA( # Run PCA on the Banskly matrix
Expand Down Expand Up @@ -706,7 +706,7 @@ ari

Visualising Clusters and Annotations on Spatial Coordinates: We utilise the ggspavis package to visually map BANKSY clusters and manual annotations onto the spatial coordinates of the dataset, providing a comprehensive visual overview of spatial and clustering data relationships.

```{r multi-sample-spatial, eval=TRUE, fig.width=7, fig.height=8}
```{r multi-sample-spatial, eval=TRUE, fig.width=3, fig.height=3}
# Use scater:::.get_palette('tableau10medium')
library(cowplot)
Expand All @@ -724,23 +724,31 @@ pal <- c(
theme(legend.position = "none") +
labs(title = "BANKSY clusters")
ggspavis::plotSpots(
do.call(cbind, spatial_data_list),
annotate = sprintf("%s", "clust_M0_lam0.2_k50_res0.7"),
pal = pal
) +
facet_wrap(~sample_id) +
theme(legend.position = "none") +
labs(title = "BANKSY clusters")
ggspavis::plotSpots(spatial_data, annotate = "spatialLIBD") +
facet_wrap(~sample_id) +
scale_color_manual(values = libd_layer_colors) +
theme(legend.position = "none") +
labs(title = "spatialLIBD regions")
```

:::: note
::: note
**Exercise 1.2**

We have applied cluster smoothing using `smoothLabels`. How much do you think this operation has affected the cluster labels. To find out,

- Plot the non smoothed cluster
- identify the pixel that have been smoothed, and
- visualise them using `plotSpotQC` that we have used above.
::::
:::

### 8. Deconvolution of pixel-based spatial data

Expand All @@ -762,7 +770,7 @@ Mangiola et al., 2024 doi [doi.org/10.1101/2023.06.08.542671](https://www.biorxi
knitr::include_graphics(here("inst/images/curated_atlas_query.png"))
```

```{r, message=FALSE, warning=FALSE, fig.width=7, fig.height=8}
```{r, message=FALSE, warning=FALSE, fig.width=3, fig.height=3}
# Get reference
library(CuratedAtlasQueryR)
library(HDF5Array)
Expand Down Expand Up @@ -879,7 +887,7 @@ head(mgs_df)

We now utilise `SPOTlight` to deconvolve tisslet'smposition from our independent mouse brain reference. The result is visualised through a scatter pie plot, which provides a graphical representation of the spatial distribution of cell types within a let'sfic sample. This visualisation aids in understanding the spatial heterogeneity.

```{r, fig.width=7, fig.height=8, message=FALSE, warning=FALSE}
```{r, fig.width=3, fig.height=3, message=FALSE, warning=FALSE}
library(SPOTlight)
res <- SPOTlight(
Expand Down Expand Up @@ -908,7 +916,7 @@ plotSpatialScatterpie(

Let's visualise without pit'syte_cell and endothelial cells, which oclet'sa lot of the visual spectrum.

```{r, fig.width=7, fig.height=8}
```{r, fig.width=3, fig.height=3}
plotSpatialScatterpie(
x = spatial_data_gene_name[,cell_first_sample],
Expand All @@ -923,7 +931,7 @@ plotSpatialScatterpie(

Let's also exclude without muscle_cell, which occupy a lot of the visual spectrum.

```{r, fig.width=7, fig.height=8}
```{r, fig.width=3, fig.height=3}
plotSpatialScatterpie(
x = spatial_data_gene_name[,cell_first_sample],
Expand All @@ -938,7 +946,7 @@ plotSpatialScatterpie(

No, let's look at the correlation matrices to see which cell type are most often occurring rather than mutually exclusive within our data set.

```{r, fig.width=7, fig.height=8}
```{r, fig.width=3, fig.height=3}
plotCorrelationMatrix(res$mat)
```
Expand All @@ -947,21 +955,32 @@ plotCorrelationMatrix(res$mat)



:::: note
::: note
**Exercise 1.3**

Rather than looking at the correlation matrix, overall, let's observe whether the correlation structure amongst cell types is consistent across samples. Do you think it's consistent or noticeably different?
::::
:::


:::: note
```{r}
lapply(res_spatialLIBD, function(x) plotCorrelationMatrix(as.matrix(x[,-10])))
```

::: note
**Exercise 1.4**

Now let's observe whether the correlation structure is consistent across spatial regions, irrespectively of the sample of origin. Do you think they are consistent or noticably different?
::::
:::

```{r}
res_spatialLIBD = split(data.frame(res$mat), colData(spatial_data_gene_name)$spatialLIBD )
lapply(res_spatialLIBD, function(x) plotCorrelationMatrix(as.matrix(x[,-10])))
```

:::: note
::: note
**Exercise 1.5**

Some of the most positive correlations involve the end of cells with Oligodendrocytes and Leptomeningeal cells.
Expand All @@ -971,7 +990,7 @@ Leptomeningeal cells refer to the cells that make up the leptomeninges, which co
Oligodendrocytes are a type of glial cell in the central nervous system (CNS) of vertebrates, including humans and mouse. These cells are crucial for the formation and maintenance of the myelin sheath, a fatty layer that encases the axons of many neurons.

Let's try to visualise the pixel where these cell types most occur.
::::
:::


**Session Information**
Expand Down

0 comments on commit 4a02cba

Please sign in to comment.