libs <- c("Seurat", "tidyverse", "clustree")
suppressMessages(
suppressWarnings(sapply(libs, require, character.only = TRUE))
)
Seurat tidyverse clustree
TRUE TRUE TRUE
Ricardo Martins-Ferreira
The following script uses the clustree package (https://github.com/lazappi/clustree) to identify the optimal clustering resolution for the Seurat workflow.
The full integrated object, composed of 102,390 nuclei/cells, was loaded.
Upon integration of the immune cell clusters from all nineteen datasets, we expected to encounter T cells and potentially doublets that had passed previous filtering. The standard Seurat dimensionality reduction and clustering workflow was used.
Clustering was performed with increasing resolutions before using the clustree function.
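The code for this step is not shown in the rendered output. A minimal sketch of what such a workflow could look like is given below. The function name, the number of dimensions and the resolution vector are assumptions for illustration (only 0.01 and 0.025 are named in the text), and the default assay is assumed to be "integrated" so that the metadata columns carry the `integrated_snn_res.` prefix.

```r
# Sketch only: Seurat calls are namespaced so this definition can be sourced
# without attaching the package. `dims` and `resolutions` are hypothetical
# defaults, not the values used by the author.
cluster_across_resolutions <- function(obj,
                                       dims = 1:30,
                                       resolutions = c(0.01, 0.025)) {
  # Scaling is skipped on the assumption that the integrated assay
  # already holds scaled data.
  obj <- Seurat::RunPCA(obj, npcs = max(dims), verbose = FALSE)
  obj <- Seurat::RunUMAP(obj, reduction = "pca", dims = dims, verbose = FALSE)
  obj <- Seurat::FindNeighbors(obj, reduction = "pca", dims = dims,
                               verbose = FALSE)
  # Passing a vector runs the clustering once per resolution and stores one
  # metadata column per value, e.g. integrated_snn_res.0.01
  Seurat::FindClusters(obj, resolution = resolutions)
}

# Usage (requires the integrated object):
# Seurat <- cluster_across_resolutions(Seurat)
```

Each resolution produces one block of Modularity Optimizer output like the blocks that follow.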
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 102390
Number of edges: 6228683
Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9901
Number of communities: 9
Elapsed time: 56 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 102390
Number of edges: 6228683
Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9764
Number of communities: 10
Elapsed time: 60 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 102390
Number of edges: 6228683
Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9600
Number of communities: 11
Elapsed time: 56 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 102390
Number of edges: 6228683
Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9461
Number of communities: 12
Elapsed time: 63 seconds
Plotting a clustree:
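The plotting call is not shown in the rendered output. As a hedged illustration, `clustree()` accepts either a Seurat object or a plain data frame whose columns share a common prefix followed by the resolution. The toy data frame below is invented and stands in for the metadata of the real object.

```r
# Toy cluster assignments for six cells at two resolutions (invented values)
toy_meta <- data.frame(
  integrated_snn_res.0.01  = c(0, 0, 0, 0, 1, 1),
  integrated_snn_res.0.025 = c(0, 0, 1, 1, 2, 2)
)

# Build the tree only if clustree is installed
if (requireNamespace("clustree", quietly = TRUE)) {
  p <- clustree::clustree(toy_meta, prefix = "integrated_snn_res.")
}

# On the real object the equivalent call would be:
# clustree(Seurat, prefix = "integrated_snn_res.")
```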
To characterize the clusters obtained, the expression of canonical cell-type markers was evaluated at the low resolutions (0.01 and 0.025). To do so, the default assay was set to “RNA” and the data were normalized.
DefaultAssay(Seurat)<-"RNA"
Seurat <- NormalizeData(Seurat)
# For resolution = 0.01
DotPlot(Seurat, group.by= "integrated_snn_res.0.01",
features = c("P2RY12","CX3CR1", # microglia
"MRC1","CD163", # macrophages
"CD247","TRAC")) + # T cells
scale_colour_gradient2(low = "darkblue", mid = "white", high = "darkred")+
theme(axis.text.x = element_text(angle=90, hjust = 0))
Warning: Scaling data with a low number of groups may produce misleading
results
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.
# For resolution = 0.025
DotPlot(Seurat, group.by= "integrated_snn_res.0.025", features = c("P2RY12","CX3CR1", # microglia
"MRC1","CD163", # macrophages
"CD247","TRAC", # T cells
"SNAP25","SYT1", # Neurons
"PLP1","ST18", # Oligodendrocytes
"SLC1A2", # Astrocytes
"PDGFRA")) + # OPCs
scale_colour_gradient2(low = "darkblue", mid = "white", high = "darkred")+
theme(axis.text.x = element_text(angle=90, hjust = 0))
Warning: Scaling data with a low number of groups may produce misleading
results
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.
Plot the UMAP of the three main cell types. Based on the expression of the markers above, we concluded that the three populations obtained at resolution = 0.025 represent the main myeloid population (cluster 0), which includes microglia and border-associated macrophages; doublets (cluster 1); and T cells (cluster 2).
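The UMAP code is not shown in the rendered output. One possible way to produce such a plot is sketched below. The helper names and the `main_population` column are hypothetical, while the cluster-to-label mapping follows the assignment described above.

```r
# Mapping taken from the text: cluster 0 myeloid, 1 doublets, 2 T cells
cluster_labels <- c("0" = "Myeloid", "1" = "Doublets", "2" = "T cells")

# Translate cluster identifiers (numeric, character or factor) into labels
label_clusters <- function(cluster_ids, labels = cluster_labels) {
  unname(labels[as.character(cluster_ids)])
}

# Hypothetical helper: add the labels as metadata and draw the UMAP
plot_main_populations <- function(obj, column = "integrated_snn_res.0.025") {
  obj$main_population <- label_clusters(obj[[column, drop = TRUE]])
  Seurat::DimPlot(obj, reduction = "umap", group.by = "main_population",
                  label = TRUE)
}

# Usage (requires the clustered object):
# plot_main_populations(Seurat)
```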
The Human Microglia Atlas (HuMicA) consists of the myeloid population (cluster 0) identified above.
Humica <- subset(x = Seurat, cells = WhichCells(Seurat, expression = integrated_snn_res.0.025=="0"))
row.names(Humica@meta.data)<- Humica@meta.data$TAG
Humica
An object of class Seurat
101917 features across 93055 samples within 3 assays
Active assay: RNA (61388 features, 0 variable features)
2 layers present: counts, data
2 other assays present: SCT, integrated
2 dimensional reductions calculated: pca, umap
In addition, nuclei/cells belonging to individual samples that contributed fewer than 50 nuclei/cells were removed.
Humica_samples <- table(Humica@meta.data$Sample_ID) %>% as.data.frame() #329 individual samples
toRemove <- Humica@meta.data$TAG[Humica@meta.data$Sample_ID %in% c(Humica_samples[Humica_samples$Freq<50,] %>% pull(Var1))]
Humica <- Humica[,!colnames(Humica) %in% toRemove]
row.names(Humica@meta.data)<- Humica@meta.data$TAG
Humica
An object of class Seurat
101917 features across 90716 samples within 3 assays
Active assay: RNA (61388 features, 0 variable features)
2 layers present: counts, data
2 other assays present: SCT, integrated
2 dimensional reductions calculated: pca, umap
The final HuMicA object is composed of 90,716 nuclei/cells and 241 individual samples (Sample_ID).
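The per-sample filter can be checked on a small example. The sketch below applies the same rule (drop every sample with fewer than 50 cells) to an invented metadata table in base R. The sample names and sizes are made up, and a sample with exactly 50 cells is kept.

```r
# Invented metadata: one row per cell, three samples of 80, 50 and 49 cells
meta <- data.frame(
  TAG = paste0("cell", 1:179),
  Sample_ID = rep(c("S1", "S2", "S3"), times = c(80, 50, 49))
)

min_cells <- 50

# Count cells per sample and keep samples that reach the threshold
cells_per_sample <- table(meta$Sample_ID)
keep_samples <- names(cells_per_sample)[cells_per_sample >= min_cells]
meta_kept <- meta[meta$Sample_ID %in% keep_samples, ]

# Remaining cells and remaining samples
n_cells_kept <- nrow(meta_kept)
n_samples_kept <- length(unique(meta_kept$Sample_ID))
```

On the real object, the analogous check would compare `ncol(Humica)` and `length(unique(Humica$Sample_ID))` with the reported totals.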
Repeat the Seurat dimensionality reduction and clustering workflow as before.
Clustering with increasing resolutions.
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 90716
Number of edges: 5570085
Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9900
Number of communities: 6
Elapsed time: 34 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 90716
Number of edges: 5570085
Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9750
Number of communities: 6
Elapsed time: 40 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 90716
Number of edges: 5570085
Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9533
Number of communities: 7
Elapsed time: 35 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 90716
Number of edges: 5570085
Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9404
Number of communities: 9
Elapsed time: 49 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 90716
Number of edges: 5570085
Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9334
Number of communities: 11
Elapsed time: 43 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 90716
Number of edges: 5570085
Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9221
Number of communities: 12
Elapsed time: 46 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 90716
Number of edges: 5570085
Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9128
Number of communities: 14
Elapsed time: 46 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 90716
Number of edges: 5570085
Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9043
Number of communities: 16
Elapsed time: 53 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 90716
Number of edges: 5570085
Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.8993
Number of communities: 17
Elapsed time: 47 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 90716
Number of edges: 5570085
Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.8944
Number of communities: 19
Elapsed time: 43 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 90716
Number of edges: 5570085
Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.8896
Number of communities: 19
Elapsed time: 51 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 90716
Number of edges: 5570085
Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.8849
Number of communities: 21
Elapsed time: 46 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 90716
Number of edges: 5570085
Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.8814
Number of communities: 19
Elapsed time: 40 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 90716
Number of edges: 5570085
Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.8778
Number of communities: 22
Elapsed time: 48 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 90716
Number of edges: 5570085
Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.8737
Number of communities: 22
Elapsed time: 49 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 90716
Number of edges: 5570085
Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.8702
Number of communities: 23
Elapsed time: 47 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 90716
Number of edges: 5570085
Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.8669
Number of communities: 24
Elapsed time: 46 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 90716
Number of edges: 5570085
Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.8635
Number of communities: 25
Elapsed time: 50 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 90716
Number of edges: 5570085
Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.8599
Number of communities: 26
Elapsed time: 43 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 90716
Number of edges: 5570085
Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.8574
Number of communities: 26
Elapsed time: 46 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 90716
Number of edges: 5570085
Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.8542
Number of communities: 25
Elapsed time: 53 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 90716
Number of edges: 5570085
Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.8513
Number of communities: 29
Elapsed time: 48 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 90716
Number of edges: 5570085
Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.8486
Number of communities: 30
Elapsed time: 53 seconds
Plotting a clustree:
In a clustree, a high degree of cell exchange between nodes indicates overclustering. We considered resolution = 0.2 to be the optimal point, given its balance in cluster size and its low rate of internodal cell exchange.
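Cell exchange between two consecutive resolutions can also be inspected numerically. The base R sketch below uses invented assignments to cross-tabulate clusters at a lower and a higher resolution, then computes the share of each higher-resolution cluster that comes from each lower-resolution cluster. This is similar in spirit to the incoming proportion that clustree draws on its edges, and it is an illustration rather than the author's procedure.

```r
# Invented assignments of the same eight cells at two resolutions
low  <- c("0", "0", "0", "0", "1", "1", "1", "1")
high <- c("0", "0", "1", "1", "2", "2", "2", "1")

# Rows: parent clusters (lower resolution); columns: child clusters
flow <- table(parent = low, child = high)

# Share of each child cluster contributed by each parent (columns sum to 1)
in_prop <- prop.table(flow, margin = 2)

# Largest parental contribution per child cluster
dominant_share <- apply(in_prop, 2, max)
```

A child cluster whose dominant share is well below 1 draws cells from several parents, which is the pattern read as exchange between nodes in the tree.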