Clustree

Author

Ricardo Martins-Ferreira

Optimal clustering resolution with clustree

The following script uses the clustree package (https://github.com/lazappi/clustree) to identify the optimal clustering resolution for the Seurat workflow.

Load required packages

libs <- c("Seurat", "tidyverse", "clustree")
suppressMessages(
  suppressWarnings(sapply(libs, require, character.only = TRUE))
)
   Seurat tidyverse  clustree 
     TRUE      TRUE      TRUE 

Integrated immune cell object

The full integrated object, composed of 102,390 nuclei/cells, was loaded into R as the object Seurat.

Optimal clustering resolution of the final HuMicA object

The Human Microglia Atlas (HuMicA) consists of the myeloid population identified above (cluster 0 at resolution 0.025 of the integrated object).

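# Keep only the myeloid cells (cluster 0 at resolution 0.025)
Humica <- subset(x = Seurat, cells = WhichCells(Seurat, expression = integrated_snn_res.0.025 == "0"))
# Use the TAG column as the cell (row) names of the metadata
row.names(Humica@meta.data) <- Humica@meta.data$TAG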
Humica
An object of class Seurat 
101917 features across 93055 samples within 3 assays 
Active assay: RNA (61388 features, 0 variable features)
 2 layers present: counts, data
 2 other assays present: SCT, integrated
 2 dimensional reductions calculated: pca, umap

In addition, nuclei/cells from individual samples contributing fewer than 50 nuclei/cells were removed.

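# Count nuclei/cells per sample (329 individual samples)
Humica_samples <- table(Humica@meta.data$Sample_ID) %>% as.data.frame()

# Barcodes (TAG) of cells from samples with fewer than 50 nuclei/cells
small_samples <- as.character(Humica_samples$Var1[Humica_samples$Freq < 50])
toRemove <- Humica@meta.data$TAG[Humica@meta.data$Sample_ID %in% small_samples]

# Drop those cells and reset the metadata row names to TAG
Humica <- Humica[, !colnames(Humica) %in% toRemove]
row.names(Humica@meta.data) <- Humica@meta.data$TAG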
Humica
An object of class Seurat 
101917 features across 90716 samples within 3 assays 
Active assay: RNA (61388 features, 0 variable features)
 2 layers present: counts, data
 2 other assays present: SCT, integrated
 2 dimensional reductions calculated: pca, umap

The final HuMicA object is composed of 90,716 nuclei/cells and 241 individual samples (Sample_ID).
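The sample filter above can be checked on a small example. The base-R sketch below uses a made-up metadata table. The barcodes, the sample names and the threshold of 3 are hypothetical stand-ins for TAG, Sample_ID and the 50-cell cut-off. It follows the same logic as the filter: count cells per sample, collect the barcodes of samples below the threshold, and drop those cells.

```r
# Hypothetical toy metadata: one row per cell, with a barcode (TAG) and its sample
meta <- data.frame(
  TAG = paste0("cell", 1:10),
  Sample_ID = c(rep("S1", 5), rep("S2", 2), rep("S3", 3)),
  stringsAsFactors = FALSE
)

min_cells <- 3  # stands in for the 50-cell threshold used above

# Count cells per sample, as table() %>% as.data.frame() does above
sample_counts <- as.data.frame(table(meta$Sample_ID))

# Samples below the threshold, as character to avoid factor surprises
small_samples <- as.character(sample_counts$Var1[sample_counts$Freq < min_cells])

# Barcodes to drop, then the retained metadata
to_remove <- meta$TAG[meta$Sample_ID %in% small_samples]
meta_kept <- meta[!meta$TAG %in% to_remove, ]

print(small_samples)
print(nrow(meta_kept))
```

Matching on character sample names rather than factor codes keeps the %in% comparison independent of factor levels.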

Repeat the Seurat dimensionality reduction and clustering workflow as before.

# Recompute PCA, UMAP and the shared nearest-neighbour graph on the integrated assay
DefaultAssay(Humica) <- "integrated"
Humica <- RunPCA(Humica, verbose = FALSE)
Humica <- RunUMAP(Humica, reduction = "pca", dims = 1:50)
Humica <- FindNeighbors(Humica, reduction = "pca", dims = 1:50)

Cluster the cells at increasing resolutions, from 0.01 to 1:

Humica <- FindClusters(Humica, resolution = 0.01)
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 90716
Number of edges: 5570085

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9900
Number of communities: 6
Elapsed time: 34 seconds
Humica <- FindClusters(Humica, resolution = 0.025)
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 90716
Number of edges: 5570085

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9750
Number of communities: 6
Elapsed time: 40 seconds
Humica <- FindClusters(Humica, resolution = 0.05)
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 90716
Number of edges: 5570085

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9533
Number of communities: 7
Elapsed time: 35 seconds
Humica <- FindClusters(Humica, resolution = 0.075)
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 90716
Number of edges: 5570085

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9404
Number of communities: 9
Elapsed time: 49 seconds
Humica <- FindClusters(Humica, resolution = 0.1)
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 90716
Number of edges: 5570085

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9334
Number of communities: 11
Elapsed time: 43 seconds
Humica <- FindClusters(Humica, resolution = 0.15)
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 90716
Number of edges: 5570085

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9221
Number of communities: 12
Elapsed time: 46 seconds
Humica <- FindClusters(Humica, resolution = 0.2)
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 90716
Number of edges: 5570085

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9128
Number of communities: 14
Elapsed time: 46 seconds
Humica <- FindClusters(Humica, resolution = 0.25)
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 90716
Number of edges: 5570085

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9043
Number of communities: 16
Elapsed time: 53 seconds
Humica <- FindClusters(Humica, resolution = 0.3)
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 90716
Number of edges: 5570085

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.8993
Number of communities: 17
Elapsed time: 47 seconds
Humica <- FindClusters(Humica, resolution = 0.35)
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 90716
Number of edges: 5570085

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.8944
Number of communities: 19
Elapsed time: 43 seconds
Humica <- FindClusters(Humica, resolution = 0.4)
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 90716
Number of edges: 5570085

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.8896
Number of communities: 19
Elapsed time: 51 seconds
Humica <- FindClusters(Humica, resolution = 0.45)
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 90716
Number of edges: 5570085

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.8849
Number of communities: 21
Elapsed time: 46 seconds
Humica <- FindClusters(Humica, resolution = 0.5)
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 90716
Number of edges: 5570085

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.8814
Number of communities: 19
Elapsed time: 40 seconds
Humica <- FindClusters(Humica, resolution = 0.55)
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 90716
Number of edges: 5570085

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.8778
Number of communities: 22
Elapsed time: 48 seconds
Humica <- FindClusters(Humica, resolution = 0.6)
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 90716
Number of edges: 5570085

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.8737
Number of communities: 22
Elapsed time: 49 seconds
Humica <- FindClusters(Humica, resolution = 0.65)
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 90716
Number of edges: 5570085

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.8702
Number of communities: 23
Elapsed time: 47 seconds
Humica <- FindClusters(Humica, resolution = 0.7)
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 90716
Number of edges: 5570085

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.8669
Number of communities: 24
Elapsed time: 46 seconds
Humica <- FindClusters(Humica, resolution = 0.75)
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 90716
Number of edges: 5570085

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.8635
Number of communities: 25
Elapsed time: 50 seconds
Humica <- FindClusters(Humica, resolution = 0.8)
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 90716
Number of edges: 5570085

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.8599
Number of communities: 26
Elapsed time: 43 seconds
Humica <- FindClusters(Humica, resolution = 0.85)
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 90716
Number of edges: 5570085

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.8574
Number of communities: 26
Elapsed time: 46 seconds
Humica <- FindClusters(Humica, resolution = 0.9)
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 90716
Number of edges: 5570085

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.8542
Number of communities: 25
Elapsed time: 53 seconds
Humica <- FindClusters(Humica, resolution = 0.95)
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 90716
Number of edges: 5570085

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.8513
Number of communities: 29
Elapsed time: 48 seconds
Humica <- FindClusters(Humica, resolution = 1)
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 90716
Number of edges: 5570085

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.8486
Number of communities: 30
Elapsed time: 53 seconds
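The 23 FindClusters() calls above differ only in the resolution value. The base-R sketch below is an illustration, not the code that produced the output above. It does four things:

* collects the resolution grid in one vector;
* derives the metadata column names that Seurat creates for the integrated_snn graph;
* tabulates the number of communities reported in the logs;
* flags the steps where the community count drops, since cluster number is not strictly monotonic in resolution.

The final commented line assumes a Seurat version whose FindClusters() accepts a vector of resolutions.

```r
# Resolution grid used above: 0.01, 0.025, 0.05, 0.075, then 0.1 to 1 in steps of 0.05.
# round() strips floating-point noise from seq() so the column names print cleanly.
resolutions <- c(0.01, 0.025, 0.05, 0.075, round(seq(0.1, 1, by = 0.05), 3))

# Metadata columns Seurat adds for the "integrated_snn" graph, one per resolution;
# these are the columns clustree() later selects with prefix = "integrated_snn_res."
res_columns <- paste0("integrated_snn_res.", resolutions)

# Number of communities reported by the Louvain runs above, in the same order
n_communities <- c(6, 6, 7, 9, 11, 12, 14, 16, 17, 19, 19, 21,
                   19, 22, 22, 23, 24, 25, 26, 26, 25, 29, 30)
res_summary <- data.frame(column = res_columns, resolution = resolutions,
                          communities = n_communities, stringsAsFactors = FALSE)

print(res_summary)

# Resolutions at which the community count drops relative to the previous step
print(resolutions[which(diff(n_communities) < 0) + 1])

# With the HuMicA object loaded, the 23 separate calls could be written as one
# (assumes a Seurat version whose FindClusters() accepts a vector of resolutions):
# Humica <- FindClusters(Humica, resolution = resolutions, verbose = FALSE)
```

In this record the drops occur at resolution 0.5 (21 to 19 communities) and 0.9 (26 to 25).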

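Plotting a clustree. clustree treats every metadata column whose name starts with the given prefix as one clustering resolution: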

clustree(Humica@meta.data, prefix = "integrated_snn_res.")

In a clustree, extensive exchange of cells between nodes indicates overclustering. An example is a cluster that receives cells from several clusters at the previous resolution. We considered resolution = 0.2 (14 clusters) to be the optimal point, as it balances cluster size against a low rate of cell exchange between nodes.
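The base-R sketch below makes the idea of cell exchange between nodes concrete. It cross-tabulates two invented clusterings of the same ten cells, one at a lower and one at a higher resolution.

* Each non-zero entry of the table corresponds to a clustree edge.
* Dividing by the size of the higher-resolution cluster gives the proportion of that cluster coming from each parent. clustree reports this as in_prop and hides edges below its prop_filter threshold, 0.1 by default.
* A higher-resolution cluster with more than one substantial parent is the pattern associated with overclustering.

```r
# Hypothetical toy clusterings of the same ten cells at two resolutions
low  <- c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B")
high <- c("1", "1", "1", "2", "2", "2", "3", "3", "3", "3")

# Cells shared between each low-resolution and each high-resolution cluster
counts <- table(low, high)

# Fraction of each high-resolution cluster that comes from each low-resolution cluster
in_prop <- prop.table(counts, margin = 2)

# Number of "substantial" parents per high-resolution cluster,
# using the same 0.1 threshold as clustree's default prop_filter
prop_filter <- 0.1
n_parents <- colSums(in_prop >= prop_filter)

print(counts)
print(n_parents)
```

In this toy case, cluster "2" draws cells from both A and B, which is the kind of crossing edge a clustree would show. On the real object, the same table could be built from two adjacent resolution columns, for example table(Humica$integrated_snn_res.0.15, Humica$integrated_snn_res.0.2).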