A SUGGESTED WORKFLOW FOR MASS CYTOMETRY ANALYSIS

 
 

 

PRE-PROCESSING YOUR MASS CYTOMETRY DATA

The raw FCS files generated by your mass cytometer must be pre-processed prior to high dimensional analysis.

Files must first be normalized, then debarcoded (if applicable), and finally gated for live singlet events.

 

ANALYZING YOUR MASS CYTOMETRY DATA WITH ALGORITHMS

  • Normalized, debarcoded, live, singlet events can then be analyzed by a variety of machine learning algorithms.

  • We have created a YouTube playlist, "An Introduction to High Dimensional Data Analysis for Bench Scientists", which contains a variety of videos to help complete beginners understand high dimensional data analysis.

  • Two types of machine learning algorithms are used in the field: clustering and dimensionality reduction. Within each category there are many options, which differ in their mathematical methods.

  • Finding the right algorithm for your dataset, computational skill, and analysis needs is the first step to any successful high dimensional analysis.

  • In a popular publication (J Immunol 2018), we compared five of the most common algorithms: viSNE (tSNE), SPADE, PhenoGraph, X-shift, and Citrus. We provided detailed methods and outlined important considerations for new users.

 
Kimball, AK … ET Clambey, J Immunol 2018


 

An overview of each algorithm discussed in this publication can be found below:


USING THE PHENOGRAPH ALGORITHM IN CYTOFKIT

  • PhenoGraph is a clustering algorithm that stratifies cellular events into subpopulations (clusters) based on the similarity in expression of selected markers. This allows users to rapidly define cellular phenotypes and quantify differences in cellular abundance and expression between individuals, groups, conditions, etc.

  • Running PhenoGraph through the cytofkit package in R is ideal for users with limited computational knowledge. The built-in Shiny app lets users interact directly with their results.

  • PhenoGraph is the preferred clustering algorithm of the Clambey laboratory. We have used it to analyze samples from humans, mice, and cell lines across a variety of tissue types.
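
Once events have been assigned to clusters, quantifying differences in cellular abundance comes down to counting events per cluster within each sample. The sketch below is a minimal illustration in plain Python, not part of cytofkit or PhenoGraph. The function name `cluster_frequencies` and the toy sample and cluster labels are hypothetical, and it assumes cluster assignments have already been exported as one label per event.

```python
from collections import Counter, defaultdict


def cluster_frequencies(sample_ids, cluster_ids):
    """Return, for each sample, the percentage of its events in each cluster.

    sample_ids and cluster_ids are parallel sequences with one entry per event.
    (Hypothetical helper for illustration; not a cytofkit function.)
    """
    if len(sample_ids) != len(cluster_ids):
        raise ValueError("sample_ids and cluster_ids must have the same length")

    # Count events per cluster, separately for each sample.
    counts = defaultdict(Counter)
    for sample, cluster in zip(sample_ids, cluster_ids):
        counts[sample][cluster] += 1

    # Report every cluster for every sample, so absent clusters show as 0%.
    all_clusters = sorted(set(cluster_ids))
    frequencies = {}
    for sample, counter in counts.items():
        total = sum(counter.values())
        frequencies[sample] = {
            cluster: 100.0 * counter[cluster] / total for cluster in all_clusters
        }
    return frequencies


# Toy data (made up): 4 events from a control sample, 4 from a treated sample.
samples = ["ctrl"] * 4 + ["treated"] * 4
clusters = [1, 1, 2, 3, 1, 2, 2, 2]

for sample, freqs in cluster_frequencies(samples, clusters).items():
    print(sample, freqs)
```

Expressing abundance as a percentage of each sample's events, rather than as raw counts, keeps samples with different total event numbers comparable.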

 
 

USING THE X-SHIFT ALGORITHM IN VORTEX

  • X-shift is a clustering algorithm that uses weighted k-nearest-neighbor density estimation (kNN-DE) to construct a graph from the data. Once the graph is constructed, the cell event density is used to partition the data into clusters. X-shift evaluates how the number of nearest neighbors (K) affects the number of clusters discovered, producing 30+ analysis iterations. The “elbow method” is then applied across these iterations to determine the optimal number of clusters, which prevents both underclustering and overfragmentation of the data.

  • Within the VorteX Clustering Environment users can visualize their data by a number of methods (a three dimensional PCA plot, force directed layout, etc.).

  • Although X-shift/VorteX is a very powerful tool for analysis, it does require more advanced computational skills.

  • The Clambey laboratory has used this algorithm in publications for the analysis of CyTOF data as well as RNA flow cytometric data.
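
The “elbow method” mentioned above can be illustrated with a short sketch. This is not VorteX's implementation. It uses one common heuristic, which is an assumption here: plot the number of clusters against K and pick the point farthest from the straight line joining the first and last points. The function name `find_elbow` and the example numbers are hypothetical.

```python
import math


def find_elbow(ks, n_clusters):
    """Return the K at the elbow of a (K, number of clusters) curve.

    Heuristic (an assumption, not necessarily what VorteX does): rescale both
    axes to [0, 1], draw a line from the first point to the last, and return
    the K whose point lies farthest from that line. ks must be sorted.
    """
    if len(ks) != len(n_clusters) or len(ks) < 3:
        raise ValueError("need at least three (K, cluster count) pairs")

    x_span = max(ks) - min(ks)
    y_span = max(n_clusters) - min(n_clusters)
    if x_span == 0 or y_span == 0:
        raise ValueError("curve is flat; no elbow to find")

    # Rescale so that neither axis dominates the distance calculation.
    xs = [(k - min(ks)) / x_span for k in ks]
    ys = [(n - min(n_clusters)) / y_span for n in n_clusters]

    # Direction of the line joining the two endpoints.
    dx, dy = xs[-1] - xs[0], ys[-1] - ys[0]
    length = math.hypot(dx, dy)

    def distance(i):
        # Perpendicular distance from point i to the endpoint line.
        return abs(dy * (xs[i] - xs[0]) - dx * (ys[i] - ys[0])) / length

    best = max(range(len(ks)), key=distance)
    return ks[best]


# Made-up example: cluster counts fall steeply at low K, then level off.
ks = [5, 10, 15, 20, 30, 40, 50, 75, 100]
n_clusters = [120, 60, 35, 24, 20, 18, 17, 16, 15]
print(find_elbow(ks, n_clusters))
```

In this made-up example the curve flattens around K = 20, which is the value the heuristic returns. With real data, the selected point should be checked against the plotted curve.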

 
 

USING tSNE, HSNE, AND UMAP ALGORITHMS

  • tSNE, HSNE, and UMAP are all dimensionality reduction algorithms. They differ in their mathematical methods, but for practical purposes the main differences are that they produce slightly different visualizations and vary in computational capacity (the number of events they can handle and the time an analysis takes).

  • These algorithms are extremely popular and can be found in a variety of programs: Cytobank (viSNE), Cytosplore, Cytofkit2, and FlowJo.

 
 

To rapidly compare a variety of algorithms and programs, we use a common dataset from Kimball, AK … ET Clambey, J Immunol 2018. Here are ten popular platforms we have tested to date, illustrating how different algorithms and platforms can produce profoundly different data visualizations and interpretations. When possible, we recommend comparing results across multiple algorithms to allow for a robust analysis of these complex data.

 
[Figure: 10 algorithms colored by CD45]
 

Still confused about which algorithm is best for you?


OTHER ALGORITHMS AND ANALYSIS PLATFORMS FOR MASS CYTOMETRY DATA

  • If you’re feeling brave or are an advanced user, here are some newer/more computationally advanced algorithms and tools:


What’s your favorite algorithm?