Alzheimer DataLENS

Open data analytics portal to advance Alzheimer’s disease research by enabling the analysis, visualization, and sharing of -omics data.

Single Cell Transcriptomics

Explore gene expression profiles across different cell types.

Bulk Transcriptomics

Explore gene expression profiles across different brain regions.

Genetics

Explore genome-wide association studies through gene queries.

DataLENS applies consistent pipelines to process and analyze public -omics data, provides easy-to-use web interfaces to query and visualize these analyses, and combines information from multiple heterogeneous modalities to present neuroscientists with an integrated view of molecular mechanisms.

About

Alzheimer DataLENS is an open data analytics platform that aims to advance research in Alzheimer’s disease (AD) and related dementias by making -omics data accessible to everyday researchers through:

  • Consistent pipelines to process and analyze public -omics data from AMP-AD and other sources.
  • Easy-to-use web interfaces for query and visualization of these analytics.
  • Information from multiple heterogeneous modalities, combined to present neuroscientists with an integrated view of molecular mechanisms.
  • Tools and methods open to all bioinformatics researchers.

Alzheimer DataLENS allows exploration of the following types of data:

  • Single-cell transcriptomics studies, including cell and sample-level queries of public datasets.
  • Bulk transcriptomics studies, including query and visualization of public human datasets spanning multiple brain regions and cohorts.
  • Genome-wide association studies (GWAS), including query and visualization of IGAP meta-analysis and AMP-AD GWAS results.

Alzheimer DataLENS was initiated by the Massachusetts Center for Alzheimer Therapeutics Science (massCATS), which is a public-private partnership to discover new treatments for Alzheimer's disease, organized through the Massachusetts Life Sciences Center. Leading academic researchers from the Massachusetts General Hospital, Broad Institute, Harvard Medical School, and MIT are working with healthcare and pharmaceutical partners to find new techniques, mechanisms and drug targets in the fight against Alzheimer's – a disease affecting 40 million people worldwide for which there is currently no cure.

Alzheimer DataLENS is also supported by IOS Press, which publishes the Journal of Alzheimer’s Disease (JAD).

How to Use AlzDataLENS

Transcriptomics: Transcriptomics is the analysis of gene expression data. The Transcriptomics Menu allows users to query and visualize Bulk & Single Cell Transcriptomics Data.

  • Single Cell Transcriptomics:
    • Transcriptomics >> Single Nucleus >> Aggregate Analysis: Aggregate Analysis allows users to visualize average gene expression using both bubble plots and heatmaps across various available factors, including cell types, subclusters, and AD disease/pathology

      (Watch Screencast)

    • Transcriptomics >> Single Nucleus >> Cell Level Analysis: Cell Level Analysis allows users to visualize and explore cell proportions and cell-level information using dimensionality reduction plots

      (Watch Screencast)

  • Bulk Expression:
    • Transcriptomics >> Bulk Expression >> Network Plot: Network Plot allows users to explore relationships between genes of interest, visualized as edges in a graph, based on the STRING database of known and predicted protein-protein interactions

      (Watch Screencast)

    • Transcriptomics >> Bulk Expression >> Regional Expression: Regional Expression allows users to explore and visualize transcriptomic datasets for brain regions of interest, with accession codes (e.g., GEO and Synapse IDs) available for downstream query and retrieval

      (Watch Screencast)

    • Transcriptomics >> Bulk Expression >> Differential Gene Expression: Differential gene expression results across various covariates for a given study can be queried using a list of gene symbols

      (Watch Screencast)

    • Transcriptomics >> Bulk Expression >> Box Plots/Heatmaps: Users can create box plots and heatmaps of gene expression across sex, APOE genotype, Braak neurofibrillary tangle stage, CERAD neuritic plaque score, and/or diagnosis (e.g., AD, progressive supranuclear palsy, pathologic aging, or elderly controls)

      (Watch Screencast)

Genetics: Genetics is the analysis of genetic data. The Genetics menu allows users to visualize genetics data through the integration of two GWAS datasets: the International Genomics of Alzheimer's Project (IGAP) meta-analysis and the Accelerating Medicines Partnership Program for Alzheimer's Disease (AMP-AD) GWAS results.

  • Genetics >> GWAS: GWAS analysis allows users to query and visualize these GWAS datasets using either gene or single nucleotide polymorphism (SNP) identifiers

    (Watch Screencast)

  • Manhattan Plot: The Manhattan Plot allows users to visualize associations between genetic variants (SNPs) and the disease (in this case, AD) across the entire genome

Datasets


Bulk Transcriptomics



Single Cell Transcriptomics


Bulk Transcriptomics FAQ

ACRONYMS

   AD               Alzheimer's Disease (dementia, brain lesions suggestive of AD)
   B1, B2, B3       Braak: B1 = 0/I/II, B2 = III/IV, B3 = V/VI
   C0, C1, C2, C3   CERAD: C0 = None, C1 = Sparse, C2 = Moderate, C3 = Frequent
   DNAD             Dementia, Not AD (dementia without AD brain lesions)
   HD               Huntington's Disease
   MCI              Mild Cognitive Impairment
   NCI              No Cognitive Impairment (no dementia, no brain lesions suggestive of AD)
   PA               Pathologic Aging*
   PC               Preclinical AD (no dementia, brain lesions suggestive of AD)
   PSP              Progressive Supranuclear Palsy
   CPM              Counts Per Million mapped reads
   FPKM             Fragments Per Kilobase of transcript per Million mapped reads
   RC               Raw Counts
   uArray           Microarray
   logFC            Log2 fold change of case vs. control, as defined in the Contrast
   AveExpr          Average expression of the gene
   PValue           P-value testing the significance of differential expression of the gene
   adjPVal          P-value adjusted for multiple comparisons
   Gene Symbol      Official NCBI gene symbol
   EntrezID         Official NCBI Gene ID
   

* Mayo RNAseq Study: Subjects with PA had a Braak NFT stage of III or less but CERAD neuritic and cortical plaque densities of 2 or more. None of the PA subjects had a clinical diagnosis of dementia or mild cognitive impairment, and none had any of the following pathologic diagnoses: AD, Parkinson’s disease (PD), dementia with Lewy bodies (DLB), vascular dementia (VaD), PSP, motor neuron disease (MND), corticobasal degeneration (CBD), Pick’s disease (PiD), Huntington’s disease (HD), frontotemporal lobar degeneration (FTLD), hippocampal sclerosis (HipScl), or dementia lacking distinctive histology (DLDH).
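As an illustration of the CPM and FPKM units listed in the acronyms, the following minimal Python sketch computes both for a single sample. It is not the DataLENS pipeline; the function names are hypothetical, and it assumes that the library size equals the sum of the counts passed in.

```python
def cpm(counts):
    """Counts per million mapped reads for one sample.

    Assumption: the library size is the sum of the supplied counts.
    """
    total = sum(counts)
    return [c * 1e6 / total for c in counts]


def fpkm(counts, lengths_bp):
    """Fragments per kilobase of transcript per million mapped fragments.

    lengths_bp holds the transcript length in base pairs for each gene.
    """
    total = sum(counts)
    # count / (length / 1e3) / (total / 1e6) == count * 1e9 / (total * length)
    return [c * 1e9 / (total * length) for c, length in zip(counts, lengths_bp)]


if __name__ == "__main__":
    counts = [100, 300, 600]
    lengths = [1000, 2000, 3000]
    print(cpm(counts))
    print(fpkm(counts, lengths))
```

FPKM divides by transcript length as well as library size, so longer genes are not favored simply because they produce more fragments.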

Differential Expression Analysis

The differential expression analyses were conducted between the groups listed below (Stratification Factors). The contrast specifies the two groups compared in each differential gene expression analysis. Processed microarray and RNA-Seq data were downloaded from the AMP-AD Knowledge Portal. All RNA-Seq data provided as Fragments Per Kilobase of transcript per Million mapped reads (FPKM) were log-transformed. RNA-Seq raw counts were normalized with the edgeR package and transformed with the voom function of the limma package in R to prepare them for linear modeling. Differential expression analysis was performed using the limma package in R. P-values were adjusted for multiple comparisons using the Benjamini-Hochberg method, which controls the false discovery rate. We analyzed the data using Braak Stage, CERAD Neuritic Plaque Score, Clinical Dementia Rating (CDR), and a Combined Neuropathological and Clinical Score.
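To make the multiple-comparison adjustment concrete, here is a minimal pure-Python sketch of the Benjamini-Hochberg procedure that produces values such as adjPVal. It is an illustration only, not the limma code used by DataLENS, and the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Each raw p-value is scaled by the number of tests divided by its rank, and the running minimum ensures that a gene with a smaller raw p-value never receives a larger adjusted value than a gene ranked below it.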

Braak Stage

A neuropathological score assessing the distribution of tau neurofibrillary tangles in the subject's brain (Braak and Braak, 1991)

  • B1: stages 0 (normal), I/II (transentorhinal)
  • B2: stages III/IV (limbic)
  • B3: stages V/VI (neocortical)

CERAD Neuritic Plaque Score

A neuropathological score assessing the frequency of beta-amyloid neuritic plaques in the subject's brain (Mirra et al., 1991)

  • C0: None / Not AD
  • C1: Sparse / Possible AD
  • C2: Moderate / Probable AD
  • C3: Frequent / Definite AD

Clinical Dementia Rating (CDR)

A clinical score for dementia (Morris, 1993; Balsis et al., 2015)

  • NCI: No cognitive impairment (CDR <= 0.5)
  • AD: Alzheimer's disease (CDR > 0.5)

Combined Neuropathological and Clinical Score: Composite Diagnosis (CpDx)

NPScore:

First, a combined neuropathological score (NPScore) is derived from the Braak and CERAD scores of a given subject. The rationale is based on the following articles:

  • Hyman and Trojanowski, 1997
  • Hyman et al., 2012
  • Serrano-Pozo et al., 2016

Briefly, the latest recommendations from the National Institute on Aging - Alzheimer's Association (Hyman et al., 2012) include Braak, CERAD, and the Thal phases for the neuropathological assessment of AD. However, the Thal phase was shown not to be significantly associated with cognition (Serrano-Pozo et al., 2016), and it is not available as a covariate for any dataset. Therefore, only Braak and CERAD are used.


The NPScore determination is adapted from the 1997 (Hyman and Trojanowski, 1997) and 2012 (Hyman et al., 2012) NIA recommendations.

  • Braak: B1 = 0/I/II, B2 = III/IV, B3 = V/VI
  • CERAD: C0 = None, C1 = Sparse, C2 = Moderate, C3 = Frequent

Neuropathological Score (NPScore)


  • 1 = Not AD
  • 2 = Low probability of AD
  • 3 = Intermediate probability of AD
  • 4 = High probability of AD

Composite Diagnosis (CpDx)

The final composite diagnosis (CpDx) is determined by combining the above neuropathological score with a clinical staging of dementia. A subject is considered non-demented if documented with a CDR of 0.5 or lower, or an MMSE score of 26 or higher. These thresholds were chosen according to Balsis et al. (2015).

  • MMSE: 1 = [30 - 26], 2 = [25 - 0]
  • CDR: 1 = [0 - 0.5], 2 = [1 - 5]
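The clinical staging rule can be written as a short function. This is a sketch under stated assumptions: the function name is hypothetical, a missing score is passed as None, and when both scores are available a subject counts as non-demented if either one meets its threshold, following the "or" in the rule above.

```python
def clinical_stage(cdr=None, mmse=None):
    """Return 1 (non-demented) or 2 (demented) from CDR and/or MMSE.

    Thresholds follow the text: CDR <= 0.5 or MMSE >= 26 is non-demented.
    """
    if cdr is None and mmse is None:
        raise ValueError("at least one of cdr or mmse is required")
    cdr_ok = cdr is not None and cdr <= 0.5
    mmse_ok = mmse is not None and mmse >= 26
    return 1 if (cdr_ok or mmse_ok) else 2


if __name__ == "__main__":
    print(clinical_stage(cdr=0.5))   # non-demented by CDR
    print(clinical_stage(mmse=25))   # demented by MMSE
```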

Three subject distributions can be distinguished, with distinct levels of stringency:

  • CpDxStrict: ignores the subjects with low or intermediate NPScores (2 or 3).
  • CpDxLow: includes subjects with low NPScores.
  • CpDxAll: includes all subjects regardless of their NPScore.

NCI = No Cognitive Impairment (no dementia, no brain lesions suggestive of AD)
PCAD = Preclinical AD (no dementia, brain lesions suggestive of AD)
AD = Alzheimer's Disease (dementia associated with AD brain lesions)
DNAD = Dementia, Not AD (dementia without AD brain lesions)
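For the strictest stratification, the four labels above can be derived as in the sketch below. The function name is hypothetical. The sketch assumes that an NPScore of 1 means "no brain lesions suggestive of AD" and an NPScore of 4 means "brain lesions suggestive of AD". It covers only CpDxStrict, because how CpDxLow and CpDxAll assign NPScores of 2 and 3 is not stated here.

```python
def cpdx_strict(np_score, demented):
    """Composite diagnosis under the CpDxStrict stratification.

    np_score: neuropathological score, 1 to 4.
    demented: True if the clinical staging indicates dementia.
    Returns None for subjects that CpDxStrict ignores (NPScore 2 or 3).
    """
    if np_score in (2, 3):
        return None  # low or intermediate NPScore: excluded
    if np_score not in (1, 4):
        raise ValueError("np_score must be between 1 and 4")
    # Assumption: NPScore 4 = AD lesions present, NPScore 1 = absent.
    has_ad_lesions = np_score == 4
    if demented:
        return "AD" if has_ad_lesions else "DNAD"
    return "PCAD" if has_ad_lesions else "NCI"


if __name__ == "__main__":
    for score in (1, 2, 4):
        print(score, cpdx_strict(score, demented=False), cpdx_strict(score, demented=True))
```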

Single Nucleus FAQ

The single nucleus RNA-seq (snRNA-seq) datasets were downloaded from the respective study sites. The downloaded data were processed with the Seurat R package (version 4.0.0), which is widely used for the analysis of single-nucleus studies. Subsequently, the ShinyCell package was used to convert the results to .RDS objects that are then loaded into DataLENS.

For the aggregate analysis, the average expression across all cells and the proportion of cells in which the gene is expressed were computed for each gene, grouped by a user-specified variable of interest (e.g., cell type).

For the cell level analysis, if processed data were not available, we removed cells with fewer than 200 genes, more than 20,000 unique molecular identifiers (UMIs), and/or more than 15% of counts from mitochondrial genes, and used reciprocal principal component analysis (rPCA) integration based on the top 2,000 highly variable genes (HVGs) to remove donor-specific effects. Gene expression data were log-normalized, scaled, and subjected to principal component analysis (PCA) to choose the number of principal components for clustering, followed by non-linear dimensionality reduction via Uniform Manifold Approximation and Projection (UMAP). Cell-level UMAP embeddings were created using the RunUMAP function from Seurat. The proportions of cells across user-specified variables of interest (e.g., cell types in disease vs. control donors) were also computed.
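The cell filtering thresholds described above can be expressed as a simple predicate. This is a plain Python sketch, not the Seurat code used by DataLENS; the function name, argument names, and the example cells are hypothetical.

```python
def passes_qc(n_genes, n_umis, pct_mito,
              min_genes=200, max_umis=20000, max_pct_mito=15.0):
    """Return True if a cell survives the quality-control filter.

    A cell is removed if it has fewer than min_genes detected genes,
    more than max_umis UMIs, or more than max_pct_mito percent
    mitochondrial content.
    """
    return (n_genes >= min_genes
            and n_umis <= max_umis
            and pct_mito <= max_pct_mito)


if __name__ == "__main__":
    # Hypothetical cells: (detected genes, UMIs, percent mitochondrial)
    cells = [(1500, 4000, 2.1), (150, 300, 1.0), (3000, 25000, 3.0), (900, 2000, 22.5)]
    kept = [cell for cell in cells if passes_qc(*cell)]
    print(kept)
```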

The following terms are used throughout the Aggregate Analysis & Cell Level Analysis Pages:

  • Genes: Users can explore gene expression with lists of official NCBI gene symbols (e.g., APOE, GFAP, SERPINE1).
  • Cell Type: Cell Types that the user can group/subset gene expression data by include:
    • astrocytes (astro)
    • endothelial cells (endo)
    • microglia (mg)
    • neurons (neuron)
    • inhibitory neurons (in)
    • oligodendrocytes (oligo)
    • oligodendrocyte precursor cells (OPC)
    • pericytes (per)

Group By

Within the selected dataset, users have the option to group cells based on various categories, allowing users to visualize group differences. Users can group cells by the following criteria: Sex, Disease Pathology, Pathology Stage, Amyloid Levels, and Cell Types.

Subset By

Users can subset cells by Sex, Disease Pathology, Pathology Stage, Cell Type, and Amyloid Levels depending on the dataset selected.

X-Axis Variable

Users can choose to plot Sex, Disease Pathology, Pathology Stage, Cell Type, or Amyloid Concentration on the x-axis, based on the dataset selected. The dimensionality reduction plots explore cell-level information based on the x-axis variable chosen by the user.

Genetics FAQ

The Genetics page of DataLENS provides access to valuable genetic data through its integration of two GWAS datasets: the International Genomics of Alzheimer's Project (IGAP) meta-analysis and the Accelerating Medicines Partnership Program for Alzheimer's Disease (AMP-AD) GWAS results. The association results were downloaded directly from the study websites.

Acronyms

  • Genes: Users can query GWAS results with lists of official NCBI gene symbols (e.g., APOE, GFAP, SERPINE1).
  • SNP IDs: Users can explore specific genetic variants of interest by entering various single nucleotide polymorphism (SNP) identifiers.

Manhattan Plot

The interactive Manhattan plot displays the associations between genetic variants (SNPs) and the disease (in this case, AD) across the entire genome. Each SNP is represented as a point on the plot, with its position on the x-axis corresponding to its genomic location, and the significance [–log10 (p-value)] of its association with the disease on the y-axis.
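The coordinates of each point can be computed as in the following sketch, which places chromosomes end to end on the x-axis and transforms p-values for the y-axis. The function name, input layout, and toy chromosome lengths are assumptions for illustration; they are not the DataLENS implementation.

```python
import math


def manhattan_coordinates(snps, chrom_lengths):
    """Compute (x, y) plot coordinates for a list of SNPs.

    snps: iterable of (chromosome, position, p_value) tuples.
    chrom_lengths: dict mapping chromosome -> length, in plotting order.
    x is the cumulative genomic position; y is -log10(p_value).
    """
    offsets = {}
    running_total = 0
    for chrom, length in chrom_lengths.items():
        offsets[chrom] = running_total
        running_total += length
    return [(offsets[chrom] + pos, -math.log10(p)) for chrom, pos, p in snps]


if __name__ == "__main__":
    # Toy chromosome lengths and SNPs, for illustration only.
    lengths = {"1": 1000, "2": 500}
    snps = [("1", 100, 1e-8), ("2", 50, 0.01)]
    for x, y in manhattan_coordinates(snps, lengths):
        print(x, round(y, 2))
```

Because the y-axis is on a negative log scale, smaller p-values appear higher on the plot, so the strongest associations stand out as peaks.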

Aggregate Analysis


              

Bubble Plot

In the plot, the size of each bubble represents the proportion of cells in which the gene is expressed, and its color represents average gene expression.
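The two quantities shown in a bubble plot can be computed per group as in this sketch for a single gene. It assumes that a cell "expresses" the gene when its value is greater than zero; the function name and the example values are hypothetical.

```python
from collections import defaultdict


def aggregate_expression(values, groups):
    """Summarize one gene's expression per group (e.g., per cell type).

    values: expression value of the gene in each cell.
    groups: group label of each cell, in the same order as values.
    Assumption: a cell expresses the gene if its value is > 0.
    """
    by_group = defaultdict(list)
    for value, group in zip(values, groups):
        by_group[group].append(value)
    return {
        group: {
            "mean_expression": sum(vals) / len(vals),
            "fraction_expressing": sum(v > 0 for v in vals) / len(vals),
        }
        for group, vals in by_group.items()
    }


if __name__ == "__main__":
    values = [0, 2, 4, 0, 0, 3]
    groups = ["astro", "astro", "astro", "mg", "mg", "mg"]
    print(aggregate_expression(values, groups))
```

In a bubble plot, mean_expression would drive the color and fraction_expressing the bubble size for each gene and group.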

Heatmap

Cell Level Analysis

Dimensionality Reduction Plot

Gene Expression

Proportion Plot

Network Plot

Nodes in the PPI network are colored according to fold-change in that dataset while size represents significance. Edge width represents the combined score of evidence for interaction between two nodes.

Please note that plots may take several seconds to load.

Regional Expression

Differential Gene Expression


Box Plots

Heatmaps

Genetic Association Query
