Alzheimer DataLENS

Open data analytics portal to advance Alzheimer’s disease research by enabling the analysis, visualization, and sharing of -omics data.

Single Cell Transcriptomics

Explore gene expression profiles across different cell types.

Bulk Transcriptomics

Explore gene expression profiles across different brain regions.

Genetics

Explore genome-wide association studies through gene queries.

DataLENS applies consistent pipelines to process and analyze public -omics data, provides easy-to-use web interfaces to query and visualize these analyses, and combines information from multiple heterogeneous modalities to give neuroscientists an integrated view of molecular mechanisms.

How to Cite?

Noori, A., Jayakumar, R., Moturi, V., Li, Z., Liu, R., Serrano-Pozo, A., Hyman, B.T. and Das, S. Alzheimer DataLENS: an open data analytics portal for Alzheimer’s Disease research. Journal of Alzheimer’s Disease. 2024; 99(s2), pp.S397-S407. PMID: 38306039.

About

Alzheimer DataLENS is an open data analytics platform that aims to advance research in Alzheimer’s disease (AD) and related dementias by making -omics data accessible to everyday researchers through:

Consistent pipelines to process and analyze public -omics data from AMP-AD and other sources.
Easy-to-use web interfaces for query and visualization of these analytics.
Information from multiple heterogeneous modalities to present an integrated view of molecular mechanisms to a neuroscientist.
Tools and methods open to all bioinformatics researchers.

Alzheimer DataLENS allows exploration of the following types of data:

Single-cell transcriptomics studies, including cell and sample-level queries of public datasets.
Bulk transcriptomics studies, including query and visualization of public human datasets spanning multiple brain regions and cohorts.
GWAS studies, including query and visualization of IGAP meta-analysis and AMP-AD GWAS results.

Alzheimer DataLENS was initiated by the Massachusetts Center for Alzheimer Therapeutics Science (massCATS), which is a public-private partnership to discover new treatments for Alzheimer's disease, organized through the Massachusetts Life Sciences Center. Leading academic researchers from the Massachusetts General Hospital, Broad Institute, Harvard Medical School, and MIT are working with healthcare and pharmaceutical partners to find new techniques, mechanisms and drug targets in the fight against Alzheimer's – a disease affecting 40 million people worldwide for which there is currently no cure.

Alzheimer DataLENS is also supported by IOS Press, which publishes the Journal of Alzheimer’s Disease (JAD).

How to Use AlzDataLENS

Transcriptomics: Transcriptomics is the analysis of gene expression data. The Transcriptomics Menu allows users to query and visualize Bulk & Single Cell Transcriptomics Data.

Single Cell Transcriptomics:
- Transcriptomics >> Single Nucleus >> Aggregate Analysis: Aggregate Analysis allows users to visualize average gene expression using both bubble plots and heatmaps across various available factors, including cell types, subclusters, and AD disease/pathology.
- Transcriptomics >> Single Nucleus >> Cell Level Analysis: Cell Level Analysis allows users to visualize and explore cell proportions and cell-level information using dimensionality reduction plots.
Bulk Expression:
- Transcriptomics >> Bulk Expression >> Network Plot: Network Plot allows users to explore relationships between genes of interest, visualized as edges in a graph, based on the STRING database of known and predicted protein-protein interactions.
- Transcriptomics >> Bulk Expression >> Regional Expression: Regional Expression allows users to explore and visualize transcriptomic datasets for brain regions of interest, with accession codes (e.g., GEO and Synapse IDs) available for downstream query and retrieval.
- Transcriptomics >> Bulk Expression >> Differential Gene Expression: Differential gene expression results across various covariates for a given study can be queried using a list of gene symbols.
- Transcriptomics >> Bulk Expression >> Box Plots/Heatmaps: Users can create box plots and heatmaps of gene expression across sex, APOE genotype, Braak neurofibrillary tangle stage, CERAD neuritic plaque score, and/or diagnosis (e.g., AD, progressive supranuclear palsy, pathologic aging, or elderly controls).

Genetics: Genetics is the analysis of genetic data. The Genetics menu allows users to visualize genetics data through the integration of two GWAS datasets: the International Genomics of Alzheimer's Project (IGAP) meta-analysis and the Accelerating Medicines Partnership Program for Alzheimer's Disease (AMP-AD) GWAS results.

Genetics >> GWAS: GWAS analysis allows users to query and visualize these GWAS datasets using either gene or single nucleotide polymorphism (SNP) identifiers.
Manhattan Plot: The Manhattan Plot allows users to visualize associations between genetic variants (SNPs) and the disease (in this case, AD) across the entire genome.

Datasets

Bulk Transcriptomics

Single Cell Transcriptomics

Bulk Transcriptomics FAQ

ACRONYMS

AD	Alzheimer's Disease (dementia, brain lesions suggestive of AD)
B1, B2, B3	Braak: B1 = 0/I/II, B2 = III/IV, B3 = V/VI
C0, C1, C2, C3	CERAD: C0 = None, C1 = Sparse, C2 = Moderate, C3 = Frequent
DNAD	Dementia, Not AD (dementia without AD brain lesions)
HD	Huntington's Disease
MCI	Mild Cognitive Impairment
NCI	No Cognitive Impairment (no dementia, no brain lesions suggestive of AD)
PA	Pathologic Aging*
PC	Preclinical AD (No dementia, brain lesions suggestive of AD)
PSP	Progressive Supranuclear Palsy
CPM	Counts Per Million (CPM) mapped reads
FPKM	Fragments Per Kilobase of transcript per Million mapped reads
RC	Raw Counts
uArray	Microarray
logFC	Log2 Fold Change of the Case vs Control that are defined in the Contrast
AveExpr	Average Expression of the gene
PValue	P-value to test for significance of differential expression of the gene
adjPVal	P-value adjusted for multiple comparisons
Gene Symbol	Official NCBI gene symbol
EntrezID	Official NCBI Gene ID

* Mayo RNAseq Study: Subjects with PA had Braak NFT stage of III or less, but had CERAD neuritic and cortical plaque densities of 2 or more. None of the PA subjects had a clinical diagnosis of dementia or mild cognitive impairment. None of the PA subjects had the following pathologic diagnoses: AD, Parkinson’s disease (PD), DLB, VaD, PSP, motor neuron disease (MND), CBD, Pick’s disease (PiD), Huntington’s disease (HD), FTLD, hippocampal sclerosis (HipScl), or dementia lacking distinctive histology (DLDH).

Differential Expression Analysis

The differential expression analyses were conducted across the following groups (stratification factors). The contrast describes the two groups compared in each differential gene expression analysis. Processed microarray and RNA-Seq data were downloaded from the AMP-AD knowledge portal. All RNA-Seq data in Fragments Per Kilobase of transcript per Million mapped reads (FPKM) were log-transformed. RNA-Seq raw counts were normalized and transformed using the R edgeR package and the voom function from limma to prepare for linear modeling. Differential expression analysis was performed using the limma package in R. P-values were adjusted for multiple comparisons using the Benjamini-Hochberg method, which controls the false discovery rate. We analyzed the data using Braak Stage, CERAD Neuritic Plaque Score, Clinical Dementia Rating (CDR), and a Combined Neuropathological and Clinical Score.
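The Benjamini-Hochberg adjustment mentioned above can be illustrated with a minimal Python sketch. This is not the portal's code, which the text says uses limma in R; the function name `benjamini_hochberg` is hypothetical and the input p-values are made up for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases (monotonicity).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Illustrative p-values only; not taken from any DataLENS dataset.
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

A gene would then typically be called significant when its adjusted p-value (the adjPVal column) falls below a chosen false discovery rate threshold.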

Braak Stage

A neuropathological score assessing the distribution of tau neurofibrillary tangles in the subject's brain ( Braak and Braak, 1991 )

B1: stages 0 (normal), I/II (transentorhinal/entorhinal)
B2: stages III/IV (limbic, including hippocampus)
B3: stages V/VI (neocortical)
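
The grouping of Braak stages into B1, B2, and B3 can be written as a simple lookup. The sketch below is illustrative only; the names `BRAAK_GROUPS` and `braak_group` are hypothetical and are not part of DataLENS.

```python
# Mapping from Braak stage (as 0 or a Roman numeral) to the grouped label.
BRAAK_GROUPS = {
    "0": "B1", "I": "B1", "II": "B1",
    "III": "B2", "IV": "B2",
    "V": "B3", "VI": "B3",
}


def braak_group(stage):
    """Return 'B1', 'B2', or 'B3' for a Braak stage given as 0 or I through VI."""
    key = str(stage).strip().upper()
    if key not in BRAAK_GROUPS:
        raise ValueError(f"Unknown Braak stage: {stage!r}")
    return BRAAK_GROUPS[key]


if __name__ == "__main__":
    for s in ["0", "II", "IV", "VI"]:
        print(s, "->", braak_group(s))
```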

CERAD Neuritic Plaque Score

A neuropathological score assessing the frequency of beta-amyloid neuritic plaques in the subject's brain ( Mirra et al., 1991 )

C0: None / Not AD
C1: Sparse / Possible AD
C2: Moderate / Probable AD
C3: Frequent / Definite AD

Clinical Dementia Rating (CDR)

A clinical score for dementia ( Morris, 1993 ; Balsis et al., 2015 )

NCI: No cognitive impairment CDR <= 0.5
AD: Alzheimer's disease CDR > 0.5

Combined Neuropathological and Clinical Score: Composite Diagnosis (CpDx)

NPScore:

First, a combined neuropathological score (NPScore) is derived from the Braak and CERAD scores of a given subject. The rationale is based on the following articles:
Hyman and Trojanowski, 1997
Hyman et al., 2012
Serrano-Pozo et al., 2016

Briefly, the latest recommendations from the National Institute on Aging - Alzheimer's Association ( Hyman et al., 2012 ) include Braak, CERAD, and the Thal phases for the neuropathological assessment of AD. However, the Thal phase was shown not to be significantly associated with cognition ( Serrano-Pozo et al., 2016 ). It is also not available as a covariate for any dataset; therefore only Braak and CERAD are used.

The NPScore determination is adapted from the 1997 ( Hyman and Trojanowski, 1997 ) and 2012 ( Hyman et al., 2012 ) NIA recommendations.

Braak: B1 = 0/I/II, B2 = III/IV, B3 = V/VI
CERAD: C0 = None, C1 = Sparse, C2 = Moderate, C3 = Frequent

1 = Not AD
2 = Low probability of AD
3 = Intermediate probability of AD
4 = High probability of AD

Composite Diagnosis (CpDx)

The final composite diagnosis (CpDx) is determined by combining the above neuropathological score with a clinical staging of dementia. A subject is considered non-demented if documented with a CDR of 0.5 or lower or an MMSE score of 26 or higher. These thresholds were chosen according to Balsis et al. (2015).

MMSE: 1 = [30 - 26], 2 = [25 - 0]
CDR: 1 = [0 - 0.5], 2 = [1 - 5]
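
The clinical staging rule above can be sketched in Python as follows. This is an illustration, not the portal's code: the function name `clinical_stage` is hypothetical, and returning `None` when neither score is available is an assumption. The sketch follows the "or" in the text literally, so a subject with either a qualifying CDR or a qualifying MMSE is staged as non-demented.

```python
from typing import Optional


def clinical_stage(cdr: Optional[float] = None,
                   mmse: Optional[float] = None) -> Optional[int]:
    """Return 1 (non-demented), 2 (demented), or None if no score is given."""
    if cdr is None and mmse is None:
        return None  # assumption: subjects with no clinical score are left unstaged
    # Non-demented if CDR <= 0.5 or MMSE >= 26, per the thresholds stated above.
    non_demented = (cdr is not None and cdr <= 0.5) or (mmse is not None and mmse >= 26)
    return 1 if non_demented else 2


if __name__ == "__main__":
    print(clinical_stage(cdr=0.5))   # stage 1
    print(clinical_stage(mmse=20))   # stage 2
```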

Three subject distributions can be distinguished with distinct levels of stringency.
CpDxStrict: ignores the subjects with low or intermediate NPScores (2 or 3).
CpDxLow: includes subjects with low NPScores.
CpDxAll: includes all subjects regardless of their NPScore.

NCI = No Cognitive Impairment (no dementia, no brain lesions suggestive of AD)
PCAD = Preclinical AD (no dementia, brain lesions suggestive of AD)
AD = Alzheimer's Disease (dementia associated with AD brain lesions)
DNAD = Dementia, Not AD (dementia without AD brain lesions)
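
As a rough illustration of how the four labels could be assigned under the strictest distribution (CpDxStrict), consider the sketch below. It rests on assumptions not stated in the text: that an NPScore of 1 counts as "no brain lesions suggestive of AD" and an NPScore of 4 counts as "brain lesions suggestive of AD". The function name `cpdx_strict` is hypothetical, and the handling of NPScores 2 and 3 under CpDxLow and CpDxAll is not specified here, so it is not modeled.

```python
def cpdx_strict(np_score, demented):
    """Return 'NCI', 'PCAD', 'AD', 'DNAD', or None if the subject is excluded.

    np_score: combined neuropathological score, 1 through 4.
    demented: True if the subject is clinically demented.
    """
    if np_score not in (1, 2, 3, 4):
        raise ValueError(f"NPScore must be 1-4, got {np_score!r}")
    # CpDxStrict ignores subjects with low or intermediate NPScores.
    if np_score in (2, 3):
        return None
    # Assumption: NPScore 4 means AD lesions are present, NPScore 1 means absent.
    has_ad_lesions = np_score == 4
    if demented:
        return "AD" if has_ad_lesions else "DNAD"
    return "PCAD" if has_ad_lesions else "NCI"


if __name__ == "__main__":
    print(cpdx_strict(1, demented=False))  # NCI
    print(cpdx_strict(4, demented=True))   # AD
```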

Single Nucleus FAQ

The single nucleus RNA-seq (snRNA-seq) datasets were downloaded from the respective study sites. The downloaded data were processed with the Seurat R package (version 4.0.0), which is often used for analysis of single-nucleus studies. Subsequently, the ShinyCell package was used to convert the results to .RDS objects that are then loaded into DataLENS.

For the aggregate analysis, the average expression across all cells and the proportion of cells in which the gene is expressed were computed for each gene, by a user-specified variable of interest (e.g., different cell types).

For the cell level analysis, if processed data were not available, we removed cells with fewer than 200 genes, greater than 20,000 unique molecular identifiers (UMIs), and/or greater than 15% mitochondrial genes, and used reciprocal principal component analysis (rPCA) integration based on the top 2,000 highly variable genes (HVG) to remove donor-specific effects. Gene expression data were log-normalized, scaled, and subjected to Principal Component Analysis (PCA) to choose the number of principal components for clustering, which was followed by non-linear dimensionality reduction via Uniform Manifold Approximation and Projection (UMAP). Cell level UMAP embeddings were created using the RunUMAP function from Seurat. The proportions of cells across user-specified variables of interest (e.g., cell types in disease vs. control donors) were also computed.
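
The cell-level quality control thresholds described above can be expressed as a small filter. The Python sketch below is only an illustration of the stated cutoffs; the actual processing was done with Seurat in R, and the record field names (`n_genes`, `n_umis`, `pct_mito`) and function names are hypothetical.

```python
def passes_qc(n_genes, n_umis, pct_mito,
              min_genes=200, max_umis=20000, max_pct_mito=15.0):
    """Return True if a cell survives the QC cutoffs stated in the text."""
    # Cells are removed if they have fewer than 200 genes, more than
    # 20,000 UMIs, and/or more than 15% mitochondrial content.
    return (n_genes >= min_genes
            and n_umis <= max_umis
            and pct_mito <= max_pct_mito)


def filter_cells(cells):
    """Keep only the cell records (dicts) that pass QC."""
    return [c for c in cells
            if passes_qc(c["n_genes"], c["n_umis"], c["pct_mito"])]


if __name__ == "__main__":
    # Made-up cells for illustration only.
    cells = [
        {"id": "cell1", "n_genes": 1500, "n_umis": 4000, "pct_mito": 2.0},
        {"id": "cell2", "n_genes": 150, "n_umis": 300, "pct_mito": 1.0},
        {"id": "cell3", "n_genes": 3000, "n_umis": 25000, "pct_mito": 3.0},
        {"id": "cell4", "n_genes": 900, "n_umis": 2000, "pct_mito": 22.5},
    ]
    print([c["id"] for c in filter_cells(cells)])
```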

The following term(s) are used throughout the Aggregate Analysis & Cell Level Analysis Pages:

Genes: Users can explore gene expression with lists of official NCBI gene symbols (e.g., APOE, GFAP, SERPINE1).
Cell Type: Cell Types that the user can group/subset gene expression data by include:
- astrocytes (astro)
- endothelial cells (endo)
- microglia (mg)
- neurons (neuron)
- inhibitory neurons (in)
- oligodendrocytes (oligo)
- oligodendrocyte precursor cells (OPC)
- pericytes (per)

Group By

Within the selected dataset, users have the option to group cells based on various categories, allowing users to visualize group differences. Users can group cells by the following criteria: Sex, Disease Pathology, Pathology Stage, Amyloid Levels, and Cell Types.

Subset By

Users can subset cells by Sex, Disease Pathology, Pathology Stage, Cell Type, and Amyloid Levels depending on the dataset selected.

X-Axis Variable

Users can choose to plot Sex, Disease Pathology, Pathology Stage, Cell Type, or Amyloid Concentration on the x-axis, based on the dataset selected. The dimensionality reduction plots explore cell-level information based on the x-axis variable chosen by the user.

Genetics FAQ

The Genetics page of DataLENS provides access to valuable genetic data through its integration of two GWAS datasets: the International Genomics of Alzheimer's Project (IGAP) meta-analysis and the Accelerating Medicines Partnership Program for Alzheimer's Disease (AMP-AD) GWAS results. The association results were downloaded directly from the study websites.

Acronyms

Genes: Users can explore gene expression with lists of official NCBI gene symbols (e.g., APOE, GFAP, SERPINE1).
SNP IDs: Users can explore specific genetic variants of interest by entering various single nucleotide polymorphism (SNP) identifiers.

Manhattan Plot

The interactive Manhattan plot displays the associations between genetic variants (SNPs) and the disease (in this case, AD) across the entire genome. Each SNP is represented as a point on the plot: its position on the x-axis corresponds to its genomic location, and its position on the y-axis corresponds to the significance [-log10(p-value)] of its association with the disease.
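
The coordinates of a Manhattan plot can be computed as in the following minimal Python sketch. It is an illustration rather than the portal's implementation: the function name `manhattan_coordinates` is hypothetical, the chromosome lengths in the example are toy values, and p-values are assumed to be strictly greater than zero.

```python
import math


def manhattan_coordinates(snps, chrom_lengths):
    """Return (x, y) pairs for a Manhattan plot.

    snps: iterable of (chromosome, position, p_value) tuples.
    chrom_lengths: dict of chromosome -> length, in plotting order.
    """
    # Cumulative offset of each chromosome along a single genome-wide x-axis.
    offsets = {}
    running = 0
    for chrom, length in chrom_lengths.items():
        offsets[chrom] = running
        running += length
    points = []
    for chrom, position, p_value in snps:
        x = offsets[chrom] + position   # genomic location
        y = -math.log10(p_value)        # significance; requires p_value > 0
        points.append((x, y))
    return points


if __name__ == "__main__":
    # Toy chromosome lengths and SNPs, for illustration only.
    lengths = {"1": 1000, "2": 500}
    snps = [("1", 100, 1e-8), ("2", 50, 0.01)]
    print(manhattan_coordinates(snps, lengths))
```

Plotting these points as a scatter, colored by chromosome, gives the familiar layout in which strongly associated loci appear as tall peaks.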

Single-cell Differential Gene Expression

Network Plot

Nodes in the protein-protein interaction (PPI) network are colored according to fold-change in the selected dataset, while node size represents significance. Edge width represents the combined score of evidence for interaction between two nodes.

Bubble Plot

In the bubble plot, the size of each bubble represents the proportion of cells in which the gene is expressed, and the color represents average gene expression.
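
The two quantities encoded in a bubble plot can be computed per group as in the sketch below. This is a Python illustration under stated assumptions, not the portal's code: a gene is treated as "expressed" in a cell when its value is greater than zero, and the function name `bubble_stats` and the example values are hypothetical.

```python
from collections import defaultdict


def bubble_stats(expression, groups):
    """Compute per-group mean expression and proportion of expressing cells.

    expression: per-cell expression values for one gene.
    groups: per-cell group labels (e.g., cell types), same length.
    """
    if len(expression) != len(groups):
        raise ValueError("expression and groups must have the same length")
    by_group = defaultdict(list)
    for value, group in zip(expression, groups):
        by_group[group].append(value)
    stats = {}
    for group, values in by_group.items():
        stats[group] = {
            # Bubble color: average expression across all cells in the group.
            "mean_expression": sum(values) / len(values),
            # Bubble size: fraction of cells with non-zero expression (assumption).
            "proportion_expressed": sum(1 for v in values if v > 0) / len(values),
        }
    return stats


if __name__ == "__main__":
    # Made-up values for a single gene across four cells.
    print(bubble_stats([0.0, 2.0, 4.0, 0.0], ["astro", "astro", "mg", "mg"]))
```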
