Open data analytics portal to advance Alzheimer’s disease research by enabling the analysis, visualization, and sharing of -omics data.
Explore gene expression profiles across different cell types.
Explore gene expression profiles across different brain regions.
DataLENS applies consistent pipelines to process and analyze public -omics data, provides easy-to-use web interfaces to query and visualize these analyses, and uses information from multiple heterogeneous modalities to present an integrated view of molecular mechanisms to neuroscientists.
Alzheimer DataLENS is an open data analytics platform that aims to advance research in Alzheimer’s disease (AD) and related dementias by making -omics data accessible to everyday researchers through:
Alzheimer DataLENS allows exploration of the following types of data:
Alzheimer DataLENS was initiated by the Massachusetts Center for Alzheimer Therapeutics Science (massCATS), which is a public-private partnership to discover new treatments for Alzheimer's disease, organized through the Massachusetts Life Sciences Center. Leading academic researchers from the Massachusetts General Hospital, Broad Institute, Harvard Medical School, and MIT are working with healthcare and pharmaceutical partners to find new techniques, mechanisms and drug targets in the fight against Alzheimer's – a disease affecting 40 million people worldwide for which there is currently no cure.
Alzheimer DataLENS is also supported by IOS Press, which publishes the Journal of Alzheimer’s Disease (JAD).
Transcriptomics: Transcriptomics is the analysis of gene expression data. The Transcriptomics Menu allows users to query and visualize Bulk & Single Cell Transcriptomics Data.
Genetics: Genetics is the analysis of genetic data. The Genetics menu allows users to visualize genetics data through the integration of two GWAS datasets: the International Genomics of Alzheimer's Project (IGAP) meta-analysis [31] and the Accelerating Medicines Partnership Program for Alzheimer's Disease (AMP-AD) GWAS results.
AD | Alzheimer's Disease (dementia, brain lesions suggestive of AD) |
B1, B2, B3 | Braak: B1 = 0/I/II, B2 = III/IV, B3 = V/VI |
C0, C1, C2, C3 | CERAD: C0 = None, C1 = Sparse, C2 = Moderate, C3 = Frequent |
DNAD | Dementia, Not AD (dementia without AD brain lesions) |
HD | Huntington's Disease |
MCI | Mild Cognitive Impairment |
NCI | No Cognitive Impairment (no dementia, no brain lesions suggestive of AD) |
PA | Pathologic Aging* |
PC | Preclinical AD (No dementia, brain lesions suggestive of AD) |
PSP | Progressive Supranuclear Palsy |
CPM | Counts Per Million mapped reads |
FPKM | Fragments Per Kilobase of transcript per Million mapped reads |
RC | Raw Counts |
uArray | Microarray |
logFC | Log2 Fold Change of the Case vs Control that are defined in the Contrast |
AveEXpr | Average Expression of the gene |
PValue | P-value to test for significance of differential expression of the gene |
adjPVal | P-value adjusted for multiple comparisons |
Gene Symbol | Official NCBI gene symbol |
EntrezID | Official NCBI Gene ID |
* Mayo RNAseq Study: Subjects with PA had Braak NFT stage of III or less, but had CERAD neuritic and cortical plaque densities of 2 or more. None of the PA subjects had a clinical diagnosis of dementia or mild cognitive impairment. None of the PA subjects had the following pathologic diagnoses: AD, Parkinson’s disease (PD), DLB, VaD, PSP, motor neuron disease (MND), CBD, Pick’s disease (PiD), Huntington’s disease (HD), FTLD, hippocampal sclerosis (HipScl), or dementia lacking distinctive histology (DLDH).
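The CPM and FPKM units listed in the abbreviation table can be illustrated with a small calculation. This is a minimal sketch, not the DataLENS pipeline: the gene names, counts, and transcript lengths are made-up toy values, and the library size is assumed to be the sum of the counts shown rather than the total number of mapped reads.

```python
def cpm(counts):
    """Counts per million: scale each gene's count by the library size."""
    total = sum(counts.values())
    return {gene: c * 1e6 / total for gene, c in counts.items()}


def fpkm(counts, lengths_bp):
    """Fragments per kilobase of transcript per million mapped fragments.

    Assumes `counts` holds fragment counts and `lengths_bp` holds
    transcript lengths in base pairs.
    """
    total = sum(counts.values())
    return {
        gene: c * 1e9 / (total * lengths_bp[gene])
        for gene, c in counts.items()
    }


# Toy values for illustration only (hypothetical genes).
counts = {"GENE_A": 500, "GENE_B": 1500, "GENE_C": 8000}
lengths = {"GENE_A": 1000, "GENE_B": 3000, "GENE_C": 2000}

cpm_values = cpm(counts)
fpkm_values = fpkm(counts, lengths)
print(cpm_values)
print(fpkm_values)
```

In this toy example GENE_A and GENE_B have different CPM values but the same FPKM, because FPKM additionally divides by transcript length.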
The differential expression analyses were conducted between the following groups (Stratification Factors). The contrast describes which two groups are compared in each differential gene expression analysis. Processed microarray and RNA-Seq data were downloaded from the AMP-AD Knowledge Portal. All RNA-Seq FPKM (Fragments Per Kilobase of transcript per Million mapped reads) data were log-transformed. RNA-Seq raw counts were normalized and transformed using the R package edgeR and the voom method (part of limma) to prepare for linear modeling. Differential expression analysis was performed using the limma package in R. P-values were adjusted for multiple comparisons using the Benjamini-Hochberg method, which controls the false discovery rate. We analyzed the data using Braak Stage, CERAD Neuritic Plaque Score, Clinical Dementia Rating (CDR), and a Combined Neuropathological and Clinical Score.
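To make the multiple-comparison step concrete, the Benjamini-Hochberg adjustment can be sketched in a few lines of plain Python. This is an illustrative re-implementation under the standard definition of the procedure, not the code DataLENS runs (the portal uses limma in R); the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so that each adjusted value
    # is never larger than the one ranked above it (monotonicity).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four genes.
pvals = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(pvals)
print(adj)
```

The adjusted values correspond to the adjPVal column described in the abbreviation table; a gene is typically called significant when its adjusted value falls below a chosen false discovery rate.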
Braak Stage: A neuropathological score assessing the distribution of tau neurofibrillary tangles in the subject's brain (Braak and Braak, 1991)
CERAD Neuritic Plaque Score: A neuropathological score assessing the frequency of beta-amyloid neuritic plaques in the subject's brain (Mirra et al., 1991)
Clinical Dementia Rating (CDR): A clinical score for dementia (Morris, 1993; Balsis et al., 2015)
First, a combined neuropathological score (NPScore) is derived from the Braak and CERAD scores of a given subject. The rationale is based on the following articles:
Hyman and Trojanowski, 1997
Hyman et al., 2012
Serrano-Pozo et al., 2016
Briefly, the latest recommendations from the National Institute on Aging - Alzheimer's Association (Hyman et al., 2012) include Braak, CERAD, and Thal phases for the neuropathological assessment of AD. However, the Thal phase was shown not to be significantly associated with cognition (Serrano-Pozo et al., 2016). It is also not available as a covariate for any dataset, so only Braak and CERAD are used.
The NPScore determination is adapted from the 1997 (Hyman and Trojanowski, 1997) and 2012 (Hyman et al., 2012) NIA recommendations.
The final composite diagnosis (CpDx) is determined by combining the above neuropathological score with a clinical staging of dementia. A subject is considered non-demented if documented with a CDR of 0.5 or lower, or a Mini-Mental State Examination (MMSE) score of 26 or higher. These thresholds were chosen according to Balsis et al. (2015).
Three subject distributions can be distinguished with distinct levels of stringency.
CpDxStrict: excludes subjects with low or intermediate NPScores (2 or 3).
CpDxLow: includes subjects with low NPScores.
CpDxAll: includes all subjects regardless of their NPScore.
NCI = No Cognitive Impairment (no dementia, no brain lesions suggestive of AD)
PCAD = Preclinical AD (no dementia, brain lesions suggestive of AD)
AD = Alzheimer's Disease (dementia associated with AD brain lesions)
DNAD = Dementia, Not AD (dementia without AD brain lesions)
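The way the clinical thresholds and the four composite categories fit together can be sketched as follows. This is a hedged illustration only: the function names are hypothetical, and because the exact mapping from NPScore to "brain lesions suggestive of AD" is not spelled out here, the sketch takes that judgment as a ready-made boolean input (`ad_lesions`) instead of deriving it.

```python
from typing import Optional


def is_demented(cdr: Optional[float] = None,
                mmse: Optional[float] = None) -> bool:
    """Apply the stated thresholds: non-demented if CDR <= 0.5 or MMSE >= 26."""
    if cdr is None and mmse is None:
        raise ValueError("At least one of CDR or MMSE is required")
    non_demented = (
        (cdr is not None and cdr <= 0.5)
        or (mmse is not None and mmse >= 26)
    )
    return not non_demented


def composite_diagnosis(demented: bool, ad_lesions: bool) -> str:
    """Combine clinical status with neuropathology into a CpDx label.

    `ad_lesions` is an assumed input standing in for the NPScore-based
    call of whether AD-type brain lesions are present.
    """
    if demented:
        return "AD" if ad_lesions else "DNAD"
    return "PCAD" if ad_lesions else "NCI"


# Hypothetical subjects.
print(composite_diagnosis(is_demented(cdr=0.0), ad_lesions=False))  # NCI
print(composite_diagnosis(is_demented(cdr=0.5), ad_lesions=True))   # PCAD
print(composite_diagnosis(is_demented(cdr=2.0), ad_lesions=True))   # AD
print(composite_diagnosis(is_demented(mmse=18), ad_lesions=False))  # DNAD
```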
The single-nucleus RNA-seq (snRNA-seq) datasets were downloaded from the respective study sites. The downloaded data were processed with the Seurat R package (version 4.0.0), which is widely used for the analysis of single-nucleus data. Subsequently, the ShinyCell package was used to convert the results to .RDS objects that are then loaded into DataLENS. For the aggregate analysis, the average expression across all cells and the proportion of cells in which the gene is expressed were computed for each gene, grouped by a user-specified variable of interest (e.g., different cell types). For the cell level analysis, if processed data were not available, we removed cells with fewer than 200 genes, greater than 20,000 unique molecular identifiers (UMIs), and/or greater than 15% mitochondrial genes, and used reciprocal principal component analysis (rPCA) integration based on the top 2,000 highly variable genes (HVG) to remove donor-specific effects. Gene expression data were log-normalized, scaled, and subjected to Principal Component Analysis (PCA) to choose the number of principal components for clustering, which was followed by non-linear dimensionality reduction via Uniform Manifold Approximation and Projection (UMAP). Cell level UMAP embeddings were created using the RunUMAP function from Seurat. The proportion of cells across user-specified variables of interest (e.g., cell types in disease vs. control donors) was also computed.
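The cell-level quality-control thresholds described above can be expressed as a simple filter. The sketch below is in plain Python for illustration only (the actual processing was done with Seurat in R); the function name, field names, and cell records are hypothetical.

```python
def passes_qc(n_genes, n_umis, pct_mito):
    """Keep a cell unless it trips one of the three stated thresholds.

    A cell is removed if it has fewer than 200 detected genes, more than
    20,000 UMIs, or more than 15% mitochondrial content.
    """
    return n_genes >= 200 and n_umis <= 20_000 and pct_mito <= 15.0


# Hypothetical per-cell QC metrics.
cells = [
    {"id": "cell1", "n_genes": 1500, "n_umis": 4000, "pct_mito": 3.2},
    {"id": "cell2", "n_genes": 150, "n_umis": 300, "pct_mito": 1.0},
    {"id": "cell3", "n_genes": 6000, "n_umis": 25000, "pct_mito": 2.0},
    {"id": "cell4", "n_genes": 900, "n_umis": 2000, "pct_mito": 22.5},
    {"id": "cell5", "n_genes": 200, "n_umis": 20000, "pct_mito": 15.0},
]

kept = [
    c["id"]
    for c in cells
    if passes_qc(c["n_genes"], c["n_umis"], c["pct_mito"])
]
print(kept)
```

Cells sitting exactly on a threshold (such as cell5) are kept here, following the "fewer than" and "greater than" wording.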
The following term(s) are used throughout the Aggregate Analysis & Cell Level Analysis Pages:
Within the selected dataset, users have the option to group cells based on various categories, allowing users to visualize group differences. Users can group cells by the following criteria: Sex, Disease Pathology, Pathology Stage, Amyloid Levels, and Cell Types.
Users can subset cells by Sex, Disease Pathology, Pathology Stage, Cell Type, and Amyloid Levels depending on the dataset selected.
Users can choose to plot Sex, Disease Pathology, Pathology Stage, Cell Type, or Amyloid Concentration on the x-axis, based on the dataset selected. The dimensionality reduction plots explore cell-level information based on the x-axis variable chosen by the user.
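Grouping cells by a chosen variable is what drives the per-group summaries in the aggregate analysis: the average expression of a gene and the proportion of cells expressing it. A minimal sketch of that computation follows. It assumes that "expressed" means a value greater than zero, which is a common convention but is not stated here; the function name and data are hypothetical.

```python
from collections import defaultdict


def aggregate_by_group(expression, groups):
    """Summarize one gene's per-cell expression by group label.

    Returns, for each group, the average expression and the proportion
    of cells with expression > 0 (assumed definition of "expressed").
    """
    sums = defaultdict(float)
    n_expressed = defaultdict(int)
    n_cells = defaultdict(int)
    for value, group in zip(expression, groups):
        sums[group] += value
        n_cells[group] += 1
        if value > 0:
            n_expressed[group] += 1
    return {
        group: {
            "avg_expr": sums[group] / n_cells[group],
            "prop_expressed": n_expressed[group] / n_cells[group],
        }
        for group in n_cells
    }


# Hypothetical expression of one gene in six cells of two cell types.
expression = [0.0, 2.0, 4.0, 0.0, 0.0, 3.0]
cell_types = ["Astro", "Astro", "Astro", "Micro", "Micro", "Micro"]

summary = aggregate_by_group(expression, cell_types)
print(summary)
```

The same function works for any grouping variable (sex, pathology stage, and so on) by passing a different list of labels.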
The Genetics page of DataLENS provides access to valuable genetic data through its integration of two GWAS datasets: the International Genomics of Alzheimer's Project (IGAP) meta-analysis and the Accelerating Medicines Partnership Program for Alzheimer's Disease (AMP-AD) GWAS results. The association results were downloaded directly from the study websites.
The interactive Manhattan plot displays the associations between genetic variants (SNPs) and the disease (in this case, AD) across the entire genome. Each SNP is represented as a point on the plot, with its position on the x-axis corresponding to its genomic location, and the significance [–log10 (p-value)] of its association with the disease on the y-axis.
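The coordinates behind such a plot can be sketched briefly: each SNP's x value is its position offset by the lengths of the preceding chromosomes, and its y value is –log10 of the p-value. This is an illustrative sketch, not the portal's plotting code; the function name, chromosome lengths, and SNP records are toy assumptions.

```python
import math


def manhattan_coordinates(snps, chrom_lengths):
    """Convert (chromosome, position, p-value) records to plot coordinates.

    `chrom_lengths` maps chromosome name to length, in plotting order.
    Returns a list of (cumulative_position, -log10(p)) tuples.
    """
    offsets = {}
    running_total = 0
    for chrom, length in chrom_lengths.items():
        offsets[chrom] = running_total
        running_total += length
    return [
        (offsets[chrom] + pos, -math.log10(pvalue))
        for chrom, pos, pvalue in snps
    ]


# Toy genome with two short chromosomes (not real lengths).
chrom_lengths = {"1": 1000, "2": 500}
snps = [("1", 100, 1e-3), ("2", 50, 5e-8)]

coords = manhattan_coordinates(snps, chrom_lengths)
print(coords)
```

Smaller p-values map to taller points, so strongly associated loci stand out as peaks.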
Nodes in the PPI network are colored according to fold-change in that dataset while size represents significance. Edge width represents the combined score of evidence for interaction between two nodes.
Please note that plots may take several seconds to load.