Session 2: Public Databases

Presenter: Tongwu Zhang

Download the original slides for this lecture:

Major Cancer Genomic Studies and Data Portals

Cell Line-Specific Resources

  • Cell Model Passports: A Hub for Preclinical Cancer Models - Annotation, Genomics & Functional Datasets.
  • DepMap Portal: A data portal that helps the research community make discoveries about cancer vulnerabilities by providing open access to key cancer dependency data and to analytical and visualization tools.
  • Genomics of Drug Sensitivity in Cancer: Association between drug response data and genomic markers of sensitivity.

Advanced Data Portals

These platforms offer more advanced data analysis capabilities than the portals highlighted previously.

  • cBioPortal: One of the most widely used cancer genomics data portals. Hosts multi-omics and clinical data from over 300 cancer genomics studies; supports advanced data analysis and programmatic access.
  • FireBrowse: Platform offering easy-to-use yet advanced genomic analysis methods for TCGA data.
  • Gene Expression Profiling Interactive Analysis (GEPIA): Gene expression database and portal for advanced analysis of transcriptomics data including both TCGA and GTEx studies.

Other Genetics Data Resources

  • Genotype-Tissue Expression (GTEx): Public resource for studying normal tissue-specific gene expression and regulation. Data types include WGS, WES, and bulk and single-nucleus RNA-seq.
  • gnomAD: The Genome Aggregation Database, which aims to aggregate and harmonize both exome and genome sequencing data from a wide variety of large-scale sequencing projects.
  • UK Biobank: A large long-term biobank study in the United Kingdom to investigate the respective contributions of genetic predisposition and environmental exposure to the development of disease.
  • TOPMed: Phenotype-focused genome sequencing studies, generally centered on blood, heart, and lung genomics and phenotypes.
  • 1000 Genomes: Whole-genome sequencing study of healthy populations.
  • ENCODE and Portal: A public research project that aims to identify functional elements in the human genome.
  • Roadmap: A public resource of epigenomic maps for stem cells and primary ex vivo tissues selected to represent the normal counterparts of tissues and organ systems frequently involved in human disease.
  • WashU Epigenome Browser: Database allowing for high-level epigenomic topography analysis across regions of the genome.
  • 3D Genome Browser: Epigenomics resource particularly useful for topography analysis, similar to WashU Epigenome Browser.
  • UCSC Genome Browser: Common resource for genome browsing including many organisms and genomes, and many custom tracks containing specialized annotations.
  • Ensembl: Genome browser with some helpful tools such as Ensembl Variant Effect Predictor, BLAST, BLAT, BioMart, and more.

Specialized Databases for Genomic Analyses

Driver Gene Analysis

Alternative Splicing Databases

Mutational Signature Analysis

  • COSMIC Mutational Signatures: A catalogue of curated reference mutational signatures.
  • Signal: A web-based tool for exploring cancer and experimentally-generated mutational signatures and performing mutational signature analysis.
  • MutaGene: Explore and analyze context-dependent mutational patterns in cancer samples.
  • mSigPortal (currently in beta): Integrative mutational signature portal for cancer genomics studies.

Analytical Programming Packages for Cancer Genomic Datasets

For large-scale analysis of many genes at once, programmatic analysis can be extremely useful. Here we highlight some packages for this kind of analysis.

  • TCGAbiolinks: An R/Bioconductor package for integrative analysis with GDC data.
  • TCGA Workflow: An integrative workflow for analyzing cancer genomics and epigenomics data using Bioconductor packages, with data from TCGA (via GDC), ENCODE, and Roadmap.
  • cBioPortalData: Obtain and analyze data from the cBioPortal API using R.
  • maftools: Analysis package for variant files (MAF), integrating many popular tools for variant analysis, annotation, and visualization.
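
To give a feel for what "programmatic analysis of many genes at once" can look like, here is a minimal sketch in plain Python that counts how many samples carry a non-silent mutation in each gene of a MAF-style table. It is only an illustration and does not use or reproduce the API of any package listed above. The column names (Hugo_Symbol, Tumor_Sample_Barcode, Variant_Classification) follow the standard MAF format, while the toy data and the function name mutated_samples_per_gene are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical toy data in MAF-style tab-separated layout (not real samples).
TOY_MAF = (
    "Hugo_Symbol\tTumor_Sample_Barcode\tVariant_Classification\n"
    "TP53\tS1\tMissense_Mutation\n"
    "TP53\tS1\tNonsense_Mutation\n"
    "TP53\tS2\tMissense_Mutation\n"
    "KRAS\tS2\tMissense_Mutation\n"
    "KRAS\tS3\tSilent\n"
    "EGFR\tS3\tIn_Frame_Del\n"
)


def mutated_samples_per_gene(handle, skip_classes=("Silent",)):
    """Count distinct samples with at least one retained mutation per gene.

    `handle` is any iterable of text lines (an open file or io.StringIO).
    Lines starting with '#' (MAF header comments) are ignored, and rows whose
    Variant_Classification is in `skip_classes` are dropped.
    """
    data_lines = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(data_lines, delimiter="\t")
    samples_by_gene = defaultdict(set)
    for row in reader:
        if row["Variant_Classification"] in skip_classes:
            continue
        samples_by_gene[row["Hugo_Symbol"]].add(row["Tumor_Sample_Barcode"])
    return {gene: len(samples) for gene, samples in samples_by_gene.items()}


counts = mutated_samples_per_gene(io.StringIO(TOY_MAF))

# Print genes from most to least frequently mutated (ties sorted by name).
for gene, n_samples in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
    print(gene, n_samples)
```

For a real MAF file, the same function could be called with an open file handle instead of the in-memory string. Dedicated packages such as those listed above add annotation, statistics, and plotting on top of this kind of basic tally.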

Cloud Resources with Available Cancer Genomic Datasets

Cloud resources can be particularly useful for users without access to advanced computing resources.

“Awesome Bioinformatics” Resources

‘Awesome Bioinformatics’ refers to a collection of curated resource lists covering various aspects of bioinformatics analysis. If you’re looking for more information on any analysis method, we recommend consulting these lists.

  • Awesome genomics: Cancer Data Science’s go-to place for excellent genomics tools and packages.
  • Awesome multi-omics: A community-maintained list of software packages for multi-omics data analysis.
  • Awesome cancer variant databases: A community-maintained repository of cancer clinical knowledge bases and databases focused on cancer and normal variants.
  • Awesome cancer evolution: Papers for studying cancer evolution.
  • Awesome genome visualization: A list of interesting genome visualizers, genome browsers, or genome-browser-like implementations. New website
  • Awesome Clonality: A curated list of awesome resources on clonality and tumor heterogeneity.
  • Awesome expression browser: A curated list of software and resources for exploring and visualizing (browsing) expression data, among other data types.
  • Awesome microbes: List of resources, including software packages (and the people developing these methods) for microbiome (16S), metagenomics (WGS, shotgun sequencing), and pathogen identification/detection/characterization.
  • Awesome bioinformatics benchmarks: A curated list of bioinformatics benchmarking papers and resources.
  • Awesome single cell: List of software packages (and the people developing these methods) for single-cell data analysis, including RNA-seq, ATAC-seq, etc.
  • Awesome bioinformatics: A curated list of awesome Bioinformatics software, resources, and libraries. Mostly command line based, and free or open-source.
  • Awesome ChIP-seq: A curated list of ChIP-seq analysis resources.
  • Awesome mutational signature: A collection of important research papers, reviews, and methods for mutational signature analysis included in mSigPortal (currently in beta).