Presenter: Tongwu Zhang
Major Cancer Genomic Studies and Data Portals
- TCGA: Hallmark cancer genomics program, containing 33 different tumor types from more than 11,000 patients; analytical results are accessible through many data portals.
- ICGC Data Portal: Cancer genomics data and analysis portal for the International Cancer Genome Consortium (ICGC), including over 80 cancer projects that cover 22 primary cancer sites from over 24,000 donors.
- Pan-Cancer Analysis of Whole Genomes (PCAWG): Subset of the ICGC project focused on whole genome sequencing data and analyses.
- 100,000 Genomes Project | Genomics England: Sequencing of 100,000 genomes from NHS patients affected by rare diseases and cancer.
- Pediatric Cancer Genome Project (PCGP) and Pediatric Cancer Databases (PeCan): Databases focused on pediatric cancers, including whole genome, whole exome, and RNA sequencing data.
- St. Jude Cloud Genomics Platform: Cloud genomics platform designed to facilitate NGS analysis.
- ProteinPaint: Tool for visualization and analysis of NGS data, developed by the St. Jude Cloud visualization community.
- AACR Project: Genomics Evidence Neoplasia Information Exchange (GENIE): International, open-source, pan-cancer registry of real-world clinical and genomic oncology data; over 136,000 sequencing samples (targeted sequencing).
- MSK-IMPACT: Clinical targeted sequencing cohort using a hybridization capture-based NGS panel at Memorial Sloan Kettering Cancer Center.
- Cancer Research Data Commons (CRDC) at NCI: NCI cloud-based data science infrastructure that connects data sets with analytics tools to allow users to share, integrate, analyze and visualize cancer research data to drive scientific discovery.
- GDC Data Portal: NCI cancer knowledge network that supports hosting, standardization, and analysis of genomic, clinical, and biospecimen data from cancer research programs.
Cell Line-Specific Resources
- Cell Model Passports: A Hub for Preclinical Cancer Models - Annotation, Genomics & Functional Datasets.
- DepMap Portal: A data portal that helps the research community make discoveries related to cancer vulnerabilities by providing open access to key cancer dependencies, together with analytical and visualization tools.
- Genomics of Drug Sensitivity in Cancer: Association between drug response data and genomic markers of sensitivity.
Advanced Data Portals
These platforms have more advanced data analysis capabilities relative to the portals highlighted previously.
- cBioPortal: Most popular cancer genomic data portal. Hosts multi-omics and clinical data from over 300 cancer genomics studies; supports advanced data analysis and programmatic access.
- FIREBROWSE: Very useful platform with easy-to-use and advanced genomic analysis methods for TCGA.
- Gene Expression Profiling Interactive Analysis (GEPIA): Gene expression database and portal for advanced analysis of transcriptomics data including both TCGA and GTEx studies.
Other Genetics Data Resources
- Genotype-Tissue Expression (GTEx): Public resource to study normal tissue-specific gene expression and regulation. Data types include WGS, WES, and bulk and single-nucleus RNA-seq.
- gnomAD: Genome Aggregation Database, with the goal of aggregating and harmonizing both exome and genome sequencing data from a wide variety of large-scale sequencing projects.
- UK Biobank: A large long-term biobank study in the United Kingdom to investigate the respective contributions of genetic predisposition and environmental exposure to the development of disease.
- TOPMed: Phenotype-focused genome sequencing studies, generally covering blood, heart, and lung genomics and phenotypes.
- 1000 Genomes: Whole-genome sequencing study from healthy populations.
- ENCODE and Portal: A public research project that aims to identify functional elements in the human genome.
- Roadmap: A public resource of epigenomic maps for stem cells and primary ex vivo tissues selected to represent the normal counterparts of tissues and organ systems frequently involved in human disease.
- WashU Epigenome Browser: Database allowing for high-level epigenomic topography analysis across regions of the genome.
- 3D Genome Browser: Epigenomics resource particularly useful for topography analysis, similar to WashU Epigenome Browser.
- UCSC Genome Browser: Common resource for genome browsing including many organisms and genomes, and many custom tracks containing specialized annotations.
- Ensembl: Genome browser with some helpful tools such as Ensembl Variant Effect Predictor, BLAST, BLAT, BioMart, and more.
Specialized Databases for Genomic Analyses
- FIREBROWSE (Broad GDAC) for Somatic Analysis: Previously mentioned; supports many highly specific cancer genomics analyses of the TCGA study.
- PCAWG Data Portal for WGS-Based Somatic Analysis: Data portal including the major analytical results of the PCAWG study.
- Chromothripsis Explorer: Data portal to explore chromothripsis events in the PCAWG study.
- Database of Genomic Variants (DGV): Database of germline structural and copy number variation in the human genome.
- Ingenuity Pathway Analysis (IPA): Very useful pathway analysis tool for transcriptomics, proteomics, metabolomics, and more.
Driver Gene Analysis
Alternative Splicing Databases
Mutational Signature Analysis
- COSMIC Mutational Signatures: A catalogue of curated reference mutational signatures.
- Signal: A web-based tool for exploring cancer and experimentally-generated mutational signatures and performing mutational signature analysis.
- MUTAGENE: Explore and analyze context-dependent mutational patterns in cancer samples.
- mSigPortal (currently in beta): Integrative mutational signature portal for cancer genomics studies.
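
To make "context-dependent mutational patterns" concrete, here is a minimal sketch of the 96-channel classification of single base substitutions that reference signature catalogues are commonly built on: each substitution is reported relative to the pyrimidine (C or T) of the mutated base pair, together with its 5' and 3' neighbors. The function names and the input format (a trinucleotide context plus the alternate base) are assumptions made for illustration, not the interface of any tool listed above.

```python
from collections import Counter
from itertools import product

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def sbs96_channel(context, alt):
    """Label a substitution, e.g. context 'ACG' with alt 'T' gives 'A[C>T]G'.

    `context` is the reference trinucleotide centered on the mutated base;
    `alt` is the alternate base on the same strand.
    """
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or len(alt) != 1 or set(context + alt) - set("ACGT"):
        raise ValueError("expected a 3-base context and a 1-base alt over ACGT")
    ref = context[1]
    if ref == alt:
        raise ValueError("alt base must differ from the reference base")
    if ref in "AG":
        # Purine reference: describe the same event on the opposite strand.
        context = reverse_complement(context)
        alt = alt.translate(COMPLEMENT)
        ref = context[1]
    return f"{context[0]}[{ref}>{alt}]{context[2]}"


def all_channels():
    """List the 96 channels: 6 substitution types x 16 flanking-base pairs."""
    substitutions = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]
    return [
        f"{five}[{sub}]{three}"
        for sub in substitutions
        for five, three in product("ACGT", repeat=2)
    ]


def sbs96_profile(mutations):
    """Count (context, alt) pairs per channel, including zero-count channels."""
    counts = Counter(sbs96_channel(context, alt) for context, alt in mutations)
    return {channel: counts.get(channel, 0) for channel in all_channels()}
```

For example, `sbs96_profile([("ACG", "T"), ("TGA", "A")])` counts one mutation in `A[C>T]G` and one in `T[C>T]A`, because the G>A change in the second context is a C>T change on the opposite strand. A count vector of this shape is the usual input when comparing a sample against reference signatures.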
Analytical Programming Packages for Cancer Genomic Datasets
For large-scale analysis of many genes at once, programmatic analysis can be extremely useful. Here we highlight some packages for this kind of analysis.
- TCGAbiolinks: An R/Bioconductor package for integrative analysis with GDC data.
- TCGA Workflow: An integrative package to analyze cancer genomics and epigenomics data using Bioconductor packages with data from TCGA (via GDC), ENCODE, Roadmap.
- cBioPortalData: Obtain and analyze data from the cBioPortal API using R.
- MAFtools: Analysis package for variant files, integrating many popular tools for variant analysis, annotation, and visualization.
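
As a rough illustration of what programmatic analysis across many genes looks like, the sketch below counts how many samples carry a non-silent mutation in each gene from a MAF-style, tab-separated table. It uses only the Python standard library and is not the API of any package listed above. The column names follow the MAF format (`Hugo_Symbol`, `Variant_Classification`, `Tumor_Sample_Barcode`); the function name, the set of classes treated as silent, and the sample identifiers in the toy table are assumptions made for this example.

```python
import csv
import io
from collections import defaultdict

# Variant classes skipped as "silent" in this sketch (an assumed choice;
# adjust it to the filtering conventions of your own analysis).
SILENT_CLASSES = {
    "Silent", "Intron", "IGR", "RNA",
    "3'UTR", "5'UTR", "3'Flank", "5'Flank",
}


def mutated_samples_per_gene(maf_text):
    """Map each gene to the number of distinct samples with a non-silent hit.

    Lines starting with '#' (such as a version header) are ignored.
    Genes are returned most frequently mutated first, ties by gene name.
    """
    data_lines = (
        line for line in io.StringIO(maf_text) if not line.startswith("#")
    )
    reader = csv.DictReader(data_lines, delimiter="\t")
    samples_by_gene = defaultdict(set)
    for row in reader:
        if row["Variant_Classification"] in SILENT_CLASSES:
            continue
        samples_by_gene[row["Hugo_Symbol"]].add(row["Tumor_Sample_Barcode"])
    ranked = sorted(
        ((gene, len(samples)) for gene, samples in samples_by_gene.items()),
        key=lambda item: (-item[1], item[0]),
    )
    return dict(ranked)


# Toy table with made-up sample identifiers, for illustration only.
EXAMPLE_MAF = "\n".join([
    "#version 2.4",
    "Hugo_Symbol\tVariant_Classification\tTumor_Sample_Barcode",
    "TP53\tMissense_Mutation\tSAMPLE-01",
    "TP53\tNonsense_Mutation\tSAMPLE-01",
    "TP53\tMissense_Mutation\tSAMPLE-02",
    "KRAS\tMissense_Mutation\tSAMPLE-02",
    "EGFR\tSilent\tSAMPLE-03",
])

if __name__ == "__main__":
    for gene, n_samples in mutated_samples_per_gene(EXAMPLE_MAF).items():
        print(gene, n_samples)
```

Counting distinct samples rather than rows keeps a gene with two mutations in the same tumor from being counted twice. The packages above provide far richer versions of this kind of summary, along with data download and visualization.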
Cloud Resources with Available Cancer Genomic Datasets
Cloud resources can be particularly useful for users without access to advanced computing resources.
“Awesome Bioinformatics” Resources
‘Awesome Bioinformatics’ refers to a collection of introductions to various aspects of bioinformatics analysis. If you’re looking for more information on any analysis method, we recommend consulting these documents.
- Awesome genomics: Cancer Data Science’s go-to place for excellent genomics tools and packages.
- Awesome multi-omics: A community-maintained list of software packages for multi-omics data analysis.
- Awesome cancer variant databases: A community-maintained repository of cancer clinical knowledge bases and databases focused on cancer and normal variants.
- Awesome cancer evolution: Papers for studying cancer evolution.
- Awesome genome visualization: A list of interesting genome visualizers, genome browsers, or genome-browser-like implementations.
- Awesome Clonality: A curated list of awesome resources on clonality and tumor heterogeneity.
- Awesome expression browser: A curated list of software and resources for exploring and visualizing (browsing) expression data, but not only limited to that.
- Awesome microbes: List of resources, including software packages (and the people developing these methods) for microbiome (16S), metagenomics (WGS, shotgun sequencing), and pathogen identification/detection/characterization.
- Awesome bioinformatics benchmarks: A curated list of bioinformatics benchmarking papers and resources.
- Awesome single cell: List of software packages (and the people developing these methods) for single-cell data analysis, including RNA-seq, ATAC-seq, etc.
- Awesome bioinformatics: A curated list of awesome Bioinformatics software, resources, and libraries. Mostly command line based, and free or open-source.
- Awesome ChIP-Seq: A curated list of ChIP-Seq analysis resources.
- Awesome mutational signature: A collection of important research papers, reviews, and methods for mutational signature analysis included in mSigPortal (currently in beta).