Session 1: Introduction to Computing Clusters and Bioinformatics

Topics Covered

  • Introduction to Available Computing Clusters
  • Cluster How-Tos: Connect, Transfer Files/Share Data
  • Basic Linux/UNIX commands (e.g. bash scripts, submitting and monitoring jobs)
  • Common bioinformatics formats and tools (e.g. FASTQ, BAM, VCF, BED, and GFF; samtools, bcftools, bedtools, and maftools)

Practical

  • Connect to Biowulf by SSH
  • Basic Linux commands
  • Submit interactive jobs
  • Samtools: read a BAM file, extract all aligned reads, index, and compute stats
  • Bedtools: merge/intersect
  • Visualize results in the UCSC Genome Browser

In this first session, we will introduce the basics of Biowulf and CCAD, including how to connect to, work on, and transfer files from both clusters.

Both Biowulf and the CCAD cluster are accessed remotely via the Linux command line, so some command-line knowledge is necessary to work with either. We will cover the basics of Linux and shell scripting in the lecture and provide resources for further learning. Finally, we will give an overview of the most common file formats for storing bioinformatics data and the tools for working with them.
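
As a small illustration of one of these formats, the sketch below reads FASTQ records in plain Python. It assumes the common four-line layout (a header starting with "@", the sequence, a separator starting with "+", and a quality string) and the Phred+33 quality encoding; the names FastqRecord and read_fastq are hypothetical and chosen only for this example. It is meant to show how the format is laid out, not to replace a dedicated parser.

```python
from io import StringIO
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str             # header line without the leading "@"
    sequence: str         # base calls
    qualities: List[int]  # one Phred score per base


def read_fastq(handle: TextIO, offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from a four-line-per-record FASTQ file.

    Assumes Phred+33 quality encoding by default (an assumption; some
    older files use a different offset). Reading stops at end of file
    or at the first empty header line.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"Malformed FASTQ record near: {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"Sequence/quality length mismatch in: {header!r}")
        scores = [ord(char) - offset for char in quality]
        yield FastqRecord(header[1:], sequence, scores)


# Two toy records held in memory instead of a file on disk.
example = StringIO("@read1\nACGT\n+\nII5!\n@read2\nGG\n+\n!!\n")
for record in read_fastq(example):
    print(record.name, record.sequence, record.qualities)
```

Each quality character maps to one base, so the sequence and quality lines must have the same length; the sketch rejects records where they differ.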

In the practical section, we will practice connecting to Biowulf and using the basics of the Linux command line. We will also work with FASTQ, BAM, and BED files to perform some common tasks, including subsetting unaligned and aligned sequencing reads and intersecting and visualizing BED files.
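
To make the merge and intersect operations concrete before the practical, here is a minimal pure-Python sketch of what they compute on BED-style intervals. It is not how bedtools is implemented: it ignores strand and extra BED columns, uses a simple quadratic loop for the intersection, and assumes 0-based, half-open coordinates. The function names merge_intervals and intersect_intervals are hypothetical.

```python
from typing import List, Tuple

# (chromosome, start, end) with 0-based, half-open coordinates as in BED.
Interval = Tuple[str, int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Combine overlapping or directly adjacent intervals per chromosome."""
    merged: List[Interval] = []
    # Sorting by (chrom, start, end) places candidates for merging side by side.
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            previous = merged[-1]
            merged[-1] = (chrom, previous[1], max(previous[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def intersect_intervals(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Report the overlapping portion of every pair of intervals from a and b."""
    overlaps: List[Interval] = []
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:
            if chrom_a != chrom_b:
                continue
            start = max(start_a, start_b)
            end = min(end_a, end_b)
            if start < end:  # require at least one shared base
                overlaps.append((chrom_a, start, end))
    return overlaps


peaks = [("chr1", 100, 200), ("chr1", 150, 300), ("chr1", 400, 500), ("chr2", 10, 20)]
regions = [("chr1", 250, 450)]

merged_peaks = merge_intervals(peaks)
print(merged_peaks)
print(intersect_intervals(merged_peaks, regions))
```

In this toy input, the first two chr1 intervals collapse into a single interval from 100 to 300, and intersecting the merged set with the region 250 to 450 leaves the pieces 250 to 300 and 400 to 450.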