All courses | CO DRI Training

Bioinformatics for Pathway Enrichment Analysis

Description: Pathway enrichment analysis is a powerful computational approach used to identify biological pathways that are significantly overrepresented in a given set of differentially expressed genes, or any gene list derived from -omics data. This method helps to contextualize large gene lists by linking them to known biological processes, functional modules, and disease mechanisms. While highly informative, pathway enrichment analysis requires careful interpretation and an understanding of statistical methodologies, reference databases, and potential biases in gene-set analysis. In this session, we will explore key concepts and methods for pathway enrichment analysis, and we will discuss different enrichment approaches, including over-representation analysis of a defined gene list and gene set enrichment analysis (GSEA). Participants will be offered hands-on practice in which they will use RStudio to run R/BioConductor scripts for pathway enrichment analysis as well as the Cytoscape software to visualize the results of enrichment analysis on their personal computers. Basic familiarity with R will be beneficial.

Teachers: Ruth Isserlin (Bioinformatics.ca, UHN, Toronto) and Veronique Voisin (Bioinformatics.ca, UHN)

Level: Intermediate

Format: Lecture + Hands-on

Certificate: Attendance

Prerequisites:

Knowing how to open R or R-Studio and install packages.
Basic knowledge of R (recommended).
General knowledge of differential expression of RNA-seq or scRNA-seq data.

Bioinformatics: Analysis of RNA-sequencing Data

Description: RNA-Seq refers to high throughput sequencing methods that probes the entire transcriptomic landscape of a given tissue or sample of interest. The data acquired from such experiments can be used to explore the overall RNA profile of a sample as well as comparing samples under various conditions. While extremely powerful, RNA-Seq is susceptible to numerous experimental pitfalls and requires intimate knowledge of the experimental procedures and data analysis methods. When conducted properly RNA-Seq can reveal information about gene/transcript expression, splicing and the effects of mutations. In this session we will take a thorough look at a comprehensive RNA-Seq pipeline, from sample processing methods to final differential expression analysis. Relevant R / BioConductor packages will be introduced. We will have the opportunity to investigate numerous quality control metrics, perform genomic alignment, differential expression and pathway enrichment analysis. We will cover several “gotcha”s and common mistakes in experimental design and data analysis. Basic familiarity with R and Linux command line will be beneficial but not required. All necessary commands and parameters will be explained during the class. Participants will be offered hands-on practice in which they will use RStudio to run R/BioConductor scripts for data analysis as well as the Integrative Genomic Viewer (IGV) software to visualize genomic data on their laptops

Teachers: Alper Celik (HPC4Health, Centre for Computational Medicine SickKids) and Lauren Liang (HPC4Health, Centre for Computational Medicine SickKids)

Level: Intermediate

Format: Lecture + Hands-on

Certificate: Attendance

Prerequisites: Basic R and Linux beneficial but not required

Bioinformatics: Long-read Sequencing Applications

Description: Long-read sequencing technologies enable the sequencing of DNA fragments 10KB and longer. This read length greatly improves sequence mappability and assembly, providing an advantage over short-read sequences that are difficult to map uniquely to repetitive and GC-rich regions. Long-read sequencing has applications in a number of fields, including genome assembly, diagnosis of genetic diseases, and metagenomics. In this workshop, we will focus on PacBio HiFi sequences and introduce you to tools for haplotyping, calling and visualizing structural variants and repeat expansions, visualizing read methylation, and detection of novel isoforms from PacBio Iso-Seq.

Teachers: Madeline Couse (HPC4Health, The Centre for Computational Medicine at SickKids) and Lauren Liang (HPC4Health, Hospital for Sick Children)

Level: Introductory

Format: Lecture + Hands-on

Certificate: Attendance

Prerequisite: Basic knowledge about DNA/RNA sequencing.

Fortran as a Second Language

Description: The original high-level programming language, Fortran continues to be used today for high-performance computing in many fields. It has evolved over the years, and modern Fortran provides implicit parallelism (array expressions), explicit parallelism (coarrays), and object-oriented features, among other things. It supports the MPI, OpenMP, and OpenACC parallel programming standards. The primary aim of this course is to help you understand and modify existing Fortran code, but would also be useful if you wish to start a new project in Fortran. You should have prior experience with some other programming language, but this is otherwise a beginner-level course.

Teachers: Ross Dickson (ACENET, Dalhousie University)

Level: Intermediate

Format: Lecture + Hands-on

Certificate: Completion

Prerequisite: Prior experience with some other programming language

High Performance Rapid Prototyping in Biological Sequence Analysis

Description: Rapid prototyping for biological sequence analysis requires tools that let researchers move quickly from idea to working workflow without sacrificing performance. Today, this is typically done in one of two ways. The first is by chaining specialized command-line tools, either directly or through workflow systems such as Nextflow or Snakemake. The second is by scripting interactively in high-level languages such as Python, using libraries like BioPython or cogent3 in notebooks. Each approach carries significant performance trade-offs. Pipeline-based tools incur costly memory round-trips — data must be repeatedly transferred off and back onto accelerator memory (e.g., GPU) between chained commands. Python libraries, meanwhile, largely lack built-in support for CPU parallelism (e.g., BioPython) or GPU acceleration (e.g., cogent3), limiting scalability for performance-critical workloads. This course introduces GNX, a modern C++20 library and toolkit purpose-built for high-performance rapid prototyping of biological sequence analysis. GNX addresses the limitations of both paradigms through three core design principles: zero-copy operations that enable toolkit commands to chain directly through device memory without redundant data transfers; SIMD and multi-level parallelism for hardware-aware acceleration on modern CPUs; and seamless backend portability across OpenMP, CUDA, and ROCm targets. Crucially, GNX also exposes an interactive scripting interface through the xeus-cling Jupyter kernel, bringing C++-native performance to the familiar notebook environment. By the end of this 90-minute course, participants will understand GNX's architecture, be able to construct accelerated bioinformatics pipelines using its toolkit and interactively prototype analyses in a Jupyter notebook — all without sacrificing performance.

Teachers: Armin Sobhani (SHARCNET)

Level: Intermediate

Format: Lecture

Certificate: Completion

Prerequisites: None

Introduction to R

Description: This half-day session offers a brief introduction to R, with a focus on data analysis and statistics. We will discuss the following topics: the R interface, primitive data types, lists, vectors, matrices, and data frames - a crucial data type in data analysis and the trademark of the R language. Advanced topics to be covered include: basics statistics and function creation; and the basics of scripting.

Teacher: Alexey Fedoseev (SciNet, University of Toronto)

Level: Introductory

Format: Lecture + Hands-on

Certificate: Attendance

Prerequisites: Some programming experience in another programming language

Introduction of R Shiny

Description: R Shiny has become one of the most widely used tools in bioinformatics for building interactive data analysis applications—without requiring web development expertise. For researchers, analysts, and students working with high dimensional biological data, Shiny provides a way to turn static analyses into dynamic, user friendly dashboards that make exploration simpler, faster, and more intuitive.

This course is intended to help learners bridge the gap between data analysis and data communication. Instead of generating dozens of plots manually, you can build a Shiny app where users adjust parameters, filter data, and visualize results in real time. This is especially valuable in genomics, where datasets are large, multidimensional, and frequently explored by multidisciplinary teams.

Teachers: Prajkta Kallurkar (HPC4Health, SickKids)

Level: Introductory

Format: Lecture + Hands-On

Certificate: Completion

Prerequisites: Basic R knowledge.

Introduction to Single cell RNA sequencing and analysis

Description:

Single-cell RNA sequencing (scRNA-seq) is a transformative technology that enables the transcriptomic profiling of individual cells, revealing the cellular and tissue heterogeneity that is obscured by traditional bulk RNA sequencing methods. This introductory course is designed for participants with little to no prior experience with scRNA-seq, and aims to build a solid conceptual foundation in the field rather than develop independent data analysis skills. Participants will learn how to critically evaluate experimental design considerations for scRNA-seq studies, understand the key steps in sample preparation, cell capture, and library construction, and gain familiarity with the standard computational analysis pipeline - from read quantification and quality control through to dimensionality reduction, clustering, and cell type annotation. By the end of the course, participants will have a broad understanding of how single-cell RNA sequencing experiments are designed, executed, and analyzed, equipping them to engage meaningfully with scRNA-seq literature and collaborate effectively with bioinformatics teams.

Teachers: Niu Huilin (Bioinformatics.ca, Western University)

Level: Introductory

Format: Lecture

Certificate: Completion

Prerequisites:

Previous experience with bulk RNA sequencing, or participation in the Bulk RNA Sequencing workshop also offered at the Compute Ontario Summer School, would be beneficial for understanding course concepts, but is not required.
You will also require your own laptop computer. Minimum requirements: 1024×768 screen resolution, 1.5GHz CPU, 8GB RAM, 10GB free disk space, recent versions of Windows, Mac OS X or Linux.

Population genomics in complex agricultural crops: comparing SNP panels and GBS workflows

Description: TBA

Teachers: Tayab Soomro (Bioinformatics.ca, Agriculture and Agri-Food Canada)

Level: TBA

Format: TBA

Certificate: Attendance

Prerequisites: