Enrolment options

Description: Rapid prototyping for biological sequence analysis requires tools that let researchers move quickly from idea to working workflow without sacrificing performance. Today, this is typically done in one of two ways. The first is by chaining specialized command-line tools, either directly or through workflow systems such as Nextflow or Snakemake. The second is by scripting interactively in high-level languages such as Python, using libraries like BioPython or cogent3 in notebooks. Each approach carries significant performance trade-offs. Pipeline-based tools incur costly memory round-trips — data must be repeatedly transferred off and back onto accelerator memory (e.g., GPU) between chained commands. Python libraries, meanwhile, largely lack built-in support for CPU parallelism (e.g., BioPython) or GPU acceleration (e.g., cogent3), limiting scalability for performance-critical workloads. This course introduces GNX, a modern C++20 library and toolkit purpose-built for high-performance rapid prototyping of biological sequence analysis. GNX addresses the limitations of both paradigms through three core design principles: zero-copy operations that enable toolkit commands to chain directly through device memory without redundant data transfers; SIMD and multi-level parallelism for hardware-aware acceleration on modern CPUs; and seamless backend portability across OpenMP, CUDA, and ROCm targets. Crucially, GNX also exposes an interactive scripting interface through the xeus-cling Jupyter kernel, bringing C++-native performance to the familiar notebook environment. By the end of this 90-minute course, participants will understand GNX's architecture, be able to construct accelerated bioinformatics pipelines using its toolkit and interactively prototype analyses in a Jupyter notebook — all without sacrificing performance.

Teachers: Armin Sobhani (SHARCNET)

Level: Intermediate

Format: Lecture

Certificate: Attendance

Prerequisites: None

Self enrolment (Participant)