Level up your computational research!

Discover powerful digital research techniques at the 2025 Compute Ontario Summer School


Event Description

Taking place from June 2 to June 20, the Compute Ontario Summer School offers a comprehensive curriculum packed with over 40 courses. Delivered by experts in the field, these sessions cover a wide range of topics including Advanced Research Computing (ARC), High Performance Computing (HPC), Research Data Management (RDM), and Research Software (RS). With presentations and workshops available at introductory to intermediate levels, there is something for everyone.

Presented by ACENET, CAC, SciNet, and SHARCNET, in partnership with Bioinformatics.ca, Digital Research Alliance of Canada, HPC4Health, Ontario Brain Institute, OICR, RDM Network of Experts, and Scholars Portal.

Enrolment

In order to enrol in COSS 2025 courses:

  1. If you don't already have a Compute Ontario Training account, create one.
  2. Log in with your Compute Ontario Training account.
  3. Enrol in a desired course below by clicking its Link to Course and then clicking the Enrol button.

Participants need to enrol in each course they wish to attend. Be aware that some courses overlap, so check the schedule below carefully.

We also have a frequently asked questions (FAQ) page for this event.

Week 1
When (EDT) Course
:: Mon., June 2 ::
09:00 to 12:00 EDT
13:30 to 16:30 EDT

Computational and Mathematical Analysis for a Simple Network Model of Associative Memory

:: Link to Course :: n/a ::

Description: This lecture introduces the fundamental concepts of an associative network in neural computation. Studying a simple network architecture makes it possible to analyze how one memory becomes associated with another through tuned synaptic connections. The discussion combines mathematical and computational study of this system, setting the foundation for further study in neural networks and machine learning. This course will be 50% lecture and 50% lab. The lab will be hands-on, with students able to work interactively at the computer they use for the Zoom session.

Teachers: Lyle Muller (Western University, OBI Centre for Analytics) and Roberto Budzinski (University of Lethbridge, OBI Centre for Analytics)

Level: Intermediate

Format: Lecture + Hands-on

Certificates: Attendance and Completion

Prerequisites:

  • Basic linear algebra (vectors, matrices, matrix multiplication) and programming (functions, variables, loops).
  • Basic Python knowledge and know-how.

NOTE: This course has limited enrolment. If you are enrolled and will not be able to attend, please unenrol so that another person can enrol.

:: Mon., June 2 ::
09:00 to 10:25 EDT

Overview of Training Opportunities in the School and "Beyond"

:: Link to Course :: n/a ::

Description: Are you not sure which workshops to sign up for in this Summer School? In this session, we will give an overview of the program of the Compute Ontario Summer School to help you decide. We'll also show you what other training opportunities in Advanced Research Computing and Research Data Management are available to you in Canada after the summer school.

Teacher: Ramses van Zon (SciNet, University of Toronto)

Level: Introductory

Format: Webinar

Certificate: Attendance

Prerequisites: None

:: Mon., June 2 ::
10:30 to 11:55 EDT

Working with Jupyter on the Clusters

:: Link to Course :: n/a ::

Description: Jupyter Notebook is commonly used for interactive computing in Python. This session presents the options and features for working with Jupyter on the Digital Research Alliance of Canada's remote computing clusters and demonstrates several use cases on the clusters.

Teacher: Jinhui Qin (SHARCNET, Western University)

Level: Introductory

Format: Lecture + Demonstration

Certificate: Attendance

Prerequisites: Basic Python and Linux command line experience.

:: Mon., June 2 ::
13:30 to 16:30 EDT

Introduction to Version Control Using Git

:: Link to Course :: n/a ::

Description: Using version control for your scripts, codes, documents, papers, and even data allows you to track changes, keep backups, and facilitate collaboration. This introductory workshop will teach you the basics of version control with the popular distributed version control software Git. This workshop assumes that students have a basic understanding of Linux shell commands.

Teacher: James Willis (SciNet, University of Toronto)

Level: Introductory

Format: Lecture + Hands-on

Certificate: Attendance

Prerequisite: Basic understanding of Linux shell commands.

:: Tue., June 3 ::
09:00 to 12:00 EDT

Data Visualization in Bioinformatics (R)

:: Link to Course :: n/a ::

Description: Plotting and data visualization are essential for effectively communicating bioinformatics findings, yet they are often treated as trivial tasks. In this course, we will showcase the power of a well-designed plot! We will cover key principles of effective visualization, work through examples ranging from basic to complex, and conclude with a hands-on workshop. By the end of the course, you will be able to create publication- or presentation-ready plots for your own research using R and ggplot2.

Teacher: Rachel Edgar (Bioinformatics.ca, UHN)

Level: Introductory

Format: Lecture + Hands-on

Certificate: Attendance

Prerequisites:

  • Basic knowledge of R.
  • Basic knowledge of working in RStudio.

:: Tue., June 3 ::
09:00 to 12:00 EDT

Introduction to Advanced Research Computing

:: Link to Course :: n/a ::

Description: This workshop is a primer for those largely new to supercomputing, i.e., to computing on shared, remote resources. It is intended to demystify the somewhat intimidating term "High-Performance Computing" (HPC), and to serve as a foundation upon which to build over the coming days. Topics will include motivation for HPC, available resources, essential issues, and a high-level overview of parallel programming models commonly used on these systems.

Teacher: Ramses van Zon (SciNet, University of Toronto)

Level: Introductory

Format: Lecture + Hands-on

Certificates: Attendance and Completion

Prerequisite: Basic Linux (e.g., "Introduction to Linux Shell" course)

:: Tue., June 3 ::
13:30 to 16:30 EDT

Introduction to R

:: Link to Course :: n/a ::

Description: This half-day session offers a brief introduction to R, with a focus on data analysis and statistics. We will discuss the following topics: the R interface, primitive data types, lists, vectors, matrices, and data frames, a crucial data type in data analysis and the trademark of the R language. More advanced topics to be covered include basic statistics, function creation, and the basics of scripting.

Teacher: Alexey Fedoseev (SciNet, University of Toronto)

Level: Introductory

Format: Lecture + Hands-on

Certificate: Attendance

Prerequisites: Some programming experience in another programming language

:: Tue., June 3 ::
13:30 to 16:30 EDT

Network Analysis of Neurophysiological Data

:: Link to Course :: n/a ::

Description: This course will provide an overview of the analytical process for working with neurophysiological data and deriving insights from it. We will explore preprocessing, feature extraction, and downstream analysis (e.g., machine learning) end-to-end with discussions throughout on considerations for analysis based on theory and empirical observations.

Teacher: Irene Harmsen (Cove Neuro, OBI Centre for Analytics)

Level: Intermediate

Format: Webinar

Certificates: Attendance and Completion

Prerequisites:

  • Knowledge of time-series data analysis.
  • Python programming experience.
  • Basic understanding of statistics and machine learning.

:: Wed., June 4 ::
09:00 to 12:00 EDT
13:30 to 16:30 EDT

Scaling Up HPC Workflows

:: Link to Course :: n/a ::

Description: This hands-on course is designed for researchers who want to take their high-performance computing (HPC) workflows to the next level. Whether you're new to large-scale computing or looking to optimize your current practices, this course will guide you through the key steps to efficiently scale your applications on HPC systems.

Participants will begin by identifying their specific applications and learn how to properly compile their code for HPC environments. The course covers essential topics including running interactive sessions, performance tuning, and efficient batch job submission, complete with practical script examples. You'll also explore strategies for checkpointing, monitoring job progress, and effective debugging techniques.

Through a combination of lectures and hands-on exercises, this course offers real-world insights into improving performance, reducing run times, and making the most of shared computing resources. By the end, you'll be equipped with the tools and knowledge needed to run scalable, reliable, and efficient HPC workflows.

Teachers: Sergey Maschenko (SHARCNET, McMaster University) and Jaime Pinto (SciNet, University of Toronto)

Level: Introductory

Format: Lecture + Hands-on

Certificate: Attendance

Prerequisites: None

:: Wed., June 4 ::
09:00 to 12:00 EDT

Bioinformatics: Analysis of RNA-sequencing Data

:: Link to Course :: n/a ::

Description: RNA-Seq refers to high-throughput sequencing methods that probe the entire transcriptomic landscape of a given tissue or sample of interest. The data acquired from such experiments can be used to explore the overall RNA profile of a sample as well as to compare samples under various conditions. While extremely powerful, RNA-Seq is susceptible to numerous experimental pitfalls and requires intimate knowledge of the experimental procedures and data analysis methods. When conducted properly, RNA-Seq can reveal information about gene/transcript expression, splicing, and the effects of mutations. In this session we will take a thorough look at a comprehensive RNA-Seq pipeline, from sample processing methods to final differential expression analysis. Relevant R / BioConductor packages will be introduced. We will have the opportunity to investigate numerous quality control metrics and to perform genomic alignment, differential expression, and pathway enrichment analysis. We will cover several "gotchas" and common mistakes in experimental design and data analysis. Basic familiarity with R and the Linux command line will be beneficial but not required. All necessary commands and parameters will be explained during the class. Participants will be offered hands-on practice in which they will use RStudio to run R/BioConductor scripts for data analysis as well as the Integrative Genomics Viewer (IGV) software to visualize genomic data on their laptops.

Teachers: Alper Celik (HPC4Health, SickKids) and Lauren Liang (HPC4Health, SickKids)

Level: Intermediate

Format: Lecture + Hands-on

Certificate: Attendance

Prerequisites: Basic R and Linux beneficial but not required

:: Wed., June 4 ::
13:30 to 16:30 EDT

Bioinformatics for Pathway Enrichment Analysis

:: Link to Course :: n/a ::

Description: Pathway enrichment analysis is a powerful computational approach used to identify biological pathways that are significantly overrepresented in a given set of differentially expressed genes, or any gene list derived from -omics data. This method helps to contextualize large gene lists by linking them to known biological processes, functional modules, and disease mechanisms. While highly informative, pathway enrichment analysis requires careful interpretation and an understanding of statistical methodologies, reference databases, and potential biases in gene-set analysis. In this session, we will explore key concepts and methods for pathway enrichment analysis, and we will discuss different enrichment approaches, including over-representation analysis of a defined gene list and gene set enrichment analysis (GSEA). Participants will be offered hands-on practice in which they will use RStudio to run R/BioConductor scripts for pathway enrichment analysis as well as the Cytoscape software to visualize the results of enrichment analysis on their personal computers. Basic familiarity with R will be beneficial.

Teachers: Ruth Isserlin (Bioinformatics.ca, University of Toronto) and Veronique Voisin (Bioinformatics.ca, UHN)

Level: Intermediate

Format: Lecture + Hands-on

Certificate: Attendance

Prerequisites:

  • Knowing how to open R or RStudio and install packages.
  • Basic knowledge of R (recommended).
  • General knowledge of differential expression of RNA-seq or scRNA-seq data.

:: Thu., June 5 ::
09:00 to 12:00 EDT
13:30 to 16:30 EDT

Introduction to C

:: Link to Course :: n/a ::

Description: This course introduces the fundamental concepts of programming, such as conditional statements, loops (while and for), arrays, pointers, functions, and dynamic memory allocation. No programming experience will be assumed or required.

Teacher: Rakesh Srirajaraghavaraju (CAC, Queen's University)

Level: Introductory

Format: Lecture

Certificate: Attendance

Prerequisites: None

:: Thu., June 5 ::
09:00 to 12:00 EDT
13:30 to 16:30 EDT

Machine Learning

:: Link to Course :: n/a ::

Description: This course provides an introduction to machine learning, which enables computers to learn AI models from data without being explicitly programmed. It comprises two parts:

  • Part I covers the fundamentals of machine learning, and
  • Part II demonstrates the application of various machine learning methods to solving a real-world problem.

Rather than presenting the key concepts and components of machine learning in an abstract way, this course introduces them with a small number of examples. Plots and animations give insight into some of the mechanics of machine learning. Furthermore, the student will gain practical skills in a case study, in which each step of developing a machine learning project is presented. By the end of this course, the student will have a solid understanding of, and experience with, some of the fundamentals of machine learning, enabling subsequent exploration.

Teacher: Weiguang Guan (SHARCNET, McMaster University)

Level: Introductory to Intermediate

Format: Lecture

Certificate: Attendance

Prerequisites:

  • Data preparation or equivalent knowledge.
  • Basic Python knowledge and experience.
  • Knowledge and experience with TensorFlow and scikit-learn would also be helpful.

:: Fri., June 6 ::
09:00 to 12:00 EDT
13:30 to 16:30 EDT

Fortran as a Second Language

:: Link to Course :: n/a ::

Description: The original high-level programming language, Fortran continues to be used today for high-performance computing in many fields. It has evolved over the years, and modern Fortran provides implicit parallelism (array expressions), explicit parallelism (coarrays), and object-oriented features, among other things. It supports the MPI, OpenMP, and OpenACC parallel programming standards. The primary aim of this course is to help you understand and modify existing Fortran code, but it would also be useful if you wish to start a new project in Fortran. You should have prior experience with some other programming language, but this is otherwise a beginner-level course.

Teachers: Ross Dickson (ACENET, Dalhousie University) and Chris Geroux (ACENET, Dalhousie University)

Level: Introductory

Format: Lecture + Hands-on

Certificates: Attendance and Completion

Prerequisite: Prior experience with some other programming language

:: Fri., June 6 ::
09:00 to 12:00 EDT

Bioinformatics: Long-read Sequencing Applications

:: Link to Course :: n/a ::

Description: Long-read sequencing technologies enable the sequencing of DNA fragments 10 kb and longer. This read length greatly improves sequence mappability and assembly, providing an advantage over short-read sequences that are difficult to map uniquely to repetitive and GC-rich regions. Long-read sequencing has applications in a number of fields, including genome assembly, diagnosis of genetic diseases, and metagenomics. In this workshop, we will focus on PacBio HiFi sequences and introduce you to tools for haplotyping, calling and visualizing structural variants and repeat expansions, visualizing read methylation, and detecting novel isoforms from PacBio Iso-Seq.

Teachers: Madeline Couse (HPC4Health, SickKids) and Lauren Liang (HPC4Health, SickKids)

Level: Intermediate

Format: Lecture + Hands-on

Certificate: Attendance

Prerequisite: Basic knowledge about DNA/RNA sequencing.

:: Fri., June 6 ::
13:30 to 16:30 EDT

Data Preparation for Machine Learning

:: Link to Course :: n/a ::

Description: This single-session course provides you with essential knowledge and skills to effectively prepare data for analysis. Starting with an overview of the Data Analytics pipeline and processes, the session explores various statistical and visualization techniques used in Exploratory and Descriptive Analytics to understand historical data. You will then delve into the art of Data Preparation, gaining expertise in data cleaning, handling missing values, detecting and handling outliers, and transforming and engineering features. By the end of the session, you will be equipped with the necessary tools to ensure data quality and integrity, enabling you to make informed decisions and derive valuable insights from your data.

Teacher: Shadi Khalifa (CAC, Queen's University)

Level: Intermediate

Format: Lecture + Hands-on

Certificate: Attendance

Prerequisites:

  • Some experience and knowledge of statistics.
  • Some experience and knowledge of Python.

Week 2
When (EDT) Course
:: Mon., June 9 ::
09:00 to 12:00 EDT
13:30 to 16:30 EDT

:: Tue., June 10 ::
09:00 to 12:00 EDT

Introduction to Artificial Neural Networks

:: Link to Course :: n/a ::

Description (Parts 1 and 2): An introduction to neural network programming concepts, theory, and techniques. The class material will begin at an introductory level, intended for those with no experience with neural networks, and will eventually cover intermediate concepts.

Description (Part 3): This part will continue the development of the neural network programming approaches from Parts 1 & 2. This part will focus on methods used to generate sequences: LSTM networks, sequence-to-sequence networks, and transformers.

Teacher: Erik Spence (SciNet, University of Toronto)

Level: Introductory

Format: Lecture + Hands-on

Certificate: Attendance

Prerequisites:

  • Experience with Python will be assumed. (This course is being taught assuming this.)
  • No prior experience with the Keras neural framework is expected. (The Keras neural framework will be used for neural network programming.)

:: Mon., June 9 ::
09:00 to 12:00 EDT
13:30 to 16:30 EDT

Introduction to Python

:: Link to Course :: n/a ::

Description: This course is designed to provide you with a solid foundation in the Python programming language. Through a comprehensive curriculum and hands-on coding exercises, you will learn the fundamentals of Python syntax, data types, functions, and file handling. By the end of the course, you will have gained the essential skills to write Python programs, solve problems, and build the foundation for more advanced Python development. Whether you are a beginner or have some programming experience, this course will equip you with the necessary tools to start your journey in Python programming.

Teacher: Fernando Hernandez (CAC, Queen's University)

Level: Introductory

Format: Workshop

Certificate: Attendance

Prerequisite: An account (free) on https://replit.com/. The course is delivered using a free online tool to let us focus on coding.

:: Tue., June 10 ::
09:00 to 12:00 EDT
13:30 to 16:30 EDT

Introduction to Linux Shell

:: Link to Course :: n/a ::

Description: Running programs on the supercomputers is done via the Bash shell. This course consists of three one-hour lectures on using Bash. No prior familiarity with Bash is assumed. In addition to the basics of getting around, the course will cover globbing, regular expressions, redirection, pipes, and scripting.

Teacher: Tyson Whitehead (SHARCNET, Western University)

Level: Introductory

Format: Lecture + Exercises with Questions

Certificates: Attendance and Completion

Prerequisites: None

:: Tue., June 10 ::
13:30 to 16:30 EDT

Reproducible Research Practices and Tools

:: Link to Course :: n/a ::

Description: Have you ever tried to run someone else's code and it just didn't work? Have you ever been lost interpreting your colleague's data? This hands-on session will provide researchers with tools and techniques to make their research process more transparent and reusable in remote computing environments. We'll be using platforms like JupyterHub and scripting languages like Bash to demonstrate the material. In this workshop, you'll learn about:

  • Organizing your file directories
  • Writing readable metadata with README files
  • Automating your workflow with scripts
  • Capturing and sharing your computational environment
  • Using large language models (GenAI) to assist with the above

Teachers: Sarah Huber (University of Victoria), Shahira Khair (University of Victoria), and Drew Leske (University of Victoria)

Level: Introductory

Format: Lecture + Hands-on

Certificate: Attendance

Prerequisite: Familiarity with command-line tools in a Unix environment is not a requirement for the workshop but may be helpful for some of the hands-on activities.

:: Wed., June 11 ::
09:00 to 12:00 EDT
13:30 to 16:30 EDT

Scientific Visualization

:: Link to Course :: n/a ::

Description: During this workshop, we will learn about Matplotlib, a popular Python library that is great for 2D visualizations, and ParaView, a free and open-source tool for creating 3D visualizations of your datasets. In this interactive workshop you will become familiar with how ParaView works, and by the end you should be able to generate basic visualizations of the demo data.

Teacher: Jarno van der Kolk (University of Ottawa)

Level: Introductory

Format: Lecture + Hands-on

Certificate: Attendance

Prerequisites: None

:: Wed., June 11 ::
09:00 to 10:25 EDT

Research Data Management: A Global Perspective on Making Data FAIR

:: Link to Course :: n/a ::

Description: Research Data Management (RDM) has emerged as a key component of the broader DRI (Digital Research Infrastructure) ecosystem. FAIR principles (making data Findable, Accessible, Interoperable, and Reusable) have been at the core of RDM initiatives for almost a decade now, but how has our understanding and application of these principles evolved to address emerging technologies such as Machine Learning and AI? To answer this, we look at a recent policy document, "Enabling Global FAIR Data: WorldFAIR Policy Recommendations for Research Infrastructures", published by CODATA and WorldFAIR in 2024. The first half of this session will provide a high-level distillation of, and reflection upon, this global policy brief, flagging areas where Canadian stakeholders can better support and promote FAIR data practices. The second half of this session will address the importance of the FAIR principles at a time when data are being suppressed or deleted, data agencies are being gutted or shuttered, and data-driven decision making is devalued and disparaged. Real-world examples will be provided to illustrate the range and impact of the data deletion chaos we are witnessing in real time, and possible responses to these actions.

Teachers: Ann Allan (Compute Ontario) and Jeff Moon (Compute Ontario)

Level: Introductory

Format: Lecture

Certificate: Attendance

Prerequisites: None

:: Wed., June 11 ::
10:35 to 12:00 EDT

The Beginner’s Guide to Data Curation

:: Link to Course :: n/a ::

Description: This session provides an introduction to data curation concepts and best practices. As an essential step in the data publication process, data curation techniques can be useful at all stages of a research project and across research disciplines. Attendees will learn the foundational principles of data curation, including the reasons for curation, and leave with a toolkit of resources that can be adapted to institutional and disciplinary needs.

Teacher: Mikala Narlock (Indiana University Bloomington)

Level: Introductory

Format: Lecture

Certificate: Attendance

Prerequisites: None

:: Wed., June 11 ::
13:30 to 14:55 EDT

Introduction to Alliance RDM Services

:: Link to Course :: n/a ::

Description: The Introduction to Alliance RDM Services webinar will feature experts who will discuss good research data management practices. The training will begin with an overview of research data management and then provide helpful guidance and tips for the different stages of the research data lifecycle, focusing on data management planning, data preservation, active data management, and the sharing and discovery of data. We will highlight different tools and services that are available to researchers as they embark on their research journey.

Teachers: Marcus Closen (Alliance), Tristan Kuehn (Alliance), Daniel Manrique-Castano (Alliance), and Amanda Tomé (Alliance)

Level: Introductory

Format: Webinar

Certificate: Attendance

Prerequisites: None

:: Wed., June 11 ::
15:05 to 16:30 EDT

Enhancing the FAIRness of Sensitive and Restricted Access Research Data: data deposit, de-identification, and re-use

:: Link to Course :: n/a ::

Description: This session will focus on enhancing the FAIRness of sensitive and restricted access research data through strategies that support data deposit, de-identification, and responsible re-use. Participants will explore ensuring consent language enables data sharing, resources to support de-identification practices in preparation for deposit, and approaches to metadata and data deposit that facilitate controlled data re-use. The session will highlight practical considerations and real-world examples relevant to researchers, data stewards, and repository professionals working with sensitive or restricted access data.

Teacher: Victoria Smith (Alliance)

Level: Introductory

Format: Lecture

Certificate: Attendance

Prerequisites: None

:: Thu., June 12 ::
09:00 to 12:00 EDT
13:30 to 16:30 EDT

Using Containers: Apptainer

:: Link to Course :: n/a ::

Description: Apptainer is a secure container technology designed for use on high-performance computing clusters. This workshop will focus on how to use Apptainer as well as how to make use of tools such as Conda and Spack within Apptainer. By the end of these sessions, you will have learned about Apptainer and how it is installed and used on our computer clusters, how to build an Apptainer container image, how to install tools such as Conda/Spack from inside an Apptainer container shell, and how to use Apptainer containers within job submission scripts.

Teacher: Paul Preney (SHARCNET, University of Windsor)

Level: Introductory

Format: Lecture + Hands-on

Certificate: Attendance

Prerequisite: Basic knowledge of Linux shell and how to run programs from the shell.

:: Thu., June 12 ::
09:00 to 10:25 EDT

Practical Guide To The H100 and Taking Full Advantage of Compute Ontario's Newest GPUs

:: Link to Course :: n/a ::

Description: The new H100 GPUs in Compute Ontario systems are up to 8 years newer and many times more powerful than some of the GPUs that are being replaced. There are several new architectural features that are worth knowing about, and it's worth considering how to make full use of these much larger systems. In this session we'll cover: differences from P100, V100, and A100; advanced features; taking advantage of the features with compilers and libraries that do much of the work for you; and the pros and cons of different approaches like MIG and MPS to stack multiple partly-accelerated runs on one GPU.

Teacher: Jonathan Dursi (NVIDIA)

Level: Intermediate

Format: Lecture

Certificate: Attendance

Prerequisites: Experience with previous generations of GPUs (V100, A100)

:: Thu., June 12 ::
10:35 to 12:00 EDT

Implementing Institutional RDM Strategies

:: Link to Course :: n/a ::

Description: With the release of the Tri-Agency Research Data Management Policy, all Canadian post-secondary institutions and research hospitals that administer Tri-Agency funding were required to develop and post institutional research data management (RDM) strategies by March 1, 2023. As institutions finalized their strategies, they began to consider what implementation would look like. To support inter-institutional, cross-functional dialogue around implementation, a two-day, SSHRC-supported workshop was hosted at the University of Waterloo in September 2023. Over 30 institutions of varying sizes and research intensities sent cohorts representing libraries, information technology, and research offices to participate in dialogues around challenges and collaborative solutions in RDM strategy implementation. The high-level recommendations from that workshop have been released as the report Building an Inter-Institutional and Cross-Functional Research Data Management Community: From Strategy to Implementation. In this short workshop, participants will discuss the recommendations in the report and how they can be implemented in their institutions.

Teachers: Jennifer Abel (University of Calgary) and Ian Milligan (University of Waterloo)

Level: Intermediate

Format: Workshop

Certificate: Attendance

Prerequisites:

  • Participants should be involved in RDM-supporting work in their institution; e.g., in the library, research office, IT/research computing, ethics, etc.
  • Participants should also read the executive summary of the report before the workshop.

:: Thu., June 12 ::
13:30 to 16:30 EDT

Text Mining

:: Link to Course :: n/a ::

Description: This workshop introduces the topic of text mining and its applications. It covers different encoding mechanisms to convert text into numbers that algorithms can handle. It gives an overview of different text mining tasks, including de-identification, sentiment analysis and document clustering, and how they work with examples and live demos. There will also be references to state-of-the-art tools and libraries to conduct various text mining tasks.

Teacher: Amal Khalil (CAC, Queen's University)

Level: Introductory

Format: Lecture + Hands-on

Certificate: Attendance

Prerequisites: Basic Python

:: Fri., June 13 ::
09:00 to 12:00 EDT

AI Showcase

:: Link to Course :: n/a ::

Description: This course introduces Artificial Intelligence (AI), a science focusing on developing intelligent systems capable of autonomous behavior. In this course, we explore the exciting world of AI, introducing its definition and history. We discuss the advantages and challenges of AI at present, along with various applications and projects that demonstrate its capabilities. Throughout the session, participants will gain insights into different types of AI, learn about running predefined projects, and discover AI showcases on various platforms. By the end of the course, participants will have the knowledge and resources to start their own AI projects with their data and explore the latest AI advancements in our clusters.

Teacher: Nastaran Shahparian (SHARCNET, York University)

Level: Introductory

Format: Lecture

Certificates: Attendance and Completion

Prerequisite: Basic Python knowledge and know-how is beneficial but not required.

:: Fri., June 13 ::
09:00 to 12:00 EDT

DASK

:: Link to Course :: n/a ::

Description: Python is a popular language because it is easy to create programs quickly with simple syntax and a "batteries included" philosophy. However, there are some drawbacks to the language. It is notoriously difficult to parallelize because of a component called the global interpreter lock, and Python programs typically take many times longer to run than compiled languages such as Fortran, C, and C++, making Python less popular for creating performance-critical programs. Dask was developed to address the first problem of parallelism. The second problem of performance can be addressed by either using modules already compiled into fast C/C++ code, such as NumPy, or by converting performance-critical parts into a compiled language such as C/C++ nearly automatically using Cython. Together Cython and Dask can be used to gain greater performance and parallelism of Python programs.

This is a beginner-level course; the only requirement is some prior experience with a programming language, preferably Python. During the course we will program together to build out a script used to demonstrate course concepts. This will take slightly more than half the time, and hands-on exercises will use the remaining time. No Alliance account is required.

Teacher: Chris Geroux (ACENET, Dalhousie University)

Level: Introductory

Format: Lecture + follow-along coding + hands-on exercises

Certificate: Attendance

Prerequisites: Should have experience programming in at least one language, ideally Python.

:: Fri., June 13 ::
13:30 to 16:30 EDT

Data Parallelism and Model Parallelism for Scaling Training Across Multiple GPUs

:: Link to Course :: n/a ::

Description: Larger Deep Neural Networks (DNNs) are typically more powerful, but training models across multiple GPUs or multiple nodes isn't trivial and requires an understanding of both AI and high-performance computing (HPC). In this workshop we will give an overview of activation checkpointing, gradient accumulation, and various forms of data and model parallelism to overcome the challenges associated with large-model memory footprint, and walk through some examples.
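
As a rough illustration of one of the techniques named above, gradient accumulation, the sketch below uses a toy one-parameter linear model in plain Python. It is not taken from the workshop material; the function names and data are hypothetical. It shows that summing size-weighted gradients over small micro-batches and normalising once reproduces the full-batch gradient, which is why the technique can reduce peak memory without changing the update.

```python
# Toy illustration of gradient accumulation (hypothetical names and data).
# Model: y ~ w * x, loss: mean squared error over the batch.

def grad_mse(w, xs, ys):
    """Gradient of the mean squared error with respect to w."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, micro_batch_size):
    """Accumulate micro-batch gradients, then normalise once at the end."""
    total = 0.0
    for start in range(0, len(xs), micro_batch_size):
        xb = xs[start:start + micro_batch_size]
        yb = ys[start:start + micro_batch_size]
        # Weight each micro-batch gradient by its size so that an uneven
        # final micro-batch still contributes correctly.
        total += grad_mse(w, xb, yb) * len(xb)
    return total / len(xs)


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]
w = 0.5

full = grad_mse(w, xs, ys)                               # whole batch at once
accum = accumulated_grad(w, xs, ys, micro_batch_size=2)  # three micro-batches

# The two gradients agree up to floating-point rounding.
print(abs(full - accum) < 1e-9)
```

In a deep learning framework the same idea typically appears as running several forward and backward passes on micro-batches before a single optimizer step; the details depend on the framework and are not shown here.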

Teacher: Jonathan Dursi (NVIDIA)

Level: Intermediate/Advanced

Format: Lecture + Demo

Certificate: Attendance

Prerequisites:

  • Familiarity with training models in PyTorch on a single GPU will be assumed.
:: Fri., June 13 ::
13:30 to 16:30 EDT

Incorporating Other Languages into Python

:: Link to Course :: n/a ::

Description: We will cover how to write optimized code in C and how to include it in your Python code. We will look at Cython as well as pure C. If time permits, we will also look at including Fortran.

Teacher: Joey Bernard (ACENET, University of New Brunswick)

Level: Intermediate

Format: Lecture + Hands-on

Certificate: Attendance

Prerequisites:

  • Basic Python programming experience.
  • Knowledge of how to use a C compiler.
Week 3
When (EDT) Course
:: Mon., June 16 ::
09:00 to 12:00 EDT
13:30 to 16:30 EDT

High Performance Computing in Python

:: Link to Course :: n/a ::

Description: Learn how to improve performance and use parallel programming in Python. We will cover profiling, subprocess, numexpr, multiprocessing, MPI, and other performance-enhancing techniques.

Teacher: Ramses van Zon (SciNet, University of Toronto)

Level: Intermediate

Format: Lecture + Hands-on

Certificates: Attendance and Completion

Prerequisites:

  • Basic Linux command line skills.
  • Programming experience in Python.
:: Mon., June 16 ::
09:00 to 12:00 EDT
13:30 to 16:30 EDT

Multicore Parallel Programming (OpenMP)

:: Link to Course :: n/a ::

Description: This is an introductory-to-intermediate level hands-on OpenMP course. OpenMP is a standard parallel programming API that supports multi-platform shared-memory multiprocessing programming in C, C++, and Fortran.

This one-day course will cover the principles of OpenMP compiler directives, library routines, and environment variables with step-by-step hands-on examples. Case studies include various approaches to loop parallelism. We will also talk about the Task constructs for irregular programs and the Target constructs for accelerators such as GPUs. Participants will gain hands-on programming experience with OpenMP and learn how to compile and run multi-threaded OpenMP code on different Alliance clusters.

Teacher: Jemmy Hu (SHARCNET, University of Waterloo)

Level: Introductory

Format: Lecture + Hands-on

Prerequisites: Basic knowledge of C, C++, or Fortran.

:: Tue., June 17 ::
09:00 to 12:00 EDT
13:30 to 16:30 EDT

:: Wed., June 18 ::
09:00 to 12:00 EDT
13:30 to 16:30 EDT

:: Thu., June 19 ::
09:00 to 12:00 EDT
13:30 to 16:30 EDT

GPU Programming: CUDA

:: Link to Course :: n/a ::

Description: This is an introductory course covering programming and computing on GPUs (graphics processing units), which are an increasingly common presence in massively parallel computing architectures. The basics of GPU programming will be covered, and students will work through a number of hands-on examples. The structuring of data and computations that makes full use of the GPU will be discussed in detail. Students should be able to leave the course with the knowledge necessary to begin developing their own GPU applications.

Teachers: Sergey Mashchenko (SHARCNET, McMaster University) and Pawel Pomorski (SHARCNET, University of Waterloo)

Level: Introductory

Format: Lecture + Hands-on

Certificate: Attendance

Prerequisite: Basic C and/or C++ programming experience.

:: Tue., June 17 ::
09:00 to 12:00 EDT
13:30 to 16:30 EDT

:: Wed., June 18 ::
09:00 to 12:00 EDT
13:30 to 16:30 EDT

Introduction to Julia for Scientific and Parallel Computing

:: Link to Course :: n/a ::

Description: Julia is becoming increasingly popular for scientific computing. It can be used for prototyping with the productivity of Matlab, R, and Python, while gaining performance comparable to that of compiled languages such as C/C++ and Fortran. The language is designed for both prototyping and performance, as well as simplicity. This is an introductory course on Julia. Students will be able to get started quickly with the basics, with comparisons to similar languages such as Matlab, R, Python, and Fortran, and then move on to learn, through examples, how to write code that can run in parallel on multi-core and cluster systems.

Teacher: Baolai Ge (SHARCNET, Western University)

Level: Introductory

Format: Lecture + Hands-on

Certificate: Attendance and Completion

Prerequisites: None

:: Tue., June 17 ::
09:00 to 12:00 EDT

Moving to Rust for Memory Safe Code

:: Link to Course :: n/a ::

Description: An introduction to Rust, highlighting some of the major differences from traditional languages like C/C++. We will also have a brief review of available science modules.

Teacher: Joey Bernard (ACENET, University of New Brunswick)

Level: Introductory

Format: Lecture + Hands-on

Certificate: Attendance

Prerequisites:

  • Knowledge of how compiled code works.
  • Some knowledge of C-like programming languages.
:: Tue., June 17 ::
13:30 to 16:30 EDT

Data Security

:: Link to Course :: n/a ::

Description: Be aware. Stay secure. Join us to learn more about the tools you can use to prevent the theft of your data and possibly of your identity. Other topics of discussion will include common hacking attempts, how to recognize them, and how to avoid having your data compromised, stolen, or destroyed. We will also talk about data encryption and provide tips for when travelling with electronic devices.

Teacher: Jarno van der Kolk (University of Ottawa)

Level: Introductory

Format: Lecture

Certificate: Attendance

Prerequisites: None

:: Wed., June 18 ::
09:00 to 12:00 EDT
13:30 to 16:30 EDT

:: Fri., June 20 ::
09:00 to 12:00 EDT
13:30 to 16:30 EDT

Parallel Programming with MPI

:: Link to Course :: n/a ::

Description: We will cover the basics of parallel programming, in the context of MPI. There will be a great deal of hands-on experience, with lots of examples.

Teachers: Joey Bernard (ACENET, University of New Brunswick) and Gurpreet Singh (ACENET, St. Francis Xavier University)

Level: Introductory / Intermediate

Format: Lecture + Hands-on

Certificate: Attendance

Prerequisites:

  • Familiarity with the C (or C-like) programming languages.
  • Some knowledge of issues in parallel programming.
:: Thu., June 19 ::
09:00 to 12:00 EDT
13:30 to 16:30 EDT

:: Fri., June 20 ::
09:00 to 12:00 EDT
13:30 to 16:30 EDT

Modern C++ for Parallel Programming

:: Link to Course :: n/a ::

Description: This course will focus on the following in both sequential and parallel contexts:

  • using <mdspan> for accessing multi-dimensional arrays and multi-dimensional array slices,
  • using <linalg> for linear algebra,
  • using P2300 (senders and receivers; asynchronous) support,
  • using NVIDIA C++ compiler's stdpar support (CPU and/or GPU) for the above, and,
  • using C++'s extended floating-point types.

Teacher: Paul Preney (SHARCNET, University of Windsor)

Level: Intermediate

Format: Lecture + Hands-on

Certificates: Attendance and Completion

Prerequisite: Previous experience developing C++ programs.

:: Thu., June 19 ::
09:00 to 10:25 EDT

Depositing in Borealis, the Canadian Dataverse Repository

:: Link to Course :: n/a ::

Description: This online workshop will support researchers in uploading data files of all types, with examples of documentation and metadata for submission to an institutional collection (hosted in Borealis). Participants will learn more about direct integrations for Dropbox, handling .zip files, geospatial file support, creating documentation and metadata, linking to code and publications, and integrated previewers and analysis tools for reuse and sharing.

Teachers: Amber Leahey (OCUL, Scholars Portal, University of Toronto), Billie Hu (OCUL, Scholars Portal, University of Toronto), and Alyssa Conlon (OCUL, Scholars Portal, Queen's University)

Level: Introductory

Format: Workshop

Certificate: Attendance

Prerequisites: None

:: Thu., June 19 ::
10:35 to 12:00 EDT

Using Data Collections in Odesi and Scholars GeoPortal in Your Research

:: Link to Course :: n/a ::

Description: This online workshop will demonstrate to researchers how to search and filter by variables, topics, and themes, as well as how to explore and analyze data using these repository platforms. Highlighted collections include historical census data and geographic boundary data, as well as open historical topographic maps and data for reuse. Participants will be able to search for data and explore datasets to learn more about data for reuse. A significant focus will be on Canadian open access and historical government data. An open Q&A portion will be facilitated by staff and data experts for further consultation.

Teachers: Amber Leahey (OCUL, Scholars Portal, University of Toronto), Alicia Urquidi Diaz (OCUL, Scholars Portal, University of Toronto), and Alexandra Cooper (Queen's University)

Level: Introductory

Format: Workshop

Certificate: Attendance

Prerequisites: None

:: Thu., June 19 ::
13:30 to 14:55 EDT

Metadata in the DRI Ecosystem: A Pragmatic Introduction

:: Link to Course :: n/a ::

Description:

Metadata is as metadata does. As researchers and institutions embrace digital research infrastructures (DRIs) and digital tools for conducting research, researchers need a better and deeper practical understanding of metadata in these new digital contexts.
Usually defined as data about data, metadata provides additional information or context about an item (a digital object, a piece of data, etc.). Most people have an intuitive understanding of the ways metadata can make it easier to understand, manage, and organize digital items, based on their experiences interacting with metadata in practice, from managing personal files on their own computers or in the cloud to using library catalogues or scientific databases.
This workshop will aim to demystify DRI metadata by experiencing it "in action" across three data services: Borealis, Scholars GeoPortal, and Odesi. The goal is to connect the (sometimes arcane) metadata best practices and recommendations with some actual, practical (and often surprising!) implications of your metadata decisions.

Teacher: Alicia Urquidi Diaz (OCUL, Scholars Portal, University of Toronto)

Level: Introductory

Format: Workshop

Certificate: Attendance

Prerequisites: None

:: Thu., June 19 ::
15:05 to 16:30 EDT

Data Management Plans: Researcher and RDM Expert Panel

:: Link to Course :: n/a ::

Description: This panel will convene three researchers and three Research Data Management Specialists to explore the process of writing data management plans (DMPs) in an academic setting. Panelists will be invited to share their experiences guided by a series of questions highlighting the evolving roles of researchers and data specialists in this planning process. The audience will have the opportunity to ask questions of panelists to better understand how they can plan to make their research data FAIR (findable, accessible, interoperable, reusable) which, in turn, ensures data are safeguarded, properly organized, and that science is reproducible.

Teachers: Ann Allan (Compute Ontario), Anneliese Eber (University of Waterloo), Danica Evering (McMaster University), Fiona Brinkman (Simon Fraser University), UW Fire Research Group (University of Waterloo), Jeff Moon (Compute Ontario), Kharah Ross (Athabasca University), and Robyn Stobbs (Athabasca University)

Level: Introductory

Format: Panel

Certificate: Attendance

Prerequisites: None