Reproducibility and experiment tracking are essential in machine learning workflows. MLflow is an open-source platform for experiment tracking and model management in machine learning and AI development. This webinar introduces MLflow with quickstart examples running on the clusters, focusing on a lightweight setup with local storage. The examples will be demonstrated in Jupyter notebooks and in batch jobs.
Like disk space, inodes are a limited resource: every file and directory on a filesystem consumes one. Each user and group is therefore allocated a fixed number of inodes by default. In this webinar, filesystem quotas on the Alliance clusters and best practices for managing file quotas will be presented, including the use of archival storage. Inconsistencies in file ownership leading to “disk quota exceeded” errors will be discussed. Finally, file formats such as NetCDF and SQL BLOBs for efficiently storing large sets of small files will be presented.
Learn how to create and manage virtual machines on SHARCNET's cloud infrastructure using OpenStack. This session covers the essentials of working with the OpenStack dashboard to launch VMs, configure security groups, manage storage volumes, and control your cloud resources. Whether you're setting up a web server, running custom software environments, or building virtual clusters, you'll discover how OpenStack gives you complete control over your computing environment.
The Canadian Bioinformatics Hub (CBH) supports the growth of bioinformatics and computational biology in Canada through training, mentorship, and community-building initiatives. This presentation introduces CBH and the resources it offers to students, early-career researchers, and industry professionals. We will highlight key programs within the training and community pillars and provide an overview of bioinformatics user groups across Canada, with a focus on Ontario. Join us to learn how CBH connects people, builds skills, and strengthens bioinformatics communities across the province and beyond.
In our last talk on Fully Sharded Data Parallel (FSDP), we offered insight into training large models using FSDP and strategies for customizing model training with FSDP for performance benefits.
PyTorch has an updated interface for Fully Sharded Data Parallel called FSDP2. Here we will present how to implement FSDP2 in your training code, compare FSDP2 with FSDP, and examine training performance using FSDP2 on the new systems. Intermediate experience with Python, PyTorch, and deep learning is expected.
This simulation will include several advances relative to the widely used 1/48° MITgcm simulation (also known as LLC4320), including increased vertical and horizontal resolution, an updated global bathymetry, the use of a more accurate surface pressure solver, the addition of ice-shelf cavities around Greenland and Antarctica, hourly atmospheric forcing, realistic river discharge, and more accurate astronomical tides. These improvements directly address long-standing issues in earlier high-resolution MITgcm simulations, such as a misplaced Gulf Stream, a crude representation of Antarctic shelf currents, and anemic tropical instability waves.
The resulting model output will offer an unprecedented benchmark for studies of internal tides and internal waves, turbulence parameterization, and sea-surface height variability. All configurations, tools, and outputs will be openly released, positioning this Canada-led effort as a major global resource for oceanography and climate modelling.
Date: October 22nd 12:00pm-1:00pm
Contributors: Sahar Naseer (Privacy Specialist, The Hospital for Sick Children), Roohie Sharma (Legal Privacy Counsel, The Hospital for Sick Children), Melissa Lanuza (Privacy and FOI, The Hospital for Sick Children)
This presentation provides a foundational overview of health privacy principles and legislation relevant to Ontario hospitals. It introduces the ten privacy principles that underpin Canadian privacy laws, with a focus on the Personal Health Information Protection Act (PHIPA), the Personal Information Protection and Electronic Documents Act (PIPEDA), and the Freedom of Information and Protection of Privacy Act (FIPPA). The session covers the rights and responsibilities associated with personal health information (PHI) and personal information (PI), the process for handling freedom of information requests, and the importance of cybersecurity awareness.
Date: October 15th 12:00pm-1:00pm
Presenters: Kristi Thompson (Research Data Management Librarian, Western University) and Alexandra Cooper (Data Services Coordinator, Queen’s University)
In early 2025, data and documents began to disappear from U.S. government websites in response to a series of executive orders. This led to a scramble as various individuals and groups mobilized to save as much disappearing data as they could. These events served as a wake-up call and led to the founding of the Canadian Public Data Rescue Initiative (CPDRI).
Initially formed, in part, to support rescue efforts in the U.S., the CPDRI also set out to build infrastructure to support public data in Canada. Drawing on projects including the Canadian Government Information Digital Preservation Network and the OCUL Ontario Data Rescue Group, the CPDRI is working to establish a sustainable strategy for preservation of vital Canadian public datasets.