This course provides you with essential knowledge and skills to effectively prepare data for analysis. Starting with an overview of the Data Analytics pipeline and processes, the course explores various statistical and visualization techniques used in Exploratory and Descriptive Analytics to understand historical data. You will then delve into the art of Data Preparation, gaining expertise in data cleaning, handling missing values, detecting and handling outliers, and transforming and engineering features. By the end of the course, you will be equipped with the necessary tools to ensure data quality and integrity, enabling you to make informed decisions and derive valuable insights from your data.

Level: Introductory

Length: 3 Hours

Format: Lecture + Hands-on

Prerequisites: Basic Python

Be aware. Stay secure. Join us to learn more about the tools you can use to prevent the theft of your data and possibly of your identity. Other topics of discussion will include common hacking attempts, how to recognize them, and how to avoid having your data compromised, stolen, or destroyed. We will also talk about data encryption and provide tips for when travelling with electronic devices.

Level: Introductory

Length: 3 hours

Format: Lecture

Prerequisites: None

RNA-Seq refers to high-throughput sequencing methods that probe the entire transcriptomic landscape of a given tissue or sample of interest. The data acquired from such experiments can be used to explore the overall RNA profile of a sample as well as to compare samples under various conditions. While extremely powerful, RNA-Seq is susceptible to numerous experimental pitfalls and requires intimate knowledge of the experimental procedures and data analysis methods. When conducted properly, RNA-Seq can reveal information about gene/transcript expression, splicing, and the effects of mutations. In this session, we will take a thorough look at a comprehensive RNA-Seq pipeline, from sample processing methods to final differential expression analysis. Relevant R/Bioconductor packages will be introduced. We will have the opportunity to investigate numerous quality control metrics and to perform genomic alignment, differential expression, and pathway enrichment analysis. We will cover several “gotchas” and common mistakes in experimental design and data analysis. Basic familiarity with R and the Linux command line will be beneficial but is not required. All necessary commands and parameters will be explained during the class. Participants will be offered hands-on practice in which they will use RStudio to run R/Bioconductor scripts for data analysis as well as the Integrative Genomics Viewer (IGV) software to visualize genomic data on their laptops.

Level: Intermediate

Length: 3 Hours

Format: Lecture + Hands-on

Prerequisites: Basic R and Linux beneficial but not required

Long-read sequencing technologies enable the sequencing of DNA fragments 10 kb and longer. This read length greatly improves sequence mappability and assembly, providing an advantage over short-read sequences that are difficult to map uniquely to repetitive and GC-rich regions. Long-read sequencing has applications in a number of fields, including genome assembly, diagnosis of genetic diseases, and metagenomics. In this workshop, we will focus on PacBio HiFi sequences and introduce you to tools for haplotyping, calling and visualizing structural variants and repeat expansions, visualizing read methylation, and detecting novel isoforms from PacBio Iso-Seq data. Participants will be offered hands-on practice in which they will use RStudio to run R/Bioconductor scripts for data analysis as well as the Integrative Genomics Viewer (IGV) software to visualize genomic data on their laptops.

Level: Intermediate

Length: 3 Hours

Format: Lecture + Hands-on

Prerequisites: Basic R

The role of good research data management practices in supporting research reproducibility is becoming increasingly well known. The literature is replete, however, with examples of poor methodology, lack of transparency, mistakes, and misconduct leading to bad science and an inability to reproduce results. This introductory session will provide real-world, illustrative examples of each of these, along with practical suggestions on how to avoid them.

Level: Introductory

Length: 1.5 Hours

Format: Lecture

Prerequisites: None

Google's 2017 research paper "Attention Is All You Need" described the transformer, a new machine learning technique. From that paper the modern Large Language Model was born, and we're now living in the thick of a new era brought on by companies like OpenAI, Mistral and Anthropic. But where does this cutting-edge technology come from? What are its roots? What are its problems?

This talk explores the history of procedural generation in text and games, from the I-Ching to transformer-based language models and beyond. The talk will emphasize the current state of the art in text-based language models and include demonstrations of how to run language models locally on your own hardware.

Level: Introductory

Length: 1.5 Hours

Format: Lecture

Prerequisites: None

In this workshop, we will explore the potential uses of generative artificial intelligence tools in research data management (RDM) with a focus on specific use cases. For example, can AI tools be used to write Data Management Plans, summarize funder requirements, assist with data analysis, or suggest file naming conventions and folder structures? This workshop will be interactive, and participants will be welcome to practice using AI tools along with the presenters using real-world data and prompts. We will also discuss the ethical considerations, including benefits and risks, of using AI tools in research and whether it is possible to use AI for RDM practices in an ethical manner.

Level: Introductory

Length: 1.5 Hours

Format: Lecture + Hands-on

Prerequisites: None

This session provides an overview of the Research Data Management (RDM) Services offered by the Digital Research Alliance of Canada, including the DMP Assistant, a national, bilingual platform for the creation and management of data management plans (DMPs); the Federated Research Data Repository (FRDR), a bilingual publishing platform for sharing and preserving Canadian research data; and Lunaris, Canada’s national discovery service for multidisciplinary data from over 90 academic, government, and research repositories across the country. This session will introduce participants to these platforms and provide an overview of how they support the research lifecycle. Attendees will gain valuable insights into the benefits of these tools and how they can help researchers to streamline their data management workflows.

Presenter Biographies:

  • Neha Milan serves as the Product Lead for the Federated Research Data Repository (FRDR), a pivotal role that sees her overseeing the ongoing design and development of the FRDR platform. Based at the University of Saskatchewan, Neha is at the forefront of the FRDR Sensitive Data Pilot Project, steering its direction and implementation.
  • Laura Gerlitz is a Curation Officer for the Federated Research Data Repository (FRDR), based out of Edmonton, Alberta. With a background in library and information studies and digital humanities, Laura specializes in metadata for the FRDR platform.
  • Shlomi Linoy is a Research Data Analyst and Data Discovery Metadata Specialist at McMaster University, specializing in data discovery metadata for the Lunaris platform.
  • Marcus Closen works on the DMP service team for the Digital Research Alliance of Canada. He is in the late stages of completing a PhD in political science at the University of Toronto (working with mixed methods, as well as side projects in machine learning applications) and holds a master's degree from the University of Manitoba.

Level: Introductory

Length: 1.5 Hours

Format: Lecture

Prerequisites: None

This workshop will help you understand the relation between storage systems and application-level performance. We will survey the design of storage found on national systems and consider its performance implications. A range of different IO techniques, data formats, and libraries will be considered. Examples and exercises will be in Python. Ideally, participants should have a DRAC account on the National Platform (DRI).

Level: Intermediate

Length: 3 Hours

Format: Lecture + Hands-on

Prerequisites: Alliance Account, Python Experience

This session will provide participants with information, guidance, and resources for supporting research through the development and implementation of data management plans (DMPs). General topics covered will include the importance and benefits of DMPs, their content, and impending DMP requirements relating to the Tri-Agency research data management (RDM) policy. Specific focus will be given to the Digital Research Alliance of Canada DMP Assistant platform that is hosted nationally at the University of Alberta Library, along with a new DMP template developed by the Alliance’s DMP Expert Group (DMPEG). This new template is targeted specifically to support researchers in meeting DMP requirements at the funding opportunity application stage. Additional information relating to an accompanying assessment rubric that is currently in development will be shared. Time will be reserved for questions and discussion.

Biography: James Doiron is the Research Data Management Strategies Director, University of Alberta Library, and Academic Director of the UofA Research Data Centre. Locally, James serves on UofA’s Institutional Research Data Management Strategy Working Group (chair), Indigenous Research Strategy Task Force, and Health Research Ethics Board. Nationally, he serves as a member of the Canadian Research Data Centre Network Board of Directors and is co-chair of the Digital Research Alliance of Canada’s Data Management Planning (DMP) Expert Group.

Level: Introductory

Length: 1.5 Hours

Format: Lecture

Prerequisites: None

The reproducibility of research is essential to the scientific community, as it ensures the accuracy and reliability of research findings that are used to build upon existing knowledge. However, reproducibility is often hindered by the lack of access to research data, documentation, and code. This workshop will provide an overview of the concepts of open science, reproducibility, and the FAIR principles of research data, as well as explore how to deposit and share data in Borealis, the Canadian Dataverse Repository, a bilingual, multidisciplinary, secure, Canadian research data repository, supported by academic libraries and research institutions across Canada. The learning objectives of the workshop include:

  • Understand the Canadian context of sharing data as it relates to the FAIR principles and the importance of scientific reproducibility
  • Gain skills related to depositing and sharing research data, documentation, and code in Borealis
  • Explore Borealis features to support reproducibility and effective reuse of research data, including Computational Workflow Metadata and uploading from GitHub
  • Search and access sample datasets and code, with a focus on real-world examples and use cases

By the end of the workshop, participants will have gained skills and knowledge related to depositing and sharing research data, documentation, and code with an emphasis on openness and reproducibility, improving the quality and impact of their research. 

Level: Introductory

Length: 1.5 Hours

Format: Lecture

Prerequisites: None

The application of machine learning (ML) to academic libraries promises to be transformational. A Task Force of the Ontario Council of University Libraries (OCUL) has been exploring this technology and identifying specific ML use cases. OCUL is an association of the 21 university libraries in Ontario that collaborate on many shared services and resources.

This session will review the work of the Task Force with a focus on use cases, and the requirements and processes to implement pilot programs and production services. Particular attention will be placed on the technology infrastructure (compute, software) and the expertise requirements (technology, domain).

Use cases to be discussed include audio to text transcription, metadata creation, virtual reference (chat), and discovery using natural language processing (NLP), semantic search, and summarization. The discovery use case will be applied to some of the extensive data collections maintained by Scholar Portal, the shared resource managed by OCUL, including over 65 million articles from over 27,000 full text scholarly journals and a collection of over 800K digital books and government documents.

Participants will be encouraged to engage with key questions about the adoption and use of machine learning in libraries and to provide feedback on the ongoing evolution of this technology as it benefits library applications. 

Level: Introductory

Length: 1.5 Hours

Format: Lecture

Prerequisites: None

Jupyter Notebook is commonly used for interactive computing in Python. This session presents the options and features for working with Jupyter on the Digital Research Alliance of Canada's remote computing clusters and demonstrates several use cases on the clusters.

Level: Introductory

Length: 1.5 Hours

Format: Lecture + Demonstration

Prerequisites: Basic Python and Linux command line experience.

Odesi (https://odesi.ca) is a Canadian social science data repository and online data exploration and analysis tool. Odesi’s collections include over 5,700 historical and contemporary surveys and public opinion polls from a variety of data providers such as Statistics Canada and the Canadian Opinion Research Archive (CORA). This workshop will demonstrate how to effectively search for and access data within Odesi on a variety of social, economic, and political topics. Attendees will learn how to navigate the interface using search features and available collections, explore survey questions (variables), perform basic tabulations and analysis using connected tools, and download datasets into statistical software (e.g., R, SPSS) for further analysis.

Level: Introductory

Length: 1.5 Hours

Format: Lecture

Prerequisites: None