Gallantries Grant - Intellectual Output 2 - Large-scale data analysis, and introduction to visualisation and data modelling

PURL: https://gxy.io/GTN:P00013

Comment: What is a Learning Pathway?

We recommend you follow the tutorials in the order presented on this page. They have been selected to fit together and build up your knowledge step by step. If a lesson has both slides and a tutorial, we recommend you start with the slides, then proceed with the tutorial.

This Learning Pathway collects the results of Intellectual Output 2 in the Gallantries Project.

Success Criteria:

SC2.1) Large-scale data analysis and handling. In this module, learners will gain competency in managing, organising, and analysing large collections of datasets.
SC2.2) Analysis of high-dimensional datasets. Real-world scientific studies often involve more complex datasets, for example combining data from different experiments or time points. This more complex experimental setup increases the complexity of the data analysis.
SC2.3) Data visualisation. This module will cover the basics of data visualisation to aid in the exploration and interpretation of complex datasets.
SC2.4) Data modelling. This module will introduce learners to basic data modelling techniques. These are often required to identify patterns in data, e.g. for classification.
SC2.5) Machine learning. This module will also cover more advanced data modelling techniques such as machine learning.
SC2.6) Reasoning about impact of computation on results. Many choices must be made during data analysis. This includes experimental design, choice of data analysis tools and their parameter settings, and external reference databases. Each of these choices will impact the results. Accurate interpretation of results is only possible with an understanding and awareness of the impact of these factors.
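To make "identifying patterns in data, e.g. for classification" (SC2.4) a little more concrete, here is a minimal sketch of one of the simplest data modelling techniques, a nearest-centroid classifier, written in plain Python. It is an illustration only, not material from the tutorials in this pathway; the two-feature samples and the class labels `A` and `B` are invented toy data.

```python
from math import dist
from statistics import fmean


def fit_centroids(samples, labels):
    """Compute the mean feature vector (centroid) of each class."""
    grouped = {}
    for features, label in zip(samples, labels):
        grouped.setdefault(label, []).append(features)
    # zip(*rows) turns the rows of one class into per-feature columns
    return {
        label: tuple(fmean(column) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(centroids, features):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical toy data: two numeric features per sample, two classes.
samples = [(1.0, 1.2), (0.8, 1.0), (4.0, 4.2), (4.3, 3.9)]
labels = ["A", "A", "B", "B"]

centroids = fit_centroids(samples, labels)
print(predict(centroids, (1.1, 0.9)))  # → A
print(predict(centroids, (3.8, 4.1)))  # → B
```

Nearest-centroid is used here only because it fits in a few lines; the machine learning lesson later in this pathway works in R and covers more capable algorithms.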

Year 1: Introduction to large-scale analyses in Galaxy

Galaxy offers support for the analysis of large collections of data. This submodule will cover the upload, organisation, and analysis of such large sets of data and files. [SC2.1; SC1.3,5]

Time estimation: 5 hours 10 minutes

Learning Objectives

Learn about the Rule Based Uploader
Learn even more about the Rule Based Uploader
Learn about SRA aligned read format and VCF files for Runs containing SARS-CoV-2 content
Understand how to search the metadata for these Runs to find your dataset of interest and then import that data in your preferred format
Learn how to extract a workflow from a Galaxy history
Learn how to change a workflow using the workflow editor
Understand and master dataset collections
Learn to use the `planemo run` subcommand to run workflows from the command line.
Be able to write simple shell scripts for running multiple workflows concurrently or sequentially.
Learn how to use Pangolin to assign annotated variants to lineages.
Understand key aspects of workflows
Create clean, non-repetitive workflows
Learn how to use Workflow Parameters to improve your Workflows
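The objectives about `planemo run` and about running several workflows sequentially or concurrently can be sketched as follows. This is a hedged illustration in Python rather than a shell script; it assumes planemo is installed and that you have a Galaxy server URL and API key. The workflow and job file names (`workflow.ga`, `job-1.yml`, ...), the server URL and `YOUR_API_KEY` are hypothetical placeholders, and you should check `planemo run --help` for the options your planemo version supports.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def planemo_command(workflow, job_file, galaxy_url, api_key):
    """Build one `planemo run` invocation against an existing Galaxy server."""
    return [
        "planemo", "run", workflow, job_file,
        "--engine", "external_galaxy",
        "--galaxy_url", galaxy_url,
        "--galaxy_user_key", api_key,
    ]


def run_all(commands, concurrent=False, max_workers=4, runner=subprocess.run):
    """Run each command and return the exit codes in input order.

    With concurrent=False the invocations run one after another;
    otherwise up to max_workers run at the same time in threads.
    """
    def run_one(cmd):
        return runner(cmd).returncode

    if not concurrent:
        return [run_one(cmd) for cmd in commands]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as `commands`
        return list(pool.map(run_one, commands))


# Hypothetical inputs: replace with your own workflow, job files, server and key.
jobs = ["job-1.yml", "job-2.yml", "job-3.yml"]
cmds = [planemo_command("workflow.ga", job, "https://usegalaxy.eu", "YOUR_API_KEY")
        for job in jobs]
for cmd in cmds:
    print(" ".join(cmd))  # dry run: show what would be executed

# To actually launch them (requires planemo on PATH):
# exit_codes = run_all(cmds, concurrent=True)
```

Threads are sufficient here because each worker only waits on an external process. Whether concurrent submission is appropriate depends on the limits of the Galaxy server you use.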

Lesson	Slides	Hands-on	Recordings
Rule Based Uploader (tags: collections, tags)		Tutorial	Tutorial (February 2021) - 22m
Rule Based Uploader: Advanced (tags: collections, tags)		Tutorial	Tutorial (February 2021) - 22m
SRA Aligned Read Format to Speed Up SARS-CoV-2 data Analysis (tags: ncbi, covid19)	Slides (plain text)	Tutorial	Lecture (May 2021) - 15m; Tutorial (May 2021) - 40m
Extracting Workflows from Histories (tags: workflows)		Tutorial	
Using dataset collections (tags: collections)		Tutorial	Tutorial (May 2023) - 13m; Tutorial (August 2021) - 11m
Automating Galaxy workflows using the command line (tags: workflows, variant-analysis, covid19)		Tutorial	Tutorial (February 2021) - 30m
Creating, Editing and Importing Galaxy Workflows (tags: workflows)		Tutorial	
Using Workflow Parameters (tags: workflows)		Tutorial	Tutorial (July 2022) - 30m

Year 1: Introduction to the human microbiome analyses

The human microbiome consists of a community of thousands of species of microorganisms. Sequencing of this community is often performed to identify which species of microorganism are present. This aids in diagnostics and treatment of patients. [SC2.1-3,6; SC1.4,5]

Time estimation: 3 hours

Learning Objectives

Inspect metagenomics data
Run metagenomics tools
Identify yeast species contained in a sequenced beer sample using DNA
Visualize the microbiome community of a beer sample
Use Nanopore data for studying soil metagenomics
Analyze and preprocess Nanopore reads
Use Kraken2 to assign taxonomic labels

Lesson	Slides	Hands-on	Recordings
Identification of the micro-organisms in a beer using Nanopore sequencing (tags: nanopore, beer, citizen science, metagenomics, microgalaxy)		Tutorial	Tutorial (May 2023) - 1h5m
16S Microbial analysis with Nanopore data (tags: metabarcoding, 16S, nanopore, microgalaxy, plants)		Tutorial	

Year 1: Advanced microbiome analysis

By using more complex sequencing techniques, it is possible to obtain information not only about which organisms are present in the microbiome, but also about their activity. This can, for example, aid in the identification of antibiotic resistance. Such complex sequencing requires correspondingly complex data analysis. [SC2.1-4,6; SC1.4,5]

Time estimation: 4 hours

Learning Objectives

Check quality reports generated by FastQC and NanoPlot for metagenomics Nanopore data
Preprocess the sequencing data to remove adapters, poor quality base content and host/contaminating reads
Perform taxonomic profiling of the samples, reporting and visualising taxa down to species level
Identify pathogens via assembly based on the virulence factor gene products found, identify strains, and indicate all antimicrobial resistance genes in the samples
Identify pathogens via SNP calling and build the consensus genome of the samples
Relate all samples' pathogenic genes for tracking pathogens via phylogenetic trees and heatmaps

Lesson	Slides	Hands-on	Recordings
Pathogen detection from (direct Nanopore) sequencing data using Galaxy - Foodborne Edition (tags: microgalaxy, Nanopore data analysis, Pathogens detection, Phylogenetic tree, Heatmap, cyoa)		Tutorial	Tutorial (August 2024) - 1h55m; Tutorial (May 2023) - 1h45m

Year 2: Cancer Analysis

The previous submodules focused on scaling up in terms of the number of samples. This submodule will focus on scaling up in terms of complexity. Cancer is a disease of the genome; it is multifaceted and heterogeneous, which leads to complex datasets and analysis pipelines. [SC2.3,4; SC1.5]

Time estimation: 2 hours

Learning Objectives

Use joint variant calling and extraction to facilitate variant comparison across samples
Perform variant linkage analyses for phenotypically selected recombinant progeny
Filter, annotate and report lists of variants
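As a hedged sketch of the "filter lists of variants" step, the function below keeps VCF data lines whose QUAL column meets a threshold and whose FILTER column is `PASS` or `.`. It relies only on the fixed VCF columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO); the function name, the default threshold of 30 and the two example records are invented for illustration, and this is not the method used in the lesson below.

```python
def filter_vcf_lines(lines, min_qual=30.0):
    """Yield header lines unchanged and data lines that pass both filters."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep meta-information lines and the column header
            continue
        fields = line.rstrip("\n").split("\t")
        qual, filt = fields[5], fields[6]  # QUAL and FILTER columns
        if qual == ".":
            continue  # missing quality: drop the record conservatively
        if float(qual) >= min_qual and filt in ("PASS", "."):
            yield line


# Hypothetical records (CHROM POS ID REF ALT QUAL FILTER INFO).
vcf = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chrI\t1000\t.\tA\tG\t55.0\tPASS\tDP=20\n",
    "chrI\t2000\t.\tC\tT\t12.5\tPASS\tDP=4\n",
]
for line in filter_vcf_lines(vcf):
    print(line, end="")
```

Lowering `min_qual` to 10 would also keep the second record, a small instance of how parameter choices change the reported results (SC2.6).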

Lesson	Slides	Hands-on	Recordings
Mapping and molecular identification of phenotype-causing mutations		Tutorial	

Year 2: Intro to machine learning

Going beyond conventional statistics, many scientific data analyses benefit from machine learning techniques for modelling datasets. Machine learning is widely used in the biomedical domain. [SC2.4,5; SC1.4]

Time estimation: 3 hours

Learning Objectives

Understand the ML taxonomy and the commonly used machine learning algorithms for analysing -omics data
Understand the differences between categories of ML algorithms and the kinds of problems to which they can be applied
Understand different applications of ML in different -omics studies
Use some basic, widely used R packages for ML
Interpret and visualize the results obtained from ML analyses on omics datasets
Apply ML techniques to analyse your own datasets

Lesson	Slides	Hands-on	Recordings
Introduction to Machine Learning using R (tags: interactive-tools)		Tutorial	Tutorial (February 2021) - 1h30m

Year 2: Introduction to the Galaxy visualisation framework

(This module was cancelled due to insufficiencies in the Galaxy Visualisation Framework.) Galaxy has many options for visualising scientific data. This module would have covered how to use this framework to create and share visualisations. [SC2.2-3; SC1.1,3,6]

Time estimation:

Learning Objectives

Lesson	Slides	Hands-on	Recordings

Year 3: Visualisation of complex multidimensional data

For advanced visualisation, tools such as Circos may be utilized where Galaxy’s basic visualisation framework does not suffice. [SC2.2-3; SC1.5]

Time estimation: 2 hours 30 minutes

Learning Objectives

Create a number of Circos plots using the Galaxy tool
Familiarise yourself with the different track types
Plot an *E. coli* genome in Galaxy, with tracks for the annotations, sequencing data, and variants

Lesson	Slides	Hands-on	Recordings
Visualisation with Circos	Slides (plain text)	Tutorial	Lecture (February 2021) - 6m; Tutorial (February 2021) - 50m
Plotting a Microbial Genome with Circos		Tutorial	

Year 3: Introduction to Visualisation with R and Python

When the available visualisation options do not suffice, custom plots and visualisations can be created using one of several extensive visualisation libraries available in R and Python. This module will cover the basics of using R and Python to create custom plots and visualisations. [SC2.3; SC1.1]

Time estimation: 2 hours

Learning Objectives

Produce scatter plots, boxplots, and time series plots using ggplot.
Set universal plot settings.
Describe what faceting is and apply faceting in ggplot.
Modify the aesthetics of an existing ggplot plot (including axis labels and color).
Build complex and customized plots from data in a data frame.
Use the scientific plotting library matplotlib to explore tabular datasets.
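A minimal sketch of using matplotlib to explore a small tabular dataset: it reads tab-separated text with the standard `csv` module and draws a scatter plot and a grouped boxplot side by side. The column names (`height`, `weight`, `sex`), the values and the output file name `exploration.png` are hypothetical, and the non-interactive `Agg` backend is selected so the script also runs without a display. This is not code taken from the Plotting in Python lesson.

```python
import csv
import io

import matplotlib
matplotlib.use("Agg")  # render to files; no display needed
import matplotlib.pyplot as plt

# Hypothetical tab-separated table with a header row.
TABLE = """height\tweight\tsex
170\t65\tF
182\t80\tM
165\t58\tF
190\t92\tM
"""

rows = list(csv.DictReader(io.StringIO(TABLE), delimiter="\t"))
heights = [float(r["height"]) for r in rows]
weights = [float(r["weight"]) for r in rows]

fig, (ax_scatter, ax_box) = plt.subplots(1, 2, figsize=(8, 3.5))

# Scatter plot: relationship between two numeric columns.
ax_scatter.scatter(heights, weights)
ax_scatter.set_xlabel("height")
ax_scatter.set_ylabel("weight")

# Boxplot: distribution of one numeric column, split by a categorical column.
groups = sorted({r["sex"] for r in rows})
ax_box.boxplot([[float(r["weight"]) for r in rows if r["sex"] == g] for g in groups])
ax_box.set_xticks(range(1, len(groups) + 1))  # boxes sit at positions 1..n
ax_box.set_xticklabels(groups)
ax_box.set_ylabel("weight")

fig.tight_layout()
fig.savefig("exploration.png")
```

Reading with `csv` keeps the sketch dependency-light; in practice a data frame library is a common choice for larger tables.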

Lesson	Slides	Hands-on	Recordings
Data visualisation Olympics - Visualization in R (tags: cyoa, R, rmarkdown-notebook, jupyter-notebook)		Tutorial	
Plotting in Python		Tutorial	