Pathway analysis with the MINERVA Platform
Author(s) | Marek Ostaszewski Matti Hoch Iacopo Cristoferi Myrthe van Baardwijk Saskia Hiltemann Helena Rasche |
Tester(s) | Helena Rasche |
Reviewers | |
Infrastructure | Matti Hoch Mira Kuntz José Manuel Domínguez Sanjay Kumar Srikakulam Björn Grüning |
Overview
Questions:
Which pathways are affected in this COVID-19 study?
How can I visualise the results of a differential expression analysis in the MINERVA Platform?
Objectives:
Perform an analysis using a workflow from WorkflowHub
Visualise and interpret the results with MINERVA
Requirements:
- Introduction to Galaxy Analyses
- Slides: Quality Control
- Hands-on: Quality Control
- Slides: Mapping
- Hands-on: Mapping
Time estimation: 1 hour
Level: Intermediate
Published: Mar 28, 2024
Last modification: Jul 26, 2024
License: Tutorial Content is licensed under Creative Commons Attribution 4.0 International License. The GTN Framework is licensed under MIT.
PURL: https://gxy.io/GTN:T00437
Revision: 3
This tutorial is a partial reproduction of Togami et al. 2022 wherein they evaluated mRNA and miRNA in a selection of COVID-19 patients and healthy controls. While that paper uses a closed source pipeline, we’ll be reproducing the analysis with open source tools in Galaxy, using a workflow on WorkflowHub developed for the BY-COVID project.
Comment: Full data
The original data is available at NCBI under BioProject PRJNA754796.
There are several places you can jump to in this tutorial, using pre-calculated data. We recommend skipping the data download and counting steps and jumping straight to the analysis, as that avoids the slowest and most data-intensive parts of this tutorial. However, the entire process is documented in case you want to reproduce our work.
Study Design
Hands-on: Data upload
Create a new history for this tutorial
To create a new history simply click the new-history icon at the top of the history panel:
Import the factor table from Zenodo
https://zenodo.org/records/10405036/files/factordata.tabular
- Copy the link location
Click Upload Data at the top of the tool panel
- Select Paste/Fetch Data
Paste the link(s) into the text field
Press Start
- Close the window
Check that the datatype is tabular
- Click on the pencil icon for the dataset to edit its attributes
- In the central panel, click the Datatypes tab on the top
- In Assign Datatype, select tabular from the “New type” dropdown
- Tip: you can start typing the datatype into the field to filter the dropdown menu
- Click the Save button
Analysis
We have split this workflow into three parts, based only on how long the first two portions of the workflow take to execute. The rough runtime of the workflow portions, when this was being developed, can be broken down as follows:
Step | Time |
---|---|
Data Download | ~6h |
Processing Counts | ~8h |
Analysis & Visualisation | 15m |
These numbers were generated on UseGalaxy.eu and may not represent the most efficient possible computation, as they are executed on a shared cluster that can, at times, be more or less busy.
As such, we recommend you skip to the analysis step to reach the interesting portion of the tutorial. The Zenodo record contains the outputs of the Data Download and Counts steps, so both of those steps can be skipped.
Hands-on: Choose Your Own Tutorial
This is a "Choose Your Own Tutorial" section, where you can select between multiple paths. Click one of the buttons below to select how you want to follow the tutorial.
Choose whether you want to run the time-consuming Download Data step.
Data Download
We’ll start by downloading our FASTQ files from the GEO Dataset GSE182152
Hands-on: Download the data from GEO (ETA: 6 Hours)
Import the workflow into Galaxy
Hands-on: Importing and launching a GTN workflow
- Click on Workflow on the top menu bar of Galaxy. You will see a list of all your workflows.
- Click on Import at the top-right of the screen
- Paste the following URL into the box labelled “Archived Workflow URL”:
https://training.galaxyproject.org/training-material/topics/transcriptomics/tutorials/minerva-pathways/workflows/Galaxy-Workflow-BY-COVID__Data_Download.ga
- Click the Import workflow button
Run the workflow with the following parameters:
- “Sample Table”: factordata.tabular
Here we have cut the SRR* identifiers from the sample table and downloaded them with fasterq-dump, part of the SRA Toolkit.
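The identifier extraction step can be sketched in Python. This is only an illustration, not the workflow's implementation: it assumes run accessions are cells made of "SRR" followed by digits, and the function name and the two example rows are hypothetical (the real factordata.tabular layout may differ).

```python
import re

# Assumption: a run accession is a cell consisting of "SRR" followed by digits.
SRR_PATTERN = re.compile(r"SRR\d+")


def extract_run_accessions(table_text):
    """Return unique SRR accessions from tabular text, in order of first appearance."""
    seen = []
    for line in table_text.splitlines():
        for cell in line.split("\t"):
            cell = cell.strip()
            if SRR_PATTERN.fullmatch(cell) and cell not in seen:
                seen.append(cell)
    return seen


# Hypothetical excerpt: the column names and group labels are placeholders.
example = "sample\tgroup\nSRR15462516\tgroupA\nSRR16681520\tgroupB\n"
print(extract_run_accessions(example))
```

The resulting list of accessions is what a download tool such as fasterq-dump would then be given, one run at a time.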
Counts
With that done, we can start to analyse the data using HISAT2 and featureCounts. This workflow takes the RNA sequencing data we downloaded previously and trims it with cutadapt. Both the trimmed and untrimmed reads are run through FastQC for visualisation.
The trimmed reads are then handled by HISAT2 for alignment to the reference human genome, and featureCounts is run for quantification.
This workflow produces a set of featureCounts tables, a set of featureLengths (needed for goseq), and a MultiQC report.
Hands-on: Run the Workflow
Hands-on: Importing and Launching a WorkflowHub.eu Workflow
Launch mRNA-Seq BY-COVID Pipeline (v1) (View on WorkflowHub)
WorkflowHub is a workflow management system which allows workflows to be FAIR (Findable, Accessible, Interoperable, and Reusable), citable, have managed metadata profiles, and be openly available for review and analytics.
- Ensure that you are logged in to your Galaxy account.
- Click on the Workflow menu, located in the top bar.
- Click on the Import button, located in the right corner.
- In the section “Import a Workflow from Configured GA4GH Tool Registry Servers (e.g. Dockstore)”, click on Search form.
- In the TRS Server: workflowhub.eu menu, type name:"mRNA-Seq BY-COVID Pipeline"
- Click on the desired workflow, and finally select the latest available version.
After that, the imported workflow will appear in the main workflow menu. To run it, click the Run workflow icon.
- Run the workflow with the following parameters:
This workflow produces a handful of outputs: the featureCounts results, and a MultiQC report. Looking at the report we see generally reasonable quality data.
limma
Hands-on: Download the Counts Files
- Open the Rule Builder
- “Upload data as”: Collection(s)
- “Load tabular data from”: Pasted Table
Paste the following table:
https://zenodo.org/records/10405036/files/gene_lengths.tabular	gene_lengths	Gene Lengths
https://zenodo.org/records/10405036/files/SRR15462516.featureCounts.tabular	SRR15462516	featureCounts
https://zenodo.org/records/10405036/files/SRR15462517.featureCounts.tabular	SRR15462517	featureCounts
https://zenodo.org/records/10405036/files/SRR15462518.featureCounts.tabular	SRR15462518	featureCounts
https://zenodo.org/records/10405036/files/SRR15462519.featureCounts.tabular	SRR15462519	featureCounts
https://zenodo.org/records/10405036/files/SRR15462520.featureCounts.tabular	SRR15462520	featureCounts
https://zenodo.org/records/10405036/files/SRR15462521.featureCounts.tabular	SRR15462521	featureCounts
https://zenodo.org/records/10405036/files/SRR15462522.featureCounts.tabular	SRR15462522	featureCounts
https://zenodo.org/records/10405036/files/SRR15462523.featureCounts.tabular	SRR15462523	featureCounts
https://zenodo.org/records/10405036/files/SRR15462524.featureCounts.tabular	SRR15462524	featureCounts
https://zenodo.org/records/10405036/files/SRR15462525.featureCounts.tabular	SRR15462525	featureCounts
https://zenodo.org/records/10405036/files/SRR15462526.featureCounts.tabular	SRR15462526	featureCounts
https://zenodo.org/records/10405036/files/SRR15462527.featureCounts.tabular	SRR15462527	featureCounts
https://zenodo.org/records/10405036/files/SRR15462528.featureCounts.tabular	SRR15462528	featureCounts
https://zenodo.org/records/10405036/files/SRR15462529.featureCounts.tabular	SRR15462529	featureCounts
https://zenodo.org/records/10405036/files/SRR15462530.featureCounts.tabular	SRR15462530	featureCounts
https://zenodo.org/records/10405036/files/SRR16681520.featureCounts.tabular	SRR16681520	featureCounts
https://zenodo.org/records/10405036/files/SRR16681521.featureCounts.tabular	SRR16681521	featureCounts
https://zenodo.org/records/10405036/files/SRR16681522.featureCounts.tabular	SRR16681522	featureCounts
https://zenodo.org/records/10405036/files/SRR16681523.featureCounts.tabular	SRR16681523	featureCounts
https://zenodo.org/records/10405036/files/SRR16681524.featureCounts.tabular	SRR16681524	featureCounts
https://zenodo.org/records/10405036/files/SRR16681525.featureCounts.tabular	SRR16681525	featureCounts
https://zenodo.org/records/10405036/files/SRR16681526.featureCounts.tabular	SRR16681526	featureCounts
https://zenodo.org/records/10405036/files/SRR16681527.featureCounts.tabular	SRR16681527	featureCounts
https://zenodo.org/records/10405036/files/SRR16681528.featureCounts.tabular	SRR16681528	featureCounts
https://zenodo.org/records/10405036/files/SRR16681529.featureCounts.tabular	SRR16681529	featureCounts
https://zenodo.org/records/10405036/files/SRR16681530.featureCounts.tabular	SRR16681530	featureCounts
https://zenodo.org/records/10405036/files/SRR16681531.featureCounts.tabular	SRR16681531	featureCounts
https://zenodo.org/records/10405036/files/SRR16681532.featureCounts.tabular	SRR16681532	featureCounts
https://zenodo.org/records/10405036/files/SRR16681533.featureCounts.tabular	SRR16681533	featureCounts
https://zenodo.org/records/10405036/files/SRR16681534.featureCounts.tabular	SRR16681534	featureCounts
https://zenodo.org/records/10405036/files/SRR16681535.featureCounts.tabular	SRR16681535	featureCounts
https://zenodo.org/records/10405036/files/SRR16681536.featureCounts.tabular	SRR16681536	featureCounts
https://zenodo.org/records/10405036/files/SRR16681537.featureCounts.tabular	SRR16681537	featureCounts
https://zenodo.org/records/10405036/files/SRR16681538.featureCounts.tabular	SRR16681538	featureCounts
https://zenodo.org/records/10405036/files/SRR16681539.featureCounts.tabular	SRR16681539	featureCounts
https://zenodo.org/records/10405036/files/SRR16681540.featureCounts.tabular	SRR16681540	featureCounts
https://zenodo.org/records/10405036/files/SRR16681541.featureCounts.tabular	SRR16681541	featureCounts
https://zenodo.org/records/10405036/files/SRR16681542.featureCounts.tabular	SRR16681542	featureCounts
https://zenodo.org/records/10405036/files/SRR16681543.featureCounts.tabular	SRR16681543	featureCounts
https://zenodo.org/records/10405036/files/SRR16681544.featureCounts.tabular	SRR16681544	featureCounts
https://zenodo.org/records/10405036/files/SRR16681545.featureCounts.tabular	SRR16681545	featureCounts
https://zenodo.org/records/10405036/files/SRR16681546.featureCounts.tabular	SRR16681546	featureCounts
https://zenodo.org/records/10405036/files/SRR16681547.featureCounts.tabular	SRR16681547	featureCounts
https://zenodo.org/records/10405036/files/SRR16681548.featureCounts.tabular	SRR16681548	featureCounts
https://zenodo.org/records/10405036/files/SRR16681549.featureCounts.tabular	SRR16681549	featureCounts
https://zenodo.org/records/10405036/files/SRR16681550.featureCounts.tabular	SRR16681550	featureCounts
https://zenodo.org/records/10405036/files/SRR16681551.featureCounts.tabular	SRR16681551	featureCounts
https://zenodo.org/records/10405036/files/SRR16681552.featureCounts.tabular	SRR16681552	featureCounts
https://zenodo.org/records/10405036/files/SRR16681553.featureCounts.tabular	SRR16681553	featureCounts
https://zenodo.org/records/10405036/files/SRR16681554.featureCounts.tabular	SRR16681554	featureCounts
https://zenodo.org/records/10405036/files/SRR16681555.featureCounts.tabular	SRR16681555	featureCounts
https://zenodo.org/records/10405036/files/SRR16681556.featureCounts.tabular	SRR16681556	featureCounts
https://zenodo.org/records/10405036/files/SRR16681557.featureCounts.tabular	SRR16681557	featureCounts
https://zenodo.org/records/10405036/files/SRR16681558.featureCounts.tabular	SRR16681558	featureCounts
https://zenodo.org/records/10405036/files/SRR16681559.featureCounts.tabular	SRR16681559	featureCounts
https://zenodo.org/records/10405036/files/SRR16681560.featureCounts.tabular	SRR16681560	featureCounts
https://zenodo.org/records/10405036/files/SRR16681561.featureCounts.tabular	SRR16681561	featureCounts
https://zenodo.org/records/10405036/files/SRR16681562.featureCounts.tabular	SRR16681562	featureCounts
https://zenodo.org/records/10405036/files/SRR16681563.featureCounts.tabular	SRR16681563	featureCounts
https://zenodo.org/records/10405036/files/SRR16681564.featureCounts.tabular	SRR16681564	featureCounts
https://zenodo.org/records/10405036/files/SRR16681565.featureCounts.tabular	SRR16681565	featureCounts
https://zenodo.org/records/10405036/files/SRR16681566.featureCounts.tabular	SRR16681566	featureCounts
- Click Build
- From the Rules menu, select Add / Modify Column Definitions
- Add Definition → Collection Name → Select Column C
- Add Definition → List Identifier(s) → Select Column B
- Add Definition → URL → Select Column A
- Alternatively, press the tool button next to Rules
- Paste the following JSON into the dialog:
{"rules":[],"mapping":[{"type":"collection_name","columns":[2]},{"type":"list_identifiers","columns":[1],"editing":false},{"type":"url","columns":[0]}],"genome":"hg19"}
- Click Apply
- At the bottom of the dialog, set Genome to hg19 (it is probably listed as something like “Human Feb 2009 (GRCh37/hg19) (hg19)”, but we are focused on that last parenthetical portion).
- Click Upload
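The pasted table follows a regular pattern, so it can also be generated rather than typed. The sketch below is a minimal illustration in Python: the base URL and accession ranges are taken from the table in this section, while the function and variable names are hypothetical. It also decodes the rule-builder JSON to show that its zero-based column indices correspond to columns C, B and A.

```python
import json

BASE_URL = "https://zenodo.org/records/10405036/files"


def build_rule_table():
    """Build rows of (URL, list identifier, collection name) for the Rule Builder."""
    rows = [(f"{BASE_URL}/gene_lengths.tabular", "gene_lengths", "Gene Lengths")]
    # The two accession ranges that appear in the pasted table.
    runs = list(range(15462516, 15462531)) + list(range(16681520, 16681567))
    for number in runs:
        run = f"SRR{number}"
        rows.append((f"{BASE_URL}/{run}.featureCounts.tabular", run, "featureCounts"))
    return rows


# The rule-builder mapping uses zero-based column indices (0 = A, 1 = B, 2 = C).
rules = json.loads(
    '{"rules":[],"mapping":[{"type":"collection_name","columns":[2]},'
    '{"type":"list_identifiers","columns":[1],"editing":false},'
    '{"type":"url","columns":[0]}],"genome":"hg19"}'
)
column_letters = {
    entry["type"]: [chr(ord("A") + index) for index in entry["columns"]]
    for entry in rules["mapping"]
}

table = build_rule_table()
print("\n".join("\t".join(row) for row in table[:3]))
print(len(table), "rows")
print(column_letters)
```

Joining each row with a tab character matters here, because the collection name "Gene Lengths" contains a space.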
Now we’re ready to analyse the counts files. Here we’ll take the feature counts dataset collection and merge it into one count matrix through the use of “Column join”. This can then be annotated with the human readable names of the genes. This is all passed to limma for differential expression analysis.
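To make the merge step concrete, here is a minimal Python sketch of joining per-sample count tables into one matrix keyed on gene ID. It is not the implementation of Galaxy's “Column join” tool. It assumes each table has one header line followed by two tab-separated columns (gene ID, count), and it fills missing genes with 0, which is a choice made for this sketch. The gene IDs and counts are invented for illustration.

```python
import csv
import io


def join_counts(tables):
    """Merge {sample name: tabular text} into a header and one row per gene.

    Assumes a header line followed by "gene ID<TAB>count" rows.
    Genes absent from a sample are filled with 0 (a choice made for this sketch).
    """
    samples = sorted(tables)
    counts = {}
    for sample in samples:
        reader = csv.reader(io.StringIO(tables[sample]), delimiter="\t")
        next(reader)  # skip the header line
        for row in reader:
            if not row:
                continue  # ignore blank lines
            gene_id, count = row
            counts.setdefault(gene_id, {})[sample] = int(count)
    header = ["Geneid"] + samples
    rows = [
        [gene_id] + [counts[gene_id].get(sample, 0) for sample in samples]
        for gene_id in sorted(counts)
    ]
    return header, rows


# Invented example data: real featureCounts tables contain many more genes.
example_tables = {
    "SRR15462516": "Geneid\tCount\ngeneA\t10\ngeneB\t0\n",
    "SRR15462517": "Geneid\tCount\ngeneA\t7\ngeneC\t3\n",
}
header, rows = join_counts(example_tables)
print("\t".join(header))
for row in rows:
    print("\t".join(str(value) for value in row))
```

The resulting matrix has one column per sample, which is the shape limma expects for a differential expression analysis.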
With this result in hand, we’re ready to do two further steps: preparing the dataset for goseq, and for analysis in the MINERVA Platform. Goseq is a tool for gene ontology enrichment analysis, and the MINERVA Platform is a tool for visualising pathway analysis.
Hands-on: Analyse the Counts
Run the workflow with the Factor Data from the first Hands-on, and the datasets from the workflow or Zenodo download, depending on your path:
Hands-on: Importing and Launching a WorkflowHub.eu Workflow
Launch mRNA-Seq BY-COVID Pipeline (v1) (View on WorkflowHub)
WorkflowHub is a workflow management system which allows workflows to be FAIR (Findable, Accessible, Interoperable, and Reusable), citable, have managed metadata profiles, and be openly available for review and analytics.
- Ensure that you are logged in to your Galaxy account.
- Click on the Workflow menu, located in the top bar.
- Click on the Import button, located in the right corner.
- In the section “Import a Workflow from Configured GA4GH Tool Registry Servers (e.g. Dockstore)”, click on Search form.
- In the TRS Server: workflowhub.eu menu, type name:"mRNA-Seq BY-COVID Pipeline"
- Click on the desired workflow, and finally select the latest available version.
After that, the imported workflow will appear in the main workflow menu. To run it, click the Run workflow icon.
You should have a few outputs, namely the goseq outputs, and a table ready for visualisation in the MINERVA Platform!
The MINERVA Platform
The dataset prepared for the MINERVA Platform must be correctly formatted as a tabular dataset (tab-separated values), with the dbkey set to hg19 or hg38. If you’ve run the above workflow, this should already be the case. The table should look like the following:
SYMBOL logFC P.Value adj.P.Val
TRIM25 2.07376444684004 1.2610025125617e-18 3.57368112059986e-15
ACSL1 2.90647033200259 2.71976234791064e-16 3.85390324698937e-13
NBEAL2 2.45952426389725 2.71787290816654e-14 2.56748394058132e-11
MIR150 -2.55304226607428 9.55912390273625e-14 6.74866152827879e-11
SLC2A3 2.95861349227708 1.19066011437523e-13 6.74866152827879e-11
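A quick way to check a table against this format is sketched below in Python. It assumes the header must match the four column names shown above exactly and that the three value columns are numeric. Both are assumptions drawn from the example, not documented requirements of the MINERVA plugin, and the function name is hypothetical.

```python
# Assumption: the header shown in the example is the expected one.
EXPECTED_HEADER = ["SYMBOL", "logFC", "P.Value", "adj.P.Val"]


def check_minerva_table(text):
    """Return a list of problems found; an empty list means the table looks fine."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return ["table is empty"]
    problems = []
    if lines[0].split("\t") != EXPECTED_HEADER:
        problems.append("unexpected header: " + lines[0])
    # Row numbers count non-blank lines, with the header as row 1.
    for number, line in enumerate(lines[1:], start=2):
        cells = line.split("\t")
        if len(cells) != len(EXPECTED_HEADER):
            problems.append(f"row {number}: expected 4 tab-separated columns")
            continue
        try:
            [float(value) for value in cells[1:]]
        except ValueError:
            problems.append(f"row {number}: non-numeric value")
    return problems


good = (
    "SYMBOL\tlogFC\tP.Value\tadj.P.Val\n"
    "TRIM25\t2.07376444684004\t1.2610025125617e-18\t3.57368112059986e-15\n"
    "ACSL1\t2.90647033200259\t2.71976234791064e-16\t3.85390324698937e-13\n"
)
# The same content separated by spaces instead of tabs should be rejected.
bad = good.replace("\t", " ")

print(check_minerva_table(good))
print(check_minerva_table(bad))
```

A check like this catches the most common formatting problem, a table that was saved with spaces or commas instead of tabs.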
The tabular dataset, as prepared above, is then used by a dedicated MINERVA plugin (Hoksza et al. 2019) to visualise the data on-the-fly in the COVID-19 Disease Map. To visualise and explore the data, follow these steps:
Hands-on: Visualise in MINERVA
Click to expand the final “MINERVA-Ready Table”
Click on the Visualize icon
Select “display at Minerva (SARS-CoV-2 Minerva Map)”
The MINERVA visualisation is only available for correctly formatted files with the correct genome (i.e. human, hg19). If you don’t see MINERVA listed, first check that your dataset:
Is recognised as a tabular dataset:
- Click on the pencil icon for the dataset to edit its attributes
- In the central panel, click the Datatypes tab on the top
- In Assign Datatype, select tabular from the “New type” dropdown
- Tip: you can start typing the datatype into the field to filter the dropdown menu
- Click the Save button
Has the correct genome build:
- Click the desired dataset’s name to expand it.
- Click on the “?” next to the database indicator
- In the central panel, change the Database/Build field
- Select your desired database key from the dropdown list: hg19
- Click the Save button
It should be specifically hg19, not a patch like hg19Patch5.
If that still doesn’t work, please check that the Galaxy server you are using is updated to 24.0 or later.
Analysis in the MINERVA Platform
Welcome to the MINERVA COVID-19 Disease Map! Its interface is similar to Galaxy's: there is an interaction menu on the left, the main area is where you’ll do your investigation, and your datasets are on the right. In this case, the differentially expressed genes analysed above were loaded automatically from Galaxy when you clicked “Display at MINERVA”.
After the loading time, marked as “Reading Map Elements”, the dataset will be visible in the right panel of the COVID-19 Disease Map, with the four corresponding columns specified earlier (see image below). The MINERVA-Galaxy plugin allows you to:
- Filter the data table by fold-change (FC) threshold or by p-value (default: adjusted p-value, threshold set to 0.05)
- Search for specific gene symbols to display (“Search” box)
- Select specific differential expression values to display in the map (checkboxes in the data tab)
- Select all entries in the data table for visualisation (Select All)
- Reset the visualisation
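The filtering described in this list can be sketched in Python on the example rows shown earlier. This is only an approximation of the plugin's behaviour: it assumes the fold-change threshold is applied to the absolute value of logFC and that the p-value cut-off is inclusive, neither of which is confirmed by this tutorial. The function name is hypothetical.

```python
import csv
import io

# The example rows from the MINERVA-ready table in this tutorial.
TABLE = (
    "SYMBOL\tlogFC\tP.Value\tadj.P.Val\n"
    "TRIM25\t2.07376444684004\t1.2610025125617e-18\t3.57368112059986e-15\n"
    "ACSL1\t2.90647033200259\t2.71976234791064e-16\t3.85390324698937e-13\n"
    "NBEAL2\t2.45952426389725\t2.71787290816654e-14\t2.56748394058132e-11\n"
    "MIR150\t-2.55304226607428\t9.55912390273625e-14\t6.74866152827879e-11\n"
    "SLC2A3\t2.95861349227708\t1.19066011437523e-13\t6.74866152827879e-11\n"
)


def filter_genes(text, min_abs_logfc=0.0, max_adj_p=0.05):
    """Return gene symbols passing both thresholds, in table order.

    Assumptions: the fold-change threshold applies to |logFC| and the
    adjusted p-value cut-off is inclusive.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [
        row["SYMBOL"]
        for row in reader
        if abs(float(row["logFC"])) >= min_abs_logfc
        and float(row["adj.P.Val"]) <= max_adj_p
    ]


print(filter_genes(TABLE))  # all five example genes pass the default p-value cut-off
print(filter_genes(TABLE, min_abs_logfc=2.5))  # ['ACSL1', 'MIR150', 'SLC2A3']
```

Using the absolute value keeps strongly downregulated genes such as MIR150 in the selection alongside upregulated ones.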
The general process of data exploration looks like:
- In the main map, find pathways with matching entries indicated by blue pins.
- After selecting what you want to see, browse the COVID-19 Disease Map to explore the pathways with the corresponding expression pattern.
- Select a pathway of your choice and in the left panel click the “Associated submap” button
- Explore the expression patterns in the diagram that will be displayed.
Hands-on: Explore TLR pathways
- Use the Search box above the table on the right to search for TLR.
Select all four TLR genes.
- TLR3
- TLR4
- TLR7
- TLR8
In the main map, find PAMP Signalling and click on it. (Note: don’t click the blue pin, click the pathway name)
- In the left panel, click the Associated submap: PAMP Signalling button.
Explore the detailed diagram to examine the expression pattern.
Question: What is the expression pattern of TLR3, TLR7, and TLR8 in the PAMP Signalling pathway?
TLR3 and TLR7 are downregulated (cool/blue colour), TLR8 is upregulated (warm/red colour).
Without closing the PAMP Signalling submap, click “Select all” to visualise the entire data table
Question: What is the expression pattern downstream of TLR7 and TLR8, namely how are MYD88 and IRAK4 regulated?
MYD88 and IRAK4 are strongly and weakly upregulated, respectively, despite TLR7 downregulation.
For further analysis in the MINERVA Platform, a full user guide is available.