Gallantries Grant - Intellectual Output 3 - Data stewardship, federation, standardisation, and collaboration
PURL: https://gxy.io/GTN:P00014
Comment: What is a Learning Pathway?
We recommend you follow the tutorials in the order presented on this page. They have been selected to fit together and build up your knowledge step by step. If a lesson has both slides and a tutorial, we recommend you start with the slides, then proceed with the tutorial.
This Learning Pathway collects the results of Intellectual Output 3 of the Gallantries Project.
Success Criteria:
- SC3.1) Data stewardship. This competency will provide learners with the necessary skills to evaluate data owned by their organisation, identify key metadata and content requirements, and establish controls and assurance metrics to ensure new data produced by their organisation is of sufficient quality.
- SC3.2) Data federation. While hosting data internally is good, sharing data with teams across the Union and around the world is even better. Students need to understand how to achieve data federation and interchange with other organisations.
- SC3.3) Data standardisation. Standardisation is a key component of both stewardship and federation. It allows data to be reused internally by common pipelines and, more importantly, to be submitted to external databases. This is a key requirement of many data-related projects.
- SC3.4) Data collaboration. Many projects have now grown beyond what one person can handle alone. Students need to learn how to work together on large datasets.
- SC3.5) FAIR Data. For high-quality, reproducible science, datasets should be FAIR: findable, accessible, interoperable, and reusable. This competency helps learners ensure their data is FAIR.
Year 1: Introduction to genomics and genome annotation
This will give students a solid basic knowledge of the application domain of this IO and their first taste of data management. [SC3.1, SC3.3, SC3.5]
Time estimation:
Learning Objectives
Lesson | Slides | Hands-on | Recordings |
---|---|---|---|
Introduction to Genome Annotation | | | |
Year 1: Prokaryotic annotation
This module will cover the background relevant to annotating prokaryotic genomes in Galaxy (one of the two main classes of genomes), and collaborative curation with Apollo, as well as further exploration of annotation from code. [SC1.5, SC3.1-4]
Time estimation: 4 hours
Learning Objectives
- Load a genome into Galaxy
- Annotate genome with Prokka
- View annotations in JBrowse
- Learn how to load JBrowse data into Apollo
- Learn how to manually refine genome annotations within Apollo
- Export refined genome annotations
Lesson | Slides | Hands-on | Recordings |
---|---|---|---|
Genome annotation with Prokka | | | |
Refining Genome Annotations with Apollo (prokaryotes) | | | |
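This module also touches on exploring annotation from code. Prokka writes its annotation as GFF3, and JBrowse and Apollo both load that format. A natural first step is therefore to count how many features of each type a run produced. The sketch below assumes a Prokka-style GFF3 file in which the feature lines are followed by an embedded `##FASTA` section. The function name `count_feature_types` and the toy records are hypothetical and not taken from the tutorial.

```python
from collections import Counter


def count_feature_types(gff_text: str) -> Counter:
    """Count features per type (GFF3 column 3).

    Stops at the ``##FASTA`` directive, after which a GFF3 file
    holds raw sequence rather than feature lines.
    """
    counts = Counter()
    for line in gff_text.splitlines():
        if line.startswith("##FASTA"):
            break  # the rest of the file is sequence data
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        columns = line.split("\t")
        if len(columns) != 9:
            continue  # not a well-formed GFF3 feature line
        counts[columns[2]] += 1
    return counts


# Toy, hand-written records (not real Prokka output)
example = "\n".join([
    "##gff-version 3",
    "contig1\tProdigal\tgene\t1\t300\t.\t+\t.\tID=g1",
    "contig1\tProdigal\tCDS\t1\t300\t.\t+\t0\tID=c1;Parent=g1",
    "contig1\tAragorn\ttRNA\t400\t475\t.\t-\t.\tID=t1",
    "##FASTA",
    ">contig1",
    "ATGAAACCC",
])

print(count_feature_types(example))
```

On a real Prokka output, comparing these counts against the run's own summary is a quick sanity check before moving on to curation in Apollo.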
Year 2: FAIR Data
This submodule will focus specifically on how learners can make their data more FAIR (findable, accessible, interoperable, and reusable). [SC3.5]
Time estimation: 3 hours 35 minutes
Learning Objectives
- Learn the FAIR principles
- Recognise the relationship between FAIR and Open data
- Learn about metadata and findability
- Learn how to support system and content curation
- Learn best practices in data management
- Learn how to introduce computational reproducibility in your research
- Locate bioimage data repositories
- Compare repositories to find which are suitable for your data
- Find out what the requirements are for submitting
- Construct an RO-Crate by hand using JSON
- Describe each part of the Research Object
- Learn basic JSON-LD to create FAIR metadata
- Connect different parts of the Research Object using identifiers
- Understand, view and create Galaxy Workflow Run Crates
- Create a custom, annotated RO-Crate
- Use ORCIDs and other linked data to annotate datasets contained within the crate
- Generate a workflow test using Planemo
- Understand how testing can be automated with GitHub Actions
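Several of the objectives above centre on a single file, `ro-crate-metadata.json`. It is a JSON-LD document whose `@graph` lists:

- a metadata descriptor,
- a root `Dataset`,
- the entities the dataset contains,

all linked to each other by `@id`. The sketch below builds a minimal crate following the RO-Crate 1.1 conventions as we understand them. The file name `annotation.gff3`, the licence and the dataset name are placeholders. The ORCID is ORCID's public example iD for the fictional researcher Josiah Carberry, used here instead of a real person. The helper `dangling_references` is hypothetical.

```python
import json

CONTEXT = "https://w3id.org/ro/crate/1.1/context"

# Placeholder author: ORCID's published example iD for a fictional researcher
AUTHOR_ID = "https://orcid.org/0000-0002-1825-0097"

crate = {
    "@context": CONTEXT,
    "@graph": [
        {
            # Metadata descriptor: declares that this file describes the root dataset "./"
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {
            # Root data entity: the crate as a whole
            "@id": "./",
            "@type": "Dataset",
            "name": "Example genome annotation",
            "description": "Hypothetical crate illustrating RO-Crate structure.",
            "datePublished": "2024-01-01",
            "license": {"@id": "https://spdx.org/licenses/CC-BY-4.0"},
            "author": {"@id": AUTHOR_ID},
            "hasPart": [{"@id": "annotation.gff3"}],
        },
        {
            # Data entity: a file shipped inside the crate (hypothetical name)
            "@id": "annotation.gff3",
            "@type": "File",
            "name": "Curated gene models",
            "author": {"@id": AUTHOR_ID},
        },
        {
            # Contextual entity: the ORCID URL makes the person globally identifiable
            "@id": AUTHOR_ID,
            "@type": "Person",
            "name": "Josiah Carberry",
        },
    ],
}


def dangling_references(crate: dict) -> set:
    """Return local @id references that no entity in @graph defines.

    Absolute URLs (licence, specification) are treated as external and skipped.
    """
    defined = {entity["@id"] for entity in crate["@graph"]}
    referenced = set()
    for entity in crate["@graph"]:
        for value in entity.values():
            items = value if isinstance(value, list) else [value]
            for item in items:
                if isinstance(item, dict) and "@id" in item:
                    referenced.add(item["@id"])
    return {ref for ref in referenced - defined if not ref.startswith("http")}


dangling = dangling_references(crate)
text = json.dumps(crate, indent=2)  # content for ro-crate-metadata.json
print(text)
```

Writing `text` to `ro-crate-metadata.json` next to `annotation.gff3` would give a minimal crate. The reference check is only a convenience and does not replace a proper RO-Crate validator.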
Year 2: Automatic Annotation
Building on the modules developed in the previous years, this module automates annotation further. It gives students the tools required to scale genome annotation regardless of the genome size of their organism. [SC1.1, SC1.6, SC2.1, SC3.1, SC3.3]
Time estimation: 8 hours
Learning Objectives
- Load a genome into Galaxy
- Annotate genome with Funannotate
- Perform functional annotation using EggNOG-mapper and InterProScan
- Evaluate annotation quality with BUSCO
- View annotations in JBrowse
Lesson | Slides | Hands-on | Recordings |
---|---|---|---|
Genome annotation with Funannotate | | | |
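BUSCO reports completeness as a one-line summary such as `C:95.0%[S:93.5%,D:1.5%],F:2.0%,M:3.0%,n:255`. In this line:

- C is complete, split into single-copy (S) and duplicated (D),
- F is fragmented,
- M is missing,
- n is the number of BUSCO groups searched.

A small parser turns that line into numbers, for example to compare two annotation runs. The sketch below assumes this summary layout; newer BUSCO versions may append extra fields, which the regular expression simply ignores. The function name `parse_busco_summary` and the numbers are invented for illustration.

```python
import re

# Pattern for a BUSCO one-line summary, e.g. "C:95.0%[S:93.5%,D:1.5%],F:2.0%,M:3.0%,n:255"
BUSCO_LINE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(line: str) -> dict:
    """Return C, S, D, F and M as floats (percentages) and n as an int."""
    match = BUSCO_LINE.search(line)
    if match is None:
        raise ValueError(f"not a BUSCO summary line: {line!r}")
    values = {key: float(val) for key, val in match.groupdict().items() if key != "n"}
    values["n"] = int(match.group("n"))
    return values


# Invented numbers, not from a real run.
# Complete BUSCOs are single-copy plus duplicated (C = S + D), and
# complete, fragmented and missing together cover roughly 100%.
summary = parse_busco_summary("C:95.0%[S:93.5%,D:1.5%],F:2.0%,M:3.0%,n:255")
print(summary)
```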
Year 3: Eukaryotic annotation
This module will cover the background relevant to annotating eukaryotic genomes in Galaxy (the second of the two main genome classes), and collaborative curation with Apollo. Additionally, students will learn about automating this annotation process using Galaxy and code. [SC1.5, SC2.1, SC3.1-4]
Time estimation: 6 hours
Learning Objectives
- Use Red and RepeatMasker to soft-mask a newly assembled genome
- Load data (genome assembly, annotation and mapped RNA-Seq reads) into Galaxy
- Perform a transcriptome assembly with StringTie
- Annotate lncRNAs with FEELnc
- Classify lncRNAs according to their location
- Update genome annotation with lncRNAs
- Load a genome into Galaxy
- View annotations in JBrowse
- Learn how to load JBrowse data into Apollo
- Learn how to manually refine genome annotations within Apollo
- Export refined genome annotations
Lesson | Slides | Hands-on | Recordings |
---|---|---|---|
Masking repeats with RepeatMasker | | | |
Long non-coding RNAs (lncRNAs) annotation with FEELnc | | | |
Refining Genome Annotations with Apollo (eukaryotes) | | | |
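Soft-masking marks repeats by writing those bases in lowercase rather than replacing them, so downstream tools can still choose whether to use them. One quick check on a masking run with Red or RepeatMasker is the lowercase fraction per sequence. The hypothetical helpers below compute it from a FASTA string. They assume that `N`/`n` gap characters should be excluded from the count, which you may want to change.

```python
def parse_fasta(text: str) -> dict:
    """Map each FASTA header (first word) to its concatenated sequence."""
    sequences, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                sequences[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        sequences[name] = "".join(chunks)
    return sequences


def soft_masked_fraction(sequence: str) -> float:
    """Fraction of non-gap bases written in lowercase (i.e. soft-masked)."""
    bases = [b for b in sequence if b not in "Nn"]
    if not bases:
        return 0.0
    masked = sum(1 for b in bases if b.islower())
    return masked / len(bases)


# Toy sequence, not a real assembly
fasta = """>chr1 toy
ACGTacgtacgtACGT
NNNNacgtACGT
"""

for name, seq in parse_fasta(fasta).items():
    print(name, round(soft_masked_fraction(seq), 3))
```

For the toy input this prints `chr1 0.5`: 12 of the 24 non-gap bases are lowercase.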
Year 3: Official Gene Set
One of the key tasks in annotation is producing an official gene set (OGS) and ensuring the integrity and validity of all curated annotations. This will also further familiarise students with public databases and the process of submitting datasets. [SC3.1, SC3.5]
Time estimation: 30 minutes
Learning Objectives
- Validate your genes and create an official gene set from them.
Lesson | Slides | Hands-on | Recordings |
---|---|---|---|
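Which checks are applied when building an official gene set depends on the tools and target databases involved. Purely as an illustration, below are a few basic sanity checks one might run on a curated coding sequence before accepting it into an OGS:

- its length is a multiple of three,
- it starts with ATG,
- it ends with a stop codon,
- it has no internal stops.

The sketch assumes the standard genetic code and ignores alternative start codons. `validate_cds` is a hypothetical name, not part of any tool used in the tutorial.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def validate_cds(cds: str) -> list:
    """Return a list of problems found in a coding sequence (empty if none)."""
    seq = cds.upper()
    problems = []
    if len(seq) % 3 != 0:
        problems.append("length is not a multiple of 3")
    if not seq.startswith("ATG"):
        problems.append("does not start with ATG")
    # Split into codons, ignoring any trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("does not end with a stop codon")
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("contains an internal stop codon")
    return problems


# Toy sequences
print(validate_cds("ATGAAATTTTAA"))  # → []
print(validate_cds("ATGTAAGGGTGA"))  # → ['contains an internal stop codon']
```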
Editorial Board
This material is reviewed by our Editorial Board:
Anthony Bretaudeau, Helena Rasche, Saskia Hiltemann
Funding
These individuals or organisations provided funding support for the development of this resource.