Introduction to Data Driven Life Sciences
PURL: https://gxy.io/GTN:P00017
Comment: What is a Learning Pathway?
We recommend you follow the tutorials in the order presented on this page. They have been selected to fit together and build up your knowledge step by step. If a lesson has both slides and a tutorial, we recommend you start with the slides, then proceed with the tutorial.
This learning path starts with the history of biology and takes you on a journey through fundamental data analysis techniques and their applications.
Module 1: History
Knowing history is essential for understanding how we arrived at the current state of affairs in our field.
Time estimation: 1 hour
Learning Objectives
- Have a basic understanding of the history of biology from Darwin to today.
Lesson | Slides | Hands-on | Recordings |
---|---|---|---|
Module 2: Data Processing Tooling
Before jumping into biology, we need to review the basic data processing machinery.
Time estimation: 13 hours
Learning Objectives
- Explain how the shell relates to the keyboard, the screen, the operating system, and users' programs.
- Explain when and why command-line interfaces should be used instead of graphical interfaces.
- Explain the similarities and differences between a file and a directory.
- Translate an absolute path into a relative path and vice versa.
- Construct absolute and relative paths that identify specific files and directories.
- Use options and arguments to change the behaviour of a shell command.
- Demonstrate the use of tab completion and explain its advantages.
- Create a directory hierarchy that matches a given diagram.
- Create files in that hierarchy using an editor or by copying and renaming existing files.
- Delete, copy and move specified files and/or directories.
- Redirect a command's output to a file.
- Process a file instead of keyboard input using redirection.
- Construct command pipelines with two or more stages.
- Explain what usually happens if a program or pipeline isn't given any input to process.
- Explain Unix's 'small pieces, loosely joined' philosophy.
- Write a loop that applies one or more commands separately to each file in a set of files.
- Trace the values taken on by a loop variable during execution of the loop.
- Explain the difference between a variable's name and its value.
- Explain why spaces and some punctuation characters shouldn't be used in file names.
- Demonstrate how to see what commands have recently been executed.
- Re-run recently executed commands without retyping them.
- Use `grep` to select lines from text files that match simple patterns.
- Use `find` to find files and directories whose names match simple patterns.
- Use the output of one command as the command-line argument(s) to another command.
- Explain what is meant by 'text' and 'binary' files, and why many common tools don't handle the latter well.
- Learn the fundamentals of programming in Python.
- Have a basic understanding of the history of sequencing.
- Understand Python basics.
- Understand manipulation of FASTQ data in Python.
- Understand quality metrics.
- Understand lists and dictionaries.
- Learn about dynamic programming.
- Learn how to translate DNA in Python.
- Understand manipulation of files in Python.
- Understand data manipulation in Pandas.
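To give a flavour of what the Python objectives above lead to, here is a minimal sketch that reads FASTQ records, computes a mean quality score, and translates DNA. It is not taken from the tutorials themselves: the sample reads, the function names (`parse_fastq`, `mean_phred`, `translate`), the reduced codon table, and the Phred+33 quality encoding are all assumptions made for illustration.

```python
from io import StringIO

# Hypothetical two-read FASTQ sample, made up for illustration.
FASTQ_TEXT = """@read1
ATGGCCTAA
+
IIIIIIIII
@read2
ATGAAATGA
+
!!!!!!!!!
"""

# Deliberately tiny codon table: only the codons used in the sample reads.
CODON_TABLE = {"ATG": "M", "GCC": "A", "AAA": "K", "TAA": "*", "TGA": "*"}


def parse_fastq(handle):
    """Yield (name, sequence, quality) tuples from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().strip()
        if not header:  # end of input (or a blank line) stops parsing
            break
        seq = handle.readline().strip()
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().strip()
        yield header[1:], seq, qual  # drop the leading '@'


def mean_phred(qual, offset=33):
    """Mean Phred score of a quality string, assuming Phred+33 encoding."""
    return sum(ord(ch) - offset for ch in qual) / len(qual)


def translate(seq):
    """Translate DNA codon by codon; codons missing from the table become 'X'."""
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


records = list(parse_fastq(StringIO(FASTQ_TEXT)))
for name, seq, qual in records:
    print(name, translate(seq), mean_phred(qual))
# prints:
# read1 MA* 40.0
# read2 MK* 0.0
```

A real analysis would use a complete codon table and a tested FASTQ parser; the point here is only how lists, dictionaries, file handles, and quality metrics fit together.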
Lesson | Slides | Hands-on | Recordings |
---|---|---|---|
CLI basics | | | |
Advanced CLI in Galaxy | | | |
Introduction to Python | | | |
Editorial Board
This material is reviewed by our Editorial Board: