Introduction to Galaxy as an RDM platform

Author(s) orcid logoSaskia Hiltemann avatar Saskia Hiltemann
Editor(s) orcid logoDaniela Schneider avatar Daniela Schneider
Tester(s) orcid logoDaniela Schneider avatar Daniela Schneider
Reviewers Saskia Hiltemann avatar
Overview
Creative Commons License: CC-BY Questions:
  • Which RDM features does Galaxy offer?

Objectives:
  • Familiarize yourself with the basics of Galaxy

  • Learn how to import data

  • Learn how to process and analyze data

  • Learn how to create workflows and scale up analysis

  • Learn how to share your work

  • Learn how to reuse workflows shared with you

Time estimation: 3 hours
Level: Introductory Introductory
Supporting Materials:
Published: Mar 19, 2026
Last modification: Mar 19, 2026
License: Tutorial Content is licensed under Creative Commons Attribution 4.0 International License. The GTN Framework is licensed under MIT
purl PURL: https://gxy.io/GTN:T00575
version Revision: 1

This tutorial aims to familiarize you with the Galaxy user interface, with a special focus on highlighting Galaxy’s many RDM (Research Data Management) features.

Galaxy has over 10,000 available tools in its Tool Shed, covering a wide variety of scientific domains, ranging from life sciences to astronomy and digital humanities, and covering techniques from simple text manipulation to advanced machine learning and other complex algorithms.

To keep this tutorial accessible for people with different backgrounds, we perform a toy analysis on a tabular dataset, namely a table of all athletes competing in the Olympics. The question we ask ourselves is “What is the age distribution of Olympic athletes?”. In addition, we want to make sure our analysis is reproducible, so that it can be easily repeated on different datasets and shared with others.

Agenda

In this tutorial, we will cover:

  1. Overview
    1. The Research Data Life Cycle
    2. Galaxy as part of the RDM Life Cycle
    3. Watch
    4. Scope
  2. The Galaxy Web Interface
    1. Create an account on a Galaxy instance/server
    2. What does Galaxy look like?
  3. Collect: Data import
    1. The Galaxy History
    2. Upload a dataset
    3. Dataset attributes (metadata)
  4. Process: Data preparation and QC
    1. Use a tool
    2. Tool provenance
    3. Visualise a dataset
    4. Re-run a tool
    5. Troubleshooting errors
    6. Keeping your history clean
    7. Optional: Use an Interactive Tool
    8. Scaling up
  5. Analyse: Calculate results
    1. Plan our approach
    2. Get summary statistics for our age column
    3. Create a histogram
    4. Extract workflow from our history
    5. Run workflow on all Olympics
  6. Preserve: Export data, history, and workflow
    1. Downloading your history
    2. Exporting your history to a repository
    3. Exporting tool citations
    4. Exporting your workflows
  7. Share: Share or publish data and workflow
  8. Reuse: Find and run workflows shared by others
    1. Where to find Galaxy Workflows
    2. Showcase 1: WorkflowHub
    3. Showcase 2: IWC
  9. Conclusion

Overview

The Research Data Life Cycle

The research life cycle refers to the series of stages through which a research project or study progresses from inception to completion. Although the specifics of the research process vary across disciplines, they share several key phases that help ensure systematic, rigorous research and reliable results. It ranges from planning and designing your study, to collecting, processing, and analysing your data, evaluating results, and finally preserving and sharing your data and findings for reuse by others. As this is an iterative process, it is often referred to as the Research data life cycle.

RDM life cycle.

Good RDM practices are critical to scientific research, to illustrate this in a fun way, have a look at the RDM Scary Tales game (based on the game Black Stories)

Galaxy as part of the RDM Life Cycle

Galaxy supports you in your research throughout the different stages of the life cycle, covering the steps from data collection to data reuse.

The RDM lifecycle with Galaxy features listed for each stage.

For more information, see also the RDMKit Galaxy page.

Watch

Below is a 5-minute video introducing Galaxy as a cross-domain RDM platform.

Scope

In this tutorial, we will take you through all the stages of the Research data life cycle and provide a hands-on introduction to the Galaxy platform at each stage.

The Galaxy Web Interface

Before we go into the stages of the RDM life cycle, let’s start with the basics and log into Galaxy and explore the graphical user interface.

Create an account on a Galaxy instance/server

If you already have an account, skip to the next section!

In Galaxy, server and instance are often used interchangeably. These terms basically mean that different regions have different Galaxy servers/instances, with slightly different tool installations and appearances. If you don’t have a specific server/instance in mind, we recommend registering at one of the main public servers/instances, detailed below.

  1. To create an account at any public Galaxy instance, choose your server from the available list of Galaxy Platforms.

    There are several UseGalaxy servers:

  2. Click on “Login or Register” in the masthead on the server.

    Login or Register on the top panel

  3. On the login page, find the Register here link and click on it.

  4. Fill in the the registration form, then click on Create.

    Your account should now get created, but will remain inactive until you verify the email address you provided in the registration form.

    Banner warning about account with unverified email address

  5. Check for a Confirmation Email in the email you used for account creation.

    Missing? Check your Trash and Spam folders.

  6. Click on the Email confirmation link to fully activate your account.

    galaxy-info Delivery of the confimation email is blocked by your email provider or you mistyped the email address in the registration form?

    Please do not register again, but follow the instructions to change the email address registered with your account! The confirmation email will be resent to your new address once you have changed it.

    Trouble logging in later? Account email addresses and public names are caSe-sensiTive. Check your activation email for formats.

Depending on your Galaxy server, you may also be able to log in with your institutional or social account.

In the Galaxy login screen, you may find the option to log in with an institutional or other external account. Which options are offered depend on which Galaxy you are using.

example of SSO options to log into Galaxy

example of SSO options to log into Galaxy

What does Galaxy look like?

Hands On: Log in to Galaxy
  1. Open your favourite browser (Chrome, Safari, Edge or Firefox as your browser, not Internet Explorer!)
  2. Browse to your Galaxy instance
  3. Log in or register

    1. To create an account at any public Galaxy instance, choose your server from the available list of Galaxy Platforms.

      There are several UseGalaxy servers:

    2. Click on “Login or Register” in the masthead on the server.

      Login or Register on the top panel

    3. On the login page, find the Register here link and click on it.

    4. Fill in the the registration form, then click on Create.

      Your account should now get created, but will remain inactive until you verify the email address you provided in the registration form.

      Banner warning about account with unverified email address

    5. Check for a Confirmation Email in the email you used for account creation.

      Missing? Check your Trash and Spam folders.

    6. Click on the Email confirmation link to fully activate your account.

      galaxy-info Delivery of the confimation email is blocked by your email provider or you mistyped the email address in the registration form?

      Please do not register again, but follow the instructions to change the email address registered with your account! The confirmation email will be resent to your new address once you have changed it.

      Trouble logging in later? Account email addresses and public names are caSe-sensiTive. Check your activation email for formats.

Screenshot of Galaxy Australia with the register or login button highlighted.

Comment: Different Galaxy servers

This is an image of Galaxy Australia, located at usegalaxy.org.au

The particular Galaxy server that you are using may look slightly different and have a different web address.

You can also find more possible Galaxy servers at the top of this tutorial in Available on these Galaxies

The Galaxy homepage is divided into four sections (panels):

  • The Activity Bar on the left: This is where you will navigate to the resources in Galaxy (Tools, Workflows, Histories, etc.)
  • Currently active “Activity Panel” on the left: By default, the tool Tools activity will be active and its panel will be expanded
  • Viewing panel in the middle: The main area for context for your analysis
  • History of analysis and files on the right: Shows your “current” history; i.e.: Where any new files for your analysis will be stored

Screenshot of the Galaxy interface with aforementioned structure.

Now that you are logged in to Galaxy, let’s start!

Collect: Data import

The RDM lifecycle with the collect stage highlighted.

The collect stage in Galaxy usually consists of importing data into what we call your analysis history, this is your Galaxy working environment. Data can be uploaded from from your own machine, from a URL, or imported directly from various general-purpose or domain-specific databases that have been integrated into Galaxy. Before we start our analysis, let’s familiarize ourselves with the Galaxy history system.

The Galaxy History

Your “History” is in the panel at the right. This is where all the files you import or create will be shown. It is also a record of the actions you have taken. Galaxy tracks the provenance of all datasets: which tools, versions, and parameter settings were used to create them. Everything you need to write the methods section of your journal publication. Before we begin, let’s name our history. It is recommended to create a new history for each analysis that you perform, and giving your histories good names will help keep your analyses organized.

Name your current history

Hands On: Name history
  1. Go to the History panel (on the right)
  2. Click galaxy-pencil (Edit) next to the history name (which by default is “Unnamed history”)

    Screenshot of the galaxy interface with the history name being edited, it currently reads "Unnamed history", the default value. An input box is below it.

  3. Type in a new name, for example, “Olympics Data Analysis”
  4. Click Save
Comment: Renaming not an option?

If renaming does not work, it is possible you aren’t logged in, so try logging in to Galaxy first. Anonymous users are only permitted to have one history, and they cannot rename it.

Upload a dataset

Comment: Galaxy Data Import Options

There are various ways to get data into Galaxy

  • Uploading from your computer
  • Import from URL
  • Import directly from data repositories, e.g.
    • SRA/NCBI/EBI/Uniprot (Biological Sequence Data)
    • OMERO (Image database)
    • Copernicus (Climate Data)
    • CERN Open Data (Particle Physics)
    • many more (See “Get Data” section of the Tool panel in Galaxy)
  • Bring-your-own-data (e.g. Dropbox, Google Drive, OneData, eLabFTW)

    Here, we are going to briefly explain how you can Bring-Your-Own-Data to Galaxy or export your dataset, results, or history to 3rd party repositories. In order to add a new repository to your account follow these steps:

    • Click on your Username on top right part of the website and then click on Preferences.
    • From the middle panel, click on the Manage Your Repositories (previously called Manage your remote file sources).
    • Click on the + Create button on top of the page. Here, you get multiple options to connect various repositories to your account.

    For all of the possible repositories, you should fill the following fields:

    • In the Name section, give a name to your repository. This name will be used to choose the repository on Galaxy for importing or exporting datasets.
    • Optionally, you can provide a Description for this repository. This is a note for yourself.
    Hands-on: Choose Your Own Tutorial

    This is a 'Choose Your Own Tutorial' (CYOT) section (also known as 'Choose Your Own Analysis' (CYOA)), where you can select between multiple paths. Click one of the buttons below to select how you want to follow the tutorial

    Select the repository you like to add to your Galaxy account.

    If you have an Onedata account, you can use this repository to import and/or export your data directly from and to Onedata. The minimal supported Onezone version is 21.02.4. More information on Onedata can be found on Onedata’s website.

    There are extensive tutorials for setting up and utilizing of OneData on Galaxy Training Network (GTN). At the moment, we have the following tutorials for Onedata on GTN:

    1. Getting started with Onedata distributed storage
    2. Importing (uploading) data from Onedata
    3. Exporting to Onedata remote
    4. Setting up a dev Onedata instance
    5. Configuring the Onedata connectors (remotes, Object Store, BYOS, BYOD)

    In short, you can connect your Galaxy account to an Onedata repository as follows:

    • In the Onezone domain field, please fill in the address to your Onezone domain. It could be something like “datahub.egi.eu”.
    • Using the Writable? option you can decide whether to grant access to Galaxy to export (write) to your Onedata or not.
    • You should provide an Access Token to Galaxy so it can read (import) and write (export) data to your OneData. Read more on access tokens here. You can limit the access to read-only data access, unless you wish to export data to your repository (write permissions are needed then).
    • In case you want to disable validation of SSL certificates, you can use Disable tls certificate validation? option. However, we strongly recommend you to not use this option unless you know what your are doing.
    • Click on Create.

    To connect an AWS private bucket to your Galaxy account, you need to submit the following information on the form:

    • First, read the Manage access keys for IAM (Identity and Access Management) users documentation of AWS. Also, you should be familiar with Buckets (Buckets overview).
    • Please fill in the Access Key ID (something like AKIAIOSFODNN7EXAMPLE) and Secret Access Key (similar to wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY) in the corresponding fields on the Galaxy interface.
    • Please enter the URL to your Bucket (for example, https://amzn-s3-demo-bucket.s3.us-west-2.amazonaws.com) in the Bucket section.
    • Click on Create.

    To connect anonymously to an AWS public bucket using your Galaxy account, you need to enter the Bucket address in the Bucket section. For more information about AWS Bucket, please read AWS documentaion. Click on Create.

    To setup access to your Azure Blob Storage within the Galaxy, follow the steps:

    • Provide the name of your Azure Blob Storage account in the Container Name field. More information about container’s name could be found on the Microsoft documentation here.
    • Fill the Storage Account Name based on your account. More information is available on the Microsoft website.
    • Using the Hierarchical? option you can determine whether your storage is hierarchical or not. More information on Data Lake Storage namespaces can be found in the Azure Blob Storage documentation.
    • Please provide the account access key to your Azur Blob Storage account, using Account Key field. This is the documentation on Managing storage account access keys.
    • If you want to be able to export data to your Azure Blob Storage container, please set Writable? option to “Yes”.
    • Click on Create.
    • We recommend to first login to your Dropbox account.
    • On the Galaxy website, click on the Create button of the Dropbox section. You will be redirected to the Dropbox website for authentication.
    • You have to login there and grant access for the Galaxy.
    • Click on Create.

    eLabFTW is a free and open source electronic lab notebook from Deltablot. Each lab can either host their own installation or go for Deltablot’s hosted solution. Using Galaxy, you can connect to an eLabFTW instance of your choice.

    • Provide a URL with the protocol (http or https) and the domain name in the eLabFTW instance endpoint (e.g. https://demo.elabftw.net) field.
    • If you want to let Galaxy to export data to your eLabFTW, please set the Allow Galaxy to export data to eLabFTW? to “Yes” to grant required access to Galaxy. Keep in mind that your API key must have matching permissions.
    • You should provide an API Key to your eLabFTW as well. To do so, navigate to the Settings page on your eLabFTW server and go to the API Keys tab to generate a new key. Choose “Read/Write” permissions to enable both importing and exporting data. “Read Only” API keys still work for importing data to Galaxy, but they will cause Galaxy to error out when exporting data to eLabFTW. You will receive a string (similar to 2-50dd721027f56a2e119b3bdbf64f4b8518b3f82b97e7876d56dad74109c8be73d8919b88097d3c9eb8952) and you should enter this in the API Key field of Galaxy interface.
    • Click on Create.

    You can setup connections to FTP and FTPS servers to import and export files as follows:

    • Provide the address to your FTP server using the FTP Host field.
    • If you want to login with a specific user, provide the username in the FTP User field. Leave this blank to connect to the server anonymously (if allowed by the server).
    • If you want to export data to this FTP, you should set the Writable? option to “Yes”.
    • Please specify the port that Galaxy should use to connect to your FTP server using the FTP Port field.
    • In the FTP Password field provide the password to connect to the FTP server. Leave this blank to connect to the server anonymously (if allowed by the server).
    • Click on Create.
    • We recommend to login to your Google account first.
    • On the Galaxy website, click on Select button of Export to Google Drive. You will be redirected to the Google.
    • Pick the account that you want to connect to Galaxy for import and export. Grant the required permissions.
    • You will be back on the Galaxy portal and you can access your Google Drive for import and export (depending on your how you set up your accuont).
    • Click on Create.

    InvenioRDM is a research data management platform that allows you to store, share, and publish research data. You can connect to an InvenioRDM instance of your choice by following these steps:

    • Please fill the address to your InvenioRDM in the following field: InvenioRDM instance endpoint (for example, https://inveniordm.web.cern.ch/). This should include the protocol (http or https).
    • Use the Allow Galaxy to export data to InvenioRDM? option to give permission to Galaxy to export data to your repository or not.
    • Click on Create.
    • You should fill Publication Name with a name as the “creator” metadata of the records. This could be a person or an organization. You can later modify this. If left blank, an anonymous user will be used as the creator.
    • You should also enter your Personal Access Token. You can get this information in your InvenioRDM instance. Navigate to Account Settings. Then, go to Applications to generate a new token. This will allow Galaxy to display your draft records and upload files to them.
    • Click on Create.

    Using WebDAV you can connect various services that supports WebDAV protocol such as OwnCloud and NextCloud among others. The configuration of WebDAV is slightly variable from service to service but the general principles apply everywhere.

    • Provide the server address to this repository in the Server Domain field.
    • In the WebDAV server Path, you have to provide the path on this server to WebDAV.
    • In the Username field, you should write the username you use to login to this server.
    • You can grant write access for this repository using the Writable? (set to Yes) and therefore make it possible to export datasets, or histories to your connected repository.
    • Click on Create.

    As an example, if I want to connect my nextCloud repository to my Galaxy account, I should login to my nextCloud server and find the information from File settings (bottom left of the page) under the WebDAV section to fill this template. It could be something like: https://server_address.com/remote.php/dav/files/username_or_text. Here, the Server Domain is https://server_address.com and WebDAV server Path is remote.php/dav/files/username_or_text.

    In some cases, you may need to activate some features on your ownCloud or nextCloud to allow this integration. For example, some nextCloud servers require the user to use “App Passwords”. This can be done using the Settings > Security > Devices & sessions > Create new app password.

    Zenodo is an open-access repository for research data, software, publications, and other digital artifacts. It is developed and maintained by CERN and funded by the European Commission as part of the OpenAIRE project. Zenodo provides a free platform for researchers to share and preserve their work, ensuring long-term access and reproducibility. Zenodo is widely used by researchers, institutions, and organizations to share scientific knowledge and comply with open-access mandates from funding agencies.

    • Using the Allow Galaxy to export data to Zenodo?, you can decide whether you like to give write access to Galaxy or not. Set it to “Yes” if you want to export data from Galaxy to Zenodo, set it to “No” if you only need to import data from Zenodo to Galaxy.
    • Provide a name for the “creator” metadata of your records on Zenodo using the Publication Name field. You can always change this value later by editing the records in Zenodo. If left blank, an anonymous user will be used as the creator.
    • You have to provide a Personal Access Token from your Zenodo account to Galaxy. To do so, you need to log into your account. Then, visit this site: https://zenodo.org/account/settings/applications/. Alternatively, you can click on your username on top right and then click on “Applications”. Here, you need to create a “Personal Access Token”. This will allow Galaxy to display your draft records and upload files to them. If you enabled the option to export data from Galaxy to Zenodo, make sure to enable the deposit:write scope when creating the token.
    • Click on Create.

    Importing data to your Galaxy account

    When you connect a repository to your Galaxy account, you can use it to import data to Galaxy. To do so, you can click on the Upload Icon on the left panel. In the poped up window, you can click on Choose from repository to select a repository that you have added to your account. Navigate to a file that you want to upload to your Galaxy account, check the box of the file, and click on Select. You can determine the format of the file, give it a name, and then click on Start to upload the file to your Galaxy account.

    Exporting histories, datasets, and results to connected repositories

    If you have given Galaxy the permission to write to your repository, you can export your histories, datasets and reulsts in the history to that repository.

    Histories

    If you want to export a history, you should click on the History Options icon (galaxy-history-options) on the right panel. Then, you can click on Export History to File. Next, you can click on to repository on the middle panel. If you click on the Click to select directory, there will be a pop up window. Here, you can pick a repository that you have added to your account and when you are in that repository, click on Select. You can give a Name to your exported history, so you can find it easier in your connected repository. Finally, click on Export to write the history to your repository. Similarly, you can use to RDM repository or to Zenodo instead of the to repository option in the middle panel to export your history to connected RDM repositories or Zenodo.

    To have more options on exporting your history, you can click on Show advanced export options on top of the middle panel. This provides further control over the format and datasets that will be included in your exported history.

    Datasets

    If you are interested to export a single dataset or results to a connected repository, you can use a tool called Export datasets.

    1. Select the desired option from What would you like to export?.
    2. Using the Directory URI option, you can Select a connected repository. You can also give it a directory name here.
    3. We recommend to export the metadata with your datasets and results using the Include metadata files in export?.

  • Connections to your LIMS system

    This section will guide you through generating external links to your data stored in the Sierra LIMS system to be downloaded directly into Galaxy.

    1. Go to the Sierra portal and login to your account.
    2. Click on the Sample ID of the sample you want to download data from.
    3. Click on the Edit Sample Details button.
    4. At the bottom of the page there will be an input box for creating a link, enter a description for the link in the Reason for link section, and click Create link. This will reload the page and add a new link to the sample under Authorised links to this sample.
    5. Go back to the sample page or click on the hyperlink called link to take you back.
    6. In the Results section select the lane you want to access your data from.
    7. The bottom of the page, under the Links section, will now contain a list of wget commands with links for accessing all the files within that sample/lane.
    8. Since this list is for wget commands, you need to extract out the links from the command. You can copy the link in the first set of double quotes for each line and galaxy-wf-edit Paste/Fetch Data them directly into Galaxy to download the files.

For this tutorial, we will import datasets from the general-purpose FAIR data repository Zenodo.

Hands On: Upload a file from URL
  1. At the top of the Activity Bar, click the galaxy-upload Upload activity

    upload data button shown in the galaxy interface.

    This brings up a box:

    the complicated galaxy upload dialogue, the 'regular' tab is active with a large textarea to paste subsequent URL.

  2. Click Paste/Fetch data
  3. Paste in the address of a file:

    https://zenodo.org/records/18803585/files/olympics-2010-winter.tsv
    
  4. Click Start
  5. Click Close

Your uploaded file is now in your current history. When the file has been uploaded to Galaxy, it will turn green.

Comment

After this, you will see your first history item (called a “dataset”) in Galaxy’s right panel. It will go through the grey (preparing/queued) and yellow (running) states to become green (success).

What is this file?

Hands On: View the dataset content
  1. Click the galaxy-eye (eye) icon next to the dataset name, to look at the file content

    galaxy history view showing a single dataset olympics-2010-winter.tsv. Display link is being hovered.

The contents of the file will be displayed in the central Galaxy panel. If the dataset is large, you will see a warning message which explains that only the first megabyte is shown.

This file contains a table listing all athletes who competed in the 2010 Winter Olympics in Oslo.

galaxy center panel view showing a single dataset olympics-2010-winter.tsv. . Open image in new tab

Figure 1: Preview of the dataset in Galaxy. Each row corresponds to an athlete, and each column provides further information about this athlete, including birth year, weight, and medals.
Question: Explore the dataset
  1. How many athletes participated in these Olympics?
  2. What was the location of these Olympic Games?
  1. 4402 athletes. Each row signifies an athlete; there are 4403 rows, one of which is the header row.
  2. Vancouver. This information is given in column 12.

Dataset attributes (metadata)

Let’s have a look at the metadata that Galaxy tracks for your datasets.

Hands On: Explore metadata
  1. Expand the item in your history by clicking on its name
    • Here you will see a peek of the contents, some basic file attributes such as the format, number of lines, and number of columns

    the history item expanded, showing dataset preview, number of links, and a series of buttons.

  2. Click on the “Dataset Details” details button
    • Here you can see further metadata such as file size, creation date, hash, format, original URL, and more
    • Scrolling down, you will also see details of the upload job that performed the import. We will look more closely at this later.
    dataset details. Open image in new tab

    Figure 2: Screenshot of the dataset details
  3. Rename the file to include the city of the Olympics. You can do this by editing the dataset attributes
    • This can be done by clicking on the Edit tab at the top of your screen, or the pencil icon galaxy-pencil on the expanded dataset.
    • For example, rename it to 2010 Winter Olympics Vancouver
    • Click on the galaxy-pencil pencil icon for the dataset to edit its attributes
    • In the central panel, change the Name field
    • Click the Save button

    the renamed dataset.

Process: Data preparation and QC

The RDM lifecycle with the process stage highlighted.

The process phase of the research life cycle involves preparing your data for analysis. This includes steps such as data cleaning, data transformation, and quality control. Galaxy offers many tools that can help prepare your data for analysis, such as format conversions and data manipulation tools.

Use a tool

Recall that our research question in this tutorial is “What is the age distribution of Olympic athletes?” Looking at the dataset, you will see that we do not have an “age” column in our table. We do, however, have a column with the birth year of each athlete, and a column containing the year of the Olympics. Let’s prepare our data for analysis by calculating a new age column based on these two existing columns.

Hands On: Find a tool
  1. Search for the tool Compute - on rows ( Galaxy version 2.1)
    • Click on Tools tool in the Activity bar on the left
    • Enter “Compute” in the search bar
  2. Open the tool by clicking on it
    • You will see the tool form in the center panel of Galaxy
  3. Scroll down to the Help section and read about the tool
    • Here you will always find usage information about the tool, including citations and links to tutorials describing the tool.
    • How could we use this tool to add an age column to our dataset?
    tool help section. Open image in new tab

    Figure 3: Help section of the Compute tool

We can use this tool to compute an age column for our dataset, but first, we must ask ourselves some questions:

Question: Explore the dataset
  1. Which column contains the birth year information
  2. Which column contains the year of the Olympics?
  3. How can we compute the age of the athlete from these columns?
  1. column 4 (c4)
  2. column 10 (c10)
  3. we subtract the columns, c10-c4

We now have what we need to add an age column to our dataset, let’s do it:

Hands On: Use a tool
  1. Compute - on a row ( Galaxy version 2.1) with the following parameters
    • param-file “Input file”: 2010 Winter Olympics Vancouver
    • “Input has a header line with column names?”: Yes
    • “Expressions”
      • plus Insert Expressions
        • “Add expression”: c10-c4
        • “Mode of the operation”: Append
        • “The new column name”: age
  2. Run workflow-run the tool

This tool will run, and a new output dataset will appear at the top of your history panel.

Hands On: Check the results
  1. View galaxy-eye the resulting file
    • make sure the new column was successfully added, and the column header is “age”
Question:
  1. What column number is our new “age” column?
  2. What age is the first Olympian in our file, Muhammad Abbas?
  1. The column was added at the end, column 16.
  2. Age 23.

Tools are frequently updated to new versions. Your Galaxy may have multiple versions of the same tool available. By default, you will be shown the latest version of the tool. This may NOT be the same tool used in the tutorial you are accessing. Furthermore, if you use a newer tool in one step, and try using an older tool in the next step… this may fail! To ensure you use the same tool versions of a given tutorial, use the Tutorial mode feature.

  • Open your Galaxy server
  • Click on the curriculum icon on the top menu, this will open the GTN inside Galaxy.
  • Navigate to your tutorial
  • Tool names in tutorials will be blue buttons that open the correct tool for you
  • Note: this does not work for all tutorials (yet) gif showing how GTN-in-Galaxy works
  • You can click anywhere in the grey-ed out area outside of the tutorial box to return back to the Galaxy analytical interface
Warning: Not all browsers work!
  • We’ve had some issues with Tutorial mode on Safari for Mac users.
  • Try a different browser if you aren’t seeing the button.

Tool provenance

We already examined the attributes for the file we uploaded. For datasets that result from running tools, Galaxy tracks even more provenance. Let’s look at this now.

Hands On: Explore metadata
  1. Expand the item in your history by clicking on its name
  2. Click on the “Dataset Details” details button

Here you will see all the metadata that Galaxy keeps track of. It has all the same basic information as we saw with the uploaded file. In addition, it shows which tool produced this output, complete with exact parameter settings and tool version.

tool parameters of our tool run. Open image in new tab

Figure 4: Parameters of the job (tool run) that produced this dataset
job information of our tool run. Open image in new tab

Figure 5: Job information of our tool run.
Question: Examine the Job metadata
  1. What was the version of the tool that produced your dataset?
  2. What was the command that was run behind the scenes?
  1. Version 2.1. This can be found under “Job Information -> Galaxy Tool ID”, where the last part is the version. E.g. toolshed.g2.bx.psu.edu/repos/devteam/column_maker/Add_a_column1/2.1. Note that this may be different for you if a newer version has been released since writing this tutorial.
  2. The command that is run can be found under “Job Information -> Command Line”. It will be something like:
    python '/opt/galaxy/shed_tools/toolshed.g2.bx.psu.edu/repos/devteam/column_maker/aff5135563c6/column_maker/column_maker.py' --column-types int,str,str,int,float,float,str,str,str,int,str,str,str,str,str  --header --file '/data/jwd07/main/097/599/97599988/configs/tmp1vp1f4gh' --fail-on-non-existent-columns --fail-on-non-computable '/data/dnb12/galaxy_db/files/7/a/6/dataset_7a6bad76-3181-45e2-a460-31cbe2a6e4a3.dat' '/data/jwd07/main/097/599/97599988/outputs/dataset_5ce07003-fe0c-4836-8f16-b64f25dc9219.dat'
    

Visualise a dataset

Hands On: Visualise a dataset
  1. Expand the output from Compute tool
  2. Click on the visualise galaxy-visualise icon

    visualisation button.

  3. Select the Boxplot option

    visualisation options.

  4. In the “Tracks” tab, change Column of y-axis values to Column: 16 (our new age column)

This is a quick way to get a feeling for our data.

histogram visualisation. Open image in new tab

Figure 6: screenshot of the resulting boxplot. Hovering your mouse over the plot shows you the labels
Question
  1. What age range were our athletes?
  1. Based on the box plot, it looks like our youngest athlete was 15, and our oldest was 51. The mean age was 25.
  1. Click on the Save cloud-upload icon at the top-right

    visualisation save button.

  2. In the Activity bar, click on Visualization
  3. Click on Saved Visualizations at the top of the panel

    saved visualisations.

  4. Here you will find your saved visualisations
    • Here you can view, adjust, and rename your previously saved visualisaions

    saved visualisations list.

Re-run a tool

Our file only contained information for a single Olympics, let’s have a look at a second Olympics as well.

We will import another file from Zenodo, but in a slightly different way. Instead of providing the download URL for the dataset, we can also browse Zenodo repositories (and many other data repositories) directly from the Galaxy upload menu.

Hands On: Upload a second dataset
  1. Option 1: Choose from Repository
    • Open the Upload window
    • At the bottom, click on “Choose from Repository”

      choose from repositories button.

    • Search for “Zenodo” at the top
      • If you do not find anything, this is not supported on your Galaxy yet, please skip to option 2 below
    • Search for the repository with the same name as this tutorial “Introduction to Galaxy as an RDM platform”

      choose from repositories button.

    • Select the file olympics-2008-summer.tsv

      choose from repositories button.

    • Click Start
    • Close the upload window
  2. Option 2: From URL (same method as before)

    https://zenodo.org/records/18803585/files/olympics-2008-summer.tsv
    
    • Copy the link location
    • Click galaxy-upload Upload at the top of the activity panel

    • Select galaxy-wf-edit Paste/Fetch Data
    • Paste the link(s) into the text field

    • Press Start

    • Close the window

Question: Examine the file
  1. Which Olympics is this file for? Which city was it held in?
  1. This file is from the 2008 Summer Olympics in Beijing.

Now that we have a second dataset, we want to run the same Compute tool tool on this data so that we get an age column. We could open the tool again and reconfigure all the settings, but there is an easier way to repeat what we did before.

Hands On: Re-run a tool
  1. Click the Re-run dataset-rerun button on the output from the Compute tool
    • You will see the tool form with the exact same settings we used before
    • Because Galaxy tracks all the parameter settings, it is easy to repeat a tool on new data, without having to choose all the parameters again.

    rerun button.

  2. Change the input dataset param-file to the summer olympics file we just uploaded
    • Run workflow-run the tool
Question: How did it go?
  1. What do you see?
  1. If all went well, something went wrong in this step. That is, your dataset turned red instead of green. Not to worry, we will show you how to troubleshoot errors next.

    a red failed dataset in history.

Oh no! The dataset turned red! This means something went wrong. In the next section, we will show you how you can troubleshoot errors in Galaxy.

Troubleshooting errors

So something went wrong with one of your tools. This will happen now and then, and can have different causes. It might be something you can fix yourself (e.g. a problem with the input dataset), or it might be something that needs to be fixed in Galaxy (e.g. a bug in the tool). Next, we will see how you can find more information about the error and submit a bug report if you think it might be a problem with the tool.

Hands On: Troubleshoot a failed tool
  1. Click on the bug icon galaxy-bug on the failed (red) dataset

    the bug icon on a historty item.

You will now see details about the error in the center panel:

Error tab of dataset details. Open image in new tab

Figure 7: the error tab of the dataset details page. This page shows us the error message (stderr) and other tool logs (stdout). It also has a form to submit a bug report at the bottom.

The error messages can sometimes be a bit cryptic, but the more you use the tools, the easier it will get. If you do not know how to fix the error yourself, you can submit a bug report at the bottom of this page. This will be sent to the administrators of the Galaxy you are using.

Question: Examine the Error message
  1. Can you guess what went wrong based on the error message?
  2. Is this something we can fix? How?
  1. The error message says could not convert string to float: 'NA'. This suggests there is a line in the input file that contains an unexpected value (NA). This is a common way to denote a missing value, but if we assume this column to be a number and use it in our calculation things can go wrong.

    Failed to convert some of the columns in line #1859 to their expected types.
    The error was: "could not convert string to float: 'NA'" for the line:
    "19504	Cha Yong-Hwa	F	NA	145	39	North Korea	PRK	2008 Summer	2008
     Summer	Beijing	Gymnastics	Gymnastics Women's Individual All-Around	NA"
    

    Apparently, no birth year was known for this athlete from North Korea

  2. Yes, since the problem is with our input file, this is something we can fix ourselves.

    • One solution could be to remove all lines that contain NA in the birth year column.
    • Another would be to replace all NA values with nan (not a number), which is the appropriate way to indicate missing values in numeric columns
    • In our case, there is an easier option: we can tell the Compute tool tool how to deal with such cases.

The error is caused because Galaxy is trying to interpret the birthyear column as a number, but it cannot do this for columns containing an “NA” (Not Available) value.

So now that we know what caused the error, let’s fix it by re-running our tool once more, with different error-handling settings. We can tell the Compute tool tool to stop autodetecting the column type, and instruct it what to do with “NA” values.

Hands On: Re-run the tool with error handling parameters
  1. Re-run dataset-rerun the failed (red) dataset
    • Expand the Error Handling section at the bottom of the tool form
      • param-toggle “Autodetect column types”: No
      • “If an expression cannot be computed for a row”: Skip the row
    • Change the “Expression” parameter to: int(c10)-int(c4)
      • the int() part tells the tool to turn the value into an integer (whole number). Since we told the tool not to autodetect anymore, we need to tell it how to interpret the values in the column.

    expression of the tool form.

    error handling section of the tool form.

Question:
  1. What age is the first Olympian in this file, Ragnhild Margrethe Aamodt?
  1. Age 27. The age column is the last one.

If this solution seemed a bit cryptic, don’t worry too much; there are always multiple ways to solve the problem. The important thing is that you ran into a problem, looked at the error, and then solved it.

If you get an error message that you don’t understand or don’t know how to solve, you can always ask for help in one of our support channels.

If you need support for using Galaxy, running your analysis or completing a tutorial, please try one of the following options:

Starring your favourite tools

Since Galaxy has so many tools to choose from, once you find one that is useful for you, you will likely want to use it more often. To make it easier to find your favourite tools, you can star them.

Hands On: Star/Favourite a tool
  1. Star galaxy-star the Compute tool tool

    Galaxy servers can have a lot of tools available, which can make it challenging to find the tool you are looking for. To help find your favourite tools, you can:

    • Keep a list of your favourite tools to find them back easily later.
      • Adding tools to your favourites
        • Open a tool
        • Click on the star icon galaxy-star next to the tool name to add it to your favourites
      • Viewing your favourite tools
        • Click on the star icon galaxy-star at the top of the Galaxy tool panel (above the tool search bar)
        • This will filter the toolbox to show all your starred tools
    • Change the tool panel view
      • Click on the galaxy-panelview icon at the top of the Galaxy tool panel (above the tool search bar)
      • Here you can view the tools by EDAM ontology terms
        • EDAM Topics (e.g. biology, ecology)
        • EDAM Operations (e.g. quality control, variant analysis)
        • You can always get back to the default view by choosing “Full Tool Panel”

  2. You can access your favourite tools by clicking on the galaxy-star icon at the top of the Tool Panel

    • this will filter the tool panel for the tools you have starred and your most-used tools

Keeping your history clean

If you have failed items in your history, you might want to delete them. This helps keep your history organized.

Hands On: Delete failed dataset
  1. Click on the trashcan icon galaxy-delete on the failed (red) dataset

Did you accidentally delete a dataset you didn’t mean to delete? Not to worry, your data is not gone yet. You can show these deleted datasets in your history, and undelete them.

  1. Click on include deleted galaxy-delete at the top of your history
    • so not on the dataset, but at the top of the history panel
    • you will see the deleted dataset appear in your history again - if you expand the deleted dataset, the delete icon has turned into an undelete icon

    history option to include deleted datasets.

Your dataset is now gone from your history. But deleting it does not remove it completely yet. So if you delete something by accident, you can still view it and undelete it.

You can also delete datasets in bulk:

Deleting datasets individually

To delete datasets individually simply click the galaxy-delete button with dataset’s box. That’s it! This action is reversible: datasets can be undeleted.

Deleting datasets in bulk

To delete multiple datasets at once:

  • Click history-select-multiple icon at the top of the history pane;
  • Select datasets you want to delete;
  • Click the dropdown that would appear at the top of the history;
  • Select “Delete” option.

This action is also reversible: datasets can be undeleted.

An animated gif showing how to delete datasets

Deleting datasets permanently warning Danger zone!

Warning: Permanent is ... PERMANENT!

Datasets deleted in this fashion CANNOT be undeleted!

To delete multiple datasets PERMANENTLY:

  • Click history-select-multiple icon at the top of the history pane;
  • Select datasets you want to delete;
  • Click the dropdown that would appear at the top of the history;
  • Select “Delete (permanently)” option.

Storage Quota

Sometimes you really want to permanently delete a dataset, for example, to free up your storage quota. By default, you get 250 GB storage (exact number may depend on your Galaxy instance), and more can usually be requested on a temporary basis. If you are running out of storage space, you can purge (permanently delete) datasets as well. This cannot be undone.

  1. All account Datasets can be reviewed under User > Datasets.
  2. To permanently delete: use the link from within the dataset, or use the Operations on Multiple Datasets functions, or use the Purge Deleted Datasets option in the History menu.

Notes:

  • Within a History, deleted/permanently deleted Datasets can be reviewed by toggling the deleted link at the top of the History panel, found immediately under the History name.
  • Both active (shown by default) and hidden (the other toggle link, next to the deleted link) datasets can be reviewed the same way.
  • Click on the far right “X” to delete a dataset.
  • Datasets in a deleted state are still part of your quota usage.
  • Datasets must be purged (permanently deleted) to not count toward quota.

  • Download Datasets as individual files or entire Histories as an archive. Then purge them from the public server.
  • Transfer/Move Datasets or Histories to another Galaxy server, including your own Galaxy. Then purge.
  • Copy your most important Datasets into a new/other History (inputs, results), then purge the original full History.
  • Extract a Workflow from the History, then purge it.
  • Back-up your work. It is a best practice to download an archive of your FULL original Histories periodically, even those still in use, as a backup.

Resources Much discussion about all of the above options can be found at the Galaxy Help forum.

We recommend always keeping your history clean and deleting any failed steps.

Further learning about data preprocessing in Galaxy

Galaxy offers a wide range of basic file manipulation tools that are very helpful for data cleaning. Operations such as file transformations, filtering, sorting, grouping, joining, splitting, etc., are all possible inside Galaxy.

For more practice with such tools, please see our Data Manipulation Olympics tutorial

Optional: Use an Interactive Tool

Galaxy also offers various Interactive Tools. For example, we could have performed this preprocessing with OpenRefine as well. Or if we know a bit of programming in R or Python, we could have done these steps using RStudio or Jupyter Notebooks. All of these have been integrated into Galaxy as Interactive Tools.

Using these interactive tools is not quite as reproducible as using standard Galaxy tools, but it is great for the exploratory analysis phase of research, especially if you are already familiar with these tools.

In this optional section, we will show you how to use such an interactive tool. Here we will use OpenRefine, a powerful free, open source tool for working with messy data: cleaning it, transforming it from one format into another. We will use OpenRefine to perform the same task of adding an age column to our dataset.

Hands On: Launch OpenRefine
  1. Click on Interactive Tools in the Activity Bar
  2. Search for OpenRefine

    OpenRefine Interactive tool search result.

  3. You will see a tool form where you can select files to open
    • “Input file in tabular format”: 2010 Winter Olympics Vancouver
    • Run workflow-run the tool
  4. Click on Interactive Tools in the Activity Bar again
    • It may take a little time to start
  5. Once it has started, click on the name to open it
    • Clicking on the external-link icon will open it in a new tab

    Olympics dataset as a table in OpenRefine.

  6. Click on Open Project on the left panel of OpenRefine
    • Click on Galaxy File

    Olympics dataset as a table in OpenRefine.

  7. You will see our Olympics dataset loaded in OpenRefine:

    Olympics dataset as a table in OpenRefine.

Next, let’s create the same age column in OpenRefine as we did earlier with regular tools.

Hands On: Edit dataset in OpenRefine
  1. First, we tell OpenRefine to interpret the birthyear column as a number
    • Click on the dropdown dropdown icon next to the birthyear column name
    • birthyear dropdown –> Edit Cells –> Common Transforms –> To number

    menu item to convert column to number.

  2. The values in the column turned green

  3. Click on column birthyear dropdown –> Edit Column –> Add column based on this column

  4. Fill in the form
    • “New column name”: age
    • “Expression”: 2010-value

    create new column menu.

  5. You should now see a new column named “age”

    age column added.

Now that we have transformed our dataset as needed, we want to export this table back to our Galaxy history so that we can continue working on it.

Hands On: Save OpenRefine dataset to Galaxy History
  1. Click on the Export button in the top-right corner of OpenRefine
  2. Select Galaxy Exporter from the dropdown

    button to export to Galaxy.

  3. You will get a message that your dataset has been exported to Galaxy
    • Check your history and view the exported file
  4. You can now stop stop your Interactive tool again

    1. Go to Interactive Tools on the Activity Bar on the left
    2. Click on the stop button stop next to OpenRefine

Interactive tools can be a powerful addition to your Galaxy analysis.

If you are interested in using R or Python programming in Galaxy, we recommend you have a look at the Foundations of Data Science topic in GTN for comprehensive tutorials.

Scaling up

Now that we have preprocessed our data, we can continue our analysis, but before we do that, let’s explore some more Galaxy RDM features that can help you manage your research data and analyses.

Multiple histories

You can have multiple histories in Galaxy to organize your different analyses. We will now start a second history, and show you how you can switch between histories and move data from one history to another.

Hands On: Create a second History
  1. Create a new History

    To create a new history simply click the new-history icon at the top of the history panel:

    UI for creating new history

  2. Name galaxy-pencil your history

    • call it Multi-Olympics Data Analysis

You have now created a new, empty history.

To avoid re-uploading our Olympics dataset and duplicating that data, we can simply copy the files from our previous history

Hands On: Copy datasets from another history
  1. View your histories side by side. Instructions are in the tip box below:

    You can view multiple Galaxy histories at once. This allows to better understand your analyses and also makes it possible to drag datasets between histories. This is called “History multiview”. The multiview can be enabled either view History menu or via the Activity Bar:

    1. Option 1: Enabling Multiview via History menu is done by first clicking on the galaxy-history-optionsHistory options” drop-down and selecting galaxy-multihistoryShow Histories Side-by-Side option”:

      Enabling side-by-side view using History Options menu

    2. Option 2: Clicking the galaxy-multihistoryHistory Multiview” button within the Activity Bar:

      Enabling side-by-side view using Activity Bar

    multi-history view.

  2. Drag-and-drop datasets between histories

    • drag the Winter Olympics file to the new history
    • do the same for the Summer Olympics file

    multi-history view after file copy.

We now have both our datasets in our new history. By doing it this way, rather than re-uploading the files, we do not increase our storage usage.

Comment: The Galaxy History System

Galaxy allows you to have as many histories as you like. Some tips and tricks for using histories:

  • Use a different history for each analysis
  • Always give your histories a good name. This makes it easy to find your histories later.
  • You can easily switch back and forth between histories as needed

    To switch to an existing history simply click the switch-histories icon at the top of the history panel. This opens a list of histories existing in a given Galaxy account in the middle part of the interface.

    UI for switching histories

  • And you can search your histories

    1. To review all histories in your account, go to Histories galaxy-histories-activity in the Activity Bar (on the left).
    2. There you will find 4 tabs with histories:
      • My Histories
      • Historires Shared with Me
      • Public Histories
      • Archived Histories
    3. At the top of each tab is a search bar
    4. Click Advanced Search galaxy-advanced-search at the right of the search bar for more search options
      • Searching by tag galaxy-tags
      • Filtering by state (e.g. published, deleted, purged)
    5. Histories in all states are listed for registered accounts. Meaning one will always find their data here if it ever appears to be “lost”.
    6. Note: Permanently deleted (purged) Histories may be fully removed from the server at any time. The data content inside the History is always removed at the time of purging (by a double-confirmed user action), but the purged History artifact may still be in the listing. Purged data content cannot be restored, even by an administrator.

We will continue our analysis in this second history and use collections and dataset tags to analyze multiple datasets simultaneously, and keeping our data organized.

Warning: Proceed with the correct history

Important! We will now continue the rest of this tutorial in this new history (named Multi-Olympics Data Analysis)

  1. If this is not your active history, please switch to it now (see the box below for instructions)

    To switch to an existing history simply click the switch-histories icon at the top of the history panel. This opens a list of histories existing in a given Galaxy account in the middle part of the interface.

    UI for switching histories

  2. Your history should contain two datasets, and look something like this:

    screenshot of history to continue with.

Dataset tags

You may have noticed in our first history that the results from the Compute tool tool were named Compute on dataset 1 and Compute on dataset 3. To make it a bit clearer for ourselves which dataset was generated from which input file, we can add dataset tags galaxy-tags

Hands On: Add dataset tags
  1. Add two dataset tags galaxy-tags to the Winter Olympics file
    • Make sure all tags start with a hashtag (#), then they will also be added to any datasets derived from it during analysis.
    • tag 1: #winter
    • tag 2: #Vancouver
    • tag 3: #2010

    Datasets can be tagged. This simplifies the tracking of datasets across the Galaxy interface. Tags can contain any combination of letters or numbers but cannot contain spaces.

    To tag a dataset:

    1. Click on the dataset to expand it
    2. Click on Add Tags galaxy-tags
    3. Add tag text. Tags starting with # will be automatically propagated to the outputs of tools using this dataset (see below).
    4. Press Enter
    5. Check that the tag appears below the dataset name

    Tags beginning with # are special!

    They are called Name tags. The unique feature of these tags is that they propagate: if a dataset is labelled with a name tag, all derivatives (children) of this dataset will automatically inherit this tag (see below). The figure below explains why this is so useful. Consider the following analysis (numbers in parenthesis correspond to dataset numbers in the figure below):

    1. a set of forward and reverse reads (datasets 1 and 2) is mapped against a reference using Bowtie2 generating dataset 3;
    2. dataset 3 is used to calculate read coverage using BedTools Genome Coverage separately for + and - strands. This generates two datasets (4 and 5 for plus and minus, respectively);
    3. datasets 4 and 5 are used as inputs to Macs2 broadCall datasets generating datasets 6 and 8;
    4. datasets 6 and 8 are intersected with coordinates of genes (dataset 9) using BedTools Intersect generating datasets 10 and 11.

    A history without name tags versus history with name tags

    Now consider that this analysis is done without name tags. This is shown on the left side of the figure. It is hard to trace which datasets contain “plus” data versus “minus” data. For example, does dataset 10 contain “plus” data or “minus” data? Probably “minus” but are you sure? In the case of a small history like the one shown here, it is possible to trace this manually but as the size of a history grows it will become very challenging.

    The right side of the figure shows exactly the same analysis, but using name tags. When the analysis was conducted datasets 4 and 5 were tagged with #plus and #minus, respectively. When they were used as inputs to Macs2 resulting datasets 6 and 8 automatically inherited them and so on… As a result it is straightforward to trace both branches (plus and minus) of this analysis.

    More information is in a dedicated #nametag tutorial.

  2. Do the same for the Summer Olympics file:
    • tag 1: #summer
    • tag 2: #Beijing
    • tag 3: #2008
  3. Your history should now look something like this:
    • a history with 2 files, each with dataset tags

    datasets with tags added.

Dataset collections

In order to easily run analysis on multiple datasets at once, we can create dataset collections in Galaxy:

Hands On: Create a collection
  1. Create a collection param-collection out of our two Olympic datasets:
    • Click on Select Items param-check (checkbox icon) at top-left of the history panel

      select datasets from history.

    • Check the box in front of both our datasets

      checked both our datasets.

    • At the top, click on All 2 Selected dropdown
    • Choose the option Auto Build List

      Auto Build list menu option.

    • Name your Collection
      • for example “Olympics Dataset”

      dataset collection builder.

    • Click Build
    • Click on galaxy-selector Select Items at the top of the history panel (letter a) Select Items button
    • Check all the datasets in your history you would like to include (letter b)
    • Click n of N selected (see letter b below) and choose Auto build List

      Collection building with autobuild

    • Enter a name for your collection (letter c)
    • Turn off Remove file extension (letter d)

      Put a name and remove extension

    • Click Build to build your collection (letter e)
    • Click on the checkmark icon at the top of your history again (first letter a)

    Once the collection is created, all files turn green. You can limit visible files using the eye icons in the history panel.

  2. Your history now has a single item in it
    • It tells you what is inside “a list with 2 tabular datasets”

    collection history item.

  3. Click on the collection to see the files inside it

    collection contents.

  4. Return to the regular history view by clicking the link at the top of the history panel
    • The link will be called something like “« History: Multi-Olympics Data Analysis

We can now treat this collection the same way as a single dataset. If we use a collection as input to a tool, that tool will be run multiple times, once on each of the datasets inside the collection. The output of the tool will again be a collection, this time with all the result files.

Run a tool on a collection

Now that we have set up our inputs as a collection with tags, let’s see how to run the Compute tool tool on both datasets in the collection at once.

Remember that you starred galaxy-star the compute tool, so you can use that to easily find it again now!

Hands On: Run a tool on a collection
  1. Compute - on rows ( Galaxy version 2.1) with the following parameters
    • param-collection “Input file”: Olympics Dataset (collection)
      • In front of this parameter, click on the param-collection icon to switch to collection input
    • “Input has a header line with column names?”: Yes
    • “Expressions”
      • plus Insert Expressions
        • “Add expression”: int(c10)-int(c4)
        • “Mode of the operation”: Append
        • “The new column name”: age
    • Expand the Error Handling section at the bottom of the tool form
      • param-toggle “Autodetect column types”: No
      • “If an expression cannot be computed for a row”: Skip the row
    1. Click on param-collection Dataset collection in front of the input parameter you want to supply the collection to.
    2. Select the collection you want to use from the list

  2. View galaxy-eye the results
Question: What is our output?
  1. How many outputs were created? Are the files the same as before?
  2. What happened with the tags?
  1. One output collection was created, with two files inside. The files themselves are the same as before.
  2. The tags from our input datasets were also added to the results

Collections allow you to easily run tools on multiple datasets at once. We have 2 datasets in our collection, but you can have as many as you like, even hundreds or thousands.

Now that we have everything in Galaxy set up for analysis, and our data pre-processed to the right format, we can start to answer our research question, “What is the age distribution of Olympic athletes?”.

Analyse: Calculate results

The RDM lifecycle with the analyse stage highlighted.

The Analyse phase of the research life cycle is where you extract knowledge from your data in order to answer your research questions. The details of this phase vary greatly depending on your domain, but Galaxy is a cross-domain platform and has a wide range of analysis tools available for any scientific domain.

Comment: Domain-specific analysis tools

Because this is an intro tutorial, our “analysis” will be quite basic. But Galaxy offers thousands of tools covering a wide range of scientific domains. From life sciences to ecology, climate, astronomy, digital humanities, and many more.

Galaxy has a lot of computational power behind it, so whether you need a simple calculation or a complex algorithm requiring a supercomputer, Galaxy can handle it.

If you are interested in a specific domain, have a look at the following resources:

  • Galaxy Special Interest Groups (SIGs)
  • Galaxy Labs (aka subsites or subdomains)

    Galaxy offers different domain-specific views of its interface. These are sometimes called Galaxy Labs, or Galaxy subsites, or Galaxy subdomains.

    This is the same Galaxy, but with a few differences:

    • Different home page, containing information aimed at domain researchers
    • Different tool bar, filtered for most useful tools for that domain
    • You may have to re-login, you do this with the same account as always, and you can access all your existing histories etc from any subsite.
    1. On the top menu bar, on the right, click on the “switch sites” icon, galaxy_sites or galaxy_sites2 depending on the version of Galaxy

      switch site button

    2. Select the Lab you want to start using
      • The options will be different per Galaxy instance

      switch site options

    3. You can always switch back to the main Galaxy site by selecting “base site” in this list

      base site option

And the following GTN resources:

Plan our approach

Recall that our research question in this tutorial is “What is the age distribution of Olympic athletes?”

Question: What to do?
  1. How would you approach answering our research question?
  2. Can you find tools in Galaxy that might help you do this?
  1. There are several things we might like to compute in order to answer our question, perhaps
    • What is the average age of our Olympians?
    • What is the standard deviation?
    • What ages are the oldest and youngest Olympians?
    • What does the histogram look like for the age distribution?
    • Create a boxplot for the age distribution
  2. If you already know the name of the tool you want to use, you can simply enter this in the search bar. But often you might not know the name of the tool, then just search for some related keywords

    Try searching for terms like:

    • statistics, mean, average, minimum, maximum, standard deviation, summary, column, histogram, boxplot

    The tool Summary Statistics - for any numerical column tool looks interesting!

    As does the Histogram tool

Let’s do some analysis based on our plan.

Get summary statistics for our age column

Hands On: Summary Statistics
  1. Summary Statistics for any numerical column with the following parameters:
    • param-collection “Summary statistics on”: output from Compute tool
      • remember to switch to collection input param-collection
    • “Column or expression”: c16
  2. View galaxy-eye the results
    • it should look something like this:

      #sum 	mean 	stdev 	0% 	25% 50% 75% 100%
      114999 	26.1243 5.01207 15 	23 	25 	29 	51
      
Question:
  1. Which of these two Olympics game had the youngest contestants on average?
  2. What was the age of the oldest contestant in each Olympics?
  1. The 2008 Summer Olympics. Compare the mean of each output. For the 2010 Winter Games, this was 26.1243, and for the 2008 Summer Games, it was 25.7341
  2. The value of the 100th percentile indicates the highest value encountered. For 2008, this was 67 years, for 2010, it was 51.

This is great, we know some summary statistics for the age distribution of the Olympics. Let’s see if we can also create a visual representation.

Create a histogram

A picture is worth a 1000 words, so let’s see if we can plot the age distribution as well. We already created a boxplot before; let’s try a histogram this time. We will also use a tool rather than a Galaxy visualisation, so that we get an output file with the plot in our history.

The tool we are going to use for this is Histogram with ggplot2 tool. This tool will plot every compatible column in the input dataset. Since we are only interested in the age column, we will extract this column first, and then plot it.

Hands On: Create a Histogram Plot
  1. Remove columns - by heading ( Galaxy version 1.0) with the following parameters:
    • param-collection “Tabular file”: output from Compute tool (remember to switch to collection param-collection input again)
    • “Header name”: age
    • param-toggle “Keep named columns”: Yes
    • Click on Run
  2. View galaxy-eye the outputs
    • Make sure the output is as expected
    • You should have a collection with two files
    • Each file should contain only 1 column, the age column

    output of remove columns step.

  3. Histogram with ggplot2 ( Galaxy version 3.5.1+galaxy1) with the following parameters:
    • “Input”: The output from Remove columns tool (This is a collection input param-collection)
    • “Plot title”: enter a good title, e.g. Age distribution of athletes
    • “Label for x axis”: Age
    • “Label for y axis”: Count
    • “Bin width for plotting”: 1
  4. View galaxy-eye the resulting plots side by side using the Window Manager galaxy-scratchbook

    If you would like to view two or more datasets at once, you can use the Window Manager feature in Galaxy:

    1. Click on the Window Manager icon galaxy-scratchbook on the top menu bar.
      • You should see a little checkmark on the icon now
    2. View galaxy-eye a dataset by clicking on the eye icon galaxy-eye to view the output
      • You should see the output in a window overlayed over Galaxy
      • You can resize this window by dragging the bottom-right corner
    3. View galaxy-eye a second dataset from your history
      • You should now see a second window with the new dataset
      • This makes it easier to compare the two outputs
    4. Repeat this for as many files as you would like to compare
    5. You can turn off the Window Manager galaxy-scratchbook by clicking on the icon again

The Window Manager is an easy way to quickly compare two datasets.

the histograms side by side.

But this doesn’t scale to a large number of datasets. So, as the final step of our analysis, let’s create a montage of our histograms.

Hands On: Create a montage of plots
  1. Image Montage - with ImageMagick ( Galaxy version 7.1.2-2+galaxy1) with the following parameters
    • param-collection “Image”: Output from Histogram tool
    • ”# of images wide”: 2
    • “Add a Title to the image”: Age distribution of athletes in Olympic Games
    • “Add the name of the files as image labels.”: Yes
    • “Point size of the labels and/or title”: 60

montage of histograms.

Awesome, we now have a pretty good answer to our question. We have some basic summary statistics for each Olympics, and a montage of histogram plots.

Next, we would like to repeat all this for all Olympic games.

Note that we chose a 2-image-wide montage because we only had 2, but when we run it on more datasets at once, we might want to change this. We will do this later.

Extract workflow from our history

To make it easy to repeat this entire analysis without a lot of clicking, we will create a workflow based on our analysis history.

Hands On: Extract workflow from history
  1. Clean up your history
    • Remove any failed (red) jobs from your history by clicking the galaxy-delete button.
    • This will make creating the workflow easier.
  2. Click galaxy-history-options (History options) at the top of your history panel and select Extract workflow.

    'Extract Workflow' entry in the history options menu.

    The central panel will show the content of the history in reverse order (oldest on top), and you will be able to choose which steps to include in the workflow.

    Selection of steps for Extract Workflow from history.

  3. Replace the Workflow name with something more descriptive, for example: Olympic Age distribution.
    • Here you can also uncheck any steps you forgot to delete when you cleaned up your history
  4. Click the Create Workflow button near the top.

    You will get a message that the workflow was created.

Next, let’s view our workflow in the workflow editor

Hands On: View the workflow in the editor
  1. Click galaxy-workflows-activity Workflows in the Activity bar.

    Here you have a list of all your workflows (the My Workflows tab is active by default).

    workflows table.

    You can see all available actions for the workflow on the workflow card, e.g. copy, download, share, edit and run

  2. Click the galaxy-wf-edit (Edit) button on the bottom right of the workflow card.

  3. Play around with the editor
    • You can move boxes around
    • You can add tools and make connections between tools
    • You can click on a tool and change parameters

    workflows editor.

    We will only make 1 change: since we will have many more histograms, let’s make the montage image 4 plots wide

  4. Click on the Montage tool
    • A panel with the tool’s configuration will open on the right.
    • Change the value for ”# of images wide” to 4.

    workflows table.

  5. Save dataset-save the workflow via the dataset-save icon at the top right.

  6. Exit the editor by clicking on the Home button (Galaxy logo) at the left of the top menu bar.

Next, we will run this workflow on all Olympic games.

If your workflow looks very different than the one pictured above, it may be that you missed a step or continued in the wrong history. This is ok and won’t affect the rest of the tutorial too much (though you may have to make some adjustments in the next step where we run the workflow)

If you would like to see the workflow as intended, you can follow the steps below to import the example workflow to your account so you can start using it:

Hands On: Importing the example workflow for this tutorial
  1. Open the workflows page for this tutorial
    • Every tutorial in the GTN has a workflow, you can always find the link to this in the overview box at the start of a tutorial
  2. Click on Olympic Age Distribution
  3. You should see the following page

    GTN workflow page.

  4. Click the “Run Workflow in Galaxy button at the top of the page
  5. Select your Galaxy from the dropdown
  6. Click on Import workflow
    • You will now see this workflow under Workflows galaxy-workflows-activity in the Activity Bar

Run workflow on all Olympics

First, we will create a new history and upload our data.

Hands On: New history
  1. Create a new history

    To create a new history simply click the new-history icon at the top of the history panel:

    UI for creating new history

  2. Rename galaxy-pencil your history, e.g. “All Olympics”

    1. Click on galaxy-pencil (Edit) next to the history name (which by default is “Unnamed history”)
    2. Type the new name
    3. Click on Save
    4. To cancel renaming, click the galaxy-undo “Cancel” button

    If you do not have the galaxy-pencil (Edit) next to the history name (which can be the case if you are using an older version of Galaxy) do the following:

    1. Click on Unnamed history (or the current name of the history) (Click to rename history) at the top of your history panel
    2. Type the new name
    3. Press Enter

  3. Upload the zip file with all Olympic datasets from Zenodo

    https://zenodo.org/records/18803585/files/olympics-all.zip
    
    • Copy the link location
    • Click galaxy-upload Upload at the top of the activity panel

    • Select galaxy-wf-edit Paste/Fetch Data
    • Paste the link(s) into the text field

    • Press Start

    • Close the window

  4. Unzip - a file ( Galaxy version 6.0+galaxy3) with the following parameters:

    • param-file “Input file”: olympics-all.zip
Question:
  1. How many Olympic Games do we have data for?
  1. Our collection contains 51 datasets, one per Olympics

Now it’s time to run our workflow

Hands On: Run the workflow
  1. Click galaxy-workflows-activity Workflows in the Activity bar.

  2. Click the workflow-run (Run workflow) button on the bottom right of the workflow card.

    The central panel will change to allow you to configure and launch the workflow.

    workflow run menu.

  3. Make sure the input of the workflow is our collection of 51 datasets.

  4. Click Run Workflow workflow-run at top right.

    • You will now see the workflow invocation screen
    • Here you can see the progress of the workflow
    • You can find all your previous workflow runs (invocations) in the Activity bar under Workflow Invocations galaxy-panelview

    workflow invocation.

Our analysis will now be run on all 51 Olympics files. This may take a bit of time (~5-10 minutes or more, depending on how busy Galaxy is at the moment), so now is a good time to grab a coffee. You can also already proceed to the next section while you wait.

Once your workflow is finished, you should get a final montage image with 51 histograms.

final montage image. Open image in new tab

Figure 8: Montage of histograms for all 51 Olympic games in our dataset
Question
  1. What was the youngest athlete in the 1896 Olympics?
  1. Look in the Summary statistics output, the 0th percentile is 10 years old

Well done! You have created your first Galaxy workflow and rerun it on a collection of datasets.

The next step is often preserving your work. Whether you want to publish your findings and methods in a journal article, or share it with colleagues, or simply have a detailed record for yourself. The next sections deal with exporting and sharing everything you created in Galaxy for your research.

Preserve: Export data, history, and workflow

The RDM lifecycle with the preserve stage highlighted.

The preserve phase of the research life cycle consists of ensuring that our data, results and the details of our analysis are preserved and remain accessible long-term. For instance so that we can share our work with colleagues at our institute, or with the wider world via e.g. a journal publication. Everything you do in Galaxy, can be exported so that you can share it with others or archive it in specialized data repositories.

Downloading your history

Individual datasets can be downloaded via the save save icon on the expanded dataset in history, or via the command line.

  1. Click on the dataset in your history to expand it
  2. Click on the Download icon galaxy-save to save the dataset to your computer.

From the terminal window on your computer, you can use wget or curl.

  1. Make sure you have wget or curl installed.
  2. Click on the Dataset name, then click on the copy link icon galaxy-link. This is the direct-downloadable dataset link.
  3. Once you have the link, use any of the following commands:
    • For wget

      wget '<link>'
      wget -O '<link>'
      wget -O --no-check-certificate '<link>' # ignore SSL certificate warnings
      wget -c '<link>' # continue an interrupted download

    • For curl

      curl -o outfile '<link>'
      curl -o outfile --insecure '<link>' # ignore SSL certificate warnings
      curl -C - -o outfile '<link>' # continue an interrupted download

  4. For dataset collections and datasets within collections you have to supply your API key with the request
    • Sample commands for wget and curl respectively are:

      wget https://usegalaxy.org/api/dataset_collections/d20ad3e1ccd4595de/download?key=MYSECRETAPIKEY

      curl -o myfile.txt https://usegalaxy.org/api/dataset_collections/d20ad3e1ccd4595de/download?key=MYSECRETAPIKEY

But we can also download our entire history at once, including all metadata. You can download your history in two formats

  • Compressed folder
  • RO-crate (Research Object crates), a community standard for bundling research (meta)data and analysis.

An RO-Crate is an integrated view through which you can see an entire Research Object; the methods, the data, the output and the outcomes of a project or a piece of work. Linking all this together enables the sharing of research outputs with their context, as a coherent whole.

RO-Crates link data and metadata no matter where they are stored – so that from a paper, you can find the data, and from the data, you can find its authors, and so on.

For example, an RO Crate won’t just contain an author’s name. It would also contain their ORCID, which in turn is connected to their affiliations, their funding, and their other publications.

RO crates infographic.

For more information, see the ROcrates website

Hands On: Export your history
  1. Click on History options galaxy-history-options
  2. Select Export history to file
  3. Select the format of your choice

    history export format selection menu.

  4. Select as destination: Temporary Direct Download

    history export destination selection menu.

  5. Click Generate Download Link

    history export destination selection menu.

  6. You will get a Download button and a Download link

    history export destination selection menu.

You now have your full history available outside of Galaxy. This is useful if you want to continue your analysis on your local machine, or simply want a backup of your work.

This exported history can also be imported into a different Galaxy.

  1. Open the link to the shared history
  2. Click on the Import this history button on the top left
  3. Enter a title for the new history
  4. Click on Copy History

If you want to share your history with another Galaxy user, there are more direct ways to do that, which we will cover in the share section next.

Exporting your history to a repository

You can also directly export Galaxy datasets to external repositories such as Zenodo, Google Drive, OneData, and many more.

In order to do this, you will first need to configure one of these repositories in your Galaxy account settings.

Hands On: Manage your repositories
  1. Configure a repository in your Galaxy account by following the instructions in the box below
    • Pick a repository you already have an account for. E.g. Google Drive may be a good option.
    • If you do not have accounts on any of these systems, you can skip this and watch the video below this hands-on box.

    Here, we are going to briefly explain how you can Bring-Your-Own-Data to Galaxy or export your dataset, results, or history to 3rd party repositories. In order to add a new repository to your account follow these steps:

    • Click on your Username on top right part of the website and then click on Preferences.
    • From the middle panel, click on the Manage Your Repositories (previously called Manage your remote file sources).
    • Click on the + Create button on top of the page. Here, you get multiple options to connect various repositories to your account.

    For all of the possible repositories, you should fill the following fields:

    • In the Name section, give a name to your repository. This name will be used to choose the repository on Galaxy for importing or exporting datasets.
    • Optionally, you can provide a Description for this repository. This is a note for yourself.
    Hands-on: Choose Your Own Tutorial

    This is a 'Choose Your Own Tutorial' (CYOT) section (also known as 'Choose Your Own Analysis' (CYOA)), where you can select between multiple paths. Click one of the buttons below to select how you want to follow the tutorial

    Select the repository you like to add to your Galaxy account.

    If you have an Onedata account, you can use this repository to import and/or export your data directly from and to Onedata. The minimal supported Onezone version is 21.02.4. More information on Onedata can be found on Onedata’s website.

    There are extensive tutorials for setting up and utilizing of OneData on Galaxy Training Network (GTN). At the moment, we have the following tutorials for Onedata on GTN:

    1. Getting started with Onedata distributed storage
    2. Importing (uploading) data from Onedata
    3. Exporting to Onedata remote
    4. Setting up a dev Onedata instance
    5. Configuring the Onedata connectors (remotes, Object Store, BYOS, BYOD)

    In short, you can connect your Galaxy account to an Onedata repository as follows:

    • In the Onezone domain field, please fill in the address to your Onezone domain. It could be something like “datahub.egi.eu”.
    • Using the Writable? option you can decide whether to grant access to Galaxy to export (write) to your Onedata or not.
    • You should provide an Access Token to Galaxy so it can read (import) and write (export) data to your OneData. Read more on access tokens here. You can limit the access to read-only data access, unless you wish to export data to your repository (write permissions are needed then).
    • In case you want to disable validation of SSL certificates, you can use Disable tls certificate validation? option. However, we strongly recommend you to not use this option unless you know what your are doing.
    • Click on Create.

    To connect an AWS private bucket to your Galaxy account, you need to submit the following information on the form:

    • First, read the Manage access keys for IAM (Identity and Access Management) users documentation of AWS. Also, you should be familiar with Buckets (Buckets overview).
    • Please fill in the Access Key ID (something like AKIAIOSFODNN7EXAMPLE) and Secret Access Key (similar to wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY) in the corresponding fields on the Galaxy interface.
    • Please enter the URL to your Bucket (for example, https://amzn-s3-demo-bucket.s3.us-west-2.amazonaws.com) in the Bucket section.
    • Click on Create.

    To connect anonymously to an AWS public bucket using your Galaxy account, you need to enter the Bucket address in the Bucket section. For more information about AWS Bucket, please read AWS documentaion. Click on Create.

    To setup access to your Azure Blob Storage within the Galaxy, follow the steps:

    • Provide the name of your Azure Blob Storage account in the Container Name field. More information about container’s name could be found on the Microsoft documentation here.
    • Fill the Storage Account Name based on your account. More information is available on the Microsoft website.
    • Using the Hierarchical? option you can determine whether your storage is hierarchical or not. More information on Data Lake Storage namespaces can be found in the Azure Blob Storage documentation.
    • Please provide the account access key to your Azur Blob Storage account, using Account Key field. This is the documentation on Managing storage account access keys.
    • If you want to be able to export data to your Azure Blob Storage container, please set Writable? option to “Yes”.
    • Click on Create.
    • We recommend to first login to your Dropbox account.
    • On the Galaxy website, click on the Create button of the Dropbox section. You will be redirected to the Dropbox website for authentication.
    • You have to login there and grant access for the Galaxy.
    • Click on Create.

    eLabFTW is a free and open source electronic lab notebook from Deltablot. Each lab can either host their own installation or go for Deltablot’s hosted solution. Using Galaxy, you can connect to an eLabFTW instance of your choice.

    • Provide a URL with the protocol (http or https) and the domain name in the eLabFTW instance endpoint (e.g. https://demo.elabftw.net) field.
    • If you want to let Galaxy to export data to your eLabFTW, please set the Allow Galaxy to export data to eLabFTW? to “Yes” to grant required access to Galaxy. Keep in mind that your API key must have matching permissions.
    • You should provide an API Key to your eLabFTW as well. To do so, navigate to the Settings page on your eLabFTW server and go to the API Keys tab to generate a new key. Choose “Read/Write” permissions to enable both importing and exporting data. “Read Only” API keys still work for importing data to Galaxy, but they will cause Galaxy to error out when exporting data to eLabFTW. You will receive a string (similar to 2-50dd721027f56a2e119b3bdbf64f4b8518b3f82b97e7876d56dad74109c8be73d8919b88097d3c9eb8952) and you should enter this in the API Key field of Galaxy interface.
    • Click on Create.

    You can setup connections to FTP and FTPS servers to import and export files as follows:

    • Provide the address to your FTP server using the FTP Host field.
    • If you want to login with a specific user, provide the username in the FTP User field. Leave this blank to connect to the server anonymously (if allowed by the server).
    • If you want to export data to this FTP, you should set the Writable? option to “Yes”.
    • Please specify the port that Galaxy should use to connect to your FTP server using the FTP Port field.
    • In the FTP Password field provide the password to connect to the FTP server. Leave this blank to connect to the server anonymously (if allowed by the server).
    • Click on Create.
    • We recommend to login to your Google account first.
    • On the Galaxy website, click on Select button of Export to Google Drive. You will be redirected to the Google.
    • Pick the account that you want to connect to Galaxy for import and export. Grant the required permissions.
    • You will be back on the Galaxy portal and you can access your Google Drive for import and export (depending on your how you set up your accuont).
    • Click on Create.

    InvenioRDM is a research data management platform that allows you to store, share, and publish research data. You can connect to an InvenioRDM instance of your choice by following these steps:

    • Please fill the address to your InvenioRDM in the following field: InvenioRDM instance endpoint (for example, https://inveniordm.web.cern.ch/). This should include the protocol (http or https).
    • Use the Allow Galaxy to export data to InvenioRDM? option to give permission to Galaxy to export data to your repository or not.
    • Click on Create.
    • You should fill Publication Name with a name as the “creator” metadata of the records. This could be a person or an organization. You can later modify this. If left blank, an anonymous user will be used as the creator.
    • You should also enter your Personal Access Token. You can get this information in your InvenioRDM instance. Navigate to Account Settings. Then, go to Applications to generate a new token. This will allow Galaxy to display your draft records and upload files to them.
    • Click on Create.

    Using WebDAV you can connect various services that supports WebDAV protocol such as OwnCloud and NextCloud among others. The configuration of WebDAV is slightly variable from service to service but the general principles apply everywhere.

    • Provide the server address to this repository in the Server Domain field.
    • In the WebDAV server Path, you have to provide the path on this server to WebDAV.
    • In the Username field, you should write the username you use to login to this server.
    • You can grant write access for this repository using the Writable? (set to Yes) and therefore make it possible to export datasets, or histories to your connected repository.
    • Click on Create.

    As an example, if I want to connect my nextCloud repository to my Galaxy account, I should login to my nextCloud server and find the information from File settings (bottom left of the page) under the WebDAV section to fill this template. It could be something like: https://server_address.com/remote.php/dav/files/username_or_text. Here, the Server Domain is https://server_address.com and WebDAV server Path is remote.php/dav/files/username_or_text.

    In some cases, you may need to activate some features on your ownCloud or nextCloud to allow this integration. For example, some nextCloud servers require the user to use “App Passwords”. This can be done using the Settings > Security > Devices & sessions > Create new app password.

    Zenodo is an open-access repository for research data, software, publications, and other digital artifacts. It is developed and maintained by CERN and funded by the European Commission as part of the OpenAIRE project. Zenodo provides a free platform for researchers to share and preserve their work, ensuring long-term access and reproducibility. Zenodo is widely used by researchers, institutions, and organizations to share scientific knowledge and comply with open-access mandates from funding agencies.

    • Using the Allow Galaxy to export data to Zenodo?, you can decide whether you like to give write access to Galaxy or not. Set it to “Yes” if you want to export data from Galaxy to Zenodo, set it to “No” if you only need to import data from Zenodo to Galaxy.
    • Provide a name for the “creator” metadata of your records on Zenodo using the Publication Name field. You can always change this value later by editing the records in Zenodo. If left blank, an anonymous user will be used as the creator.
    • You have to provide a Personal Access Token from your Zenodo account to Galaxy. To do so, you need to log into your account. Then, visit this site: https://zenodo.org/account/settings/applications/. Alternatively, you can click on your username on top right and then click on “Applications”. Here, you need to create a “Personal Access Token”. This will allow Galaxy to display your draft records and upload files to them. If you enabled the option to export data from Galaxy to Zenodo, make sure to enable the deposit:write scope when creating the token.
    • Click on Create.

    Importing data to your Galaxy account

    When you connect a repository to your Galaxy account, you can use it to import data to Galaxy. To do so, you can click on the Upload Icon on the left panel. In the poped up window, you can click on Choose from repository to select a repository that you have added to your account. Navigate to a file that you want to upload to your Galaxy account, check the box of the file, and click on Select. You can determine the format of the file, give it a name, and then click on Start to upload the file to your Galaxy account.

    Exporting histories, datasets, and results to connected repositories

    If you have given Galaxy the permission to write to your repository, you can export your histories, datasets and reulsts in the history to that repository.

    Histories

    If you want to export a history, you should click on the History Options icon (galaxy-history-options) on the right panel. Then, you can click on Export History to File. Next, you can click on to repository on the middle panel. If you click on the Click to select directory, there will be a pop up window. Here, you can pick a repository that you have added to your account and when you are in that repository, click on Select. You can give a Name to your exported history, so you can find it easier in your connected repository. Finally, click on Export to write the history to your repository. Similarly, you can use to RDM repository or to Zenodo instead of the to repository option in the middle panel to export your history to connected RDM repositories or Zenodo.

    To have more options on exporting your history, you can click on Show advanced export options on top of the middle panel. This provides further control over the format and datasets that will be included in your exported history.

    Datasets

    If you are interested to export a single dataset or results to a connected repository, you can use a tool called Export datasets.

    1. Select the desired option from What would you like to export?.
    2. Using the Directory URI option, you can Select a connected repository. You can also give it a directory name here.
    3. We recommend to export the metadata with your datasets and results using the Include metadata files in export?.

  2. Export datasets - to repositories with the following parameters
    • “Choose your dataset”: the montage output from Galaxy
    • “Directory URI”: the repository you configured in the previous step
  3. Go to your repository and view the file there.

Below is a video showing this feature in action:

Video: Example of importing and exporting datasets between Galaxy and Zenodo

Now that you have configured a data repository in your Galaxy account, you can also use it to import data from repositories into Galaxy for analysis.

As an alternative to uploading the data from a URL or your computer, the files may also have been made available from a Choose from repositories:

  1. Click on Upload Data on the top of the left panel
  2. Click on Choose from repository and scroll down to find your repository or type the repository name in the search box on the top.

  3. Select the datasets you want to import
  4. click on OK
  5. Click on Start
  6. Click on Close
  7. You can find the dataset has begun loading in you history.

Exporting tool citations

When you publish your analysis, you will have to cite the tools you used. Galaxy makes this easy for you:

Hands On: Export Tool Citations
  1. Click on History options galaxy-history-options
  2. Select Export Tool References
  3. Here you will find all known citations for the tools used in your current history
    • They are provided in 2 formats, References (APA) and Bibtex

    history tool citations in APA format.

Exporting your workflows

Any workflows you have created can also be exported. For example, to share them when you publish your analysis.

  1. Click on Workflows galaxy-workflows-activity in the Galaxy Activity Bar.
  2. You will see a list of all your workflows
  3. Click on the Download download button of the workflow you would like to download

Your exported Galaxy workflow will be a file with a .ga extension. This file can be imported into Galaxy easily by others.

  1. Click on galaxy-workflows-activity Workflows in the Galaxy activity bar (on the left side of the screen, or in the top menu bar of older Galaxy instances). You will see a list of all your workflows
  2. Click on galaxy-upload Import at the top-right of the screen
  3. Provide your workflow
    • Option 1: Paste the URL of the workflow into the box labelled “Archived Workflow URL”
    • Option 2: Upload the workflow file in the box labelled “Archived Workflow File”
  4. Click the Import workflow button

Below is a short video demonstrating how to import a workflow from GitHub using this procedure:

Video: Importing a workflow from URL

Share: Share or publish data and workflow

The RDM lifecycle with the share stage highlighted.

The share phase of the research life cycle consist of making the data and metadata you preserved FAIR (findable, accessible, interoperable, reusable) so that others may benefit from your work (and cite you!).

As a first sharing step, you can share your work without exporting it out of Galaxy, by providing others access to your Galaxy datasets, histories, workflows, and visualisations. This makes it very easy to collaborate with others on Galaxy.

Galaxy objects can be shared in different ways:

  1. With specific users on the same Galaxy
  2. With Galaxy users via a share link (anybody with the link can access)
  3. Publicly visible to everybody (published on Galaxy)

Let’s see how we can share our work in Galaxy.

Hands On: Share your work
  1. Share your history with another Galaxy user.
    • If you do not know other Galaxy users, publish it for everybody to see.

    Sharing your history allows others to import and access the datasets, parameters, and steps of your history.

    Access the history sharing menu via the History Options dropdown (galaxy-history-options), and clicking “history-share Share or Publish”

    1. Share via link
      • Open the History Options galaxy-history-options menu at the top of your history panel and select “history-share Share or Publish”
        • galaxy-toggle Make History accessible
        • A Share Link will appear that you give to others
      • Anybody who has this link can view and copy your history
    2. Publish your history
      • galaxy-toggle Make History publicly available in Published Histories
      • Anybody on this Galaxy server will see your history listed under the Published Histories tab opened via the galaxy-histories-activity Histories activity
    3. Share only with another user.
      • Enter an email address for the user you want to share with in the Please specify user email input below Share History with Individual Users
      • Your history will be shared only with this user.
    4. Finding histories others have shared with me
      • Click on the galaxy-histories-activity Histories activity in the activity bar on the left
      • Click the Shared with me tab
      • Here you will see all the histories others have shared with you directly

    Note: If you want to make changes to your history without affecting the shared version, make a copy by going to History Options galaxy-history-options icon in your history and clicking Copy this History

  2. [If possible] Have somebody else share a history with you.

  3. Find histories shared with you
    • If nobody shared a history with you, choose a public history

    To find shared histories

    1. Select Histories from the Activity Bar
    2. Click on the Histories shared with me tab
      • Published workflows are available from the Public Histories tab

    To work with shared histories:

    1. Click on view galaxy-eye on the history item.
    2. Preview the history to make sure it is the one you want
    3. Click on Import this history button at the top to make a copy under your own account

    Note: Shared Histories (when copied into your account or not) do count in portion toward your total account data quota usage. More details on histories shared concerning account quota usage can be found in this link.

  4. Import this history into your own account to start working with it.

You now have your own copy of this history in your account. Any changes you make will not affect the original history. This is a quick and easy way to collaborate with your colleagues on Galaxy.

Workflows can be shared or published in a similar way.

  1. Click on galaxy-workflows-activity Workflows in the Galaxy activity bar (on the left side of the screen, or in the top menu bar of older Galaxy instances). You will see a list of all your workflows
  2. Click on the history-share Share button of the workflow you would like to publish

  3. Go to Share Workflow with Individual Users

  4. Enter the email address of the user you want to share the workflow with.

  5. They will be able to find and import your workflow under Workflows shared with me tab in the Workflows section of the Activity bar

  1. Click on galaxy-workflows-activity Workflows in the Galaxy activity bar (on the left side of the screen, or in the top menu bar of older Galaxy instances). You will see a list of all your workflows
  2. Click on the history-share Share button of the workflow you would like to publish
  3. Click on Make Workflow accessible. This makes the workflow publicly accessible but unlisted.
  4. To also list the workflow for all users on the Public workflows tab of the galaxy-workflows-activity Workflows page, click Make Workflow publicly available in Published Workflows

When you publish a paper about your research, we recommend always publishing your workflow and history (e.g. as an RO-crate) with your journal article for optimal FAIRness.

Workflows can be shared in dedicated workflow repositories such as WorkflowHub for increased visibility. Similarly, you can publish your exported history to repositories such as Zenodo or other data repositories.

Sharing every aspect of your research, from data to metadata and workflows, enables other researchers to reuse your work (and cite you!) and build on top of it. Teamwork makes the (science) dream work!

Reuse: Find and run workflows shared by others

The RDM lifecycle with the reuse stage highlighted.

The ultimate goal of preserving and sharing your research data and analyses is to enable others to repeat your analysis and reuse your work. To illustrate this, we will now show you how you can find and reuse shared Galaxy workflows.

Where to find Galaxy Workflows

There are various places where you can find Galaxy workflows to reuse:

  1. IWC (Intergalactic Workflows Commission). High-quality workflows curated by Galaxy community experts.
  2. WorkflowHub. A registry for describing, sharing and publishing scientific computational workflows. Not limited to Galaxy workflows.
  3. Dockstore. A free and open source platform for sharing reusable and scalable analytical tools and workflows.
  4. The “Published Workflows” section in Galaxy. All the workflows published by others on your Galaxy.
  5. Workflow definition files (ending in .ga) shared with you by others, e.g. in a publication.

In the following sections, we will showcase some of these workflow repositories.

Showcase 1: WorkflowHub

About the workflow

We will now walk you through reusing the Voronoi segmentation workflow you may recognize from the video at the start of this tutorial

This workflow has been made available via WorkflowHub. We will import this workflow into Galaxy, upload a dataset, and run the workflow in Galaxy.

From Wikipedia: In mathematics, a Voronoi diagram is a partition of a plane into regions close to each of a given set of objects. It can also be classified as a tessellation. In the simplest case, these objects are just finitely many points in the plane (called seeds, sites, or generators). For each seed, there is a corresponding region, called a Voronoi cell, consisting of all points of the plane closer to that seed than to any other.

example of voronoi diagram.

screenshot of the voronoi workflow from workflowhub.

For more information about this workflow and a full walkthrough of all its steps, see also the full GTN tutorial

Import the Workflow

We start by importing this workflow into Galaxy.

Hands-on: Choose Your Own Tutorial

This is a 'Choose Your Own Tutorial' (CYOT) section (also known as 'Choose Your Own Analysis' (CYOA)), where you can select between multiple paths. Click one of the buttons below to select how you want to follow the tutorial

If you are working on GalaxyEU (usegalaxy.eu), the next step can be made a bit quicker. If that is the case, choose the corresponding button below

Hands On: Obtain workflow from WorkflowHub
  1. Open WorkflowHub
    • Here you can browse for workflows
    • On the left panel, you can filter workflows by type (Galaxy, Nextflow, CWL, etc.)
    • Search for “Voronoi” via the search bar at the top
  2. Click on the first result, named “Voronoi segmentation”
  3. This will lead to the Workflow page

    screenshot of workflowhub.

  4. Click on the “Run on Galaxy” button in the top-right instead!

    screenshot of workflowhub.

  5. This will automatically import the workflow to Galaxy EU and display the workflow run window.

    screenshot of workflowhub.

  6. In the Activity Bar, click on Workflows
    • you will see the workflow listed under My Workflows

    screenshot of workflowhub.

Hands On: Obtain workflow from WorkflowHub
  1. Open WorkflowHub
    • Here you can browse for workflows
    • On the left panel, you can filter workflows by type (Galaxy, Nextflow, CWL, etc.)
    • Search for “Voronoi” via the search bar at the top
  2. Click on the first result, named “Voronoi segmentation”
  3. This will lead to the Workflow page

    screenshot of workflowhub.

  4. Click on the Files tab

    screenshot of workflowhub.

  5. Click on voronoi-segmentation.ga in the file list

  6. Download the .ga file OR copy the URL to it (via the “Raw” button)

  7. In Galaxy, click on Workflows in the Activity Bar

  8. Import the Voronoi workflow to Galaxy via URL or file upload

    1. Click on galaxy-workflows-activity Workflows in the Galaxy activity bar (on the left side of the screen, or in the top menu bar of older Galaxy instances). You will see a list of all your workflows
    2. Click on galaxy-upload Import at the top-right of the screen
    3. Provide your workflow
      • Option 1: Paste the URL of the workflow into the box labelled “Archived Workflow URL”
      • Option 2: Upload the workflow file in the box labelled “Archived Workflow File”
    4. Click the Import workflow button

    Below is a short video demonstrating how to import a workflow from GitHub using this procedure:

    Video: Importing a workflow from URL

Run the workflow

Hands On
  1. Create a new history, and give it a good name

    To create a new history simply click the new-history icon at the top of the history panel:

    UI for creating new history

  2. Upload the two input images by URL

    https://zenodo.org/records/18803585/files/tree-image.tiff
    https://zenodo.org/records/18803585/files/tree-seeds.tiff
    
    • Copy the link location
    • Click galaxy-upload Upload at the top of the activity panel

    • Select galaxy-wf-edit Paste/Fetch Data
    • Paste the link(s) into the text field

    • Press Start

    • Close the window

  3. Run the Voronoi workflow with the following inputs:
    • Image: tree-image.tiff
    • Seeds: tree-seeds.tiff
    1. Click on Workflows on the Activity Bar on the left.
    2. At the top of the resulting page you will have the option to switch between the My workflows, Workflows shared with me and Public workflows tabs.
    3. Select the tab you want to see all workflows in that category
    4. Search for your desired workflow.

      Select workflow

    5. Click on the workflow name: a pop-up window opens with a preview of the workflow.
    6. To run it directly: click Run (top-right).
    7. Recommended: click Import (left of Run) to make your own local copy under Workflows / My Workflows.

  4. After the workflows are completed (~5-10 minutes), you can explore the outputs

example output from voronoi workflow.

The details of this workflow are out of scope for this tutorial; the important thing is that you have seen how to find and import workflows shared by others.

Showcase 2: IWC

All the workflows from IWC are reviewed and maintained by a group of Galaxy experts.

All IWC workflows are available from the [IWC Workflow Library]((https://iwc.galaxyproject.org/). The IWC workflow library makes it even easier to try out workflows by providing example data preconfigured with the workflow.

Hands On: Try an IWC workflow with example data
  1. Open the IWC Workflow Library

    IWC workflow library home page.

  2. Browse for a workflow that interests you.
  3. On the workflow page, look at the options at the bottom right
    • Select your Galaxy from the dropdown
    • Click Try with Example Data

    IWC workflow library home page.

  4. Your workflow will now be imported to your account, and the workflow run menu will be opened, preconfigured with example inputs
    • Simply click Run Workflow to start it (top-right)

    IWC workflow library home page.

  5. Once the workflow is completed, you can explore the outputs
    • If you picked the Segmentation workflow, one of the outputs is this image, a microscope image analysed to detect objects (cells in this case), count them and label them:

      output image from segmentation workflow.

This is a great way to evaluate a workflow without requiring the effort of finding good example datasets.

Conclusion

Congratulations! You have now completed this introduction to Galaxy and seen how Galaxy can support you in every phase of the research data life cycle.

The RDM lifecycle with Galaxy features listed for each stage.