Galaxy Interactive Tools

Overview
Questions:
  • What is an Interactive Tool on Galaxy (GxIT)?

  • How to set up a GxIT?

Objectives:
  • Discover what Galaxy Interactive Tools (GxIT) are

  • Understand how GxITs are structured

  • Understand how GxITs work

  • Be able to dockerise a basic web application

  • Be able to wrap a dockerised application as a GxIT

  • Be able to test and debug a new GxIT locally and on a Galaxy server

  • Be able to distribute a new GxIT for others to use

Requirements:
Time estimation: 3 hours
Supporting Materials:
Published: Mar 2, 2022
Last modification: Dec 3, 2024
License: Tutorial Content is licensed under Creative Commons Attribution 4.0 International License. The GTN Framework is licensed under MIT
PURL: https://gxy.io/GTN:T00116
Revision: 9

This tutorial demonstrates how to build and deploy a Galaxy Interactive Tool (GxIT). GxITs are accessible through the Galaxy tool panel, like any installed Galaxy tool. Our example application is a simple R Shiny app that we call Tabulator.

There are three elements to a GxIT - an application script, a Docker container and a Galaxy tool XML file. This tutorial will take you through creating those components, and installing them as a new Interactive Tool into a local Galaxy instance and an existing Galaxy instance.

Comment: If you plan to use an existing Galaxy instance

The Galaxy server requires specific configuration in order to run Interactive Tools! Please refer to this admin tutorial for setting up a compatible Galaxy instance for development and testing of your GxIT. As well as updating the Galaxy server configuration, you will also have to configure the server’s DNS provider to allow wildcard DNS records. This allows Galaxy to create unique host names (subdomains) for GxITs to be served over, separating them from the main Galaxy application.

Agenda

In this tutorial, we will cover:

  1. How do Interactive Tools work?
  2. When is an Interactive Tool appropriate?
  3. The development process
  4. The application
    1. The R scripts
    2. The Dockerfile
    3. Test the image
    4. Push the image
    5. The tool XML
  5. Additional components
    1. Run script
    2. Templated config files
    3. Reserved environment variables
    4. Galaxy history interaction
  6. The JupyterLab
    1. The Dockerfile
    2. Test the image
    3. Push the image
    4. The tool XML
  7. Testing locally
    1. Docker installation
    2. Galaxy installation
    3. Galaxy configuration
    4. Run Galaxy
  8. Deployment in a running Galaxy instance
  9. Debugging
    1. Self-destruct script
  10. Troubleshooting

How do Interactive Tools work?

Interactive Tools are a special breed of Galaxy tool, relatively new to the Galaxy ecosystem - they are a work in progress! GxITs enable the user to run an entire web application through Galaxy, which opens as a new tab in the browser. This lets users explore and manipulate data in a rich interface, such as Jupyter notebooks or RStudio. To see some examples of GxITs in action, take a look at Galaxy EU “Live”.

Interactive tool development builds on the canonical tool-wrapping process. Instead of running a command, the tool feeds user input to a Docker container running the application. Once it’s up and running, the GxIT application can then be accessed through a unique URL generated by the Galaxy server. The user can then open the application, interact with their Galaxy data and then terminate the tool. On termination, the Docker container is stopped and removed, and the job is considered “complete”.

When is an Interactive Tool appropriate?

In a regular Galaxy tool the user passes data to the tool and waits for it to run. They then get some output file(s) when the tool run is complete. In an Interactive Tool, however, the users are provided with a graphical web interface allowing them to interact with their data in real time. This is great for visualising data, but if it is possible to provide the same functionality with a regular tool (e.g. by rendering an HTML file as an output), then an Interactive Tool might not be necessary.

If you are sure that a static output is not sufficient, then it’s time to start building your first Interactive Tool!

Comment: Interactive tool infrastructure

Interactive tools require some rather complex infrastructure in order to work! However, most of the infrastructure requirements are taken care of by Galaxy core. As such, wrapping a new GxIT requires only three components:

  • Application script(s)
  • Docker container image
  • Galaxy tool XML

However, as we will see in the next section, testing and deploying a GxIT is not so simple.

The development process

Since the infrastructure for building GxITs is not as well developed as regular tool wrapping, the development process is unfortunately less streamlined. Where Planemo is typically used for tool linting and testing, the complex architecture of GxITs requires a local instance or a development server to manually test and run the tool. In addition, GxITs are currently not supported by the Galaxy ToolShed and have to be installed manually. As a result, distributed GxITs can be found in the Galaxy core codebase, and they can be manually enabled by the Galaxy server administrator.

However, the build process itself is not too complex! We can break it down into just a few steps:

  1. Find or create the application you wish to install on Galaxy
  2. Find or create a Docker image containing this application
  3. Write a Galaxy tool XML to pass IT details to Galaxy and pass user input to the Docker container
  4. Add the tool XML to your Galaxy server (local or distant)
  5. Try out the tool in the Galaxy interface. Error messages might appear in the Galaxy history.
  6. If errors occur, revise the container or tool XML, and try again until the application is working.

The last step is likely where the most time is spent - the process requires iterative development of the Docker image and tool XML until they work together. As such, reducing the iteration time is the key to quick development! Throughout the tutorial, we’ll sprinkle in some tips on how to speed up the development cycle.

Comment: A note on architecture

When building a GxIT, it is best to keep as much logic as possible in the tool XML, while keeping the Docker image as generic as possible. Why? Updating the tool XML is simple for Galaxy admins and developers in the future. They can view the tool XML directly on the server and understand how the tool works. The Docker image, meanwhile, is relatively opaque to other developers and administrators. To understand the container they must locate the original Dockerfile, which is not always available. Updating the container is more complex, as we will see later. Additionally, keeping the Docker container generic makes it testable outside of Galaxy.

In short: updating the Docker container is hard, but updating the tool XML is easy!

Hands-on: Choose Your Own Tutorial

This is a "Choose Your Own Tutorial" section, where you can select between multiple paths. Click one of the buttons below to select how you want to follow the tutorial

Do you want to build a desktop application or a JupyterLab tool ?

The application

The application that we will wrap in this tutorial is a simple web tool which allows the user to upload CSV and TSV files, manipulate them, and download them. Our application is based on an R Shiny app hosted with a Shiny server.

Note that there is no link between this Interactive Tool and the Galaxy history. More complex applications might be able to read and write outputs to the user’s history to create a more integrated experience - see the Additional components section for an example of how this can be done.

Our example application can already be found online. In the following sections, we will study how it can be built into a GxIT.

Hands-on

First, let’s clone the repository to take a quick look at it.

$ git clone https://github.com/Lain-inrae/geoc-gxit
$ cd geoc-gxit

$ tree .
├── Dockerfile
├── interactivetool_tabulator.xml
├── gxit
│   ├── app.R
│   └── install.R
├── Makefile
└── README.md

You’ll find a Galaxy tool XML, a Dockerfile and two R scripts that will be injected into the container image.

The R scripts

  • app.R defines the R Shiny application.
  • install.R will be used by the docker container to install the R packages needed to run app.R.

These files are specific to your application: they are required for an R Shiny container, but won’t be needed for other containers, such as a Jupyter notebook container.

The Dockerfile

If you need some help to start your Dockerfile you can always get some inspiration from the previous interactive tools built in Galaxy. Go check some of the Dockerfiles, for instance the one for the QGIS application or for ODV.

Docker allows an entire application context to be containerized. A typical web application consists of an operating system, installed dependencies, web server configuration, database configuration and, of course, the codebase of the software itself. A Docker container can encapsulate all of these components in a single “image”, which can be run on any machine with Docker installed.

Essentials of Docker:

  1. Write an image recipe as a Dockerfile. This single file selects an OS, installs software, pulls code repositories and copies files from the host machine (your computer).
  2. Build the image from your recipe:

    docker build -t <image_name> .

  3. View existing images with

    docker image list

  4. Run a container with a specified command:

    docker run <image_name> <command>

  5. View running containers:

    docker ps

  6. Stop a running container:

    docker stop <container_name>

  7. Remove a stopped container:

    docker container rm <container_name>

  8. Remove an image:

    docker image rm <container_name>

Let’s check out the Dockerfile that we’ll use to containerize our application.

This container recipe can be used to build a Docker image which can be pushed to a container registry in the cloud, ready for consumption by our Galaxy instance:

# Set image to build upon
FROM rocker/shiny

# set author
MAINTAINER Lain Pavot <lain.pavot@inra.fr>

## we copy the installer and run it before copying the entire project to prevent
## reinstalling everything each time the project has changed

COPY ./gxit/install.R /tmp/

RUN \
        apt-get update                                \
    &&  apt-get install -y --no-install-recommends    \
        fonts-texgyre                                 \
    &&  Rscript /tmp/install.R                        \
    &&  apt-get clean autoclean                       \
    &&  apt-get autoremove --yes                      \
    &&  rm -rf /var/lib/{apt,dpkg,cache,log}/         \
    &&  rm -rf /tmp/*                                 ;


# ------------------------------------------------------------------------------

# These default values can be overridden when we run the container:
#     docker run -p 8080:8080 -e PORT=8080 -e LOG_PATH=/tmp/shiny/gxit.log <container_name>

# We can also bind the container $LOG_PATH to a local directory in order to
# follow the log file from the host machine as the container runs. This command
# will create the log/ directory in our current working directory at runtime -
# inside we will find our Shiny app log file:
#     docker run -p 8888:8888 -e LOG_PATH=/tmp/shiny/gxit.log -v $PWD/log:/tmp/shiny <container_name>

ARG PORT=8765
ARG LOG_PATH=/tmp/gxit.log

ENV LOG_PATH=$LOG_PATH
ENV PORT=$PORT

# ------------------------------------------------------------------------------

# Edit shiny-server config to use our port
RUN cat /etc/shiny-server/shiny-server.conf \
    | sed "s/3838/${PORT}/" > /etc/shiny-server/shiny-server.conf.1
RUN mv /etc/shiny-server/shiny-server.conf.1 /etc/shiny-server/shiny-server.conf

# ------------------------------------------------------------------------------

RUN mkdir -p $(dirname "${LOG_PATH}")
EXPOSE $PORT
COPY ./gxit/app.R /srv/shiny-server/

CMD ["/bin/sh", "-c", "shiny-server > ${LOG_PATH} 2>&1"]

In a previous version of this tutorial, we ran the Shiny App with R -e "shiny:runApp()" rather than using shiny-server. The latter is better practice, because it ensures that ports are mapped correctly for websocket functionality. With shiny::runApp() you will probably notice a websocket timeout in the app when run as a GxIT - the UI often greys out and becomes unresponsive after 20-30 seconds.
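The redirection order in that CMD matters too: `shiny-server > ${LOG_PATH} 2>&1` captures both streams in the log file, while the reversed order would leave stderr on the terminal. A quick demonstration (the log paths here are arbitrary):

```shell
# A toy command writing to both streams
demo() { echo "out"; echo "err" >&2; }

# As in the CMD: stdout goes to the file first, then stderr is
# duplicated onto the already-redirected stdout - both land in the log
demo > /tmp/gxit-both.log 2>&1

# Reversed: stderr is duplicated onto the terminal's stdout BEFORE the
# redirection happens, so only stdout reaches the file
demo 2>&1 > /tmp/gxit-out-only.log
```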

This image is already hosted on Docker Hub, but anyone can use this Dockerfile to rebuild the image if necessary. If so, don’t forget to create a gxit folder containing app.R and install.R next to your Dockerfile.

Hands-on

Let’s start working on this Docker container.

  1. Install Docker as described on the docker website. Click on your distribution name to get specific information.

  2. Now let’s use the recipe to build our Docker image.

    # Build a container image from our Dockerfile
    IMAGE_TAG="myimage"
    LOG_PATH=`pwd`  # Create log output in current directory
    PORT=8765
    docker build -t $IMAGE_TAG --build-arg LOG_PATH=$LOG_PATH --build-arg PORT=$PORT .
    

    While developing the Docker container you may find yourself tweaking and rebuilding the container image many times. In the GitHub repository linked above, you’ll notice that the author has used a Makefile to accelerate the build and deploy process. This allows the developer to simply run make docker and make push_hub to build and push the container, or make to rebuild the container after making changes during development. Check out the Makefile to see what commands can be run using make in this repository.

If you are lucky, you might find an available Docker image for the application you are trying to wrap. Some configuration changes can be needed such as:

  1. Expose the correct port. The application, Docker and tool XML ports must be aligned!
  2. Log output to an external file - useful for debugging.
  3. Make the application callable from tool <command> - this sometimes requires a wrapper script to interface the application inside the container (we’ll take a look at this later).
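Point 1 is worth automating. Here is a hypothetical sanity check, using mock stand-ins for the real Dockerfile and tool XML, that the exposed port and the XML <port> agree:

```shell
# Mock stand-ins for the two real files (contents are illustrative)
printf 'FROM rocker/shiny\nEXPOSE 8765\n' > /tmp/gxit-Dockerfile
printf '<entry_point name="first gxit"><port>8765</port></entry_point>\n' > /tmp/gxit-tool.xml

# Extract the port declared in each file
docker_port=$(sed -n 's/^EXPOSE \([0-9][0-9]*\).*/\1/p' /tmp/gxit-Dockerfile)
xml_port=$(sed -n 's/.*<port>\([0-9][0-9]*\)<\/port>.*/\1/p' /tmp/gxit-tool.xml)

if [ "$docker_port" = "$xml_port" ]; then
    echo "ports aligned on $docker_port"
else
    echo "MISMATCH: Dockerfile exposes $docker_port, XML declares $xml_port" >&2
fi
```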

Test the image

Before we push our container to the cloud, we should give it a local test run to ensure that it’s working correctly on our development machine. Have a play and see how our little web app works!

Hands-on
# Run our application in the container
docker run -it -p 127.0.0.1:8765:$PORT $IMAGE_TAG

# Or to save time, take advantage of the Makefile
make it

# Give it a few moments to start up, and the application should be available
# in your browser at http://127.0.0.1:8765

Push the image

If you are happy with the image, we are ready to push it to a container registry to make it accessible to our Galaxy server.

During development, we suggest making an account on Docker Hub if you don’t have one already. This can be used for hosting container images during development. Docker Hub has great documentation on creating repositories, authenticating with tokens and pushing images.

Hands-on
# Set remote tag for your container. This should include your username and
# repository name for Docker Hub.
REMOTE=<DOCKERHUB_USERNAME>/my-first-gxit

# Tag your image
docker tag $IMAGE_TAG:latest $REMOTE:latest

# Authenticate your DockerHub account
docker login  # >>> Enter username and token for your account

# Push the image
docker push $REMOTE:latest

For production deployment, the Galaxy standard for container image hosting is Biocontainers. This requires you to make a pull request against the Biocontainers GitHub repository, so this should only be done when an image is considered production-ready. You can also push your image to a repository on hub.docker.com or quay.io but please ensure that it links to a public code repository (e.g. GitHub) to enable maintenance of the image by the Galaxy community!

You should now have a container in the cloud, ready for action. Check out your repo on Docker Hub and you should find the container image there. Awesome!

Now we just need to write a tool XML that will enable Galaxy to pull and run our new Docker container as a Galaxy tool.

The tool XML

Hands-on

Create a Galaxy tool XML file named interactivetool_tabulator.xml. The file is similar to a regular tool XML, but calls on our remote Docker image as a dependency. The tags that we are most concerned with are:

  • A <container> (under the <requirements> tag)
  • A <port> which matches our container
  • An <input> file
  • The <command> section
Comment: Writing the tool command

This step can cause a lot of confusion. Here are a few pointers that you will find critical to understanding the process:

  • The <command> will be templated by Galaxy
  • The templated command will run inside the Docker container
<tool id="interactive_tool_tabulator" tool_type="interactive" name="Tabulator" version="0.1">
    <description>Tuto tool for Gxit</description>

    <requirements>
        <container type="docker">ancelete/geoc-gxit:latest</container>
    </requirements>

    <entry_points>
        <entry_point name="first gxit" requires_domain="True">
            <port>8765</port>

            <!--
                 Some apps have a non-root entrypoint.
                 We can provide the URL with a <url> tag like this:
                 <url>/my/entrypoint</url>
             -->
            <url>/</url>

        </entry_point>
    </entry_points>

    <environment_variables>
        <!-- These will be accessible as environment variables inside the Docker container -->
    </environment_variables>

    <command><![CDATA[

        ## The command will be templated by Cheetah within Galaxy, and
        ## then run inside the Docker container!

        ## This only works because Galaxy's user data directory is mapped
        ## onto the Docker container at runtime - enabling access to
        ## '$infile' and '$outfile' from inside the container.

        shiny-server > /var/log/tuto-gxit-01.log 2>&1
        ## The log file can be found inside the container, for debugging purposes

    ]]>
    </command>

    <inputs>
    </inputs>

    <outputs>
        <!--
            Even if our IT doesn't export to Galaxy history,
            adding an output ensures to keep track of the IT
            execution in the history
        -->

        <data name="file_output" format="txt"/>
    </outputs>

    <tests>
        <!-- Tests are difficult with GxITs! -->
    </tests>

    <help> <![CDATA[

        Some help is always of interest ;)

    ]]></help>
    <citations>
       <citation type="bibtex">
       @misc{
            author       = {Lain Pavot - lain.pavot@inrae.fr},
            title        = {first-gxit -  A tool to visualise tsv/csv files},
            publisher    = {INRAE},
            url          = {}
        }
        </citation>
    </citations>
</tool>

Don’t forget to change the image path (see the $REMOTE variable above) and the citation to fit your project settings.

Additional components

The GxIT that we wrapped in this tutorial was a simple example, and you should now understand what is required to create an Interactive Tool for Galaxy. However, there are a few additional components that can enhance the reliability and user experience of the tool, and more complex applications may require extra components or workarounds to create the desired experience for the user.

Run script

In the case of our Tabulator application, the run script is simply the R script that renders our Shiny App. It is quite straightforward to call this from our Galaxy tool XML. However, some web apps might require more elaborate commands to be run. In this situation, there are several solutions demonstrated in the <command> section of existing GxITs:

  • Guacamole Desktop: application startup with startup.sh
  • HiCBrowser: application startup with supervisord
  • AskOmics: configuration with Python and Bash scripts, followed by start_all.sh to run the application.

Templated config files

Using the <configfiles> section in the tool XML, we can enable complex user configuration for the application by templating a run script or configuration file to be read by the application. In this application, for example, we could use a <configfiles> section to template user input into the app.R script that runs the application within the Docker container. This could enable the user to customize the layout of the app before launch.
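As a sketch of what that could look like (the title parameter and the app_script name below are hypothetical, not part of the actual Tabulator tool):

```xml
<configfiles>
    <!-- Cheetah fills this template before the job runs; the rendered
         file is then available to the command as $app_script -->
    <configfile name="app_script"><![CDATA[
library(shiny)
## '$title' would be a <param name="title"/> defined under <inputs>
app_title <- "$title"
    ]]></configfile>
</configfiles>
```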

Reserved environment variables

There are a few environment variables that are accessible in the command section of the tool XML - these can be handy when writing your tool script. Check the docs for a full reference on the tool XML.

$__tool_directory__
$__root_dir__
$__user_id__
$__user_email__

It can also be useful to create and inject environment variables into the tool context. This can be achieved using the <environment_variables> tag in the tool XML. The Jupyter Notebook GxIT provides an example of this:

<environment_variables>
    <environment_variable name="HISTORY_ID" strip="True">${__app__.security.encode_id($jupyter_notebook.history_id)}</environment_variable>
    <environment_variable name="REMOTE_HOST">${__app__.config.galaxy_infrastructure_url}</environment_variable>
    <environment_variable name="GALAXY_WEB_PORT">8080</environment_variable>
    <environment_variable name="GALAXY_URL">$__galaxy_url__</environment_variable>
    <environment_variable name="DEBUG">true</environment_variable>
    <environment_variable name="DISABLE_AUTH">true</environment_variable>
    <environment_variable name="API_KEY" inject="api_key" />
</environment_variables>
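Inside the container, these values surface as ordinary environment variables. Here is a hedged sketch of how an entrypoint script might consume them (the values are simulated, since nothing injects them outside Galaxy):

```shell
# Simulate what Galaxy would inject (illustrative values only)
export GALAXY_URL="https://usegalaxy.example"
export HISTORY_ID="1cd8e2f6b131e891"

# An entrypoint can fail fast on required variables...
: "${GALAXY_URL:?should be injected via the tool XML}"
# ...and fall back to defaults for optional ones
echo "Galaxy at ${GALAXY_URL}, history ${HISTORY_ID}, web port ${GALAXY_WEB_PORT:-8080}"
```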

Galaxy history interaction

We have demonstrated how to pass an input file to the Docker container. But what if the application needs to interact with the user’s Galaxy history? For example, if the user creates a file within the application. That’s where the environment variables created in the tool XML become useful.

From the R-Studio GxIT we can see that there is an R library that allows us to interact with Galaxy histories.

“The convenience functions gx_put() and gx_get() are available to you to interact with your current Galaxy history. You can save your workspace with gx_save().”

Under the hood, this library uses galaxy_ie_helpers - a Python interface to Galaxy histories written with BioBlend. You could also use BioBlend directly (or even the Galaxy REST API) if your GxIT requires a more flexible interface than these wrappers provide.

The JupyterLab

The JupyterLab environment that we will wrap in this tutorial is a simple JupyterLab tool which allows the user to upload data in a Jupyter environment, manipulate notebooks, and download outputs.

Our example JupyterLab can already be found online. But you can also check another implementation of JupyterLab. In the following sections, we will study how it can be built into a GxIT.

Hands-on

First, let’s clone the repository to take a quick look at a basic implementation of a JupyterLab.

$ git clone https://github.com/bgruening/docker-jupyter-notebook.git
$ cd docker-jupyter-notebook

$ tree .
├── Dockerfile
├── LICENSE
├── README.md
├── .
├── .
├── .
└── startup.sh

You’ll find a Dockerfile, startup.sh, and a README that describes how this GxIT Dockerfile works. We encourage you to read it.

The Dockerfile

If you need some help to start your Dockerfile you can always get some inspiration from the previous interactive tools built in Galaxy. Go check some of the Dockerfiles, for instance, the one for the Copernicus data space ecosystem JupyterLab or for a basic JupyterLab here.

Docker allows an entire application context to be containerized. A typical web application consists of an operating system, installed dependencies, web server configuration, database configuration and, of course, the codebase of the software itself. A Docker container can encapsulate all of these components in a single “image”, which can be run on any machine with Docker installed.

Essentials of Docker:

  1. Write an image recipe as a Dockerfile. This single file selects an OS, installs software, pulls code repositories and copies files from the host machine (your computer).
  2. Build the image from your recipe:

    docker build -t <image_name> .

  3. View existing images with

    docker image list

  4. Run a container with a specified command:

    docker run <image_name> <command>

  5. View running containers:

    docker ps

  6. Stop a running container:

    docker stop <container_name>

  7. Remove a stopped container:

    docker container rm <container_name>

  8. Remove an image:

    docker image rm <container_name>

Let’s check out the Dockerfile that we’ll use to containerize our JupyterLab.

This container recipe can be used to build a Docker image which can be pushed to a container registry in the cloud, ready for consumption by our Galaxy instance:

# Jupyter container used for Galaxy copernicus notebooks (+other kernels) Integration

# from 5th March 2021
FROM jupyter/datascience-notebook:python-3.10

MAINTAINER Björn A. Grüning, bjoern.gruening@gmail.com

ENV DEBIAN_FRONTEND noninteractive
USER root

RUN apt-get -qq update && \
    apt-get install -y wget unzip net-tools procps && \
    apt-get autoremove -y && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

# Set channels to (defaults) > bioconda > conda-forge
RUN conda config --add channels conda-forge && \
    conda config --add channels bioconda
    #conda config --add channels defaults
RUN pip install --upgrade pip
RUN pip install --no-cache-dir bioblend galaxy-ie-helpers

ENV JUPYTER /opt/conda/bin/jupyter
ENV PYTHON /opt/conda/bin/python
ENV LD_LIBRARY_PATH /opt/conda/lib/

# Python packages
RUN conda config --add channels conda-forge && \
    conda config --add channels bioconda && \
    conda install --yes --quiet \
    bash_kernel \
    ansible-kernel \
    bioblend galaxy-ie-helpers \

With those packages installed, you have the basis for a functional environment in your JupyterLab AND the ability to link your JupyterLab to the Galaxy history via the bioblend and galaxy-ie-helpers packages.

Then you can add the packages specific to the environment you want in your JupyterLab, for instance:

    # specific sentinel, openeo packages
    sentinelhub \
    openeo \
    # other packages for notebooks
    geopandas \
    rasterio \
    ipyleaflet \
    netcdf4 \
    h5netcdf \
    # Jupyter widgets
    jupytext && \
    conda clean -yt && \
    pip install jupyterlab_hdf \
    fusets

Then you can add some configuration files and a default welcome notebook (this part is generic to all JupyterLab GxITs).

ADD ./startup.sh /startup.sh
ADD ./get_notebook.py /get_notebook.py

# We can get away with just creating this single file and Jupyter will create the rest of the
# profile for us.
RUN mkdir -p /home/$NB_USER/.ipython/profile_default/startup/ && \
    mkdir -p /home/$NB_USER/.jupyter/custom/

COPY ./ipython-profile.py /home/$NB_USER/.ipython/profile_default/startup/00-load.py
COPY jupyter_notebook_config.py /home/$NB_USER/.jupyter/
COPY jupyter_lab_config.py /home/$NB_USER/.jupyter/

ADD ./custom.js /home/$NB_USER/.jupyter/custom/custom.js
ADD ./custom.css /home/$NB_USER/.jupyter/custom/custom.css
ADD ./default_notebook.ipynb /home/$NB_USER/notebook.ipynb

You can also add your own set of notebooks to guide the user, like this:

# Download notebooks
RUN cd /home/$NB_USER/ &&  \
    wget -O notebook-samples.zip https://github.com/eu-cdse/notebook-samples/archive/refs/heads/main.zip && \
    unzip notebook-samples.zip && \
    rm /home/$NB_USER/notebook-samples.zip && \
    mv /home/$NB_USER/notebook-samples-main/geo /home/$NB_USER && \
    mv /home/$NB_USER/notebook-samples-main/sentinelhub /home/$NB_USER && \
    mv /home/$NB_USER/notebook-samples-main/openeo /home/$NB_USER && \
    rm -r /home/$NB_USER/notebook-samples-main

Finally, some general environment variables:

# ENV variables to replace conf file
ENV DEBUG=false \
    GALAXY_WEB_PORT=10000 \
    NOTEBOOK_PASSWORD=none \
    CORS_ORIGIN=none \
    DOCKER_PORT=none \
    API_KEY=none \
    HISTORY_ID=none \
    REMOTE_HOST=none \
    GALAXY_URL=none

# @jupyterlab/google-drive  not yet supported

USER root
WORKDIR /import

# Start Jupyter Notebook
CMD /startup.sh
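The "none" sentinels in the ENV block above let startup.sh detect which values Galaxy actually supplied at runtime. A sketch of that pattern (the real startup.sh logic may differ):

```shell
# Image-baked defaults, as in the ENV block
NOTEBOOK_PASSWORD=none
DEBUG=false

# Startup logic can branch on the sentinel value
if [ "$NOTEBOOK_PASSWORD" = "none" ]; then
    auth_mode="disabled"
else
    auth_mode="password"
fi
echo "auth: $auth_mode (debug=$DEBUG)"
```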

And with all this your Dockerfile is ready.

You now need to publish this image on a public repository. Anyone can use this Dockerfile to rebuild the image if necessary. In any case, don’t forget to keep all the files shown when we cloned this repository next to your Dockerfile.

Hands-on

Let’s start working on this Docker container.

  1. Install Docker as described on the docker website. Click on your distribution name to get specific information.

  2. Now let’s use the recipe to build our Docker image.

    # Build a container image from our Dockerfile
    IMAGE_TAG="myimage"
    LOG_PATH=`pwd`  # Create log output in current directory
    PORT=8765
    docker build -t $IMAGE_TAG --build-arg LOG_PATH=$LOG_PATH --build-arg PORT=$PORT .
    

    While developing the Docker container you may find yourself tweaking and rebuilding the container image many times. In the GitHub repository linked above, you’ll notice that the author has used a Makefile to accelerate the build and deploy process. This allows the developer to simply run make docker and make push_hub to build and push the container, or make to rebuild the container after making changes during development. Check out the Makefile to see what commands can be run using make in this repository.

If you are lucky, you might find an available Docker image for the application you are trying to wrap. Some configuration changes can be needed such as:

  1. Expose the correct port. The application, Docker and tool XML ports must be aligned!
  2. Log output to an external file - useful for debugging.
  3. Make the application callable from tool <command> - this sometimes requires a wrapper script to interface the application inside the container (we’ll take a look at this later).
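As in the desktop-application path, point 1 can be checked with a quick script. A hypothetical check using mock stand-ins for the JupyterLab Dockerfile and tool XML (the real Dockerfile above relies on the base image's port rather than an explicit EXPOSE):

```shell
# Mock stand-ins for the two real files (contents are illustrative)
printf 'FROM jupyter/datascience-notebook:python-3.10\nEXPOSE 8888\n' > /tmp/gxit-jl-Dockerfile
printf '<entry_point name="Jupyter Interactive Tool"><port>8888</port></entry_point>\n' > /tmp/gxit-jl-tool.xml

# Extract and compare the two ports
jl_docker_port=$(sed -n 's/^EXPOSE \([0-9][0-9]*\).*/\1/p' /tmp/gxit-jl-Dockerfile)
jl_xml_port=$(sed -n 's/.*<port>\([0-9][0-9]*\)<\/port>.*/\1/p' /tmp/gxit-jl-tool.xml)
[ "$jl_docker_port" = "$jl_xml_port" ] && echo "ports aligned on $jl_docker_port"
```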

Test the image

Before we push our container to the cloud, we should give it a local test run to ensure that it’s working correctly on our development machine. Have a play and see how our little web app works!

Hands-on
# Run our application in the container
docker run -it -p 127.0.0.1:8765:$PORT $IMAGE_TAG

# Or to save time, take advantage of the Makefile
make it

# Give it a few moments to start up, and the application should be available
# in your browser at http://127.0.0.1:8765

Push the image

If you are happy with the image, we are ready to push it to a container registry to make it accessible to our Galaxy server.

During development, we suggest making an account on Docker Hub if you don’t have one already. This can be used for hosting container images during development. Docker Hub has great documentation on creating repositories, authenticating with tokens and pushing images.

Hands-on
# Set remote tag for your container. This should include your username and
# repository name for Docker Hub.
REMOTE=<DOCKERHUB_USERNAME>/my-first-gxit

# Tag your image
docker tag $IMAGE_TAG:latest $REMOTE:latest

# Authenticate your DockerHub account
docker login  # >>> Enter username and token for your account

# Push the image
docker push $REMOTE:latest

For production deployment, the Galaxy standard for container image hosting is Biocontainers. This requires you to make a pull request against the Biocontainers GitHub repository, so this should only be done when an image is considered production-ready. You can also push your image to a repository on hub.docker.com or quay.io but please ensure that it links to a public code repository (e.g. GitHub) to enable maintenance of the image by the Galaxy community!

You should now have a container in the cloud, ready for action. Check out your repo on Docker Hub and you should find the container image there. Awesome!

Now we just need to write a tool XML that will enable Galaxy to pull and run our new Docker container as a Galaxy tool.

The tool XML

Hands-on

Create a Galaxy tool XML file named interactivetool_copernicus.xml. The file is similar to a regular tool XML, but calls on our remote Docker image as a dependency. The tags that we are most concerned with are:

  • A <container> (under the <requirements> tag)
  • A <port> which matches our container
  • An <input> file
  • The <command> section
Comment: Writing the tool command

This step can cause a lot of confusion. Here are a few pointers that you will find critical to understanding the process:

  • The <command> will be templated by Galaxy
  • The templated command will run inside the Docker container
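To make the first pointer concrete: templating means Galaxy substitutes Cheetah variables such as `$input` with real values before the command is handed to the container. A toy bash illustration (the dataset path is invented for the example):

```shell
# A Cheetah fragment as written in the tool XML (kept literal by single quotes)...
TEMPLATE='ln -sf $input ./jupyter/data/input.dat'

# ...and roughly what Galaxy hands to the container after templating:
RENDERED=${TEMPLATE//'$input'/'/galaxy-data/datasets/dataset_42.dat'}
echo "$RENDERED"
# -> ln -sf /galaxy-data/datasets/dataset_42.dat ./jupyter/data/input.dat
```

Galaxy's real templating engine is Cheetah, not shell substitution; this only shows the before/after shape of a templated command.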
<tool id="interactive_tool_copernicus_notebook" tool_type="interactive" name="Copernicus Data Space Ecosystem" version="@VERSION@">
    <description>sample notebooks to access and discover data</description>
    <macros>
        <token name="@VERSION@">0.0.1</token>
    </macros>
    <requirements>
        <container type="docker">quay.io/galaxy/copernicus-jupyterlab:@VERSION@</container>
    </requirements>
    <entry_points>
        <entry_point name="Jupyter Interactive Tool" requires_domain="True">
            <port>8888</port>
            <url>ipython/lab</url>
        </entry_point>
    </entry_points>
    <environment_variables>
        <environment_variable name="HISTORY_ID">$__history_id__</environment_variable>
        <environment_variable name="REMOTE_HOST">$__galaxy_url__</environment_variable>
        <environment_variable name="GALAXY_WEB_PORT">8080</environment_variable>
        <environment_variable name="GALAXY_URL">$__galaxy_url__</environment_variable>
        <environment_variable name="API_KEY" inject="api_key" />
    </environment_variables>
    <command detect_errors="aggressive"><![CDATA[
    #import re
    export GALAXY_WORKING_DIR=`pwd` &&
    mkdir -p ./jupyter/outputs/ &&
    mkdir -p ./jupyter/data &&
    mkdir -p ./jupyter/notebooks &&
    mkdir -p ./jupyter/notebooks/geo &&
    mkdir -p ./jupyter/notebooks/openeo &&
    mkdir -p ./jupyter/notebooks/sentinelhub &&

    #for $count, $file in enumerate($input):
        #set $cleaned_name = str($count + 1) + '_' + re.sub('[^\w\-\.\s]', '_', str($file.element_identifier))
        ln -sf '$file' './jupyter/data/${cleaned_name}' &&
    #end for

    ## change into the directory where the notebooks are located
    cd ./jupyter/ &&
    export HOME=/home/jovyan/ &&
    export PATH=/home/jovyan/.local/bin:\$PATH &&

    #if $mode.mode_select == 'scratch'
        ## copy all notebooks, workflows and data
        cp '$__tool_directory__/default_notebook.ipynb' ./ipython_galaxy_notebook.ipynb &&
        jupyter trust ./ipython_galaxy_notebook.ipynb &&
        cp -r /home/\$NB_USER/geo/* ./notebooks/geo/ &&
        cp -r /home/\$NB_USER/openeo/* ./notebooks/openeo/ &&
        cp -r /home/\$NB_USER/sentinelhub/* ./notebooks/sentinelhub/ &&


        ## provide all rights to copied files
        jupyter lab --allow-root --no-browser &&
        cp ./*.ipynb '$jupyter_notebook' &&
        
        cd outputs/ &&
        sleep 2 &&
        for file in *; do mv "\$file" "\$file.\${file\#\#*.}"; done
    #else
        #set $notebook_name = re.sub('[^\w\-\.\s]', '_', str($mode.ipynb.element_identifier))
        cp '$mode.ipynb' './${notebook_name}.ipynb' &&
        jupyter trust './${notebook_name}.ipynb' &&
        #if $mode.run_it
            jupyter nbconvert --to notebook --execute --output ./ipython_galaxy_notebook.ipynb --allow-errors  ./*.ipynb &&
            #set $notebook_name = 'ipython_galaxy_notebook'
        #else
            jupyter lab --allow-root --no-browser --NotebookApp.shutdown_button=True &&
        #end if
        cp './${notebook_name}.ipynb' '$jupyter_notebook' &&
        
        cd outputs/ &&
        sleep 2 &&
        for file in *; do mv "\$file" "\$file.\${file\#\#*.}"; done
    #end if
]]>
    </command>
    <inputs>

        <conditional name="mode">
            <param name="mode_select" type="select" label="Do you already have a notebook?" help="If not, no problem we will provide you with a default one.">
                <option value="scratch">Start with a fresh notebook</option>
                <option value="previous">Load a previous notebook</option>
            </param>
            <when value="scratch"/>
            <when value="previous">
                <param name="ipynb" type="data" format="ipynb" label="Copernicus Notebook"/>
                <param name="run_it" type="boolean" truevalue="true" falsevalue="false" label="Execute the Copernicus notebook and return a new one."
                help="This option is useful in workflows when you just want to execute a notebook and not dive into the web frontend."/>
            </when>
        </conditional>
        <param name="input" multiple="true" type="data" optional="true" label="Include data into the environment"/>
    </inputs>
    <outputs>
        <data name="jupyter_notebook" format="ipynb" label="Executed Copernicus notebook"></data>
        <collection name="output_collection" type="list" label="Copernicus outputs collection">
            <discover_datasets pattern="__designation_and_ext__" directory="jupyter/outputs" visible="true"/>
        </collection>
    </outputs>
    <tests>
        <test expect_num_outputs="1">
            <param name="mode" value="previous" />
            <param name="ipynb" value="test.ipynb" />
            <param name="run_it" value="true" />
            <output name="jupyter_notebook" file="test.ipynb" ftype="ipynb"/>
        </test>
    </tests>
    <help>
This tool contains sample Jupyter notebooks for the Copernicus Data Space Ecosystem. Notebooks are grouped per kernel: sentinelhub, openeo and geo. To learn more, see https://github.com/eu-cdse/notebook-samples

More example notebooks produced by the OpenEO community, which you can also use in this JupyterLab, are available at https://github.com/Open-EO/openeo-community-examples/tree/main

    </help>
    <citations>
       <citation type="bibtex">
        @Manual{,
        title = {Copernicus Data Space Ecosystem},
        author = {The eu-cdse community},
        year = {2023},
        note = {https://github.com/eu-cdse}
        }
        </citation>
	</citations>    
</tool>

To get data from your Galaxy history into your JupyterLab, don’t forget to include this kind of line in the tool <command>:

#for $count, $file in enumerate($input):
    #set $cleaned_name = str($count + 1) + '_' + re.sub('[^\w\-\.\s]', '_', str($file.element_identifier))
    ln -sf '$file' './jupyter/data/${cleaned_name}' &&
#end for
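The `re.sub()` call in that loop sanitises dataset names so they are safe to use as file names. A rough shell equivalent of the substitution, applied to a made-up dataset name:

```shell
# Mimic re.sub('[^\w\-\.\s]', '_', name): replace anything that is not a
# word character, dash, dot or whitespace with an underscore.
sanitize() { printf '%s' "$1" | sed 's/[^A-Za-z0-9_.[:space:]-]/_/g'; }

sanitize 'meta(data).csv'; echo
# -> meta_data_.csv
```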

And to get your outputs back into your Galaxy history:

<collection name="output_collection" type="list" label="Copernicus outputs collection">
    <discover_datasets pattern="__designation_and_ext__" directory="jupyter/outputs" visible="true"/>
</collection>

Don’t forget to change the image path and the citation to fit your project settings.

Testing locally

Would you like to check your GxIT integration in Galaxy, but don’t have a development server (or don’t want to disturb your sysadmin at this point)? Let’s test the integration on your own machine. You can use a VM if you prefer not to modify your machine’s environment.

Comment: A note on the OS

This part of the tutorial has been tested on Ubuntu and Debian; success is not guaranteed on other operating systems. If your machine runs another OS (e.g. Windows or macOS), you may need to use an Ubuntu virtual machine, or perhaps try the Windows Subsystem for Linux.

Docker installation

Hands-on: Install Docker

Install Docker as described on the docker website. Click on your distribution name to get specific information.

Galaxy installation

Hands-on: Install Galaxy

For Ubuntu:

# Install git to get Galaxy project
sudo apt-get install git
# Create a working directory and move to it
mkdir ~/GxIT && cd ~/GxIT
# Get the galaxy project. A new directory named "galaxy" will be created.
# This directory contains the whole project
git clone https://github.com/galaxyproject/galaxy
# Check out the latest stable release (v23.1 at the time of writing)
cd galaxy && git checkout release_23.1

Galaxy configuration

Hands-on
cd ~/GxIT/galaxy/config
# Create custom config files
cp galaxy.yml.interactivetools galaxy.yml
cp tool_conf.xml.sample tool_conf.xml

In galaxy.yml, ensure that the galaxy_infrastructure_url parameter is present under the galaxy section:

galaxy:
  galaxy_infrastructure_url: http://localhost:8080

This makes Galaxy serve your GxIT via links like http://your_gxit_identifier.localhost:8080.

Configure the tool panel by adding a section in ~/GxIT/galaxy/config/tool_conf.xml:

  <section id="interactivetools" name="Interactive tools">
    <tool file="interactive/interactivetool_tabulator.xml" />
    <tool file="interactive/interactivetool_copernicus.xml" />
  </section>

With these lines, Galaxy will create a new section named “Interactive tools” in the tool panel, containing our interactive tools. You can choose any name and id you like, as long as the id is unique, and of course you are free to put your GxITs in any section.
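A malformed tool_conf.xml will prevent Galaxy from loading the tool panel correctly, so it is worth a quick well-formedness check after editing. This small helper is our own addition, built on Python’s standard library; the path in the usage comment matches this tutorial’s layout:

```shell
# Check that an XML config file is well-formed before (re)starting Galaxy.
check_xml() {
  python3 - "$1" <<'EOF'
import sys, xml.etree.ElementTree as ET
ET.parse(sys.argv[1])
print(sys.argv[1], "is well-formed")
EOF
}

# Usage:
#   check_xml ~/GxIT/galaxy/config/tool_conf.xml
```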

Next, create a simple job_conf.xml with the following contents, which specifies how Galaxy should execute jobs. It is also possible to execute tools directly in the local environment, but here we want the tool to use the container we have just built.

<?xml version="1.0"?>
<job_conf>
    <plugins>
        <plugin id="local" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner" workers="4"/>
    </plugins>
    <destinations default="docker_dispatch">
        <destination id="local" runner="local"/>
        <destination id="docker_local" runner="local">
            <param id="docker_enabled">true</param>
            <param id="docker_volumes">$defaults</param>
            <param id="docker_sudo">false</param>
            <param id="docker_net">bridge</param>
            <param id="docker_auto_rm">true</param>
            <param id="require_container">true</param>
            <param id="container_monitor">true</param>
            <param id="docker_set_user"></param>
            <param id="docker_run_extra_arguments">--add-host localhost:host-gateway</param>
        </destination>
        <destination id="docker_dispatch" runner="dynamic">
            <param id="type">docker_dispatch</param>
            <param id="docker_destination_id">docker_local</param>
            <param id="default_destination_id">local</param>
        </destination>
    </destinations>
</job_conf>

Finally, copy your GxIT wrapper to the Interactive Tools directory (depending on whether you followed the application or JupyterLab part of the tutorial):

cp ~/my_filepath/interactivetool_tabulator.xml ~/GxIT/galaxy/tools/interactive/

OR

cp ~/my_filepath/interactivetool_copernicus.xml ~/GxIT/galaxy/tools/interactive/

Run Galaxy

Go to the Galaxy directory and:

./run.sh

Galaxy is available at http://localhost:8080/ and you should be able to use your GxIT. Congrats!

Deployment in a running Galaxy instance

Now that we have all the required components and have tested them in a local Galaxy instance, we can install the tool in our configured Galaxy instance for immediate production use. This is as simple as dropping the tool XML into the right location inside the Galaxy core application directory and adding the tool to the server’s tool config file.

Hands-on: Installing
  1. Add the tool XML

    Access your Galaxy instance and take a look at the Galaxy application directory to see the existing Interactive Tools:

    # Drop into the Galaxy application directory
    cd /srv/galaxy/server/
    
    # Show the existing GxIT tool files
    ls -l tools/interactive
    
  2. Now we can simply create our tool XML here with nano

    # Open a new file for editing
    sudo nano tools/interactive/interactivetool_tabulator.xml
    
    # >>>  paste the XML content from your code editor and save the file
    
  3. Enable the new tool

    This step is the same as activating any other existing Interactive Tool. See the admin tutorial for detailed instructions.

    # Open the Interactive Tools config file for editing:
    sudo nano /srv/galaxy/config/tool_conf.xml
    
  4. This configuration file should have been created when administering the Galaxy instance to serve Interactive Tools. We just need to add a single line to this file to enable our tool. Can you figure it out?

    <toolbox monitor="true">
        <section id="interactivetools" name="Interactive Tools">
            <tool file="interactive/interactivetool_tabulator.xml" />
        </section>
    </toolbox>
    

    OR

    <toolbox monitor="true">
        <section id="interactivetools" name="Interactive Tools">
            <tool file="interactive/interactivetool_copernicus.xml" />
        </section>
    </toolbox>
    
  5. Now we just need to restart the Galaxy server to refresh the tool registry

    sudo galaxyctl restart
    

Have a look in the web interface of your Galaxy instance. You should find the new tool under the “Interactive tools” section in the tool panel. If so, we are ready to start testing it out!

To release a GxIT for production use, we must distribute two components:

  • the Galaxy tool XML
  • the Docker image

We have already pushed the Docker image to the cloud (though it should be hosted on an approved registry for production use).

All that’s left is to distribute the tool XML. This would conventionally be done through the ToolShed. But the ToolShed doesn’t support GxITs yet! This leaves us only two options for distributing the tool XML:

  • Make a pull request against Galaxy core to include the XML file under tools/interactive/
  • Deploy the tool to specific Galaxy instance(s) in an Ansible Playbook

The steps that we took in this section can be easily incorporated into an Ansible playbook for deploying GxITs to a Galaxy server. This means that you can manage and deploy a GxIT as part of your Galaxy instance without merging into the galaxyproject/galaxy repository (or a fork of it).

The Interactive Tools admin tutorial demonstrates how this can be achieved by adding our tool XML to the “local tools” section of the Ansible Playbook. However, for our GxIT to show up in the correct tool panel section, we need to add an extra config file: local_tool_conf.xml.

  1. Copy the GxIT tool XML to files/galaxy/tools/interactivetool_tabulator.xml in your Ansible directory

  2. Create the template templates/galaxy/local_tool_conf.xml.j2

    <?xml version='1.0' encoding='utf-8'?>
    <toolbox monitor="true" tool_path="{{ galaxy_local_tools_dir }}">
        <section id="interactivetools" name="Interactive tools">
            <tool file="interactivetool_tabulator.xml" />
        </section>
    </toolbox>
    

OR

   <?xml version='1.0' encoding='utf-8'?>
   <toolbox monitor="true" tool_path="{{ galaxy_local_tools_dir }}">
       <section id="interactivetools" name="Interactive tools">
           <tool file="interactivetool_copernicus.xml" />
       </section>
   </toolbox>
  3. Create variables in the following sections of group_vars/galaxyservers.yml

    # ...
    galaxy_local_tools_dir: "{{ galaxy_server_dir }}/tools/local"
    galaxy_tool_config_files:
      # ...
      - "{{ galaxy_config_dir }}/local_tool_conf.xml"
    
  4. Run the playbook and your Interactive Tool should be available at the bottom of the tool panel

    ansible-playbook galaxy.yml
    

Debugging

The most obvious way to test a tool is simply to run it in the Galaxy UI, straight from the tool panel. If you are extremely lucky, you will find that the tool starts up and runs without error. But we all know that never happens! So this is where we start iteratively debugging our tool, until it functions as expected.

Comment: A successful tool run

It is worth pointing out that the appearance of a GxIT in the Galaxy user history is not intuitive when you are used to running “regular” tool jobs. When the history item turns orange (“processing”), that’s when a GxIT is actually ready to use! At this point, the tool UI should refresh and display a link to the active GxIT. Remember, the history item doesn’t turn green until a job has terminated. With a GxIT, that only happens when the tool has been stopped by the user, or by wall time limits imposed by the Galaxy administrators.

Testing and debugging is currently the trickiest part of GxIT development. Ideally, Galaxy core will be developed in the future to better support the process, but for the time being we have to make the most of what is available! In the future, we would like to see GxITs being tested with Planemo, and being installed and tested by Ephemeris from the ToolShed.

Self-destruct script

Unlike regular tools that exit after the execution of the underlying command is complete, web applications will run indefinitely until terminated. With Galaxy’s legacy “Interactive Environments”, this used to result in “zombie” containers hanging around and clogging up the Galaxy server. You may notice a terminate.sh script in some older GxITs as a workaround to this problem, but the new GxIT architecture handles container termination for you. This script is no longer required nor recommended.

Troubleshooting

Having issues with your Interactive Tool? Here are a few ideas for how to troubleshoot your application. Remember that Galaxy Interactive Tools are a work in progress, so feel free to get creative with your solutions here!

  • Getting an error in the Galaxy History? Click on the “view” icon to see details of the tool run, including the tool command, stdout and stderr.
  • If the tool’s stdout/stderr is not enough, consider modifying the Docker image to make it more verbose. Add print/log statements and assertions. Write an application log to a file that can be collected as Galaxy output.
  • Try running the container with Docker directly on your development machine. If the application doesn’t work independently, it certainly won’t work inside Galaxy!
  • If you need to debug the Docker container itself, it can be useful to write output/logging to a mounted volume that can be inspected after the tool has run.
  • You can also open a bash terminal inside the container to check the container state while the application is running: docker exec -it mycontainer /bin/bash