Debugging Memory Leaks

memray is a great memory profiler for debugging memory issues.

In the context of Galaxy, this is significantly easier for job handlers. Install it in your virtualenv and run:

memray run --trace-python-allocators -o the_dump <your_handler_startup_command_here>

Once you’ve collected enough data,

memray flamegraph --leaks --temporal the_dump -o the_dump.html

would then produce a report that shows allocations made but not freed over time.
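To make "allocated but not freed over time" concrete, here is a minimal sketch that uses Python's standard-library tracemalloc module as a stand-in. It is not memray and is far less capable; the `leaky_cache` list and `handle_request` function are hypothetical and only exist to simulate a leak.

```python
import tracemalloc

# Hypothetical leak: a module-level list that only ever grows.
leaky_cache = []


def handle_request():
    """Pretend request handler that never releases its buffer."""
    leaky_cache.append(bytearray(10_000))


def leaked_bytes(n_requests):
    """Return the net number of bytes still held after n_requests calls."""
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    for _ in range(n_requests):
        handle_request()
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    # A positive size_diff means memory was allocated and not freed in between.
    return sum(stat.size_diff for stat in after.compare_to(before, "lineno"))


if __name__ == "__main__":
    print(f"net growth: {leaked_bytes(100)} bytes")
```

Comparing two snapshots is the same idea as the `--leaks --temporal` report, just without the flame graph or native allocation tracking.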

It might also be useful to just check what the process is doing with py-spy dump.

You can follow web workers in gunicorn with

memray run --follow-fork -o the_dump gunicorn 'galaxy.webapps.galaxy.fast_factory:factory()' --timeout 600 --pythonpath lib -k galaxy.webapps.galaxy.workers.Worker -b localhost:8082 --config python:galaxy.web_stack.gunicorn_config -w 1 --preload

The traced app will run on port 8082. You can then, for instance, direct a portion of the traffic to your profiled app in an upstream nginx section.

Define once, reference many times

Use variables, either by defining them ahead of time or by accessing them via existing data structures that have already been defined, e.g.:

# defining a variable that gets reused is great!
galaxy_user: galaxy

# Re-using the galaxy_config_dir variable saves time and ensures everything
# is in sync!
datatypes_config_file: "{{ galaxy_config_dir }}/datatypes_conf.xml"

# and now we can re-use "{{ galaxy_config.galaxy.datatypes_config_file }}"
# in other places!

- src: templates/galaxy/config/datatypes_conf.xml
  dest: "{{ galaxy_config.galaxy.datatypes_config_file }}"

Practices like those shown above help to avoid problems caused when paths are defined differently in multiple places. The datatypes config file will be copied to the same path as Galaxy is configured to find it in, because that path is only defined in one place. Everything else is a reference to the original definition! If you ever need to update that definition, everything else will be updated accordingly.

Error: "skipping: no hosts matched"

There can be multiple reasons this happens, so we’ll step through all of them. We’ll start by assuming you’re running the command

ansible-playbook galaxy.yml

The following things can cause issues:

  1. Within your galaxy.yml, you’ve referred to a host group that doesn’t exist or is misspelled. Check the hosts: galaxyservers to ensure it matches the host group defined in the hosts file.
  2. Vice-versa, the group in your hosts file should match the hosts selected in the playbook, galaxy.yml.
  3. If neither of these is the issue, it’s possible Ansible doesn’t know to check the hosts file for the inventory. Make sure you’ve specified inventory = hosts in your ansible.cfg.
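The first two checks above can be automated with a quick text scan. The following is a minimal sketch, assuming an INI-style hosts file and playbooks whose `hosts:` value is a plain group name; host patterns, wildcards and YAML inventories are not handled. The sample inventory, hostname and playbook strings are made up for illustration.

```python
import re

# Made-up sample inputs, for illustration only.
INVENTORY = """\
[galaxyservers]
galaxy.example.org ansible_connection=local
"""
PLAYBOOK_OK = "- hosts: galaxyservers\n  become: true\n"
PLAYBOOK_TYPO = "- hosts: galaxyserver\n  become: true\n"


def inventory_groups(inventory_text):
    """Return the group names found in [group] headers of an INI inventory."""
    groups = set()
    for line in inventory_text.splitlines():
        match = re.match(r"^\[([^\]:]+)(?::\w+)?\]\s*$", line.strip())
        if match:
            groups.add(match.group(1))
    return groups


def playbook_host_patterns(playbook_text):
    """Return every value given to a `hosts:` key in a playbook."""
    return re.findall(r"^\s*-?\s*hosts:\s*(\S+)\s*$", playbook_text, flags=re.MULTILINE)


def unmatched_hosts(playbook_text, inventory_text):
    """List `hosts:` values that name no group in the inventory."""
    known = inventory_groups(inventory_text) | {"all"}  # `all` is built in
    return [p for p in playbook_host_patterns(playbook_text) if p not in known]


if __name__ == "__main__":
    print(unmatched_hosts(PLAYBOOK_TYPO, INVENTORY))
```

An empty list means every `hosts:` value names a known group, which leaves the ansible.cfg inventory setting as the next thing to check.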

Failing all jobs from a specific user

This command will let you quickly fail every job from the user ‘service-account’ (replace with your preferred user):

gxadmin tsvquery jobs --user=service-account --nonterminal | awk '{print $1}' | xargs -I {} -n 1 gxadmin mutate fail-job {} --commit
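If you would rather review the commands before anything is changed, the pipeline above can be split into two steps. This is a minimal sketch that only builds the command strings and never runs them; the sample TSV rows are invented, and the only assumption carried over from the pipeline is that the job ID is the first whitespace-separated field.

```python
import shlex


def fail_job_commands(tsv_output, commit=False):
    """Build one `gxadmin mutate fail-job` command per job ID in tsv_output.

    The job ID is taken from the first whitespace-separated field,
    mirroring `awk '{print $1}'` in the shell pipeline.
    """
    commands = []
    for line in tsv_output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        job_id = line.split()[0]
        cmd = ["gxadmin", "mutate", "fail-job", job_id]
        if commit:
            cmd.append("--commit")
        commands.append(shlex.join(cmd))
    return commands


if __name__ == "__main__":
    # Invented sample rows standing in for the output of `gxadmin tsvquery jobs`.
    sample = "101\trunning\n102\tqueued\n"
    for command in fail_job_commands(sample):
        print(command)
```

Passing `commit=True` adds the `--commit` flag used in the one-liner.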

How do I know what I can do with a role? What variables are available?

You don’t. There is no standard way of reporting this, but well-written roles by trusted authors (e.g. geerlingguy, galaxyproject) do it properly and list all of the variables in the README file of the repository. We try to pick sensible roles for you in this course, but in real life it may not be that simple.

So, definitely check there first, but if they aren’t there, then you’ll need to read through defaults/ and tasks/ and templates/ to figure out what the role does and how you can control and modify it to accomplish your goals.

How do I see what variables are set for a host?

If you are using a simple group_vars file only, per group, and no other variable sources, then it’s relatively easy to tell what variables are getting set for your host! Just look at that one file.

But if you have graduated into using a more complex setup, perhaps with multiple sets of variables, like for example:

├── group_vars
│ ├── all
│ │ ├── all.yml
│ │ └── secret.yml
│ ├── galaxyservers.yml
│ └── pulsarservers.yml
├── hosts
├── host_vars
│ ├──
│ │ ├── all.yml
│ │ └── secret.yml
│ ├──
│ │ ├── all.yml
│ │ ├── pulsar.yml
│ │ └── secret.yml

Then it might be harder to figure out what variables are being set, in full. This is where the ansible-inventory command can be useful.

Graph shows you the structure of your host groups:

$ ansible-inventory --graph
| |
| |
| |

Here is a relatively simple, flat example, but this can be more complicated if you nest sub-groups of hosts:

| |--localhost
| |--@workshop_eu:
| | |
| | |
| |--@workshop_oz:
| |--@workshop_us:

List shows you all defined variables:

$ ansible-inventory --host | head
[WARNING]: While constructing a mapping from
/group_vars/galaxyservers.yml, line 3, column
1, found a duplicate dict key (tiaas_templates_dir). Using last defined value
"ansible_connection": "local",
"ansible_user": "ubuntu",
"certbot_agree_tos": "--agree-tos",
"certbot_auth_method": "--webroot",
"certbot_auto_renew": true,
"certbot_auto_renew_hour": "{{ 23 |random(seed=inventory_hostname) }}",
"certbot_auto_renew_minute": "{{ 59 |random(seed=inventory_hostname) }}",

And, helpfully, if variables are overridden in precedence you can see that as well with the above warnings.
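The duplicate-key warning shown above can also be caught before running Ansible. Here is a minimal sketch that scans a vars file as plain text for repeated top-level keys; it is not a YAML parser, so it only looks at keys starting in column 0. The sample file content is invented, apart from the `tiaas_templates_dir` key taken from the warning.

```python
import re
from collections import Counter

# Invented sample; only the duplicated key name comes from the warning above.
SAMPLE_GROUP_VARS = """\
galaxy_user: galaxy
tiaas_templates_dir: templates/tiaas
galaxy_config:
  galaxy:
    brand: Example
tiaas_templates_dir: files/tiaas
"""


def duplicate_top_level_keys(yaml_text):
    """Return top-level keys that appear more than once, sorted by name."""
    # Top-level keys start in column 0; indented lines and comments never match.
    keys = re.findall(r"^([A-Za-z_][\w-]*)\s*:", yaml_text, flags=re.MULTILINE)
    counts = Counter(keys)
    return sorted(key for key, n in counts.items() if n > 1)


if __name__ == "__main__":
    print(duplicate_top_level_keys(SAMPLE_GROUP_VARS))
```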

Is YAML sensitive to True/true/False/false?

By this reference, YAML doesn’t really care:

{ Y, true, Yes, ON }  : Boolean true
{ n, FALSE, No, off } : Boolean false
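As a rough illustration, the boolean spellings from the YAML 1.1 bool type can be written out as a lookup. This is a sketch of the specification's list, not of any particular parser: some parsers accept a narrower set (PyYAML, for example, does not treat a bare `y` or `n` as a boolean), so check the one you actually use.

```python
# Boolean literals as spelled out by the YAML 1.1 bool type.
YAML11_TRUE = {"y", "Y", "yes", "Yes", "YES", "true", "True", "TRUE", "on", "On", "ON"}
YAML11_FALSE = {"n", "N", "no", "No", "NO", "false", "False", "FALSE", "off", "Off", "OFF"}


def yaml11_bool(literal):
    """Return True/False for a YAML 1.1 boolean literal, or None for anything else."""
    if literal in YAML11_TRUE:
        return True
    if literal in YAML11_FALSE:
        return False
    return None  # not a boolean literal; would be read as a string or other type


if __name__ == "__main__":
    for literal in ["Y", "true", "Yes", "ON", "n", "FALSE", "No", "off", "maybe"]:
        print(literal, yaml11_bool(literal))
```

If you need the literal string "yes" or "no" rather than a boolean, quoting the value avoids the ambiguity.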

Mapping Jobs to Specific Storage By User

It is possible to map your jobs to use specific storage backends based on user! If you have e.g. specific user groups that need their data stored separately from other users, for whatever political reasons, then in your dynamic destination you can do something like:

job_destination = app.job_config.get_destination(destination_id)
if user == "alice":
    job_destination.params['object_store_id'] = 'foo' # Maybe lookup the ID from a mapping somewhere

If you manage to do this in production, please let us know and we can update this FAQ with any information you encounter.
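The "lookup the ID from a mapping" idea could look something like the following minimal sketch. It runs outside Galaxy: `fake_app` is a stand-in for the real app object, and the `OBJECT_STORE_BY_USER` mapping, the email addresses, the `"default"` destination ID and the `user_email` argument are all assumptions made for illustration, so adapt them to how your dynamic rule is actually wired up.

```python
from types import SimpleNamespace

# Hypothetical mapping of user email -> object store ID.
OBJECT_STORE_BY_USER = {"alice@example.org": "foo"}


def storage_by_user(app, user_email, destination_id="default"):
    """Return a job destination, pinning the object store for mapped users."""
    job_destination = app.job_config.get_destination(destination_id)
    object_store_id = OBJECT_STORE_BY_USER.get(user_email)
    if object_store_id is not None:
        job_destination.params["object_store_id"] = object_store_id
    return job_destination


def fake_app():
    """Stand-in for Galaxy's app object, only for trying the rule outside Galaxy."""
    def get_destination(destination_id):
        return SimpleNamespace(id=destination_id, params={})

    return SimpleNamespace(job_config=SimpleNamespace(get_destination=get_destination))


if __name__ == "__main__":
    print(storage_by_user(fake_app(), "alice@example.org").params)
    print(storage_by_user(fake_app(), "bob@example.org").params)
```

Keeping the mapping in one dictionary (or a file loaded into one) means adding a user does not require touching the rule logic.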

Operating system compatibility

These Ansible roles and training materials were last tested on CentOS 7 and Ubuntu 18.04, but will probably work on other RHEL and Debian variants.

The roles used in these trainings are currently used by the usegalaxy.* servers, and others, to maintain their infrastructure (US and EU are both running CentOS 7).

If you have an issue running these trainings on your OS flavour, please report the issue in the training material and we can see if it is possible to solve.

Running Ansible on your remote machine

It is possible to have Ansible installed on the remote machine and run it there, rather than only running it from your local machine and connecting to the remote machine.

Your hosts file will need to use localhost, and whenever you run playbooks with ansible-playbook -i hosts playbook.yml, you will need to add -c local to your command.

Be certain that the playbook that you’re writing on the remote machine is stored somewhere safe, like your user home directory, or backed up on your local machine. The cloud can be unreliable and things can disappear at any time.

Updating from 22.01 to 23.0 with Ansible

Galaxy introduced a number of changes in 22.05 and 23.0 that are extremely important to be aware of during the upgrade process, namely a new database migration system and a new required running environment (gunicorn instead of uwsgi).

The scripts to migrate to the new database migration system are only compatible with release 22.05 and were subsequently removed, so it is mandatory to upgrade to 22.05 first if you want to go further.

Here is the recommended update procedure with Ansible:

  1. Update to 22.01 normally
  2. Change the release to 22.05, and run the upgrade
    1. Galaxy will probably not start correctly here; ignore it (even if the build fails, this is fine).
    2. Run the database migration manually (with the galaxy user with the venv activated)

      GALAXY_CONFIG_FILE=/srv/galaxy/config/galaxy.yml sh /srv/galaxy/server/ -c /srv/galaxy/config/galaxy.yml upgrade
  3. Update your system’s ansible, you probably need something with a major version of at least 2.
  4. Set the release to 23.0 and make other required changes. There are a lot of useful changes, but the easiest procedure is probably something like:

    1. git clone
    2. cd git-gat
    3. git checkout c2e7bf6d3584fbf3281fb57d8024a9189f957e0e (this corresponds to the version of the repo after the 23.0 integration without too much customization and after potential bug fixes)
    4. Diff and sync (e.g. vimdiff group_vars/galaxyservers.yml git-gat/group_vars/galaxyservers.yml) for the main configuration files:

      • group_vars/all.yml
      • group_vars/dbservers.yml
      • galaxy.yml
      • requirements.yml (and don’t forget to install the new role versions)
      • hosts
      • templates/nginx/galaxy.j2

    But the main change is the swap from uwsgi to gravity+gunicorn:

    -  uwsgi:
    -    socket:
    -    buffer-size: 16384
    -    processes: 1
    -    threads: 4
    -    offload-threads: 2
    -    static-map:
    -      - /static=/static
    -      - /favicon.ico=/static/favicon.ico
    -    static-safe: client/galaxy/images
    -    master: true
    -    virtualenv: ""
    -    pythonpath: "/lib"
    -    module: galaxy.webapps.galaxy.buildapp:uwsgi_app()
    -    thunder-lock: true
    -    die-on-term: true
    -    hook-master-start:
    -      - unix_signal:2 gracefully_kill_them_all
    -      - unix_signal:15 gracefully_kill_them_all
    -    py-call-osafterfork: true
    -    enable-threads: true
    -    mule:
    -      - lib/galaxy/
    -      - lib/galaxy/
    -    farm: job-handlers:1,2
    +  gravity:
    +    process_manager: systemd
    +    galaxy_root: "/server"
    +    galaxy_user: ""
    +    virtualenv: ""
    +    gunicorn:
    +      # listening options
    +      bind: "unix:/gunicorn.sock"
    +      # performance options
    +      workers: 2
    +      # Other options that will be passed to gunicorn
    +      # This permits setting of 'secure' headers like REMOTE_USER (and friends)
    +      #
    +      extra_args: '--forwarded-allow-ips="*"'
    +      # This lets Gunicorn start Galaxy completely before forking which is faster.
    +      #
    +      preload: true
    +    celery:
    +      concurrency: 2
    +      loglevel: DEBUG
    +    handlers:
    +      handler:
    +        processes: 2
    +        pools:
    +          - job-handlers
    +          - workflow-schedulers

    Some other important changes include:

    • uchida.miniconda is replaced with galaxyproject.conda
    • usegalaxy_eu.systemd is no longer needed
    • galaxy_user_name is defined in all.yml in the latest git-gat
    • the galaxy_job_config needs to have database handling specified, with assign set to db-skip-locked
    • git-gat also separates out the DB serving into a dbservers.yml host group
  5. Back up your venv (mv /srv/galaxy/venv/ /srv/galaxy/venv-old/), as your NodeJS is probably out of date and Galaxy doesn’t handle that gracefully
  6. Do any local customs for luck (knocking on wood, etc.)
  7. Run the playbook
  8. Things might go wrong with systemd units
    • try running galaxyctl -c /srv/galaxy/config/galaxy.yml update as root
    • you may also need to rm /etc/systemd/system/galaxy.service which is then no longer needed
    • you’ll have a and you can instead systemctl daemon-reload and systemctl start
  9. You may need to restart galaxy manually with sudo galaxyctl restart
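The ordering constraint in the procedure above (22.01 first, then the mandatory stop at 22.05, then 23.0) can be sketched as a small helper. This is only an illustration of the ordering, under the assumption that release names compare as dotted numbers; the function and constant names are made up and nothing here talks to Galaxy or Ansible.

```python
# Intermediate releases from the procedure above, in order.
# 22.05 is mandatory because its migration scripts were removed afterwards.
INTERMEDIATE_STOPS = ["22.01", "22.05"]


def parse_release(release):
    """Turn a release name such as '22.05' into a comparable tuple (22, 5)."""
    return tuple(int(part) for part in release.split("."))


def upgrade_path(current, target):
    """Return the releases to deploy, in order, to get from current to target."""
    cur, tgt = parse_release(current), parse_release(target)
    if tgt <= cur:
        raise ValueError("target must be newer than the current release")
    path = [stop for stop in INTERMEDIATE_STOPS if cur < parse_release(stop) < tgt]
    path.append(target)
    return path


if __name__ == "__main__":
    print(upgrade_path("21.09", "23.0"))
```

Each entry in the returned list corresponds to one "set the release and run the playbook" round, with the manual database migration happening at the 22.05 stop.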

Variable connection

When the playbook runs, as part of the setup, it collects any variables that are set. For a playbook affecting a group of hosts named my_hosts, it checks many different places for variables, including “group_vars/my_hosts.yml”. If there are variables there, they’re added to the collection of current variables. It also checks “group_vars/all.yml” (for the built-in host group all). There is a precedence order, but then these variables are available for roles and tasks to consume.
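A simplified way to picture the precedence order for the sources mentioned above: variables from the `all` group are applied first, then the more specific group, then anything set for the individual host, with later sources winning. The sketch below models only these three sources (Ansible has many more) and the variable names and values are invented.

```python
def effective_vars(all_vars, group_vars, host_vars):
    """Merge variable sources from lowest to highest precedence."""
    merged = {}
    for source in (all_vars, group_vars, host_vars):
        # Later, more specific sources override earlier ones key by key.
        merged.update(source)
    return merged


if __name__ == "__main__":
    # Invented example values.
    print(
        effective_vars(
            all_vars={"galaxy_user": "galaxy", "workers": 2},
            group_vars={"workers": 4},
            host_vars={"ansible_connection": "local"},
        )
    )
```

Note that the merge is shallow: a dictionary-valued variable defined in a more specific source replaces the whole dictionary from a less specific one, which matches Ansible's default behaviour.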

What if you forget `--diff`?

If you forget to use --diff, it is not easy to see what has changed. Some modules like the copy and template modules have a backup option. If you set this option, then it will keep a backup copy next to the destination file.

However, most modules do not have such an option, so if you want to know what changes, always use --diff.

What is the difference between the roles with `role:` prefix and without?

The bare role name is just simplified syntax for the roles. You could equally specify role: <name> every time, but it’s only necessary if you want to set additional variables like become_user.


Library Permission Issues

When setup-data-libraries runs, it imports the library with the permissions of the admin user, so the library is rather locked down to the account that handled the import.

Due to how data libraries have been implemented, it isn’t sufficient to share the folder with another user, instead you must also share individual items within this folder. This is an unfortunate issue with Galaxy that we hope to fix someday.

Until then, we recommend you install the latest version of Ephemeris, which includes the set-library-permissions command that lets you recursively correct the permissions on a data library. Simply run:

set-library-permissions -g -a $API_KEY LIBRARY --roles ROLES role1,role2,role3

Where LIBRARY is the id of the library you wish to correct.


Blank page or no CSS/JavaScript

This generally means that serving of static content is broken:

  • Check browser console for 404 errors.
  • Check proxy error log for permission errors.
  • Verify that your proxy static configuration is correct.
  • If you have recently upgraded Galaxy or changed the GUI in some way, you will need to rebuild the client.

Database Issues

For slow queries, start with EXPLAIN ANALYZE.

However, it can be useful to dig into the queries with the Postgres EXPLAIN Visualizer (PEV) to get a more visual and clear representation. (Try it with this demo data)

You can set some options in the Galaxy configuration or database that will help debugging this:

  • database_engine_option_echo (but warning, extremely verbose)
  • slow_query_log_threshold logs to Galaxy log file
  • sentry_sloreq_threshold if using Sentry

Additionally, check that your database is running VACUUM regularly enough, and look at VACUUM ANALYZE.

There are some gxadmin query pg-* commands which can help you monitor and track this information.

Lastly, check your database settings! It might not have enough resources allocated. Check PGTune for some suggestions of optimised parameters.

Debugging tool errors

Tool stdout/stderr is available in the UI under the “i” icon on the history dataset.

  1. Set cleanup_job to onsuccess
  2. Cause a job failure
  3. Go to job working directory (find in logs or /data/jobs/<hash>/<job_id>)
  4. Poke around, try running things (srun --pty bash considered useful)

Familiarize yourself with the places Galaxy keeps things

Debugging tool memory errors

Often the tool output contains one of:

MemoryError            # Python
what(): std::bad_alloc # C++
Segmentation Fault     # C - but could be other problems too
Killed                 # Linux OOM Killer


  • Change input sizes or params
    • Map/reduce?
  • Decrease the amount of memory the tool needs
  • Increase the amount of memory available to the job
    • Request more memory from cluster scheduler
    • Use job resubmission to automatically rerun with a larger memory allocation
  • Cross your fingers and rerun the job
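If you find yourself checking many failed jobs, the error signatures listed above can be turned into a quick triage helper. This is a minimal sketch: the function and table names are made up, the match is a case-insensitive substring search, and a hit only suggests where to look, since a message like "Killed" can have other causes.

```python
# Signature -> likely source, taken from the list of messages above.
MEMORY_ERROR_SIGNATURES = [
    ("MemoryError", "Python"),
    ("std::bad_alloc", "C++"),
    ("Segmentation Fault", "C (could be other problems too)"),
    ("Killed", "Linux OOM Killer"),
]


def classify_memory_error(tool_output):
    """Return the likely source of a memory error, or None if nothing matches."""
    lowered = tool_output.lower()
    for signature, source in MEMORY_ERROR_SIGNATURES:
        if signature.lower() in lowered:
            return source
    return None


if __name__ == "__main__":
    print(classify_memory_error("terminate called after throwing\n  what():  std::bad_alloc"))
```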

Galaxy UI is slow

There is a great Tutorial from @mvdbeek which we recommend you follow.

Additionally you can use py-spy to record the issue and generate a flame graph.

Tool missing from Galaxy

First, restart Galaxy and watch the log for lines like:

Loaded tool id:, version: 1.33 into tool panel....

After startup, check integrated_tool_panel.xml for a line like the following to be sure it was loaded properly and added to the toolbox (if not, check the logs further)

<tool id="" />

If it is a toolshed tool, check shed_tool_conf.xml for

<tool file="" guid="">

Additionally, if you have multiple job handlers, on rare occasions they don’t all get the update. Just restart them if that’s the case. Alternatively, you can send an (authenticated) API request:

curl -X PUT
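The integrated_tool_panel.xml check described above can be scripted rather than done by eye. The following is a minimal sketch using Python's standard-library XML parser; the sample XML and the tool ID in it are only illustrative, and where the real file lives depends on your installation.

```python
import xml.etree.ElementTree as ET

# Illustrative stand-in for the contents of integrated_tool_panel.xml.
SAMPLE_PANEL = """\
<toolbox>
  <section id="getext" name="Get Data">
    <tool id="upload1" />
  </section>
</toolbox>
"""


def tool_in_panel(panel_xml, tool_id):
    """Return True if a <tool> element with the given id exists anywhere in the XML."""
    root = ET.fromstring(panel_xml)
    return any(tool.get("id") == tool_id for tool in root.iter("tool"))


if __name__ == "__main__":
    print(tool_in_panel(SAMPLE_PANEL, "upload1"))
```

To check the real file, read it from disk and pass its contents in place of `SAMPLE_PANEL`. Toolshed tools can be looked up the same way in shed_tool_conf.xml by comparing the `guid` attribute instead of `id`.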

Using data source tools with Pulsar

Data source tools such as UCSC Main will fail if Pulsar is the default destination.

To fix this issue you can force individual tools to run on a specific destination or handler by adding to your job_conf file:

For job_conf.xml

<tool id="ucsc_table_direct1" destination="my-local" />

For job_conf.yml

- id: ucsc_table_direct1
  handler: my-local


How to read a Diff

If you haven’t worked with diffs before, this can be something quite new or different.

Let’s say we have two versions of a grocery list, stored in two files. We’ll call them ‘a’ (the file old) and ‘b’ (the file new).

Input: Old
$ cat old
Output: New
$ cat new

We can see that they have some different entries. We’ve removed 🍒 because they’re awful, and replaced them with a 🍍.

Diff lets us compare these files:

$ diff old new
< 🍒
> 🍍

Here we see that 🍒 is only in a, and 🍍 is only in b. But otherwise the files are identical.

There are a couple of different diff formats; one is the ‘unified diff’:

$ diff -U2 old new
--- old 2022-02-16 14:06:19.697132568 +0100
+++ new 2022-02-16 14:06:36.340962616 +0100
@@ -3,4 +3,4 @@

This is basically what you see in the training materials which gives you a lot of context about the changes:

  • --- old is the ‘old’ file in our view
  • +++ new is the ‘new’ file
  • @@ these lines tell us where the change occurs and how many lines are added or removed.
  • Lines starting with a - are removed from our ‘new’ file
  • Lines with a + have been added.

So when you go to apply these diffs to your files in the training:

  1. Ignore the header
  2. Remove lines starting with - from your file
  3. Add lines starting with + to your file

The other lines (🍊/🍋 and 🥑) above just provide “context”. They help you know where a change belongs in a file, but should not be edited when you’re making the above change. Given the above diff, you would find a line with a 🍒, and replace it with a 🍍.
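If you want to experiment with this outside the training, Python's standard-library difflib can produce the same kind of unified diff, and the three rules above can be applied mechanically. This is a minimal sketch: the two grocery lists are made up (only the 🍒 to 🍍 swap comes from the text), and `apply_hunk_lines` is a toy that handles a single simple hunk, not a replacement for the patch tool.

```python
import difflib

# Made-up grocery lists; only the cherry -> pineapple swap comes from the text above.
old = ["🍎", "🍐", "🍊", "🍋", "🍒", "🥑"]
new = ["🍎", "🍐", "🍊", "🍋", "🍍", "🥑"]


def unified(old_lines, new_lines, context=2):
    """Return the unified diff between two lists of lines."""
    return list(
        difflib.unified_diff(old_lines, new_lines, "old", "new", n=context, lineterm="")
    )


def apply_hunk_lines(diff_lines):
    """Rebuild the 'new' side of the hunk using the three rules above."""
    result = []
    for line in diff_lines:
        if line.startswith(("---", "+++", "@@")):
            continue  # 1. ignore the header
        if line.startswith("-"):
            continue  # 2. lines starting with - are removed
        result.append(line[1:])  # 3. + lines are added; context lines are kept
    return result


if __name__ == "__main__":
    for line in unified(old, new):
        print(line)
    print(apply_hunk_lines(unified(old, new)))
```

The context lines pass through untouched, which is exactly why they should not be edited when you apply a diff by hand.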

Added & Removed Lines

Removals are very easy to spot: we just have removed lines.

--- old	2022-02-16 14:06:19.697132568 +0100
+++ new 2022-02-16 14:10:14.370722802 +0100
@@ -4,3 +4,2 @@

And additions are likewise very easy: just add a new line between the other lines in your file.

--- old	2022-02-16 14:06:19.697132568 +0100
+++ new 2022-02-16 14:11:11.422135393 +0100
@@ -1,3 +1,4 @@

Completely new files

Completely new files look a bit different: there, the “old” file is /dev/null, the empty file on a Linux machine.

$ diff -U2 /dev/null old
--- /dev/null 2022-02-15 11:47:16.100000270 +0100
+++ old 2022-02-16 14:06:19.697132568 +0100
@@ -0,0 +1,6 @@

And removed files are similar, except that the new file is /dev/null:

--- old	2022-02-16 14:06:19.697132568 +0100
+++ /dev/null 2022-02-15 11:47:16.100000270 +0100
@@ -1,6 +0,0 @@


How many mules?

Start with 2 and add more as needed. If you notice that your jobs seem to inexplicably sit for a long time before being dispatched to the cluster, or after they have finished on the cluster, you may need additional handlers.

Galaxy admin interface

Install tools via the Admin UI

  1. Open Galaxy in your browser and type `` in the tool search box on the left. If “” is among the search results, you can skip the following steps.
  2. Access the Admin menu from the top bar (you need to be logged-in with an email specified in the admin_users setting)
  3. Click “Install and Uninstall”, which can be found on the left, under “Tool Management”
  4. Enter `` in the search interface
  5. Click on the first hit, having devteam as owner
  6. Click the “Install” button for the latest revision
  7. Enter “” as the target section and click “OK”.


Time to git commit


It’s time to commit your work! Check the status with

git status

Add your changed files with

git add ... # any files you see that are changed

And then commit it!

git commit -m 'Finished '

Using Git With Ansible Vaults


When looking at git log to see what you changed, you cannot easily look into Ansible Vault changes: you just see the changes in the encrypted versions which is unpleasant to read.

Instead we can use .gitattributes to tell git that we want to use a different program to visualise differences between two versions of a file, namely ansible-vault.

  1. Check your git log -p and see how the Vault changes look (you can type /vault to search). Notice that they’re just changes to the encrypted content.
  2. Create the file .gitattributes in the same folder as your galaxy.yml playbook, with the following contents:

    group_vars/secret.yml diff=ansible-vault merge=binary
  3. Try again to git log -p and look for the vault changes. Note that you can now see the decrypted content! Very useful.


Opening a split screen in byobu

Shift-F2: Create a horizontal split

Shift-Left/Right/Up/Down: Move focus among splits

Ctrl-F6: Close split in focus

Ctrl-D: (Linux, Mac users) Close split in focus

There are more byobu commands described in this gist


Got lost along the way?


If you missed any steps, you can compare against the reference files, or see what changed since the previous tutorial.

If you’re using git to track your progress, remember to add your changes and commit with a good commit message!

