The ocean is a key component of the Earth’s climate system. It thus needs continuous real-time monitoring to help scientists better understand its dynamics and predict its evolution.
Oceanographers around the world have joined forces to set up a Global Ocean Observing System, of which Argo is a key component.
Argo is an international program that collects information from inside the ocean using a fleet of robotic instruments that drift with the ocean currents and move up and down between the surface and a mid-water level.
The data used in this tutorial are from the Argo gliders network. We are interested in the following variables: water temperature, latitude, longitude and time.
Our main objective is to plot the water temperature with respect to time. For this, we will use the netCDF xarray tools available on the Galaxy Europe server (or your favourite Galaxy instance).
These tools are part of the Pangeo ecosystem in which the next generation of open-source analysis tools for ocean, atmosphere and climate science can be developed, distributed,
and sustained. These tools must be scalable to meet the current and future challenges of big data, and these solutions should leverage the existing expertise outside of
the geoscience community.
The first time you use Galaxy, there will be no files in your history panel.
Argo gliders data
Argo is a global network of nearly 4000 autonomous probes measuring pressure, temperature and salinity from the surface to 2000 m depth every 10 days.
These probes are distributed almost randomly between the 60th parallels. All probe data are collected by satellite in real time, processed by several data
centers and finally merged into a single dataset (more than 2 million vertical profiles) made freely available to anyone through an FTP server or monthly zip snapshots.
Each Argo probe is an autonomous, free-drifting profiling float, i.e. a probe that can’t control its trajectory but can control its buoyancy and thus move up and down the water
column. Argo floats continuously repeat the same program, or cycle, illustrated in the figure below. After 9 to 10 days of free drift at a parking depth of about 1000 m,
a typical Argo float dives down to 2000 m and then rises back to the surface while measuring pressure, temperature and salinity. Once it reaches the surface,
the float sends its measurements by satellite to a data center, where they are processed in real time and made freely available on the web in less than 24 hours.
Figure 2: 10 days program, cycle, of an Argo float
Here we will focus on the Caribbean Sea surrounding the Antilles during April and May. On the 9th of April 2021 an eruption of the volcano La Soufriere Saint Vincent (Antilles) occurred.
Another tutorial on this event is available on the Galaxy Training Network.
Get Argo data
Hands-on: History management
Create a new history for this tutorial and give it a name (example: “Argo data with Pangeo”) for you to find it again later if needed.
To create a new history simply click the new-history icon at the top of the history panel:
Argo data access (Galaxy version 0.1.15+galaxy0) with the following parameters:
“We have preconfigured some mode of operations for you. What mode do you want to use?”: 🏊 standard mode simplifies the dataset, removes most of its jargon and returns a priori good data
“How do you want to select your data of interest ?”: 🗺 For a space/time domain
“Input longitude min (+east/-west)”: -75.0
“Input longitude max (+east/-west)”: -45.0
“Input latitude min (+north/-south)”: 20.0
“Input latitude max (+north/-south)”: 30.0
“Input pressure min (db)”: 0.0
“Input pressure max (db)”: 10.0
“Input starting date”: 2021-04
“Input ending date”: 2021-06
“Which kind of dataset do you want ?”: Physical parameters: temperature, salinity, pressure
Run Tool
After a couple of minutes, an Argo data output will appear green in your history.
Check with the pencil icon that your data are in netcdf format; if they are not, change the datatype:
Click on the pencil icon for the dataset to edit its attributes
In the central panel, click the Datatypes tab on the top
In the Assign Datatype section, select netcdf from the “New type” dropdown
Tip: you can start typing the datatype into the field to filter the dropdown menu
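For readers who want to see the same space/time selection outside Galaxy, here is a minimal Python sketch. It assumes that the equivalent request can be made with the argopy library (the tutorial does not state which library the Galaxy tool wraps); `build_region_box` and `fetch_region` are hypothetical helper names introduced for illustration. The values are the ones entered in the form above.

```python
from typing import List, Union

BoxValue = Union[float, str]


def build_region_box(
    lon_min: float, lon_max: float,
    lat_min: float, lat_max: float,
    pres_min: float, pres_max: float,
    date_start: str, date_end: str,
) -> List[BoxValue]:
    """Check a space/time domain and return it as a flat list."""
    if not -180.0 <= lon_min < lon_max <= 180.0:
        raise ValueError("longitudes must satisfy -180 <= min < max <= 180")
    if not -90.0 <= lat_min < lat_max <= 90.0:
        raise ValueError("latitudes must satisfy -90 <= min < max <= 90")
    if not 0.0 <= pres_min < pres_max:
        raise ValueError("pressures must satisfy 0 <= min < max")
    # ISO-like date strings written in the same format sort chronologically
    if not date_start < date_end:
        raise ValueError("starting date must come before ending date")
    return [lon_min, lon_max, lat_min, lat_max,
            pres_min, pres_max, date_start, date_end]


def fetch_region(box: List[BoxValue]):
    """Hypothetical helper: download the selection with argopy (needs network access)."""
    # Imported lazily so the validation above runs without argopy installed
    from argopy import DataFetcher
    return DataFetcher().region(box).to_xarray()


# Same values as in the Galaxy form
box = build_region_box(-75.0, -45.0, 20.0, 30.0, 0.0, 10.0, "2021-04", "2021-06")
print(box)
# ds = fetch_region(box)  # uncomment to actually download the data
```

The download call is left commented out because it depends on a network connection and on the Argo data servers being reachable.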
Argo data analysed, managed, and visualised by the Pangeo tools
Xarray tools
First, we’ll use two xarray netCDF tools. xarray, formerly known as xray, is a Python package for working with labelled, gridded data.
It builds on NumPy and offers most of its features, but in a more convenient manner, by keeping track of labels in arrays.
Gridded data are mainly available in the netCDF format, so xarray comes in very handy when dealing with netCDF files.
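To make the idea of "keeping track of labels" concrete, here is a small sketch comparing positional indexing in NumPy with label-based selection in xarray. The temperature values, dates and depths are synthetic and chosen only for illustration; they are not Argo measurements.

```python
import numpy as np
import xarray as xr

# Synthetic values for illustration only: 3 times x 2 depths
values = np.array([[26.1, 25.8],
                   [26.4, 26.0],
                   [26.9, 26.3]])

temp = xr.DataArray(
    values,
    dims=("time", "depth"),
    coords={
        "time": np.array(["2021-04-01", "2021-04-11", "2021-04-21"],
                         dtype="datetime64[ns]"),
        "depth": [5.0, 10.0],
    },
    name="temperature",
)

# NumPy: we must remember that axis 0 is time and axis 1 is depth
by_position = values[1, 0]

# xarray: we ask for the value by its labels instead
by_label = temp.sel(time=np.datetime64("2021-04-11"), depth=5.0).item()

# Reductions are also expressed with dimension names rather than axis numbers
mean_per_time = temp.mean(dim="depth")

print(by_position, by_label)
print(mean_per_time.values)
```

Both lookups return the same number, but the xarray version keeps working if the order of the dimensions changes.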
Knowing more about our data
After fetching the required Argo data, the next step is to obtain the metadata of the file.
The purpose of this step is to get information about the dimensions, variables, global attributes, etc. The coordinate information then helps us
see which values are actually present for the various variables.
Get metadata
Hands-on: NetCDF dataset with Xarray metadata Galaxy Tool
NetCDF xarray Metadata Info (Galaxy version 2022.3.0+galaxy0) with the following parameters:
“Netcdf file”: output_argo (output of Argo data access tool)
In the info output file, we can identify 4 different sections:
Dimensions: name of dimensions and corresponding number of elements;
Coordinates: contains coordinate arrays (longitude, latitude, level and time) with their values.
Data variables: contains all the variables available in the dataset. Here, we only have one variable. For each variable, we get information on its shape and values.
Global Attributes: at this level, we get the global attributes of the dataset. Each attribute has a name and a value.
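The four sections listed above map onto attributes of an xarray Dataset. The sketch below builds a tiny synthetic dataset and prints each section; the names `obs`, `temperature`, `latitude`, `longitude` and `time` are hypothetical and the values are made up, so the real Argo output will look different. How the Galaxy tool itself produces its report is not described in this tutorial.

```python
import numpy as np
import xarray as xr

# Tiny synthetic dataset (hypothetical names, made-up values)
ds = xr.Dataset(
    data_vars={"temperature": (("obs",), np.array([26.1, 26.4, 26.9]))},
    coords={
        "latitude": (("obs",), np.array([21.5, 22.0, 22.4])),
        "longitude": (("obs",), np.array([-60.2, -59.8, -59.1])),
        "time": (("obs",), np.array(["2021-04-01", "2021-04-11", "2021-04-21"],
                                    dtype="datetime64[ns]")),
    },
    attrs={"title": "synthetic example"},
)

print(dict(ds.sizes))      # Dimensions: names and number of elements
print(list(ds.coords))     # Coordinates: names of the coordinate arrays
print(list(ds.data_vars))  # Data variables: names of the variables
print(ds.attrs)            # Global attributes: name/value pairs

# The default text representation shows all four sections at once
print(ds)
```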
Coordinates information
Hands-on: Get Coordinate information with Xarray Coordinate
NetCDF xarray Coordinate Info (Galaxy version 2022.3.0+galaxy0) with the following parameters:
“Netcdf file”: output_argo (output of Argo data access tool)
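As a rough idea of what coordinate information looks like, the sketch below lists the values of every coordinate of a small synthetic dataset, one list per coordinate. This is only an illustration with xarray: `coordinate_tables` is a hypothetical helper, the names and values are made up, and the actual output format of the Galaxy tool may differ.

```python
from typing import Dict, List

import numpy as np
import xarray as xr

# Tiny synthetic dataset (hypothetical names, made-up values)
ds = xr.Dataset(
    data_vars={"temperature": (("obs",), np.array([26.1, 26.4, 26.9]))},
    coords={
        "latitude": (("obs",), np.array([21.5, 22.0, 22.4])),
        "longitude": (("obs",), np.array([-60.2, -59.8, -59.1])),
        "time": (("obs",), np.array(["2021-04-01", "2021-04-11", "2021-04-21"],
                                    dtype="datetime64[ns]")),
    },
)


def coordinate_tables(dataset: xr.Dataset) -> Dict[str, List[str]]:
    """Map each coordinate name to the list of its values, as strings."""
    return {
        str(name): [str(value) for value in coord.values.ravel()]
        for name, coord in dataset.coords.items()
    }


tables = coordinate_tables(ds)
for name, column in tables.items():
    # One tab-separated line per coordinate: name first, then its values
    print(name + "\t" + "\t".join(column))
```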
Visualisation and mapping with a Galaxy Earth System tool
The Earth System is a complex and dynamic system that encompasses the interactions between the atmosphere, oceans, land, and biosphere.
Understanding and analyzing data from the Earth System Model (ESM) is essential, for example, to predict and mitigate the impacts of climate change.
The ESM that the project tries to implement includes coastal water dynamics, ocean bio-geochemical in-situ data, marine omics observations, volcano activities and land degradation.
Ocean Data View (ODV) is a software package for the interactive exploration, analysis and visualization of oceanographic and other geo-referenced profile, time-series, trajectory or sequence data. To learn more about ODV, see the official page.
ODV is now integrated with Galaxy as an interactive tool. This kind of tool works differently than classical tools as it allows the user to interact with a dedicated graphical interface. Those tools are used to give access to Jupyter Notebooks, RStudio or R Shiny apps for example.
Hands-on: Launch ODV
ODV with the following parameters:
“Select if you are using a ODV collection in a zip folder or if you have your own raw data”: The data you are using are Netcdf or tabular text files
“Netcdf or tabular text file. For text file, odv format is recommended.”: timeseries_tabular (output of NetCDF timeseries Extractor tool)
Click on Run Tool
Go to User > Active InteractiveTools
Wait for the ODV to be running (Job Info)
Click on ODV
If at some point your ODV interface turns grey with a red panel at the top saying “X ODV - Disconnected”, do NOT panic ;) You just need to reload your tab (circular arrow at the top left).
You can expand the ODV left panel (where there are 3 dots, vertically) to access the “clipboard” menu, and paste the content you want to paste on an ODV form. From there you can copy-paste everything from one side to the other. Then, click outside of this panel to collapse it.
Awesome! You now know how to get Argo data, then get metadata and other information within the Pangeo ecosystem and finally visualise these data with an Earth System tool,
Ocean Data View.
Extra information
Even more tutorials and other Earth-System related trainings are coming soon. Keep an eye open if you are interested!
Hiltemann, Saskia, Rasche, Helena et al., 2023 Galaxy Training: A Powerful Framework for Teaching! PLOS Computational Biology 10.1371/journal.pcbi.1010752
Batut et al., 2018 Community-Driven Data Analysis Training for Biology Cell Systems 10.1016/j.cels.2018.05.012
@misc{climate-argo_pangeo,
author = "Marie Josse",
title = "Analyse Argo data (Galaxy Training Materials)",
year = "",
month = "",
day = "",
url = "\url{https://training.galaxyproject.org/training-material/topics/climate/tutorials/argo_pangeo/tutorial.html}",
note = "[Online; accessed Thu Feb 13 2025]"
}