name: inverse
layout: true
class: center, middle, inverse
---
# Intro to DataPLANT ARCs
Dominik Brilhaus
Cristina Martins Rodrigues
Kevin Frey
Martin Kuhl
Sabrina Zander
Stella Eggels
Saskia Hiltemann
PURL: gxy.io/GTN:S00126
Tip: press `P` to view the presenter notes | Use arrow keys to move between slides
???

Presenter notes contain extra information which might be useful if you intend to use these slides for teaching.

Press `P` again to switch presenter notes off.

Press `C` to create a new window where the same presentation will be displayed. This window is linked to the main window. Changing slides on one will cause the slide to change on the other. Useful when presenting.

---

# About DataPLANT

![DataPLANT: participate in a thriving PLANT data research community, document and publish your research data FAIR, ensure the reproducibility of your research](images/dataplant-infographic.png)

Towards democratization of plant research.

???

- DataPLANT is a consortium from the heart of the German plant research community.
- It aims to establish sustainable research data management (RDM) by providing both digital assistance, such as software or teaching material, and personal assistance, for example via on-site consultation or workshops.
- DataPLANT is committed to developing an RDM system that meets community requirements and facilitates the processing and contextualization of research datasets in accordance with the FAIR principles (Findable, Accessible, Interoperable, Reusable).

---

# About DataPLANT

.pull-left[
- DataPLANT’s mission is to lead the **digital transformation** in plant science by advancing from traditional publications to innovative data-driven formats like Annotated Research Contexts (ARCs).
- DataPLANT builds **user-friendly services** that simplify data annotation and metadata management for plant scientists. By leveraging existing IT infrastructure, it aims to make the process as seamless and efficient as possible.
]
.pull-right[
![logo of the word FAIR alongside a spreadsheet icon](images/dataplant-fair.png)
]

.footnote[[nfdi4plants.org](https://nfdi4plants.org/)]

???
- DataPLANT’s mission is to lead the digital transformation in plant science by advancing from traditional publications to innovative data-driven formats like Annotated Research Contexts (ARCs).
- DataPLANT builds user-friendly services that simplify data annotation and metadata management for plant scientists. By leveraging existing IT infrastructure, it aims to make the process as seamless and efficient as possible.
- You can read more about DataPLANT at nfdi4plants.org.

---

# Data Stewardship between DataPLANT and communities

![images showing the interplay between DataPLANT on one hand and research communities on the other](images/slides-dataplant-communities.png)

???

- DataPLANT works closely with various plant consortia and projects.
- DataPLANT acts as the service provider, with a team of technology experts and semantic specialists.
- DataPLANT supports communities through its tools, services and consultation.
- In turn, the communities provide feedback and contributions to DataPLANT.

---

# Annotated Research Context (ARC)

![concept of an ARC, experimental data, computation and annotation bundled together](images/arc-intro.png)

Your entire investigation in a single unified bag

???

- Annotated Research Contexts, or ARCs for short, provide a way to bundle your entire investigation in one unified place.
- ARCs can contain your experimental data and annotation, as well as your computational results and workflows.
- ARCs allow you to share your research in a FAIR and open way.

---

# What does an ARC look like?

![basic folder structure of an arc](images/slides-arc-structure.png)

???

- An ARC, at its core, is a structured folder of data.
- This structure is based on the ISA data model.
- ISA stands for investigation, study, assay.
- Every ARC represents an investigation and contains one or more studies, assays, workflows and runs at its root.
- We will focus mostly on studies and assays in this tutorial.
- This is where you put your experimental data, and where you usually start when creating your ARC.

---

# ARCs store experimental data

![arc folder structure highlighting the studies and assays folders as places for storing experimental data](images/slides-arc-structure-experimental.png)

???

- Studies contain information about the biological materials you used in your research: the plants you grew, but also the lab protocols and chemicals you used.
- Assays contain results and metadata about any measurements you performed.
- At the end of a measurement you either have another sample, for example in the case of an extraction, or you have data, for example for a sequencing assay.

---

# Computations can be run inside ARCs

![arc folder structure highlighting the workflows and runs folders for computational data](images/slides-arc-structure-computational.png)

???

- In the workflows folder you store any scripts or workflows used to analyze the data coming from your assays.
- By specifying CWL workflows, your bioinformatics analysis can be reproduced right inside the ARC.
- Any results from these analysis workflows are stored in the runs folder.

---

# ARCs come with comprehensive metadata

![arc structure with the metadata files highlighted](images/slides-arc-structure-metadata.png)

???

- In addition to raw data, ARCs also contain structured metadata.
- This metadata uses ontologies to describe your research.
- Metadata annotations are stored in so-called ISA files, which are kept as Excel workbooks in the ARC.
- There is investigation-level metadata in the ISA investigation file.
- Similarly, there are study-level and assay-level metadata files.
- On the investigation level, this is high-level information about your research: who you are, what your biological questions are, what the experimental design was, any related publications, etc.
- On the study level, you would describe things like your plant samples and how they were grown, harvested and cultured.
- On the assay level, you would describe the measurement.
- For example, for a sequencing assay you would describe the RNA or DNA extraction, the library preparation and the instruments used. Essentially, the aim is to capture the entire path of your samples in the lab.

---

# ARCs build on standards

.pull-left[
![arc structure highlighting places where standards such as isa, cwl and ro-crates come into play](images/slides-arc-standards.png)
]
.pull-right[
<br></br>

ARCs incorporate established standards:

- **RO-Crate:** standardized exchange
- **ISA:** structured, machine-readable metadata
- **CWL:** reproducible, re-usable data analysis
- **Git:** version control
- **Ontologies:** standardized metadata
]

???

- All of this builds on existing standards.
- ARCs are an implementation of RO-Crate.
- They use the ISA data model.
- CWL is used to describe data analysis.
- Git is used for version control.
- Ontologies are leveraged to standardize metadata.

---

# You can store ARCs in the DataHUB

![image of your local computer, connected to datahub for online storage and backup](images/arc-datahub.png)

???

- Typically, you start creating your ARC on your computer.
- But you can also store ARCs online in the DataHUB, thereby creating a backup of your research.
- You can make changes to your ARC locally on your computer, push it to the DataHUB, and from there sync it again, maybe to a different computer.

---

# ARCs are versioned

![image showing different versions of an arc on datahub](images/slides-arc-versioned.png)

???

- The DataHUB also provides version control for your ARC.
- This means you have a detailed log of how your ARC changed over time, and you can always go back to a previous version if needed.

---

# You can invite collaborators

![images showing different people having access to an arc](images/slides-arc-collaborate.png)

???

- By default, your ARC on the DataHUB is private to you.
- But you can invite other people to collaborate on your ARC by giving them access to it.
- These can be people from your own lab, or people from other institutes.

---

# Collaborate and Contribute

![images showing you having access to multiple ARCs](images/slides-arc-collaborate-contribute.png)

???

- You can contribute to multiple ARCs, that is, to multiple research projects.
- For example, if others invite you to collaborate, you can contribute to their ARC.
- Or, if you have multiple research projects of your own, you can have multiple ARCs on the DataHUB.

---

# Reuse data in ARCs

![image depicting parts of one arc being re-used in another](images/slides-arc-reuse.png)

???

- You can also reuse parts of other ARCs, so you don't always have to recreate things like scripts, protocols, assays, and other shared research components.

---

# Publish your ARC

![an arc being published and receiving a doi. arcs can be published to dataplant, or to third party repositories](images/slides-arc-publish.png)

???

- Once your ARC is complete and you are ready to release your work, you can publish your ARC.
- You will receive a DOI, a digital object identifier, for your ARC.
- DataPLANT is also creating converters for popular data repositories such as ENA, GEO, and NCBI.
- For example, if the editor of your journal requires you to deposit your data in one of these repositories, you can easily extract the data and information from your ARC in the appropriate format for that repository.

---

# Publish your ARC, get a DOI

![image showing an arc being referenced by doi in a manuscript](images/slides-arc-doi.png)

???

- The DOI you receive for your ARC can be referenced in your journal article, enabling readers to reuse your data and workflows.
- If you make changes to your ARC, you can publish a new version and receive a new DOI, while your original DOI will always point to the original version of your ARC.
---

# Moving from paper to data publications

![image showing move from classical publication to a more data-centric publishing model](images/slides-arc-data-publications.png)

???

- This approach allows us to move from classical publications to a more data-centric publication model.

---

# ARC ecosystem

![image depicting the circular RDM research cycle, with around its edge various dataplant tools and services](images/slides-arc-ecosystem.png)

???

- DataPLANT offers an entire ecosystem of tools and services around this concept, covering all phases of the research data management cycle.
- This ranges from writing your data management plan, to storing and describing your research data, to sharing and collaborating, and finally to publishing your research and making it findable and accessible to scientists worldwide.

---

## Thank You!

This material is the result of a collaborative work. Thanks to the [Galaxy Training Network](https://training.galaxyproject.org) and all the contributors!
Tutorial content is licensed under the Creative Commons Attribution 4.0 International License.