Towards Responsible Plant Data Linkage: Global Challenges for Food Security and Governance - Session 2: Technical Challenges of Data Linkage
An Alan Turing Institute & University of Exeter Workshop. Co-Hosted by Egenis and IDSAI
Making plant data FAIR (Findable, Accessible, Interoperable, Reusable) has been the subject of much effort. Extensive semantic tools are now available, including the multiple, intersecting ontologies that comprise the Planteome project, as are metadata standards such as the Minimum Information About a Plant Phenotyping Experiment (MIAPPE). Such tools nevertheless require collective work to develop and maintain. Beyond ensuring data themselves are FAIR, actively linking and circulating data poses further challenges. These include finding ways to link biologically, experimentally or geographically related yet heterogeneous datasets consistently, and to make data usable in practice to potential users with divergent aims and resources, not only reusable in theory. This session will address the technical challenges of data linkage, including the development of standards and infrastructures; epistemic issues; and the organizational requirements of this work.
| An Egenis, the Centre for the Study of Life Sciences workshop | |
|---|---|
| Date | 12 March 2021 |
| Time | 14:00 to 16:00 |
| Place | Online event. |
| Provider | Egenis, the Centre for the Study of Life Sciences |
| Registration information | Details to follow |
Event details
14:00
Introduction by organisers
14:05
Linking Legacies: Realising the Potential of Long-Term Agricultural Experiments
Richard Ostler (Rothamsted Research)
Long-term agricultural experiments (LTEs) are vital resources for assessing the sustainability of food production and soil health. For researchers to use a long-term experiment effectively, it is essential to have access to relevant historical data and the necessary metadata. In turn, new datasets generated from investigations using an LTE should be resolvable back to the source LTE as part of that experiment’s continuing narrative. Further value can be derived from LTEs if experiments sharing common characteristics, such as cropping system, treatment, management or environment, can be identified and their datasets integrated.
LTEs can generate very diverse data types, from annually collected yield traits and periodic or ad hoc surveys to continuous sensor data. To be usefully findable, interoperable and reusable, LTE datasets not only need to be described using community-accepted semantic and metadata standards but also require knowledge of how they relate to each other in time, space and scale, and of when they do not. Within a single experimental system, an LTE can therefore encapsulate key challenges facing plant data linkage, and these challenges are only amplified when attempting to link data across LTEs.
This presentation reviews the approach being taken at Rothamsted Research to apply FAIR data principles to its long-term datasets, and how Rothamsted is working with the wider agricultural data and long-term experiments communities to address some of the technical and cultural challenges faced.
14:30
Challenges to Data Linkage in Plants: Two Parables from the Pea
Gregory Radick (University of Leeds)
This chapter will draw upon the history of scientific studies of inheritance in Mendel's best-remembered model organism, the garden pea, as a source of two parables -- one pessimistic, the other optimistic -- on the challenges of data linkage in plants. The moral of the pessimistic parable, from the era of the biometrician-Mendelian controversy, is that the problem of theory-ladenness in data sets can be a major stumbling block to making new uses of old data. The moral of the optimistic parable, from the long-run history of studies at the John Innes Centre of aberrant or "rogue" pea varieties, is that an excellent guarantor of the continued value of old data sets is the preservation of the relevant physical materials -- in the first instance, the plant seeds.
14:55
From Farm to FAIR: The Trials of Linking and Sharing Wheat Research Data
Chris Rawlings (Rothamsted Research), Robert P. Davey (Earlham Institute) & the Designing Future Wheat Data Coordination Task Force
This paper describes progress towards an integrated data framework that supports the sharing of data from the Designing Future Wheat (DFW) strategic research programme funded by the UK BBSRC. DFW is a 5-year project (https://designingfuturewheat.org.uk/) that spans eight research institutes and universities and aims to develop new wheat varieties (germplasm) containing the next generation of key traits. Much of the research in DFW is contributing new wheat germplasm that is assessed in large-scale field trials at partner sites and also by a precompetitive consortium of wheat breeding companies. To complement the field trials, a large number of DFW research studies are collecting additional data that give a more detailed understanding of the trait of interest. Many of these projects make extensive use of genetic and genomic datasets that are being developed in DFW or are available through other national and international collaborations. Novel field-scale image-based phenotyping platforms are also being employed, which present new challenges through the volume of data they generate and the complexity of the analysis methods used to extract phenotypic data.
DFW is committed to making our data open to the wider research community by adopting FAIR data sharing approaches. It is also a good example of a data-intensive strategic research programme which follows a Field-to-Lab-to-Field approach that is representative of much contemporary and multidisciplinary crop science research. However, even with dedicated funding to develop crop data research infrastructures within DFW, we found that many challenges remain and that pragmatic and flexible approaches are needed to enable these infrastructures to interoperate. We present key DFW data resources as a case study to assess progress and discuss these challenges with a view to developing infrastructure that exposes metadata-rich datasets and that meets FAIR principles. We describe our approaches to: (1) reporting internally and to sponsors about research outputs; (2) federating institutional data and information resources; (3) improving and standardising the collected data and metadata for improved FAIRification; and (4) exposing data for reuse through the adoption and implementation of community standards across multiple data layers, using methods currently in development.
15:20
Plant Scientific Data Integration, From Building Community Standards to Defining a Consistent Data Lifecycle
Cyril Pommier (INRAE)
Applying the FAIR principles to plant research data drew partially on their use within other life science domains, especially for genomic data. But plant particularities, especially when dealing with plant-environment interactions such as phenotypes, needed some specialized answers. The plant communities from major global players, such as ELIXIR, EMPHASIS and the CGIAR, have therefore joined forces to build an ecosystem of data standards, with the Minimum Information About a Plant Phenotyping Experiment (MIAPPE, www.miappe.org) to handle the general data and metadata organization, the Multi-Crop Passport Descriptors (MCPD) for the identification of plant genetic resources, and the Crop Ontology (www.cropontology.org) for the documentation of the measurement methodology. The organization of the researcher communities and their collaborative methodologies made it possible to dramatically improve the usability of MIAPPE and its adoption. Building on that first success, the ELIXIR Plant Community described a general data lifecycle to identify the gaps and the developments needed. As a consequence, several actions have been identified, in particular providing tools to address the “first mile” of data publishing, i.e. the gathering and documentation of data, including automated metadata capture. The current paper will therefore describe some of the existing tools, as well as their adoption of plant standards. Finally, we will also describe how different standards tend to converge to address common needs.
15:45
Final discussion and wrap-up


