The GHGA Metadata Model

The goal of the GHGA Metadata Model is to help our communities richly describe their submitted genomic data and efficiently retrieve data of interest. To achieve this, we focus on making the metadata Findable, Accessible, Interoperable and Reusable (FAIR) by utilising established and widely used ontologies and vocabularies. All ontologies and vocabularies are evaluated on their maintenance and on the granularity of their content using https://fairsharing.org.

Our metadata model is implemented in the Linked Data Modelling Language (LinkML) and is openly accessible on the GHGA GitHub Repository. There, one can track the changes made to the model and access the YAML schema and the autogenerated Excel submission sheets. The GHGA User Documentation describes the GHGA Metadata Model and its tools in detail, including the model itself and the underlying concepts and standards.

The model classes ensure interoperability and provide flexibility to capture different experiment and analysis types. The classes are grouped based on their functionality:

The Research Metadata focus on the reusability and FAIRness of the data, while the Administrative Metadata capture the conditions and the committee that govern data access.

Research Metadata

  • Individual, Biospecimen and Sample collect metadata about the samples that were used in the experiment and their source. This includes non-personal data about the individual from whom the samples were collected and details about the biospecimen from which samples were generated.
  • Experiment and Experiment Method describe the experimental setup or protocol that was followed to perform the omics experiment. The experiment execution generates research data files.
  • Analysis and Analysis Method capture details about the analysis process or workflow that was used to generate output files (processed files) from research data files.

Administrative Metadata

  • Dataset is a container for files. The files in one dataset form a shareable unit that is controlled by a Data Access Policy.
  • Data Access Policy and Data Access Committee capture information about the policy under which data access and reuse are granted, and who governs the policy and answers data access requests.
  • Study and Publication describe why the data was collected and where associated papers are published.
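
The linkage between the administrative classes above can be pictured as records that point to one another. The following is a minimal sketch in Python, not the actual LinkML schema: the field names (alias, policy, committee, files), the helper committee_for and the example values are all hypothetical and chosen only to illustrate that a dataset is shared under one policy, which in turn is governed by one committee.

```python
from dataclasses import dataclass, field


# NOTE: illustrative only. Field names and values are assumptions,
# not the slots defined in the GHGA LinkML schema.
@dataclass
class DataAccessCommittee:
    """Body that governs a policy and answers data access requests."""
    alias: str


@dataclass
class DataAccessPolicy:
    """Policy under which data access and reuse are granted."""
    alias: str
    committee: DataAccessCommittee


@dataclass
class Dataset:
    """Container for files, shared as one unit under a single policy."""
    alias: str
    policy: DataAccessPolicy
    files: list[str] = field(default_factory=list)


def committee_for(dataset: Dataset) -> str:
    """Follow Dataset -> Data Access Policy -> Data Access Committee."""
    return dataset.policy.committee.alias


dac = DataAccessCommittee(alias="DAC_EXAMPLE")
policy = DataAccessPolicy(alias="POLICY_EXAMPLE", committee=dac)
dataset = Dataset(
    alias="DATASET_EXAMPLE",
    policy=policy,
    files=["sample_1.fastq.gz"],
)

# Resolve which committee answers access requests for this dataset.
print(committee_for(dataset))
```

Because every file sits in a dataset and every dataset references exactly one policy in this sketch, the responsible committee for any file can be found by following two links.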

 

The data can be submitted using GHGA's submission spreadsheet, which reflects the GHGA Metadata Model and enables the GHGA Data Portal to display valuable information about a submitted dataset by linking all categories. Richly described metadata will help to promote data submitters' datasets and encourage the community to reuse the data.