Coastal Carbon Database Structure

Naming Conventions for Attributes and Variables (Version 1)

3 July 2018

Overview
- Development Process to Date
- Ongoing and Future Development
Study Level Metadata
Materials and Methods
Site Level
Core Level
Soil Depth Series
Multiple Special Conditions at the Level of the Site or Core
- Dominant Species Present
- Anthropogenic Impacts Present
Submitter Defined Attributes and Definitions
- Study Level Species Table
- Other Attributes and Variables

Overview

This page provides guidance on the types and scope of data and metadata that will be archived as part of the Network’s developing tidal soil carbon synthesis. We propose the following data structure and standardized attribute names for metadata and data in order to make datasets machine-readable and interoperable. Each subheading covers one level of the metadata or data hierarchy, from study level metadata to site level to core level to depth series information. Each subheading also corresponds to a separate table, and the tables can be joined on shared attributes such as study_id, site_id, and core_id. We also include recommended controlled vocabularies for key categorical variables (also known as factors). Some attributes have controlled units that we wish to keep uniform across datasets. Data that we curate will follow the naming conventions outlined here. Data that we ingest from outside sources will be converted to these conventions by custom-built R scripts as it is added to the central GitHub database.
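As a rough illustration of how tables at different levels can be joined on shared attributes, the sketch below attaches core-level attributes to depth series rows using study_id and core_id. It is written in Python for illustration only and is not one of the R ingestion scripts; the row values and the helper name join_on are made up for the example.

```python
# Illustrative rows only: values are invented and each table is cut down
# to a handful of attributes.
cores = [
    {"study_id": "Example_et_al_2018", "site_id": "site_a", "core_id": "core_1"},
    {"study_id": "Example_et_al_2018", "site_id": "site_b", "core_id": "core_2"},
]
depth_series = [
    {"study_id": "Example_et_al_2018", "core_id": "core_1", "depth_min": 0, "depth_max": 10},
    {"study_id": "Example_et_al_2018", "core_id": "core_1", "depth_min": 10, "depth_max": 20},
    {"study_id": "Example_et_al_2018", "core_id": "core_2", "depth_min": 0, "depth_max": 5},
]


def join_on(left, right, keys):
    """Inner-join two lists of dict rows on the shared key attributes."""
    index = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    joined = []
    for row in left:
        for match in index.get(tuple(row[k] for k in keys), []):
            # Attributes from the left row win if both rows share a name.
            joined.append({**match, **row})
    return joined


# Each depth interval now also carries the site_id of its core.
joined = join_on(depth_series, cores, ["study_id", "core_id"])
for row in joined:
    print(row["site_id"], row["core_id"], row["depth_min"], row["depth_max"])
```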

At a minimum, a submission should include the following to be included in soil carbon synthesis products: study_id, author information, core_id, latitude and longitude information associated with either a core or the site, depth_min, depth_max, dry_bulk_density, and organic_matter_fraction and/or carbon_fraction. The more auxiliary detail that you provide, the more widely your data can be used. Throughout the tables below, mandatory attributes are shown in bold.
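The minimum requirements above could be checked with something like the following sketch. It assumes each depth series row has already been joined to its core or site position, and it assumes the position attributes are named latitude and longitude, which this section does not confirm. Author information lives in a separate table and is not checked here.

```python
ALWAYS_REQUIRED = ["study_id", "core_id", "depth_min", "depth_max", "dry_bulk_density"]
# At least one of these two must be present.
CARBON_ATTRIBUTES = ["organic_matter_fraction", "carbon_fraction"]
# Assumed attribute names; the position may come from the core or the site.
POSITION_ATTRIBUTES = ["latitude", "longitude"]


def is_present(row, name):
    """Treat None and empty strings as missing; zero is a valid value."""
    value = row.get(name)
    return value is not None and value != ""


def minimum_problems(row):
    """Return a list of human-readable problems; empty means the row passes."""
    problems = [f"missing {name}" for name in ALWAYS_REQUIRED if not is_present(row, name)]
    if not any(is_present(row, name) for name in CARBON_ATTRIBUTES):
        problems.append("missing both organic_matter_fraction and carbon_fraction")
    if not all(is_present(row, name) for name in POSITION_ATTRIBUTES):
        problems.append("missing latitude/longitude")
    return problems


# Invented example row, for illustration only.
row = {
    "study_id": "Example_et_al_2018", "core_id": "core_1",
    "latitude": 38.9, "longitude": -76.5,
    "depth_min": 0, "depth_max": 10,
    "dry_bulk_density": 0.3, "carbon_fraction": 0.12,
}
print(minimum_problems(row))
```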

The depth series is the level at which carbon-relevant information is housed. This synthesis will not ingest core-level or site-level averages of variables like dry bulk density, fraction organic matter, or fraction carbon. These averages can be derived from the database, but are not immediately useful to our research questions unless those averages can be traced back to their original data.
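To illustrate how a core-level average can be derived from depth series rows, here is a minimal sketch that weights each interval by its thickness (depth_max minus depth_min). The weighting scheme is an assumption made for the example, not a method prescribed by this guidance, and the values are invented.

```python
from collections import defaultdict


def thickness_weighted_mean(rows, attribute):
    """Average an attribute per core_id, weighting each interval by its thickness."""
    totals = defaultdict(lambda: [0.0, 0.0])  # core_id -> [weighted sum, total thickness]
    for row in rows:
        thickness = row["depth_max"] - row["depth_min"]
        totals[row["core_id"]][0] += row[attribute] * thickness
        totals[row["core_id"]][1] += thickness
    return {
        core_id: weighted_sum / total
        for core_id, (weighted_sum, total) in totals.items()
        if total > 0
    }


# Invented depth series rows for a single core.
depth_series = [
    {"core_id": "core_1", "depth_min": 0, "depth_max": 10, "dry_bulk_density": 0.2},
    {"core_id": "core_1", "depth_min": 10, "depth_max": 30, "dry_bulk_density": 0.5},
]
means = thickness_weighted_mean(depth_series, "dry_bulk_density")
print(means)
```

Because the average is computed from the stored intervals, it stays traceable to the original observations.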

There are many opportunities to express your data’s individuality. We refer throughout to ‘flags’ and ‘notes’. Flags refer to common methodological choices or data issues that can be coded using categorical variables. The idea behind flags is to allow users the option to query datasets based on methodology. Flags are very machine-readable but not very flexible from the standpoint of a submitter. Notes are available for almost all measured attributes and take the form of free-text allowing submitters to provide context, observations, or concerns about methods, sites, cores, or observations. These are more flexible from the perspective of a submitter but are less machine-readable.
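The difference between flags and notes can be sketched as follows. The attribute names example_method_flag and example_method_notes and the codes method_a and method_b are hypothetical placeholders, not part of the controlled vocabulary; the point is only that a flag can be filtered on exactly, while a note is free text.

```python
# Hypothetical attribute names and codes, for illustration only.
rows = [
    {"core_id": "core_1", "example_method_flag": "method_a",
     "example_method_notes": "Top 2 cm disturbed during extraction."},
    {"core_id": "core_2", "example_method_flag": "method_b",
     "example_method_notes": ""},
    {"core_id": "core_3", "example_method_flag": "method_a",
     "example_method_notes": ""},
]


def select_by_flag(rows, flag_name, allowed_codes):
    """Keep rows whose flag value is one of the allowed categorical codes."""
    return [row for row in rows if row.get(flag_name) in allowed_codes]


selected = select_by_flag(rows, "example_method_flag", {"method_a"})
print([row["core_id"] for row in selected])
```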

Study Information
attribute name	definition	data type	format, unit or codes
study_id	Unique identifier for the study made up of the first author’s family name, as well as the second author’s family name or ‘et al.’ if more than three, then publication year separated by underscores.	character
one_liner	If this is data the CCRCN curates, the submitter should include a one line description of the study.	character
study_code	If this is data the CCRCN curates, the study will be assigned a 128-bit universally unique identifier (UUID). Submitters should only include this if it already exists for the data. Otherwise, CCRCN personnel will generate it as part of the data ingestion process.	character
study_start_date	Study start date.	Date	YYYY-MM-DD
study_end_date	Study end date.	Date	YYYY-MM-DD
title	If this is data the CCRCN curates, the submitter should include a study title. If this is data the CCRCN is ingesting, this can be pulled from the metadata or source text.	character
abstract	If this is data the CCRCN curates, the submitter should include a one paragraph description of the study. If this is data the CCRCN is ingesting, this can be pulled from the metadata or source text.	character
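The sketch below shows one way the study_id, date, and study_code conventions in this table might be applied. Several points are assumptions: the study_id rule is read as "first author only" for one author, "first and second author" for two or three authors, and "et_al" for more than three; the UUID is generated as version 4, although the table only specifies 128 bits; and the function names are invented for the example.

```python
import re
import uuid
from datetime import datetime


def build_study_id(family_names, year):
    """Build a study_id under one reading of the naming rule (see assumptions above)."""
    if not family_names:
        raise ValueError("at least one author family name is required")
    parts = [family_names[0]]
    if len(family_names) > 3:
        parts.append("et_al")
    elif len(family_names) >= 2:
        parts.append(family_names[1])
    parts.append(str(year))
    return "_".join(parts)


def parse_study_date(text):
    """Accept only zero-padded YYYY-MM-DD and return a date object."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", text):
        raise ValueError(f"expected YYYY-MM-DD, got {text!r}")
    return datetime.strptime(text, "%Y-%m-%d").date()


def new_study_code():
    """Generate a 128-bit identifier; version 4 is an assumption."""
    return str(uuid.uuid4())


# Placeholder author names.
print(build_study_id(["Smith", "Jones", "Lee", "Park"], 2018))  # Smith_et_al_2018
print(parse_study_date("2018-07-03"))
print(new_study_code())
```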

Keywords
attribute name	definition	data type	format, unit or codes
study_id	Unique identifier for the study made up of the first author’s family name, as well as the second author’s family name or ‘et al.’ if more than three, then publication year separated by underscores.	character
key_words	If this is data the CCRCN curates, the submitter should include five to fifteen descriptive words or phrases describing the study. If this is data the CCRCN is ingesting, this can be pulled from the metadata or source. Keywords help build some search functionality into the databases.	character

Authors
attribute name	definition	data type	format, unit or codes
study_id	Unique identifier for the study made up of the first author’s family name, as well as the second author’s family name or ‘et al.’ if more than three, then publication year separated by underscores.	character
last_name	Submitter’s family name.	character
given_name	Submitter’s first name, middle name, middle initial, or any other names.	character
institution	Submitter’s current institution.	character
email	Submitter’s current email address.	character
address	Submitter’s current mailing address.	character
phone	Submitter’s current phone number.	character
corresponding_author	TRUE or FALSE indicating whether the author should be contacted as the corresponding author.	factor	TRUE = The author should be contacted with any further questions. FALSE = The author should not be contacted with any further questions.
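As an illustration of reading an authors table and interpreting the corresponding_author factor, the sketch below parses tab-separated text and converts TRUE and FALSE to booleans. The tab-separated layout is an assumption about how a submission might be stored, and the names and study_id are placeholders.

```python
import csv
import io

# Placeholder rows; only a few of the Authors attributes are shown.
AUTHORS_TSV = (
    "study_id\tlast_name\tgiven_name\tcorresponding_author\n"
    "Example_et_al_2018\tSmith\tAlex J.\tTRUE\n"
    "Example_et_al_2018\tJones\tRobin\tFALSE\n"
)


def parse_true_false(text):
    """Map the two allowed codes to booleans and reject anything else."""
    if text == "TRUE":
        return True
    if text == "FALSE":
        return False
    raise ValueError(f"corresponding_author must be TRUE or FALSE, got {text!r}")


def read_authors(tsv_text):
    """Read author rows and convert corresponding_author to a boolean."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = []
    for row in reader:
        row["corresponding_author"] = parse_true_false(row["corresponding_author"])
        rows.append(row)
    return rows


authors = read_authors(AUTHORS_TSV)
contacts = [a["last_name"] for a in authors if a["corresponding_author"]]
print(contacts)
```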

Funding
attribute name	definition	data type
study_id	Unique identifier for the study made up of the first author’s family name, as well as the second author’s family name or ‘et al.’ if more than three, then publication year separated by underscores.	character
funding_agency	Agency name funding the research, spelled out, no acronyms.	character
funding_id	Code used by the agency to track the project funding.	character
funding_notes	Any other submitter-generated notes about the project funding.	character
