Naming Conventions for Attributes and Variables (Version 1)
3 July 2018
Overview
This page serves as guidance for the types and scope of data and metadata that will be archived as part of the Network’s developing tidal soil carbon synthesis. We propose the following data structure and standardized attribute names for metadata and data in order to make datasets machine-readable and interoperable. Each subheading lists a level of metadata or data hierarchy from study level metadata to site level to core level to depth series information. Each subheading also represents separate tables which can be joined by common attributes such as study_id, site_id, and core_id. We also include accompanying sets of recommended controlled vocabulary for key categorical variables (also known as factors). Some attributes have controlled units that we wish to keep uniform across datasets. Data that we curate will follow naming conventions outlined herein. Data that we ingest from outside sources will be converted to these conventions when being ingested into the central GitHub database using custom-built R scripts.
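
As a minimal sketch of how the separate tables relate, the following Python example joins depth series, core, and site records on the shared attributes study_id, site_id, and core_id. The helper `join_on` and all record values are hypothetical and for illustration only; the sketch also assumes that depth series rows carry site_id. It is not the CCRCN ingestion code, which is written in R.

```python
def join_on(left, right, keys):
    """Left-join two lists of dict records on the given key attributes."""
    index = {tuple(row[k] for k in keys): row for row in right}
    joined = []
    for row in left:
        match = index.get(tuple(row[k] for k in keys), {})
        # Values from the left-hand table win if an attribute appears in both.
        joined.append({**match, **row})
    return joined


# Hypothetical records; every value below is made up for illustration.
sites = [
    {"study_id": "Doe_et_al_2018", "site_id": "marsh_a", "salinity_class": "brackish"},
]
cores = [
    {"study_id": "Doe_et_al_2018", "site_id": "marsh_a", "core_id": "c1",
     "core_latitude": 38.1, "core_longitude": -122.0},
]
depths = [
    {"study_id": "Doe_et_al_2018", "site_id": "marsh_a", "core_id": "c1",
     "depth_min": 0, "depth_max": 2, "dry_bulk_density": 0.4},
    {"study_id": "Doe_et_al_2018", "site_id": "marsh_a", "core_id": "c1",
     "depth_min": 2, "depth_max": 4, "dry_bulk_density": 0.5},
]

# Attach core-level, then site-level, attributes to every depth interval.
with_cores = join_on(depths, cores, ["study_id", "site_id", "core_id"])
full = join_on(with_cores, sites, ["study_id", "site_id"])
```

Each row of `full` then carries its depth interval together with the core position and the site-level salinity class.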
At a minimum a submission should have the following for inclusion in soil carbon synthesis products: study_id, author information, core_id, latitude and longitude information associated with either a core or the site, depth_min, depth_max, dry_bulk_density, organic_matter_fraction and/or carbon_fraction. The more auxiliary detail that you provide, the more widely your data can be used. Throughout the tables below mandatory attributes are shown in bold.
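
The minimum requirements above could be checked with something like the following sketch. It assumes one flat record per depth interval that already has core-level and site-level attributes joined in; the function name is hypothetical, and author information is not checked here because it lives in a separate table.

```python
REQUIRED = ("study_id", "core_id", "depth_min", "depth_max", "dry_bulk_density")
SITE_BOX = ("site_latitude_max", "site_latitude_min",
            "site_longitude_max", "site_longitude_min")


def missing_minimum_attributes(row):
    """Return the minimum attributes that a joined depth-interval record lacks."""
    def present(name):
        # A value of 0 (e.g. depth_min = 0) counts as present.
        return row.get(name) not in (None, "")

    missing = [name for name in REQUIRED if not present(name)]
    # Either organic matter fraction or carbon fraction (or both) is needed.
    if not (present("organic_matter_fraction") or present("carbon_fraction")):
        missing.append("organic_matter_fraction and/or carbon_fraction")
    # Position may come from the core or from the site bounding box.
    has_core_position = present("core_latitude") and present("core_longitude")
    has_site_position = all(present(name) for name in SITE_BOX)
    if not (has_core_position or has_site_position):
        missing.append("core or site latitude and longitude")
    return missing


# Hypothetical record with made-up values.
example = {
    "study_id": "Doe_et_al_2018", "core_id": "c1",
    "core_latitude": 38.1, "core_longitude": -122.0,
    "depth_min": 0, "depth_max": 2,
    "dry_bulk_density": 0.4, "organic_matter_fraction": 0.25,
}
problems = missing_minimum_attributes(example)
```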
The depth series is the level at which carbon-relevant information is housed. This synthesis will not ingest core-level or site-level averages of variables like dry bulk density, fraction organic matter, or fraction carbon. These averages can be derived from the database, but are not immediately useful to our research questions unless those averages can be traced back to their original data.
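
To illustrate why depth series data are sufficient, here is a minimal sketch of deriving a core-level average from depth intervals. It uses a thickness-weighted mean, which is one reasonable choice rather than a method prescribed by this guidance; the function name and the values are hypothetical.

```python
def thickness_weighted_mean(intervals, attribute):
    """Average an attribute over depth intervals, weighting by interval thickness."""
    total_thickness = 0.0
    weighted_sum = 0.0
    for row in intervals:
        thickness = row["depth_max"] - row["depth_min"]
        weighted_sum += row[attribute] * thickness
        total_thickness += thickness
    if total_thickness == 0:
        raise ValueError("intervals have no thickness")
    return weighted_sum / total_thickness


# Hypothetical depth series for one core; values are made up.
core_intervals = [
    {"depth_min": 0, "depth_max": 2, "dry_bulk_density": 0.2},
    {"depth_min": 2, "depth_max": 6, "dry_bulk_density": 0.5},
]
mean_dbd = thickness_weighted_mean(core_intervals, "dry_bulk_density")
```

Because the average is computed from the intervals, it stays traceable to the original measurements.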
There are many opportunities to express your data’s individuality. We refer throughout to ‘flags’ and ‘notes’. Flags refer to common methodological choices or data issues that can be coded using categorical variables. The idea behind flags is to allow users the option to query datasets based on methodology. Flags are very machine-readable but not very flexible from the standpoint of a submitter. Notes are available for almost all measured attributes and take the form of free-text allowing submitters to provide context, observations, or concerns about methods, sites, cores, or observations. These are more flexible from the perspective of a submitter but are less machine-readable.
Development Process to Date
This guidance is the culmination of three efforts:
- A meeting of 47 experts in Menlo Park, CA in January 2016, hosted by the United States Carbon Cycle Science Program, in order to establish community priorities.
- Experience with the initial curation of a dataset of ~1,500 public soil cores as part of the publication Holmquist et al., 2018, “Accuracy and Precision of Tidal Wetland Soil Carbon Mapping in the Conterminous United States”.
- The results of 19 collaborators submitting commentary on an initial draft of these recommendations put up for public comment in April and May 2018.
Ongoing and Future Development
We acknowledge that this is a lot of information to process and do not want to imply that the more than 100 attributes are all mandatory. They are not. While the entire entry template is available for download, we are also in the process of designing an application which will generate a custom submission template based on your answers to a questionnaire about your dataset.
Submitters can feel free to add other attributes to data submissions as long as the attributes and any associated categorical variables are defined with the submission. CCRCN personnel will accept and archive related soils data within reason, but will not be able to quality control data falling outside the outlined guidance. If attributes or variables are submitted often and there is community coordination behind their inclusion, they could be integrated into periodic updates to this guidance.
We anticipate that this guidance will evolve as we synthesize new datasets as part of five working groups. Part of each working group’s task will be to revisit this guidance and agree on new needed attribute names, definitions, variables, controlled vocabulary and units. Any further guidance based on the working group’s experience will be made available to the community via post-workshop reports and peer-reviewed publications. Documentation on any changes to the data management plan and submission templates will be issued with version numbers. CCRCN products will reference these documents and version numbers. We will avoid changing attribute or variable names, and will only do so if there is a compelling reason to. If in the future there ends up being more than one acceptable redundant attribute or variable name, names will be added to a database of synonyms and working synthesis products will be updated given the most current standards.
Study Level Metadata
Study-level information is essential for formatting the Ecological Metadata Language, and is a great way for you to express your project’s history, context, and originality.
Study Information
Please provide some custom text for your study.
attribute name | definition | data type | format, unit or codes |
---|---|---|---|
study_id | Unique identifier for the study made up of the first author’s family name, as well as the second author’s family name or ‘et al.’ if more than three, then publication year separated by underscores. | character | |
one_liner | If this is data the CCRCN curates, the submitter should include a one line description of the study. | character | |
study_code | If this is data the CCRCN curates, the study will be assigned a 128-bit universal unique identifier. Submitters should only include this if it already exists for the data. Otherwise CCRCN personnel will generate this as part of the data ingestion process. | character | |
study_start_date | Study start date. | Date | YYYY-MM-DD |
study_end_date | Study end date. | Date | YYYY-MM-DD |
title | If this is data the CCRCN curates, the submitter should include a study title. If this is data the CCRCN is ingesting, this can be pulled from the metadata or source text. | character | |
abstract | If this is data the CCRCN curates, the submitter should include a one paragraph description of the study. If this is data the CCRCN is ingesting, this can be pulled from the metadata or source text. | character |
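
The study_id convention can be sketched as a small helper. The definition above leaves the exact author-count threshold open, so this sketch makes an explicit assumption: one author gives `Family_Year`, two authors give `First_Second_Year`, and three or more give `First_et_al_Year`. The function name and the example names are hypothetical.

```python
def make_study_id(family_names, year):
    """Build a study_id from author family names and the publication year.

    Assumption (not confirmed by the guidance): 'et al.' is used whenever
    there are three or more authors, and it is written as 'et_al'.
    """
    if not family_names:
        raise ValueError("at least one author family name is required")
    if len(family_names) <= 2:
        parts = list(family_names)
    else:
        parts = [family_names[0], "et_al"]
    parts.append(str(year))
    return "_".join(parts)


# Hypothetical author lists.
single = make_study_id(["Doe"], 2018)
pair = make_study_id(["Doe", "Roe"], 2017)
many = make_study_id(["Doe", "Roe", "Poe"], 2018)
```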
Keywords
Keywords are not necessary, but can help make your data more searchable in a database.
attribute name | definition | data type | format, unit or codes |
---|---|---|---|
study_id | Unique identifier for the study made up of the first author’s family name, as well as the second author’s family name or ‘et al.’ if more than three, then publication year separated by underscores. | character | |
key_words | If this is data the CCRCN curates, the submitter should include five to fifteen descriptive words or phrases describing the study. If this is data the CCRCN is ingesting, this can be pulled from the metadata or source. Keywords help build some search functionality into the databases. | character |
Authors
For each dataset at least one corresponding author should be specified. Specifying author names will allow users (or you in the future) to query the dataset and see how many cores you’ve submitted.
attribute name | definition | data type | format, unit or codes |
---|---|---|---|
study_id | Unique identifier for the study made up of the first author’s family name, as well as the second author’s family name or ‘et al.’ if more than three, then publication year separated by underscores. | character | |
last_name | Submitter’s family name. | character | |
given_name | Submitter’s first name, middle name, middle initial, or any other names. | character | |
institution | Submitter’s current institution. | character | |
email | Submitter’s current email address. | character | |
address | Submitter’s current mailing address. | character | |
phone | Submitter’s current phone number. | character | |
corresponding_author | TRUE or FALSE indicating whether the author should be contacted as the corresponding author. | factor | TRUE = The author should be contacted with any further questions. FALSE = The author should not be contacted with any further questions. |
Funding Sources
Your funders will love being acknowledged in a data release, and will appreciate being searchable in the database. One dataset can have multiple funding sources.
attribute name | definition | data type | format, unit or codes |
---|---|---|---|
study_id | Unique identifier for the study made up of the first author’s family name, as well as the second author’s family name or ‘et al.’ if more than three, then publication year separated by underscores. | character | |
funding_agency | Agency name funding the research, spelled out, no acronyms. | character | |
funding_id | Code used by the agency to track the project funding. | character | |
funding_notes | Any other submitter-generated notes about the project funding. | character |
Associated Publications
One dataset can be affiliated with multiple publications. This allows an original work to be cited as a primary source, as well as any secondary or synthesis papers that added value to the dataset’s archival. Submitters can simply add a bibtex style citation, such as one copied over from Google Scholar, or they can fill out all of the relevant attributes for the data release. It’s all the same to us. Much of this guidance came from the Wikipedia page for BibTeX.
attribute name | definition | data type | format, unit or codes |
---|---|---|---|
study_id | Unique identifier for the study made up of the first author’s family name, as well as the second author’s family name or ‘et al.’ if more than three, then publication year separated by underscores. | character | |
bibtex_citation | Submitters can associate multiple BibTeX style citations with the dataset. They can also include the same information by filling out following attributes in tabular form if more convenient than BibTeX formatting. | character | |
publication_type | Code indicating the type of publication the study originates from. | factor | article = Journal article. book = Book. mastersthesis = Master’s Thesis. misc = Miscellaneous publications such as online datasets. phdthesis = PhD thesis or dissertation. techreport = Technical report. unpublished = Unpublished source. |
author | The names of the authors, separated by “and”. | character | |
year | The year of publication or, if unpublished, the year of creation. | Date | YYYY |
title | The title of the work. | character | |
journal | The journal or magazine the work was published in. | character | |
volume | The volume of a journal or multi-volume book. | character | |
number | The “(issue) number” of a journal, magazine, or tech-report, if applicable. (Most publications have a “volume”, but no “number” field.) | character | |
pages | Page numbers, separated either by commas or double-hyphens. | character | |
url | Permanent web address where the work can be located. | character | |
doi | Digital object identifier associated with the work. | character | |
address | Publisher’s address (usually just the city, but can be the full address for lesser-known publishers). | character | |
annote | An annotation for annotated bibliography styles (not typical). | character | |
booktitle | The title of the book, if only part of it is being cited. | character | |
chapter | The chapter number. | character | |
crossref | The key of the cross-referenced entry. | character | |
edition | The edition of a book, long form (such as “First” or “Second”). | character | |
editor | The name(s) of the editor(s). | character | |
howpublished | How it was published, if the publishing method is nonstandard. | character | |
institution | The institution that was involved in the publishing, but not necessarily the publisher. | character | |
key | A hidden field used for specifying or overriding the alphabetical order of entries (when the “author” and “editor” fields are missing). Note that this is very different from the citation key that is used to cite or cross-reference the entry. | character | |
month | The month of publication (or, if unpublished, the month of creation). | Date | MM |
note | Miscellaneous extra information. | character | |
organization | The conference sponsor. | character | |
publisher | The publisher’s name. | character | |
school | The school where the thesis was written. | character | |
series | The series of books the book was published in (e.g. “The Hardy Boys” or “Lecture Notes in Computer Science”). | character | |
type | The field overriding the default type of publication (e.g. “Research Note” for techreport, “{PhD} dissertation” for phdthesis, “Section” for inbook/incollection). | character |
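
Since the tabular attributes mirror BibTeX fields, a row of this table can be turned into a BibTeX-style entry. The following is a minimal sketch under stated assumptions: the function name, the citation key, and the publication values are hypothetical, and no escaping of special characters is attempted.

```python
NON_BIBTEX = ("study_id", "publication_type", "bibtex_citation")


def to_bibtex(citation_key, row):
    """Format one publication record (a dict) as a BibTeX entry string."""
    lines = ["@%s{%s," % (row["publication_type"], citation_key)]
    for name, value in row.items():
        # Skip bookkeeping attributes and empty values.
        if name in NON_BIBTEX or value in (None, ""):
            continue
        lines.append("  %s = {%s}," % (name, value))
    lines.append("}")
    return "\n".join(lines)


# Hypothetical publication record; all values are made up.
publication = {
    "study_id": "Doe_Roe_2018",
    "publication_type": "article",
    "author": "Doe, Jane and Roe, Richard",
    "year": "2018",
    "title": "A hypothetical study",
    "journal": "Journal of Examples",
    "volume": "",
}
entry = to_bibtex("doe2018", publication)
```

Submitters who already have a BibTeX citation can skip this and paste it into bibtex_citation directly.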
Materials and Methods
For each study please fill out key data regarding materials and methods that are important to the soil carbon stocks meta-analysis. Some users may want to include or exclude certain methodologies, or see your commentary on the methods. Let’s make it easy for them.
attribute name | definition | data type | format, unit or codes |
---|---|---|---|
study_id | Unique identifier for the study made up of the first author’s family name, as well as the second author’s family name or ‘et al.’ if more than three, then publication year separated by underscores. | character | |
coring_method | Code indicating what type of device was used to collect soil depth profiles. | factor | gouge auger = A half cylinder coring device in which the coring section is open, not sealed off by a fin. hargas corer = A large diameter (>10 cm) coring device consisting of a tube, piston, and a cutting head. mcauley corer = A half cylinder coring device with the coring section sealed off by a fin attached to a rotating pivot point. mccaffrey peat cutter = U-shaped blade that extracts a core by cutting down through peat. none specified = No coring device was specified. other shallow corer = Any other type of coring device typically taking cores shallower than 30 centimeters. piston corer = A device that extrudes core into tube upward with a plunger. push core = Any number of coring types involving driving a tube into the sediment to recover a core. pvc and hammer = PVC pipe was driven into the sediment with a hammer to recover a core. russian corer = A half cylinder coring device with the coring section sealed off by a fin attached to a rotating pivot point. vibracore = A technique involving collecting a core by sinking a continuous pipe into sediment with an attached source of vibration, then recovering it using a winch and pulley. surface sample = A technique involving collecting a core shallower than ~5 cm using a circular metal cutter. |
roots_flag | Code indicating whether live roots were included or excluded from carbon assessments. | factor | roots and rhizomes included = Roots and rhizomes were included in dry bulk density and/or organic matter and carbon measurements. roots and rhizomes separated = Roots and rhizomes were separated from soil before dry bulk density and/or organic matter and carbon measurements. |
sediment_sieved_flag | Code indicating whether or not sediment was sieved prior to carbon measurements. | factor | sediment sieved = Sediment was sieved prior to analysis for organics. sediment not sieved = Sediment was not sieved prior to analysis for organics. |
sediment_sieve_size | If sediment was sieved, the size of sieve used. | numeric | millimeters |
compaction_flag | Code indicating how the authors qualified or quantified compaction of the core. | factor | compaction qualified = Compaction was at least qualified and noted by the authors. compaction quantified = Compaction was quantified and corrected for in core based measurements. corer limits compaction = Authors specified that the coring device’s design minimized compaction. no obvious compaction = Authors observed no obvious compaction. not specified = Compaction was not specified. |
dry_bulk_density_temperature | Temperature at which samples were dried to measure dry bulk density. This can include either samples that were freeze dried or oven dried. | numeric | celsius |
dry_bulk_density_time | Time over which samples were dried to measure dry bulk density. | numeric | hour |
dry_bulk_density_sample_volume | Sample volume used for bulk density measurements, if held constant. | numeric | cubicCentimeters |
dry_bulk_density_sample_mass | Sample mass used for bulk density measurements, if held constant. | numeric | grams |
dry_bulk_density_flag | Any notable codes regarding how the authors quantified dry bulk density. | factor | air dried to constant mass = Methodology specified that samples were air dried to a constant mass. modeled = Bulk density was not measured, but was modeled from loss on ignition and assumptions about the particle densities of organic and inorganic matter. freeze dried = Bulk density was measured on freeze dried samples. not specified = No additional details regarding bulk density methodology were provided. removed non structural water = Bulk density methodology did not specify drying temperature or length, only that non-structural water was removed. time approximate = Bulk density time recorded herein is an approximate estimate. to constant mass = Bulk density methodology did not specify drying temperature or length, only that samples were dried to a constant mass. |
loss_on_ignition_temperature | Temperature at which samples were combusted to estimate fraction organic matter. | numeric | celsius |
loss_on_ignition_time | Time over which samples were combusted to estimate fraction organic matter. | numeric | hour |
loss_on_ignition_sample_volume | Sample volume used for loss on ignition, if held constant. | numeric | cubicCentimeters |
loss_on_ignition_sample_mass | Sample mass used for loss on ignition, if held constant. | numeric | grams |
loss_on_ignition_flag | Common codes regarding loss on ignition methodology. | factor | time approximate = Loss on ignition time recorded herein is an approximate estimate. not specified = No additional details regarding loss on ignition methodology or time were provided. |
carbon_measured_or_modeled | Code indicating whether fraction carbon was measured or estimated as a function of organic matter. | factor | measured = Fraction carbon was measured as opposed to modeled. modeled = Fraction carbon was modeled as opposed to measured. |
carbonates_removed | Whether or not carbonates were removed prior to calculating fraction organic carbon. | factor | FALSE = Carbonates were not removed before measuring organic carbon. TRUE = Carbonates were removed before measuring organic carbon. |
carbonate_removal_method | The method used to remove carbonates prior to measuring fraction carbon. | factor | direct acid treatment = Carbonates were removed using direct application of dilute acid. acid fumigation = Carbonates were removed by fumigating with concentrated acid. low carbonate soil = Organic carbon fraction was measured without removing carbonates assuming carbonate content of the soil type was minimal. carbonates not removed = Carbonates were not removed and low carbonate soil was not specified. none specified = Carbonate removal methodology was not specified. |
fraction_carbon_method | Code indicating the method by which fraction carbon was measured or modeled (Note: regression based models are permitted, but the use of the Bemmelen factor [0.58 gOC gOM-1] is discouraged). | factor | Craft regression = Used regression model from Craft et al., 1991, Estuaries, to predict fraction carbon as a function of fraction organic matter. EA = Each sample presented was measured using Elemental Analysis. Fourqurean regression = Used regression model from Fourqurean et al., 2012, Nature Geoscience, to predict fraction carbon as a function of fraction organic matter. Holmquist regression = Used regression model from Holmquist et al., 2018, Scientific Reports, to predict fraction carbon as a function of fraction organic matter. kjeldahl digestion = Each sample was measured using the Kjeldahl digestion method. local regression = A regression model fit using a subset of measurements was used to predict fraction carbon as a function of fraction organic matter. not specified = No additional details were provided regarding fraction carbon methodologies. wet oxidation = Each sample was measured using a wet oxidation method. |
fraction_carbon_type | Code indicating whether fraction_carbon refers to organic or total carbon. | factor | organic carbon = Author specified that fraction carbon measurements were of organic carbon. total carbon = Author specified that fraction carbon measurements were of total carbon. |
carbon_profile_notes | Any other submitter defined notes describing methodologies for determining dry bulk density, organic matter fraction, and carbon fraction. | character | |
cs137_counting_method | Code indicating the method used for determining radiocesium activity. | factor | alpha = Alpha counting method used. gamma = Gamma counting method used. |
pb210_counting_method | Code indicating the method used for determining lead 210 activity. | factor | alpha = Alpha counting method used. gamma = Gamma counting method used. |
excess_pb210_rate | Code indicating the mass or accretion rate used in the excess_pb210_model. | factor | mass accumulation = Excess 210Pb modeled using mass accumulation rate. accretion = Excess 210Pb modeled using vertical accretion rate. |
excess_pb210_model | Code indicating the model used to estimate excess lead 210. | factor | CRS = Constant rate of supply model used. CIC = Constant initial concentration model used. CFCS = Constant flux constant sedimentation model used. |
ra226_assumption | Code indicating the assumption used to estimate the core’s background 226Ra levels. | factor | each sample = 226Ra was measured for each sample. total core = 226Ra was measured for the total core, at asymptote = asy |
c14_counting_method | Code indicating the method used for determining radiocarbon activity. | factor | AMS = Accelerator mass spectrometry used. beta = Beta counting used. |
dating_notes | Any submitter defined notes elaborating on the process of dating the core not yet made clear by the coding. | character | |
age_depth_model_reference | Code indicating the reference or 0 year of the age depth model. | factor | YBP = Year zero is defined as years before present, 1950 CE. CE = Year zero is set according to Common Era and Before Common Era standards. core collection date = Year zero is set as the core’s collection year. |
age_depth_model_notes | Any submitter defined notes on how the age depth model was created. | character |
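
Because factor attributes use controlled vocabulary, submitted codes can be screened before ingestion. The sketch below copies the allowed codes for three attributes from the table above; the function name and the example record are hypothetical, and the remaining factors would need to be added in the same way.

```python
CONTROLLED_VOCABULARY = {
    "compaction_flag": {
        "compaction qualified", "compaction quantified",
        "corer limits compaction", "no obvious compaction", "not specified",
    },
    "excess_pb210_model": {"CRS", "CIC", "CFCS"},
    "c14_counting_method": {"AMS", "beta"},
}


def invalid_codes(row):
    """Return {attribute: value} for values outside the controlled vocabulary."""
    problems = {}
    for attribute, allowed in CONTROLLED_VOCABULARY.items():
        value = row.get(attribute)
        # Empty or absent values are treated as 'not provided', not as errors.
        if value not in (None, "") and value not in allowed:
            problems[attribute] = value
    return problems


# Hypothetical methods record; 'crs' is lower case and so does not match 'CRS'.
methods_row = {
    "study_id": "Doe_et_al_2018",
    "compaction_flag": "no obvious compaction",
    "excess_pb210_model": "crs",
    "c14_counting_method": "AMS",
}
flagged = invalid_codes(methods_row)
```

The comparison is case-sensitive on purpose, so that near-miss codes are surfaced for correction rather than silently accepted.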
Site Level
Site information provides important context for your study. You should describe the site and how it fits into your broader study, provide geographic information (although this can be generated automatically from the cores as well), and add any relevant tags and notes regarding site vegetation and inundation. Vegetation and inundation can alternatively be incorporated into the core-level data, whatever makes the most sense for your study design.
attribute name | definition | data type | format, unit or codes |
---|---|---|---|
study_id | Unique identifier for the study made up of the first author’s family name, as well as the second author’s family name or ‘et al.’ if more than three, then publication year separated by underscores. | character | |
site_id | Site identification code unique to each study. | character | |
site_description | Site description including relevant study details and political geographic units. Some of these descriptions can be automated by the ingestion code. | character | |
site_latitude_max | Maximum latitude defining a bounding box for the site in decimal degree World Geodetic System of 1984 (WGS84). This can also be generated automatically by the ingestion code. | numeric | degree |
site_latitude_min | Minimum latitude defining a bounding box for the site in decimal degree WGS84. This can also be generated automatically by the ingestion code. | numeric | degree |
site_longitude_max | Maximum longitude defining a bounding box for the site in decimal degree WGS84. This can also be generated automatically by the ingestion code. | numeric | degree |
site_longitude_min | Minimum longitude defining a bounding box for the site in decimal degree WGS84. This can also be generated automatically by the ingestion code. | numeric | degree |
site_boundaries | As an alternative to submitting or automatically generating a bounding box, submitters can include a shapefile (.shp) or keyhole markup language (.kml) documenting the geographic boundaries of the site. This can be converted to and stored in well known text (WKT) format. | character | |
salinity_class | Code based on submitter field observation or measurement indicating average annual salinity (Note: Palustrine and freshwater should only include tidal wetlands, or wetlands that are potentially/formerly tidal but artificially freshened due to artificial tidal restrictions). | factor | estuarine C-CAP = 5-35 parts per thousand salinity (ppt) according to the coastal change analysis program. palustrine C-CAP = < 5 ppt according to the coastal change analysis program. estuarine = 0.5-35 ppt according to most other definitions. palustrine = < 0.5 ppt according to most other definitions. brine = >50 ppt. saline = 30-50 ppt. brackish = 0.5-30 ppt. fresh = <0.5 ppt. mixoeuhaline = 30-40 ppt. polyhaline = 18-30 ppt. mesohaline = 5-18 ppt. oligohaline = 0.5-5 ppt. |
salinity_method | Indicate whether salinity_class was determined using a field observation or a measurement. | factor | field observation = Salinity inferred by field observation such as vegetation. measurement = Salinity observed from local instrument. |
salinity_notes | Any relevant submitter generated notes on how salinity_class was determined. | character | |
vegetation_class | Code based on submitter field observations or measurement indicating dominant wetland vegetation type. | factor | emergent = Describes wetlands dominated by persistent emergent vascular plants. scrub shrub = Describes wetlands dominated by woody vegetation <= 5 meters in height. forested = Describes wetlands dominated by woody vegetation > 5 meters in height. forested to shrub = Dominated by forested to scrub/shrub biomass. forested to emergent = Dominated by forest and underlying marsh. seagrass = Describes tidal or subtidal communities dominated by rooted vascular plants. |
vegetation_method | Indicate whether vegetation_class was determined using a field observation or a measurement. | factor | field observation = Vegetation inferred by field observation. measurement = Vegetation measured by counts or plots. |
vegetation_notes | Any relevant submitter generated notes on how vegetation_class was determined. | character | |
inundation_class | Code based on submitter field observation or measurement indicating how often the coring location is inundated. | factor | high = Study-specific definition of an elevation relatively high in the tidal frame, typically defined by vegetation type. mid = Study-specific definition of an elevation in the relative middle of the tidal frame, typically defined by vegetation type. low = Study-specific definition of an elevation relatively low in the tidal frame, typically defined by vegetation type. levee = Study-specific definition of a relatively high elevation zone built up on the edge of a river, creek, or channel. back = Study-specific definition of a relatively low elevation zone behind a levee. |
inundation_method | Indicate whether inundation_class was determined using a field observation or a measurement. | factor | field observation = Inundation inferred by field observation such as vegetation. measurement = Inundation class assessed from elevation and nearby tide gauge or other similar method. |
inundation_notes | Any relevant submitter generated notes on how inundation was determined. | character |
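
The site bounding box can be derived from core positions. The following is a minimal sketch of that idea, not the actual ingestion code: the function name and coordinates are hypothetical, and it assumes no site spans the antimeridian (180 degrees longitude).

```python
def site_bounding_boxes(cores):
    """Compute a latitude/longitude bounding box per (study_id, site_id)."""
    boxes = {}
    for core in cores:
        key = (core["study_id"], core["site_id"])
        lat, lon = core["core_latitude"], core["core_longitude"]
        box = boxes.get(key)
        if box is None:
            # First core seen for this site: the box is a single point.
            boxes[key] = {
                "site_latitude_max": lat, "site_latitude_min": lat,
                "site_longitude_max": lon, "site_longitude_min": lon,
            }
        else:
            box["site_latitude_max"] = max(box["site_latitude_max"], lat)
            box["site_latitude_min"] = min(box["site_latitude_min"], lat)
            box["site_longitude_max"] = max(box["site_longitude_max"], lon)
            box["site_longitude_min"] = min(box["site_longitude_min"], lon)
    return boxes


# Hypothetical core positions in decimal degrees (WGS84).
example_cores = [
    {"study_id": "Doe_et_al_2018", "site_id": "marsh_a",
     "core_latitude": 38.10, "core_longitude": -122.05},
    {"study_id": "Doe_et_al_2018", "site_id": "marsh_a",
     "core_latitude": 38.20, "core_longitude": -122.00},
]
boxes = site_bounding_boxes(example_cores)
```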
Core Level
Note that positional data can be assigned at the core level, or at the site level. However, it is important that this is specified, that site coordinates are not attributed as core coordinates, and that the method of measurement and precision is noted. Vegetation and inundation can alternatively be incorporated into the site-level data, whatever makes the most sense to your study design. In the future this level of hierarchy will be complemented by a ‘subsite level’ as this level of hierarchy can handle any sublocation information such as vegetation plot, and instrument location/description.
attribute name | definition | data type | format, unit or codes |
---|---|---|---|
study_id | Unique identifier for the study made up of the first author’s family name, as well as the second author’s family name or ‘et al.’ if more than three, then publication year separated by underscores. | factor | NA |
site_id | Site identification code unique to each study. | factor | NA |
core_id | Core identification code unique to each site. | factor | NA |
core_date | Date of core collection. | Date | YYYY-MM-DD |
core_notes | Any other relevant submitter generated notes on how cores were collected. | character | |
core_latitude | Positional latitude of the core in decimal degree WGS84. | numeric | degree |
core_longitude | Positional longitude of the core in decimal degree WGS84. | numeric | degree |
core_position_accuracy | Accuracy of latitude and longitude measurement, if determined and recorded. | numeric | meter |
core_position_method | Code indicating how latitude and longitude were determined. | factor | RTK = Real-time kinematic global positioning system (GPS). handheld = Conventional commercially available hand-held GPS. other high resolution = Any other technique resulting in positional error < 1 meter. other moderate resolution = Any other technique resulting in positional error < 30 meters. other low resolution = Any other technique resulting in positional error > 30 meters. |
core_position_notes | Any relevant submitter generated notes on how latitude and longitude were determined. | character | |
core_elevation | Surface elevation of the core relative to defined datum. | numeric | meters |
core_elevation_datum | The datum against which the core elevation was measured (For a complete list of datum names and aliases please refer to the ISO Geodetic Registry https://geodetic.isotc211.org/). | factor | NAVD88 = A gravity-based geodetic datum, North American Vertical Datum of 1988. MSL = A tidal datum, Mean Sea Level as measured against a local tide gauge. MTL = A tidal datum, Mean Tidal Level as measured against a local tide gauge. MHW = A tidal datum, Mean High Water as measured against a local tide gauge. MHHW = A tidal datum, Mean Higher High Water as measured against a local tide gauge. MHHWS = A tidal datum, Mean Higher High Water for Spring Tides as measured against a local tide gauge. MLW = A tidal datum, Mean Low Water as measured against a local tide gauge. MLLW = A tidal datum, Mean Lower Low Water as measured against a local tide gauge. |
core_elevation_accuracy | Accuracy of elevation measurement, if determined and recorded | numeric | meters |
core_elevation_method | Code indicating how elevation was determined. | factor | RTK = Real-time kinematic GPS. other high resolution = Any other technique resulting in positional error < 5 cm of random error. LiDAR = Handheld GPS matched to lidar-based digital elevation model. DEM = Handheld GPS matched to another digital elevation model. other low resolution = Any other technique resulting in positional error > 5 cm of random error. |
core_elevation_notes | Any relevant submitter generated notes on how elevation was determined | character | |
salinity_class | Code based on submitter field observation or measurement indicating average annual salinity (Note: palustrine and fresh should only include tidal wetlands, or wetlands that are potentially or formerly tidal but have been freshened by artificial tidal restrictions). | factor | estuarine C-CAP = 5-35 parts per thousand (ppt) salinity according to the Coastal Change Analysis Program (C-CAP). palustrine C-CAP = < 5 ppt according to the Coastal Change Analysis Program. estuarine = 0.5-35 ppt according to most other definitions. palustrine = < 0.5 ppt according to most other definitions. brine = > 50 ppt. saline = 30-50 ppt. brackish = 0.5-30 ppt. fresh = < 0.5 ppt. mixoeuhaline = 30-40 ppt. polyhaline = 18-30 ppt. mesohaline = 5-18 ppt. oligohaline = 0.5-5 ppt. |
salinity_method | Indicate whether salinity_class was determined using a field observation or a measurement | factor | field observation = Salinity inferred by field observation such as vegetation. measurement = Salinity observed from local instrument. |
salinity_notes | Any relevant submitter generated notes on how salinity_class was determined | character | |
vegetation_class | Code based on submitter field observations or measurement indicating dominant wetland vegetation type. | factor | emergent = Describes wetlands dominated by persistent emergent vascular plants. scrub shrub = Describes wetlands dominated by woody vegetation < 5 meters in height. forested = Describes wetlands dominated by woody vegetation > 5 meters in height. seagrass = Describes tidal or subtidal communities dominated by rooted vascular plants. |
vegetation_method | Indicate whether vegetation_class was determined using a field observation or a measurement | factor | field observation = Vegetation inferred by field observation. measurement = Vegetation measured by counts or plots. |
vegetation_notes | Any relevant submitter generated notes on how vegetation_class and dominant_species were determined. | character | |
inundation_class | Code based on submitter field observation or measurement indicating how often the coring location is inundated. | factor | high = Study-specific definition of an elevation relatively high in the tidal frame, typically defined by vegetation type. mid = Study-specific definition of an elevation in the relative middle of the tidal frame, typically defined by vegetation type. low = Study-specific definition of an elevation relatively low in the tidal frame, typically defined by vegetation type. levee = Study-specific definition of a relatively high elevation zone built up on the edge of a river, creek, or channel. back = Study-specific definition of a relatively low elevation zone behind a levee. |
inundation_method | Indicate whether inundation_class was determined using a field observation or a measurement | factor | field observation = Inundation inferred by field observation such as vegetation. measurement = Inundation class assessed from elevation and a nearby tide gauge or other similar method. |
inundation_notes | Any relevant submitter generated notes on how inundation_class was determined | character | |
core_length_flag | Indicates whether or not the coring team believes they recovered a full sediment profile, down to bedrock or another non-marsh interface. | factor | core depth limited by length of corer = The total depth of the core was limited by the length of the coring device. core depth represents deposit depth = Authors report that the depth of the core represents the depth of the wetland soil deposit. not specified = Authors did not specify whether or not the depth of the core represents the depth of the wetland soil deposit. |
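When salinity_method is "measurement", the ppt ranges listed for salinity_class can be applied mechanically. The following is a minimal Python sketch of that lookup, not part of the official ingestion scripts. The function name `salinity_class_from_ppt` is hypothetical, and the choices to treat each upper bound as exclusive and to prefer the finer-grained codes (oligohaline, mesohaline, and so on) over the broader ones (brackish, estuarine) are assumptions made for illustration.

```python
# Upper bounds (ppt) and codes copied from the salinity_class vocabulary.
# Assumption: each upper bound is exclusive, so exactly 5 ppt is mesohaline.
FINE_SALINITY_BINS = [
    (0.5, "fresh"),
    (5.0, "oligohaline"),
    (18.0, "mesohaline"),
    (30.0, "polyhaline"),
    (40.0, "mixoeuhaline"),
]


def salinity_class_from_ppt(ppt: float) -> str:
    """Return a salinity_class code for an average annual salinity in ppt."""
    if ppt < 0:
        raise ValueError("salinity cannot be negative")
    for upper_bound, code in FINE_SALINITY_BINS:
        if ppt < upper_bound:
            return code
    # Above the fine-grained classes, fall back to the broader codes:
    # saline = 30-50 ppt, brine = > 50 ppt.
    return "brine" if ppt > 50 else "saline"


if __name__ == "__main__":
    for value in (0.2, 3, 12, 25, 35, 45, 60):
        print(value, salinity_class_from_ppt(value))
```

A submitter who prefers the broader vocabulary (for example "brackish" or "estuarine C-CAP") would need a different set of bins; the ranges overlap, so the choice of vocabulary is worth recording in salinity_notes.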
Soil Depth Series
This level of hierarchy contains the actual depth series information. At a minimum, a submission needs to specify the minimum and maximum depth of each sampling increment, dry bulk density, and either fraction organic matter or fraction carbon. Sample IDs should be used when there are multiple replicates of a measurement. There is plenty of room for recording raw data from various dating techniques as well as age-depth models.
attribute name | definition | data type | format, unit or codes |
---|---|---|---|
study_id | Unique identifier for the study made up of the first author’s family name, as well as the second author’s family name or ‘et al.’ if more than three, then publication year separated by underscores. | character | |
site_id | Site identification code unique to each study | character | |
core_id | Core identification code unique to each site | character | |
depth_min | Minimum depth of a sampling increment. | numeric | centimeter |
depth_max | Maximum depth of a sampling increment. | numeric | centimeter |
sample_id | Sample identification unique to the core. This should be used in the case that there are relevant lab specific sample codes, or in the case that there are multiple replicate samples. | character | |
dry_bulk_density | Dry mass per unit volume of a soil sample. This does not include ash free bulk density. | numeric | gramsPerCubicCentimeter |
fraction_organic_matter | Mass of organic matter relative to sample dry mass. Ash free bulk density should not be used here but should be expressed as a loss on ignition fraction. | numeric | dimensionless |
fraction_carbon | Mass of carbon relative to sample dry mass. | numeric | dimensionless |
compaction_fraction | Fraction of the sample depth interval reduced due to compaction. | numeric | dimensionless |
compaction_notes | Any submitter generated notes on compaction. | character | |
cs137_activity | Radioactivity counts per unit dry weight for radiocesium (137Cs). | numeric | becquerelPerKilogram |
cs137_activity_sd | 1 standard deviation of uncertainty associated with cs137_activity. | numeric | becquerelPerKilogram |
total_pb210_activity | Total radioactivity counts per unit dry weight for lead 210 (210Pb). | numeric | becquerelPerKilogram |
total_pb210_activity_sd | 1 standard deviation of uncertainty associated with total_pb210_activity. | numeric | becquerelPerKilogram |
ra226_activity | Total radioactivity counts per unit dry weight for Radium 226 (226Ra) if measured as part of the 210Pb dating process. | numeric | becquerelPerKilogram |
ra226_activity_sd | 1 standard deviation of uncertainty associated with ra226_activity. | numeric | becquerelPerKilogram |
excess_pb210_activity | Radioactivity counts per unit dry weight for excess lead 210 (210Pb). | numeric | becquerelPerKilogram |
excess_pb210_activity_sd | 1 standard deviation of uncertainty associated with excess_pb210_activity. | numeric | becquerelPerKilogram |
c14_age | Radiocarbon age as estimated from AMS measurements. | numeric | radiocarbonYear |
c14_age_sd | Estimated uncertainty in c14_age. | numeric | radiocarbonYear |
c14_material | Description of the material selected for radiocarbon (14C) dating. | character | |
c14_notes | Any relevant submitter generated notes on 14C dating process. | character | |
delta_c13 | The isotopic signature of 13C. This is often measured along with c14_age and can be useful for analyzing carbon lability and provenance. | numeric | partsPerThousand |
be7_activity | Radioactivity counts per unit dry weight for 7Be. | numeric | becquerelPerKilogram |
be7_activity_sd | Estimated uncertainty in be7_activity. | numeric | becquerelPerKilogram |
am241_activity | Radioactivity counts per unit dry weight for 241Am. | numeric | becquerelPerKilogram |
am241_activity_sd | Estimated uncertainty in am241_activity. | numeric | becquerelPerKilogram |
marker_date | The date of any other dated depth horizon such as an artificial marker, pollen horizon, pollution horizon, etc. | Date | YYYY-MM-DD |
marker_type | Code indicating the type of marker. | factor | artificial horizon = Horizon was added to the surface artificially by using materials such as feldspar, glitter, or rare earth elements. pollen = Pollen analysis was used to tie horizon to the timing of vegetation change such as the arrival of invasives, or the beginning of local agriculture. pollution = Chemical analysis was used to tie the horizon to the timing of a pollution event. tsunami = Sediment analysis was used to tie the horizon to the timing of a tsunami event. |
marker_notes | Any other submitter generated notes about the origin of the marker. | character | |
age | Most likely, median, or mean age of the depth interval from submitter generated age depth model. | numeric | year |
age_min | Minimum age of the depth interval from submitter generated age depth model. | numeric | year |
age_max | Maximum age of the depth interval from submitter generated age depth model. | numeric | year |
age_sd | Standard deviation of age estimate from submitter generated age depth model. | numeric | year |
depth_interval_notes | Any other submitter generated notes specific to the depth interval. | character | |
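The minimum requirements for a depth series row can be checked before submission. The sketch below is one hedged way to do that in Python; it is not the network's own validation code. The function name `check_depth_row` is hypothetical, and two of the checks are assumptions rather than stated rules: that depth_min must be strictly less than depth_max, and that the dimensionless fractions fall between 0 and 1.

```python
import math

# Attributes this sketch treats as required in every depth series row.
REQUIRED = ("study_id", "core_id", "depth_min", "depth_max", "dry_bulk_density")
FRACTIONS = ("fraction_organic_matter", "fraction_carbon")


def check_depth_row(row: dict) -> list:
    """Return a list of problems found in one depth series row (empty if none)."""

    def present(name: str) -> bool:
        value = row.get(name)
        if value is None or value == "":
            return False
        return not (isinstance(value, float) and math.isnan(value))

    problems = [f"missing {name}" for name in REQUIRED if not present(name)]

    # At least one of the two fraction attributes must be supplied.
    if not any(present(name) for name in FRACTIONS):
        problems.append("need fraction_organic_matter and/or fraction_carbon")

    # Assumption: an increment must have positive thickness.
    if present("depth_min") and present("depth_max"):
        if row["depth_min"] >= row["depth_max"]:
            problems.append("depth_min must be less than depth_max")

    # Assumption: fractions are expressed on a 0-1 scale, not as percentages.
    for name in FRACTIONS:
        if present(name) and not 0 <= row[name] <= 1:
            problems.append(f"{name} must be between 0 and 1")

    return problems


if __name__ == "__main__":
    example = {
        "study_id": "Example_et_al_2018",  # hypothetical identifiers
        "core_id": "core_1",
        "depth_min": 0,
        "depth_max": 2,
        "dry_bulk_density": 0.35,
        "fraction_carbon": 0.12,
    }
    print(check_depth_row(example))
```

Returning a list of messages rather than raising on the first failure lets a submitter see every issue in a row at once.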
Multiple Special Conditions at the Level of the Site or Core
Some studies record multiple observations or conditions that can affect a site or core, such as the species present or degradation and restoration activities. Because a single site or core can have more than one of these, they are archived in separate tables.
Dominant Species Present
You can record species codes associated with sites and/or cores. The CCRCN species code system is derived from the USDA PLANTS Database, and for most taxa, the code consists of the first two letters of the genus followed by the first two letters of the species (e.g., "Spartina alterniflora" = "SPAL").
attribute name | definition | data type | format, unit or codes |
---|---|---|---|
study_id | Unique identifier for the study made up of the first author’s family name, as well as the second author’s family name or ‘et al.’ if more than three, then publication year separated by underscores. | character | |
site_id | Site identification code unique to each study. | character | |
core_id | Core identification code unique to each site. | character | |
species_code | Code associated with a species or a vegetation assemblage. | character | |
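The two-letters-plus-two-letters rule for species codes can be sketched in a few lines of Python. The function name `draft_species_code` is hypothetical, and because the rule is stated to hold only for most taxa, the result should be treated as a draft to check against the USDA PLANTS Database rather than a final code.

```python
def draft_species_code(scientific_name: str) -> str:
    """Build a draft code from the first two letters of genus and species."""
    parts = scientific_name.split()
    if len(parts) < 2:
        raise ValueError("expected a name of the form 'Genus species'")
    genus, species = parts[0], parts[1]
    return (genus[:2] + species[:2]).upper()


if __name__ == "__main__":
    print(draft_species_code("Spartina alterniflora"))  # SPAL
```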
Anthropogenic Impacts Present
You can record various codes associated with degradation or restoration conditions at sites and/or cores.
attribute name | definition | data type | format, unit or codes |
---|---|---|---|
study_id | Unique identifier for the study made up of the first author’s family name, as well as the second author’s family name or ‘et al.’ if more than three, then publication year separated by underscores. | character | |
site_id | Site identification code unique to each study. | character | |
core_id | Core identification code unique to each site. | character | |
impact_class | Code indicating any major anthropogenic impacts historically and currently affecting the coring location. | factor | tidally restricted = Tidal flow is muted or blocked by built structures. impounded = Water level is raised artificially by a tidal restriction, resulting in ponding of water on the wetland and/or upland surface. managed impounded = Wetland is impounded seasonally, and at other times natural or semi-natural hydrology occurs. ditched = Tidal hydrology is altered because artificial ditches have been cut to promote tidal flooding and drainage. diked and drained = The wetland has been diked and drained, with or without flapper gates, pumping, or other means. farmed = Managed impoundment or drainage in which wetland has been converted to agricultural land. tidally restored = Tidal flow has been restored by removing an artificial obstruction. revegetated = Wetland vegetation has been reintroduced by replanting on unvegetated surfaces. invasive plants removed = Natural plant communities have been restored by the active removal of invasive plant species. invasive herbivores removed = Tidal wetland vegetation has been managed by the removal of invasive herbivores. sediment added = Elevation has been managed by artificially adding sediment to the site using techniques such as thin layering or sediment diversion. wetlands built = Wetland was constructed using dredge spoils or another sediment source. |
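Because a core can carry several impact codes, this table has one row per core and impact, and it links back to the core level table through study_id, site_id, and core_id. The Python sketch below illustrates that one-to-many join with plain dictionaries. All identifiers and rows are made up for illustration, and the helper `group_by_core` is hypothetical.

```python
from collections import defaultdict

# The shared attributes that link tables across levels of the hierarchy.
KEYS = ("study_id", "site_id", "core_id")

# Hypothetical core level rows.
cores = [
    {"study_id": "Example_et_al_2018", "site_id": "site_a", "core_id": "core_1"},
    {"study_id": "Example_et_al_2018", "site_id": "site_a", "core_id": "core_2"},
]

# Hypothetical impact rows: core_1 has two impacts, core_2 has none.
impacts = [
    {"study_id": "Example_et_al_2018", "site_id": "site_a",
     "core_id": "core_1", "impact_class": "ditched"},
    {"study_id": "Example_et_al_2018", "site_id": "site_a",
     "core_id": "core_1", "impact_class": "tidally restored"},
]


def group_by_core(rows, value_field):
    """Collect the values of value_field for each (study, site, core) key."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[tuple(row[k] for k in KEYS)].append(row[value_field])
    return grouped


impacts_by_core = group_by_core(impacts, "impact_class")

for core in cores:
    key = tuple(core[k] for k in KEYS)
    # .get avoids adding empty entries for cores with no recorded impact.
    print(core["core_id"], impacts_by_core.get(key, []))
```

The same pattern applies to the species table, with species_code in place of impact_class.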
Submitter Defined Attributes and Definitions
Part of the reason we control these attribute and variable names is so that the dataset does not become unmanageable and we can deliver products that run cleanly and smoothly. However, we know that research is complicated, and not all of the data you want to include can be represented here. As long as it fits within this hierarchy, you may include submitter-defined attributes.
Study Level Species Table
If species codes or common names are used anywhere in the study, there should be a separate table included defining all names using scientific names. The CCRCN species code system is derived from the USDA PLANTS Database, and for most taxa, the code consists of the first two letters of the genus followed by the first two letters of the species (e.g., "Spartina alterniflora" = "SPAL").
attribute name | definition | data type | format, unit or codes |
---|---|---|---|
study_id | Unique identifier for the study made up of the first author’s family name, as well as the second author’s family name or ‘et al.’ if more than three, then publication year separated by underscores. | character | |
species_code | Code associated with a species or a vegetation assemblage. | character | |
genus | Genus according to the most up to date classification. | character | |
species | Species according to the most up to date classification. | character | |
sub_species | Any nomenclature referring to subspecies special cases. | character | |
hybrid | Any nomenclature referring to special cases of hybridization. | character | |
common_name | Common name associated with the species, especially if it is referred to in any accompanying text. | character | |
species_notes | Any other submitter defined notes regarding the species. | character | |
Other Attributes and Variables
Any submitter-defined attributes should be included in a separate table indicating the associated level of hierarchy, attribute name, description, and data type (date, factor, character, or numeric). Attribute names should follow good naming practices: they should be self-descriptive, should not start with a number or special character, and should contain no spaces. Dates should be stored as a character string and should have an accompanying ‘string format’ indicating the position, number of digits, and delimiters for the date time. For example, June 26, 2018 written as 2018-06-26 would be formatted as ‘YYYY-MM-DD’. Here is a handy dateTime reference. Numeric values should have their units defined. Factors (i.e. categorical variables) should be defined in a separate table.
level of hierarchy | attribute name | description | data Type | format, unit |
---|---|---|---|---|
ex. site level or core level | (your column name here. [use good naming conventions]) | (describe your attribute here.) | Date, factor, character, or numeric | (extra necessary info here) |
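A ‘string format’ such as ‘YYYY-MM-DD’ is what lets a date stored as text be read back unambiguously. As a rough illustration, the Python sketch below translates such a format into the directives used by the standard library's `datetime.strptime` and parses the stored string. The function name `parse_with_string_format` is hypothetical, and only the YYYY, MM, and DD tokens from the example above are handled; any other tokens would need to be added to the mapping.

```python
from datetime import date, datetime

# Mapping from string-format tokens to strptime directives.
# Assumption: only the three tokens used in the 'YYYY-MM-DD' example appear.
TOKEN_MAP = (("YYYY", "%Y"), ("MM", "%m"), ("DD", "%d"))


def parse_with_string_format(value: str, string_format: str) -> date:
    """Parse a date stored as text using its accompanying string format."""
    directive = string_format
    for token, code in TOKEN_MAP:
        directive = directive.replace(token, code)
    return datetime.strptime(value, directive).date()


if __name__ == "__main__":
    print(parse_with_string_format("2018-06-26", "YYYY-MM-DD"))
    print(parse_with_string_format("26/06/2018", "DD/MM/YYYY"))
```

Both calls yield the same date, which shows why the format has to travel with the data: the string alone does not say which field is the day and which is the month.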
Variable names, like attribute names, should be self-descriptive, such as ‘experimental’ or ‘control’ as opposed to ‘1’ and ‘2’.
level of hierarchy | attribute name | categorical variable name | description |
---|---|---|---|
ex. site level or core level | (parent column name here) | (your variable name here.) | (describe your variable) |
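The mechanical parts of the attribute naming guidance in this section (no leading number or special character, no spaces) can be checked automatically; whether a name is self-descriptive still needs a human. The Python sketch below is one possible check. The function name `is_well_formed_name` is hypothetical, and limiting names to letters, digits, and underscores is an assumption that is stricter than the guidance itself.

```python
import re

# Assumption: a name starts with a letter and then uses only letters,
# digits, and underscores, matching the style of names like dry_bulk_density.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_well_formed_name(name: str) -> bool:
    """Return True if the name passes the mechanical naming checks."""
    return NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("pore_water_ph", "2nd_reading", "pore water", "_hidden"):
        print(candidate, is_well_formed_name(candidate))
```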
That’s It
You now know everything there is to know about soil carbon data management.