NSCN Data Management
If you are embarking on a new project or proposal, the NSCN Database is well suited to serve as your solution for data management and long-term data accessibility. This page provides materials, free for your own adaptation, that can be used to describe how the NSCN Database fits into your data management plans.
- NSCN Database flyer: This short document has an overview of database structure and function. It may be used as a source of text, or for a general overview of the database before you begin writing your data management section.
Information for NSF Proposals
NSF data management requirements consist of five components, which are described in general terms in the Grant Proposal Guide. There are also five (largely overlapping) requirements associated with RFPs issued by the BIO Directorate, which is likely the origin of most NSF opportunities pursued by NSCN members. These requirements are summarized below, and each is followed by information describing how contributing your project data to the NSCN Database will meet the intent of the NSF data management requirements.
- Types of data and science products developed by the project. The NSCN Database is a point-based, georeferenced, relational database for all types of soil information. Its fundamental capability is the storage of soil chemical and physical data related to carbon inventory, with associated metadata such as ecologic, geographic, analytical, experimental, historical, and contributor information. The most common types of data stored in the NSCN Database are georeferenced research sites, along with the soil profiles sampled within these sites and the incremental soil layers that make up the individual soil profiles. Through this hierarchical design, data contributors are able to store comprehensive soil characterization datasets of any size; documentation of methods and uploading of relevant papers and other files are also supported. The NSCN Database is dynamic and flexible, and supports storage of and access to other types of data, including soil fractions, spectral datasets, and images. There may also be limited support for storage of physical samples (archived soils) in several member archive facilities; contact NSCN Support for information about these opportunities.
- Standards to be used for data and metadata format and content. The NSCN Database shares standards for data and metadata format with two widely used schemes. Ecologic, geographic, experimental, historical, and contributor data are stored using the same variable names and conventions employed by the FLUXNET Synthesis Network, a sister science effort with which the NSCN Database shares server space and computational resources. Soil characterization data in the NSCN Database are described using the variable names employed by the USDA-NRCS Soil Survey Lab in its NCSCD and NASIS databases. Basic analytical and methods nomenclature for soil datasets follows USDA-NRCS conventions, but considerable flexibility is built in to describe methods that lack widely implemented standards (e.g., soil fractionation schemes). For these types of data, the design of the NSCN Database encourages development of broader standards by categorizing information at coarse levels (e.g., denoting fractionations as 'density,' 'chemical,' etc.) while accommodating detailed, user-specified information in fine-level methods descriptions. Input and output of data to and from the NSCN Database are supported for comma-separated value, Microsoft Excel (at present), Access (upon request), and NetCDF (in development) formats.
- Physical and/or cyber resources and facilities, dissemination methods used to store the data and make them accessible. The NSCN Database is housed on infrastructure developed and maintained by DOE-Lawrence Berkeley National Lab, the UC Berkeley Water Center, University of Virginia, and Microsoft Research. The data server infrastructure consists of SQL Server databases and a SharePoint web portal, designed to maximize data usability. Data received from contributors are normalized to a standardized template and processed to calculate variables of interest to users (e.g., carbon to 1 meter). A data curation process is in place to ensure that the provenance of contributed datasets is clearly tracked over time. The databases, portal, and servers are backed up regularly, and access is provided via download or online viewing of database contents on the National Soil Carbon Network website (soilcarb.net). Users may access pre-packaged Excel reports containing various subsets of the database, or create their own Excel reports by specifying variable sets and geographic constraints for the data of interest.
- Policies for data sharing and access, including provisions for privacy, security, intellectual property, and production of derivatives. Access to and use of data from the NSCN Database are governed by a data policy that ensures reasonable data quality, fair use, and appropriate citation of data contributors. Database access is available on a secured portion of the NSCN website open only to users who have obtained a password-protected account; all visits to and uses of the NSCN website and its resources are logged. The data policy delineates three phases through which each data submission moves: initial contribution (Phase I), data validation (Phase II), and release for use by the NSCN membership (Phase III). These three phases maintain the privacy of newly submitted datasets and provide for a quality assessment period during which the data are unavailable to all but the contributor and authorized associates. Upon release to the database (Phase III), fair use provisions explicitly describe terms for citation of contributors' published data and, in relevant cases, acknowledgment of unpublished data. These terms ensure that proper credit is given to the individuals, agencies, or institutions that own the data and associated intellectual property. Policies for the production of derivatives from the NSCN Database (e.g., datasets provided to other networks) will be developed by the Scientific Steering Group in 2012. These policies will not be prohibitive; rather, they will be based on current practices in the science synthesis community (e.g., CLM, ESG) that use database archival/versioning features to maintain unique identifiers of dataset provenance and ownership.
- Rights and obligations of all parties with regard to responsibilities for management and retention of research data. Per the NSCN Data Policy, data contributors have two primary obligations: 1) identify any restrictions on publishing data location information; 2) declare the phase of the data at the time of submission and upon any data status changes (e.g., publication and release to Phase III). Furthermore, it is highly recommended that the data contributor or an authorized representative maintain contact with NSCN Database managers from the time of data submission until after the data are transitioned to Phase III and released to the Database, in order to perform QA/QC and answer questions from data users. The obligations of the NSCN Database team are also twofold: 1) protect location information for sensitive sites; 2) release data to the NSCN membership via the Database according to contributor-approved data phase transitions. Database managers shall also make reasonable efforts to manage contributed datasets and to facilitate their retention and use by performing normal database maintenance and curation activities, assisting database users in basic navigation and interpretation of data products, and providing alternative data access mechanisms where possible. Because the NSCN Database is a mechanism for maintaining long-term access to datasets that have been adapted to its format, researchers wishing to a) preserve their data in the form in which they were originally collected or b) permanently archive redundant copies are urged to consider alternate arrangements such as NSF DataNets.
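For proposal writers who want a concrete picture of the structures described above, the following is a minimal Python sketch, not the NSCN Database schema. It models the site, profile, and layer hierarchy, a derived "carbon to 1 meter" value, and the three data policy phases. All class, field, and function names are hypothetical. The stock formula (layer thickness × bulk density × percent carbon × 0.1, giving kg C per square meter, with no correction for coarse fragments) is a common soil science convention assumed here for illustration; the source does not state how the Database performs this calculation.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Phase(Enum):
    """The three data policy phases (member names are hypothetical)."""
    CONTRIBUTED = 1  # Phase I: initial contribution
    VALIDATION = 2   # Phase II: data validation
    RELEASED = 3     # Phase III: released to the NSCN membership


@dataclass
class Layer:
    """One incremental soil layer within a profile."""
    top_cm: float
    bottom_cm: float
    bulk_density_g_cm3: float
    carbon_pct: float


@dataclass
class Profile:
    """A soil profile sampled within a site."""
    profile_id: str
    layers: List[Layer] = field(default_factory=list)


@dataclass
class Site:
    """A georeferenced research site holding one or more profiles."""
    site_id: str
    latitude: float
    longitude: float
    profiles: List[Profile] = field(default_factory=list)


@dataclass
class Submission:
    """A contributed dataset and its current data policy phase."""
    name: str
    sites: List[Site] = field(default_factory=list)
    phase: Phase = Phase.CONTRIBUTED

    def advance(self, contributor_approved: bool) -> Phase:
        """Move to the next phase, but only with contributor approval."""
        if not contributor_approved:
            raise PermissionError("phase transitions require contributor approval")
        if self.phase is Phase.RELEASED:
            raise ValueError("submission is already in Phase III")
        self.phase = Phase(self.phase.value + 1)
        return self.phase

    def visible_to_members(self) -> bool:
        """Only Phase III data are available to the general membership."""
        return self.phase is Phase.RELEASED


def carbon_to_depth(profile: Profile, depth_cm: float = 100.0) -> float:
    """Sum carbon stock (kg C per m^2) from the surface down to depth_cm.

    Assumed formula per layer: thickness_cm * bulk_density * carbon_pct * 0.1.
    Layers that straddle depth_cm contribute only their portion above it.
    """
    total = 0.0
    for layer in profile.layers:
        top = max(layer.top_cm, 0.0)
        bottom = min(layer.bottom_cm, depth_cm)
        thickness = bottom - top
        if thickness > 0:
            total += thickness * layer.bulk_density_g_cm3 * layer.carbon_pct * 0.1
    return total


if __name__ == "__main__":
    profile = Profile("P1", [Layer(0, 30, 1.2, 2.0), Layer(30, 120, 1.5, 0.5)])
    submission = Submission("demo", [Site("S1", 38.0, -78.5, [profile])])
    print(round(carbon_to_depth(profile), 2), submission.phase.name)
```

In this sketch the phase lives on the submission rather than on individual sites because the text describes each data submission as the unit that moves through Phases I to III; a real plan should follow the NSCN Data Policy itself.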