Data Management Plan

Christina Bandaragoda edited this page Aug 17, 2018 · 9 revisions

Data and Metadata Standards

We plan to organize and share the data we assemble within the HydroShare data repository (see plans for archival below). HydroShare makes full use of existing and emerging standards for sharing environmental datasets. All HydroShare resources (e.g., datasets, models, and scripts) are described with metadata that conform to the Dublin Core metadata standard (DCMI, 2012), conform to a data model that implements the Open Archives Initiative’s Object Reuse and Exchange (OAI-ORE) standard (Lagoze et al., 2008), and are stored on disk and packaged for download using the BagIt hierarchical file packaging specification (Boyko et al., 2012). These standards are well known within the library, information science, and digital archiving communities. HydroShare has also adopted standard file formats for the content files of known resource/data types. For example, HydroShare uses Version 2 of the Observations Data Model (ODM2) for time series data (Horsburgh et al., 2016), the Network Common Data Form (NetCDF) for multidimensional space/time datasets, ESRI shapefiles for vector geospatial data, and the GeoTIFF format for raster datasets. Part of the contribution of this project will be to extend HydroShare’s capabilities for storing data derived from environmental samples, using ODM2 for archival in HydroShare. This combination of standard data formats, standardized metadata description, and standard packaging means that HydroShare resources are publishable and fully archivable.
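To make the metadata description more concrete, the following is a minimal sketch of a record that uses Dublin Core element names. It is an illustration only and is not HydroShare's actual metadata serialization: the `metadata` wrapper element, the `dublin_core_record` helper, and all field values are assumptions made for this example. Only the Dublin Core element set namespace is taken from the standard.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set (dc:title, dc:creator, ...).
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build a flat XML record from (element, value) pairs.

    `fields` is a list of tuples so that repeatable elements such as
    `creator` or `subject` can appear more than once.
    """
    root = ET.Element("metadata")  # hypothetical wrapper element
    for element, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
        child.text = value
    return root


# Placeholder values, not a real project resource.
record = dublin_core_record([
    ("title", "Example stream water quality samples"),
    ("creator", "Example Author"),
    ("subject", "maria2017"),
    ("rights", "Creative Commons license chosen by the resource owner"),
])
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Passing the fields as a list of pairs rather than a dictionary is a deliberate choice here, because Dublin Core elements may repeat within one record.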

Policies for Data and Research Products

The goal of assembling Hurricane Maria-related data in HydroShare is to promote collaboration around, sharing of, and reuse of these data. As such, all of the data we assemble as a direct result of this project will be publicly and freely available in HydroShare under a Creative Commons License, where applicable. HydroShare is a data repository and collaboration environment. We anticipate that by curating and sharing Maria-related data in HydroShare we will enable users not only to discover and access the original data, but also to create derivative products through collaborative analyses and research that can also be shared within HydroShare.

For derivative products, HydroShare gives users the choice to create public or private resources and public or private collaboration groups within which these activities can take place; private resources and groups are accessible only to selected users. Groups of researchers may wish to share data, model instances, or simulation results derived from the Hurricane Maria data within their group before they are published externally. To support this, authentication and access control have been fully integrated within HydroShare, and users can already choose how to share resources with other users or the larger community. Our experience has been that collaborative activities may result in multiple intermediate research products, only some of which may be considered publishable by the researchers. As such, HydroShare has functionality for creating formal, tracked versions of resources. Users can choose the Creative Commons License under which their resources are shared, and HydroShare has already established the facilities required to formally publish data, models, and simulation results, enabling individual researchers to select and publish their results as they see fit. Formally published resources are made immutable and receive a citable digital object identifier (DOI). Access to private resources and private research groups is at the discretion of resource and group owners. Final research results can be made freely and publicly available when they are deemed publication ready by the author. These collaboration capabilities make HydroShare an ideal location for publishing Maria data in a way that allows them to be cited, reused, and formally linked to derivative products.

All HydroShare resources have a landing page that displays the resource’s metadata and contents, including attribution information (e.g., authors and contributors, funding agency credits) and a formal citation. HydroShare users must agree to a formal publication agreement prior to formally publishing a resource. This agreement was developed in collaboration with the Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI) and specifies the terms and conditions under which users can publish resources in HydroShare.

Plans for Archiving Data

HydroShare will serve as the archival system for results from this project. Curated research products (e.g., datasets and models) published in HydroShare receive a formal digital object identifier (DOI) and are therefore citable in peer-reviewed journal articles, conference presentations and proceedings, and other formal publications. HydroShare and all of its attendant systems are hosted on fault-tolerant, enterprise-class servers dedicated to the HydroShare system and housed in the Renaissance Computing Institute’s (RENCI’s) managed, climate-controlled, UPS-backed information technology facility, ensuring the reliability of the HydroShare system. All source code developed as part of this project will be openly shared in GitHub repositories associated with the HydroShare project.

Guidelines for Metadata Management Using the CUAHSI Project and HydroShare Resources

The umbrella discovery location for project resources is www.cuahsi.org/projects/maria2017, and maria2017 is the keyword to apply to HydroShare resources. HydroShare metadata fields for Owner, Author, Contributor, and Funding will be updated annually to compile a list of project collaborators and related code, data, and

Drinking water campaign data protection and sharing

  1. Drinking water campaign data collected by this project will be made publicly available on HydroShare when peer-reviewed articles reporting the data are submitted or at the end of the funded project, whichever comes first.
  2. Data reporting will include all details about the samples (pH, temperature, gene copies, culture numbers, etc.).
  3. GPS locations for small distribution systems, and any information that could identify a person or a system, will NOT be made publicly available. A de-identifying system will mask the identities of the participants, with information aggregated at the municipality scale; for example, systems will be identified as "small systems in the Patillas region". Specific characteristics of the systems (size, materials used, treatment where applicable, and any other pertinent but non-identifiable information) will be included.
  4. For watershed samples taken from surface streams at publicly accessible sites that are not on personal property, GPS coordinates will be included, as well as the details about the samples.
  5. VT will work through the NSF PIRES IRB protocol and with the IAU team directly to communicate our results to the individuals whose water we sample. VT will draft those materials with guidance and input from Graciela (to do the brunt of the work associated with effectively communicating to those individuals), but IAU’s team will interface with them directly. VT or other team members will not communicate directly with any community member without IAU permission and/or assistance.
  6. VT and IAU will share all data, sample collection and analysis procedures, QA/QC methods, protocols, etc. internally with the NSF RAPID collaborators (UW and USU, or others as needed with permission) as soon as possible after the start of the project (for methods and protocols) or as soon as possible after samples have been collected and analyzed (for data results). These data will be privately posted on HydroShare and shared with the “Puerto Rico Water Studies: Confidential” HydroShare group so that project team members can work with the data internally towards their final product as expeditiously as possible. As specified in Item 1, drinking water campaign data and resources will remain private until peer-reviewed articles are submitted or until the end of the funded project, whichever comes first. Datasets will then be made publicly accessible under the terms of the project’s NSF-approved data management plan.

Reference for the HIPAA identifiers to remove: https://cphs.berkeley.edu/hipaa/hipaa18.html
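The de-identification rules in Items 3 and 4 can be sketched as a small filter. This is a minimal sketch under stated assumptions, not the project's actual de-identifying system: the field names (`latitude`, `system_name`, `sample_type`, `public_site`, and so on) are hypothetical, the sample values are made up, and the list of identifying fields would need to be checked against the HIPAA identifier reference above.

```python
# Hypothetical field names; the project's real column names are not given in this plan.
IDENTIFYING_FIELDS = {
    "latitude", "longitude", "system_name", "participant_name", "address",
}


def deidentify(record):
    """Return a copy of a small-system record with identifying fields removed."""
    public = {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}
    # Replace the precise identity with a municipality-scale label (Item 3).
    public["system_label"] = f"small systems in the {record['municipality']} region"
    return public


def publishable(record):
    """Apply Item 4 for public stream sites, otherwise Item 3."""
    if record.get("sample_type") == "stream" and record.get("public_site"):
        return dict(record)  # coordinates are kept for publicly accessible sites
    return deidentify(record)


# Made-up example records for illustration only.
system_sample = {
    "sample_type": "distribution_system", "municipality": "Patillas",
    "system_name": "Example System", "latitude": 0.0, "longitude": 0.0,
    "ph": 7.1, "system_size": "small",
}
stream_sample = {
    "sample_type": "stream", "public_site": True, "municipality": "Patillas",
    "latitude": 0.0, "longitude": 0.0, "ph": 7.4,
}

public_system = publishable(system_sample)
public_stream = publishable(stream_sample)
print(public_system)
print(public_stream)
```

Removing fields by an explicit list is the simplest approach but fails open if a new identifying column is added later; listing the fields that are allowed to be public instead would be the more conservative design.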

Data Archive Deliverables for Data Sharing Agreement Design

Publicly distributed on HydroShare, in journals, on websites, etc.:

  • Patillas map summary of post-Maria drinking water quality; includes the data sharing agreement, a description of the private data, and contact information for using the research
  • Deidentified/template example of a community system result
  • Public stream water quality result with source area info
  • Deidentified/template example CI

Privately distributed/private on HydroShare:

  • Community system results (6)
  • Source area watersheds for all areas of interest in study areas
  • Household results
  • Long form table with code to convert from long form tables to short form tables
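As an illustration of the long-form to short-form conversion named in the last item, the following is a minimal sketch using only the Python standard library. It is not the project's conversion code: the function name `long_to_short`, the column names `sample_id`, `parameter`, and `value`, and the example values are all assumptions made for this example.

```python
def long_to_short(rows, id_field="sample_id", name_field="parameter",
                  value_field="value"):
    """Pivot long-form rows into short-form rows.

    Long form has one measurement per row; short form has one row per
    sample with one column per measured parameter.
    """
    short = {}
    for row in rows:
        # Create the sample's output row on first sight, then add a column.
        sample = short.setdefault(row[id_field], {id_field: row[id_field]})
        sample[row[name_field]] = row[value_field]
    return list(short.values())


# Made-up measurements for illustration only.
long_rows = [
    {"sample_id": "S1", "parameter": "ph", "value": 7.1},
    {"sample_id": "S1", "parameter": "temperature_c", "value": 24.5},
    {"sample_id": "S2", "parameter": "ph", "value": 6.8},
]

short_rows = long_to_short(long_rows)
for row in short_rows:
    print(row)
```

In this sketch a sample that lacks a measurement simply has no column for it, and a repeated measurement of the same parameter overwrites the earlier value; real data would need an explicit rule for both cases.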