NTU DMP Template v2 (15 Jan 2018 - 17 Jun 2020) - Research Data Management - LibGuides at Nanyang Technological University

Data Management Plan Template v2

The NTU DMP template v2 (15 Jan 2018 - 17 Jun 2020) is no longer in use.

DMP data for active projects has been mapped to DMP template v3 and migrated to RISE (Research Information System).

For earlier offline versions of the DMP template, please see below:

Version 2 (15 Jan 2018 - 17 Jun 2020)
- Offline DMP template (PDF; sign-in required)
- Comparison between versions 1 and 2 (PDF; sign-in required)
Version 1 (14 Apr 2016 - 14 Jan 2018)
- Offline DMP template (PDF; sign-in required)

The following is a compilation of the NTU DMP version 2 template questions, guides and samples:

DMP Question 1

Types and Size of data

a. What data will you be collecting or creating?

b. What is the estimated size of the project data?

Guide for a.:

Describe the type of data, e.g. quantitative, qualitative, survey data, experimental measurements, models, images, audio-visual data, samples etc.
Describe the format of the data, e.g. text, numeric, audio-visual, models, computer code, discipline-specific, instrument-specific.
Indicate which data are of long-term value and should be shared and/or preserved.
Are there any existing data or methods that you can use? This could include data from earlier projects or third-party sources. Provide the title, author, date, URLs/name of these sources. Do you need to pay to reuse existing data? If purchasing or reusing existing data sources, explain how issues such as copyright and IPR have been addressed.
Consider how your data could complement and integrate with existing data.

Additional Information:

You may refer to Re3data for a list of data repositories where you might find existing relevant third-party research data.

Guide for b.:

Do you have sufficient storage or should you include costs for more?
Will the scale of the data pose challenges when sharing or transferring data between sites?
Have you consulted with a data repository to determine preservation costs?
Consider the implications of data volumes in terms of storage, backup and access.
Consider how the data volume will grow to make sure any additional storage and technical support required can be met.
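To answer the storage questions above, a rough back-of-the-envelope estimate is often enough. The sketch below is a hypothetical helper (not part of the NTU template) that assumes steady collection over the project and counts every stored copy, e.g. a working copy plus two backups. All figures in the example are made up for illustration.

```python
def estimate_storage_gb(files_per_month: int, avg_file_mb: float,
                        months: int, copies: int = 3) -> float:
    """Total storage in GB needed after `months` of steady collection.

    Assumes a constant number of files per month of roughly equal size,
    and that every file is kept in `copies` places (working copy + backups).
    """
    total_mb = files_per_month * avg_file_mb * months * copies
    return total_mb / 1024  # 1 GB = 1024 MB


# Hypothetical project: 200 files of ~50 MB each per month for 3 years, 3 copies.
print(f"{estimate_storage_gb(200, 50, 36):.1f} GB")  # 1054.7 GB
```

Re-running the estimate when collection rates change makes it easier to request additional storage before it runs out.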

SAMPLE 1:

Class observation data, faculty interview data and student survey data will be collected. The data will be collected during the research period (Jan 2013 – Dec 2013). Most of the data will be in text format (notes, paper survey).

(Adapted from: Cmor, D., & Marshall, V. (2006). Librarian Class Attendance: Methods, Outcomes and Opportunities. 27th Annual IATUL Conference.)

SAMPLE 2:

Experimental and observational data in physical paper format will be collected. These are data related to production and decomposition, ecophysiological functional traits, soil extractable nutrients and mineralization rates.
As the original paper records will be used to identify outliers and possible transcription errors, the paper copies will be kept for at least 10 years.

(Adapted from: Cleland, E., Lipson, D., & Kim, J. The influence of plant functional types on ecosystem responses to altered rainfall. Retrieved Nov 24, 2015, from UC San Diego Sample NSF Data Management Plans website: https://library.ucsd.edu/research-and-collections/research-data/_files/dmpsample/DMP-Example-Cleland.pdf)

SAMPLE 3:

Experimental lab data will be collected using a microscope. The data generated will be time- and location-stamped image files of natural resources in Delaware County, PA. The images will serve as a record of the occurrence of creatures, natural artefacts and conditions at specific places and times during the period 2003 through 2011.
For many of the photos, taxonomic information and metadata will also be available. The occurrence data will be observational and qualitative. Metadata files shall be retained to facilitate reuse.

(Adapted from: Hampton, S. Examples of Data Management Plans. Retrieved Nov 24, 2015, from DataONE website: https://www.dataone.org/sites/all/documents/ESA11_SS3_hampton.pdf)

SAMPLE 4:

Recorded oral interviews with 30 residents of the Nnindye community in the Mpigi district, Uganda, will be collected over a period of 6 months, together with photos and videos.

(Adapted from: Sapp Nelson, Megan and Beavis, Katherine (2013) “History / Sustainable Development – Purdue University,” Data Curation Profiles Directory: Vol. 5, Article 1. http://dx.doi.org/10.7771/2326-6651.1032 )

SAMPLE 5:

Primarily public data from the US Census Bureau, covering 2000 to 2015, will be acquired. Some preliminary (non-public) Census data and data from other sources, e.g. the US Bureau of Labor Statistics and the New York State Department of Health, will also be purchased and gathered.

(Adapted from: Jenkins, Keith (2012) “Sociology / Demographics – Cornell University,” Data Curation Profiles Directory: Vol. 4, Article 6. http://dx.doi.org/10.5703/1288284315013)

SAMPLE 6:

Primary data in the form of audio files in Cheyenne and English will be collected. Text files will be generated when the audio files are transcribed.

(Adapted from: Tancheva, Kornelia (2012) “Linguistics – Cornell University,” Data Curation Profiles Directory: Vol. 4, Article 7. http://dx.doi.org/10.5703/1288284315007)

SAMPLE 7:

Sensor data, images and possibly third-party data (weather and road conditions) will be collected. The data will be saved as Excel spreadsheets and in an SQL database.

(Adapted from: Carlson, Jake R. (2009) “Traffic Flow – Purdue University,” Data Curation Profiles Directory: Vol. 1, Article 4. http://dx.doi.org/10.5703/1288284315016)

SAMPLE 8:

Experimental data will be generated from pressure sensors using LabVIEW and from chromatographs. The data will include a variety of file types, including text and video, specific to the equipment involved.

(Adapted from: Kashyap, Nabil (2011) “Aerospace Engineering / Chemical Kinetics – University of Michigan,” Data Curation Profiles Directory: Vol. 3, Article 1. http://dx.doi.org/10.5703/1288284314989)

SAMPLE 9:

Field data from surveys and bioassays will be recorded in Excel spreadsheets. Raw data from lab samples will be collected using a proprietary instrument. Ancillary data include GIS data.

(Adapted from: Wright, Sarah J. (2012) “Environmental Science / Herbivory – Cornell University,” Data Curation Profiles Directory: Vol. 4, Article 3. http://dx.doi.org/10.5703/1288284315002)

SAMPLE 10:

Quantitative data will be collected using a motion capture system. The processed data types will include Matlab files, MS Excel files, codebook texts and graphical files.

(Adapted from: Cragin, Melissa; Kogan, Marina; and Collie, Aaron (2011) “Bio-Mechanics Motion Studies – University of Illinois Urbana-Champaign,” Data Curation Profiles Directory: Vol. 3, Article 6. http://dx.doi.org/10.5703/1288284314998)

DMP Question 2

Collection Methods and Data Acquisition

How will the data be collected and processed?

Describe data collection method, e.g. observational, experimental, simulation, derived/compiled.
Describe the methods and standards that you will adopt to ensure quality data. This may include processes such as calibration, repeat samples or measurements, standardised data capture, data entry validation, peer review of data or representation with controlled vocabularies.
Describe how the data will be organised in your research project, e.g. naming conventions, version control, folder structures and any community data standards that will be used.

Additional Information:

Consistent, well-ordered research data will be easier for the research team to find, understand and reuse.
See DataONE Best Practices for data quality.
Learn more about file naming conventions:
- Best practices for file naming (Source: Stanford University Libraries)
- Managing and sharing data: best practice for researchers, pg. 13 (Source: UK Data Archive)
Learn more about folder structures:
- Organising data: file structure (Source: UK Data Archive)
- Managing and sharing data: best practice for researchers, pg. 13-14 (Source: UK Data Archive)

SAMPLE 1:

Most datasets will be collected 1-3 times per year for a period of 3 years. Temperature, light availability and soil moisture at multiple depths in the experiment will be logged every 15 minutes. These data will be stored on local data loggers and downloaded every two weeks.

Data originally recorded on paper will be transferred into spreadsheets in .csv format. DGVM simulation runs will be performed on a high performance parallel computing platform, a 96-node Linux cluster, maintained jointly by USFS Pacific Northwest Research Station and Oregon State University. DGVM output will be analysed and displayed with the ESRI ArcGIS software suite. To ensure data quality, data will be checked for outliers in the R statistical program, and any outliers will be checked for transcription errors.
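The sample checks for outliers in R but does not say which rule is used. As an illustration only, the Python sketch below applies one common choice, Tukey's 1.5 × IQR fences (an assumption, not necessarily what the original project used), and returns the row positions that should be re-checked against the paper records for transcription errors. The function name and readings are hypothetical.

```python
import statistics


def flag_outliers(values, k=1.5):
    """Return indices of values outside Tukey's fences (Q1 - k*IQR, Q3 + k*IQR).

    The flagged rows are candidates for re-checking against the original
    paper records, not values to delete automatically.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


# Hypothetical soil-moisture readings; 95 might be a mistyped 9.5.
readings = [10, 11, 12, 11, 10, 95]
print(flag_outliers(readings))  # [5]
```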

As the data will be generated, processed and analysed by different project team members, I will ask team members to name each data file using their initials, the date and the version number, e.g. LGH_20150801_v1.
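A small helper can keep such names consistent across team members. The sketch below is hypothetical: the function names, and the rule that initials are uppercase letters only, are assumptions. It builds names like LGH_20150801_v1 and parses them back, rejecting names (or dates) that do not fit.

```python
import re
from datetime import date, datetime
from typing import Tuple

# Assumed pattern: uppercase initials, an 8-digit YYYYMMDD date, then a version number.
NAME_PATTERN = re.compile(r"^([A-Z]+)_(\d{8})_v(\d+)$")


def make_file_name(initials: str, day: date, version: int) -> str:
    """Build a name such as LGH_20150801_v1."""
    return f"{initials.upper()}_{day:%Y%m%d}_v{version}"


def parse_file_name(name: str) -> Tuple[str, date, int]:
    """Split a conforming name into (initials, date, version); raise ValueError otherwise."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"{name!r} does not follow INITIALS_YYYYMMDD_vN")
    initials, stamp, version = match.groups()
    # strptime also rejects impossible dates such as 20151340.
    return initials, datetime.strptime(stamp, "%Y%m%d").date(), int(version)


print(make_file_name("lgh", date(2015, 8, 1), 1))  # LGH_20150801_v1
print(parse_file_name("LGH_20150801_v1"))          # ('LGH', datetime.date(2015, 8, 1), 1)
```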

(Adapted from: Cleland, E., Lipson, D., & Kim, J. The influence of plant functional types on ecosystem responses to altered rainfall. Retrieved Nov 24, 2015, from UC San Diego Sample NSF Data Management Plans website: https://library.ucsd.edu/research-and-collections/research-data/_files/dmpsample/DMP-Example-Cleland.pdf)

SAMPLE 2:

Interviews will be recorded using digital recorders. The recordings will be transcribed and then translated, and both transcripts and translations will be saved as Microsoft Word documents. Each interview will therefore have two Word documents: one in the original Luganda and one translated into English. The English translations will be coded using ethnographic software.

The raw data will all be stored in a folder titled “Raw data_YYYYMMDD”; the processed or analysed data will be kept in separate folders by data type, e.g. all audio recordings will be saved in one folder and video recordings in another. We will use the following file-naming convention for each data file and folder:

- data file name: Subject_v1 (e.g. interview_v1)
- folder name: datatype_v1_YYYYMMDD (e.g. audiorecordings_v1_20151120)
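These folder and file rules can be expressed as a small path builder so that everyone on the team produces identical names. In this hypothetical sketch, the function names and the choice to version files and folders separately are assumptions. It only composes names and does not create anything on disk.

```python
from datetime import date
from pathlib import PurePosixPath


def raw_data_folder(day: date) -> str:
    """Folder for untouched raw data, e.g. 'Raw data_20151120'."""
    return f"Raw data_{day:%Y%m%d}"


def processed_file_path(datatype: str, subject: str, day: date,
                        folder_version: int = 1, file_version: int = 1) -> PurePosixPath:
    """Path such as audiorecordings_v1_20151120/interview_v1."""
    folder = f"{datatype}_v{folder_version}_{day:%Y%m%d}"
    return PurePosixPath(folder) / f"{subject}_v{file_version}"


print(raw_data_folder(date(2015, 11, 20)))  # Raw data_20151120
print(processed_file_path("audiorecordings", "interview", date(2015, 11, 20)))
# audiorecordings_v1_20151120/interview_v1
```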

SAMPLE 3:

New data will be appended to the existing time series in the MS SQL database. The data will be aggregated to state economic regions to generate regional reports. Estimates/projections will be calculated and reported. A website will be provided for users to view charts, maps and tables that are created dynamically by an automated process that pulls data directly from the MS SQL database.

JISC has provided a guide on choosing a file name, and we will name our data files based on the recommendations on that website. All data files will be stored in separate folders organised by researchers’ initials and date.

(Adapted from: Jenkins, Keith (2012) “Sociology / Demographics – Cornell University,” Data Curation Profiles Directory: Vol. 4, Article 6. http://dx.doi.org/10.5703/1288284315013)

SAMPLE 4:

The raw audio files will be normalized and cleaned up, then transcribed using transcription software, ideally ELAN, with the audio and the transcriptions kept synchronized. New audio recordings will be added each year throughout the project timeline (2015 – 2020).

The data will be organised and stored in different folders using the following file-naming convention: Subjectkeyword_V2_YYYYMMDD.

(Adapted from: Tancheva, Kornelia (2012) “Linguistics – Cornell University,” Data Curation Profiles Directory: Vol. 4, Article 7. http://dx.doi.org/10.5703/1288284315007)

SAMPLE 5:

The experiment will capture video of the 200 ms-long process and collect physical samples of the mixture at different stages of the process. The samples will be separated using chromatography machines.

Analysing the data involves generating proprietary files for the processing software as well as convenient printable formats, such as Excel spreadsheets or PDF files, for examining the data manually. The pressure trace graphs and chromatograms will be the focus of analysis. Chromatograms will be interpreted using Clarity software, and graphs such as Arrhenius plots and concentration plots will be generated using Origin software. The video from the experiment will be used primarily to verify that the experiment ran correctly. Video stills will be extracted from the video files and merged with some graphs using Photoshop.

Data cleansing (e.g. removing outliers, interpolating missing data) will be performed to improve data quality. Data quality will also be ensured by taking repeat samples.

We will store all the data in a shared drive and will name each file according to the following file-naming convention, for example:

- 20140603_MAEProject_DesignDocument_Tan_v2-01.docx
- 20140809_MAEProject_MasterData_Daniel_v1-00.xlsx
- 20140825_MAEProject_Ex1Test1_Data_Jason_v3-03.xlsx
- 20141023_MAEProject_ProjectMeetingNotes_Kumar_v1-00.docx
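To check that files in the shared drive follow this convention, a regular expression can split each name into its parts. The sketch below is hypothetical: it assumes the parts are date, project, description (which may itself contain underscores, as in Ex1Test1_Data), author and a major-minor version such as v2-01. That is one plausible reading of the examples above, not a rule stated by the source.

```python
import re
from typing import NamedTuple, Optional, Tuple

# date _ project _ description _ author _ vMAJOR-MINOR . extension
CONVENTION = re.compile(
    r"^(?P<date>\d{8})_(?P<project>[A-Za-z0-9]+)_(?P<description>[A-Za-z0-9_]+?)"
    r"_(?P<author>[A-Za-z]+)_v(?P<major>\d+)-(?P<minor>\d{2})\.(?P<ext>\w+)$"
)


class FileInfo(NamedTuple):
    date: str
    project: str
    description: str
    author: str
    version: Tuple[int, int]
    ext: str


def parse(name: str) -> Optional[FileInfo]:
    """Return the parts of a conforming file name, or None if it does not conform."""
    m = CONVENTION.match(name)
    if m is None:
        return None
    return FileInfo(m["date"], m["project"], m["description"], m["author"],
                    (int(m["major"]), int(m["minor"])), m["ext"])


for name in ["20140825_MAEProject_Ex1Test1_Data_Jason_v3-03.xlsx",
             "meeting notes final.docx"]:
    print(name, "->", parse(name) or "does not follow the convention")
```

Running such a check over a folder listing before each backup is one way to catch misnamed files early.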

(Adapted from:

1. Kashyap, Nabil (2011) “Aerospace Engineering / Chemical Kinetics – University of Michigan,” Data Curation Profiles Directory: Vol. 3, Article 1. http://dx.doi.org/10.5703/1288284314989
2. Brandt, S. (29 July 2015). Data Management for Undergraduate Researchers: File Naming Conventions, from http://guides.lib.purdue.edu/c.php?g=353013&p=2378293)

SAMPLE 6:

Data will be generated by analysing plant samples using coupled Gas Chromatography-Mass Spectrometry (GC-MS). The data will then be analysed using the instrument-specific proprietary software to measure the area under the peaks for specific known Volatile Organic Compounds (VOCs). The peak area data will be entered into an Excel spreadsheet along with the field survey data. Statistical analysis will be performed using StatView to prepare the tables and graphs for the research.

All data columns that refer to master data will be checked for consistency to ensure quality. The quality of the analytical data will be assessed using appropriate tests.

We have not decided on how the data files will be organised yet. However, we will follow the file naming conventions recommended by the Stanford University Libraries to name our data files.

(Adapted from: Wright, Sarah J. (2012) “Environmental Science / Herbivory – Cornell University,” Data Curation Profiles Directory: Vol. 4, Article 3. http://dx.doi.org/10.5703/1288284315002)

SAMPLE 7:

Motion capture markers will be attached to various parts of the body, usually the joints. The data will be moved to Excel for automated filtering to remove errors and noise caused by the system's sensitivity to light (e.g. reflections) and by marker occlusion. Further automatic threshold-based filtering will be carried out in Matlab, along with visual review and manual cleaning of the data, and the data will eventually be converted to represent several variables (e.g. joint angles, or displacement, velocity or acceleration of joint segments). The data will then be aggregated across subjects and stored in an Excel spreadsheet.

The precise placement of markers is very important for the quality of the data and its reliability. About 40 markers on each subject will be used.

The data will be organised in a file folder system: each trial will be documented in a single spreadsheet, and all files from a particular study will be stored in the same folder structure.

SAMPLE 8:

Traffic flow data will be collected using sensors and video cameras. Road sensors placed in each lane of traffic will record the status of the intersection (i.e. whether the light is red, yellow or green). Data from the sensors will be transferred via FTP every hour as compressed files. The data will be processed, normalized and reformatted from the vendor’s proprietary format into .csv and then into Microsoft Excel. Video of the traffic sites will be taken for data verification and quality assurance. The video will be parsed into .gif or .jpg images at a rate of 20 frames per second.

The data files will be primarily organized by date.

(Adapted from: Carlson, Jake R. (2009) “Traffic Flow – Purdue University,” Data Curation Profiles Directory: Vol. 1, Article 4. http://dx.doi.org/10.5703/1288284315016)

DMP Question 3

File Formats & Software/Tools

a. Check the relevant file format(s) that you will be using (you may choose more than one):

b. What software(s) and/or tool(s) is/are needed to process/read the file(s)?

c. Where can this/these software(s) and/or tool(s) be obtained?

File formats affect one’s ability to use and re-use data in the future.
Strive to use a data format that is easy to read and easy to manipulate in a variety of commonly-used operating systems and programs.
Non-proprietary (‘open’) formats are also recommended to enhance accessibility.
For specialized data formats, provide the name, supplier (if applicable) and version number of the software and/or tools needed to read your data.

Additional Information:

Learn more about file formats:
- File formats (Source: Australian National Data Service)
- Recommended formats (Source: UK Data Service)
- File formats table (Source: UK Data Service)

DMP Question 4

Confidentiality, Privacy & Security of Data

If your data is sensitive, how will you be managing and using it?

Sensitive data is generally considered to be:
- Identifiable data that can be used to identify an individual, species, object or location, and that introduces a risk of discrimination, harm or unwanted attention. (Source: Australian National Data Service)
- Proprietary data, which is not generally known or accessible and which gives a competitive advantage to its owner. This includes research with commercialization potential and information that has terms of use attached to it. (Source: Other Intellectual Property: Trademarks, Patents and Proprietary Information, Stanford University)
- Restricted or confidential data with contractual (e.g. Research Collaboration Agreements, Project Agreements, Material Transfer Agreements, Non-Disclosure Agreements) or legal obligations (e.g. Official Secrets Act).
If your project involves the use of sensitive data, PIs are advised to use the form titled ‘Undertaking to safeguard confidential research information and data’ in ServiceNow to trigger an undertaking statement that can be sent to your relevant research team members for their acknowledgement.
Refer to the NTU Research Data Classification level(s) for guidance on handling sensitive data.
State appropriate security measures that you will be taking.
Consider how you will protect the identity of participants, e.g., via anonymisation or using managed access procedures.
Describe the process of providing security to the data and files from unauthorized access or security breaches.
Investigators carrying out research involving human participants should request consent to preserve and share the anonymised data if possible. Do not just ask for permission to use the data in your study or make unnecessary promises to delete it at the end if you are planning to preserve or share the data.
Data or information not originating from you or your project will have terms of use attached to it. These data should be considered sensitive in nature unless informed otherwise.
Consider potential commercialization of the research that is being carried out to ensure that there will be no issue with filing for intellectual property protection later on.

Additional Information:

See ‘Sensitive data: publishing and sharing’ for more guidance on how to manage and share sensitive data. (Source: Australian National Data Service)
See ‘Data Security’ in UK Data Service for more details on physical data security, network security, security of computer systems and files, etc.
See UK Data Service guidance on consent for data sharing.
See ICPSR approach to confidentiality and Health Insurance Portability and Accountability Act (HIPAA) regulations for health research.

SAMPLE 1:

I have sensitive data as it will contain personal data.

The research will include data from subjects being screened for STDs. The final dataset will include self-reported demographic and behavioural data from interviews and laboratory data from urine specimens. Because the STDs being studied are reportable diseases, we will be collecting identifying information. Even though the final dataset will be stripped of identifiers, there remains the possibility of deductive disclosure of subjects with unusual characteristics. Thus, we will make the data and documentation available only under a data-sharing agreement that provides for: (1) a commitment to using the data only for research purposes and not to identify any individual participant; (2) a commitment to securing the data using appropriate technology; and (3) a commitment to destroying or returning the data after analyses are completed.

(Adapted from: NIH Data Sharing Policy and Implementation Guidance. (9 February 2012), from http://grants.nih.gov/grants/policy/data_sharing/data_sharing_guidance.htm#ex)

SAMPLE 2:

I have sensitive data as it is national security related.

Access to research records will be limited to primary research team members. Recorded data will have any identifying information removed and will be relabelled with study code numbers. A database which relates study code numbers to consent forms and identifying information will be stored separately on password-protected computers in a secured, locked office. To maintain the privacy of the participants, any report of individual data will only consist of performance measures without any demographic or identifying information.

(Adapted from: Collaborative Research in Computational Neuroscience (CRCNS): Innovative Approaches to Science and Engineering Research on Brain Function. Retrieved Nov 24, 2015, from UC San Diego Sample NSF Data Management Plans website: http://libraries.ucsd.edu/services/data-curation/data-management/dmpsample/DMP-Example-Psych.doc)

DMP Question 5

Access & Usage Restrictions

Will there be restrictions on accessing and sharing your final research data?

Is the data you propose to collect (or existing data you propose to use) in the study suitable for sharing? Consider copyright ownership, consent agreement from subjects, data sharing agreements or any other agreements with external collaborators and parties, e.g. non-disclosure or proprietary use of the data. For multi-partner projects, IPR ownership should be covered in the consortium agreement.
If you are unable to make your final research data available to others, you must state the reasons, e.g. patentable data, etc.
According to the NTU Research Data Policy, the final research data from projects carried out at NTU shall be made available for sharing.
Refer to the NTU Research Data Classification level(s) for guidance on accessing and sharing research data.

Additional Information:

A licence is a document that clearly sets out how the data can be used and attributed to the original data owner.
Without a licence, it is unclear how your data can be reused and this may discourage the potential re-user.
The final research data is the final version of data that exists during the last stage in the data lifecycle in which all re-workings and manipulations of the data by the researcher have ceased.
The final research data also refers to the recorded factual material commonly accepted by the scientific community as necessary to document, support and validate research findings. Final research data does not include laboratory notebooks, partial datasets, preliminary analyses, drafts of scientific papers, plans for future research, peer review reports, communications with colleagues, or physical objects, such as gels or laboratory specimens.
Visit the ‘How to License Research Data’ in the Digital Curation Centre website to learn more about the why and how of research data licensing. AusGOAL (Australian Governments Open Access and Licensing Framework) also offers a good research data FAQ.
Types of Creative Commons licenses.
See EUDAT’s data and software licensing wizard.

SAMPLE 1:

I will share my final data under the Creative Commons Attribution-NonCommercial (CC BY-NC) license.

SAMPLE 2:

I will not be applying any Creative Commons license; instead, I will impose the following restriction on the sharing of my final data: no open sharing, only sharing on a private, individual basis.

My reasons are: certain terms in my agreement with a third party do not allow me to openly share some of my data. Anyone interested in my data may write to me at abd@yahoo.com, and I will see what I can share based on their needs.

SAMPLE 3:

I will not be able to share my final data.

My reasons are: Even with the removal of all identifiers, we believe that it would be difficult if not impossible to protect the identities of subjects given the physical characteristics of subjects, the type of clinical data (including imaging) that we will be collecting, and the relatively restricted area from which we are recruiting subjects. Therefore, we are not planning to share the data.

DMP Question 6

Metadata & Standards

What metadata and/or data standards will you be using to describe your data?

The term metadata is commonly defined as “data about data,” information that describes or contextualises the data.
Metadata helps to place your dataset in a broader context, allowing those outside your institution, discipline, or software environment to understand how to interpret your data. (Source: MANTRA)
You are strongly encouraged to use community standards to describe and structure data, where these are in place. The Digital Curation Centre (DCC) offers a catalogue of disciplinary metadata standards.
If you are using a specific metadata scheme or standard, please state what it is and provide the references.
If you are not using a specific metadata scheme or standard, describe the type of metadata (e.g. descriptive, structural, administrative, etc.) you will be providing, if any.

Additional Information:

Three broad categories of metadata are:
- Descriptive – common fields such as title, author, abstract, keywords which help users to discover online sources through searching and browsing.
- Administrative – preservation, rights management, and technical metadata about formats.
- Structural – how different components of a set of associated data relate to one another, such as a schema describing relations between tables in a database, variable list, directory and file listing and taxonomy.
The difference between documentation (refer to DMP question 7) and metadata is that the former is meant to be read by humans, while the latter implies machine processing (though metadata may also be human-readable).
Metadata may not be required if you are working alone on your own computer, but become crucial when data are shared online. Your data management plan should determine whether you need to apply metadata descriptors or tags at some point during your project.

SAMPLE 1:

I will not be using any metadata standard or international standard for the data collected and generated in this project. However, I will ensure that each document I create using Microsoft Word, Microsoft Excel and Microsoft PowerPoint has sufficient basic information, such as the author’s name, title, subject and keywords, in the document properties. In addition, a separate readme file will be prepared to describe each dataset in detail. I will apply the recommendations provided by Cornell University when creating the readme file(s). Key elements could include introductory information about the data, as well as methodological, data-specific and sharing/access-related information.

SAMPLE 2:

The clinical data collected from this project will be documented using CDASH v1.1 standards. The standard is available at CDISC website.

SAMPLE 3:

Using an electronic lab notebook, we would be generating metadata along with each notebook and posting. The metadata would include Sections, Categories and Keys, assigned by collaborators for reuse so as to keep the use of terminology consistent. We would also be using the Properties Ontology (ChemAxiomProp) when describing chemical and material properties.

SAMPLE 4:

Metadata about timing and exposure of individual images will be automatically generated by the camera. GPS locations will subsequently be added by post-processing GPS track data based on shared time stamps. Metadata for the image dataset as a whole will be generated by the image management software (iMatch) and will include time ranges, locations, and a taxon list. Those metadata will be translated into Ecological Metadata Language (EML), created using the Morpho software tool, and will include location and taxonomic summaries.

(Adapted from: Hampton, S. Examples of Data Management Plans. Retrieved Nov 24, 2015, from DataONE website: https://www.dataone.org/sites/all/documents/ESA11_SS3_hampton.pdf)

SAMPLE 5:

We will be using some core elements from the TEI metadata standards to describe our data. We will also be adding some customised elements in the metadata to provide more details on the rights management.

SAMPLE 6:

The data will be stored in several tables in an MS SQL database, which also includes some “metatables” that describe the original source of various tables and variables. These metatables will also include configuration information for the public website, such as short and long names for variables, numeric format, colours for mapping, etc. Several standard Census variables (ref: Office of National Statistics) will also be used.

(Adapted from: Jenkins, Keith (2012) “Sociology / Demographics – Cornell University,” Data Curation Profiles Directory: Vol. 4, Article 6. http://dx.doi.org/10.5703/1288284315013)

DMP Question 7

Data Documentation

What documentation will you be providing to facilitate a better understanding of the project data?

List the documentation that you would be providing to explain how the data is to be interpreted and used. Examples of documentation:
- codebooks
- lay summary
- readme
- data dictionaries
- electronic lab notebook
Content of documentation could include the following:
- Methodology and procedures used to collect the data
- Details about codes
- Definitions of variables
- Variable field locations
- Frequencies

Additional Information:

Visit the ‘Document your data’ by UK Data Service for more guidance on the different types of documentation.
Visit the ‘How to write a lay summary’ by Digital Curation Centre for more guidance on why lay summaries are important, how to write one and examples.

SAMPLE 1:

I would be providing the following accompanying documentation to facilitate a better understanding of the project data.

lay summary
readme.txt

others: I will also be writing a journal article to share the research data management aspect of my project. The paper would be made available on DR-NTU later.

DMP Question 8

Data Storing

Where and how are you storing the data during the project?

List the platforms and devices that will be used to store the data, e.g. electronic lab notebook, SharePoint, WordPress.
Consider institutional data security policies.
Identify the location where the data would be stored, e.g. school server. Provide the URLs of online locations.
Provide names of people performing data storage roles.

Additional Information:

See ‘Storing Your Data’ (Source: UK Data Service)
See ‘Keeping Research Data Safe’ in MANTRA for more guidance.
See UK Data Service Guidance on data storage.
See DataONE for Best Practices for storage.

SAMPLE 1:

I will be using a networked storage drive XXX, which provides storage for active data for all research staff and students. It is fully backed up, secure and resilient, and has multi-site storage. It is accessible via VPN (Virtual Private Network) from outside the University. I will also use an external storage device, such as an encrypted portable hard disk, as an additional back-up. Researcher ABC will coordinate and be in overall charge of data storage.

SAMPLE 2:

The data will be stored locally on a secure, password-protected data server. One set of hard drives and one set of tapes will be stored in XXX building. A second set of hard drives and a second set of tapes will be stored in XXX building. All data will be backed up on a daily basis by XXX (researcher).

SAMPLE 3:

The data (on staff computers and the web server) will be managed according to the standard practices of the college’s IT department and will be password protected. Any restricted, non-public data will be stored on CRADC (Cornell Restricted Access Data Center). All files will be backed up every day by xxx (project team member).

DMP Question 9

Backup & Versioning Control

What backup and versioning control procedures will you be undertaking?

Describe the backup and archiving regime you will use to back up all your data to prevent its loss, e.g. through hard disk failure, virus infection or theft.
Describe the method you will use to ensure that different versions of your data are identifiable, properly controlled and properly used.
How will the data be backed up (i.e. how often, to where, how many copies, and whether the process is automated)?
Provide the names of the people performing data backup roles.
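When answering "how often" and "is this automated", it can help to be precise about what each scheduled run copies. The sketch below assumes one common schedule (a full backup on the 1st of each month, an incremental backup every other night) and shows how an automated job might select files. The function names and the schedule are hypothetical; the actual copying, and scheduling with a tool such as cron or Windows Task Scheduler, are left out.

```python
from datetime import date
from pathlib import Path
from typing import List

def backup_type(today: date) -> str:
    """Assumed schedule: full backup on the 1st of each month, incremental otherwise."""
    return "full" if today.day == 1 else "incremental"

def files_to_back_up(data_dir: Path, kind: str, last_backup_time: float) -> List[Path]:
    """Select the files for tonight's run.

    A full backup copies every file. An incremental backup copies only files
    modified since the most recent backup of any kind (full or incremental),
    given as a POSIX timestamp.
    """
    files = sorted(p for p in data_dir.rglob("*") if p.is_file())
    if kind == "full":
        return files
    return [p for p in files if p.stat().st_mtime > last_backup_time]
```

For example, `files_to_back_up(Path("project_data"), backup_type(date.today()), last_run)` would return every file on the 1st of the month and only recently changed files on other nights. Restoring from incremental backups needs the last full backup plus every incremental run since, so all of them should be kept together.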

Additional Information:

Storing data on laptops, computer hard drives or external storage devices alone is very risky. The use of robust, managed storage with automatic backup, for example, that provided by the university IT team, is preferable.
See ‘Backing-up Data’ (Source: UK Data Service)
See ‘Keeping Research Data Safe’ in MANTRA for more guidance.
3-2-1 principle of backup (Source: BACKBLAZE)
- Keep 3 copies of any important file
- Store files on at least 2 different media types
- Keep at least 1 copy offsite
Learn more about version control:
- Version control and authenticity (Source: UK Data Service)
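The 3-2-1 principle listed above can be checked against a concrete backup plan. The sketch below is a minimal illustration under stated assumptions: the `BackupCopy` record, its fields and the example plan are hypothetical, and the count of three copies includes the primary working copy (a primary plus two backups).

```python
from dataclasses import dataclass
from typing import List

@dataclass
class BackupCopy:
    description: str   # where the copy lives (hypothetical labels)
    media_type: str    # e.g. "network storage", "tape", "portable hard disk"
    offsite: bool      # True if kept away from the primary site

def meets_3_2_1(copies: List[BackupCopy]) -> bool:
    """Check: at least 3 copies (including the primary), 2 media types, 1 offsite."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.media_type for c in copies}) >= 2
    has_offsite = any(c.offsite for c in copies)
    return enough_copies and enough_media and has_offsite

plan = [
    BackupCopy("networked storage drive (primary copy)", "network storage", offsite=False),
    BackupCopy("encrypted portable hard disk", "portable hard disk", offsite=False),
    BackupCopy("tape set kept in a second building", "tape", offsite=True),
]
print(meets_3_2_1(plan))  # True
```

Dropping the tape copy from this plan would fail the check twice: only two copies would remain, and none would be offsite.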

SAMPLE 1:

A complete copy of materials will be generated and stored independently on primary and backup sources for both the PI and Co-PI (as data are generated) and with all members of the Expert Panel every 6 months. The project team will adopt the Version Control guidelines provided by the National Institute of Dental and Craniofacial Research to organise the data and ensure that different versions of the data are identifiable, properly controlled and properly used.

SAMPLE 2:

We will adopt the version control standards recommended by the University of Leicester for the interview transcripts and coding, to track the changes the research team makes to the files.

SAMPLE 3:

We will use Mercurial, a free, distributed source control management tool, to manage the data, so that versions of the data are easily identifiable, properly controlled and properly used.

SAMPLE 4:

All data will be backed up manually on a monthly basis by researcher xxx on a computer hard drive kept at the research team office. The computer will be password protected, and only team members will be given the password and the right to access it. Incremental back-ups will be performed nightly and full back-ups will be performed monthly. Staff member xxxx will keep versions by appending the date of the update to the file name. Versions of the file that have been revised due to errors or updates will be retained in an archive system. A revision history document will describe the revisions made.

(Adapted from: NSF General: Mauna Loa example. Retrieved from Data Management Planning website: https://www.dataone.org/sites/all/documents/DMP_MaunaLoa_Formatted.pdf)
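Sample 4's approach to versioning appends the update date to the file name, keeps revised versions in an archive, and records the revisions in a revision history document. The sketch below implements that pattern. It is a minimal illustration, not part of the sample. The ISO 8601 date format (YYYY-MM-DD), the `_` separator, the function names and the `revision_history.txt` file name are all assumptions.

```python
import shutil
from datetime import date
from pathlib import Path

def versioned_name(path: Path, when: date) -> Path:
    """Append the update date (YYYY-MM-DD) to the file name, before the extension."""
    return path.with_name(f"{path.stem}_{when.isoformat()}{path.suffix}")

def save_version(working_file: Path, archive_dir: Path, note: str, when: date) -> Path:
    """Copy the working file into the archive under a dated name and log the revision."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / versioned_name(working_file, when).name
    shutil.copy2(working_file, target)  # copy2 also preserves file timestamps
    # One tab-separated line per revision in a plain-text revision history (hypothetical name).
    with open(archive_dir / "revision_history.txt", "a", encoding="utf-8") as log:
        log.write(f"{when.isoformat()}\t{target.name}\t{note}\n")
    return target

print(versioned_name(Path("co2_readings.csv"), date(2020, 6, 17)))  # co2_readings_2020-06-17.csv
```

ISO 8601 dates make archived versions sort chronologically by name. Two saves on the same day would overwrite each other, so a time stamp or a counter would be needed if files change more than once a day.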

DMP Question 10


Long-term Storage & Preservation

a. NTU Research Data Policy requires you to retain your research data for a minimum of 10 years. Where will you be depositing the data after the completion of your research project? (You may choose more than one)

b. Is there any data that will not be deposited in any of the data repositories mentioned in question 10a?

Guide for 10a and b:

An open access data repository must be actively managed in order to:

1. enable access to the dataset
2. ensure dataset persistence
3. ensure dataset stability
4. enable searching and retrieval of datasets
5. collect information about repository statistics

(Source: Callaghan, S., Tedds, J., Kunze, J., et al. (2014). Guidelines on recommending data repositories as partners in publishing research data. International Journal of Digital Curation, 9(1), 152-163. doi:10.2218/ijdc.v9i1.309)

Use the NTU research data repository, DR-NTU (Data), to store and preserve the final version of your research data for long-term access.
Preservation of digital outputs is necessary in order for the research data to endure changes in the technological environment and remain potentially re-usable in the future.
If you have proprietary format research data, try to also deposit the same data in a non-proprietary or 'open' format to make it easier for future reuse of your data.
NTU researchers should deposit their final research data in DR-NTU (Data) where possible. If some of your research data are made available elsewhere, deposit the rest of your research data in DR-NTU (Data) and include a link in your DR-NTU (Data) dataset to the externally shared dataset.
When uploading your research data to DR-NTU (Data), please ensure that:
- you do not infringe the copyright or other intellectual property rights of any third party, including, but not limited to, patent, trademark, trade secret, copyright and right of publicity;
- you have obtained all relevant, obligatory and applicable approvals for posting such materials, with the content included and in the format uploaded, including, but not limited to, approvals from the Institutional Review Board and from third parties with whom you have relevant contractual obligations; and
- your uploads are void of all identifiable information, so that no subject can be re-identified by combining the information available from all of the materials (across datasets and dataverses) uploaded by you or under any one author.
Specifically, uploads cannot contain identity or social security numbers; credit card numbers; medical record numbers; health plan numbers; other account numbers of individuals; or biometric identifiers (fingerprints, retina, voice).

Additional Information for 10a and b:

An international list of data repositories is available via Re3data.
Some universities or publishers provide lists of recommendations, e.g. PLOS ONE recommended repositories.