The NTU DMP template v2 (15 Jan 2018 - 17 Jun 2020) is no longer in use.
DMP data for active projects has been mapped to DMP template v3 and migrated to RISE (Research Information System).
For earlier offline versions of the DMP template, please see below:
The following is a compilation of the NTU DMP version 2 template questions, guides and samples:
Types and Size of data
a. What data will you be collecting or creating?
b. What is the estimated size of the project data?
Guide for a.:
Additional Information:
Guide for b.:
SAMPLE 1:
Class observation data, faculty interview data and student survey data will be collected. The data will be collected during the research period (Jan 2013 – Dec 2013). Most of the data will be in text format (notes, paper survey).
(Adapted from: Cmor, D., & Marshall, V. (2006). Librarian Class Attendance: Methods, Outcomes and Opportunities. 27th Annual IATUL Conference.)
SAMPLE 2:
Experimental and observational data in physical paper format will be collected. These are data related to production and decomposition, ecophysiological functional traits, soil extractable nutrients and mineralization rates.
As these original data in physical paper format will be used to identify outliers and possible transcription errors, the physical paper copies will be kept for at least 10 years.
(Adapted from: Cleland, E., Lipson, D., & Kim, J. The influence of plant functional types on ecosystem responses to altered rainfall. Retrieved Nov 24, 2015, from UC San Diego Sample NSF Data Management Plans website: https://library.ucsd.edu/research-and-collections/research-data/_files/dmpsample/DMP-Example-Cleland.pdf)
SAMPLE 3:
Experimental lab data will be collected using a microscope. The data generated will be time- and location-stamped image files of natural resources in Delaware County, PA. The images will serve as a record of the occurrence of creatures, natural artefacts, and conditions at specific places and times during the period 2003 through 2011.
For many of the photos, taxonomic information and metadata will also be available. The occurrence data will be observational and qualitative. Metadata files shall be retained to facilitate reuse.
(Adapted from: Hampton, S. Examples of Data Management Plans. Retrieved Nov 24, 2015, from DataOne website: https://www.dataone.org/sites/all/documents/ESA11_SS3_hampton.pdf)
SAMPLE 4:
Recorded oral interviews from 30 residents will be collected at the Nnindye community located in the Mpigi district in Uganda over a period of 6 months in the form of photos and videos.
(Adapted from: Sapp Nelson, Megan and Beavis, Katherine (2013) “History / Sustainable Development – Purdue University,” Data Curation Profiles Directory: Vol. 5, Article 1. http://dx.doi.org/10.7771/2326-6651.1032 )
SAMPLE 5:
Primarily public data from the US Census Bureau, covering 2000 to 2015, will be acquired. Some preliminary (non-public) Census data and data from other sources, e.g. the US Bureau of Labour Statistics and the New York State Dept of Health, will also be purchased and gathered.
(Adapted from: Jenkins, Keith (2012) “Sociology / Demographics – Cornell University,” Data Curation Profiles Directory: Vol. 4, Article 6. http://dx.doi.org/10.5703/1288284315013)
SAMPLE 6:
Primary data consisting of audio files in Cheyenne and English will be collected. Text files will be generated once the audio files are transcribed.
(Adapted from: Tancheva, Kornelia (2012) “Linguistics – Cornell University,” Data Curation Profiles Directory: Vol. 4, Article 7. http://dx.doi.org/10.5703/1288284315007)
SAMPLE 7:
Sensor data, images and possibly 3rd party data (weather and road conditions) will be collected. Data will be saved as Excel spreadsheets and in an SQL database.
(Adapted from: Carlson, Jake R. (2009) “Traffic Flow – Purdue University,” Data Curation Profiles Directory: Vol. 1, Article 4. http://dx.doi.org/10.5703/1288284315016)
SAMPLE 8:
Experimental data will be generated from pressure sensors using LabVIEW and from chromatographs. They include a variety of files, including text and video, specific to the equipment involved.
(Adapted from: Kashyap, Nabil (2011) “Aerospace Engineering / Chemical Kinetics – University of Michigan,” Data Curation Profiles Directory: Vol. 3, Article 1. http://dx.doi.org/10.5703/1288284314989)
SAMPLE 9:
Field data from surveys and bioassays will be collected using Excel spreadsheets. Raw data from lab samples will be collected using a proprietary instrument. Ancillary data include GIS data.
(Adapted from: Wright, Sarah J. (2012) “Environmental Science / Herbivory – Cornell University,” Data Curation Profiles Directory: Vol. 4, Article 3. http://dx.doi.org/10.5703/1288284315002)
SAMPLE 10:
Quantitative data will be collected using a motion capture system. The processed data types will include Matlab files, MS Excel files, codebook texts, and graphical files.
(Adapted from: Cragin, Melissa; Kogan, Marina; and Collie, Aaron (2011) “Bio-Mechanics Motion Studies – University of Illinois Urbana-Champaign,” Data Curation Profiles Directory: Vol. 3, Article 6. http://dx.doi.org/10.5703/1288284314998)
Collection Methods and Data Acquisition
How will the data be collected and processed?
Additional Information:
SAMPLE 1:
Most datasets will be collected 1-3 times per year for a period of 3 years. Temperature, light availability and soil moisture at multiple depths in the experiment will be logged every 15 minutes. These data will be stored on local data loggers and downloaded every two weeks.
Data originally recorded on paper will be transferred into spreadsheets using .csv formats. DGVM simulation runs will be performed on a high performance parallel computing platform, a 96-node Linux cluster, maintained jointly by USFS Pacific Northwest Research Station and Oregon State University. DGVM output will be analysed and displayed with the ESRI ArcGIS software suite. To ensure data quality, data will be checked for outliers in the R statistical program, and any outliers will be checked for transcription errors.
As the data will be generated, processed and analysed by different project team members, I will recommend that team members name each data file using their initials, the date and the version, e.g. LGH_20150801_v1.
(Adapted from: Cleland, E., Lipson, D., & Kim, J. The influence of plant functional types on ecosystem responses to altered rainfall. Retrieved Nov 24, 2015, from UC San Diego Sample NSF Data Management Plans website: https://library.ucsd.edu/research-and-collections/research-data/_files/dmpsample/DMP-Example-Cleland.pdf)
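The naming pattern in Sample 1 above (initials, date, version, as in LGH_20150801_v1) can be generated and checked with a short script. The following is a minimal sketch only: the function names `build_name` and `parse_name`, and the assumption that initials are 2 to 4 capital letters, are illustrative and not part of the template.

```python
import re
from datetime import date, datetime

# Matches stems such as LGH_20150801_v1: initials, YYYYMMDD date, version.
# Assumption: initials are 2 to 4 capital letters.
NAME_PATTERN = re.compile(
    r"^(?P<initials>[A-Z]{2,4})_(?P<date>\d{8})_v(?P<version>\d+)$"
)


def build_name(initials: str, day: date, version: int) -> str:
    """Return a file stem in the form INITIALS_YYYYMMDD_vN."""
    return f"{initials.upper()}_{day.strftime('%Y%m%d')}_v{version}"


def parse_name(stem: str) -> dict:
    """Split a file stem into its parts; raise ValueError if it does not fit."""
    match = NAME_PATTERN.match(stem)
    if match is None:
        raise ValueError(f"{stem!r} does not follow INITIALS_YYYYMMDD_vN")
    return {
        "initials": match["initials"],
        # strptime also rejects impossible dates such as 20151301.
        "date": datetime.strptime(match["date"], "%Y%m%d").date(),
        "version": int(match["version"]),
    }
```

Checking names with `parse_name` before files are moved to shared storage is one way to catch files that drifted from the agreed convention.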
SAMPLE 2:
Interviews will be recorded using digital recorders. The interview recordings will be transcribed and then translated. Both transcripts and translations will be saved as Microsoft Word documents. There will be two Microsoft Word documents for each interview: one in the original Luganda language and the other translated into English. The English translation of each interview will be coded using ethnographic software.
The raw data will all be stored in a folder titled “Raw data_YYYYMMDD”; the processed or analysed data will be kept in separate folders by data type, e.g. all audio recordings will be saved in one folder and all video recordings in another. We will be using the following file-naming convention for each data file and folder:
(Adapted from: Sapp Nelson, Megan and Beavis, Katherine (2013) “History / Sustainable Development – Purdue University,” Data Curation Profiles Directory: Vol. 5, Article 1. http://dx.doi.org/10.7771/2326-6651.1032)
SAMPLE 3:
New data will be appended to existing time series in the MS SQL database. The data will be aggregated to state economic regions to generate region-based reports. Estimates and projections will be calculated and reported. A website will be provided for users to view charts, maps, and tables that are created dynamically by an automated process that pulls data directly from the MS SQL database.
JISC has provided a guide on choosing a file name, and we will name our data files based on the recommendations in that guide. All data files will be stored in different folders organised by researchers’ initials and date.
(Adapted from: Jenkins, Keith (2012) “Sociology / Demographics – Cornell University,” Data Curation Profiles Directory: Vol. 4, Article 6. http://dx.doi.org/10.5703/1288284315013)
SAMPLE 4:
The raw audio files will be normalized and cleaned up, then transcribed using transcription software, ideally ELAN. The audio and the transcription will be synchronized. New audio recordings will be added each year throughout the project timeline (2015 – 2020).
The data will be organised and stored in different folders with the following file-naming convention: Subjectkeyword_V2_YYYYMMDD.
(Adapted from: Tancheva, Kornelia (2012) “Linguistics – Cornell University,” Data Curation Profiles Directory: Vol. 4, Article 7. http://dx.doi.org/10.5703/1288284315007)
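With a convention such as Subjectkeyword_V2_YYYYMMDD in Sample 4 above, the next version number can be worked out from the names already in a folder instead of being typed by hand. This is a rough sketch under assumptions: the helper `next_name` and the example keywords are hypothetical, and the keyword is assumed to contain only letters and digits.

```python
import re

# Matches stems such as Greetings_V2_20150412: keyword, version, YYYYMMDD date.
# Assumption: the subject keyword contains only letters and digits.
PATTERN = re.compile(r"^(?P<keyword>[A-Za-z0-9]+)_V(?P<version>\d+)_(?P<date>\d{8})$")


def next_name(existing_stems, keyword: str, yyyymmdd: str) -> str:
    """Return the stem for the next version of `keyword`, starting at V1."""
    versions = []
    for stem in existing_stems:
        match = PATTERN.match(stem)
        # Ignore files for other keywords and files that do not fit the pattern.
        if match is not None and match["keyword"] == keyword:
            versions.append(int(match["version"]))
    return f"{keyword}_V{max(versions, default=0) + 1}_{yyyymmdd}"
```

For example, if a folder already holds V1 and V2 of a hypothetical "Greetings" file, the function proposes a V3 name carrying the new date.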
SAMPLE 5:
The experiment will capture videos of the 200 ms-long process and physical samples of the mixture at different stages of the process. Samples will be separated by chromatography machines.
Data analysis will involve generating proprietary files for the processing software and convenient printable formats, for example Excel spreadsheets or PDF files, for manual examination of the data. The pressure trace graphs and chromatographs will be the focus of analysis. Chromatograms will be interpreted using Clarity software. Some graphs, such as Arrhenius plots and concentration plots, will be generated using Origin software. The video from the experiment will be used primarily to verify that the experiment ran correctly. Video stills will be generated from the video files and merged with some graphs using Photoshop.
Data cleansing (e.g. removing outliers, missing data interpolation) will be performed to improve the data quality. Data quality will also be ensured by repeated samples.
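The two cleansing steps named above (removing outliers, interpolating missing data) could look roughly like the sketch below. The sample does not say which methods are used, so the choices here are assumptions: outliers are flagged with the modified z-score based on the median absolute deviation (with the commonly used cut-off of 3.5), and gaps are filled by linear interpolation between the nearest known neighbours. The function names are illustrative.

```python
from statistics import median


def drop_outliers(values, threshold=3.5):
    """Replace outliers with None using the modified z-score (median/MAD).

    Assumption: at least one value is present; None marks missing data.
    """
    present = [v for v in values if v is not None]
    centre = median(present)
    mad = median(abs(v - centre) for v in present)
    if mad == 0:
        # No spread around the median, so nothing can be scored; keep all.
        return list(values)
    return [
        None if v is not None and 0.6745 * abs(v - centre) / mad > threshold else v
        for v in values
    ]


def interpolate(values):
    """Fill interior None gaps linearly; leading or trailing gaps stay None."""
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled
```

Running `interpolate(drop_outliers(readings))` applies both steps in order. It is good practice to keep the raw readings unchanged and record which values were removed or filled.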
We will store all the data in a shared drive and will name each file by the following file-naming convention:
(Adapted from:
SAMPLE 6:
Data will be generated by subjecting plant samples to analysis using coupled Gas Chromatography-Mass Spectrometry (GC-MS). The data will then be analysed using the instrument-specific proprietary software to measure the area underneath the peaks for specific known Volatile Organic Compounds (VOCs). The peak area data will be entered into an Excel spreadsheet along with the field survey data. Statistical analysis of the data will be performed using StatView to prepare the tables and graphs for the research.
All data columns that refer to master data will be checked for consistency to ensure quality. Analytical data quality will be tested using appropriate tests.
We have not decided on how the data files will be organised yet. However, we will follow the file naming conventions recommended by the Stanford University Libraries to name our data files.
(Adapted from: Wright, Sarah J. (2012) “Environmental Science / Herbivory – Cornell University,” Data Curation Profiles Directory: Vol. 4, Article 3. http://dx.doi.org/10.5703/1288284315002)
SAMPLE 7:
Motion capture markers of the system will be attached to various parts of the body, usually the joints. The data will be moved to Excel for automated filtering to remove errors and noise that occur because the system is sensitive to light (e.g. reflections) and to motion marker occlusion. Further automatic threshold-based filtering will be carried out, along with visual review of the data and manual cleaning. This process will take place in Matlab, and the data will eventually be converted to represent several variables (e.g. angle data, displacement, velocity, or acceleration of joint segments). The data will then be aggregated across subjects and stored in an Excel spreadsheet.
The precise placement of markers is very important for the quality and reliability of the data. About 40 markers will be used on each subject.
The data will be organised through a file folder system where each trial will be documented in a single spreadsheet, and all the files from a particular study will be stored in the same folder structure.
(Adapted from: Cragin, Melissa; Kogan, Marina; and Collie, Aaron (2011) “Bio-Mechanics Motion Studies – University of Illinois Urbana-Champaign,” Data Curation Profiles Directory: Vol. 3, Article 6. http://dx.doi.org/10.5703/1288284314998)
SAMPLE 8:
Traffic flow data will be collected using sensors and video cameras. The road sensors placed in each lane of traffic will record the status of the intersection (i.e. whether the light is red, yellow, or green). Data from the sensors will be sent out via FTP on an hourly basis as compressed files. Data will be processed, normalized and reformatted from the vendor’s proprietary format into .csv and then into Microsoft Excel. Video of the traffic sites will be taken for data verification purposes and to ensure quality. The video gathered will be parsed into .gif or .jpg images at a rate of 20 frames per second.
The data files will be primarily organized by date.
(Adapted from: Carlson, Jake R. (2009) “Traffic Flow – Purdue University,” Data Curation Profiles Directory: Vol. 1, Article 4. http://dx.doi.org/10.5703/1288284315016)
File Formats & Software/Tools
a. Check the relevant file format(s) that you will be using (you may choose more than one):
b. What software(s) and/or tool(s) is/are needed to process/read the file(s)?
c. Where can this/these software(s) and/or tool(s) be obtained?
Additional Information:
Confidentiality, Privacy & Security of Data
If your data is sensitive, how will you be managing and using it?
If your project involves the use of sensitive data, PIs are advised to use the form titled ‘Undertaking to safeguard confidential research information and data’ in ServiceNow to trigger an undertaking statement that can be sent to your relevant research team members for their acknowledgement.
Additional Information:
SAMPLE 1:
I have sensitive data as it will contain personal data.
The research will include data from subjects being screened for STDs. The final dataset will include self-reported demographic and behavioural data from interviews and laboratory data from urine specimens. Because the STDs being studied are reportable diseases, we will be collecting identifying information. Even though the final dataset will be stripped of identifiers, there remains the possibility of deductive disclosure of subjects with unusual characteristics. Thus, we will make the data and documentation available only under a data-sharing agreement that provides for: (1) a commitment to using the data only for research purposes and not to identify any individual participant; (2) a commitment to securing the data using appropriate technology; and (3) a commitment to destroying or returning the data after analyses are completed.
(Adapted from: NIH Data Sharing Policy and Implementation Guidance. (9 February 2012), from http://grants.nih.gov/grants/policy/data_sharing/data_sharing_guidance.htm#ex)
SAMPLE 2:
I have sensitive data as it is national security related.
Access to research records will be limited to primary research team members. Recorded data will have any identifying information removed and will be relabelled with study code numbers. A database which relates study code numbers to consent forms and identifying information will be stored separately on password-protected computers in a secured, locked office. To maintain the privacy of the participants, any report of individual data will only consist of performance measures without any demographic or identifying information.
(Adapted from: Collaborative Research in Computational Neuroscience (CRCNS): Innovative Approaches to Science and Engineering Research on Brain Function. Retrieved Nov 24, 2015, from UC San Diego Sample NSF Data Management Plans website: http://libraries.ucsd.edu/services/data-curation/data-management/dmpsample/DMP-Example-Psych.doc)
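The approach in Sample 2 above, relabelling records with study code numbers and keeping the link to identifying information in a separate file, can be sketched as follows. This is an illustration under assumptions: the field names (`name`, `email`), the `S001`-style codes and the function name `pseudonymise` are hypothetical. Sequential codes are used for simplicity; randomly assigned codes may be preferable if enrolment order is itself sensitive.

```python
def pseudonymise(records, id_fields=("name", "email"), prefix="S"):
    """Split records into coded research data and a separate linking key.

    Returns (coded, key): `coded` holds no identifying fields, and `key`
    maps each study code back to the identifying fields. The key should be
    stored separately from the coded data, with restricted access.
    """
    coded, key = [], {}
    for number, record in enumerate(records, start=1):
        code = f"{prefix}{number:03d}"
        key[code] = {field: record[field] for field in id_fields if field in record}
        research_fields = {k: v for k, v in record.items() if k not in id_fields}
        coded.append({"study_code": code, **research_fields})
    return coded, key
```

Only the `coded` list would be shared with the wider team; the `key` dictionary corresponds to the separately stored database described in the sample.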
Access & Usage Restrictions
Will there be restrictions on accessing and sharing your final research data?
Additional Information:
SAMPLE 1:
I will share my final data under the CC-BY-NC Creative Commons (CC) license.
SAMPLE 2:
I will not be applying any Creative Commons license but will instead impose the following restriction on the sharing of my final data: no open sharing; data will be shared on a private, individual basis only.
My reasons are: Certain terms in the agreement that I signed with a third party do not allow me to share some of my data openly. Anyone who is interested in my data can write to me at abd@yahoo.com, and I will see what I can share based on his/her needs.
SAMPLE 3:
I will not be able to share my final data.
My reasons are: Even with the removal of all identifiers, we believe that it would be difficult if not impossible to protect the identities of subjects given the physical characteristics of subjects, the type of clinical data (including imaging) that we will be collecting, and the relatively restricted area from which we are recruiting subjects. Therefore, we are not planning to share the data.
Metadata & Standards
What metadata and/or data standards will you be using to describe your data?
Additional Information:
SAMPLE 1:
I will not be using any metadata or international standard for the data collected and generated for this project. However, I will ensure that each document I create using Microsoft Word, Microsoft Excel and Microsoft PowerPoint has sufficient basic information, such as the author’s name, title, subject and keywords, in the document properties. In addition, a separate readme file will be prepared to describe the details of each dataset. I will be applying the recommendations provided by Cornell University in the creation of readme file(s). Key elements could include: introductory information about the data, and methodological, date-specific and sharing/access-related information.
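A readme file covering the key elements listed in Sample 1 above could be generated from a small template so that every dataset is described in the same way. The sketch below is illustrative only: the section headings and the function name `build_readme` are assumptions, not the wording of the Cornell recommendations.

```python
def build_readme(title, author, description, methods, dates, access):
    """Return plain-text readme content with one section per key element."""
    sections = [
        ("Title", title),
        ("Author", author),
        ("Introductory information", description),
        ("Methodological information", methods),
        ("Date-specific information", dates),
        ("Access and sharing", access),
    ]
    lines = []
    for heading, body in sections:
        lines.append(heading.upper())
        lines.append(body)
        lines.append("")  # blank line between sections
    return "\n".join(lines).rstrip() + "\n"
```

The returned text can be saved next to the data file, for example as a `.txt` file sharing the dataset's file name.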
SAMPLE 2:
The clinical data collected from this project will be documented using CDASH v1.1 standards. The standard is available on the CDISC website.
SAMPLE 3:
Using an electronic lab notebook, we would be generating metadata along with each notebook and postings. The metadata would include Sections, Categories and Keys which would be assigned by collaborators for reuse so as to maintain consistency in the use of terminology. We would also be using the Properties Ontology (ChemAxiomProp) when describing the chemical and materials properties.
SAMPLE 4:
Metadata about timing and exposure of individual images will be automatically generated by the camera. GPS locations will subsequently be added by post-processing GPS track data based on shared time stamps. Metadata for the image dataset as a whole will be generated by the image management software (iMatch) and will include time ranges, locations, and a taxon list. Those metadata will be translated into Ecological Metadata Language (EML), created using the Morpho software tool, and will include location and taxonomic summaries.
(Adapted from: Hampton, S. Examples of Data Management Plans. Retrieved Nov 24, 2015, from DataOne website: https://www.dataone.org/sites/all/documents/ESA11_SS3_hampton.pdf)
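Sample 4 above adds GPS locations to images by matching the camera time stamps against a GPS track. A minimal sketch of that matching step is shown below. It assumes the track is a time-sorted list of `(datetime, latitude, longitude)` tuples, that the camera and GPS clocks agree, and that a fix more than 60 seconds away should be rejected; the function name `nearest_fix` and these thresholds are assumptions for illustration.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def nearest_fix(track, when, max_gap=timedelta(seconds=60)):
    """Return (lat, lon) of the track point closest in time to `when`.

    `track` is a list of (datetime, lat, lon) tuples sorted by time.
    Returns None if the track is empty or the closest point is further
    away in time than `max_gap`.
    """
    times = [point[0] for point in track]
    i = bisect_left(times, when)
    # The closest point is either just before or just after the insert position.
    candidates = track[max(i - 1, 0): i + 1]
    if not candidates:
        return None
    best = min(candidates, key=lambda point: abs(point[0] - when))
    if abs(best[0] - when) > max_gap:
        return None
    return best[1], best[2]
```

Images for which `nearest_fix` returns None would need their location added manually or be left without coordinates.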
SAMPLE 5:
We will be using some core elements from the TEI metadata standards to describe our data. We will also be adding some customised elements in the metadata to provide more details on the rights management.
SAMPLE 6:
The data will be stored in several tables in an MS SQL database, which also includes some “metatables” that describe the original source of various tables and variables. These metatables will also include configuration information for the public website, such as short and long names for variables, numeric format, colours for mapping, etc. Several standard Census variables (ref: Office of National Statistics) will also be used.
(Adapted from: Jenkins, Keith (2012) “Sociology / Demographics – Cornell University,” Data Curation Profiles Directory: Vol. 4, Article 6. http://dx.doi.org/10.5703/1288284315013)
Data Documentation
What documentation will you be providing to facilitate a better understanding of the project data?
Additional Information:
SAMPLE 1:
I would be providing the following accompanying documentation to facilitate a better understanding of the project data.
Others: I will also be writing a journal article to share the research data management aspect of my project. The paper would be made available on DR-NTU later.
Data Storing
Where and how are you storing the data during the project?
Additional Information:
SAMPLE 1:
I will be using a networked storage drive XXX, which provides storage for active data for all research staff and students. It is fully backed up, secure and resilient, and has multi-site storage. It is accessible via VPN (Virtual Private Network) from outside the University. I will also be using external storage devices, such as encrypted portable hard disks, as additional back-up. Researcher ABC will coordinate and be in overall charge of data storage.
SAMPLE 2:
The data will be stored locally on a secure password-protected data server. One set of hard drives and one set of tapes will be stored in XXX building. A second set of hard drives and a second set of tapes will be stored in XXX building. All data will be backed up on a daily basis by XXX (researcher).
SAMPLE 3:
The data (on staff computers and the web server) will be managed according to the standard practices of the college’s IT department and will be password protected. Any restricted, non-public data will be stored on CRADC (Cornell Restricted Access Data Center). All files will be backed up every day by xxx (project team member).
Backup & Versioning Control
What backup and versioning control procedures will you be undertaking?
Additional Information:
SAMPLE 1:
A complete copy of materials will be generated and stored independently on primary and backup sources for both the PI and Co-PI (as data are generated) and with all members of the Expert Panel every 6 months. The project team will be adopting the Version Control guidelines provided by the National Institute of Dental and Craniofacial Research to organise the data and ensure that different versions are identifiable and properly controlled and used.
SAMPLE 2:
We will adopt the version control standards recommended by the University of Leicester to track the changes that the research team makes to the interview transcripts and coding files.
SAMPLE 3:
We will be using Mercurial, a free, distributed source control management tool, to manage the data so that versions are easily identifiable and properly controlled and used.
SAMPLE 4:
All data will be backed up manually on a monthly basis by researcher xxx on a computer hard drive kept at the research team office. The computer will be password protected, and only team members will be given the password and the right to access the computer. Incremental back-ups will be performed nightly and full back-ups will be performed monthly. Staff xxxx will be keeping versions by appending the date of the update to the file name. Versions of the file that have been revised due to errors/updates will be retained in an archive system. A revision history document will describe the revisions made.
(Adapted from: NSF General: Mauna Loa example. Retrieved from Data Management Planning website: https://www.dataone.org/sites/all/documents/DMP_MaunaLoa_Formatted.pdf)
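The versioning practice in Sample 4 above (appending the date of the update to the file name, retaining revised versions in an archive, and keeping a revision history document) can be sketched as a small helper. The function name `archive_version`, the archive folder layout and the `REVISION_HISTORY.txt` file name are assumptions for illustration.

```python
import shutil
from datetime import date
from pathlib import Path


def archive_version(source: Path, archive_dir: Path, note: str, day: date) -> Path:
    """Copy `source` into `archive_dir` with the date appended and log the revision."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / f"{source.stem}_{day.strftime('%Y%m%d')}{source.suffix}"
    if target.exists():
        # Never overwrite an archived version silently.
        raise FileExistsError(f"{target} already exists; refusing to overwrite")
    shutil.copy2(source, target)  # copy2 also preserves file timestamps
    history = archive_dir / "REVISION_HISTORY.txt"
    with history.open("a", encoding="utf-8") as log:
        log.write(f"{day.isoformat()}\t{target.name}\t{note}\n")
    return target
```

Calling the helper each time a file is revised leaves the working copy in place and builds up a dated archive alongside a tab-separated revision log.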
Long-term Storage & Preservation
a. NTU Research Data Policy requires you to retain your research data for a minimum of 10 years. Where will you be depositing the data after the completion of your research project? (You may choose more than one)
b. Is there any data that will not be deposited in any data repository (ies) mentioned in question 10a?
Guide for 10a and b:
An open access data repository must be actively managed in order to:
(Source: Callaghan, S., Tedds, J., Kunze, J., et al. (2014). Guidelines on recommending data repositories as partners in publishing research data. International Journal of Digital Curation, 9(1), 152-163. doi:10.2218/ijdc.v9i1.309)
Additional Information for 10a and b:
You are expected to comply with University policies and guidelines, namely the Appropriate Use of Information Resources Policy, IT Usage Policy and Social Media Policy. Users will be personally liable for any infringement of Copyright and Licensing laws. Unless otherwise stated, all guide content is licensed under CC BY-NC 4.0.