Research Data Management

Data Management Plan Template v3

The online NTU DMP template resides in Research Information SystEm (RISE) and comes with 9 questions.

Update your DMP in RISE whenever there are significant changes (e.g. changes in data ownership, sensitivity, access, and storage locations) and upon project closure. See FAQ.

How to update DMP in RISE: See FAQ for instructions and recordings.

NTU Community may use the latest offline DMP template v3 for training and discussion purposes. Sign in (at upper right corner) to download MS Word file.

To attend workshops on DMP writing, please sign up here.

 


For earlier versions of the NTU DMP template, please see this link.

For NIE Community, please refer to this link.

Below are the guide and sample (if applicable) for each question.

DMP question 1

Types and Size of Data

a. What data will you be collecting or reusing?

b. What is the estimated size of the project data? (choose one)
[Consider implications of data volumes: do you have sufficient storage? Will the scale of the data pose challenges when sharing or transferring data between sites?]
   ○ ≤ 1GB
   ○ > 1GB ≤ 50GB
   ○ > 50GB ≤ 100GB
   ○ > 100GB ≤ 500GB
   ○ > 500GB ≤ 1TB
   ○ > 1TB ≤ 10TB
   ○ > 10TB ≤ 50TB
   ○ > 50TB

Guide for a.:

  • Describe type of data e.g. quantitative, qualitative, survey data, experimental measurements, models, images, audio-visual data, samples etc.
  • Describe format of data e.g. text, numeric, audio-visual, models, computer code, discipline-specific, instrument-specific.
  • Indicate which data are of long-term value and should be shared and/or preserved.
  • Are there any secondary data you are reusing? This could include data from earlier projects or third-party sources. Provide the title, author, date, URLs/name of these sources. Do you need to pay to reuse secondary data? If purchasing or reusing secondary data sources, explain how issues such as copyright and IPR have been addressed.
  • Consider how your data could complement and integrate with secondary data.

Additional Information:

  • You may refer to Re3data for a list of data repositories where you might find existing relevant third-party research data.

Guide for b.:

  • Do you have sufficient storage or should you include costs for more?
  • Will the scale of the data pose challenges when sharing or transferring data between sites?
  • Have you consulted with a data repository to determine preservation costs?
  • Consider the implications of data volumes in terms of storage, backup and access.
  • Consider how the data volume will grow to make sure any additional storage and technical support required can be met.
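The size question above can be answered with a rough projection. The following is a minimal sketch, assuming decimal units (1 GB = 10**9 bytes), which the template does not specify. The function names `size_band` and `projected_bytes` are hypothetical and used only for illustration.

```python
# Upper bounds (in bytes) for the size options in question 1b.
# Assumption: decimal units (1 GB = 10**9 bytes); the template does not say.
GB = 10**9
TB = 10**12

SIZE_BANDS = [
    (1 * GB, "≤ 1GB"),
    (50 * GB, "> 1GB ≤ 50GB"),
    (100 * GB, "> 50GB ≤ 100GB"),
    (500 * GB, "> 100GB ≤ 500GB"),
    (1 * TB, "> 500GB ≤ 1TB"),
    (10 * TB, "> 1TB ≤ 10TB"),
    (50 * TB, "> 10TB ≤ 50TB"),
]


def size_band(total_bytes):
    """Return the question 1b option that covers total_bytes."""
    if total_bytes < 0:
        raise ValueError("size cannot be negative")
    for upper_bound, label in SIZE_BANDS:
        if total_bytes <= upper_bound:
            return label
    return "> 50TB"


def projected_bytes(files_per_month, bytes_per_file, months):
    """Rough projection: number of files times average file size."""
    return files_per_month * bytes_per_file * months


# Hypothetical project: 2,000 images a month, about 25 MB each, for 36 months.
estimate = projected_bytes(2000, 25 * 10**6, 36)
print(size_band(estimate))
```

A projection like this only covers raw files. Backups, processed copies and versioned files would add to the total, so it may be safer to choose the next band up when the estimate sits near a boundary.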

SAMPLE 1:

Class observation data, faculty interview data and student survey data will be collected. The data will be collected during the research period (Jan 2013 – Dec 2013). Most of the data will be in text format (notes, paper survey).

(Adapted from: Cmor, D., & Marshall, V. (2006). Librarian Class Attendance: Methods, Outcomes and Opportunities. 27th Annual IATUL Conference.)

SAMPLE 2:

Experimental and observational data in physical paper format will be collected. These are data related to production and decomposition, ecophysiological functional traits, soil extractable nutrients and mineralization rates.
As these original data in physical paper format will be used to identify outliers and possible transcription errors, the physical paper copies will be kept for at least 10 years.

(Adapted from: Cleland, E., Lipson, D., & Kim, J. The influence of plant functional types on ecosystem responses to altered rainfall. Retrieved Apr 1, 2020, from UC San Diego Sample NSF Data Management Plans website: https://library.ucsd.edu/research-and-collections/research-data/_files/dmpsample/DMP-Example-Cleland.pdf)

SAMPLE 3:

Experimental lab data will be collected using a microscope. The data generated will be time- and location-stamped image files of natural resources in Delaware County, PA. The images will serve as a record of the occurrence of creatures, natural artefacts, and conditions at specific places and times during the period 2003 through 2011.
For many of the photos, taxonomic information and metadata will also be available. The occurrence data will be observational and qualitative. Metadata files shall be retained to facilitate reuse.

(Adapted from: Hampton, S. Examples of Data Management Plans. Retrieved Apr 1, 2020, from DataOne website: https://www.dataone.org/sites/all/documents/ESA11_SS3_hampton.pdf)

SAMPLE 4:

Recorded oral interviews from 30 residents will be collected at the Nnindye community located in the Mpigi district in Uganda over a period of 6 months in the form of photos and videos.

(Adapted from: Sapp Nelson, Megan and Beavis, Katherine (2013) “History / Sustainable Development – Purdue University,” Data Curation Profiles Directory: Vol. 5, Article 1. http://dx.doi.org/10.7771/2326-6651.1032)

SAMPLE 5:

Primarily public data from 2000 to 2015 will be acquired from the US Census Bureau. Some preliminary (non-public) Census data, and data from other sources, e.g. the US Bureau of Labor Statistics and the New York State Dept of Health, will also be purchased and gathered.

(Adapted from:  Jenkins, Keith (2012) “Sociology / Demographics – Cornell University,” Data Curation Profiles Directory: Vol. 4, Article 6. http://dx.doi.org/10.5703/1288284315013)

SAMPLE 6:

Primary data in the form of audio files in the Cheyenne and English languages will be collected. Text files will be generated once the audio files are transcribed.

(Adapted from: Tancheva, Kornelia (2012) “Linguistics – Cornell University,” Data Curation Profiles Directory: Vol. 4, Article 7. http://dx.doi.org/10.5703/1288284315007)

SAMPLE 7:

Sensor data, images and possibly third-party data (weather and road conditions) will be collected. Data will be saved as Excel spreadsheets and in an SQL database.

(Adapted from:  Carlson, Jake R. (2009) “Traffic Flow – Purdue University,” Data Curation Profiles Directory: Vol. 1, Article 4. http://dx.doi.org/10.5703/1288284315016)

SAMPLE 8:

Experimental data will be generated from pressure sensors using LabVIEW and from chromatographs. They include a variety of files, including text and video files, specific to the equipment involved.

(Adapted from:  Kashyap, Nabil (2011) “Aerospace Engineering / Chemical Kinetics – University of Michigan,” Data Curation Profiles Directory: Vol. 3, Article 1. http://dx.doi.org/10.5703/1288284314989)

SAMPLE 9:

Field data from surveys and bioassays will be collected using Excel spreadsheets. Raw data of samples from the lab will be collected using a proprietary instrument. Ancillary data include GIS data.

(Adapted from:  Wright, Sarah J. (2012) “Environmental Science / Herbivory – Cornell University,” Data Curation Profiles Directory: Vol. 4, Article 3. http://dx.doi.org/10.5703/1288284315002)

SAMPLE 10:

Quantitative data will be collected using a motion capture system. The processed data types will include Matlab files, MS Excel files, codebook texts, and graphical files.

(Adapted from:  Cragin, Melissa; Kogan, Marina; and Collie, Aaron (2011) “Bio-Mechanics Motion Studies – University of Illinois Urbana-Champaign,” Data Curation Profiles Directory: Vol. 3, Article 6.  http://dx.doi.org/10.5703/1288284314998)

DMP question 2

Collection Methods and Organization of Data

a. How will the data be collected or acquired?

b. How will the data be organized?
[Consider data organization best practices.]

Guide for a.:

  • Describe data collection method, e.g. experimental (generated by lab equipment), computational/simulation (generated from computation models), observational (recordings of specific phenomena at a specific time or location), derived (produced via processing or combining other data), reference (extracted from published and/or curated datasets).
  • State how data is acquired, e.g. purchase data. 
  • For acquired data, keep in mind:
    • Version of data
    • Check copyrights, licenses, restrictions (access, reuse)
  • Describe the methods and standards that you will adopt to ensure quality data. This may include processes such as calibration, repeat samples or measurements, standardised data capture, data entry validation, peer review of data or representation with controlled vocabularies.

Additional Information:

  • Consistent, well-ordered research data will be easier for the research team to find, understand and reuse.
  • See DataOne Best Practices for data quality.

Guide for b.:

  • Describe how the data will be organised in your research project, e.g. naming conventions, version control, folder structures, and any community data standards that will be used.

SAMPLE 1:

Most datasets will be collected 1-3 times per year for a period of 3 years. Temperature, light availability and soil moisture at multiple depths in the experiment will be logged every 15 minutes. These data will be stored on local data loggers and downloaded every two weeks.

Data originally recorded on paper will be transferred into spreadsheets using .csv formats. DGVM simulation runs will be performed on a high performance parallel computing platform, a 96-node Linux cluster, maintained jointly by USFS Pacific Northwest Research Station and Oregon State University. DGVM output will be analysed and displayed with the ESRI ArcGIS software suite. To ensure data quality, data will be checked for outliers in the R statistical program, and any outliers will be checked for transcription errors.

As the data will be generated, processed and analysed by different project team members, I will recommend that team members name each data file using their initials, the date and the version, e.g. LGH_20150801_v1.
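A naming rule of this kind is easier to follow when the name is generated rather than typed. Below is a minimal sketch, assuming the pattern is initials, date as YYYYMMDD, and a version number, as in the example LGH_20150801_v1. The function name `build_file_name` and the optional file extension are assumptions for illustration.

```python
from datetime import date


def build_file_name(initials, created, version, extension=""):
    """Build a name such as LGH_20150801_v1 from initials, date and version."""
    if not initials.isalpha():
        raise ValueError("initials must contain letters only")
    if version < 1:
        raise ValueError("version numbers start at 1")
    stem = f"{initials.upper()}_{created:%Y%m%d}_v{version}"
    # Append an extension only when one is given (with or without the dot).
    return f"{stem}.{extension.lstrip('.')}" if extension else stem


print(build_file_name("LGH", date(2015, 8, 1), 1))
print(build_file_name("lgh", date(2015, 8, 1), 2, ".csv"))
```

Writing the date as YYYYMMDD means that an alphabetical sort of one person's files is also a chronological sort.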

(Adapted from: Cleland, E., Lipson, D., & Kim, J. The influence of plant functional types on ecosystem responses to altered rainfall. Retrieved Apr 1, 2020, from UC San Diego Sample NSF Data Management Plans website: https://library.ucsd.edu/research-and-collections/research-data/_files/dmpsample/DMP-Example-Cleland.pdf)

SAMPLE 2:

Interviews will be recorded using digital recorders. The interview recordings will be transcribed and then translated. Both transcripts and translations will be saved as Microsoft Word documents. There will be two Microsoft Word documents for each interview: one in the original Luganda language and the other translated into English. The English translation will be coded using ethnographic software.

The raw data will all be stored in a folder titled “Raw data_YYYYMMDD”; the processed or analysed data will be kept in different folders by data type, e.g. all audio recordings will be saved in one folder and video recordings will be stored in another. We will be using the following file-naming convention for each data file and folder:

      • data file name: Subject_v1 (e.g. interview_v1)
      • folder name: datatype_v1_YYYYMMDD (e.g. audiorecordings_v1_20151120)

(Adapted from: Sapp Nelson, Megan and Beavis, Katherine (2013) “History / Sustainable Development – Purdue University,” Data Curation Profiles Directory: Vol. 5, Article 1. http://dx.doi.org/10.7771/2326-6651.1032)

SAMPLE 3:

New data will be appended to existing time series in the MS SQL database. The data will be aggregated to state economic regions to generate regional reports. Estimates and projections will be calculated and reported. A website will be provided for users to view charts, maps, and tables that are dynamically created via an automated process that pulls data directly from the MS SQL database.

JISC has provided a guide on choosing a file name. We will name our data files based on the recommendations available on this website. All data files will be stored in different folders organised by researchers’ initials and date.

(Adapted from: Jenkins, Keith (2012) “Sociology / Demographics – Cornell University,” Data Curation Profiles Directory: Vol. 4, Article 6. http://dx.doi.org/10.5703/1288284315013)

SAMPLE 4:

The raw audio files will be normalized and cleaned up, then transcribed using transcription software, ideally ELAN. The audio and the transcription will be synchronized. New audio recordings will be added each year throughout the project timeline (2015 – 2020).

The data will be organised and stored in different folders with the following file-naming convention: Subjectkeyword_V2_YYYYMMDD.

(Adapted from: Tancheva, Kornelia (2012) “Linguistics – Cornell University,” Data Curation Profiles Directory: Vol. 4, Article 7.
http://dx.doi.org/10.5703/1288284315007)

SAMPLE 5:

The experiment will capture videos of the 200ms-long process and physical samples of the mixture at different stages of the process. Samples will be separated by chromatography machines.

The data will be analysed, which involves generating proprietary files for processing software and convenient printable formats for manually examining the data, for example Excel spreadsheets or PDF files. The pressure trace graphs and chromatographs will be the focus of analysis. Chromatograms will be interpreted using Clarity software. Some graphs, such as Arrhenius plots and concentration plots, will be generated using Origin software. The video from the experiment will be used primarily to verify that the experiment ran correctly. Video stills will be generated from the video files and merged with some graphs using Photoshop.

Data cleansing (e.g. removing outliers, missing data interpolation) will be performed to improve the data quality. Data quality will also be ensured by repeated samples.

We will store all the data in a shared drive and will name each file by the following file-naming convention:

      • 20140603_MAEProject_DesignDocument_Tan_v2-01.docx
      • 20140809_MAEProject_MasterData_Daniel_v1-00.xlsx
      • 20140825_MAEProject_Ex1Test1_Data_Jason_v3-03.xlsx
      • 20141023_MAEProject_ProjectMeetingNotes_Kumar_v1-00.docx
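A convention like this can be checked automatically before files are moved to the shared drive. The sketch below assumes the four examples follow the structure date, project, description, owner and version (`vM-NN`); the sample does not spell these fields out, so the field names and the function `parse_file_name` are assumptions inferred from the examples.

```python
import re
from datetime import datetime

# Assumed structure: YYYYMMDD_Project_Description_Owner_vM-NN.ext
# The description may itself contain underscores (e.g. Ex1Test1_Data).
PATTERN = re.compile(
    r"^(?P<date>\d{8})_"
    r"(?P<project>[A-Za-z0-9]+)_"
    r"(?P<description>[A-Za-z0-9]+(?:_[A-Za-z0-9]+)*)_"
    r"(?P<owner>[A-Za-z]+)_"
    r"v(?P<major>\d+)-(?P<minor>\d{2})"
    r"\.(?P<extension>[A-Za-z0-9]+)$"
)


def parse_file_name(name):
    """Return the name's fields as a dict, or None if it breaks the convention."""
    match = PATTERN.match(name)
    if match is None:
        return None
    try:
        # Reject impossible calendar dates such as 20140230.
        datetime.strptime(match["date"], "%Y%m%d")
    except ValueError:
        return None
    return match.groupdict()


for name in [
    "20140825_MAEProject_Ex1Test1_Data_Jason_v3-03.xlsx",
    "meeting notes final.docx",
]:
    print(name, "->", "ok" if parse_file_name(name) else "does not match")
```

Returning the parsed fields rather than a plain yes/no makes it possible to reuse the same function for sorting or grouping files by project, owner or version.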

(Adapted from:

SAMPLE 6:

Data will be generated by subjecting plant samples to analysis using coupled Gas Chromatography-Mass Spectrometry (GC-MS). The data will then be analysed using the instrument-specific proprietary software to measure the area underneath the peaks for specific known Volatile Organic Compounds (VOCs). The peak area data will be entered into an Excel spreadsheet along with the field survey data. Statistical analysis of the data will be performed using StatView to prepare the tables and graphs for the research.

All data columns that refer to Master Data will be checked for consistency to ensure quality. Analytical data quality will be tested using appropriate tests.

We have not decided on how the data files will be organised yet. However, we will follow the file naming conventions recommended by the Stanford University Libraries to name our data files.

(Adapted from: Wright, Sarah J. (2012) “Environmental Science / Herbivory – Cornell University,” Data Curation Profiles Directory: Vol. 4, Article 3. http://dx.doi.org/10.5703/1288284315002)

SAMPLE 7:

Motion capture markers of the system will be attached to various parts of the body, usually the joints. The data will be moved to Excel for automated filtering to remove errors and noise that occur because the system is sensitive to light (e.g. reflections) and to motion marker occlusion. Further automatic threshold-based filtering will be carried out along with visual review of the data and manual cleaning. This process will take place in Matlab and the data will eventually be converted to represent several variables (e.g. angle data, displacement velocity, or acceleration of joint segments). The data will then be aggregated across subjects and will be stored in an Excel spreadsheet.

The precise placement of markers is very important for the quality of the data and its reliability. About 40 markers on each subject will be used.

The data will be organised through a file folder system where each trial will be documented in a single spreadsheet, and all the files from a particular study will be stored in the same folder structure.

(Adapted from: Cragin, Melissa; Kogan, Marina; and Collie, Aaron (2011) “Bio-Mechanics Motion Studies – University of Illinois Urbana-Champaign,” Data Curation Profiles Directory: Vol. 3, Article 6. http://dx.doi.org/10.5703/1288284314998)

SAMPLE 8:

Traffic flow data will be collected using sensors and video cameras. The road sensors placed in each lane of traffic will record the status of the intersection (whether the light is red, yellow, or green). Data from the sensors will be transferred via FTP on an hourly basis as compressed files. Data will be processed, normalized and reformatted from the vendor’s proprietary format into .csv and then into Microsoft Excel. Video of the traffic sites will be taken for data verification purposes and to ensure quality. The video gathered will be parsed out into .gif or .jpg images at the rate of 20 frames per second.

The data files will be primarily organized by date.

(Adapted from: Carlson, Jake R. (2009) “Traffic Flow – Purdue University,” Data Curation Profiles Directory: Vol. 1, Article 4. http://dx.doi.org/10.5703/1288284315016)

DMP question 3

File Formats and Software/Tools

a. Check the relevant file format(s) that you will be using (you may choose more than one).
[Non-proprietary and open file formats are recommended for long-term access and reuse purposes.] For detailed checklist, please refer to offline DMP template or online DMP form in RISE. 

b. What software(s) and/or tool(s) is/are needed to process/read the file(s)?
[Consider availability of software during and after project duration, including backup and additional data formats for at least ten-years or longer-term access and reuse.]

c. Where can this/these software(s) and/or tool(s) be obtained?

 

  • File formats affect one’s ability to use and re-use data in the future.
  • Strive to use a data format that is easy to read and easy to manipulate in a variety of commonly-used operating systems and programs.
  • Non-proprietary (‘open’) formats are also recommended to enhance accessibility.
  • For specialized data formats, provide information on the name, supplier information (if applicable) and version number to obtain the software(s) and/or tools to read your data.

DMP question 4

Management of Proprietary Secondary Data

[Proprietary secondary data refers to data from external sources, e.g. databases or collaborators. These typically come with terms of use which will affect how you store the data, who can see/use it and how long it is to be kept.]

a. Do you use proprietary secondary data? (choose one)

    ○ My project does not involve the use of proprietary secondary data.

    ○ My project involves the use of proprietary secondary data.

If ‘My project involves the use of proprietary secondary data.’ is selected, answer b - d:

b. Indicate:

         i. Sources of data __________

        ii. Terms of use ____________

c. Who will have access to this data? Indicate the team members (students, staff, collaborators), who will have the access to the proprietary data. Include names when available. 

d. I have informed the above mentioned people that they will be handling proprietary data and the relevant terms and conditions, using the NTU undertaking form ‘Undertaking to safeguard confidential research information and data’ in ServiceNow:

     ○ Yes

     ○  Not yet
         Remarks: _____________________

  • Secondary data is the data that has been collected by others, for another purpose, but has some relevance to your research needs.
  • Some secondary data are free for use (e.g. government statistics, census data), and there are authors/owners who place their data in public domain for others to freely use (e.g. CC-0, no-rights-reserved).
  • Proprietary secondary data that you purchased/subscribed to, or obtained from collaborators, may come with restrictions. Do check their terms-of-use.
  • Proprietary secondary data's terms-of-use may limit how you can use, who can view/use, how you store/transfer, how long you can use, and how you can share the data.
  • The best practice is to identify any proprietary data, and their terms of use, in the DMP prior to data acquisition or collection, to justify the need to withhold them from public access if necessary.
  • NTU has an ‘Undertaking to safeguard confidential research information and data’ form, accessible through ServiceNow. PIs can use this form to ensure that research team members are aware of the nature of a particular research project, and for team members to safeguard such proprietary secondary data in accordance with the terms-of-use and/or contractual obligations.

DMP question 5

Management of Sensitive Data

(See video recording of past workshop on DMP Q5)

a. Please check the relevant response:

    ○   My project does not involve the use of sensitive data.

    ○   My project involves the use of sensitive data.

    If ‘My project involves the use of sensitive data.’ is selected, answer b - f:

b. Please select those that apply:

    ☐  My project involves the use of sensitive data as it contains human subject identifiable data.
    ☐  My project involves the use of sensitive data as it is of commercial-in-confidence nature.
    ☐  My project involves the use of sensitive data as it is national security related.
    ☐  My project involves the handling of data that has patent/commercialization potential.
    ☐  My project involves the use of other types of sensitive data; please specify_______________________

c. State the relevant NTU Research Data Classification level(s) for your research data. You may choose more than one, if applicable.

    ☐ Level 1: Low or no sensitivity
    ☐ Level 2: Moderate
    ☐ Level 3: Moderate high
    ☐ Level 4: High

d. Describe contractual/legal obligations including those towards consent agreements and implications on how the data are to be managed/used/shared.

e. Who will have access to this data? Indicate the team members (students, staff, collaborators) who will have access to the sensitive data. Include names when available.

f. I have informed the above mentioned people that they will be handling sensitive data and the relevant terms and conditions, using the NTU undertaking form ‘Undertaking to safeguard confidential research information and data’ in ServiceNow:

    ○ Yes

    ○ Not yet
        Remarks: ______________________

Guide for a, b:

  • Sensitive data refers to data that needs to be protected from unauthorised access or unwarranted disclosure. It is generally considered to be:
    • Identifiable data: Data that can be used to identify an individual, endangered species, object or location. In Singapore, an individual’s identifiable data is protected under the Personal Data Protection Act 2012.
    • Proprietary data: Data that is internally generated and gives competitive advantage to its owner. Proprietary data may be protected under copyright, patent, or trade secret laws.
    • Restricted or confidential data with contractual (e.g. Research Collaboration Agreements, Project Agreements, Material Transfer Agreements, Non-Disclosure Agreements) or legal obligations (e.g. Official Secrets Act). Click here to access NTU research agreement templates.

Guide for d:

  • If your data is sensitive,
    • Ensure handling of sensitive data conforms to legal, regulatory, policy requirements
    • Determine risks and consequences of handling data (e.g. collecting, using, disclosing)
    • Describe appropriate security measures that you will be taking to prevent unauthorized access or security breaches
    • Describe how you will protect the identity of research subjects, e.g., via anonymisation or using managed access procedures
    • For research involving human subjects, state any data sharing agreement, where applicable

Guide for e:

  • For sensitive data, list the personnel (by roles and names, where applicable) who will have access to the data.

Guide for f:

  • For sensitive data, indicate if you have informed the above mentioned people that they will be handling sensitive data, and the contractual or legal obligations (including those towards consent agreements) and implications on how the data are to be managed. The PI is to use the form ‘Undertaking to safeguard confidential research information and data’ in ServiceNow to trigger an undertaking statement that will be sent to relevant research team members for their acknowledgement.

SAMPLE 1:

I have sensitive data as it will contain human subject identifiable data.

The research will include data from subjects being screened for STDs. The final dataset will include self-reported demographic and behavioural data from interviews and laboratory data from urine specimens. Because the STDs being studied are reportable diseases, we will be collecting identifying information. Even though the final dataset will be stripped of identifiers, there remains the possibility of deductive disclosure of subjects with unusual characteristics. Thus, we will make the data and documentation available only under a data-sharing agreement that provides for: (1) a commitment to using the data only for research purposes and not to identify any individual participant; (2) a commitment to securing the data using appropriate technology; and (3) a commitment to destroying or returning the data after analyses are completed.

(Adapted from: NIH Data Sharing Policy and Implementation Guidance. (9 February 2012), from http://grants.nih.gov/grants/policy/data_sharing/data_sharing_guidance.htm#ex)

SAMPLE 2:

I have sensitive data as it will contain human subject identifiable data.

Access to research records will be limited to primary research team members. Recorded data will have any identifying information removed and will be relabelled with study code numbers. A database which relates study code numbers to consent forms and identifying information will be stored separately on password-protected computers in a secured, locked office. To maintain the privacy of the participants, any report of individual data will only consist of performance measures without any demographic or identifying information.

(Adapted from: Collaborative Research in Computational Neuroscience (CRCNS): Innovative Approaches to Science and Engineering Research on Brain Function. Retrieved Nov 24, 2015, from UC San Diego Sample NSF Data Management Plans website: https://library.ucsd.edu/research-and-collections/research-data/_files/dmpsample/DMP-Example-Psych.doc)
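The relabelling step described in Sample 2 can be sketched as follows. This is only an illustration under stated assumptions: each record belongs to one participant, study codes are random six-digit numbers with an "S" prefix, and the field names are made up for the example. The function names `new_study_code` and `pseudonymise` are hypothetical.

```python
import secrets


def new_study_code(existing):
    """Draw a random study code that is not already in use."""
    while True:
        code = f"S{secrets.randbelow(10**6):06d}"
        if code not in existing:
            return code


def pseudonymise(records, identifying_fields):
    """Split records into coded rows and a separate linkage table.

    Assumption: one record per participant. The linkage table maps each
    study code back to the identifying information and is meant to be
    stored separately from the coded rows, with restricted access.
    """
    linkage = {}
    coded_records = []
    for record in records:
        code = new_study_code(linkage)
        linkage[code] = {f: record[f] for f in identifying_fields if f in record}
        coded = {k: v for k, v in record.items() if k not in identifying_fields}
        coded["study_code"] = code
        coded_records.append(coded)
    return coded_records, linkage


# Hypothetical input records.
records = [
    {"name": "Participant A", "email": "a@example.org", "reaction_ms": 412},
    {"name": "Participant B", "email": "b@example.org", "reaction_ms": 388},
]
coded, linkage = pseudonymise(records, {"name", "email"})
print(coded)
```

Removing direct identifiers in this way does not by itself rule out deductive disclosure from the remaining fields, which is the concern raised in Sample 1.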

DMP question 6

Access and Usage Restrictions

(See video recording of past workshop on DMP Q6)

a. Who owns the data? Will there be restrictions on accessing and sharing your final research data? (choose one)

○  NTU owns all research data produced by this project. There will be no restriction to share final research data.

  1. Please select one:

○  The sharing of NTU owned research data (where possible) shall be based on Creative Commons license CC:BY:NC, where others may reuse the data for non-commercial applications only and must correctly attribute the data source in NTU.

○  The sharing of NTU owned research data shall be based on other terms due to obligations or other considerations. Please specify other terms/data sharing license and reasons for not using CC-BY-NC: _________________________

○  I will not be sharing data. Reasons: ___________________________

○  This project involves both NTU owned as well as external party owned data. NTU owned data will be shared where possible.

  1. Please select one:

○  The sharing of NTU owned research data (where possible) shall be based on Creative Commons license CC:BY:NC, where others may reuse the data for non-commercial applications only and must correctly attribute the data source in NTU.

○  The sharing of NTU owned research data shall be based on other terms due to obligations or other considerations. Please specify other terms/data sharing license and reasons for not using CC-BY-NC: _________________________

○  I will not be sharing data. Reasons: ___________________________

○  All research data produced for this project are owned by external party(ies) due to agreement with external parties on copyright, intellectual property, non-disclosure or proprietary use. 

  1. Please select one:

○  No data sharing due to obligations to agreement.

○  Conditional data sharing within terms of agreement.

○  I will share my data after obtaining my patent. 

According to the NTU Research Data Policy:

5.1.1 OWNERSHIP

5.1.1.1 The University owns all research data produced by research projects conducted at or under the auspices of NTU regardless of funding source, unless specific terms of sponsorship, other agreements or university policy supersede these rights.

5.1.1.2 In joint projects with other external parties, there shall be clear agreement for NTU to jointly hold all rights and ownership of research data arising from the project.

5.1.1.3 The University assigns automatic rights to the PI and his/her designated researchers to use and publish all research data arising from their project for non-commercial purposes only.

 

5.1.6 DATA SHARING

5.1.6.1 The final research data from projects carried out at NTU shall be made available for sharing (via the NTU Data Repository) unless there are prior formal agreements with external collaborators and parties on non-disclosure or proprietary use of the data.

5.1.6.2 The sharing and use of research data shall be based on Creative Commons license CC:BY:NC, where others may use data for non-commercial applications only and must correctly attribute the data source in NTU unless specified otherwise in the DMP.

Note: If you need to share data with external parties (e.g. collaborators or service vendors), you are advised to have a Research Contract Agreement (RCA) or Non-Disclosure Agreement (NDA) in place. Click here to access NTU research agreement templates.

 

Additional Information:

  • A license is a document that clearly sets out how the data can be used and attributed to the original data owner.
  • Without a license, it is unclear how your data can be reused and this may discourage the potential re-user.
  • The final research data is the final version of data that exists during the last stage in the data lifecycle in which all re-workings and manipulations of the data by the researcher have ceased.
  • The final research data also refers to the recorded factual materials commonly accepted by the scientific community as necessary to document, support, and validate research findings. Final research data does not include laboratory notebooks, partial datasets, preliminary analyses, drafts of scientific papers, plans for future research, peer review reports, communications with colleagues, or physical objects, such as gels or laboratory specimens.
  • Visit 'Data Rightsholder's (Creator's) Flowchart' by Australian Research Data Commons.
  • Visit the ‘How to License Research Data’ in the Digital Curation Centre website to learn more about the why and how of research data licensing. AusGOAL (Australian Governments Open Access and Licensing Framework) also offers a good research data FAQ.
  • Types of Creative Commons licenses.
  • See EUDAT’s data and software licensing wizard.

DMP question 7

Data Documentation and Metadata

(See video recording of past workshop on DMP Q7)

a. What data documentation will you be providing? (you may choose more than one)

[Data documentation helps secondary users understand and reuse your research data.]

☐ Codebook

☐ Data dictionary

☐ Manual/protocol

☐ Field notes

☐ Lay summary

☐ Readme.txt

☐ Webpage

☐ Others;

Details: __________________  

b. What metadata will you be providing? (you may choose more than one)

[Metadata refers to information that describes or contextualises the data, allowing those outside your institution, discipline, or software environment to interpret your data.]

☐ NTU/NIE data repository metadata standards (i.e. using the repository's metadata forms)

☐ Other metadata standards;

Please specify: _______________________

☐ No metadata standards will be used;

Following types of information for describing research data will be provided: ___________________________

The difference between data documentation and metadata is that the first is meant to be read by humans and the second implies computer-processing (though metadata may also be human-readable).

Data Documentation

codebooks: describe the contents, structure, and layout of a data collection.

  • A well-documented codebook "contains information intended to be complete and self-explanatory for each variable in a data file." (Source: ICPSR)
  • Can contain information such as variable name, variable label, question text, value labels, summary statistics, missing values, etc.;
  • Useful for processing heterogeneous data into a consistent, computable dataset.

manuals/protocols: detailed plans describing the conduct and operation of a study.

  • Can include information on project title, summary, description (rationale, objectives, methodology, data management and analysis), ethical considerations, references as well as roles and responsibilities of research team members. (Source: University of Michigan Library)
  • Click here for a more detailed breakdown of elements in a research protocol from Office of Human Research Administration at Harvard University.

data dictionaries: an inventory of data elements in a database or data model.

  • Can contain information on variable name, variable definition, data unit, data format or type, min and max values, method of measurement, precision of measurement;
  • Suitable for spreadsheets and datasets containing many variables;
  • May be part of README.txt file.
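
As a rough illustration of the data dictionary elements listed above, the sketch below derives a starting inventory (variable name, inferred type, min and max values, missing count) from a small CSV sample. The column names, sample values, and the `build_data_dictionary` helper are hypothetical. Definitions, units, and measurement methods still need to be written by the researcher.

```python
import csv
import io

# Hypothetical sample data; a real project would read its own CSV file.
SAMPLE_CSV = """participant_id,age,height_cm
P001,34,172.5
P002,29,
P003,41,165.0
"""


def _to_number(value):
    """Return value as a float, or None if it is not numeric."""
    try:
        return float(value)
    except ValueError:
        return None


def build_data_dictionary(csv_text):
    """Summarise each column: inferred type, min/max, missing count."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    dictionary = []
    for name in rows[0].keys():
        values = [row[name] for row in rows]
        present = [v for v in values if v != ""]
        numbers = [_to_number(v) for v in present]
        is_numeric = bool(present) and all(n is not None for n in numbers)
        dictionary.append({
            "variable": name,
            "type": "numeric" if is_numeric else "text",
            "min": min(numbers) if is_numeric else None,
            "max": max(numbers) if is_numeric else None,
            "missing": len(values) - len(present),
            "definition": "",  # to be filled in by the researcher
            "unit": "",        # to be filled in by the researcher
        })
    return dictionary


data_dictionary = build_data_dictionary(SAMPLE_CSV)
for entry in data_dictionary:
    print(entry)
```

The generated entries are only a draft: the empty `definition` and `unit` fields are where most of the value of a data dictionary lies.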

lay summaries: short accounts of research targeted at a general audience, particularly important for research in medicine and health.

  • This idea can be transferable to other disciplines.
  • Can include information on the general purpose and method of the study explained in non-technical language, and a brief description of research conduct procedures.

README.txt: describes data content and general file structure.

  • Highly versatile, open format;
  • Can include information on dataset title, principal investigator, data collection date and geographic location, licenses or restrictions placed on data, and methods for data collection and data processing. For each filename, it can also include a short description of what data the file contains. (Source: Graduate Institute of Geneva Library)
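
To make the README elements above concrete, here is a minimal sketch that assembles a plain-text README from a set of fields. Every value is a placeholder, and the file names, section labels, and `render_readme` helper are assumptions for illustration rather than a prescribed NTU layout.

```python
# All values below are placeholders, not real project details.
README_FIELDS = {
    "Dataset title": "<title of dataset>",
    "Principal investigator": "<name, affiliation, contact>",
    "Data collection date": "<YYYY-MM-DD to YYYY-MM-DD>",
    "Geographic location": "<where the data were collected>",
    "License or restrictions": "<e.g. CC BY 4.0>",
    "Collection methods": "<instruments, surveys, protocols>",
    "Processing methods": "<cleaning and transformation steps>",
}

# Hypothetical file names with a one-line description each.
FILES = {
    "survey_responses.csv": "<what the file contains>",
    "codebook.pdf": "<what the file contains>",
}


def render_readme(fields, files):
    """Build README text: general fields first, then a file listing."""
    lines = ["GENERAL INFORMATION", ""]
    lines += [f"{label}: {value}" for label, value in fields.items()]
    lines += ["", "FILE OVERVIEW", ""]
    lines += [f"{name}: {description}" for name, description in files.items()]
    return "\n".join(lines) + "\n"


readme_text = render_readme(README_FIELDS, FILES)
print(readme_text)
# To save it next to the data:
# open("README.txt", "w", encoding="utf-8").write(readme_text)
```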

 

Metadata

You are strongly encouraged to use community standards to describe and structure data, where these are in place.

  1. The Digital Curation Centre (DCC) offers a catalogue of disciplinary metadata standards.
  2. FAIRsharing.org provides a repository of disciplinary and data management metadata standards across the globe.

If you are using a specific metadata scheme or standard, please state what it is and provide the references.

If you are not using a specific metadata scheme or standard, describe the type of metadata (e.g. descriptive, structural, administrative, etc.) you will be providing, if any.

In addition, when you deposit your dataset in the NTU data repository, i.e. DR-NTU (Data), or the NIE data repository, you follow the set of metadata fields predefined in that repository.

 

Examples of datasets with good metadata:

NTU data repository: https://doi.org/10.21979/N9/DUMOTD

NIE data repository: https://doi.org/10.25340/R4/5ROR7L

 

Additional Information:

  • Visit ‘Document your data’ by UK Data Service for more guidance on the different types of documentation.
  • Visit ‘How to write a lay summary’ by Digital Curation Centre for more guidance on why lay summaries are important, how to write one, and examples.
  • Three broad categories of metadata are:
    • Descriptive – common fields such as title, author, abstract, keywords which help users to discover online sources through searching and browsing.
    • Administrative – preservation, rights management, and technical metadata about formats.
    • Structural – how different components of a set of associated data relate to one another, such as a schema describing relations between tables in a database, variable list, directory and file listing and taxonomy.
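
The three categories above can be pictured as one record with three groups of fields. The sketch below is only an illustration: the field names and values are hypothetical and do not follow any particular metadata standard or repository form.

```python
import json

# Hypothetical record; field names are illustrative, not from any standard.
metadata = {
    "descriptive": {
        "title": "<dataset title>",
        "author": "<creator name>",
        "abstract": "<short description>",
        "keywords": ["<keyword 1>", "<keyword 2>"],
    },
    "administrative": {
        "rights": "<license or access conditions>",
        "file_format": "text/csv",
        "preservation_note": "<retention period, storage location>",
    },
    "structural": {
        "files": ["data/measurements.csv", "docs/README.txt"],
        "variables": ["sample_id", "temperature_c"],
        "relations": "<how the files or tables relate>",
    },
}

# Serialising to JSON gives a form that is both human- and machine-readable.
metadata_json = json.dumps(metadata, indent=2)
print(metadata_json)
```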

Samples for "Metadata" section:

 

When selecting the "Other metadata standards" option:

SAMPLE 1:

The clinical data collected from this project will be documented using CDASH v1.1 standards. The standard is available at CDISC website.

SAMPLE 2:

Using an electronic lab notebook, we would be generating metadata along with each notebook and postings. The metadata would include Sections, Categories and Keys which would be assigned by collaborators for reuse so as to maintain consistency in the use of terminology. We would also be using the Properties Ontology (ChemAxiomProp) when describing the chemical and materials properties.

SAMPLE 3:

We will be using some core elements from the TEI metadata standards to describe our data. We will also be adding some customised elements in the metadata to provide more details on the rights management.

 

When selecting the "No metadata standards will be used." option:

SAMPLE 1:

I will not be using any metadata or international standard for the data collected and generated for this project. However, I will ensure that each document I create using Microsoft Word, Microsoft Excel and Microsoft PowerPoint has sufficient basic information, such as Author’s name, Title, Subject and Keywords, in the document properties. In addition, a separate readme file will be prepared to describe the details of each dataset. I will apply the recommendations provided by Cornell University in the creation of readme file(s). Key elements could include introductory information about the data, as well as methodological, date-specific and sharing/access related information.

SAMPLE 2:

Metadata about timing and exposure of individual images will be automatically generated by the camera. GPS locations will subsequently be added by post-processing GPS track data based on shared time stamps. Metadata for the image dataset as a whole will be generated by the image management software (iMatch) and will include time ranges, locations, and a taxon list. Those metadata will be translated into Ecological Metadata Language (EML), created using the Morpho software tool, and will include location and taxonomic summaries.

(Adapted from: Hampton, S. Examples of Data Management Plans. Retrieved Nov 24, 2015, from DataOne website: https://www.dataone.org/sites/all/documents/ESA11_SS3_hampton.pdf)

DMP question 8

Data Storage During Project

a. Where and how are you storing the data during the project?

b. What backup and versioning control procedures will you be undertaking?

c. Who will be responsible for parts a and b? Provide names when available.

Guide for a.:

  • List the platforms and devices that will be used to store the data, e.g. electronic lab notebook, SharePoint, WordPress.
  • Consider institutional data security policies.
  • Identify the location where the data would be stored, e.g. school server. Provide the URLs of online locations.

Guide for b.:

  • Describe the backup and archiving regime you will use to back up all your data to prevent its loss, e.g. through hard disk failure, virus infection or theft.
  • Describe the method you will use to ensure that different versions of your data are identifiable and properly controlled and used.
  • How will the data be backed up, i.e. how often, to where, in how many copies, and is the process automated?
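
The backup questions above (where to, how many copies, automated or not) can be sketched as a small script. This is a minimal sketch under stated assumptions, not an institutional procedure: the folder names and the `back_up` helper are hypothetical, and the demonstration runs on throwaway temporary folders. Each run creates a new timestamped copy and verifies it with SHA-256 checksums. How often it runs would be left to the operating system's scheduler.

```python
import hashlib
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file (reads the whole file)."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def back_up(source_dir, backup_root):
    """Copy source_dir into a new timestamped folder and verify each file."""
    source_dir, backup_root = Path(source_dir), Path(backup_root)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = backup_root / f"{source_dir.name}_{stamp}"
    shutil.copytree(source_dir, target)
    for original in source_dir.rglob("*"):
        if original.is_file():
            copy = target / original.relative_to(source_dir)
            if sha256_of(original) != sha256_of(copy):
                raise IOError(f"Checksum mismatch for {original}")
    return target


# Demonstration on throwaway folders; real paths would replace these.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "project_data"
    data_dir.mkdir()
    (data_dir / "results.csv").write_text("id,value\n1,3.2\n")
    backup_dir = back_up(data_dir, Path(tmp) / "backups")
    backed_up_files = sorted(p.name for p in backup_dir.iterdir())
    print(backup_dir.name, backed_up_files)
```

Keeping each run in its own timestamped folder means an older copy is never overwritten, at the cost of more storage. A second copy on a different device or site would still be needed to guard against theft or hardware failure.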

Additional Information:

  • Storing data on laptops, computer hard drives or external storage devices alone is very risky. The use of robust, managed storage with automatic backup, for example, that provided by the university IT team, is preferable.
  • See ‘Backup’ in UK Data Archive for more tips.
  • See MANTRA for more guidance.
  • Learn more about versioning control:
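
One simple way to keep different versions of a file identifiable is a consistent file-naming convention. The sketch below assumes a hypothetical convention of `<name>_v<two-digit number>.<extension>`. The pattern, the example file names, and both helper functions are illustrative assumptions, not a required NTU scheme.

```python
import re

# Assumed convention: <name>_v<number>.<extension>, e.g. transcript_v01.docx
VERSION_PATTERN = re.compile(r"^(?P<stem>.+)_v(?P<num>\d+)(?P<ext>\.[^.]+)$")


def next_version_name(filename):
    """Return the name for the next version (v01 if none exists yet)."""
    match = VERSION_PATTERN.match(filename)
    if match is None:
        # No version marker yet: label the file as the first version.
        stem, dot, ext = filename.rpartition(".")
        return f"{stem}_v01.{ext}" if dot else f"{filename}_v01"
    number = int(match.group("num")) + 1
    return f"{match.group('stem')}_v{number:02d}{match.group('ext')}"


def latest_version(filenames):
    """Pick the highest-numbered file among names that follow the pattern."""
    versioned = []
    for name in filenames:
        match = VERSION_PATTERN.match(name)
        if match is not None:
            versioned.append((int(match.group("num")), name))
    return max(versioned)[1] if versioned else None


print(next_version_name("interview_transcript_v01.docx"))
print(latest_version(["codes_v02.csv", "codes_v10.csv", "codes_v09.csv"]))
```

Zero-padded numbers keep files in the right order when a folder is sorted by name. A revision history document can then record what changed between versions.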

Data Storing

SAMPLE 1:

I will be using a networked storage drive XXX, which is a storage for active data for all research staff and students. It is fully backed-up, secure, resilient, and has multi-site storage. It is accessible via VPN (Virtual Private Network) from outside the University.  

SAMPLE 2:

The data will be stored locally on a secure password-protected data server. One set of hard drives and one set of tapes will be stored in XXX building. A second set of hard drives and a second set of tapes will be stored at another XXX building.

SAMPLE 3:

The data (on staff computers and the web server) will be managed according to the standard practices of the college’s IT department and will be password protected. Any restricted, non-public data will be stored on CRADC (Cornell Restricted Access Data Center). 

Backup & Versioning Control

SAMPLE 1:

A complete copy of materials will be generated and stored independently on primary and backup sources for both the PI and Co-PI (as data are generated) and with all members of the Expert Panel every 6 months. The project team will be adopting the Version Control guidelines provided by the National Institute of Dental and Craniofacial Research to organise the data and ensure that different versions are identifiable and properly controlled and used.

SAMPLE 2:

We will adopt and use the version control standards recommended by University of Leicester for the transcripts of the interviews and coding in terms of changes the research team has made to the files.

SAMPLE 3:

We will be using Mercurial, a free, distributed source control management tool, to manage the data, so that the data would be easily identifiable and properly controlled and used.

SAMPLE 4:

All data will be backed up manually on a monthly basis by researcher xxx on a computer hard drive kept at the research team office. The computer will be password protected and only team members will be given the password and the right to access the computer. Incremental back-ups will be performed nightly and full back-ups will be performed monthly. Versions of the file that have been revised due to errors/updates will be retained in an archive system. A revision history document will describe the revisions made.

(Adapted from: NSF General: Mauna Loa example. Retrieved from Data Management Planning website: https://www.dataone.org/sites/all/documents/DMP_MaunaLoa_Formatted.pdf)

DMP question 9

Data Storage after Project

NTU Research Data Policy requires you to retain your research data for a minimum of 10 years. The data will be retained and/or shared for reuse by others after the completion of your research project in the following location(s): (select all that apply)
[Also refer to any additional requirements your college/school/institute might have. Content uploaded on NTU research data repository or external open access repositories must not infringe upon the copyrights or other intellectual property rights, not violate any laws, not contain software viruses, and must be void of all identifiable information.]

☐  NTU research data repository or NIE research data repository;
     Provide DOI(s) of dataset(s): __________________

☐  External open access data repository;
     Provide DOI(s) or URL(s) of dataset(s): __________________

☐  School/institute/laboratory/project server(s) (e.g. NIE RDA(R), LKCMedicine SDS);
     Provide folder locations or pathnames: __________________

☐  Other locations for digital research data (e.g. RDSS);
     Provide details of other digital locations: __________________

☐  Location(s) for any non-digital research data;
     Provide details of physical locations: __________________

☐  To be determined

An open access data repository must be actively managed in order to:

    1. enable access to the dataset
    2. ensure dataset persistence
    3. ensure dataset stability
    4. enable searching and retrieval of datasets
    5. collect information about repository statistics

(Source: Callaghan, S., Tedds, J., Kunze, J., et al. (2014). Guidelines on recommending data repositories as partners in publishing research data. International Journal of Digital Curation, 9(1), 152-163. doi:10.2218/ijdc.v9i1.309)

  • Use the NTU research data repository, DR-NTU (Data), to store and preserve the final version of your research data for long-term access.
  • Preservation of digital outputs is necessary in order for the research data to endure changes in the technological environment and remain potentially re-usable in the future.
  • If you have proprietary format research data, try to also deposit the same data in a non-proprietary or 'open' format to make it easier for future reuse of your data. 
  • NTU researchers should deposit their final research data in DR-NTU (Data) where possible. If you have some of your research data that are made available elsewhere, deposit the rest of your research data in DR-NTU (Data) and include a link in your DR-NTU (Data) dataset to the externally shared dataset. 
  • When uploading your research data in DR-NTU (Data), please ensure that:
    • you do not infringe upon the copyrights or other intellectual property rights, including, but not limited to patent, trademark, trade secret, copyright, right of publicity or other right of any third party;
    • you have been given all relevant, obligatory, and applicable approvals for posting such materials with the content included and in the format uploaded, including but not limited to approvals from the Institutional Review Board and third parties with whom you have relevant contractual obligations; and
    • uploads are void of all identifiable information, such that re-identification of any subjects from the amalgamation of the information available from all of the materials (across datasets and dataverses) uploaded under any one author and/or you should not be possible. Specifically, uploads cannot contain identity or social security numbers; credit card numbers; medical record numbers; health plan numbers; other account numbers of individuals; or biometric identifiers (fingerprints, retina, voice).
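
Before depositing, a simple pattern-based screen can flag some of the identifier types listed above for manual review. The sketch below is an illustration under stated assumptions, not a de-identification method: the two patterns (an NRIC/FIN-like format and card-like digit runs checked with the Luhn checksum) are assumed for the example, the sample text is made up, and such a screen will miss names, addresses, and most other identifying details.

```python
import re

# Assumed patterns, for illustration only; they catch obvious cases and will
# miss many identifiers (names, addresses, free-text descriptions).
NRIC_LIKE = re.compile(r"\b[STFGM]\d{7}[A-Z]\b")
CARD_LIKE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def passes_luhn(digits):
    """Luhn checksum used by payment card numbers."""
    total = 0
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_possible_identifiers(text):
    """Return (kind, matched_text) pairs that deserve manual review."""
    findings = [("nric_like", m.group()) for m in NRIC_LIKE.finditer(text)]
    for m in CARD_LIKE.finditer(text):
        digits = re.sub(r"\D", "", m.group())
        if passes_luhn(digits):
            findings.append(("card_like", m.group()))
    return findings


# Made-up sample text; 4111 1111 1111 1111 is a widely used test number.
sample = "Participant S1234567D paid with 4111 1111 1111 1111 on day 3."
findings = find_possible_identifiers(sample)
print(findings)
```

A clean result from a screen like this does not show that a dataset is free of identifiable information. It only narrows down where to look first.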

 

Additional Information: