


Exercise 2

This sample Data Management Plan (DMP) was created for a DFG grant. Compare the DMP with the recommended components listed under "Methods." Were all elements considered? What is missing, and what could be the reasons for this?

Data Management Plan for a DFG grant (full text)

Concept for Handling Research Data (DFG)

(Original in German by FDM@HU-Berlin, licensed under CC0; see: https://cms.hu-berlin.de/de/ueberblick/projekte/dataman/muster-dmp-dfg; translated by Saskia Köbschall)

Data Description:

The research data for project XYZ will be collected through an online questionnaire, using the LimeSurvey software provided by the Computer and Media Service (CMS) of Humboldt-Universität zu Berlin. The survey data will be analyzed with the open-source statistical software R and stored as a dataset (CSV), the R analysis script (R), and a series of graphics (TIFF). In addition, a README file (TXT), the questionnaire (PDF/A), and a codebook (PDF/A) will be created to describe the data.

In addition to the research data collected within the project, publicly accessible data will be reused and referenced. These include public statistics (CSV), reports (DOCX, PDF), and legal regulations (HTML, PDF). The total data volume is expected not to exceed 50 GB.

Commented [KH1]: How are new data generated in your project? Are existing data being reused? What types of data, in terms of data formats (e.g., image data, text data, or measurement data), are generated in your project, and how are they further processed? To what extent do these data accumulate, and what is the expected data volume?

Documentation and Data Quality:

Metadata will be created using the web form of GESIS – Leibniz Institute for the Social Sciences, following the discipline-specific DDI standard. Additional documentation of the research data will be provided in the form of a README file, the questionnaire, the codebook, and the R syntax. Keywords will be assigned according to the discipline-specific thesaurus TheSoz, and the data will be classified according to the Social Sciences Classification via the same web form.

The quality of the data will be verified using statistical methods, focusing primarily on representativeness and reliability. For instance, the participation rates of individual groups will be compared with their respective proportions in official statistics, and weighting will be applied where necessary. Using the collected data will require spreadsheet software, a word processor, statistical software, and a PDF viewer.

Commented [KH2]: What approaches are being used to ensure that the data are described in a transparent and comprehensible way (e.g., the use of existing metadata or documentation standards, or ontologies)? What measures are being taken to ensure high data quality? Are quality controls planned, and if so, how will they be conducted? What digital methods and tools (e.g., software) are required to use the data?

Storage and Technical Security During the Project Duration:

Throughout the project, the project management, in collaboration with the responsible IT officer of Institute XYZ, will ensure the secure storage and backup of the data. For storing and collaboratively processing data during the project, the university's own cloud storage "HU-Box" will be used, which allows clear access management and simple administration of usage rights. Sensitive data will be kept in encrypted, password-protected folders that only authorized staff members can access and process. An automated backup will run every night.

Commented [KH3]: How are the data stored and secured during the project duration? How is the security of sensitive data ensured during the project period (access and usage management)?

Legal Obligations and Framework Conditions:

Within the online questionnaire, participants will be informed that the data will later be published in anonymized form. The online survey will comply with the GDPR and will be developed in consultation with the institutional data protection officers. This includes obtaining informed consent from respondents as well as a separate consent for the future publication of the collected data.

An ethics review will be obtained in advance from the responsible ethics committee of Humboldt-Universität. To clarify copyright and ownership of the data, a cooperation agreement will be concluded with project partner Z, and a data management plan will be developed as part of the project.

Commented [KH4]: What legal particularities exist in relation to handling research data in your project? Are there any expected impacts or restrictions regarding future publication or accessibility? How are aspects of usage rights, copyright, and ownership issues considered? Are there important scientific codes of conduct or professional standards that should be taken into account?

Data Sharing and Long-Term Accessibility:

Beyond the project team's own analysis, the dataset will also be relevant for other research projects. Since no comparable data are currently available for secondary analysis, the collected research data, the R analysis scripts, and the questionnaire will be made available under a CC BY license via GESIS – Leibniz Institute for the Social Sciences. The GESIS data archive will assign the study a Digital Object Identifier (DOI). In line with the DFG guidelines on good scientific practice, project results and all relevant research data will be stored at Humboldt-Universität zu Berlin for at least ten years. For this purpose, a procedure for transferring the data to the CMS backup service will be established with the institutional IT officer. After the project ends, the data will be curated by GESIS – Leibniz Institute for the Social Sciences.

Commented [KH5]: Which data are particularly suitable for reuse in other contexts? According to what criteria are research data selected for availability to others? Are you planning to archive your data in an appropriate infrastructure? If so, how and where? Are there embargo periods? When will the research data be accessible to third parties?

Responsibilities and Resources:

In accordance with the Principles for Handling Research Data at Humboldt-Universität zu Berlin (https://hu.berlin/forschungsdaten-policy), the project management is responsible for all aspects of research data management. Specific sub-tasks, however, will be delegated to project staff. For example, three person-months (PM) are allocated to preparing the research data for publication in the repository.

As agreed in prior consultation with the repository, providing and archiving the data via GESIS – Leibniz Institute for the Social Sciences will be free of charge. Long-term storage of the data via the CMS of HU Berlin will likewise be free of charge.

Commented [KH6]: Who is responsible for the proper handling of research data (description of roles and responsibilities within the project)? What resources (costs, time, or other) are required to ensure appropriate research data management within the project? Who will be responsible for curating the data after the project’s completion?

This template follows the Checklist for Handling Research Data issued by the German Research Foundation (DFG), in the version of December 21, 2021.

This sample DMP addresses all components, although some only very briefly. It was probably written in the early phase of a project: at this stage it lacks details on where specific materials will be stored, so it is a planning document rather than a working document. Furthermore, the information on planned costs and resources for research data management covers only the archiving phase.

In addition, there are hardly any details on ethical and legal aspects. This may be due to the method: online surveys can often be conducted anonymously, so a note on maintaining anonymity may be sufficient here.