Data Documentation and Metadata
Definition
The documentation of research data serves to provide a detailed description of the data with the aim of making it discoverable and thus reusable.
Metadata are structured information about resources - such as books, documents, collection objects, or (archived) research data - and serve to provide a standardized description. They facilitate the search, discovery, and reuse of these resources. The type and format of metadata used depend on the academic discipline and, where applicable, the requirements of funding agencies and archives/repositories.
Metadata standards ensure the structured description of resources by providing standardized fields for descriptions, such as for photographs: location, date, photographer, condition, etc. These standards enable the linkage and exchange of metadata between various applications, such as catalogs or search portals, thereby supporting interoperabilityInteroperability is the ability of a system to work seamlessly with other systems. In interoperable systems, data can be automatically combined and exchanged with other datasets, making data machine-readable, interpretable, and comparable in a simplified and accelerated manner. Interoperability is one of the main criteria of the FAIR Principles (Forschungsdaten.info, 2023). Read More.
Introduction
If research data are to be made available for archivingArchiving refers to the storage and accessibility of research data and materials. The aim of archiving is to enable long-term access to research data. On one hand, archived research data can be reused by third parties as secondary data for their own research questions. On the other hand, archiving ensures that research processes remain verifiable and transparent. There is also long-term archiving (LTA), which aims to ensure the usability of data over an indefinite period of time. LTA focuses on preserving the authenticity, integrity, accessibility, and comprehensibility of data. Read More and reuseData reuse, often referred to as secondary use, involves re-examining previously collected and published research datasets with the aim of gaining new insights, potentially from a different or fresh perspective. Preparing research data for reuse requires significantly more effort in terms of anonymization, preparation, and documentation than simple archiving for storage purposes. Read More, meticulous and detailed data documentation is indispensable. This entails explaining and describing the context of data collection and analysis so that users can trace the origins of the research materials, interpret them appropriately, and analyze them meaningfully. The central question is: How can research processes be made comprehensible and traceable for third parties, and how can the research data generated during these processes be made interpretable and reusable by others?
In social and cultural anthropology, careful and transparent documentation of the research context and the specific circumstances of data collection is an integral part of everyday research practice. This so-called data transparency is essential for making data intersubjectively verifiable, allowing readers of an ethnography or users of research data to trace the pathways of knowledge production in the field.
When preparing research data for storage, archiving, and reuse in repositories and data centers, this form of data documentation – systematic descriptions of the research context and the methods employed – is of particular importance.
When referring to data documentation in the following discussion, we are not addressing the securing and recording of information (e.g., notes, research protocols, diaries, photographs, audio recordings, etc.) during fieldwork (these are discussed in the article on Recording Strategies). Instead, we focus on a standardized, digital form of data description (i.e., metadataMetadata are descriptions of research data (data about data) and provide content-related and structured information about the research context, methodological and analytical procedures, as well as the research team that generated the data. They can be categorized into bibliographic, administrative, procedural, and descriptive metadata and are typically created using templates, ReadMe files, or data curation profiles. Metadata are published alongside the research data themselves and are essential in online repositories and research data centers, where they enable third parties to understand and contextualize datasets. Metadata also enhances the findability and machine-readability of data, making them a key component of the FAIR Principles and good scientific practice. Read More) and other types of data contextualization related to archiving and reuse.
Metadata provide a structured description of research data and may include content-specific, disciplinary, and technical-formal information about data collection. They offer a preliminary overview of archived material. Generally, archives, repositories, or research data centers provide guidelines regarding the content and format of metadata, which should be followed when describing data. These guidelines include schemas and core elements for data description, aiming to enhance the discoverability and readability of metadata for both humans and machines.
It is recommended to use controlledA controlled vocabulary consists of defined terms and rules organized in word lists or structured thesauri. It serves as a type of lexicon or encyclopedia for discipline-specific definitions, aiming to promote consistent scientific practice and make research interoperable and intersubjectively understandable. In the social sciences, the 'European Language Social Science Thesaurus' (ELSST) is particularly relevant siehe: https://elsst.cessda.eu. Read More and standardized vocabulariesAuthority data provide standardized records and unique identifiers in the form of specific numbers that distinctly describe and categorize individuals, works, institutions, research funders, entities, or keywords. This standardization eliminates incorrect or duplicate assignments. Authority data is particularly useful in catalogs and databases, where it facilitates easy retrieval of information about specific entities and supports digital networking and discoverability across projects. In Germany, the 'Integrated Authority File” (GND) by the German National Library is the central authority filei see: https://www.dnb.de/DE/Professionell/Standardisierung/GND/gnd_node.html. Read More, including authority data, which aid in cataloging and tagging in online archives and repositories by providing unique identifiers for individuals, locations, works, etc.1For more on standard data, see the video: https://www.youtube.com/watch?v=VsP7b7B-W_Q.
For a more comprehensive description and to enhance the traceability of research data, contextual materials or documents can be provided. These offer insights into the research background and consolidate a wide range of contextual information, thereby improving the understanding and reuse of the data.
Motivation
If research data are to be archived and made available for reuse, thorough data documentation is indispensable. As part of good scientific practiceGood scientific practice (GSP) represents a standardized code of conduct established in the guidelines of the German Research Foundation (DFG). These guidelines emphasize the ethical obligation of every researcher to act responsibly, honestly, and respectfully, also in order to strengthen public trust in research and science. They serve as a framework for guiding scientific work processes. Read More, it also serves to ensure data quality and should comply with the FAIR principlesThe FAIR Principles were first developed in 2016 by the FORCE11 community (The Future of Research Communication and e-Scholarship). FORCE11 is a community of researchers, librarians, archivists, publishers, and research funders aiming to bring about change in modern scientific communication through the effective use of information technology, thereby supporting enhanced knowledge creation and dissemination. The primary goal is the transparent and open presentation of scientific processes. Accordingly, data should be made findable, accessible, interoperable, and reusable (FAIR) online. The objective is to preserve data long-term and make it available for reuse by third parties in line with Open Science and Data Sharing principles. Precise definitions by FORCE11 can be found on their website see: https://force11.org/info/the-fair-data-principles/. Read More. Proper documentation is crucial for reuseData reuse, often referred to as secondary use, involves re-examining previously collected and published research datasets with the aim of gaining new insights, potentially from a different or fresh perspective. Preparing research data for reuse requires significantly more effort in terms of anonymization, preparation, and documentation than simple archiving for storage purposes. Read More, as it helps prevent misinterpretations and misuses and facilitates the search for suitable data (Huber, 2019, p. 14).
A data report can also serve as the basis for a dedicated publication on methodology. This makes the work conducted within the research project visible and citable beyond the publication of an article.
Moreover, careful data documentation positively impacts the organization and workflow of the researcher, even when archiving or reuse is not planned. In any case, data documentation should accompany the research process to avoid labor-intensive reconstructions later (RatSWD, 2023, p. 25).
Methods

Source: Types of Data Documentation, Anne Voigt with CoCoMaterial, 2023, licensed under CC BY-SA 4.0
Metadata and Metadata Standards
Metadata describe (research) data. They provide structured information about the research context, methods and analysis procedures used, research teams, available datasets, and more. Typically, metadata can be categorized into:
- Bibliographic Metadata (e.g., title, author, thematic focus of the subject)
- Administrative Metadata (e.g., file format, access rights, licenses)
- Process Metadata (e.g., methods used in data collection)
- Descriptive Metadata (e.g., additional information about the content and origin of the data) (Forschungsdaten.info, 2023d)
Metadata can be summarized and published in pre-structured templates like ReadMe filesReadMe files in the context of systems or projects contain information about the respective system, project, etc., to help users orient themselves. Read More. Archives, repositories, or research data centers also often provide forms with metadata structures for data archiving2 See for example, Qualiservice: https://www.qualiservice.org/en/the-helpdesk.html#downloads.
For social sciences, discipline-specific metadata standards have been established:
- Data Documentation Initiative (DDI)3 https://ddialliance.org/
- Dara Metadata Schema4 https://www.da-ra.de/downloads#version-3-0
These standards are integrated into digital databases and available for free download. However, these standards are primarily geared toward quantitative research data and are less suitable for qualitative research data, which are predominantly generated in social and cultural anthropological research. For empirical ethnographic research, supplementary documentation through data reports, study reports, and context materials is therefore indispensable.
ReadMe Files
ReadMe files are simple text or TEI-XML files stored in formats such as .txt, .md, or .xml. They include key metadata in a compact and structured format, such as project name, team members, funding, naming conventions, folder structures, and abbreviations. They can also record changes and data versioning. ReadMe files are typically machine-readable and can be published independently. They serve as a practical overview, are usually machine-readable and may look like this:
Examples
Documentation of research project XYZ
Creator(s):
Research context and hypotheses (reason(s) for data analysis):
Creation date of file(s):
Data collection/creation method(s):
Used Software (incl. version and add-ons), tools or devices:
Data (file names (incl. version), content, methods for data cleansing, language of data):
Softwarecode (file names (incl. version), content, programming language):
Additional documentation files (e.g. codebook, lab notebook, questionnaire):
Information on access and terms of use (license)
Notes:
Data Reports and Study Reports
Metadata alone are often insufficient for documenting qualitative research data. A data or methods report – referred to as a study report by Qualiservice – offers an alternative documentation method. Researchers can describe contexts, connections, and additional information in free text or bullet points, as well as record changes.
The report should include (similar to the metadata templates) the institution and persons, the research question, the preliminary work and conceptualization of the topic. Likewise, methods should be mentioned and further steps of data processing and analysis (such as transcription, evaluation procedure, interpretation and perspective of the researcher) should be presented. Furthermore, references to further contextual information and reuse potentials can be established. It is recommended to keep the report short and concise with essential information, notes and descriptions, and to use it as a practical and detailed summary of the research. As an overview and summary of the research, the report is an advantageous orientation aid for both the researchers themselves and for the potential research team and can be used as a stand-alone publication (RatSWD, 2023, p. 27).
Context Documents and Materials
For qualitative research data or data collected through ethnographic research, providing context materials aids documentation and reuse scenarios. These materials include artifacts such as written documents, images, videos, and objects of mundane, sacred, or artistic origins that are collected (not generated) by ethnographers. They can be analyzed and contextualized according to the research question (see article Data in ethnographic research).
For data documentation, this understanding can be extended to include contextual documents that “arise” during the research: questionnaires, interview guidelines, systematic observation protocols and other related data collection instruments, field and method reports, the respective transcription rules, anonymization measures and evaluation programs, etc. Context documents of this kind serve as data documentation and lead to a better understanding of the research and the research results.
For archiving, it is important to subject the documents and materials to a considered curation and sorting by type (such as interview data, surveysIn the social sciences, a survey refers to standardized, quantitative overview studies that provides information on specific groups or observational units, such as households, family structures, age groups (youth, retirees, workers, etc.), or individual companies and organizations. Survey data are usually collected through questionnaires or structured interviews. These data represent statistical microdata, allowing for the investigation of relationships and characteristics at the individual level. Surveys are standard methods in quantitative social research and are also employed in social and cultural anthropology to gather general information on social parameters, such as household compositions, economic conditions, or age structures within a population. Read More, observation data or media data, etc.) in advance, i.e. to consider which of one's own research data are suitable for archiving and re-use and which are not. This decision is closely linked to ethical and data protectionData protection includes measures against the unlawful collection, storage, sharing, and reuse of personal data. It is based on the right of individuals to self-determination regarding the handling of their data and is anchored in the General Data Protection Regulation (GDPR), the Federal Data Protection Act (Bundesdatenschutzgesetz), and the corresponding laws of the federal states. A violation of data protection regulations can lead to criminal consequences. Read More aspects. Once the selection criteria have been clarified, a precise derivation and listing of the materials and documents used, as well as the “tools” and “instruments” used in the research, can be carried out. This makes the research context, the perspective of the researcher, as well as the method, topic, research question, etc. comprehensible and interpretable.
Notes
- 1For more on standard data, see the video: https://www.youtube.com/watch?v=VsP7b7B-W_Q.
- 2See for example, Qualiservice: https://www.qualiservice.org/en/the-helpdesk.html#downloads
- 3
- 4
Literature and References
Forschungsdaten.info. (2023a). Das Data-Curation-Profile. forschungsdaten.info. https://forschungsdaten.info/themen/organisieren-und-aufbereiten/data-curation-profile/
Forschungsdaten.info. (2023d). Metadaten und Metadatenstandards. forschungsdaten.info. https://forschungsdaten.info/themen/beschreiben-und-dokumentieren/metadaten-und-metadatenstandards/
Huber, E. (2019). Affektive Dimensionen von Forschungsdaten, ihrer Nachnutzung und Verwaltung. Berlin. SFB 1171 Working Paper 01/19. https://refubium.fu-berlin.de/bitstream/handle/fub188/24721/SFB1171_WP_01-19_Huber.pdf?sequence=1&isAllowed=y
Rat für Sozial- und Wirtschaftsdaten. (RatSWD, 2023). Forschungsdatenmanagement in kleinen Forschungsprojekten – Eine Handreichung für die Praxis. RatSWD Output Series, 7. Berufungsperiode Nr. 3. https://doi.org/10.17620/02671.72
Citation
Heldt, C., Röttger-Rössler, B. & Voigt, A. (2023). Data Documentation and Metadata. In Data Affairs. Data Management in Ethnographic Research. SFB 1171 and Center for Digital Systems, Freie Universität Berlin. https://en.data-affairs.affective-societies.de/article/datadocumentation-and-metadata/