Learning unit: Data Documentation and Metadata

Introduction

If research data are to be made available for archiving and reuse, meticulous and detailed data documentation is indispensable. This entails explaining and describing the context of data collection and analysis so that users can trace the origins of the research materials, interpret them appropriately, and analyze them meaningfully. The central question is: How can research processes be made comprehensible and traceable for third parties, and how can the research data generated during these processes be made interpretable and reusable by others?

Archiving refers to the storage and accessibility of research data and materials. The aim of archiving is to enable long-term access to research data. On the one hand, archived research data can be reused by third parties as secondary data for their own research questions. On the other hand, archiving ensures that research processes remain verifiable and transparent. There is also long-term archiving (LTA), which aims to ensure the usability of data over an indefinite period of time. LTA focuses on preserving the authenticity, integrity, accessibility, and comprehensibility of data.

Data reuse, often referred to as secondary use, involves re-examining previously collected and published research datasets with the aim of gaining new insights, potentially from a different or fresh perspective. Preparing research data for reuse requires significantly more effort in terms of anonymization, preparation, and documentation than simple archiving for storage purposes.

In social and cultural anthropology, careful and transparent documentation of the research context and the specific circumstances of data collection is an integral part of everyday research practice. This so-called data transparency is essential for making data intersubjectively verifiable, allowing readers of an ethnography or users of research data to trace the pathways of knowledge production in the field.

When preparing research data for storage, archiving, and reuse in repositories and data centers, this form of data documentation – systematic descriptions of the research context and the methods employed – is of particular importance.

In the following discussion, data documentation does not mean the securing and recording of information during fieldwork (e.g., notes, research protocols, diaries, photographs, audio recordings); these are discussed in the article on Recording Strategies. Instead, we focus on a standardized, digital form of data description (i.e., metadata) and other types of data contextualization related to archiving and reuse.

Metadata are descriptions of research data (data about data) and provide content-related and structured information about the research context, methodological and analytical procedures, as well as the research team that generated the data. They can be categorized into bibliographic, administrative, procedural, and descriptive metadata and are typically created using templates, ReadMe files, or data curation profiles. Metadata are published alongside the research data themselves and are essential in online repositories and research data centers, where they enable third parties to understand and contextualize datasets. Metadata also enhance the findability and machine-readability of data, making them a key component of the FAIR Principles and good scientific practice.

Metadata provide a structured description of research data and may include content-specific, disciplinary, and technical-formal information about data collection. They offer a preliminary overview of archived material. Generally, archives, repositories, or research data centers provide guidelines regarding the content and format of metadata, which should be followed when describing data. These guidelines include schemas and core elements for data description, aiming to enhance the discoverability and readability of metadata for both humans and machines.
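As a rough illustration of what a structured, machine-readable description could look like, the following sketch models a metadata record and checks it against a set of core elements. The field names and the list of required elements are hypothetical and chosen only for illustration; an actual archive, repository, or research data center will prescribe its own schema.

```python
import json
from dataclasses import dataclass, asdict, field

# Hypothetical core elements; a real repository defines its own schema.
REQUIRED_FIELDS = ("title", "creator", "description", "collection_period", "methods")


@dataclass
class MetadataRecord:
    title: str = ""
    creator: str = ""
    description: str = ""
    collection_period: str = ""
    methods: list[str] = field(default_factory=list)
    keywords: list[str] = field(default_factory=list)


def missing_fields(record: MetadataRecord) -> list[str]:
    """Return the names of required elements that are still empty."""
    data = asdict(record)
    return [name for name in REQUIRED_FIELDS if not data[name]]


# Illustrative values only; not taken from a real dataset.
record = MetadataRecord(
    title="Example field study",
    creator="Example Researcher",
    description="Interviews and field notes from an illustrative project.",
    methods=["participant observation", "semi-structured interviews"],
)

print(missing_fields(record))               # elements still to be filled in
print(json.dumps(asdict(record), indent=2))  # machine-readable serialization
```

Keeping the record in a fixed structure makes it readable for humans and, once serialized (here as JSON), for machines as well; the completeness check mirrors the idea of following a repository's guidelines before submission.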

It is recommended to use controlled and standardized vocabularies, including authority data, which aid in cataloging and tagging in online archives and repositories by providing unique identifiers for individuals, locations, works, etc. For more on standard data, see the video: https://www.youtube.com/watch?v=VsP7b7B-W_Q.

A controlled vocabulary consists of defined terms and rules organized in word lists or structured thesauri. It serves as a type of lexicon or encyclopedia for discipline-specific definitions, aiming to promote consistent scientific practice and make research interoperable and intersubjectively understandable. In the social sciences, the 'European Language Social Science Thesaurus' (ELSST) is particularly relevant; see: https://elsst.cessda.eu.

Authority data provide standardized records and unique identifiers in the form of specific numbers that distinctly describe and categorize individuals, works, institutions, research funders, entities, or keywords. This standardization eliminates incorrect or duplicate assignments. Authority data are particularly useful in catalogs and databases, where they facilitate easy retrieval of information about specific entities and support digital networking and discoverability across projects. In Germany, the 'Integrated Authority File' (GND) by the German National Library is the central authority file; see: https://www.dnb.de/DE/Professionell/Standardisierung/GND/gnd_node.html.
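How a controlled vocabulary can support consistent tagging may be sketched as follows. The miniature vocabulary, its accepted variants, and the identifiers are placeholders invented for this example; they are not real ELSST or GND entries, and actual thesauri offer far richer structures.

```python
# Hypothetical mini-vocabulary: preferred term -> (identifier, accepted variants).
# The identifiers are placeholders, not real ELSST or GND entries.
VOCABULARY = {
    "migration": ("vocab:0001", {"migration", "migrations", "human migration"}),
    "kinship": ("vocab:0002", {"kinship", "family ties"}),
}


def tag_keywords(free_text_keywords):
    """Map free-text keywords to controlled terms and collect unmatched ones."""
    matched, unmatched = {}, []
    for keyword in free_text_keywords:
        normalized = keyword.strip().lower()
        for term, (identifier, variants) in VOCABULARY.items():
            if normalized in variants:
                matched[term] = identifier
                break
        else:
            # No controlled term covers this keyword; flag it for manual review.
            unmatched.append(keyword)
    return matched, unmatched


matched, unmatched = tag_keywords(["Migrations", " family ties ", "ritual"])
print(matched)
print(unmatched)
```

Different spellings of the same concept collapse onto one preferred term with one identifier, which is the property that avoids duplicate or inconsistent assignments in catalogs. Keywords without a match are kept separately rather than silently dropped.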

For a more comprehensive description and to enhance the traceability of research data, contextual materials or documents can be provided. These offer insights into the research background and consolidate a wide range of contextual information, thereby improving the understanding and reuse of the data.