Discussion
The Concept of Data
Research data collected during ethnographic fieldwork are always deliberately generated and do not exist independently of the personal interaction between researchers and research participants. Therefore, in ethnographic fieldwork, it is not appropriate to speak of the collection of pre-existing raw or primary data. Instead, data are created in a processual manner during research and must be regarded as constructed: Social and cultural anthropological research does not follow “the acquisition of data with a subsequent analysis but establishes the value of certain informants’ statements as data through theory-oriented analysis” (Hirschauer, 2014, p. 303). Through their filtering and reflection on observations and inquiries, ethnographers ultimately produce data influenced by subjectivity and affect.
In addition, they gather other materials relevant to their topic (e.g., newspaper articles, brochures, photos, films, social media content, etc.) and – depending on the research question – engage in archival research by reviewing historical sources. It seems useful to distinguish between data as information deliberately generated by researchers, and material as information existing independently of the researchers. The term source, on the other hand, is used primarily to refer to published texts related to the research subject, which can be either academic or non-academic in nature, though the distinction from the concept of materials can sometimes blur in the latter case. This understanding also differs from the usual definition of sources in historical sciences (see Kirn, 1968, p. 29). This differentiation is significant because generated data, collected materials, and consulted sources each require different forms of processing, contextualization, ethical reflection, and legal assurance for archiving and potential reuse (see TP INF/SFB 1171, 2022).
The focus of this portal is on data as information elicited by researchers using various methods.
In this context, the variability of data genres also becomes relevant. Ethnographic fieldwork collects both qualitative and quantitative data, though the distinction between the two is not always clear-cut (Beer, 2003, p. 11). Quantitative data include household surveys, measurements, standardized questionnaires, or time allocation studies. Qualitative data encompass open, narrative interviews; non-standardized research notes such as excerpts, protocols, notes, and field diaries; as well as photographs, images, and films shot in the field. While quantitative data are easier to depersonalize, qualitative data are characterized by semantic richness and ambiguity, making them difficult to understand without precise contextualization (Hirschauer, 2014, p. 303; Kretzer, 2013, p. 153). Sharing quantitative data collected through participant observation is potentially easier, whereas sharing qualitative data presents considerable challenges.
Open and transparent access to qualitative ethnographic data, in the spirit of Open Science and data sharing, entails personal, legal, and ethical challenges for both researchers and research participants. Careful (pre-)selection and curation of data intended for publication are therefore required.
Critical Perspectives on Open Science
Many ethnographic researchers view the demand for Open Science and data sharing with skepticism for various reasons. They see no necessity for sharing data beyond publications, arguing that data only emerge through the reflexive and selective engagement of researchers with their notes, sources, and materials and ultimately take their final form in written texts (Behrends et al., 2022, p. 15). Furthermore, they assert that the boundary between data and publication is not clearly definable. They emphasize that it is part of the discipline’s good scientific practice to make the contexts of data collection, associated methodologies, and resulting data, as well as their own positioning in the field, transparent in publications, thus enabling intersubjective comprehensibility of the research process. The open presentation of the research and knowledge process was already emphasized by the “founding father” of participant observation, Bronislaw Malinowski (1922):
“I consider that only such ethnographic sources are of unquestionable scientific value, in which we can clearly draw the line between, on the one hand, the results of direct observation and of native statements and interpretations, and on the other, the inferences of the author, based on his common sense and psychological insight”.
(Malinowski, 1922, p. 3)
Some researchers also fear that the demand for data sharing could lead to the disclosure of private field journals or notebooks that were never intended for publication (Behrends et al., 2022, p. 10). A central criticism concerns the considerable resource expenditure associated with data sharing, including high costs for post-processing (decoding, rewriting, or translation), curation, pseudonymization, or anonymization. These requirements further raise the question of the purposefulness of data sharing:
“I believe that the added value of disclosing data, independent of all the ethical issues associated with it, is extremely limited […] For this reason, I honestly do not see what can be gained from it – apart from the enormous financial and time investment required to anonymize the data.”
(Behrends et al., 2022, p. 8)
Hirschauer (2014) shares this view and even goes so far as to describe data sharing (in the form of archiving and reusing data from ethnographic research) as “archival nonsense”:
“The mere accumulation of data is just as meaningless as the mere accumulation of ethnological artifacts in dusty shelves and dark storage rooms.”
(Hirschauer, 2014, p. 301)
Social and cultural anthropologists also express concern about the entire research process, fearing it may be jeopardized by the demand for Open Science (OS):
“What does it actually do to us and our research when we already anticipate accessibility or making data available?”
(Behrends et al., 2022, p. 10)
On the one hand, there is concern that the awareness of future data publication could already influence the way field notes are recorded. On the other hand, this raises significant questions and gaps in research ethics: How should researchers handle the absence of provisions for personal data protection? To what extent might findings be distorted by possible anonymizations? Could published data be misused, leading to unintended (political) consequences? Researchers fear that these gaps could lastingly undermine the trust-based relationships, rooted in loyalty and responsibility, between them and their informants. Hirschauer concludes:
“The central problem is not that archiving could harm our informants – and this is already a significant issue – but that it undermines their trust in such a way that it harms our research.”
(Hirschauer, 2014, p. 310)
In conclusion, the demand for Open Science (data sharing, research data management, and the FAIR principles) does not seem to align well with the methodological approaches and data concepts of the discipline. Open Science does not adequately consider the sensitivity and responsibility required for research involving human participants and instead appears to be primarily subject to a neoliberal logic of exploitation (Pels, 2018).
The Global Indigenous Data Alliance (GIDA) also criticized this in 2019 and developed the CARE principles, which have been established as a complement to the FAIR principles and aim to address the ethical gap created by the demand for data sharing in ethnographic research (see the article on research ethics and data ethics; Imeri & Rizzolli, 2022, pp. 1).
Summary
Summarizing the above sections, the following can be stated:
- Open Science and the demand for data sharing are challenging in ethnographic disciplines due to the unique and non-repeatable nature of fieldwork and the personal involvement of the researcher.
- However, to meet the requirements of OS, specific, detached data types (e.g., in the form of quantitative data, materials, or sources) can be made openly accessible and shared. Qualitative documents are also fundamentally shareable, but this is always associated with significant resource expenditure.
- To make data sharing not only FAIR but also ethically justifiable, the CARE principles should be considered, and researchers should reflect on their own positionality and on structures of power and authority already in field notes, documentation, and especially in publications.
Additionally, the following questions should always be considered in relation to data sharing:
- Who is the target audience for the published data? Who will read/use it and for what reason? What supplementary information is needed for understanding, and how will it be provided? (See articles on Archiving, Data Reuse, and Data Documentation and Metadata.)
- Are research participants informed about the publication and have they consented to it? (See articles on Informed Consent and Rights and Licenses.)
- Are personal and data protection aspects considered? (See articles on Data Protection and Rights and Licenses.)
- Have participants been adequately pseudonymized, and is re-identification excluded? (See article on Anonymization and Pseudonymization.)
This offering on research data management (RDM) in social and cultural anthropology aims to provide answers to these questions: it presents the measures and methods of RDM while also conveying a sense of the challenges and limitations of Open Science (OS). On the one hand, it highlights the current state of disciplinary debates on RDM, acknowledging that many questions remain open. On the other hand, it supports interested early-career researchers, students, and educators with information on OS as well as practical advice, exercises, and assistance.
Careful data maintenance, preparation, and organization are relevant for every research project. However, the decision regarding the form and extent to which researchers share their data and make it available for reuse must be made independently, depending on the research context. The more reflection and critical exchange on research data management are encouraged, the more successful future research and data collection efforts can become.