Reuse of Research Data
Definition
The reuse of research data involves the use of previously collected data (or materials or sources) that have been archived in an archive, repository, or research data center. This means that research data, along with their metadata and contextual materials (see article on Data Documentation and Metadata), can be found online and, depending on the data type and access rights, can be read, downloaded, printed, linked, stored, analyzed, and used for new research projects with different questions. This aligns with the FAIR principles, according to which data should be made findable, accessible, interoperable, and reusable online, and with the Open Science movement (RatSWD, 2023, p. 33; DGfE, 2020, p. 4).
Introduction
The reuse of research data is not a fundamentally new concept; it has always been part of scientific work. However, digitalization has opened new pathways. For example, the goal of the Human Relations Area Files (HRAF), established in the USA in 1949 as a non-profit consortium of universities, colleges, libraries, and research institutions, was to collect (already published) ethnographic texts, images, and later films, categorize them thematically and regionally, and make them available for cross-cultural comparative studies. Originally, these texts and images were stored on microfiche, which required specialized magnifying devices for reading. This technology significantly facilitated material collection for comparative studies and secondary analyses. Today, HRAF documents are digitally accessible and continually expanded.¹
With the advancement of digitalization and the push for Open Science, the reuse of research data has become increasingly central. Research data from publicly funded projects are considered a public good and, in digital form, should ideally be openly accessible and usable. In the best case, data are available free of charge (Open Access) and openly licensed, allowing reuse without the need to contact the original data providers for permission. This requires a reciprocal relationship of trust between data providers and data users. Data providers should prepare their data meticulously, adhering to data protection laws and research ethics, and carefully assess the potential for third-party reuse (see articles on Archiving, Data Protection, Informed Consent).
When dealing with sensitive qualitative data, open access may not be possible; in such cases, agreements should clearly outline how the data can be used. Data users, in turn, should handle the data respectfully, acknowledge the original data providers through proper citation, and strictly avoid any form of data misuse (RatSWD, 2023, p. 33; DGfE, 2020, p. 4).
A responsible and reflective approach to existing data, one that considers both the data providers and the third parties involved, is essential for re-analysis. To avoid arbitrariness and misinterpretation, data users should thoroughly engage with the contextual information (see article on Data Documentation) and understand the background of the original research, as well as the nature and characteristics of the data. Additionally, users should reflect on their own (ethnographic) positions and perspectives and incorporate these into their new analyses and arguments (Huber, 2019, p. 8). Reusing personal and sensitive data, however, presents significant data protection challenges.
Motivation
Data reuse can be extremely valuable. Secondary analyses and comparative studies can be conducted using previously collected data. This allows researchers to examine historical developments and changes related to the studied topic, apply new research questions to existing data, and combine datasets for broader analyses. Data users can expand and deepen their research focus while verifying the quality and “accuracy” of the data (RatSWD, 2023; DGfE, 2020; Forschungsdaten.info, 2023h). Transparency in data collection and analysis processes also enhances the authenticity and comprehensibility of the data, contributing to good scientific practice (GSP) as promoted by the German Research Foundation (DFG) (DGfE, 2020, p. 1).
Methods
Different access options are available for data reuse, depending on the nature of the data (sensitive, personal, etc.):
- Open Access
Data can either be downloaded and used directly and free of charge, or accessed and used after registration and agreement to the terms of use.
- Access by Request
If the desired data are not openly and freely available for digital reuse (as is often the case with sensitive and personal data), a data use agreement must be requested and signed with the respective archive, repository, or research data center. This agreement becomes legally binding upon signing, meaning the specified licenses and usage conditions must be strictly observed, particularly regarding the attribution and citation of data providers. Usually, data access is limited to academic use in research and/or teaching, and proof of institutional affiliation is required.
- Secure Access
Sensitive and personal data (i.e., data that cannot be sufficiently anonymized) can only be reused under strict security measures to protect the individuals involved. The secure organization and provision of these data are managed by the respective research data center. On-site usage allows data reuse in (digital) safe rooms or designated guest workspaces, secured with multiple passwords and various access and output controls. In physical secure rooms, devices such as mobile phones, laptops, or USB drives are not allowed. The data are accessed via an internal computer with an intranet connection, eliminating the need for physical or digital data transfer. Fees may apply for such usage (DGfE, 2020, p. 15).
In the "Tools" section, repositories and research data centers (online databases) are listed where data can be downloaded or their usage can be requested online.
For example, Qualiservice offers the following usage conditions: a temporary embargo (i.e., a restriction period) or the exclusion of certain usage purposes, such as the use of materials in teaching. Particularly sensitive data may only be viewed on-site in Bremen.
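The three access options described above can be read as a simple decision rule. The following Python sketch is only an illustration of that reading, not a policy of any archive or research data center: the names `AccessLevel`, `Dataset`, `required_access`, and `reuse_steps` are hypothetical, and the assumption that sufficiently anonymized personal data fall under access by request is a simplification of the text.

```python
from dataclasses import dataclass
from enum import Enum


class AccessLevel(Enum):
    """The three access options named in this section."""
    OPEN = "open access"
    ON_REQUEST = "access by request"
    SECURE = "secure access"


@dataclass
class Dataset:
    """Hypothetical minimal description of an archived dataset."""
    title: str
    contains_personal_data: bool   # personal or sensitive content is present
    sufficiently_anonymized: bool  # anonymization is judged adequate for sharing


def required_access(dataset: Dataset) -> AccessLevel:
    """Map a dataset to an access option (illustrative assumption, not a rule set)."""
    if not dataset.contains_personal_data:
        return AccessLevel.OPEN
    if dataset.sufficiently_anonymized:
        return AccessLevel.ON_REQUEST
    # Data that cannot be sufficiently anonymized require strict security measures.
    return AccessLevel.SECURE


def reuse_steps(level: AccessLevel) -> list[str]:
    """Return the steps a data user takes, paraphrased from the section above."""
    steps = {
        AccessLevel.OPEN: [
            "download directly, or register and agree to the terms of use",
        ],
        AccessLevel.ON_REQUEST: [
            "request and sign a data use agreement",
            "provide proof of institutional affiliation",
            "observe the licenses and cite the data providers",
        ],
        AccessLevel.SECURE: [
            "arrange safe-room or guest-workspace access with the research data center",
            "work on the internal computer without transferring data",
        ],
    }
    return steps[level]


if __name__ == "__main__":
    example = Dataset("Field notes (hypothetical)", True, False)
    level = required_access(example)
    print(level.value)
    for step in reuse_steps(level):
        print("-", step)
```

In practice, the archive or research data center decides which option applies, and individual conditions such as embargo periods or excluded usage purposes can be added on top of it.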
Practical Examples
This interview with a sociology doctoral candidate illustrates how the reuse of (quantitative and personal) data from a statistical institute can be structured. The PhD candidate conducts research based on data already collected by the institute, focusing on employment conditions and the professional activities of young people in the Netherlands.
Example: Interview with a Sociologist
As audio file, only in German
Source: Interview by Camilla Heldt with a Doctoral Candidate, 2023, licensed under CC BY-NC-ND 4.0
As transcript
Question: I’m speaking today with a sociologist. You’re a doctoral candidate and, as you mentioned, currently in the midst of your research. What is the topic of your research?
Answer: I am studying the career trajectories of young people in the labor market. For this, I quantitatively analyze anonymized data regarding individuals’ income and employment situations.
Question: Ah, and you mentioned that you are not collecting your own data but are reusing data from the Dutch Statistical Institute. What exactly does this data reuse look like?
Answer: That’s right. I have a contract with the statistical institute and use the data via a VPN. To access the data, I log into the institute’s server using two-factor authentication. I need three different passwords in total. First, I receive a code through a password-protected app to log into the server. Then, I enter my personalized password. On the server, I have to log into another system where I can access the data. So, I have a primary and a secondary account, and I can only use my data in the secondary account.
Question: Ah, that sounds quite complex. What has your experience with data reuse been like? How do you feel about it?
Answer: Generally, the data usage works well, and I’m very happy to be able to use such detailed data for my doctoral research. Nevertheless, there are frequent technical issues when logging into the secondary account. To resolve these, I have to call the institute’s IT department each time, and it can take one to two days for them to identify and fix the problem, which can be quite frustrating.
Question: Yes, I can fully understand that. Would you also like to reuse data in future projects?
Answer: Yes, absolutely. Despite the minor hurdles, I would definitely like to work with the same data again in future projects.
Question: Thank you for these insights.
Answer: Thank you for the conversation.
Tools
The following repositories and portals offer qualitative data that can be reused, with access varying based on specific reuse criteria:
- Accredited Research Data Centers (RatSWD) specializing in qualitative research data, such as:
- Qualiservice Bremen: https://www.qualiservice.org/en/
(possible reuse of ethnographic data, including sensitive content)²
- Discipline-Specific Search Portals where data can be found, such as:
- EVIFA, which lists ethnological literature as well as ethnological research data archived at Qualiservice: https://www.evifa.de/
- Verbund Forschungsdaten Bildung (Network): https://www.forschungsdaten-bildung.de/en/
- Repositories of One’s Own Institution (e.g., specific university repositories)
- Generic Repositories, e.g., Zenodo: https://zenodo.org/, SocArXiv: https://socopen.org/
- Meta-Portals for finding repositories, such as re3data: https://www.re3data.org/
- Meta-Portals where data can be found, such as:
- Google Dataset Search: https://datasetsearch.research.google.com/
- Open Knowledge Maps: https://openknowledgemaps.org/
- ScienceOpen: https://www.scienceopen.com/
- OpenAIRE: https://www.openaire.eu/
Discussion
Despite the advantages mentioned, data reuse and the question of whether and to what extent it is meaningful and feasible remain controversial. In socio-cultural anthropology, the close personal entanglement of researchers with their data is particularly problematic. Ethnographic data often contain emotional and biographical information about the researchers themselves, which would need to be extracted to protect their privacy. Making research situations and researchers' involvement comprehensible would require transferring this information into the contextual documentation, a significant effort that can “empty” the data of important content.
Behrends et al. (2022) note that increasing demands for Open Science and data sharing force researchers either to publicly disclose personal aspects of their research or to depersonalize their data, potentially reintroducing the illusion of objective knowledge. Furthermore, data reuse can threaten the trust built over long periods between participants and researchers, as shared information, conversations, and recordings become accessible to unknown third parties (Huber, 2019, p. 5).
Solutions for these issues, some of which have been developed by Qualiservice, may include:
- Careful and detailed data documentation (see article on Data Documentation) that highlights key aspects for reuse while safeguarding the privacy of researchers
- Development of informed consent forms tailored to research contexts, clearly outlining potential data reuse scenarios
- Schemas for documenting oral consents
- Well-negotiated reuse and licensing agreements with secure data access, as shown in the practical example
- Classification into data genres (or types) and curation/selection of data suitable for reuse (see article on Archiving), with ethical considerations playing a key role
While the value of data reuse is increasingly recognized in the Open Science movement, ethnographic disciplines require a fundamental shift in thinking for secondary analysis (even as a complement to primary data collection) to be seen as worthwhile and attractive. This particularly affects how such work is valued in qualification projects, where fieldwork is often central. Conversely, the popularity of collecting one’s own data should not diminish, to avoid an uncritical data positivism (DGfE, 2020, p. 19). Although the path to successful data reuse in ethnographic disciplines remains challenging, this shift in thinking is gradually beginning. Research data management, dialogue, exchange, the expansion of online databases, and careful planning of research projects should serve as initial tools and support measures (Huber, 2019, p. 16).
Notes
- 1 See website: https://hraf.yale.edu/
- 2 At Qualiservice, qualitative social science research data from various disciplines can be found. The data portfolio includes interview data, ethnographic materials such as field notes and observation logs, mixed-methods data, image and audio material, as well as audiovisual data.
Literature and References
Behrends, A.; Knecht, M.; Liebelt, C.; Pauli, J.; Rao, U.; Rizzolli, M.; Röttger-Rössler, B.; Stodulka, T. and Zenker, O. (Eds.) (2022). Zur Teilbarkeit ethnographischer Forschungsdaten. Oder: Wie viel Privatheit braucht ethnographische Forschung? Ein Gedankenaustausch. SFB 1171 ‚Affective Societies‘ Working Paper Nr. 01/22. http://dx.doi.org/10.17169/refubium-35157.2
Deutsche Gesellschaft für Erziehungswissenschaft (DGfE). (2020). Empfehlungen zur Archivierung, Bereitstellung und Nachnutzung von Forschungsdaten im Kontext erziehungs- und bildungswissenschaftlicher sowie fachdidaktischer Forschung. DFG. https://www.dfg.de/resource/blob/174560/67a06609aa9aaa98e73b9b7d798afbb9/stellungnahme-forschungsdatenmanagement-data.pdf
Forschungsdaten.info. (2023h). Open Data, Open Access und Nachnutzung. forschungsdaten.info. https://forschungsdaten.info/themen/finden-und-nachnutzen/open-data-open-access-und-nachnutzung/
Huber, E. (2019). Affektive Dimensionen von Forschungsdaten, ihrer Nachnutzung und Verwaltung. Berlin. SFB 1171 Working Paper 01/19. https://refubium.fu-berlin.de/bitstream/handle/fub188/24721/SFB1171_WP_01-19_Huber.pdf?sequence=1&isAllowed=y
Rat für Sozial- und Wirtschaftsdaten (RatSWD). (2023). Forschungsdatenmanagement in kleinen Forschungsprojekten – Eine Handreichung für die Praxis. RatSWD Output Series, 7. Berufungsperiode Nr. 3. https://doi.org/10.17620/02671.72
Citation
Heldt, C. & Röttger-Rössler, B. (2023). Reuse of Research Data. In Data Affairs. Data Management in Ethnographic Research. SFB 1171 and Center for Digital Systems, Freie Universität Berlin. https://en.data-affairs.affective-societies.de/article/data-reuse/