Methods: Archiving

Source: Archiving, Anne Voigt with CoCoMaterial, 2023, licensed under CC BY-SA 4.0.
Data Selection
Archiving research data rarely involves sharing complete material corpora. Instead, it requires a thoughtful approach to determine what can and should be shared (DGSKA, 2015).
A significant challenge in archiving ethnographic data is the sensitive nature of the content. Social and cultural anthropologists often collect data on sensitive topics, the archiving of which in public repositoriesA repository is a storage location for academic documents. In online repositories, publications are digitally stored, managed, and assigned persistent identifiers. Cataloging facilitates the search and use of publications and author information. In most cases, documents in online repositories are openly and freely accessible (Open Access). Read More could pose unpredictable (political) risks or consequences for the research participants. These data must be treated with the utmost care and in adherence to ethical research principles (see the article on Data Protection). Additionally, obtaining informed consent from participants for archiving identifiable personal dataPersonal data includes: 'any information relating to an identified or identifiable natural person (data subject); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier, or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural, or social identity of that natural person(…)” (EU GDPR Article 4 No. 1, 2016; BDSG §46 para. 1, 2018; BlnDSG §31, 2020). Read More is essential but can often prove extremely difficult in practice (see articles on Informed Consent and Anonymization).
Another challenge arises from the researcher’s own involvement with their material, which often includes elements and references to the researcher themselves. Archiving such data for potential reuseData reuse, often referred to as secondary use, involves re-examining previously collected and published research datasets with the aim of gaining new insights, potentially from a different or fresh perspective. Preparing research data for reuse requires significantly more effort in terms of anonymization, preparation, and documentation than simple archiving for storage purposes. Read More means researchers may inadvertently disclose aspects of their personality, experiences, and positionality in the field to unknown third parties. This could conflict with the protection of their own personal rights unless they undertake the labor-intensive task of removing all personal elements from the data. However, such efforts often result in significant decontextualization.
These considerations underscore the need for careful deliberation on whether and under what conditions data archiving and reuse by third parties can be permitted, ensuring the protection of both the participants and the researchers. The central question in this regard is: for which potential users and purposes should researchers prepare which parts of their data, and in what form (Behrends et al., 2022, p. 17)?
For example, does the audience consist of:
- Academic audiences and professional peers in Germany, the country of research, or internationally, who are interested in the data from a scholarly perspective?
- State research agencies or local organizations, which may seek access to the data but could hold economic or political interests divergent from those of the researchers?
- Non-academic audiences interested in the research topic or whose interest researchers hope to cultivate, aiming to make the research accessible and visible beyond academic discourse, in line with the principles of Open Science'Open Science encompasses strategies and practices aimed at making all components of the scientific process openly accessible and reusable on the internet. This approach is intended to open up new possibilities for science, society, and industry in handling scientific knowledge” (AG Open Science, 2014, translation by Saskia Köbschall). Read More?
The decision regarding which data and materials can be made accessible, and under what conditions, should always be left to the researchers themselves, especially since they are personally involved in their respective fields and embedded in social relationships. The question of sharing data is therefore often linked to affective ties and loyalties, making it far from easy to resolve.
„Imagine, for example, an anthropologist, who investigates land right conflicts and therefore talks to the plantation owners – some of whom use illegal methods to expand their lands – and to local nongovernmental organizations (NGOs) that try to fight them, as well as to the ancestral owners of the land who have been cultivating it for generations, but who, in turn, also distrust the NGOs. Probably the anthropologist will be little inclined to share his or her information with the plantation owners, but possibly with the NGOs whose political commitment and mission he or she supports. The decision to share data with the NGO activists might on the other hand violate the trust of the ancestral land users, with whom the anthropologist feels particularly connected. But besides this, can anthropologists ever be sure to see through the motivations of their various interaction partners?“
(Rizzolli & Röttger-Rössler, 2024, p. 286)
This quote illustrates that the demand to share data with local research participants may sound straightforward but is often extremely difficult to implement and ethically challenging. The group of research participants is neither easy to define nor homogeneous.
Archives, Repositories, and Research Data Centers (RDCs)
Archives and repositoriesA repository is a storage location for academic documents. In online repositories, publications are digitally stored, managed, and assigned persistent identifiers. Cataloging facilitates the search and use of publications and author information. In most cases, documents in online repositories are openly and freely accessible (Open Access). Read More for qualitative data that account for the outlined aspects in their infrastructural organization – where trained personnel with experience in qualitative or ethnographic social research can provide specialized and discipline-specific consulting – remain scarce in Germany. An appropriate infrastructure for archiving and providing qualitative data is needed that ensures both the protection of data through secure, reliable and sustainable IT infrastructure and – as far as possible – access to the data material under controlled conditions and ideally in exchange with primary researchers (e.g. on-site use) (Imeri, 2017; Imeri et al., 2018; Eberhard, 2018; Eberhard, 2020).
A data service operating on this model and dedicated exclusively to qualitative research data is the Research Data Centre (RDC) QualiserviceThe Research Data Centre (RDC) Qualiservice provides qualitative social science data for scientific reuse. Accredited in 2019 by the German Council for Social and Economic Data (RatSWD), it adheres to their quality assurance criteria. In addition to data reuse, researchers have the option to share and organize their own research data, with the Qualiservice team offering advisory support. Qualiservice is committed to the DFG guidelines for ensuring good scientific practice and also adheres to the FAIR Guiding Principles for Scientific Data Management and Stewardship as well as the OECD Principles and Guidelines for Access to Research Data from Public FundingFor further informationen see also: https://www.qualiservice.org/en/. Read More at the University of Bremen. Qualiservice offers support and guidance to researchers during all phases of the research process. The center specializes in archiving qualitative and, particularly, ethnographic research materials with sensitive content, such as observation protocols, field notes, interviews, photographs, audio recordings, as well as audiovisual or internet-based data. It provides opportunities for archiving and enabling the scientific reuse of qualitative data. Researchers determine when and under what conditions data reuse is permissible1Qualiservice offers various options for this: examples include temporary embargoes (access restrictions for a set period) or the exclusion of specific uses, such as prohibiting the material’s use in teaching. For particularly sensitive data, it can be stipulated that access is allowed only on-site in Bremen..
Assigning Licenses and Persistent Identifiers
In research data centers such as Qualiservice, data providers can determine, via a license or corresponding data usage agreement, how access to their research data is granted and what usage rights are stipulated. The usage terms depend, among other factors, on whether personal and sensitive data have been processed and appropriate permissions have been obtained. Reuse, such as for research or teaching purposes via Qualiservice, must be explicitly authorized through informed consentInformed consent refers to the agreement of research participants to take part in a study based on the basis of comprehensive and understandable information. The design of an informed consent must address both ethical principles and data protection requirements. Read More from the affected individuals.
One of the most widely used licensing systems, especially for scientific articles, is the Creative Commons LicensesCreative Commons licenses are pre-formulated license agreements created by the non-profit organization Creative Commons, allowing copyright holders to grant public usage rights to their creative works. Once a work under a CC license is used by third parties according to the terms of the license, a contract is established (TUM, 2023, p. 5). Read More (Creative Commons, 2023b). The Creative Commons organization currently offers six pre-designed, standardized license agreementsIn a license agreement or through an open license, copyright holders specify how and under what conditions their copyrighted work may be used and/or exploited by third parties. Read More that copyright holdersThe German Copyright Act (UrhG) protects certain intellectual creations (works) and services. Works include literary works, photographic, film and musical works, as well as scientific or technical representations such as drawings, plans, maps, sketches, tables and plastic representations (§ 2 UrhG). The artistic, scientific achievements of persons or the investment made, on the other hand, are considered to be services worthy of protection (ancillary copyright).The author is entitled to publish and utilize the work. Read More can use to grant usage rights for their works. For scientific text publications, the CC-BY license has become the standard and is recommended by the German Research Foundation (DFG, 2014). Thanks to its modular, standardized structure, these licenses are easy to understand, even without legal expertise.
Research data and documentation intended for archiving should also be equipped with persistent identifiers (PIDs)A Persistent Identifier (PID) is a permanent digital code directly associated with a digital resource, such as a dataset, scholarly article, or other publication, making it permanently identifiable and findable. Unlike other serial identifiers (e.g., URLs), a Persistent Identifier refers to the object itself rather than its location on the internet. If the location of a digital object associated with a Persistent Identifier changes, the identifier remains the same; only the URL location in the identifier database needs to be updated or supplemented. This ensures that a dataset remains permanently findable, accessible, and citable (Forschungsdaten.info, 2023). Read More, which ensure the permanent findability and citability of the data. A PID is a durable reference to a digital resource, such as a dataset, that points to the object itself rather than its URL location. Even if the location of a digital object changes, the identifier remains the same (Forschungsdaten.info, 2023).
Currently, the most widely used persistent identifiers include the DOIDOI stands for Digital Object Identifier, a unique and permanent (persistent) identifier for digital objects, such as articles and contributions in scientific publications, as well as lecture publications and educational materials. A DOI must first be registered in the central database of the International DOI Foundation see: https://www.doi.org/. Read More and ORCIDAn example of a standardized identifier for uniquely identifying individuals is the ORCID (Open Researcher and Contributor ID). ORCID is an internationally recognized persistent identifier that allows researchers to be uniquely identified. The ID can be used by researchers for their scientific publications permanently and independently of any institution. It consists of 16 digits, grouped in four sets of four (e.g., 0000-0002-2792-2625). ORCID IDs are widely established in workflows at numerous publishers, universities, and research-related institutions and are often integrated into the peer-review process for journal articles An ORCID can be created free of charge at https://orcid.org/.. Read More:
- DOI (Digital Object Identifier): Refers to digital objects such as articles and contributions in scientific publications, as well as publications of lectures and educational materials.
- ORCID (Open Researcher and Contributor ID): Enables researchers and authors to be digitally referenced and uniquely identified.
Suitable File Formats for Archiving
The guidelines for good scientific practice stipulate a retention period of at least 10 years (GWP Guideline 17, 2022). In addition to ensuring the interpretability of the data through accompanying documentation and metadata (see article on Data Documentation and Metadata), it is important under the FAIR principlesThe FAIR Principles were first developed in 2016 by the FORCE11 community (The Future of Research Communication and e-Scholarship). FORCE11 is a community of researchers, librarians, archivists, publishers, and research funders aiming to bring about change in modern scientific communication through the effective use of information technology, thereby supporting enhanced knowledge creation and dissemination. The primary goal is the transparent and open presentation of scientific processes. Accordingly, data should be made findable, accessible, interoperable, and reusable (FAIR) online. The objective is to preserve data long-term and make it available for reuse by third parties in line with Open Science and Data Sharing principles. Precise definitions by FORCE11 can be found on their website see: https://force11.org/info/the-fair-data-principles/. Read More to also guarantee their usability through appropriate file formatsThe terms 'file type' and 'file format' are often used interchangeably. A distinction is made between proprietary and open file formats. Proprietary formats usually require fee-based software to access, as they may not be compatible with other programs (e.g., PowerPoint for .ppt files or Photoshop for .psd files). In contrast, open formats such as .rtf or .png are based on standards and can be opened by many programs. Read More. Formats that require proprietaryProprietary file formats are file formats that cannot be opened or read by third parties, or only with difficulty, because they are protected by license or patent. In most cases, special (paid) software is required for this (Wikipedia, 2023). Examples include the Word format.docx or the Adobe Photoshop format.psd. Read More, often paid software such as Microsoft Office, MaxQDA, or Photoshop for processing are less suitable for archiving.
If such software is used in a research project, the data should be converted into more suitable file formats for archiving. Ideally, files to be archived should be „unencrypted, uncompressed, patent-free, and created in an open, documented standard“ (Biernacka et al., 2021). Conversion can usually be carried out directly in the respective software during saving or using the export function.
Table: Recommended File Formats for Archiving (Biernacka et al., 2021)
File Types | Recommendation | Avoid |
Images | TIFF, JPEG2000, PNG | GIF, JPG |
Text | TXT, HTML, RTF, PDF/A, DOCX | DOC, PDF |
Tables | CSV, TSV, SPSS portable, XLSX | XLS, SPSS |
Multimedia | Container: MPEG4, MKV Codec: Theora, Dirac, FLAC | QuickTime, Flash |
Literature
Behrends, A.; Knecht, M.; Liebelt, C.; Pauli, J.; Rao, U.; Rizzolli, M.; Röttger-Rössler, B.; Stodulka, T. and Zenker, O. (Eds.) (2022). Zur Teilbarkeit ethnographischer Forschungsdaten. Oder: Wie viel Privatheit braucht ethnographische Forschung? Ein Gedankenaustausch. SFB 1171 ‚Affective Societies‘ Working Paper Nr. 01/22. http://dx.doi.org/10.17169/refubium-35157.2
Biernacka, K., Buchholz, P., Danker, S. A., Dolzycka, D., Engelhardt, C., Helbig, K., Jacob, J., Neumann, J., Odebrecht, C., Petersen, B., Slowig, B., Trautwein-Bruns, U., Wiljes, C. & Wuttke, U. (2021). Train-the-Trainer-Konzept zum Thema Forschungsdatenmanagement. (Version 4). Zenodo. https://doi.org/10.5281/zenodo.5773203
Creative Commons. (2023b). Licenses List. Creative Commons. https://creativecommons.org/licenses/?lang=de
Deutsche Forschungsgemeinschaft (DFG). (2014). Appell zur Nutzung offener Lizenzen in der Wissenschaft. Information für die Wissenschaft Nr. 6. https://www.dfg.de/de/aktuelles/neuigkeiten-themen/info-wissenschaft/2014/info-wissenschaft-14-68
Deutsche Forschungsgemeinschaft. (DFG, 2022). Leitlinien zur Sicherung guter wissenschaftlicher Praxis. Kodex. https://doi.org/10.5281/zenodo.6472827
Deutsche Gesellschaft für Sozial- und Kulturanthropologie. (DGSKA, 2015). Positionspapier zum Umgang mit ethnologischen Forschungsdaten. Forschungsdaten Info. https://forschungsdaten.info/nachrichten/nachricht-anzeige/positionspapier-zum-umgang-mit-ethnologischen-forschungsdaten/
Eberhard, I. & Kraus, W. (2018). Der Elefant im Raum. Ethnographisches Forschungsdatenmanagement als Herausforderung für Repositorien. Mitteilungen der Vereinigung Österreichischer Bibliothekarinnen und Bibliothekare, 71(1), 41–52. DOI: 10.31263/voebm.v71i1.2018
Eberhard, I. (2020). Der Kontext bestimmt alles: Kontextdaten und Containerobjekte als Lösungsmöglichkeit für den Umgang mit sozialwissenschaftlichen qualitativen Daten. Erfahrungen aus dem Pilotprojekt „Ethnographische Datenarchivierung“ an der Universitätsbibliothek Wien. ABI Technik, 40(2), 169-176. https://doi.org/10.1515/abitech-2020-2007
Forschungsdaten.info. (2023). Glossar. forschungsdaten.info. https://forschungsdaten.info/praxis-kompakt/glossar/
Imeri, S. (2017): Open Data? Zum Umgang mit Forschungsdaten in den ethnologischen Fächern. In J. Kratzke & V. Heuveline (Ed.): E-Science-Tage 2017: Forschungsdaten managen (167-178). heiBOOKS. https://doi.org/10.11588/heibooks.285.377
Imeri, S., Sterzer, W., Harbeck, M. (2018). Forschungsdatenmanagement in den ethnologischen Fächern. Bericht aus dem Fachinformationsdienst Sozial- und Kulturanthropologie. Zeitschrift für Volkskunde, 114.1 (2018), 71–75.
Rizzolli, M. & Röttger-Rössler, B. (2024). Opening up ethnographic data. When the private becomes public. In Lünenborg, M. & Röttger-Rössler, B. (Eds). Affective Formation of Publics. Places, Networks, and Media (271-291). https://doi.org/10.4324/9781003365426-18