
Exercise 2

This exercise relates to the interview with Rainer Mühlhoff, where he discusses artificial intelligence. You can read it again here if needed.

Transcript of the Interview

Birgitt Röttger-Rössler: I am speaking with Rainer Mühlhoff, Professor of AI Ethics at the Institute for Cognitive Science at the University of Osnabrück. Rainer, I am very pleased that you are here today and have taken the time for our conversation.

Rainer Mühlhoff: Yes, thanks for the invitation. I am happy to be here as well.

Birgitt Röttger-Rössler: We want to talk about AI. AI has been a prominent topic in public discussions for several months due to the media hype surrounding ChatGPT. Yet AI has long been omnipresent in our society, influencing our lives and being fueled by our interactions with digital media. I would like to talk with you today about this omnipresence of AI. In your writings, you describe AI as a socio-technical system. Could you elaborate on that as a starting point?

Rainer Mühlhoff: Yes, gladly. Understanding AI as a socio-technical system primarily means looking at the societal embedding of AI technology that already exists today and has effects on all of us. This also involves a particular definition of AI, which I want to clarify. First, AI is often framed in public discourse, especially in newspaper debates, as a vision of the future – either in utopian or dystopian terms. The dystopian version warns that AI will eventually take over and dominate us, while the utopian vision suggests that AI will free us from work, taking care of everything. These are future-oriented discourses. The first clarification is: Nope! AI technology is already here, affecting the majority of people on this planet, and if we want to discuss AI ethically and politically, we should focus on its present implications rather than just its future ones.

The second clarification is that AI is often portrayed in mainstream discourse as embodied entities. The discussion tends to focus on robots or self-driving cars – material objects that interact with us, talk to us, or shake our hands. In this view, artificial intelligence is located inside these machines or entities. But in reality, most of today’s AI does not confront us physically. It consists of information processing in data networks and computing centers. Most AI today functions by analyzing the data we constantly generate through our use of digital media and services. It learns from this data and uses it to treat us in an individualized way – making suggestions, categorizing us into risk groups, recommending routes, and so on. These AI functions do not come from a tangible entity that interacts with us in the physical world but rather exist within invisible, intangible data networks.

The third point, which brings us directly to the socio-technical nature of AI, is that the most dominant and relevant AI technologies today – both in terms of their impact and financial investment – are those that exploit our data. These technologies utilize the data we have been accumulating since the 2000s, when smartphones and internet connections became widespread, and large corporations began collecting it. The current hype around machine-learning-based AI is no coincidence. Machine learning is all about learning from data. The ideas behind this technology existed in the 20th century, but back then, there was not enough data to train effective machine-learning models.

The reason we are now experiencing a surge in machine-learning-based AI technology is that we finally have the necessary data. And why do we have all this data? Because, around the turn of the millennium, networked media technology became widely adopted. The internet was opened up in the 1990s, and smartphones became available in the 2000s. These two major technologies have enabled the continuous aggregation of data from all areas of life and from nearly all people on the planet. Only through these massive aggregated datasets have powerful and effective machine-learning systems become possible.

All of us play a role in enabling this technology. Every day, as we continue using our devices and generating data, we contribute to these systems. That is why AI must be analyzed as a socially and media-culturally embedded phenomenon.

Birgitt Röttger-Rössler: In your writings, especially in your text on the „Power of Data,“ you emphasize that a power-analytical perspective on AI is necessary and that this is precisely where the particular challenges for AI ethics lie. Could you elaborate on that a bit?

Rainer Mühlhoff: Yes, I think the question of what AI ethics should look like is really exciting. It’s quite a contested field at the moment, still very young. There are now professorships in this area, but there is no established canon, no dominant paradigm. I would say that much of AI ethics today operates within the paradigm of applied ethics. This means that ethical considerations are made in relation to specific application domains. For example, people examine questions like how a robotic caregiver should be designed to ensure it treats people with dignity. These are important ethical questions, approached from a domain-specific perspective.

What I do, however, is an ethics that is actually, to be honest, a critical theory. That means I believe that AI ethics cannot be understood without a concept of power. Or rather, AI ethics must be approached from a power perspective because the most significant structural effects of AI – those that restructure society on a broad scale – are precisely power effects.

I am particularly interested in questions like: What does AI do structurally to our society? What forms of exploitation, discrimination, or preferential treatment of certain groups does it facilitate? How does it disadvantage others? What are the global dynamics at play – what kinds of mechanisms of global exploitation are embedded in this technology? What kind of relationship does AI have with the Global South? These are the questions I explore in relation to AI ethics.

This approach to ethics is very closely tied to political questions and has little to do with what I call „checklist ethics.“ That’s a term I use negatively. A lot of AI ethics today is approached with a mindset of: „We just want a checklist so we can make our AI ethical. We want to be able to tick some boxes, and if all the boxes are checked, then we can sell the product!“ That’s not what I do, and I don’t find it meaningful. I believe that good ethics means people have to think for themselves and take personal responsibility. Ethics should not be reduced to fulfilling externally imposed imperatives.

For me, ethics is also a philosophical discipline that involves personal responsibility, character development, and, in a way, cultivating virtues. It’s not just about a superficial form of compliance, which in the extreme case can turn into „whitewashing.“ AI ethics is often accused of „ethics washing,“ where companies push for AI ethics precisely because they know it’s more favorable for them than regulation.

Ethics remains non-binding, which is why there are hundreds of industry-driven „ethics white papers“ for various AI products. The industry uses this to create the image that it is trying to make its AI technology and products ethical. But this is part of a broader discourse strategy to prevent hard regulations – real regulatory measures that would set actual limits on AI.

I believe that when doing AI ethics, especially in a university setting, and when teaching students who are not just philosophy students but also come from computer science or cognitive science backgrounds, my role is to make it clear that this kind of checklist ethics is not real ethics. Real ethics means critically examining what AI does to our society.

Birgitt Röttger-Rössler: Yes, thank you! That’s very exciting and interesting (…). Yes, especially pointing out that „ethics can serve as a kind of shield, a protective cover, behind which operations can continue relatively unrestrained.“ So, where are the power holders? Where do they sit? Maybe you can say something about that…

Rainer Mühlhoff: Yes, exactly. Thank you for coming back to that, because that was the actual question earlier on. I would say (…) I think it was necessary to clarify this first because now we can say that AI must be examined as a phenomenon of power. But the power of AI is complicated. It is not simply about certain institutions or people holding power. Instead, I would argue that we must examine at least two levels.

One level is participatory – AI as a socio-technical system, as a network of collaborative intelligence to which we all contribute by generating data every day through our digital devices. In this sense, we are all implicitly part of these power structures. That would be the first level.

The second level is accumulation. Of course, AI is associated with some form of power accumulation among major economic players. But I think it is crucial to analyze both of these levels in order to understand each one fully.

So let’s start briefly with the participatory level. I think it is essential to understand that AI would not exist without millions of users providing data every day. By generating data, we also contribute our cognitive capacities, in a way.

My favorite example is Facebook – back when the company itself was still named Facebook. In the 2010s, Facebook developed a facial recognition AI called „DeepFace,“ one of the first in the world.

What do you need to build facial recognition using machine learning? You have to train these systems – they learn from data, as I mentioned earlier. That means you need millions of labeled facial images – images with a face on them, along with information about who that person is.

Obtaining such data, especially in very large quantities, is difficult or, in the worst case, simply expensive for such companies. So, which companies have the best chances of achieving this? Those that can convince users to provide these datasets for free.

And that is exactly what Facebook did when it introduced the feature that allows users to label photos they upload. You can mark people’s faces when you upload a photo – now, all social networks have this feature.

This function was introduced specifically to „nudge“ people into providing labeled facial images. Of course, from the company’s internal perspective, this makes perfect sense. But to make it appealing as a social feature, it was framed differently.

The social product that emerged allowed users to label their photo albums, be notified when someone else uploaded a picture of them, and easily share the information that they were at a particular party in their timeline.

So, the entire system was designed as a form of social interaction. Uploading a photo and tagging yourself or others became a standard way to interact socially. Social media helped normalize this as a common way of engaging with others – which is evident when observing young people.

This entire setup was a kind of „nice hack,“ so to speak. A structure that extracts data from all of us was embedded into social reality. But at the same time, social reality itself was transformed in the process – which brings us to structural effects.

The key point here is that we all make this possible. We cannot simply claim to be passive participants or passive actors in this process.

That being said, I am not suggesting that individual users should be blamed or held responsible for tagging faces, which allowed Facebook to develop its facial recognition AI. That would be too much of a stretch.

A structural problem requires a structural solution. However, this structural configuration – the interaction of millions of actors simultaneously – enables, in this case, an AI system.

By tagging a face, we each contribute a small portion of our cognitive capacity. Recognizing faces has always been algorithmically very difficult, yet it is very easy for humans.

The solution to this difficult problem through AI was to create a global information network via social media and smartphones. This network allows AI to tap into the distributed cognitive resources of users worldwide.

All of these small contributions are aggregated, orchestrated by AI systems, and used to recognize faces – ultimately enabling the system to function effectively.
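
To make the mechanism described above concrete, here is a minimal, hedged sketch of the underlying principle: a supervised model can only learn to recognize identities because it is fed face images that someone has already labeled. This is not Facebook’s „DeepFace“ (a deep neural network trained on user-tagged photos); it is a small illustrative pipeline in Python using the publicly available Labeled Faces in the Wild dataset that ships with scikit-learn, and all parameter values are merely plausible defaults.

```python
# Illustrative sketch only -- not DeepFace. The point: supervised face
# recognition needs many *labeled* face images, exactly the kind of data
# that photo-tagging features produce.
from sklearn.datasets import fetch_lfw_people        # public labeled-face dataset
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.metrics import classification_report

# Each sample is a face image; each label is the tagged identity.
faces = fetch_lfw_people(min_faces_per_person=70, resize=0.4)
X, y = faces.data, faces.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Compress raw pixels into "eigenface" features, then fit a classifier.
pca = PCA(n_components=100, whiten=True, random_state=0).fit(X_train)
clf = SVC(kernel="rbf", class_weight="balanced", C=1000, gamma=0.005)
clf.fit(pca.transform(X_train), y_train)

# The model now predicts *who* appears in an unseen photo -- possible only
# because thousands of images arrived already labeled with identities.
print(classification_report(y_test, clf.predict(pca.transform(X_test)),
                            target_names=faces.target_names))
```

The sketch is about the dependency, not the accuracy: remove the labels that users contribute through tagging, and there is nothing for such a system to learn from.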

Birgitt Röttger-Rössler: Thank you! I think this example very clearly illustrates what you mean by socio-technical systems and also wonderfully demonstrates the entanglement – the feedback effect on society, on traditional communication modes, which are now being altered by these new possibilities. What I would still like to address in our conversation today is that I would like to integrate this discussion into the e-learning portal Data Affairs. While developing this portal – Data Management in Ethnographic Research – our team kept thinking and discussing: „Somehow, we need to include AI. We need to address this topic!“ Even though AI does not seem to be directly related to concrete data management in social science research, it still lurks in the background.

What we aim to do with the portal is to sensitize students, early-career researchers, and anyone interested in reflecting on what they do with their information, their so-called data. Not just thoughtlessly storing everything in the cloud – rather, taking data protection and data security seriously and reflecting on these issues. These are ethical questions, too. Does everything necessarily have to be stored in digital repositories? Is that really necessary?

That’s one issue. Another one that constantly occupies my thoughts is that the whole data management discussion operates under the so-called FAIR principles. This catchy slogan is actually quite ambiguous, right? It stands for Findable, Accessible, Interoperable, and Reusable. In other words: turn knowledge that you have acquired in highly complex contexts into small, processable, findable, and reusable units (…). Reduce it somehow. Wouldn’t it be important to take a critical perspective on this? Some people even refer to this as datafication, data extraction, or data mining. Many scholars in academia have raised critical concerns about this issue.

What would you say about this? What’s your opinion on this issue? This unease – because that’s really what it is, a kind of unease that my team and I feel.

Rainer Mühlhoff: Yes, I completely understand that, and from my research perspective, I can say quite clearly that I see significant risks here. What is labeled as FAIR can actually lead to enormous unfairness – to societal consequences and effects that have a lot to do with injustice and discrimination. I think we need to be aware that we are living in an era where large-scale data – anonymized mass data – is particularly valuable. This means that people, or the individuals represented in these research datasets, are promised anonymity. But this does not prevent the powerful exploitation of these datasets.

The greatest risk today is no longer about re-identifying individual people in these datasets – though that still happens and remains a significant issue. But even if we assume that perfect anonymization is achieved and individuals cannot be re-identified, these datasets still serve as valuable resources. Specifically, machine-learning systems can learn from them to differentiate between different types of people.

That is precisely where the interest lies – in automating the classification of individuals into different categories or social boxes in order to treat them differently. If you make research data collected in the humanities and social sciences publicly available, you must assume that the insurance industry, for example, will want access to these data. And they absolutely do want them. Or take a company developing AI systems to assist in hiring decisions. Such companies are particularly interested in subcultures, minorities, and their social behaviors because they want to be able to automatically recognize whether someone belongs to such a minority or to a way of life that is associated with a supposed risk. Opening up such data – even if anonymized – already exposes vulnerable groups to an even higher risk of discrimination.

This applies particularly to research focusing on such groups. But even if you are conducting research on, let’s say, the so-called majority society – those who believe they have nothing to hide or who think they do not belong to a minority – even this research carries risks for society as a whole, because you also need the data of the many, the supposedly „normal“ people, in order to differentiate and identify the „abnormal“ ones. So, even people who believe they have no individual risk in sharing their data ultimately contribute to enabling the discrimination of others.

Birgitt Röttger-Rössler: Yes, as you said, they are crucial for norm-setting!

Rainer Mühlhoff: Exactly! They provide the benchmark.

Birgitt Röttger-Rössler: Yes. So ultimately, benchmarks are always quantitatively calculated. What the majority does becomes the norm. Or what the majority can do, or what the majority experiences in terms of health (…). These are all majority perspectives that set the norms.

Rainer Mühlhoff: And this trend is strongly reinforced by this process. And I want to take it a step further! We completely lack awareness of the significant societal risk involved in the secondary use of research data. When data is collected – whether through human experiments or fieldwork – ethics committees are always involved in assessing the primary purpose of data collection. They evaluate: What do you want to research? Is this ethically justifiable? However, they never assess the potential secondary uses of this data.

Now, if the trend is that research data must be made available in publicly accessible repositories, that means we must account for the entire spectrum of possible and even yet-to-be-imagined future uses of that data – particularly by actors we have not foreseen. These may be actors with discriminatory, exploitative, or otherwise harmful intentions. All of this should actually be considered when deciding whether to collect such data in the first place. The limited focus on the primary purpose of data collection would then essentially be obsolete.

And that would be a fundamental shift – one that would completely overhaul our entire ethical evaluation system for such research. In my opinion, researchers in the field would then bear a new kind of responsibility.

In anthropological or ethnological research, gaining access to a field often means entering spaces where it is not necessarily a given that outsiders are allowed in. Being granted access comes with a responsibility: researchers are essentially entrusted by the field or the people they study, and they are expected to handle that trust responsibly.

This responsibility is inextricably linked to the researcher in the field. The moment they transfer this data to a repository and expose it to uncontrolled secondary use, they – at least in my view – violate that very responsibility and the trust placed in them.

Birgitt Röttger-Rössler: Exactly! That is an argument often made by social and cultural anthropologists. There is a general sense of unease about this. We also conducted focus group discussions with colleagues, and this unease was explicitly voiced there, with the same criteria and aspects that you just articulated. Now, the corresponding repositories that store social science and qualitative data often say: “Yes! We secure access! It is only available upon request!” They emphasize that the context is always provided along with the data to ensure that the material is not viewed in complete isolation or used in entirely different contexts.

But I think doubts are warranted. Who can guarantee that? And can these repositories truly ensure security with their restricted access systems and control mechanisms? Can they guarantee it? Once something is on the internet, it remains accessible in some way, and the data can be completely detached from its original context. These are the concerns I have, and what you just outlined supports them.

Rainer Mühlhoff: Exactly! And we don’t even have to think about hackers or people gaining unauthorized access to data online. There are no legal regulations limiting access to such data, for example, by ensuring that only certain types of research can access it. Data protection laws do nothing in cases where the data has been anonymized. This means that researchers have to rely solely on the promise of the repository’s operator that they will continue applying the same access criteria not just today, but also in ten or twenty years when deciding who gets access.

There is no legal framework for this – just a unilateral promise from the repository’s operator to you, or to the people whose data is stored there.

A good example – this is only about four weeks old – comes from the UK. The Biobank Project was launched there in the 2000s. In this project, 500,000 people voluntarily go to the doctor at regular intervals to be examined for specific health markers. This creates long-term time-series data for health-related research. The data covers cancer, behavioral patterns, substance abuse, lifestyle factors, and more.

When the Biobank Project was founded in the 2000s, participants were told that their data would only be used for medical research and specifically not shared with the insurance industry.

However, it has now been revealed that these data were indeed passed on to the insurance industry – anonymized. This means that the people in these datasets have no legal claim. Since the data is anonymized, no data protection laws were violated.

The point is: The insurance industry does not care about individual personal data – it is interested in large datasets of anonymized information. What they want, and what has now been proven, is to analyze the correlation between certain lifestyle patterns and, for example, the risk of developing cancer.

This means that when you apply for a new insurance policy, they analyze your lifestyle and use these patterns to make predictive risk assessments – whether you are likely to develop specific diseases – and then offer you a more expensive insurance plan accordingly.

These datasets were used precisely for that purpose because the leadership of the Biobank – which is a private company – has changed over the past 15 years, and with it, the criteria for data access have shifted.

And the same thing could happen to any research repository that is not subject to state regulation. And even if they are state-regulated, those regulations can change.

We also need to consider scenarios in which power is taken over by authoritarian or racist political regimes. We might think that collecting data on minorities and storing it indefinitely in repositories is not risky today. But we can all imagine political developments in which these same datasets could be used for purposes that we absolutely oppose now – and hopefully will still oppose in the future.

So, these are things we really need to think about.
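
As an illustrative aside to the Biobank example: the following sketch uses entirely synthetic, invented numbers (it has nothing to do with the real UK Biobank data) to show the structural point that anonymization protects the individuals in a training set while doing nothing to limit how the resulting model scores everyone else. Feature names and coefficients are hypothetical.

```python
# Synthetic illustration only -- invented data, not Biobank records.
# A model trained on anonymized lifestyle data can later score any new
# applicant, even though no one in the training data is identifiable.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5_000

# Anonymized training records: lifestyle features, no names or IDs.
smoking = rng.integers(0, 2, n)            # 1 = smoker
exercise_hours = rng.uniform(0, 10, n)     # weekly hours of exercise
alcohol_units = rng.uniform(0, 30, n)      # weekly alcohol units
X = np.column_stack([smoking, exercise_hours, alcohol_units])

# Invented "ground truth": illness risk loosely correlated with lifestyle.
logits = 1.5 * smoking - 0.3 * exercise_hours + 0.05 * alcohol_units - 0.5
y = rng.random(n) < 1 / (1 + np.exp(-logits))   # True = developed the illness

model = LogisticRegression().fit(X, y)

# A new insurance applicant (never part of the training data) is scored
# purely from their lifestyle profile -- the predictive use of mass data.
applicant = np.array([[1, 0.5, 20.0]])          # smoker, little exercise
risk = model.predict_proba(applicant)[0, 1]
print(f"Predicted illness risk for this applicant: {risk:.0%}")
```

No individual needs to be re-identified for this to matter: the model’s output alone is enough to price a policy differently.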

Birgitt Röttger-Rössler: Yes, I think these are all very, very important aspects that we have discussed here, and hopefully, they will give some people food for thought. To conclude our conversation, I would like to return to the aspect of sustainability. These enormous data centers that exist, that keep growing larger and larger to handle these massive amounts of data, have an insatiable appetite for energy. In my opinion, this issue is discussed far too little. Hmm, I’m not sure if you have anything to say about this!? You certainly have an opinion…

Rainer Mühlhoff: Yes, this is currently an important debate in AI, and particularly in AI ethics research. AI technology is incredibly resource-intensive. Running these models requires immense computing power, and computing power is always linked to energy consumption. Power plants have to be built for this. If you consider what it costs to train large language models, it amounts to several million euros. The majority of that cost is energy consumption, right? Essentially, it’s the electricity bill. We are dealing with an incredible scale here. People need to be aware of this.

And it’s not just about energy – it’s also about rare earth elements, which are extracted through mining operations that are often tied to economic exploitation, usually involving countries in the Global South. The extraction of these raw materials is inherently linked to AI technology – or digital technology in general, including the technology we are using right now for this recording.

Whenever this issue comes up, I always like to point out that sustainability also has a social dimension. That is something that is often overlooked: sustainability itself is often forgotten, and even within sustainability debates, people usually forget that it has a social component. Seven of the 17 UN Sustainable Development Goals relate to social issues – things like equality, non-discrimination, access to employment, and so on.

And AI technology, in particular, is a technology that is not necessarily socially sustainable. Our conversation just now is a perfect example of this. We talked about whether research data – especially data related to the study of social environments, minorities, or specific societal groups – should be stored indefinitely in publicly accessible or semi-restricted repositories. We don’t know what might happen to these data in the future, and it is very possible that they could be used for discriminatory purposes against these groups.

And that is a direct violation of social sustainability goals. This is a dimension that must absolutely be considered – especially when it comes to research data. In research data discussions, the main argument is often: “We need to preserve these data for future, currently unforeseen reuse scenarios.” This means we are addressing a future-oriented aspect, which could have severe negative sustainability consequences. And not just because AI computations consume energy, but also because the uses of these data – the purposes for which they are deployed – could be socially harmful. This is particularly concerning if these data fall into the wrong hands or if ethical frameworks or political regimes change, influencing whether these data should be used or not.

The BMBF (German Federal Ministry of Education and Research) is currently working on a national research data law. It’s still in the early conceptual phase. There is a preliminary policy paper, and one of the ideas being discussed is a “right to use” clause. This would give private companies access rights to research data from public research institutions. That is what’s being considered in this policy paper.

From my perspective, this is highly problematic. Not only because of the asymmetry – after all, private companies’ research data would not be made similarly accessible; there would be no reciprocal access right to their data. That is not part of the current discussion.

But public research institutions are precisely the places where research on and for societal minorities must take place. That is exactly why I believe these data are particularly vulnerable and deserve special protection from being exploited for economic or discriminatory purposes.

That is why I think it is very reasonable to bring in the social dimension of sustainability and the UN Sustainable Development Goals as arguments against indiscriminately storing all research data in repositories forever.

Birgitt Röttger-Rössler: Exactly! And that not storing certain data can actually be an aspect of sustainability.

Rainer Mühlhoff: Exactly!

Birgitt Röttger-Rössler: Rainer, thank you very much for this conversation!

Rainer Mühlhoff: Birgitt, it was a pleasure! Thank you as well!

Answer the following question:
What potential challenges and risks does Rainer Mühlhoff see in the current and future use of AI and machine learning in the research context?

  • That we do not yet know how the data collected by companies from the internet and social media (even when anonymized) will be used in the future.
  • That secure access to research data, guaranteed today by an archive, could become obsolete in the future due to changes in laws, terms and conditions, etc.
  • That social and cultural anthropological research is typically based on strong trust relationships between researchers and participants, and this trust may be compromised when research data is stored in a repository. This is particularly critical when the research involves minorities or politically sensitive topics.
  • That research data collected for a specific primary purpose could be misused in other contexts in the future, e.g., by businesses or authoritarian regimes for their own interests (see example: UK Biobank).
  • That what is considered „FAIR“ today could lead to discrimination in the future if anonymized data is used to classify people into a „normal“ majority and a „risky“ minority, potentially resulting in disadvantages for the latter.
  • That there is (still) no established ethics framework for AI, and that such a framework should be developed independently of the current demands from businesses for „simple checklists“.