Evgeniya Sukhodolskaya
Five questions about bias in AI you’ve probably wanted to ask
Large language models (LLMs) have generated huge hype in the AI field. GPT-3, ChatGPT, Bing, Google’s Bard – you name it. Twitter is buzzing about how racist and sexist they can be, while AI engineers say they’re supplemental tools that need human curation.
Biases in AI systems have been there since the beginning of AI.
Bias has always been prevalent in AI. The human brain tries to simplify information consumption because it struggles to compare and fairly evaluate everything it takes in. The resulting errors are called cognitive biases, and they affect how technology is used. For example, information retrieval systems like search engines have always suffered from position bias: people perceive items at the top of a list as the most relevant.
Not all biases in AI systems are harmful. Categorization is one digital nudging mechanism; a typical example is the “healthy snacks” category that grocery delivery apps usually have. By leveraging the diversification bias, they pull users out of the filter bubble to explore new and healthier options.
But some biases are obviously dangerous. That’s why fairness and bias in AI have become a hot topic, supercharged by the recent boom of LLMs.
Many biases hide in the data used to train ML models. At their core, quite a few language models boil down to a tool that generates the words with the highest probability in the current context. The large corpora of “words in context” scraped from the web, which is inherently biased, simultaneously empower LLMs and cause problems for them.
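To make the “highest-probability word in context” idea concrete, here is a minimal sketch of greedy next-token prediction. It assumes the Hugging Face transformers library and the small GPT-2 model; both the model and the prompt are illustrative choices, not the specific systems discussed above.

```python
# A minimal sketch of next-token prediction: the model assigns a probability
# to every token in its vocabulary, and we look at the most likely candidates.
# GPT-2 and the prompt are used purely as an illustration.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "The doctor said that"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch, seq_len, vocab_size)

# Probability distribution over the token that would follow the prompt.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)

for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id))!r:>12}  p={prob.item():.3f}")
```

Whichever continuations score highest here are a direct function of the web text the model was trained on, and that is exactly where bias sneaks in.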
Image from @spiantado on https://twitter.com
The problem is too big to ignore, so biases are being identified, classified, and mitigated. There are even guidelines: Google’s People + AI Guidebook, for example, has a well-formulated, concise chapter called Data Collection + Evaluation with recommendations for minimizing bias in data. Still, many questions remain unanswered and many problems unsolved.
As a data advocate, I’m lucky enough to have access to experts at industrial conferences and meetups. At the Data-Driven AI Berlin meetup on biases in AI, I asked software engineering and data science leads on a discussion panel the questions that bug me the most.
These people in tech are creating AI startups and designing ML pipelines. That puts them in direct contact with biases in AI systems, so their answers are a rough approximation of the actual situation in the industry.
Question 1
A majority of biases come from historical data, old beliefs, and outdated behavioral patterns. Obviously, historical data should not be ignored or erased; ML models learn from our history the same way we do. Should we explicitly fix historical biases in AI or wait for history to become history?
First, we should decide what to optimize for in each use case by clearly defining equality of opportunity, odds, and outcomes. The more we aim to change the status quo, the more stringent we need to be with the definitions we use. At a minimum, we should measure the problems caused by historical biases and be fully transparent in sharing this information.
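To show what “clearly defining” and measuring these criteria can mean in practice, here is a minimal sketch that computes two common fairness gaps for a binary classifier: demographic parity (equal positive-prediction rates across groups) and equal opportunity (equal true-positive rates across groups). The groups, labels, and predictions are hypothetical.

```python
# A sketch of measuring two common fairness criteria for a binary classifier.
# The group membership, ground truth, and predictions are made up for illustration.
import numpy as np

group = np.array(["A", "A", "A", "B", "B", "B", "B", "A"])
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 1, 1, 0, 1])

def positive_rate(pred, mask):
    """Share of positive predictions within a group."""
    return pred[mask].mean()

def true_positive_rate(true, pred, mask):
    """Share of actual positives the model caught within a group."""
    positives = mask & (true == 1)
    return pred[positives].mean()

mask_a, mask_b = group == "A", group == "B"

dp_gap = abs(positive_rate(y_pred, mask_a) - positive_rate(y_pred, mask_b))
eo_gap = abs(true_positive_rate(y_true, y_pred, mask_a)
             - true_positive_rate(y_true, y_pred, mask_b))

print(f"Demographic parity gap: {dp_gap:.2f}")
print(f"Equal opportunity gap:  {eo_gap:.2f}")
```

Which of these gaps matters, and how small it has to be, is exactly the kind of decision the panelists argued has to be made explicitly for each use case.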
Next, we should look at generalization versus specificity. Generalizing specific cases usually introduces new biases. For example, DALL-E fixed ethnicity and gender bias by forcefully inserting additional information into user prompts, which some users claimed led to incorrect results. Teaching users to be specific might eliminate many historical biases in AI.
Question 2
Do all biases come from humans?
Even dinosaurs would probably have exhibited position bias if someone had ranked meat for them. When humans produce biased data, it results in biased outputs for models.
As a side note, I don’t fully agree. All generalization creates or intensifies biases, so neural networks (simplifying, learning weights, generalizing input information to process unseen data) might be a separate bias driver. What do you think?
Question 3
Humanity seems to be stuck in an infinite loop of introducing and fixing biases. Should we try to automate everything? AutoML for the win, AI models training AI models, and an unbiased AI eventually.
Even if there were a cycle of automation like that, we would still have to kick it off and set the objective as its customers, which makes it likely that biases get built into the loop from the start. It’s not a perfect solution, especially with full AI automation decades away.
Mistakes are especially scary where human life is deeply impacted, like in medicine and jurisprudence. Given the current state of ethical AI, they can’t be fully automated, so we’re better off looking for manual solutions.
Question 4
Biases in humans and AI are a psychological and sociological phenomenon. Should tech specialists study those fields to better handle biased AI systems? Or can we focus on common sense and outsource the rest to experts?
A degree in ethics isn’t mandatory; having well-defined personal ethics is. Your ethics at work can’t be completely distinct from your personal ethics (like in the TV series “Severance”), and that applies to technical experts too.
However, AI system development is rarely just up to the tech crowd. Cross-functional teams handle product development, and one of their main focuses is conducting comprehensive user studies to detect biases and improve products. The ease with which users find biases in accessible models, like CLIP’s infamous bias toward text on an image, speaks for itself.
Question 5
If you as a tech specialist or team can’t find any more biases in your AI product, can you relax?
There will always be situations where you don’t have the imagination to understand someone else’s experience. But that doesn’t mean it doesn’t exist. Where human lives are affected, we have to set the bar as high as possible, remembering that fairness is key to marketplace sustainability as well.
The panel discussion underlined once again that bias should be a primary concern for the AI crowd, especially with AI-powered tools playing an ever-growing role in our lives.
AI engineers need to broadly apply automated tools for bias mitigation. For example, crowdsourcing tools make it easy to conduct Implicit Association Tests (IAT) on a global crowd.
As Toloka dev advocates, we’ve seen how crowdsourcing can be used to detect biases. The founder of metarank.ai, Roman Grebennikov, used our platform to hunt for position bias. He created a project emulating a movie recommendation system with a standard ranked-list interface, except that the list was shuffled relative to the initial most-to-least-relevant order. The labeling revealed a strong position bias that outweighed relevance. For more details about the experiment and Roman’s recommendations for mitigating position bias, read this article.
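As a rough sketch of how such an experiment can be analyzed, the snippet below computes pick rates per displayed position from labels collected on shuffled lists. Because the ordering is randomized, every slot should get roughly the same share of picks if labelers judged relevance alone; a steep drop toward lower positions signals position bias. The numbers here are made up and do not come from Roman’s experiment.

```python
# A sketch of detecting position bias from labeling on shuffled lists.
# Each record holds the position at which an item was displayed (after shuffling)
# and whether the labeler picked it.
from collections import defaultdict

# Hypothetical labeling results: (displayed_position, was_picked)
labels = [
    (1, 1), (1, 1), (1, 0), (1, 1),
    (2, 1), (2, 0), (2, 0), (2, 0),
    (3, 0), (3, 0), (3, 1), (3, 0),
]

picks = defaultdict(list)
for position, picked in labels:
    picks[position].append(picked)

for position in sorted(picks):
    rate = sum(picks[position]) / len(picks[position])
    print(f"position {position}: pick rate {rate:.2f}")

# A steep drop from position 1 downward, despite the randomized ordering,
# points to position bias rather than genuine relevance differences.
```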
The next goal our machine learning, data advocate, and research teams are working toward is crowdsourced tools that detect and mitigate biases in LLMs. We’ll gather some intel and be back soon with insights.
Article written by:
Evgeniya Sukhodolskaya
Updated:
Apr 10, 2023