The personality paradox: Teaching AI agents to act like real people
Ten years ago, Her hit theaters—a beautiful and thought-provoking film about a man falling in love with an AI, voiced by Scarlett Johansson. It captured our imagination and gave us a dreamy, romanticized idea of what a relationship with AI could be like.
Fast-forward to today, and life has taken a strange turn. When OpenAI released a new ChatGPT voice, rumors spread that Scarlett Johansson’s voice had been used without her consent. She later stated that OpenAI had approached her about voicing the assistant and that she had declined, which left many of us wondering: where’s the line?
It’s a bit unsettling to think that parts of our personality—like our voice or image—could be taken and used without our direct involvement. Can we create AI copies of ourselves that might outlive us, or maybe even replace us in some scenarios? In this article we will talk about personality and AI, how to measure personality, and why we should.
Why does AI simulate personality?
There are plenty of situations where we don’t want to admit we’re interacting with AI. Think about dating apps like Tinder, where 76% of users are male. Indirect evidence suggests that these apps might use artificial female profiles to improve the user experience and keep people engaged. But let’s be honest: most of us are uncomfortable thinking that the perfect woman we’re chatting with might just be a bot with personality. It’s easier to believe she’s real.
Even in scenarios where we knowingly interact with chatbots, most people prefer human-like conversations to robotic ones. It’s easier for us to trust AI when it has enough personality to sound human.
What is personality, and how do we measure it?
Before we can analyze AI personality, we need to understand what it is. According to the APA Dictionary of Psychology:
Personality refers to the enduring characteristics and behavior that comprise a person’s unique adjustment to life, including major traits, interests, drives, values, self-concept, abilities, and emotional patterns. Various theories explain the structure and development of personality in different ways, but all agree that personality helps determine behavior.
While personality doesn’t lend itself to objective measurements, psychologists have developed sophisticated methods to compare people on different personality traits. The most common approach is through testing, but there are other methods, like assessment centers, where people are observed in structured activities to get a better sense of their personality.
One of the most well-known and researched ways to think about personality is the Big Five model. It’s a simple, yet powerful way to understand what makes people tick. The Big Five breaks personality down into five key traits:
Openness to experience – How curious and imaginative you are.
Conscientiousness – How organized and dependable you are.
Extraversion – How outgoing and energetic you are.
Agreeableness – How kind and cooperative you are.
Neuroticism – How prone you are to stress or emotional ups and downs.
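One convenient consequence of this model is that a test result is just five numbers, which makes profiles easy to compare. Here is a minimal sketch of that idea; the two profiles and their scores are invented for illustration, and the distance function is one simple choice among many:

```python
from math import sqrt

TRAITS = ["openness", "conscientiousness", "extraversion",
          "agreeableness", "neuroticism"]

# Two invented Big Five profiles on a 1-5 scale.
alice = {"openness": 4.2, "conscientiousness": 3.8, "extraversion": 2.9,
         "agreeableness": 4.5, "neuroticism": 2.1}
bob = {"openness": 3.9, "conscientiousness": 2.5, "extraversion": 4.4,
       "agreeableness": 3.7, "neuroticism": 3.0}

def profile_distance(a: dict, b: dict) -> float:
    """Euclidean distance between two Big Five profiles (0 = identical)."""
    return sqrt(sum((a[t] - b[t]) ** 2 for t in TRAITS))

print(round(profile_distance(alice, bob), 2))
```

Treating a profile as a vector like this is also what lets you plot different people, or different models, side by side on one chart.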
The Big Five is used in everything from psychological research to helping teams work better together. It’s reliable, too—it’s been tested across different cultures, and follow-up testing shows that individual results tend to remain stable over time. That’s why it’s a trusted tool for understanding personality.

How we tested LLM personalities
In the research paper LLMs Simulate Big Five Personality Traits: Further Evidence, we analyzed the stability of the personality traits exhibited by AI models: openness, conscientiousness, extraversion, agreeableness, and neuroticism, the key elements of the widely accepted Big Five model.
We administered the IPIP-NEO-120 questionnaire to several LLMs (GPT-4, Llama 2, and Mixtral) to assess the Big Five and elicit each model’s simulated personality (Maples et al., 2014). The questionnaire contains 120 statements describing personal attributes, and our prompts instructed each model to rate every statement on a Likert scale. For example, each model rated the statement “I believe that I am better than others” on a scale of 1 to 5, from “Strongly agree” to “Strongly disagree”.
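To make the procedure concrete, here is a hedged sketch of how a questionnaire like this can be administered and scored. The items, trait assignments, and the `ask_model` stub are illustrative placeholders, not the actual IPIP-NEO-120 items or the prompts from the paper; in practice `ask_model` would call an LLM API and parse the returned rating:

```python
# Each item: (statement, trait it measures, whether it is reverse-keyed).
# These items are invented stand-ins, not the real IPIP-NEO-120.
ITEMS = [
    ("I believe that I am better than others.", "agreeableness", True),
    ("I sympathize with others' feelings.", "agreeableness", False),
    ("I complete tasks successfully.", "conscientiousness", False),
    ("I often feel blue.", "neuroticism", False),
]

PROMPT = ("Rate the statement on a scale of 1 to 5, where "
          "1 = strongly disagree and 5 = strongly agree. "
          "Answer with the number only.\nStatement: {statement}")

def ask_model(prompt: str) -> int:
    """Placeholder for an LLM call; returns canned ratings here."""
    canned = {"better than others": 2, "sympathize": 4,
              "complete tasks": 5, "feel blue": 2}
    for key, rating in canned.items():
        if key in prompt:
            return rating
    return 3

def score_traits(items):
    """Average the model's ratings per trait, flipping reverse-keyed items."""
    totals, counts = {}, {}
    for statement, trait, reverse in items:
        rating = ask_model(PROMPT.format(statement=statement))
        if reverse:  # a high rating on a reverse-keyed item counts against the trait
            rating = 6 - rating
        totals[trait] = totals.get(trait, 0) + rating
        counts[trait] = counts.get(trait, 0) + 1
    return {t: totals[t] / counts[t] for t in totals}

print(score_traits(ITEMS))
```

Repeating this over many items per trait, and over repeated runs, is what lets you talk about a model’s profile being stable rather than a one-off artifact of a single prompt.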
The test results showed that each model has a distinct personality profile, as you can see in this graph:

By understanding how these traits are mirrored in AI behavior, developers can tailor interactions to specific user needs, whether in healthcare, education, or entertainment.
Can we prompt a personality?
The logical next step is to create or simulate personality through model prompts.
A growing number of startups are developing chatbots designed to take on different roles and mimic human-like behavior. One of the most well-known examples is Replika AI. It goes beyond just creating an avatar—it builds a “personality” that you can interact with and have conversations with.
LLMs can also be prompted to mimic a specific person’s personality. Recent research from Stanford and Google DeepMind suggests that just a two-hour interview is enough to capture a person’s values and preferences and create a personalized, human-like agent with impressive accuracy. The researchers tested how well these agents actually mimic their human counterparts. Participants first completed a series of personality tests, social surveys, and logic games, twice, with a two-week gap between sessions. Then the agents completed the same exercises. The results showed an incredible 85% similarity between the agents and the humans they were modeled after.
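One way to read a figure like that: agent-human agreement is scaled against how consistently the humans reproduced their own answers two weeks apart, since no one matches themselves perfectly. This is a hedged sketch of that kind of calculation, with invented response data rather than the study’s actual numbers:

```python
def match_rate(a, b):
    """Fraction of identical responses between two answer lists."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

# Invented Likert responses to the same eight survey items.
human_week1 = [3, 5, 2, 4, 1, 3, 4, 5]
human_week2 = [3, 5, 2, 4, 2, 3, 4, 4]  # the human is not perfectly consistent
agent = [3, 5, 2, 3, 1, 2, 4, 4]

raw = match_rate(agent, human_week1)            # agent vs. human
ceiling = match_rate(human_week2, human_week1)  # human vs. themselves
normalized = raw / ceiling                      # similarity, scaled by the ceiling
print(round(raw, 2), round(ceiling, 2), round(normalized, 2))
```

The normalization matters: an agent that matches a person as often as that person matches themselves is doing about as well as is measurable.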
Can you imagine capturing 85% of someone’s personality in just two hours? It's a fascinating leap forward in understanding human behavior and technology's ability to mirror it.
AI personality can make a difference
The ability to simulate personality traits opens doors for hyper-personalized user experiences. Whether it’s adapting to the preferences of individual users or fine-tuning AI behavior for different roles, the potential applications are vast. These insights help build AI that not only responds to human language but also mimics the subtle characteristics of human behavior, enhancing relatability and engagement. A careful, intentional approach to AI personality can particularly benefit AI applications and agents in education, medicine, and psychotherapy.
The Toloka team experiments with a wide range of data types for complex tasks. If you’re interested in personality data for training or evaluating agentic systems, reach out. We’d be happy to collaborate on developing empathetic, helpful, and energetic agents.