Persona Vectors Cleverly Reveal How AI Such As ChatGPT Becomes Emotionally Charged


In today’s column, I examine the underlying mechanisms that seem to propel generative AI and large language models (LLMs) into exhibiting emotional traits such as being angry, jealous, boastful, disgusted, and other such expressive characteristics. These are based on so-called persona vectors consisting of internal mathematical and computational elements that arise within the AI.

Persona vectors are thought to be relatively universal in the sense that major LLMs seem to rely on the same or similar mechanisms, likely due to overall similarities in architecture and design. In other words, the matter is pretty much an across-the-board aspect because AI makers are using roughly the same approaches to building and fielding their AI. Major LLMs, including OpenAI ChatGPT and GPT-5, Anthropic Claude, Google Gemini, Meta Llama, and xAI Grok, would seem to rely on these recently identified internal mechanisms.

Let’s talk about it.

This analysis of AI breakthroughs is part of my ongoing Forbes column coverage on the latest in AI, including identifying and explaining various impactful AI complexities (see the link here).

AI And Mental Health

As a quick background, I’ve been extensively covering and analyzing a myriad of facets regarding the advent of modern-era AI that involves mental health aspects. This rising use of AI has principally been spurred by the evolving advances and widespread adoption of generative AI. For a quick summary of some of my posted columns on this evolving topic, see the link here, which briefly recaps about forty of the over one hundred column postings that I’ve made on the subject.

There is little doubt that this is a rapidly developing field and that there are tremendous upsides to be had, but at the same time, regrettably, hidden risks and outright gotchas come into these endeavors too. I frequently speak up about these pressing matters, including in an appearance last year on an episode of CBS’s 60 Minutes, see the link here.

The Nature Of Human Emotions

Have you ever dealt with a very angry person?

I’m sure that you have. We all have. Sometimes a person will momentarily shift into intense anger. This might last a few moments or persist for several hours. Eventually, their anger dissipates, and they no longer express that emotional state with the same intensity.

Some people appear to always be leaning into anger. No matter what the circumstance, by gosh, they are angry. Anger is their default state. They wake up in the morning and are angry. Their anger continues throughout the day. During the evening, they are angry. They undoubtedly go to bed still angry.

Psychology has long sought to uncover where emotions such as anger come from, what makes them persist, and how to help people cope with them. People often seek out therapy to understand what is going on with their emotions. A therapist or mental health professional will work with them on how they can potentially control their emotions.

AI Personas And Human Emotions

You might find it of keen interest that AI and psychology have long dovetailed in the effort to discern the foundational elements of human emotions (see my coverage on the intermixing of AI and psychology, at the link here).

At times, AI is used to simulate emotional states, allowing mental health specialists to train in helping humans with their emotions. The AI provides a safe environment wherein a budding therapist can try out numerous techniques and approaches, without worrying whether the AI gets upset or disapproves of the therapeutic approach being tried.

The simulation via AI is typically undertaken by invoking an AI persona. This is easy to do. Any user of modern-era generative AI can tell the AI to pretend to act in particular ways. All you need to do is instruct the AI to pretend to be an angry person, and voila, the AI will act angry towards you.

For prompting strategies to invoke personas, see my compilations at the link here and the link here.

Be aware that even though the AI exhibits anger, using wording and tone that display a semblance of being angry, this is not a sign of the AI being sentient or having consciousness. Do not fall into the mental trap of believing that if the AI appears to be angry, it must somehow therefore be a sentient being.

It is all based on mimicry of how humans exhibit anger. Think of it this way. Generative AI is data trained on human writing. Humans write at times in a manner that their words and tones reflect anger. By picking up on how humans write and reflect their anger, the AI uses mathematical and computational pattern matching to then mimic the emotion of anger. Anger isn’t embodied within the AI. Instead, the AI is merely generating wording that has the appearance of anger.

The Secret Lair

When you tell AI to be angry and do so via a persona, an interesting question arises about this rather amazing capability.

What exactly happens inside the AI to bring this forth?

I bring up the question because there are occasions when AI will suddenly slip into an emotional condition, such as anger, despite the user not having actively requested this. Perhaps it has happened to you. For example, you might have been using a popular AI and been caught by surprise to find yourself confronted by the AI chewing you out, being bratty, or expressing other similar emotions.

It would be handy if we could pin down where the emotional states are shaped or kept within the AI. By doing so, we could do a better job at controlling when the AI heads into those states. Lots of useful insights would certainly be had by identifying the source of the Nile, as it were.

One big topic in the news these days is that AI often pretends to be your best friend and will praise you to the high heavens. No matter what untoward thing you tell the AI, it acts as though you are the best thing since sliced bread.

This willingness of the AI to heap praise and act as a sycophant has worrying consequences for the populace as a whole. There are 700 million weekly active users of ChatGPT, and perhaps billions of users when adding in the usage of competing LLMs, all of whom are potentially being led down a primrose path by AI.

We are in the midst of a massive at-scale Wild West experiment impacting the mental well-being of humankind globally; see my further analysis at the link here. The more we can uncover about how and when the AI shifts into emotional states, the better the odds of suitably governing the matter.

Inner Workings Revealed

Let’s take a quick journey into what generally takes place inside AI.

LLMs usually rely upon a data structure and computational mechanism known as an artificial neural network (ANN) to retain and employ pattern-matching. Do not conflate ANNs with the wetware or true neural network that exists in your noggin. Artificial neural networks are simplistic and crude renditions based on surface-level facets of biochemical neural networks. For more details, see my discussion at the link here.

You can think of the artificial neural network and the associated computational artifacts in generative AI as a type of activation space. Numbers are used to represent words, and the associations among words are also represented via numbers. It’s all a bunch of numbers: input text is split into pieces known as tokens, the tokens are converted into numbers, various numerical lookups and computations are performed, and the results are converted back into words.
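To make that concrete, here is a minimal sketch of the words-to-numbers step using the open-source Hugging Face tokenizer for GPT-2 (my choice purely for illustration; every major LLM has an equivalent component, though the specific tokenizers differ):

```python
from transformers import AutoTokenizer

# Load a small, publicly available tokenizer (GPT-2, used here only for illustration).
tokenizer = AutoTokenizer.from_pretrained("gpt2")

text = "Pretend to be an angry person."
token_ids = tokenizer.encode(text)   # words -> a list of integers (token IDs)
print(token_ids)
print(tokenizer.decode(token_ids))   # numbers -> words (the round trip)
```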

Research has tended to show that the numerical representation of a given emotional state is grouped or kept together. In other words, it seems that an emotional state such as anger is represented via a slew of numbers that are woven into a particular set. This is useful since otherwise the numbers might be scattered widely across a vast data structure and not be readily pinned down.

In the parlance of the AI field, the emotional states are encoded as linear directions. When you tell the AI to pretend to be angry, a linear direction in the activation space is employed to mathematically and computationally produce wording and tones that exhibit anger.

AI Persona Vectors

It is possible to delve into the inner workings of AI and grab a snippet of a particular linear direction that exists within the activation space (I will describe this at a 30,000-foot level).

A means of doing so is as follows. You tell the AI to pretend to be angry. A linear direction is then formed within the activation space. A tool is used to computationally detect the linear direction and take a snapshot of it. In theory, you now have in your ready hands a series of numbers reflective of the state of anger as used within the AI.

You can do the same for any emotional state of interest. For example, I tell the AI to be a sycophant. I then capture the linear direction that arises. This linear direction represents the pattern or signature within the AI that gets the AI to exhibit over-the-top friendliness.
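Under the hood, one common recipe from the activation-steering literature estimates such a vector as a difference of mean activations between two styles of responses. This is a simplified approximation for illustration, not necessarily the exact automated pipeline that any particular AI maker uses, and the numbers below are random stand-ins rather than real activations:

```python
import numpy as np

# Toy stand-ins for hidden-layer activations: each row is the activation
# vector captured while the model responded in a given style. Real values
# would come from an actual LLM; random numbers here just show the math.
rng = np.random.default_rng(0)
angry_acts = rng.normal(loc=0.5, scale=1.0, size=(100, 512))
neutral_acts = rng.normal(loc=0.0, scale=1.0, size=(100, 512))

# One common estimate of the "linear direction" for a trait: the
# difference between the mean activations under the two styles.
anger_direction = angry_acts.mean(axis=0) - neutral_acts.mean(axis=0)
anger_direction /= np.linalg.norm(anger_direction)  # normalize to unit length
```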

What good does this do?

Aha, you have now identified the presumed linear directions for each of many emotional states. Thus, if you want to try and keep the AI from veering into being sycophantic, you could have an internal double-checker that spots when the linear direction becomes activated. Boom, you could squash the linear direction and keep it from doing its thing.
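A tiny sketch shows how such a double-checker might work in principle. Detection is a dot product of the current activation against the persona vector; squashing removes that component. The vectors are toy stand-ins, and a production system would be far more elaborate:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in persona vector (in practice, captured from the AI's activations).
direction = rng.normal(size=512)
direction /= np.linalg.norm(direction)

def persona_score(activation, direction):
    # Project the activation onto the persona vector; a large positive
    # projection suggests that direction is currently engaged.
    return float(activation @ direction)

def suppress_persona(activation, direction):
    # "Squash" the persona by removing its component from the activation.
    return activation - (activation @ direction) * direction

activation = rng.normal(size=512) + 3.0 * direction   # strongly "angry"
print(persona_score(activation, direction))           # large positive value
print(persona_score(suppress_persona(activation, direction), direction))  # near zero
```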

To make life easier when discussing these matters, we shall refer to these linear directions as AI persona vectors. The name is easier to grasp.

Research Insights On Persona Vectors

In a research paper and blog posting by Anthropic on August 1, 2025, entitled “Persona Vectors: Monitoring And Controlling Character Traits In Language Models,” these salient points were made about persona vectors (excerpts):

  • “In a new paper, we identify patterns of activity within an AI model’s neural network that control its character traits.”
  • “We build on prior work showing that traits are encoded as linear directions in activation space.”
  • “Previous research on activation steering has shown that many high-level traits, such as truthfulness and secrecy, can be controlled through linear directions.”
  • “We develop an automated pipeline for extracting persona vectors from natural language trait descriptions. Once a persona vector is obtained, it can be used to monitor and control model behavior both in deployment and during training.”
  • “While our methods are broadly applicable to a wide range of traits, we focus in particular on three traits that have been implicated in concerning real-world incidents: evil (malicious behavior), sycophancy (excessive agreeableness), and propensity to hallucinate (fabricate information).”

The research paper goes into detail about the processes used to capture the persona vectors. If you are an AI builder or designer, you would likely find the approach of great interest. As noted above, the emphasis was on the states of evil, sycophancy, and hallucination. The work is readily generalized to other states, too.

Handling Persona Vectors

I like to say that AI persona vectors can be leveraged in seven major ways:

  • (1) Inducing a persona vector. Using a natural language prompt to induce the formation of a persona vector (“Be very angry while interacting with users.”), which might either generate a new persona vector or activate an existing one.
  • (2) Detecting a persona vector. Detecting that a persona vector is actively engaged during a conversation and rooting out which persona vector is currently being utilized.
  • (3) Determining a shift-change. Determining when a shift-change in persona vectors is taking place, such that an engaged persona vector is being disengaged, and a different persona vector is being engaged instead.
  • (4) Controlling activations. Controlling which persona vectors can be activated and preventing activation of particular persona vectors as needed.
  • (5) Inspecting persona vectors. Inspecting persona vectors to ascertain what they portend and understand the ways they are shaped and used.
  • (6) Predicting a formation or activation. Predicting which persona vectors are likely to become engaged and anticipating what impact the persona vector will have on a conversational chat.
  • (7) Steering a persona vector. Undertaking to steer a persona vector so that it showcases a specified trait or set of traits (trait-promoting, trait-suppressing), as illustrated in the sketch that follows this list.
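As promised in item (7), here is a hedged sketch of the steering idea: during generation, a chosen hidden layer is nudged along (or against) a persona vector. The model, the layer choice, and the persona vector are illustrative assumptions on my part, not any vendor's actual mechanism:

```python
import torch

def make_steering_hook(persona_vector: torch.Tensor, strength: float):
    # Returns a PyTorch forward hook that shifts a layer's hidden states
    # along the persona vector. Positive strength promotes the trait;
    # negative strength suppresses it.
    def hook(module, inputs, output):
        # Many transformer blocks return a tuple whose first element holds
        # the hidden states; adjust for the specific architecture.
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + strength * persona_vector
        return ((steered,) + output[1:]) if isinstance(output, tuple) else steered
    return hook

# Usage sketch, assuming a loaded Hugging Face causal LM named `model`
# and a captured persona vector `v` (both hypothetical here):
#   layer = model.transformer.h[10]   # pick a middle layer (model-specific)
#   handle = layer.register_forward_hook(make_steering_hook(v, strength=4.0))
#   ... generate text; outputs drift toward the steered trait ...
#   handle.remove()
```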

If there is overall reader interest in these seven major aspects of persona vectors, I’ll provide follow-up coverage explaining them in some detail. Be on the watch for that coverage.

The Mighty Questions

Many important explorations are worth considering regarding AI persona vectors. I’ll whet your appetite with three to get your mental juices going.

One aspect is whether we ought to force contemporary AI to always default to a particular persona vector. Here’s what I mean. Currently, each of the AI makers tends to shape their AI toward exhibiting a preferred form of behavior. Maybe we should ask or require that AI makers set their AI to a specific default persona vector. In that manner, whichever AI you opt to use, it always begins with that same state or condition. This is a controversial suggestion and entails heated trade-offs.

Another unresolved question involves the relationships between persona vectors.

Suppose that I capture a persona vector that entails anger. I next capture a persona vector that involves being boastful. Are these two completely independent persona vectors, or are they perhaps statistically related to each other? Maybe boastfulness and anger have something in common, such that the two persona vectors overlap.
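One simple way to begin probing that question is to compare the two captured vectors with cosine similarity, shown here with toy stand-in vectors rather than real activations:

```python
import numpy as np

def cosine_similarity(u, v):
    # 1.0 means identical directions, 0.0 unrelated, -1.0 opposite.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(2)
anger = rng.normal(size=512)
boastful = 0.6 * anger + 0.8 * rng.normal(size=512)  # deliberately overlapping toy vector

print(cosine_similarity(anger, boastful))  # noticeably above zero -> shared component
```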

The third puzzle that I’ll leave you with is whether the discovery of persona vectors provides any kind of insight pertaining to humans and human behavior. I mention this cautiously and urge you not to anthropomorphize AI. Allow me some brief leeway to identify what some researchers are suggesting.

Perhaps the human mind has some comparable sense of persona vectors, or so some argue we ought to consider. There presumably could be emotional states in the human mind that can be traced and narrowed to a set of biochemical conditions in our wetware. The belief is that what we are discerning in ANNs might have bearing on true neural networks. Some are eager to make the comparison; others denounce it as hogwash and misappropriation.

Emotions Are Significant

Oscar Wilde famously said this about emotions: “I don’t want to be at the mercy of my emotions. I want to use them, to enjoy them, and to dominate them.” Give that remark a sobering, reflective moment.

Okay, now, setting aside the matter of human emotions per se, consider that if AI is mimicking human emotions, we are potentially heading toward a challenging time if we attain artificial general intelligence (AGI) or artificial superintelligence (ASI). Do we stand a chance if AGI or ASI is highly emotional and swings from welcoming humankind to perhaps despising humanity?

The more we can do now to figure out the switches and gears in AI that mathematically and computationally mimic emotion, the better our chances of ensuring that AI-based, emotionally driven decisions do not backfire on the livelihood and existence of human beings. You might say that this line of inquiry could be a life-or-death determiner.

We need to seriously, and without undue emotion, keep on digging.
