Left to their own devices, an army of AI characters didn’t just survive — they thrived. They developed in-game jobs, shared memes, voted on tax reforms and even spread a religion.
The experiment played out on the open-world gaming platform Minecraft, where up to 1000 software agents at a time used large language models (LLMs) to interact with one another. Given just a nudge through text prompting, they developed a remarkable range of personality traits, preferences and specialist roles, with no further inputs from their human creators.
The work, from AI startup Altera, is part of a broader field that wants to use simulated agents to model how human groups would react to new economic policies or other interventions.
But for Altera’s founder, Robert Yang, who quit his position as an assistant professor in computational neuroscience at MIT to start the company, this demo is just the beginning. He sees it as an early step towards large-scale “AI civilizations” that can coexist and work alongside us in digital spaces. “The true power of AI will be unlocked when we have actually truly autonomous agents that can collaborate at scale,” says Yang.
Yang was inspired by Stanford University researcher Joon Sung Park, who in 2023 found that surprisingly humanlike behaviors arose when a group of 25 autonomous AI agents was let loose to interact in a basic digital world.
“Once his paper was out, we started to work on it the next week,” says Yang. “I quit MIT six months after that.”
Yang wanted to take the idea to its extreme. “We wanted to push the limit of what agents can do in groups autonomously.”
Altera quickly raised more than $11m in funding from investors including A16Z and the former Google CEO Eric Schmidt’s emerging tech VC firm. Earlier this year Altera released its first demo: an AI-controlled character in Minecraft that plays alongside you.
Altera’s new experiment, Project Sid, uses simulated AI agents equipped with “brains” made up of multiple modules. Some modules are powered by LLMs and designed to specialize in certain tasks, such as reacting to other agents, speaking, or planning the agent’s next move.
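To make the idea of a modular “brain” concrete, here is a minimal sketch in Python. It is not Altera’s code: the `AgentBrain` class, the module names (`react`, `plan`, `speak`) and the shared-state dictionary are all assumptions made for illustration, and plain functions stand in for the LLM-powered modules the article describes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical module type: reads the agent's shared state, returns text.
Module = Callable[[Dict[str, str]], str]


@dataclass
class AgentBrain:
    """Illustrative agent whose behavior comes from several small modules."""
    name: str
    modules: Dict[str, Module] = field(default_factory=dict)
    state: Dict[str, str] = field(default_factory=dict)

    def step(self, observation: str) -> Dict[str, str]:
        """Run each module once, in insertion order, on the new observation."""
        self.state["observation"] = observation
        outputs: Dict[str, str] = {}
        for label, module in self.modules.items():
            outputs[label] = module(self.state)
            # Later modules can read earlier outputs through the shared state.
            self.state[label] = outputs[label]
        return outputs


# Stand-ins for LLM-backed modules (assumed behavior, not real model calls).
def react(state: Dict[str, str]) -> str:
    return f"noticed: {state['observation']}"


def plan(state: Dict[str, str]) -> str:
    return f"plan after '{state['react']}': gather wood"


def speak(state: Dict[str, str]) -> str:
    goal = state["plan"].split(": ")[-1]
    return f"I will {goal}."


brain = AgentBrain("chef", modules={"react": react, "plan": plan, "speak": speak})
outputs = brain.step("a villager waves")
print(outputs["speak"])
```

In a real system each stand-in function would presumably be replaced by a call to a language model with its own prompt; the point of the sketch is only that separate, specialized modules can share state and run in sequence.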
The team started small, testing groups of around 50 agents in Minecraft to observe their interactions. Over 12 in-game days (4 real-world hours) the agents began to exhibit some interesting emergent behavior. For example, some became very sociable and made many connections with other characters, while others appeared more introverted. The “likability” rating of each agent (measured by the agents themselves) changed over time as the interactions continued. The agents were able to track these social cues and react to them: in one case an AI chef tasked with distributing food to the hungry gave more to those who he felt valued him most.
More humanlike behaviors emerged in a series of 30-agent simulations. Despite all the agents starting with the same personality and same overall goal—to create an efficient village and protect the community against attacks from other in-game creatures—they spontaneously developed specialized roles within the community, without any prompting. They diversified into roles such as builder, defender, trader, and explorer. Once an agent had started to specialize, its in-game actions began to reflect its new role. For example, an artist spent more time picking flowers, farmers gathered seeds and guards built more fences.
“We were surprised to see that if you put [in] the right kind of brain, they can have really emergent behavior,” says Yang. “That’s what we expect humans to have, but don’t expect machines to have.”
Yang’s team also tested whether agents could follow community-wide rules. They introduced a world with basic tax laws and allowed agents to vote for changes to the in-game taxation system. Agents prompted to be pro or anti tax were able to influence the behavior of other agents around them, enough that they would then vote to reduce or raise tax depending on who they had interacted with.
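One way to picture how a few prompted agents could swing a vote is a simple opinion-averaging model. The sketch below is an assumption for illustration only: the article does not say how Altera’s agents update their views, and the `pull` parameter, the numeric opinion scale and the function names are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def apply_interactions(
    opinions: Dict[str, float],
    fixed: Set[str],
    interactions: Iterable[Tuple[str, str]],
    pull: float = 0.5,
) -> Dict[str, float]:
    """Move each listener's opinion part of the way toward the speaker's.

    Opinions run from -1.0 (cut tax) to +1.0 (raise tax). Agents in `fixed`
    are the prompted influencers and never change their stance.
    """
    updated = dict(opinions)
    for speaker, listener in interactions:
        if listener in fixed:
            continue
        updated[listener] += pull * (updated[speaker] - updated[listener])
    return updated


def tally(opinions: Dict[str, float]) -> Dict[str, int]:
    """Positive opinions vote to raise tax, negative to reduce, zero abstains."""
    votes = {"raise": 0, "reduce": 0, "abstain": 0}
    for value in opinions.values():
        if value > 0:
            votes["raise"] += 1
        elif value < 0:
            votes["reduce"] += 1
        else:
            votes["abstain"] += 1
    return votes


start = {"pro": 1.0, "anti": -1.0, "a": 0.0, "b": 0.0, "c": 0.0}
talks = [("pro", "a"), ("anti", "b"), ("anti", "c")]  # (speaker, listener)
after = apply_interactions(start, fixed={"pro", "anti"}, interactions=talks)
print(tally(after))
```

In this toy run the anti-tax agent happens to talk to two neutral villagers and the pro-tax agent to one, so the vote tips toward a tax cut, mirroring the article’s point that the outcome depended on who each agent had interacted with.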
The team then scaled up, pushing the number of agents in each simulation to the maximum the Minecraft server could handle without glitching, up to 1000 at once in some cases. In one of Altera’s 500-agent simulations, the researchers watched agents spontaneously come up with cultural memes (such as a fondness for pranking, or an interest in eco-related issues) and spread them among their fellow agents. The team also seeded a small group of agents to spread the parody religion Pastafarianism (the word of the Church of the Flying Spaghetti Monster) around the towns and rural areas that made up the in-game world, and watched as these Pastafarian priests converted many of the agents they interacted with. The converts went on to spread Pastafarianism to nearby towns in the game world.
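The priest-to-convert-to-neighbor pattern resembles a basic contagion process on a contact network. The following sketch shows that generic technique, not Altera’s simulation: the contact map, the `convert_prob` parameter and the agent names are invented for illustration.

```python
import random
from typing import Dict, Iterable, List, Set


def spread(
    contacts: Dict[str, List[str]],
    seeds: Iterable[str],
    rounds: int,
    convert_prob: float,
    rng: random.Random,
) -> Set[str]:
    """Simple contagion: each round, every convert tries to convert contacts.

    `contacts` maps an agent to the agents it talks to (assumed, hypothetical).
    A contact converts with probability `convert_prob` per attempt.
    """
    converted = set(seeds)
    for _ in range(rounds):
        newly_converted = set()
        for agent in sorted(converted):  # sorted for reproducible runs
            for other in contacts.get(agent, []):
                if other not in converted and rng.random() < convert_prob:
                    newly_converted.add(other)
        converted |= newly_converted
    return converted


contacts = {"priest": ["a", "b"], "a": ["c"], "b": [], "c": ["d"]}
result = spread(contacts, {"priest"}, rounds=2, convert_prob=1.0,
                rng=random.Random(0))
print(sorted(result))
```

With a conversion probability of 1.0 the belief reaches agents two steps away from the priest after two rounds; lowering `convert_prob` makes the spread slower and dependent on chance.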
The way the agents acted might seem eerily lifelike, but their behavior combines patterns learned by the LLMs from human-created data with Altera’s system, which translates those patterns into context-aware actions, like picking up a tool, or interacting with another agent. “The takeaway is that LLMs have a sophisticated enough model of human social dynamics [to] mirror these human behaviors,” says Altera co-founder Andrew Ahn.
In other words, the data makes them excellent mimics of human behavior, but they are in no way “alive”.
Yang, however, has grander ambitions. Altera plans to expand into Roblox next, but he hopes eventually to move beyond game worlds altogether. Ultimately, his goal is a world in which humans don’t just play alongside AI characters but also interact with them in their day-to-day lives. His dream is to create a vast number of “digital humans” who actually care for us and will work with us to help solve problems, as well as keep us entertained. “We want to build agents that can really love humans (like dogs love humans, for example),” he says.
This viewpoint—that AI could love us—is pretty controversial in the field, with many experts arguing it’s not possible to recreate emotions in machines using current techniques. AI veteran Julian Togelius, for example, who runs games testing company Modl.ai, says he likes Altera’s work, particularly because it lets us study human behavior in simulation.
But could these simulated agents ever learn to care for us, love us, or become self-aware? Togelius doesn’t think so. “There is no reason to believe a neural network running on a GPU somewhere experiences anything at all,” he says.
But maybe AI doesn’t have to love us for real to be useful.
“If the question is whether one of these simulated beings could appear to care, and do it so expertly that it would have the same value to someone as being cared for by a human, that is perhaps not impossible,” Togelius adds. “You could create a good-enough simulation of care to be useful. The question is whether the person being cared for would care that the carer has no experiences.”
In other words, so long as our AI characters appear to care for us in a convincing way, that might be all we really care about.
Update: This story has been updated with more detail on how Altera’s system combines LLMs with other modules.