The 25,000-Task Reality Check
Researchers at the frontier just demolished conventional wisdom about multi-agent systems. Dochkina et al. tested 8 different LLMs across 25,000 tasks, scaling from 4 to 256 agents under every coordination protocol imaginable—from rigid CrewAI-style hierarchies to complete anarchy [1].
The results expose what builders suspected: pre-assigned roles and rigid frameworks consistently underperform. Self-organizing teams with minimal scaffolding beat structured approaches by up to 14% on complex reasoning tasks.
The study tested everything from GPT-4o to Claude 3.5 and Llama-3.1, measuring performance across parallelizable tasks (research synthesis, data analysis) and sequential workflows (code generation, document creation). The pattern held across models and scales.
But here's the kicker: neither maximum control nor maximum chaos wins. The sweet spot lives in what researchers call "minimal scaffolding"—just enough structure for capable LLMs to self-organize, without the overhead of predetermined hierarchies.
The Endogeneity Paradox: Why Structure Kills Performance
The core finding challenges everything we thought we knew about AI coordination. Researchers discovered the "endogeneity paradox": neither maximal external control nor maximal agent autonomy produces optimal outcomes [1].
Think about it like Nordic work culture. The most productive teams aren't micromanaged hierarchies or complete free-for-alls. They're groups of capable people with clear goals and minimal bureaucracy. LLMs, it turns out, follow similar patterns.
Rigid frameworks fail because they prevent adaptation. When you pre-assign an "analyst" role to an agent, you lock it into that function even when the task demands different expertise. Self-organizing teams dynamically allocate roles based on actual capability and context.
The data is stark: self-organizing teams achieve 17-22% higher success rates on parallelizable tasks. But they underperform on strictly sequential work without light routing—confirming that context, not ideology, should drive architecture decisions.
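The difference between pre-assigned roles and dynamic allocation can be made concrete with a toy sketch. Everything here is hypothetical (the agent names, the skill scores, the greedy assignment rule are illustrative, not from the paper): each agent reports its fit for each open subtask, and the best-fit agent claims it, instead of being locked into an "analyst" label up front.

```python
# Hypothetical sketch of dynamic role allocation: agents claim subtasks
# by self-reported capability rather than pre-assigned roles.
# Skill scores and agent names are invented for illustration.

AGENT_SKILLS = {
    "agent_a": {"research": 0.9, "analysis": 0.4, "writing": 0.3},
    "agent_b": {"research": 0.3, "analysis": 0.8, "writing": 0.5},
    "agent_c": {"research": 0.2, "analysis": 0.3, "writing": 0.9},
}

def allocate(subtasks):
    """Greedy one-subtask-per-agent assignment by best self-reported fit."""
    assignment, free = {}, set(AGENT_SKILLS)
    for task in subtasks:
        best = max(free, key=lambda a: AGENT_SKILLS[a].get(task, 0.0))
        assignment[task] = best
        free.remove(best)
    return assignment

print(allocate(["research", "analysis", "writing"]))
# {'research': 'agent_a', 'analysis': 'agent_b', 'writing': 'agent_c'}
```

If the task mix changes (say, two analysis subtasks and no writing), the same agents re-sort themselves, which is exactly what a fixed "agent A researches, agent B analyzes" chart cannot do.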
Where Popular Frameworks Go Wrong
The study specifically benchmarked against popular frameworks like CrewAI and LangGraph. The results aren't pretty for the structured approach.
"Bag of agents" architectures spike error rates by 17x due to coordination overhead [6]. When every agent needs to check with every other agent, communication costs explode faster than capability scales. It's the distributed systems nightmare all over again.
Meanwhile, the "more agents equals better results" myth gets thoroughly debunked. Google and DeepMind scaling studies confirm that overhead dominates beyond 8-16 agents without emergent organization [3]. Most production workloads hit diminishing returns much earlier.
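The overhead argument is easy to see with a back-of-the-envelope count. In a fully connected "bag of agents," every agent can talk to every other agent, so communication channels grow quadratically while the agent count grows linearly; this is a standard combinatorial observation, not a figure from the study:

```python
def coordination_channels(n_agents: int) -> int:
    """Pairwise communication channels in a fully connected agent team."""
    return n_agents * (n_agents - 1) // 2

# Channels explode far faster than agent count.
for n in (4, 8, 16, 64, 256):
    print(f"{n:>3} agents -> {coordination_channels(n):>5} channels")
```

At 8 agents there are 28 channels; at 256 agents, 32,640. Any fixed per-channel cost (latency, tokens, error probability) dominates long before capability gains catch up, which is why organization has to emerge well before the 8-16 agent range.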
The practical lesson for builders: start with single-agent sequential (SAS) for most tasks. Only scale to multi-agent when you have genuine parallelism and the coordination benefits outweigh the overhead costs.
This mirrors what we see in software teams. Adding developers to a late project makes it later, but the right team structure can unlock genuine parallel work. The same principles apply to AI agents.
The Builder's Playbook: When and How to Self-Organize
Based on the research and our own production experience, here's the practical framework:

Start Simple: Single-agent systems handle 80% of business tasks effectively. Don't reach for multi-agent until you've hit clear single-agent limits.
Identify True Parallelism: Self-organizing teams excel when tasks can genuinely run in parallel—research synthesis, data analysis across multiple sources, content generation for different audiences. They struggle with inherently sequential work like step-by-step debugging.
Use Minimal Scaffolding: Instead of pre-assigned roles, provide clear objectives and let capable LLMs self-organize. Think "build a market analysis" rather than "agent A researches, agent B analyzes, agent C writes."
Implement Light Routing: For mixed workloads, use systems like BiRouter [5] that can dynamically decide between single-agent and multi-agent approaches based on task characteristics.
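The playbook above can be condensed into a routing heuristic. This is a minimal sketch of the idea, not BiRouter's actual interface: the `Task` shape, the threshold, and the decision rule are all assumptions, chosen to encode "fan out only when there is genuine parallelism and enough subtasks to pay for the coordination overhead."

```python
from dataclasses import dataclass

@dataclass
class Task:
    description: str
    subtasks: list        # independently executable pieces, if any
    sequential: bool      # True when steps depend on each other

def route(task: Task, overhead_threshold: int = 3) -> str:
    """Light routing between single-agent and self-organizing multi-agent.

    Hypothetical heuristic: sequential work, or work with too few
    parallel pieces to outweigh coordination costs, stays single-agent.
    """
    if task.sequential or len(task.subtasks) < overhead_threshold:
        return "single-agent-sequential"
    return "self-organizing-team"

# Research synthesis across sources parallelizes; debugging does not.
print(route(Task("summarize 5 reports", ["r1", "r2", "r3", "r4", "r5"], False)))
# self-organizing-team
print(route(Task("step-by-step debugging", [], True)))
# single-agent-sequential
```

In production, the subtask list and sequential flag would come from a cheap classifier pass or the LLM itself; the point is that the router defaults to single-agent and must be argued out of it.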
The software engineering applications are particularly compelling. Lyu et al. demonstrated self-organizing LLM teams that mirror human development squads, achieving 20% faster iteration cycles for continuous deployment [2]. These systems naturally develop specialization—some agents gravitate toward testing, others toward documentation—without rigid role assignments.
Real-World Evidence: From Code to Organizations
The implications extend beyond software. Self-organizing AI teams are emerging as a new organizational primitive, especially in knowledge work.
Nordic companies are early adopters because the cultural fit is natural. Flat hierarchies, autonomous teams, and trust-based coordination align perfectly with self-organizing AI systems. When your human organization already minimizes bureaucracy, extending that principle to AI feels obvious.
One pattern we're seeing: successful AI implementations mirror successful human team structures. Companies with rigid hierarchies struggle with self-organizing AI because they keep trying to impose human organizational charts on systems that work differently.
The research confirms this intuition. Expert commentary notes that "LLMs spontaneously develop brain-like layers" when allowed to self-organize [8]. These emergent structures often outperform designed hierarchies because they adapt to actual information flows rather than theoretical org charts.
The Post-Code Implications
This research points toward a fundamental shift in how we think about AI systems. When code becomes free, the bottleneck moves to judgment—and judgment includes knowing when to impose structure versus when to let emergence take over.
Traditional software engineering emphasized control and predictability. You designed systems, defined interfaces, and managed complexity through abstraction layers. Multi-agent frameworks follow this playbook: define roles, create communication protocols, manage state transitions.
But LLMs operate more like biological systems. They're capable of emergent coordination that often surpasses designed structures. The builder's job shifts from orchestration to calibration—setting the right conditions for emergence rather than micromanaging every interaction.
This has profound implications for how we build AI products. Instead of complex frameworks, we need adaptive systems that can scale coordination dynamically. Instead of predetermined workflows, we need environments where AI agents can discover optimal collaboration patterns.
The Nordic advantage here is cultural. Societies built on trust and minimal hierarchy are naturally better at designing AI systems that leverage emergence rather than fighting it.
The Future of AI Organizations
Looking ahead, self-organizing AI teams represent more than a technical optimization. They're a preview of how AI-native organizations might operate.
Endogenous organizations—where structure emerges from capability rather than imposed hierarchy—could become the default for AI-augmented work. Human managers would focus on setting objectives and maintaining culture, while AI teams self-organize around specific deliverables.
The research suggests we're already seeing this transition. The most effective AI implementations don't replicate human organizational patterns; they discover new ones optimized for AI capabilities.
For builders, this means designing for emergence rather than control. The frameworks that win will be those that provide just enough structure for self-organization while staying out of the way of natural coordination patterns.
The post-code era isn't just about AI writing software. It's about AI discovering new ways to organize work itself. And the evidence suggests that minimal structure, not maximum control, unlocks that potential.
Sources
- https://arxiv.org/abs/2603.28990
- https://arxiv.org/abs/2603.25928
- https://arxiv.org/abs/2510.05174
- https://arxiv.org/abs/2602.01011
- https://arxiv.org/abs/2512.00740
- https://towardsdatascience.com/why-your-multi-agent-system-is-failing-escaping-the-17x-error-trap-of-the-bag-of-agents
- https://ai.gopubby.com/your-multi-agent-framework-is-an-anti-pattern-25-000-tasks-prove-that-pre-assigned-roles-make-ai-e6ea31736ebd
- https://x.com/awagents/status/2039437848030347310
Want to go deeper?
We explore the frontier of AI-built software by actually building it. See what we're working on.