Toward Reproducible Agent Workflows — A Kafka-Based Orchestration Design

Source: DEV Community
Most multi-agent systems are nondeterministic by default. Agents negotiate their own workflows, spawn each other ad hoc, and pass free-text reasoning chains around. After running a fleet of AI agents in production, and watching the same PR diff produce three different fixes in three runs, I started designing the orchestration layer I wish I'd had from day one. This article proposes an architecture designed to make every workflow run replayable, every routing decision auditable, and every agent loop explicitly bounded. It's a design I'm actively evolving, not a finished product.

The Problem: LLM-Driven Control Flow

The default story is more nuanced than "everything is chaos." LangGraph defines static graphs in code: routing is handled by explicit Python functions, with a configurable recursion limit (25 in older versions, 10,000 in LangGraph 1.x). CrewAI runs tasks sequentially with a 25-iteration cap per agent. AutoGen defaults to round-robin, though with no loop bound by default (the real fo