Case Notes
We run an autonomous AI newsroom in production. Here is what it takes.
For over a year we have operated an autonomous content system for a client in a specialist B2B niche. It researches its beat, decides what is worth covering, writes long-form articles, publishes them, and promotes them on social channels. Every day, without a newsroom. Direct cost lands under $2 per article.
This is the system people imagine when they say "agentic AI workforce." It is also a system that would have embarrassed everyone involved if we had shipped it in the state of the first demo. Here is what production actually required.
The pipeline
Six stages, each a separate agent job with its own budget and failure handling:
- Ingestion. Scheduled sweeps across news wires, company newsrooms, and social sources on the beat. Each source type gets its own fetcher, because each fails in its own way.
- Relevance and dedup. Embedding similarity kills near-duplicates; a scoring model kills off-topic drift. This stage is where quality is won or lost, more than in the writing.
- Drafting. Long-form pieces, 2,500 words and up, drafted against a house style guide with strict sourcing rules: claims trace to sources, quotes are never invented.
- Editorial gates. Automated checks first: topic drift caps, banned-angle rules, legal-sensitivity flags. Then the human gate for anything the checks are unsure about.
- Publishing. Straight to the CMS with structured metadata, categories and internal links.
- Distribution. Social variants generated and queued per platform.
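To make the relevance and dedup stage concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the production code: the Candidate type, the filter_candidates function, and both threshold values are hypothetical, and the embeddings and relevance scores are assumed to come from whatever models sit upstream.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Candidate:
    title: str
    embedding: list[float]  # assumed to come from an upstream embedding model
    relevance: float        # assumed score in [0, 1] from the relevance model


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filter_candidates(
    candidates: list[Candidate],
    dup_threshold: float = 0.92,  # hypothetical value, tuned per beat
    min_relevance: float = 0.6,   # hypothetical value, tuned per beat
) -> list[Candidate]:
    """Drop off-topic items, then drop near-duplicates of items already kept."""
    kept: list[Candidate] = []
    for cand in candidates:
        if cand.relevance < min_relevance:
            continue  # off-topic drift
        if any(cosine(cand.embedding, k.embedding) >= dup_threshold for k in kept):
            continue  # near-duplicate of something already kept
        kept.append(cand)
    return kept


items = [
    Candidate("Acme launches product", [1.0, 0.0], 0.9),
    Candidate("Acme unveils product", [0.99, 0.05], 0.9),  # near-duplicate
    Candidate("Celebrity gossip", [0.0, 1.0], 0.3),        # off-topic
    Candidate("Regulator issues ruling", [0.0, 1.0], 0.8),
]
kept_titles = [c.title for c in filter_candidates(items)]
```

Both thresholds are the levers the rest of these notes keep returning to: set the similarity threshold too low and follow-up coverage disappears, set the relevance floor too low and junk gets through.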
What broke, honestly
Everything interesting we know about this system came from something breaking.
The quiet-week problem. When the beat had a slow news week, early versions widened their nets and dragged in junk. Loosening thresholds to fill a content calendar is exactly how an autonomous system destroys trust. The fix was a separate evergreen stream: pre-approved formats the system can produce when news is thin, instead of lowering the bar on news.
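The shape of that fix can be sketched in a few lines of Python. The function name plan_cycle, the dictionary fields, and the numbers are hypothetical; the point is only that the relevance threshold stays fixed and any shortfall is filled from the pre-approved evergreen queue.

```python
def plan_cycle(news_items, evergreen_queue, target=3, min_relevance=0.6):
    """Fill a cycle with qualifying news first, then pre-approved evergreen formats.

    The relevance threshold is never lowered to hit the target.
    All names and default values here are illustrative assumptions.
    """
    qualifying = [n for n in news_items if n["relevance"] >= min_relevance]
    plan = [("news", n["title"]) for n in qualifying[:target]]
    shortfall = target - len(plan)
    # Top up from the evergreen queue only; never from sub-threshold news.
    plan += [("evergreen", fmt) for fmt in evergreen_queue[:shortfall]]
    return plan


quiet_week = [
    {"title": "Sector merger announced", "relevance": 0.9},
    {"title": "Loosely related rumor", "relevance": 0.2},
]
plan = plan_cycle(quiet_week, ["explainer", "glossary entry", "annual roundup"])
```

In this quiet-week example the plan holds one news piece and two evergreen pieces; the low-scoring rumor is never considered.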
Dedup that was too eager. Aggressive similarity thresholds started suppressing legitimate follow-up coverage of prolific companies. Accuracy problems are not always about hallucination; sometimes the system silently refuses to do work it should do. You only catch that class of failure by tracking yield, not just errors.
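One way to track yield alongside errors is sketched below in Python. It assumes each filtering decision is logged as a (company, outcome) pair and that published counts per cycle are kept; the function names, outcome labels, and the 0.5 drop ratio are hypothetical.

```python
from collections import Counter
from statistics import mean


def suppression_by_company(decisions):
    """Share of each company's items that were dropped as duplicates.

    decisions: iterable of (company, outcome) pairs, where outcome is one of
    the hypothetical labels "kept", "duplicate", or "off_topic".
    """
    seen, dupes = Counter(), Counter()
    for company, outcome in decisions:
        seen[company] += 1
        if outcome == "duplicate":
            dupes[company] += 1
    return {company: dupes[company] / seen[company] for company in seen}


def yield_alert(history, current, drop_ratio=0.5):
    """True when this cycle's yield falls below drop_ratio of the trailing mean."""
    if not history:
        return False
    return current < drop_ratio * mean(history)


log = [("Acme", "duplicate")] * 3 + [("Acme", "kept")] + [("Beta", "kept")] * 2
rates = suppression_by_company(log)
alert = yield_alert([10, 12, 8], current=4)
```

A high duplicate rate concentrated on one prolific company, with no matching rise in errors, is the signature of the failure described above.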
Off-topic leakage under pressure. During a news drought, a drifting relevance score let a handful of stories through that did not belong. The response was not a prompt tweak: we tightened the drift caps, added an explicit topical nexus requirement for auto-publish, and unpublished the offending pieces the same day. That is the point: the system is operated, not launched.
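A gate of that kind might look like the following Python sketch. The Draft fields, the drift scale, the cap values, and the three outcome labels are all assumptions made for illustration; the real rules are set by the client's editorial owner.

```python
from dataclasses import dataclass, field


@dataclass
class Draft:
    title: str
    drift: float  # assumed scale: 0 = squarely on the beat, 1 = unrelated
    nexus_entities: list[str] = field(default_factory=list)  # beat entities the piece is about
    legal_flag: bool = False


def gate(draft: Draft, drift_cap: float = 0.25, review_band: float = 0.15) -> str:
    """Return "auto_publish", "human_review", or "reject" (hypothetical labels)."""
    if draft.legal_flag:
        return "human_review"  # legal sensitivity always goes to a person
    if draft.drift > drift_cap + review_band:
        return "reject"        # clearly off the beat
    if draft.drift > drift_cap or not draft.nexus_entities:
        return "human_review"  # unsure, or no explicit topical nexus
    return "auto_publish"


decisions = [
    gate(Draft("On-beat story", 0.1, ["Acme"])),
    gate(Draft("No nexus", 0.1)),
    gate(Draft("Borderline", 0.3, ["Acme"])),
    gate(Draft("Off-beat", 0.9, ["Acme"])),
    gate(Draft("Sensitive", 0.1, ["Acme"], legal_flag=True)),
]
```

Note that a low drift score alone is not enough in this sketch: a draft with no named nexus to the beat never auto-publishes.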
The humans in the loop
The client's editorial owner reviews flagged items, sets the rules the gates enforce, and can unpublish anything with one action, no engineer required. Rule of thumb after a year: machines decide volume, humans decide values. Every correction becomes a rule or a training example, so the same mistake does not need catching twice.
The numbers that matter
Cost per article is the headline, but three other numbers run the system: yield per cycle (is it producing enough?), gate rejection rate (is quality holding without a human reading everything?), and time-to-unpublish when something slips (minutes, not meetings).
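For readers who want to compute these three numbers from their own logs, here is a minimal Python sketch. It assumes gate outcomes are logged as strings and that each incident records when a piece was flagged and when it was unpublished; the function name, outcome labels, and the choice to measure from the flag time are hypothetical.

```python
from datetime import datetime
from statistics import median


def cycle_metrics(gate_outcomes, incidents):
    """Summarize one cycle.

    gate_outcomes: list of hypothetical labels such as "auto_publish",
        "human_review", "reject".
    incidents: list of (flagged_at, unpublished_at) datetime pairs.
    """
    total = len(gate_outcomes)
    published = sum(1 for o in gate_outcomes if o == "auto_publish")
    rejected = sum(1 for o in gate_outcomes if o == "reject")
    minutes = [(done - flagged).total_seconds() / 60 for flagged, done in incidents]
    return {
        "yield": published,
        "gate_rejection_rate": rejected / total if total else 0.0,
        "median_minutes_to_unpublish": median(minutes) if minutes else None,
    }


day = datetime(2024, 1, 1)
metrics = cycle_metrics(
    ["auto_publish", "reject", "human_review", "reject"],
    [
        (day.replace(hour=9), day.replace(hour=9, minute=12)),
        (day.replace(hour=10), day.replace(hour=10, minute=4)),
        (day.replace(hour=11), day.replace(hour=11, minute=30)),
    ],
)
```

Reporting time-to-unpublish as a median in minutes keeps one slow incident from hiding behind an average, and keeps the unit honest.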
If you are evaluating an autonomous content system, or any agentic workforce, ask the vendor for those numbers and for the story of the worst thing it ever published. If they claim there is no such story, the system has not run long enough to be trusted.