Rich Holmes, elvis, and Andy Masley posted new notes

Friday, 12 December 2025

 
Substack

Peter Yang liked
A farewell message from Apple’s Head of Design and…
elvis liked
First large-scale study of AI agents actually running in production. The hype says agents are transforming everything. The data tells a different story.

Researchers surveyed 306 practitioners and conducted 20 in-depth case studies across 26 domains. What they found challenges common assumptions about how production agents are built. The reality: production agents are deliberately simple and tightly constrained.

1) Patterns & Reliability
- 68% execute at most 10 steps before requiring human intervention.
- 47% complete fewer than 5 steps.
- 70% rely on prompting off-the-shelf models without any fine-tuning.
- 74% depend primarily on human evaluation.
Teams intentionally trade autonomy for reliability.

Why the constraints? Reliability remains the top unsolved challenge. Practitioners can't verify agent correctness at scale. Public benchmarks rarely apply to domain-specific production tasks. 75% of interviewed teams evaluate without formal benchmarks, relying on A/B testing and direct user feedback instead.

2) Model Selection
The model selection pattern surprised researchers. 17 of 20 case studies use closed-source frontier models like Claude Sonnet 4, Claude Opus 4.1, and GPT o3. Open-source adoption is rare and driven by specific constraints: high-volume workloads where inference costs become prohibitive, or regulatory requirements preventing data sharing with external providers. For most teams, runtime costs are negligible compared to the human experts the agent augments.

3) Agent Frameworks
Framework adoption shows a striking divergence. 61% of survey respondents use…
Nathan Lambert liked
South Dakota has been achieved internally
 

 

Copyright © 2011 Cobaz Post - Some Rights Reserved