͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏

Forwarded this email? Subscribe here for more

Jensen Huang says, “Every company needs an OpenClaw strategy.” What did he mean by that, and what does it mean for you? If you want me to brief your board on this or other topics, please get in touch here. I only have the capacity to help a select number of firms a year, and slots fill up fast. 🦞

🔮 Autoresearch and the experimental society

On bringing the scientific method into everyday knowledge work

Azeem Azhar

Apr 2

∙

Preview

READ IN APP

Science is the most reliable method humanity has found for producing knowledge. It has also, for most of history, been expensive to run.

Andrej Karpathy released 600 lines of Python code a few weeks ago that started to change that. His autoresearch (see EV#565) runs an autonomous experimental loop in which a human sets a strategic direction, defines what good looks like and the agent iterates towards success within the guardrails. In Andrej’s initial experiment, it trained a GPT-2-level model over two days, 11% faster and found 20 genuine improvements.

Shortly after the release, Shopify’s CEO Toby Lütke used autoresearch on his company’s internal model, qmd; it ran 37 experiments overnight, so Toby woke up to a 0.8-billion-parameter model outscoring his previous 1.6-billion-parameter version by 19%. Toby is not a machine learning engineer.

Autoresearch is powerful because it solves two problems at once. One, it automates part of the knowledge-production process. And two, it solves the agent control problem, meaning it keeps agents on task. AI often drifts if you give it an open-ended brief or optimize for the wrong thing. To my great joy, autoresearch prevents this by design. The human decides where the car is going; autoresearch keeps its hands on the wheel.

I spent the last month adapting autoresearch for knowledge work beyond machine learning with the goal of spinning up a system that can run structured, low-cost experiments on the kinds of decisions most teams make every week. I’m calling this version AutoBeta, and I’m making the full playbook/skill available to paying members below.

Let’s go!

The measurement problem

When I first saw autoresearch, my immediate reaction was that it didn’t have to be just about machine learning. The loop is generic – hypothesize, test, score, iterate. So I cloned it and started running it on other bits of my work.

It didn’t go exactly as I expected. The outputs looked fine but I couldn’t tell whether they were getting better. Unlike ML where the agent has a built-in feedback signal from each training run, knowledge work was missing it. A pricing decision doesn’t validate in five minutes; and the paragraphs I write don’t tell me if the argument is getting better or just changing, most of the time.

This is what makes applying autoresearch to knowledge work genuinely hard. The loop needs something to optimize against, and in knowledge work, that something doesn’t exist naturally.

So I constructed a version of autoresearch, called AutoBeta, that works across a wide range of business problems. It is not as technically robust as Karpathy’s, but it has the same design principles: I set the objective and the constraints, the experimentation takes place within the loop.

The one thing I changed was the score. I created “an oracle”, a panel of synthetic judges scoring each output against pre-defined criteria, collapsed into a single number the loop can optimize for. ...

Subscribe to Exponential View to unlock the rest.

Become a paying subscriber of Exponential View to get access to this post and other subscriber-only content.

A subscription gets you:

	Expert analysis every week on AI, energy, and the exponential economy.
	500+ editions and members-only resources spanning 10 years.
	Subscriber-only chats and our Slack with 5,000+ leaders and thinkers.