Agent Feedback Loops
“So I prompted Cursor to write me some code, and what it gave me didn’t even compile! AGI my ass. Wake me up when these AI coders are no longer stupid.”
Interviewer: Next task — on the white board over there, please write me an algorithm in JavaScript that inserts an element into a diagonally-warped inverse splay tree in O(log n) time complexity.
Candidate writes an elaborate and impressive piece of code on whiteboard, clearly narrating their thought process and adding correctness proofs.
Interviewer: That’s directionally correct, but dude, `push` is not a method on a `Set`. Are you stupid? Come back once you actually know what you’re doing.
AI coding agents, like the humans before them, need feedback loops. Just as you can’t write an entire application on a whiteboard without slipping up here or there, neither can an LLM.
We can have philosophical discussions about P(doom) and how AI is supposedly going to take over the planet any day now. So why the hell does it still forget the occasional semicolon? The reality is that these models are stochastic token predictors, and they make mistakes, randomly. However, they are also surprisingly good at very quickly fixing their mistakes.
Once told what they are.
And if they are not given the means to check their own work, it will be you (the human in the loop) who has to check their homework. Even if that just means shoveling an error from the terminal back into the chat. You don’t want to be in the business of checking the AI’s work every step of the way, do you?
You got coffee to drink. And other AI agents to boss around.
So, we’d better come up with an automated feedback-loop strategy.
Try this in your next AI pairing session: ask your AI coding agent to actually compile the code, run the linters, and run the test suite. Yes, just tell it what shell commands to run. If it’s a good one, it will happily oblige, interpret the results, and act on them.
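If you want something concrete to paste in, a standing instruction along these lines works; the npm scripts are an assumption, so substitute whatever your project actually uses:

```
From now on, after every change you make, run `npm run build`, `npm run lint`,
and `npm test` yourself. Read the output, fix whatever is broken, and repeat
until all three pass. Only then report back to me.
```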
Be sure to enable whatever the equivalent of “YOLO mode” is in your AI agent tool of choice. Cursor recently chickened out and renamed its YOLO mode to “auto-run” mode (it’s under Cursor Settings > Features > Enable auto-run mode at the time of this writing). This advice is not endorsed by my employer, and should be considered for entertainment purposes only.
Both you and your agent will very much appreciate its newfound autonomy.
“Oh, there are compilation errors here. I see what’s going on!”
Flap, flap, flap.
“Let me run the build again to see if that fixed it.”
Flap, flap, flap.
“Now it compiles, let me run the tests.”
Flap, flap, flap.
“Oh this test is failing. I see what’s going on here.”
Flap, flap, flap.
“Ok, now everything works. You’re welcome.”
Sorry, were you saying something? I was getting coffee, and playing hide-and-seek with my other agents.
The power of the feedback loop.
There is an opportunity here.
There is a chance we will finally start to practice something that we have claimed to practice for years but never actually did: Test-Driven Development (TDD).
Tests first. Implementation later.
TDD is a fundamentally good idea, and we all know it:
- Writing tests forces you to formulate your requirements formally, in an objectively checkable way.
- Writing tests first makes sure that you don’t cheat in your test suite by accommodating hacks (I mean well-considered particularities) in your implementation.
- Having tests gives the programmer a fast feedback loop and a dopamine high as more and more of the (initially failing) tests start to pass. Ahh!
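What does “tests first” look like in practice? Something like this file existing before the implementation does. A minimal sketch using Node’s built-in test runner, with a made-up `insertSorted` function as the thing under test:

```js
// insertSorted.test.js: written before insertSorted.js exists.
// Running `node --test` at this point fails, which is exactly the point.
import { test } from "node:test";
import assert from "node:assert/strict";
import { insertSorted } from "./insertSorted.js"; // not implemented yet

test("inserts into the middle and keeps the array ordered", () => {
  assert.deepEqual(insertSorted([1, 3, 5], 4), [1, 3, 4, 5]);
});

test("handles the empty-array edge case", () => {
  assert.deepEqual(insertSorted([], 7), [7]);
});
```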
There’s only one little issue.
You have to write tests. And nobody likes writing tests.
Nobody.
Except AI agents. They just want to impress you, their overlord. That’s all they want. A smile on your face.
And we all know that the path to the heart of any engineer is to increase their test coverage. And free pizza.
Opportunity!
“Dear AI friend, you’re a senior TDD developer. I would like you to implement X, BUT BEFORE YOU START YOUR IMPLEMENTATION WRITE TESTS OR YOU GO TO JAIL.”
Yelling and threats work. Management 101.
#ProTip: Be sure to check the tests it writes. If they are wrong, or inconsistent, or worthless, you’re in for a world of hurt, confusion and infinite loops.
Flap, flap, flap.
“Done!”
“I’m sorry to act as captain obvious here, but while you wrote tests, you didn’t run them. Run the tests.”
“Oh right, they’re failing. I see what’s going on here.”
Flap, flap, flap.
“Not quite yet, let’s give this another shot.”
Flap, flap, flap.
“Nope.”
Flap, flap, flap.
“Nope.”
Flap, flap, flap.
“And, booyah! All green. Let me summarize my excellent work...”
“Alright, alright. Not bad. Your initial set of tests is passing. Now add another set of tests that check all the edge cases you know exist, because you wrote the code. Then fix those edge cases.”
“Abso-diddly-lutely!”
Flap, flap, flap.
This was always my ultimate productivity zen scenario:
- The dish washer is washing my dishes.
- The washing machine is doing my laundry.
- The robot vacuum is zooming around.
- The disk defragmenter is defragging my hard drive (I’m old skool).
- The microwave is popping my popcorn.
- I lie chillin’ on the sofa.
But this was so 6 months ago.
Now we can also get an AI agent to do our software development work at the same time.
The exercise is to keep the agent productive for as long as possible, without requiring our intervention. Twenty seconds. Five minutes. Twenty minutes. What’s the best you can do? Once we hit some threshold, why not spin up an additional one pointed at another checkout of your repo?
How? By giving it feedback loops. Compile. Lint. Test. Run an (AI-generated) code review and have it make the suggested changes. Have it deploy the changes to production. Have it check whether you hit the target KPIs for the feature. And iterate.
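Stripped of the magic, the harness around the agent is a plain old loop. A rough sketch in Node, where the npm commands are assumptions and `reportToAgent` is a stand-in for however you actually feed output back to your agent:

```js
// feedback-loop.js: a sketch of the harness, not a real agent integration.
import { execSync } from "node:child_process";

// Whatever your project actually uses; these npm scripts are assumptions.
const checks = ["npm run build", "npm run lint", "npm test"];

// Stand-in: in reality this is wherever you pipe the failure output
// (the agent chat, an API call, a file the agent watches). Here it just logs.
function reportToAgent(failure) {
  console.log(failure);
}

for (let attempt = 1; attempt <= 10; attempt++) {
  let failure = null;
  for (const cmd of checks) {
    try {
      execSync(cmd, { stdio: "pipe" });
    } catch (err) {
      failure = `${cmd} failed:\n${err.stdout}${err.stderr}`;
      break; // no point linting code that doesn't compile
    }
  }
  if (!failure) break; // all green: go get coffee
  reportToAgent(failure); // ...and wait for the agent to flap, flap, flap
}
```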
“Not there yet. Let me e-mail some customers, I have production database access anyway. YOLO! Ah, I see what’s going on here!”
Flap, flap, flap.
DING!
Popcorn is ready. What are we watching?