Hallucination Weekly #3
Welcome to Hallucination Weekly #3 — my (by now slightly less) new experimental way to share what is on my radar in terms of AI, in a short(er) and more skimmable form. With the AI industry in its state of extreme hype, and zero sign that this will change any time soon, the most common question I get is: what is worth paying attention to, what’s worth trying? Gimme the TL;DR.
Unlike previous issues, I will not split up topics into an artificial Kanban-board-style format. I will just go with the flow.
I’m going to vibe it.
Comparing tools
A recurring question is: how do we compare AI tools? The amount of randomness we see in outputs is huge — you’d have to attempt the same task 5+ times to get a feel for the outputs, then rinse and repeat with different tools. For many tasks (like coding) this is practically impossible. With random “knowledge prompts” this is easier (“tell me the market size of X in country Y”). See if the results diverge, and if so: go deeper. Last week I mentioned msty, which makes sending the same prompt to multiple models simultaneously very easy.
This week I also started doing this with research projects, which I now always kick off with a Deep Research product (ChatGPT 4o’s Deep Research and Gemini’s Deep Research with 2.5 Pro in my case). Note: kick off, not replace. I always get something out of these lengthy reports, worst case just a list of sources worth digging into. Here too, I find it worth running the same task in multiple tools. They can yield very different results based on the sources they dig up.
Something I threw at Deep Research this week:
You hear people saying that "SaaS is dead" with the idea that with emerging AI coding tools, it becomes cheaper, perhaps almost free, to develop many of these tools internally. How are companies actually approaching this? Even though initial creation of these tools internally may yield quick results, what about the phase that comes after that: the ownership and maintenance phase? How is that going to be solved, or how is it already solved by software companies that we consider “AI Native”?
The results from ChatGPT and Gemini were quite different. In this particular case, the Gemini result felt more thorough and balanced. The ChatGPT version strung things together in a logically invalid way. The point is not which wins, the point is: if you have the opportunity, use multiple tools and compare the results.
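The “see if the results diverge, and if so: go deeper” workflow can even be automated crudely. Here is a minimal sketch in Python: the model names, canned responses, and similarity threshold are all hypothetical stand-ins (a real version would call actual APIs and use something smarter than lexical similarity), but it shows the idea of fanning one prompt out and flagging pairs of answers that disagree.

```python
import difflib


def pairwise_similarity(a: str, b: str) -> float:
    """Rough lexical similarity between two responses (0.0 to 1.0)."""
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()


def compare_responses(responses: dict[str, str],
                      threshold: float = 0.7) -> list[tuple[str, str, float]]:
    """Return model pairs whose answers diverge (similarity below threshold)."""
    names = sorted(responses)
    flagged = []
    for i, m1 in enumerate(names):
        for m2 in names[i + 1:]:
            score = pairwise_similarity(responses[m1], responses[m2])
            if score < threshold:
                flagged.append((m1, m2, score))
    return flagged


# Stub responses standing in for real API calls to different models.
responses = {
    "model-a": "The market size of X in country Y is roughly 2 billion euros.",
    "model-b": "The market size of X in country Y is roughly 2 billion euros.",
    "model-c": "X barely exists as a market in country Y; estimates are unreliable.",
}

for m1, m2, score in compare_responses(responses):
    print(f"{m1} vs {m2} diverge (similarity {score:.2f}) — dig deeper")
```

Agreement between models is no guarantee of correctness, of course — they may share the same blind spot — but disagreement is a cheap, reliable signal that a question deserves a closer look.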
Random prompt of the week
Another recurring theme is: “right! I had never thought to ask an AI to do this!” In other words, it’s often not that we need bigger, better, faster models, we need to get better at even identifying problems that are suitable for an AI to solve.
I think we can only get better at this by seeing diverse examples.
Since it’s Friday, let me give a super random one from my personal life from this morning:
Please prepare a printable document that lists all the special moves and super combos for Super Street Fighter II for the SNES, as well as what they do. Descriptions in Dutch.
Context: we gave the kids retro consoles for their birthdays last year, and they rediscover them from time to time. This week it was this Super Nintendo classic.
Why this is a good case for an LLM:
- Common knowledge: It’s not asking for niche content. I bet there are dozens, if not hundreds, of fan sites that list this information. The model will have seen the answer many times.
- Transformation: I wanted a version in Dutch, in a particular format.
- Low risk: I didn’t fact-check the result (I just spot-checked a few moves that turned out to still be imprinted in my brain). Worst case, errors in the result implicitly gaslight my kids into thinking it’s them not executing the combos right. Meh. Whatever.