Provoking thought

Hallucination Weekly #1

Welcome to Hallucination Weekly — my new experimental way to share what is on my radar in terms of AI, in a short(er) and more skimmable form. With the AI industry in its state of extreme hype, and zero sign that this will change any time soon, the most common question I get is: what is worth paying attention to, what’s worth trying? Gimme the TL;DR.

And because it is against my nature to be brief: challenge accepted.

Although the format will likely evolve, I intend to split this into a few sections:

  1. Queued: these are AI developments and products that have piqued my interest, but that I have not dug into deeply yet.
  2. Processing: these are developments and products that I am actually playing with and have some early (honeymoon period) thoughts on.
  3. Done: topics that I now have sufficiently explored to have a real (final?) opinion on.
  4. Brain noise: some academic ideas I’m toying with in my head.

Look at it like a type of Kanban board. I will try to limit the number of items in each “column.”

Organizational note: I’m launching this as a separate newsletter that all my Zef+ subscribers are subscribed to automatically, but can easily opt out of individually. You’d suffer severe AI FOMO if you did, but that’d be your choice.

Queued

Google launched the A2A (Agent2Agent) protocol, to enable agents to collaborate across platforms. No, it’s not MCP; it complements it. Hot take: this was an obvious next step, and Google rushed in to be first (they had their Cloud Next conference this week). The “rushed” part shows. Success will depend on adoption. By the way, if you needed another Python agent framework to complement the other hundred, Google rushed out one of those too.

One challenge in Chat-Oriented Programming (AI coder adoption level 3) is deciding on a good task size (not too small, not too big). Effectively, you have to do some sort of backlog grooming and estimation rather than blasting high-level tasks at an AI coding agent. RooCode now has a feature called Boomerang Tasks that claims to break complex projects into smaller tasks and then execute them sequentially (doing the context management for you). Interesting idea; I haven’t tried it yet to see how well it works.

Processing: Podcast Edition

NotebookLM (free): throw in a piece of TL;DR material (e-books, academic papers, long blog posts not written by me) and have it generate engaging, AI-generated 10-20 minute podcast episodes about it. You have to actually try it to appreciate it. Just do it (Adidas), trust me. I’m still keeping this in Processing, because some cracks are starting to show after listening to dozens of hours of this content. I’m now getting mildly triggered by some of the AI-generated “mannerisms” of the podcast hosts, which may escalate into rage-quitting the use of this (otherwise awesome) tool. I won’t mention what they are, so as not to ruin the experience for you.

Snipd (freemium): apply AI to podcast listening and you get Snipd. It transcribes entire podcast episodes. It identifies guests and lets you follow them across podcasts. It automatically highlights notable “snippets” (hence Snipd) in each episode. Triple-clicking your AirPods (or tapping a button on screen) when something interesting is said generates an explicit “snip” contextually. The app supports exporting these “snips” to e.g. Readwise. I’ve been using this for a few weeks and am still deciding if I need to pay for its premium features. Premium supports, among other things, uploading your own audio (like NotebookLM-generated content). The larger, more existential question: while it’s great to highlight stuff in books and podcasts, will I ever look at those highlights again? So far, not really; although it feels good to have them. That said, this app fits my podcasting workflow perfectly: one feed with new episodes, swiping to queue or archive them, and intuitive queue reordering. I’d use it even without the AI. Snipd stays in Processing because I’m not sure about the value of Premium, and the actual value of snip’ing (its namesake feature) remains unclear.

Done

In coding, telepathic tab complete (level 2 in my linked post) is here to stay. Writing code without it is just a stubborn decision at this point. The one condition I would put on this is that you only ever accept code that you can judge to be correct. This still makes it a potentially dangerous tool for junior developers. The best implementation of telepathic tab complete right now is in Cursor, but GitHub Copilot (even though it invented the feature) is catching up again. Use an IDE that supports it.

Brain noise

I’m toying with the idea of promptware (yes, I just casually coined that term). I first came across this concept with the Cline Memory Bank, which adds memory support to the Cline coding agent. What’s important here is not what it is, but how it does it: this is not a feature baked into Cline, it is a feature implemented purely through prompting. It’s promptware. Get it? As a result, it (should) work with any AI coding agent. A culture around collecting “rules” for coding agents has now emerged, aggregated in, for example, awesome-cursorrules. Of course, this community sharing has already backfired in the form of security vulnerabilities. A pretty extreme version of promptware is paelladoc, which claims to capture all good practices in one huge repository of rules for your AI coding agent to digest and follow. Does it actually work, though? It all assumes that AI reliably follows instructions, which is anything but a given in my experience.
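To make the idea concrete, here is a hypothetical sketch of what a piece of promptware might look like (the file name and contents are invented for illustration, not taken from Cline Memory Bank or any real project): a plain-text rules file that “implements” a memory feature purely through instructions to the agent.

```markdown
# agent-rules.md — hypothetical promptware example

## Memory bank
- At the start of every session, read `memory/context.md` before doing anything else.
- After completing any task, append a one-line summary of what changed to `memory/context.md`.

## Working style
- Never introduce a new dependency without asking first.
- Prefer small, reviewable changes over large rewrites.
```

Note that nothing here is code in the traditional sense: whether this “feature” works depends entirely on how faithfully the agent follows the instructions, which is exactly the open question.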

That’s all for this week. Feedback welcome!