The Alien Mind

Ever since generative AI came into vogue I’ve been intrigued. Yet, I could never quite put my finger on why exactly this was. I think I finally figured it out, though.

I’ve spent a large chunk of my life (perhaps too much of it) understanding computers: how they work and what they can do. Then I got sidetracked and applied that same level of curiosity to humans: I spent many years trying to understand how they work, and what they can do.

Large Language Models (LLMs) are a fascinating blend of both. Yes, it’s all computer underneath, but in behavior they have a surprising number of human-like properties.

How do they work? What can they do? What are their limits?

Peak curiosity.

Time to share some preliminary findings.


Here is the mental model I have adopted of LLMs:

Interacting with an LLM is like dealing with a highly self-confident alien mind that read the entire Internet a year ago, with mixed recall. It has a significant yet limited short-term memory capacity that puts you on the clock.

There’s a lot to unpack there, so let’s dig in.

While your interactions with an LLM may feel like interacting with a human, you’ll soon find out it has... quirks. Many of them subtle and surprising. It often acts alien. These quirks are not always explainable and may seem random. Certain things that are trivial for a human are going to be near impossible for an LLM (counting r’s in “strawberry” anybody?), and vice versa. We have to always be vigilant and be prepared for surprises. We’re going to laugh, we’re going to cry. (Credit for framing this as an alien mind goes to Ethan Mollick).

Yet, wherever this alien mind gets its information, however accurate it may be, it is going to sound extremely self-confident and smug about it.

Confusingly, it is also very suggestible: if you call out any mistake, or even hint at one, it will eagerly pivot in any direction. It’s all too easy to turn it into a yes man.

“Yes, you’re absolutely right!”

While people may have the (wrong) idea that an LLM is learning all the time, in fact it is not. Its long-term memory is “baked in” and never changes. Every model version is trained once and doesn’t change afterwards (asterisk, dagger).

There are various inputs to this training process, but the most notable one is tons and tons of text. AI companies are always a bit hand-wavy about exactly what goes in, but we can assume it’s at least a large part of the public Internet, and any other available text corpus these companies can get their hands on. Some of it of excellent quality, some of it more... Internet.

The rule of “garbage in, garbage out” applies here. You get out what you put in. This is why “western” models (trained on the western-world-accessible parts of the Internet) tend to espouse more “west-leaning” world views, whereas, say, Chinese models, likely trained on the Chinese-accessible parts of the Internet, may lean differently. Ask DeepSeek R1 (a Chinese model) about the status of Taiwan, for instance. I’m not sure if there are Russian models (LLM models!), but if there were, I’m sure that after talking to one for some stretch of time, you could really warm up to this Putin chap. Stay vigilant.

Training a model is a very expensive exercise; we are talking tens of millions of dollars in hardware and energy costs for a process that takes months. Since this process is so expensive, AI labs don’t run these training sessions daily. This is important, because we have to assume the LLM’s long-term memory is dated, likely by months or more.

The accuracy of recall of this memory is going to be mixed, because the information is highly (and lossily) compressed. Generally, though, the more often the LLM has “seen” the same or similar information (e.g. from multiple sources), the more accurate its recall is likely to be.


Short-term memory in an LLM is a hack. LLMs don’t have short-term memory. At their core, they are stateless.

“Err, so why can I have a chat with it and it still remembers things I told it earlier in that same chat conversation, even days ago?”

What technically happens is that we ship it our entire chat conversation every single time we send it a new message. That’s right, every time we hit Send, we are reminding it of our entire conversation thus far. This is why it’s perfectly fine to continue a chat weeks or months later. With some LLM clients it’s even possible to start a chat with one model (say, GPT-4o) and continue it with a completely different one, even from a different vendor (say, Anthropic’s Claude Sonnet), without missing a beat. With coding agents this is actually quite common: you can do the planning phase with Google’s Gemini 2.5 Pro and the implementation with Anthropic’s Claude Sonnet, for instance.
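To make that concrete, here is a minimal sketch of a “stateful” chat built on top of a stateless API, using the OpenAI Python client (openai >= 1.0). The model name and the prompts are just placeholders; other vendors’ chat APIs follow the same pattern.

```python
# Minimal sketch of a "stateful" chat built on a stateless API.
# Model name and prompts are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

history = [{"role": "system", "content": "You are a helpful assistant."}]

def chat(user_message: str) -> str:
    # The model keeps no state between calls, so we ship the *entire*
    # conversation along with every new message.
    history.append({"role": "user", "content": user_message})
    response = client.chat.completions.create(
        model="gpt-4o",       # placeholder model name
        messages=history,     # the full history, every single time
    )
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply

print(chat("My favorite color is green."))
print(chat("What is my favorite color?"))  # only works because we resent turn one
```

This is also why switching models mid-conversation works: the history is just a list of messages, and nothing stops you from shipping that same list to a different model, or a different vendor, on the next turn.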

This “chat conversation” is what we refer to as its context window, and it is limited.

While limited, it’s one of the dimensions in which models are quickly improving. Only a year or two ago we had context windows of around 4k tokens.

Sidebar: whenever “AI people” talk about tokens, just smile, nod, and mentally translate this to words; for most intents and purposes that’s close enough.

Today, 128k-200k tokens is common, with some models even supporting up to a million (update from last night: Llama 4 supports 10 million). For context (hah!), an average-sized novel tends to have around 90k tokens, so this is already impressive.
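If you want to see the tokens-versus-words relationship for yourself, here is a small sketch using tiktoken, OpenAI’s open-source tokenizer. Other vendors tokenize differently, so treat the numbers as estimates.

```python
# Comparing word count and token count for a short sentence.
import tiktoken

enc = tiktoken.get_encoding("o200k_base")  # encoding used by recent OpenAI models

text = "Interacting with an LLM is like dealing with a self-confident alien mind."
tokens = enc.encode(text)

print("words :", len(text.split()))
print("tokens:", len(tokens))      # usually in the same ballpark as the word count
print(enc.decode(tokens[:5]))      # the first few tokens, decoded back to text
```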

However, this context window is going to be filled with a multitude of things:

  1. The system prompt (that may contain specific instructions for the ongoing chat)
  2. Your chat history thus far (both your messages and the LLM’s)
  3. Any images, audio, video or documents uploaded
  4. Any code it may have seen or written when used as a coding agent
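Here is a rough sketch of what a single request can end up containing, all of it competing for the same context window. The payload shape follows the common chat-completions format; the file names and contents are made up.

```python
# Everything below lands in the same context window.
design_doc = "...text of an uploaded design document..."      # made-up content
handler_code = "...source code the agent pulled in..."        # made-up content

request = {
    "model": "gpt-4o",  # placeholder
    "messages": [
        # 1. the system prompt
        {"role": "system", "content": "You are a careful coding assistant."},
        # 2. the chat history so far (yours and the LLM's)
        {"role": "user", "content": "Why does the login test fail?"},
        {"role": "assistant", "content": "Can you show me the test and the handler?"},
        # 3. uploaded documents (images, audio and video get tokenized into the window too)
        {"role": "user", "content": "Here is the design doc:\n" + design_doc},
        # 4. code it has seen or written
        {"role": "user", "content": "And here is the handler:\n" + handler_code},
    ],
}
```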

Especially when using LLMs to write code (Cline, Claude Code, Cursor chat), we need to feed the LLM large chunks of our code base to get useful answers. Any code base of significance won’t fit, so a big part of building coding agents is about selectively picking out the most relevant pieces. And like humans, if you overwhelm it with too much information, it can get completely lost and no longer see the forest for the trees. TMI!
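What does “selectively picking out the most relevant pieces” look like? Here is a toy version, just to illustrate the budget problem. Real agents use much smarter retrieval; the scoring, the budget and the four-characters-per-token rule of thumb below are all illustrative.

```python
# Toy version of picking relevant files from a repo under a token budget.
from pathlib import Path

TOKEN_BUDGET = 50_000  # leave plenty of room for the question and the answer

def rough_tokens(text: str) -> int:
    return len(text) // 4  # crude approximation: roughly 4 characters per token

def pick_context(repo: Path, question: str) -> str:
    words = set(question.lower().split())
    scored = []
    for path in repo.rglob("*.py"):
        source = path.read_text(errors="ignore")
        score = sum(source.lower().count(w) for w in words)  # naive relevance score
        if score:
            scored.append((score, path, source))

    picked, used = [], 0
    for score, path, source in sorted(scored, key=lambda t: t[0], reverse=True):
        cost = rough_tokens(source)
        if used + cost > TOKEN_BUDGET:
            continue  # this file would blow the budget; skip it
        picked.append(f"# file: {path}\n{source}")
        used += cost
    return "\n\n".join(picked)
```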

The moment you hit the context window limit, it’s effectively game over. You have to start from scratch (who are you again?). Some LLM clients may attempt to auto summarize (“compact”) context to keep going. However, this is where even more amnesia starts to happen, because inevitably information is lost. If you wonder why your conversations start to get weirder and weirder over time, this is likely why. Or it’s just getting late and you should really go to bed.
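For the curious, here is a sketch of what that “compacting” can look like: once the history gets too big, ask the model itself to summarize the older messages and carry on with just the summary plus the most recent turns. The threshold, the prompt and the model name are made up, not any particular client’s actual behavior.

```python
# Sketch of context "compaction": summarize old turns once the history gets too big.
from openai import OpenAI

client = OpenAI()

def rough_tokens(text: str) -> int:
    return len(text) // 4  # roughly 4 characters per token, a crude approximation

def compact(history: list[dict], limit: int = 100_000) -> list[dict]:
    if sum(rough_tokens(m["content"]) for m in history) < limit:
        return history  # still fits; nothing to do

    old, recent = history[:-10], history[-10:]  # keep the last few turns verbatim
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in old)
    summary = client.chat.completions.create(
        model="gpt-4o",  # placeholder
        messages=[{
            "role": "user",
            "content": "Summarize this conversation, keeping key facts and decisions:\n" + transcript,
        }],
    ).choices[0].message.content

    # Information is inevitably lost in that summary. This is where the amnesia creeps in.
    return [{"role": "system",
             "content": "Summary of the earlier conversation: " + summary}] + recent
```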

Note that while I said “over time,” the dimension of time is actually irrelevant. How much of the context window is filled (and that grows with every interaction) is the only thing that puts you on the “clock.”

To make this all even more fun, the mixed recall and alien-mind aspects also apply to the context window. Some of this context the LLM will take into account strictly, even more strictly than what it recalls from its long-term memory. Other parts it may randomly ignore. Sometimes. And when we try again, we may see different results yet again.

It’s unpredictable.

Almost like a human.

Just... alien.