What is the Human Context Window?

I noticed it while having dinner with friends on a random Thursday evening.

My partner was telling us about something going on at work, and I caught myself doing the thing we all inevitably do when hearing something familiar. Your eyes are pointed at someone’s face, but your brain is running a background process on something else entirely. In my case, I was mentally editing .md file structure for one of my agents to refactor a workflow I’d been debugging all afternoon.

She obviously noticed.

It wasn’t that I was distracted, which is totally normal, but that the type of distraction had changed. I wasn’t zoning out into some pleasant daydream or replaying a song in my head while half paying attention. I was involuntarily trying to process information at a speed my biology was never designed for, because I’d spent the previous ten hours operating at machine cadence.

At first, this may seem like a superpower: “If I train my brain for higher bandwidth and faster processing of data, I can be exponentially more productive!” To be honest, it does feel that way for some time… but we have our biological limits.

That Thursday, I hit that wall.

I’m writing about this in my 2026 predictions piece right now, floating the idea that the average human processes roughly 100,000 to 200,000 “token equivalents” per day. In that draft, it’s a supporting stat for my prediction that the cognitive costs of AI usage will become a real issue next year. But the more I sit with that number, the more I realize it deserves its own deconstruction.

Because that number isn’t just interesting trivia; it’s the single most important constraint that nobody building AI products is seriously designing around.

Welcome to the human context window.


The Question Nobody’s Asking

If you’ve spent any time around large language models, you know what a context window is. It’s the amount of information a model can hold in working memory at once. Claude 4.5 Opus sits at 200K tokens. GPT-5.2 handles 400K. Gemini 3 Pro pushes 1 million.

These numbers jump by a step change with every release cycle, and context is one of the easier capabilities to actually “feel”: going from a model with a small window to one with a much larger window is a big part of what fuels the perception of “intelligence.”

Think of Searle’s Chinese Room: does the machine have a mind in exactly the same sense that people do, or is it just acting as if it had one?

That’s the question this little thought experiment circles. But to start, we need to ask the obvious inverse question: what’s the context window of the thing on the other side of the screen?

What’s our limit, and how did we evolve into it?

Not your theoretical storage capacity, the total number of neurons, or the raw bandwidth of your optic nerve. I mean your actual, functional, working context window. The amount of information you can consciously hold, process, and act on in a given moment.

That number is shockingly small, and unfortunately, it hasn’t been upgraded since the Pleistocene.


The Numbers: Your Brain at 10 Bits/s

In 2024, a research team at Caltech published a paper in Neuron called “The Unbearable Slowness of Being.” The title alone should tell you something about what they found.

Jieyu Zheng and Markus Meister demonstrated that the human brain, for all its staggering biological complexity, processes conscious information at approximately 10 bits per second.

Ten…

Not ten megabits, not ten kilobits… Ten measly bits.

I’ll date myself, but that’s roughly the bandwidth of a 1990s 56k dial-up modem divided by 5,600.

Meanwhile, your sensory systems are gathering roughly 1 billion bits per second. Your eyes alone contribute about 10 million bits per second. Your ears, your skin, your proprioceptive systems, basically all of it is flooding your brain with an absolute firehose of data every waking moment.

And your conscious mind grabs ten bits of it.

Macro photograph of a human iris — the biological aperture through which 10 million bits per second enter, and only 10 reach consciousness

This isn’t an entirely new observation. Tor Norretranders laid the groundwork in his book The User Illusion back in 1998, estimating that around 11-12 million bits per second of sensory input gets filtered down to somewhere between 16 and 50 bits per second of conscious awareness.

The Caltech study tightened that estimate and, more importantly, identified that the bottleneck isn’t neuronal. Your individual neurons can fire much faster. The constraint is organizational…

The brain has chosen this processing speed through evolution, and the reasons why are more interesting than the limitation itself.

But we’ll get to that.

First, let’s build out what this actually looks like across a full day.


Mapping the Human Context Window

A research team at Queen’s University used brain imaging to estimate that humans experience roughly 6,200 distinct “thought worms” per day. These are moments when your brain transitions from one coherent thought pattern to another, as measured by shifts in neural activity.

Six thousand two hundred transitions. Across roughly 16 waking hours, that’s about 6.5 thought shifts per minute, or one every nine seconds.
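
If you want to sanity-check that cadence, the arithmetic is trivial. Here’s a minimal Python sketch; the 16 waking hours is my assumption, not part of the original study:

```python
# Back-of-the-envelope cadence of thought transitions across a waking day.
THOUGHT_TRANSITIONS_PER_DAY = 6_200  # Queen's University "thought worm" estimate
WAKING_HOURS = 16                    # assumed length of a waking day

shifts_per_minute = THOUGHT_TRANSITIONS_PER_DAY / (WAKING_HOURS * 60)
seconds_per_shift = 60 / shifts_per_minute

print(f"~{shifts_per_minute:.1f} thought shifts per minute")  # ~6.5
print(f"~one every {seconds_per_shift:.0f} seconds")          # ~9
```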

Now layer in the research on mind wandering. Multiple studies converge on a consistent range: we spend somewhere between 40% and 47% of our waking hours thinking about something other than what we’re currently doing.

Nearly half of our conscious life is spent processing things that have nothing to do with the task in front of us.

When I put all of this together with the token-equivalent framework, here’s how the daily budget roughly breaks down:

Verbal speech output: ~13,500 tokens (9%). The average person speaks about 16,000 words per day; in token terms, that’s roughly 13,500. This is your highest-fidelity output channel, and every word you speak is, at minimum, semi-consciously selected.

Active inner speech: ~40,000 tokens (27%). For most people, this is the voice in your head. The one narrating your commute, rehearsing conversations, working through problems. Research suggests inner speech can run at speeds well above normal speaking pace, and it occupies a significant chunk of your waking cognition.

Daydreaming and passive cognition: ~70,000 tokens (46%). Here’s the big one, what we could call the mind-wandering bucket. Loosely structured, often semi-conscious, and ranging from “what should I eat for dinner” to elaborate fictional scenarios you’ll never tell anyone about. Low fidelity per thought, but massive in aggregate volume.

Sensory and subconscious processing: ~26,500 tokens (18%). Everything your brain handles that never surfaces to conscious awareness in verbal form: spatial navigation, pattern recognition, emotional regulation, and motor coordination. An enormous computational load with minimal conscious access.

Total: ~150,000 token equivalents per day. The range of 100,000 to 200,000 accounts for individual variation: introverts versus extroverts, knowledge workers versus manual laborers, high-stimulation days versus quiet ones.
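
If you’d rather poke at the budget than take my word for it, here’s the whole breakdown as a small Python sketch. Every number in it is my own estimate from above, not a measured value:

```python
# Daily "human context window" budget in token equivalents (author's estimates).
daily_budget = {
    "verbal speech output": 13_500,
    "active inner speech": 40_000,
    "daydreaming / passive cognition": 70_000,
    "sensory / subconscious processing": 26_500,
}

total = sum(daily_budget.values())
print(f"total: ~{total:,} token equivalents per day")  # ~150,000

for channel, tokens in daily_budget.items():
    print(f"{channel:<35} {tokens:>7,}  ({tokens / total:.1%})")
```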

Hundreds of water streams cascading down dark canyon walls, converging into a single luminous channel — representing the daily cognitive budget funneled through conscious processing

This is why six back-to-back meetings have such a distinct draining effect, especially if you’re not conditioned to them.

That’s our context window. Not the theoretical maximum of what your brain could store if you really tried, but the actual operational throughput of a human consciousness across a standard day.


Human vs. Machine: The Bandwidth Mismatch

Now that we have our measure, let’s see how it stacks up next to the machines.

Claude 4.5 Opus’s 200K token context window holds roughly 150,000 words. If you tried to read that much text out loud at average speaking speed, it would take about 17 hours. If you tried to read it silently at the average adult reading speed of 238 words per minute, you’d need approximately 10.5 hours. That’s more than a full workday of sitting down and reading nothing else.

An LLM processes that context near-instantaneously.

And that’s the smallest window on the current frontier. GPT-5.2 doubles it at 400K tokens. Gemini 3 Pro hits a million, the equivalent of about 750,000 words. You’d need roughly 52.5 hours of continuous reading to get through it. I love reading, but no thanks.

The model processes that context in seconds.
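
The conversions behind those reading-time figures are nothing exotic. Here’s a rough sketch; the tokens-to-words ratio and the speaking pace are rule-of-thumb assumptions, and 238 words per minute is the average adult silent reading speed cited above:

```python
# How long a human would need to consume a frontier model's full context window.
WORDS_PER_TOKEN = 0.75  # rough rule of thumb for English text
READING_WPM = 238       # average adult silent reading speed
SPEAKING_WPM = 150      # assumed average speaking pace

frontier_models = [
    ("Claude 4.5 Opus", 200_000),
    ("GPT-5.2", 400_000),
    ("Gemini 3 Pro", 1_000_000),
]

for model, context_tokens in frontier_models:
    words = context_tokens * WORDS_PER_TOKEN
    print(f"{model}: ~{words:,.0f} words, "
          f"~{words / READING_WPM / 60:.1f} h of silent reading, "
          f"~{words / SPEAKING_WPM / 60:.1f} h read aloud")
```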

The difference is categorical, not just speed. The LLM doesn’t compress, doesn’t forget, doesn’t daydream through the boring parts, and doesn’t lose the thread because it was distracted by a notification. Every token in that window is equally accessible at all times.

Meanwhile, our brains are running 150,000 tokens per day through a 10-bit-per-second conscious bottleneck, losing almost half of that to involuntary mind wandering, and can only actively hold about four to seven items in working memory at any given moment.

This is the bandwidth mismatch. When you work alongside AI agents operating at machine tempo for long sessions, you’re essentially trying to interface a 10-bit-per-second system with a functionally infinite one.

BUT… our cognitive fatigue is a physics problem and not just a character flaw.


The Compression Engine: Why the Bottleneck Exists

Most AI discourse gets this part wrong.

The instinct is to look at that 10-bit-per-second number and see a limitation or a clear design flaw. An obvious oversight of evolution that needs augmentation. From a pure throughput perspective, it looks pathetic next to even the most modest LLM.

But Norretranders’ central insight in The User Illusion reframes the entire picture. Consciousness, he argues, is a brutally effective compression algorithm.

Think of your desktop computer. You don’t interact with raw binary, memory addresses, and kernel processes. You interact with a simplified graphical interface: icons, windows, and folders. Simply put, your “desktop” is an illusion. A useful fiction that lets you operate a system of incomprehensible complexity without understanding any of it.

Consciousness does the same thing for reality.

A glass prism in darkness, chaotic multicolored light entering from all directions and compressing into a single precise beam — representing consciousness as a compression algorithm

Out of the billion bits per second flooding your sensory systems, your brain compresses, filters, prioritizes, and discards until you’re left with a tiny, manageable stream of coherent experience. The 10 bits per second is what survives one of the most sophisticated information processing systems in the known universe. It’s not the leftover scraps of a bad filter; it’s the output of an extraordinarily refined abstraction engine.

The Caltech team’s finding that this bottleneck is organizational rather than neuronal supports this. Your brain could process faster at the conscious level; individual neurons have the capacity, but evolution didn’t select for speed.

It selected for the quality of compression.

Why? Because in the environments where human cognition evolved, the ability to extract signal from noise was worth more than the ability to process noise faster.

A human who could look at a complex landscape and instantly identify the one thing that mattered, the predator in the grass, the ripening fruit, the social signal, survived. The one who processed every visual detail with equal attention did not.

Yes, we’re slow (absurdly slow), but slowness wasn’t the feature that was being selected for. Compression quality was.

The ability to take a universe of sensory chaos and extract the one insight that matters. To look at a spreadsheet and feel that something is off before you can articulate why. To read a room and know the meeting is going sideways from a micro-expression that lasted a quarter of a second.

LLMs don’t do this.

They process everything with equal weight unless specifically instructed otherwise. They have enormous context windows and zero intuition about what in that context actually matters.

That’s our job, or at least, it was.


What We’re Trading Away

In 2000, Eleanor Maguire and her team at University College London published a study on London taxi drivers that became one of the most cited pieces in neuroscience history. To earn their license, London cabbies had to pass “The Knowledge,” a grueling multi-year process of memorizing 25,000 streets and thousands of landmarks across the city.

The finding: taxi drivers had measurably larger posterior hippocampi than control subjects.

Their brains had physically restructured to accommodate the spatial demands of their profession. More years of experience correlated with more structural change.

A London black cab on city streets — the drivers of these vehicles once had measurably larger hippocampi from memorizing 25,000 streets

Then GPS happened.

A 2020 study in Scientific Reports tracked what came after its proliferation. GPS use correlated with reduced hippocampal activity and worse spatial memory performance. And crucially, follow-up research established directionality. It wasn’t that people with bad spatial memory gravitated toward GPS.

GPS use caused the decline.

The brain regions responsible for spatial navigation began to atrophy when the task was offloaded to a device.

The pattern is clean: A specific cognitive function → A technology that replaces it → Measurable biological atrophy in the brain region responsible

The question is whether the same thing is happening with AI, but at a much larger scale.

A 2025 study from MIT Media Lab titled “Your Brain on ChatGPT” suggests it is. Researchers used EEG to measure brain connectivity in people using different tools for cognitive tasks.

LLM users showed the weakest brain connectivity patterns. The study also found that 78% of participants couldn’t accurately quote passages from their own AI-assisted essays when tested afterward.

The researchers coined a term for this: “cognitive debt.”

Not cognitive load, the well-studied phenomenon of working memory overload. Cognitive debt. The accumulated deficit that builds when you repeatedly outsource cognitive processes that your brain would otherwise perform and strengthen.

Microsoft Research backed this up at CHI 2025, surveying 319 knowledge workers and finding that higher confidence in generative AI correlated directly with reduced critical thinking.

The more you trust the tool, the less you engage the faculties the tool is replacing.

This is the part that should genuinely concern anyone building these systems: adults who offload thinking to AI lose existing capacity. That’s concerning but recoverable.

Children who grow up offloading thinking to AI may never build the capacity in the first place.

The London taxi drivers could grow their hippocampi because they did the hard cognitive work first. If they’d had GPS from day one, the growth would never have occurred. We’re potentially looking at a generation that never develops the cognitive infrastructure that previous generations built through effort and then chose to offload for convenience.

A 2025 paper in MDPI’s Information journal frames this as the “Cognitive Atrophy Paradox.” The argument is simple: AI is qualitatively different from previous cognitive offloading tools like calculators or GPS.

Those tools replaced specific, narrow tasks. A calculator replaces arithmetic, and GPS replaces spatial navigation. AI replaces the general reasoning process itself. The meta-cognitive skill of figuring out how to approach a problem, what questions to ask, and how to structure an analysis.

Offloading arithmetic means you lose arithmetic, but at least you know what you gave up. Offloading reasoning is unfathomably different: you lose the ability to even know what you’re giving up.


The Attention Tax

Long-term atrophy aside, something more immediate is already eating your context window.

Sophie Leroy (Shout out, UW!) published her research on “attention residue” in 2009, demonstrating that when you switch from Task A to Task B, a portion of your cognitive resources stays attached to Task A. Your brain doesn’t context-switch cleanly.

It drags fragments of the previous task into the new one, reducing your performance on both.

Gloria Mark’s research extended this, finding that it takes an average of 23 minutes and 15 seconds to fully re-engage with a task after an interruption. That’s not to start working on it again, but to reach the same depth of cognitive engagement you had before.

Twenty-three minutes.

Microsoft’s 2025 workplace research found that the average knowledge worker faces approximately 275 interruptions per day. Notifications, pings, messages, and alerts roughly every two minutes during working hours.

Do the math and it breaks. If every interruption costs 23 minutes of recovery and you’re getting interrupted 275 times a day, you’d need over 105 hours to fully recover from a single day’s interruptions.

You have eight.
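
For anyone who wants to check that arithmetic, here’s the sketch. The eight-hour workday is the only assumption I’ve added to the published figures:

```python
# Interruption recovery debt for a typical knowledge-worker day.
INTERRUPTIONS_PER_DAY = 275   # Microsoft 2025 workplace research
RECOVERY_MINUTES = 23.25      # Gloria Mark: ~23 minutes 15 seconds to re-engage
WORKDAY_HOURS = 8             # assumed available working hours

recovery_hours = INTERRUPTIONS_PER_DAY * RECOVERY_MINUTES / 60
print(f"recovery time needed: ~{recovery_hours:.0f} hours")               # ~107
print(f"hours available:       {WORKDAY_HOURS}")
print(f"un-recovered deficit: ~{recovery_hours - WORKDAY_HOURS:.0f} hours")
```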

A single figure silhouetted against a vast starfield — a tiny conscious processor facing an incomprehensible volume of information

Now add AI to this picture.

Every AI interaction is a context switch. You formulate a prompt, shift your attention to the output, evaluate it, decide whether to iterate or accept, then switch back to your primary task. Each cycle pulls you out of your own cognitive flow and drops you into a different mode of thinking: from generating to evaluating.

That’s even if you’re only running a single agent session at a time.

I notice this viscerally when I’m working with Alexandria. I’ll be deep in a writing flow, hit a point where I need a data point or a structural suggestion, fire off a request to one of my research agents, and by the time the response comes back, I’ve lost the thread of the argument I was building. The agent gave me exactly what I asked for. But my context window had already moved on, and reloading the previous state costs me.

And that’s just within a single writing session. Zoom out to my actual workweek, jumping between building out Alexandria’s platform, spinning up custom agent workflows for clients, and handling everything else that connects the two, and it starts to feel like two steps back before you get the leap forward.

This is the attention tax. And AI amplifies it, because AI responses are fast enough that you never get the natural recovery period that comes with slower information retrieval. When you had to walk to a bookshelf, flip through an index, and find the right page, that physical process gave your brain time to maintain its state.

When the answer arrives in 1.5 seconds, you’re switching cognitive modes faster than your 10-bit-per-second processor can keep up with.


Why This Matters If You Build Things

I spend most of my professional time thinking about how to build systems that make people, teams, and companies more effective. The agentic AI maturity model, the spectrum from prompts to teams, Alexandria itself: all of it is infrastructure designed to amplify human capability.

After a year and a half of building this stuff, I think we’re designing AI systems around the wrong constraint.

Every major AI product today is optimized for the model’s capabilities. Bigger context windows, faster inference, better reasoning, and more tool use. The entire competitive landscape is defined by what the AI can do.

Almost nobody is designing around what the human can absorb.

Think about what a well-designed AI workflow should account for:

The compression budget. If a human can consciously process ~150,000 token equivalents per day, and you’re asking them to review AI output that runs to thousands of tokens per interaction, how many meaningful AI interactions can they actually have before quality degrades? My rough estimate is somewhere between 30 and 50 deep interactions per day before evaluation quality falls off a cliff (there’s a rough sketch of this budget after these three points). Most power users blow past this before lunch.

The attention residue cost. Every AI interaction is a context switch. If you’re designing a workflow where a human bounces between five different agent outputs in rapid succession, you’re not creating a 5x productivity gain. You’re creating five overlapping attention residue penalties that compound into mush, especially over time.

The compression asymmetry. An LLM can hand you 2,000 tokens of densely structured analysis. Reading and genuinely understanding those 2,000 tokens takes you three to five minutes of focused attention. The model generated it in two seconds. The human is always the bottleneck, and workflow design needs to respect that rather than pretend it away.
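
Here’s the compression-budget sketch I promised above. The tokens-per-interaction range is my assumption about what a “deep” interaction actually burns: prompt, output, and the attention it takes to evaluate it:

```python
# How many deep AI interactions fit inside one day's conscious budget?
DAILY_TOKEN_BUDGET = 150_000                  # conscious throughput estimate from earlier
TOKENS_PER_DEEP_INTERACTION = (3_000, 5_000)  # assumed cost of one deep interaction

high = DAILY_TOKEN_BUDGET // TOKENS_PER_DEEP_INTERACTION[0]
low = DAILY_TOKEN_BUDGET // TOKENS_PER_DEEP_INTERACTION[1]
print(f"roughly {low} to {high} deep interactions per day")  # ~30 to ~50
```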

This is why I’ve been moving Alexandria increasingly toward autonomous operation with human checkpoints rather than continuous human-in-the-loop collaboration. Not because I trust the agents to be right about everything (I don’t), but because I’ve learned that the most valuable use of my human context window is judgment at decision points, not monitoring of process steps.

The agents research, draft, analyze, and structure. I review, decide, redirect, and approve. The compression that my 10-bit-per-second consciousness provides is the ability to look at an agent’s output and feel whether it’s right or wrong before I can fully articulate why. Right now, this is the highest-value use of my bandwidth.

Designing for anything else is fighting biology.

A person standing at a minimal control console, with a vast autonomous system operating behind them — human judgment at decision points governing larger automated processes


The Counterargument (Because I’m Not a Doomer)

I want to be honest about where this argument has limits, because I’ve seen too many “AI is making us stupid” takes that grab the scariest stat and ignore everything that pushes back on it.

First, cognitive offloading isn’t inherently destructive. Writing was a cognitive offloading technology. So were books, libraries, and the printing press.

Socrates literally argued that written language would destroy memory, and in a narrow sense, he was right. We don’t memorize epic poems anymore. But the net effect of writing on human cognitive capability has been so obviously positive that the trade-off barely registers.

Second, the brain is plastic. The same neuroplasticity that allowed taxi drivers to grow their hippocampi means that cognitive atrophy from AI offloading is likely reversible for adults who deliberately re-engage those faculties. The concern about children not building capacity in the first place is more serious, but even that isn’t a foregone conclusion. Kids growing up with calculators still learn arithmetic. The question is whether AI’s broader cognitive replacement changes the equation, and we genuinely don’t know yet.

Third, there’s a strong argument that AI expands the effective human context window rather than shrinking it. I can hold more complex mental models because Alexandria handles the supporting detail work. My judgment at decision points is arguably better because I’m spending less cognitive budget on research mechanics and more on synthesis and evaluation. The total conscious throughput might be the same, but the quality of what fills it might be higher.

I believe all three of these things are true. I also believe they don’t fully cancel out the concerns.

The GPS analogy holds because the mechanism is the same, even if the magnitude differs. We will trade capability for convenience. Some of that trade will be worth it, and some won’t. Unfortunately, we’ll only figure out which is which in retrospect.


The Human Context Window Is the Whole Point

The human context window, our 150,000-token-per-day, 10-bit-per-second, perpetually distracted, half-daydreaming system, looks like a catastrophic limitation when you compare it to what machines can do. And if all you care about is throughput, it is.

But throughput was never the point.

We’re beings of quality over quantity. The point of human consciousness, the reason evolution built this particular compression algorithm, is that it converts an incomprehensible flood of information into meaning. Not data, not tokens, but the kind of meaning where you walk into a room and know something’s wrong before anyone says a word.

Every AI system I’ve built has reinforced this for me. The agents are better than I am at research, drafting, structural analysis, and catching inconsistencies. They’re not better at knowing what matters. And it’s not even close.

This gap isn’t closing next quarter. Human cognition works this way on purpose. The brutal compression, the 10 bits per second, and the forced prioritization are the mechanisms that produce judgment. Remove them, and you don’t get a faster human. You get a slower, worse AI.

So protect the human context window instead of trying to engineer around it. Build workflows that deliver the right 2,000 tokens at the right moment instead of dumping 50,000 tokens and hoping the human sorts it out. Protect the deep-focus time where your 10-bit-per-second processor does its best work, because that’s where the insights come from that no context window, no matter how large, can generate on its own.

The human context window is small. Painfully, laughably small compared to the machines.

That’s the feature.


This piece started as a question I couldn’t stop asking after writing my 2026 predictions. If you’re building agentic systems and grappling with the human side of the equation, I’d like to hear how you’re designing around these constraints. Find me on Twitter/X or LinkedIn.

PATRICK MCGRATH

Product manager with 10+ years in gaming, having shipped 8 projects that hit $100M+ lifetime revenue (3 exceeded $500M). Currently building in Web3 gaming and writing about crypto, gaming, AI, and product management. Exploring the intersections where technology meets philosophy meets possibility.

TOPICS

#AI #Cognitive Science #Agentic AI #Product Design #Human-AI Interaction #Neuroscience