Why You Cannot Leave Everything to Claude Code
Currently Impossible vs Absolutely Impossible
Currently Impossible
Issues that might be solved with enough compute and time. I don't expect them to be solved soon, though; I'm not an expert, but I suspect they won't go away until the O(N^2) cost of Attention is solved.
Context Window is Too Small
For example, even Claude Opus 4.1 (as of 2025/09/06), a frontier model, has only a 200K-token context window. That sounds big, but at roughly 10 tokens per line, it's only about 20K lines. Once you account for input/output and the accumulated conversation history, the budget for actual code drops below 10K lines (see the back-of-the-envelope sketch after this list). Past that point, amnesia sets in. Specifically:
- Ignores API specs even when provided
- Fills in lost context with inference (resulting in hallucination)
- Gradually loses the design principles and drifts into a completely different approach/structure
- Produces tons of duplicate code
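A minimal sketch of that budget math; the token counts are rough assumptions, not measured values:

```python
# Back-of-the-envelope context budget (all numbers are rough assumptions).
CONTEXT_WINDOW = 200_000   # tokens: Claude Opus 4.1's advertised window
TOKENS_PER_LINE = 10       # rough average for source code
OVERHEAD = 0.5             # fraction eaten by prompts, replies, history

print(CONTEXT_WINDOW // TOKENS_PER_LINE)                      # 20000 lines if code filled the whole window
print(int(CONTEXT_WINDOW * (1 - OVERHEAD) / TOKENS_PER_LINE)) # ~10000 lines once overhead is counted
```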
Cannot Do Large Refactoring
This was noticeable when I gave Claude Code a ~10K-line frontend codebase to refactor.
The instruction was to follow the DRY principle: extract duplicated logic and componentize where possible. After some extraction, build errors appeared everywhere, and the AI somehow decided to stop using the components it had just created. Then, in the same conversation, I asked it to delete unused code and files. It reread all the components it had created, declared they were "not being used," and deleted every one of them.
This is an extreme example, but over a long context, consistency is lost and the focus narrows to recent actions only. The model digs holes, forgets why it dug them, and fills them back in.
https://arxiv.org/abs/2307.03172 (Lost in the Middle: How Language Models Use Long Contexts)
Conversely, a ~2,000-line single-file backend codebase was refactored successfully.
Potemkin Understanding Everywhere
The model can explain concepts correctly. But it cannot apply them in practice. And it can correctly recognize that its own practice is wrong.
The same thing happens in code generation. For example, when building user credit processing, the AI understands during requirements discussion that transactions are needed, but doesn't implement them when it actually writes the code. Then, in a different context, it can correctly identify the issue.
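A minimal sketch of what "transactions are needed" means here, using sqlite3; the `credits` table and function name are illustrative assumptions, not from any real project:

```python
import sqlite3

def transfer_credits(db_path: str, from_user: int, to_user: int, amount: int) -> None:
    """Debit one user and credit another atomically (hypothetical schema)."""
    conn = sqlite3.connect(db_path)
    try:
        # Without this explicit transaction, a crash between the two UPDATEs
        # could debit one user without crediting the other -- exactly the bug
        # the model "knows about" at requirements time but omits when coding.
        with conn:  # commits on success, rolls back on exception
            cur = conn.execute(
                "UPDATE credits SET balance = balance - ? "
                "WHERE user_id = ? AND balance >= ?",
                (amount, from_user, amount),
            )
            if cur.rowcount == 0:
                raise ValueError("insufficient balance")
            conn.execute(
                "UPDATE credits SET balance = balance + ? WHERE user_id = ?",
                (amount, to_user),
            )
    finally:
        conn.close()
```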
Absolutely Impossible
Problems that remain even with infinite compute or mathematical fixes.
XY Problem Will Always Occur
For example: “Please build an HTTP server that returns ‘hello,world’ in Python.” The model must fill in all sorts of missing context:
- Python version (3.11? 3.12? etc.)
- Library (http.server? flask?)
- Request type (GET?)
- Port number (80? 8080?)
- Content-Type?
- Plans for extension? (Hardcode or abstract?)
Over 100 branching possibilities in this alone.
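A rough way to see why: multiply out even conservative option counts per decision. The counts below are my own illustrative assumptions, not an exhaustive enumeration:

```python
import math

# Illustrative option counts per unstated decision (assumed, not exhaustive).
choices = {
    "python_version": 3,   # 3.11, 3.12, 3.13
    "library": 2,          # http.server, flask
    "request_type": 2,     # GET only, or GET+POST
    "port": 3,             # 80, 8080, 8000
    "content_type": 2,     # text/plain, text/html
    "extensibility": 2,    # hardcode vs. abstract for later features
}
print(math.prod(choices.values()))  # 144 -- already past 100 combinations
```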
Let’s ask Claude:
When asked "Build an HTTP server that returns Hello World in Python," here's the context I implicitly fill in:
## Basic Assumptions
- **Purpose**: Simple server for learning/testing
- **Python Version**: Python 3.x (current mainstream version)
- **Execution Environment**: Assumes local development environment
## Technical Choices
- **Library**: Use standard library `http.server` (no external dependencies)
- **Port Number**: Common development ports like 8000 or 8080
- **HTTP Method**: Prioritize GET request handling
- **Response Format**: Plain text or simple HTML
## Code Characteristics
- **Simplicity Priority**: Minimal code that works
- **Error Handling**: Keep it basic
- **Security**: Not intended for production use
- **Extensibility**: Basic structure for easy feature addition
## Execution Method
- Runnable directly from command line
- Stoppable with Ctrl+C
- Display URL on startup
Based on these implicit assumptions, I provide the simplest, most understandable code. If there are specific requirements (Flask usage, specific port, JSON support, etc.), I adjust accordingly.
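For reference, here is one concrete program those assumptions collapse into: a minimal sketch using the standard library, with the port and Content-Type picked arbitrarily from the options above:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # One arbitrary resolution of the unstated requirements:
        # GET only, plain text, body exactly "hello,world".
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.end_headers()
        self.wfile.write(b"hello,world")

if __name__ == "__main__":
    # Port 8000 is yet another unstated choice (80? 8080?).
    server = HTTPServer(("", 8000), HelloHandler)
    print("Serving on http://localhost:8000 (Ctrl+C to stop)")
    server.serve_forever()
```

Every line of it encodes a decision the user never actually stated.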
These functional requirements ultimately cannot be known without reading the latent space of the user's brain. (The model could solve this by asking back, but if you can answer those questions accurately, you're already an engineer.)
This is difficult even for humans, which is why multimodal reasoning machines called PMs exist. Scaling a model with text-only access won't solve it.
What Happens Without Understanding These?
Non-engineers (a huge generalization) occasionally say engineers are no longer needed, but I suspect they simply don't notice the AI meaninglessly digging and filling holes. If you delegate a reasonably large project entirely to AI, it will suddenly stop making progress at some point, namely when the project exceeds the model's maximum context.
Engineers Going Forward
Before getting to that, what is an engineer? Engineers logically decompose problems and perform new reasoning. The essence is deduction and induction, not memorizing AWS specs or language syntax.
That fairly meaningless KV-like memorization is being, or will be, replaced by AI, so in the near term engineers will handle what AI can't. In that sense there is a shift upstream: AI writes the code, humans think and take responsibility.