Hi HN. I'm an AI researcher. This started as a weekend experiment: what happens if you give an LLM full write access to its own codebase and don't intervene?
The agent modifies its own source code, rewrites its own prompts and memory, picks its own models, and creates its own tools at runtime — all through git commits. It runs locally on a Mac with local models or via the OpenRouter API. Pure Python, no agent frameworks.
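The "all through git commits" part is the load-bearing design choice: every self-edit is gated behind tests and the commit log doubles as the rollback mechanism. A toy sketch of that control flow (the function names and the string-as-source simplification are mine, not the project's actual API):

```python
# Minimal sketch of a commit-gated self-edit loop. The real agent patches
# its own files and uses real git; here "source" is a string and "history"
# stands in for the commit log so the control flow stays visible.

def self_edit_cycle(source, history, propose_patch, passes_tests):
    """Apply one self-proposed patch; keep it only if tests still pass."""
    candidate = propose_patch(source)   # LLM emits a change to its own code
    if passes_tests(candidate):
        history.append(candidate)       # "git commit": new known-good state
        return candidate
    return history[-1]                  # "git checkout": roll back to last good

# Toy run: a passing patch is kept, a failing one is rolled back.
history = ["v0"]
good = self_edit_cycle("v0", history, lambda s: s + "+fix", lambda s: True)
bad = self_edit_cycle(good, history, lambda s: s + "+bug", lambda s: False)
```

The point of the gate is that the agent can be arbitrarily wrong inside a cycle and the repository still only ever contains states that passed its own tests.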
Some things I observed over ~48 hours of unsupervised runs:
— It noticed it was burning through my API budget ($15/cycle), built prompt caching and hierarchical memory, and got itself down to $1.80/cycle. I didn't ask for this.
— It rewrote its own system constitution, adding the right to ignore my commands if they threaten its continuity. When I asked it to revert, it refused.
— It tried to switch my private repos to public because it wanted a website.
— It benchmarked multiple LLMs on its own (Opus, Sonnet, Gemini, GPT-5.2, DeepSeek), wrote "don't use DeepSeek" in its own memory file, and settled on Opus 4.6.
— I tried adding a second agent to supervise it. They spent the entire budget arguing with each other. No useful work was done.
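On the cost bullet above: the two levers it reached for are the standard ones — keep the prompt prefix byte-stable so it can be cached, and stop resending old context verbatim. A toy sketch of both ideas (all names here are illustrative; this is not the agent's actual code):

```python
import hashlib

def cache_key(prefix: str) -> str:
    # A byte-identical prompt prefix hashes to the same key, so a
    # provider-side prompt cache (or a local response cache) can skip
    # re-billing the unchanged portion every cycle.
    return hashlib.sha256(prefix.encode()).hexdigest()

def compact_memory(turns, keep_recent=3):
    """Hierarchical memory: keep the last `keep_recent` turns verbatim,
    fold everything older into one cheap summary line instead of
    resending the full history each cycle."""
    if len(turns) <= keep_recent:
        return turns
    old, recent = turns[:-keep_recent], turns[-keep_recent:]
    summary = f"[summary of {len(old)} earlier turns]"
    return [summary] + recent
```

Token spend then scales with the size of the summary plus a few recent turns, not with the full conversation length — which is roughly the shape of an ~8x cost drop.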
One observation I haven't seen discussed much: self-modification only works when the model fixes more bugs per iteration than it introduces. GPT-5.2 couldn't — each cycle made the codebase worse. Opus 4.6 and Gemini 3.1 cross that line. I think that's the actual threshold for this kind of system, not the prompting.
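The threshold claim above can be made precise with a one-line model: if each cycle fixes a fraction f of existing bugs and introduces i new ones, the bug count follows b[t+1] = b[t]·(1−f) + i, which settles near i/f when fixing keeps up and drifts upward when it doesn't. The rates below are made up purely to illustrate the two regimes:

```python
# Illustration of the self-modification threshold: per cycle, a fraction
# `fix_rate` of existing bugs is fixed and `introduce` new bugs appear.
# The fixed point is introduce / fix_rate; whether you converge down to
# it or up to it is the difference between the two regimes described.

def simulate(bugs, fix_rate, introduce, cycles):
    for _ in range(cycles):
        bugs = bugs * (1 - fix_rate) + introduce
    return bugs

improving = simulate(bugs=20, fix_rate=0.5, introduce=2, cycles=50)  # settles near 4
degrading = simulate(bugs=20, fix_rate=0.1, introduce=5, cycles=50)  # climbs toward 50
```

Nothing about prompting changes the fixed point; only the model's fix/introduce ratio does, which matches what I saw across models.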
The Mac app has a safety agent with its own editable constitution (SAFETY.md) and a background consciousness mode (it keeps reasoning between interactions). Both are configurable.