Here's a related issue that took me a whole day to figure out: Claude Code telemetry pings were causing a total network failure when using CC with a local LLM via llama-server.
I wanted to use local LLMs (~30B) with Claude Code on my M1 Max MacBook Pro for a privacy-sensitive project. I spun up Qwen3-30B-A3B via llama-server and hooked it up to Claude Code, and after using it for an hour or so, found that my network connectivity was totally borked: the browser wasn't loading any web pages at all.
Some investigation showed that Claude Code assumes it's talking to the Anthropic API and sends event logging requests (/api/event_logging/batch) to the llama-server endpoint. The local server doesn't implement that route and returns 404s, but Claude Code retries aggressively. These failed requests pile up as TCP connections in TIME_WAIT state, and on macOS this can exhaust the ephemeral port range. So my browser stopped loading pages, my CLI tools couldn't reach the internet, and the only option was to reboot my MacBook.
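If you want to check whether you're heading into that state before things fall over completely, here's a rough diagnostic (assuming the stock macOS netstat and sysctl tools) that counts TIME_WAIT sockets against the ephemeral port range:

    # Rough check for ephemeral-port exhaustion on macOS
    # (assumes the stock netstat and sysctl are available).
    import subprocess

    def run(*cmd):
        return subprocess.run(cmd, capture_output=True, text=True).stdout

    # Count TCP sockets stuck in TIME_WAIT.
    time_wait = sum("TIME_WAIT" in line for line in run("netstat", "-an", "-p", "tcp").splitlines())

    # The ephemeral port range macOS uses for outgoing connections (default 49152-65535).
    first = int(run("sysctl", "-n", "net.inet.ip.portrange.first"))
    last = int(run("sysctl", "-n", "net.inet.ip.portrange.last"))

    print(f"TIME_WAIT sockets: {time_wait} of {last - first + 1} ephemeral ports")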
After some more digging (with Claude Code's help, of course) I found that the fix was to add this setting in my ~/.claude/settings.json:
{
  // ... other settings ...
  "env": {
    "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1"
  }
  // ... other settings ...
}
I added this to my local-LLM + Claude Code / Codex-CLI guide here: https://github.com/pchalasani/claude-code-tools/blob/main/do...
I don't know if others have faced this issue; hopefully this is helpful, or maybe there are other fixes I'm not aware of.
I'm...rather confused why the results here are surprising. The title and first paragraph suggest something unusual, like analytics or sending your whole codebase, but it's just sending the prompt + context.
This is how every LLM API has worked for years; the API is a stateless token machine, and the prompts + turns are managed by the client application. If anything it's interesting how standard it is; no inside baseball, they just use the normal public API.
Thanks — glad it resonated! Part 2 should uncover a lot of the magic behind the scenes. And thanks for sharing the link. Running Claude Code against a local LLM is a really interesting direction, but I need more RAM...
I built a MITM proxy to inspect Claude Code’s network traffic and was surprised by how much context is sent on every request. This is Part 1 of a 4-part series focusing only on the wire format and transport layer. Happy to answer technical questions.
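For anyone who wants to poke at this themselves, the basic idea is small. Here is a minimal sketch using a mitmproxy addon (one possible approach, not necessarily the exact setup from the post): run it with mitmdump, point Claude Code at the proxy via HTTPS_PROXY, and trust mitmproxy's CA certificate in the client.

    # log_claude.py - log Claude Code's API requests through mitmproxy.
    # Run: mitmdump -s log_claude.py
    # Then launch Claude Code with HTTPS_PROXY=http://127.0.0.1:8080
    # and mitmproxy's CA certificate trusted.
    import json
    import logging
    from mitmproxy import http

    def request(flow: http.HTTPFlow) -> None:
        if "anthropic.com" not in flow.request.pretty_host:
            return
        body = flow.request.get_text() or ""
        logging.info("%s %s (%d bytes)", flow.request.method, flow.request.path, len(body))
        if flow.request.path.startswith("/v1/messages"):
            payload = json.loads(body)
            # How many conversation turns are replayed in this single request
            logging.info("  messages in payload: %d", len(payload.get("messages", [])))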
Thanks! I tried doing a similar comparison with Codex CLI and Cursor, but hit a wall. Codex doesn’t seem to respect standard proxy env vars, and Cursor uses gRPC. Claude Code was the only one that was straightforward to inspect. Opencode looks like a great next candidate.
Seems a bit obvious that all the information Claude Code sends to the LLM would be sent to Anthropic, no? Isn't that the point of using it via Azure or AWS Bedrock, for the secrecy guarantees they provide you?
Yes, it's obvious the data goes to Anthropic. What wasn't obvious to me was what exactly is included and how it's structured: system prompt size, full conversation replay, file contents, git history, tool calls. The goal was to understand how things work at the wire level.
On Azure/Bedrock - good point! My understanding is that they route requests through their own infrastructure rather than to Anthropic directly, which does change the trust boundary, but my focus here was strictly on what the client sends; that payload structure is the same regardless of backend.
It's the last 5 commits, not the full history. Here's what actually gets sent in the system prompt:
gitStatus: This is the git status at the start of the conversation...
Current branch: main
Main branch: main
Status: (clean)
Recent commits:
6578431 chore: Update security contact email (#417)
0dc71cd chore: Open source readiness fixes (#416)
...
Enough for Claude to understand what you've been working on without sending your entire repo history.
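For what it's worth, that block maps onto ordinary git commands. This isn't Claude Code's actual implementation, just a rough equivalent of how that context could be gathered:

    # Rough equivalent of the gitStatus block (not Claude Code's actual code).
    import subprocess

    def git(*args):
        return subprocess.run(["git", *args], capture_output=True, text=True).stdout.strip()

    branch = git("rev-parse", "--abbrev-ref", "HEAD")
    status = git("status", "--porcelain") or "(clean)"
    commits = git("log", "--oneline", "-5")   # last 5 commits only, not the full history

    print(f"Current branch: {branch}")
    print(f"Status: {status}")
    print("Recent commits:")
    print(commits)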
I am surprised that... if they have a session ID of some sort for your chat, they make you re-send the entire message history each time? Not a single kind of cache or stateful proxy/buffering mechanism? Guessing that the extra cost in bandwidth is cheaper than having to develop and maintain that? Seems kind of like an obvious optimization/design tradeoff they could eventually decide to change one day?
Statelessness simplifies scaling and operational complexity. They cache the system prompt, but otherwise each request is fully self-contained. It’s an obvious tradeoff, and I wouldn’t be surprised if they move toward some form of server-side state or delta encoding once the product stabilizes.
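To make "fully self-contained" concrete: every turn re-sends the whole conversation in the messages array, and only the new reply comes back. A toy example against the public Messages API (using the anthropic Python SDK; the model name is just a placeholder):

    # Stateless API: the client owns the history and replays it on every turn.
    import anthropic

    client = anthropic.Anthropic()   # reads ANTHROPIC_API_KEY from the environment
    history = []                     # client-side conversation state

    def ask(user_text):
        history.append({"role": "user", "content": user_text})
        resp = client.messages.create(
            model="claude-sonnet-4-20250514",          # placeholder model name
            max_tokens=1024,
            system="You are a helpful assistant.",     # resent (and cacheable) each request
            messages=history,                          # the ENTIRE history, every request
        )
        answer = resp.content[0].text
        history.append({"role": "assistant", "content": answer})
        return answer

    ask("What's in main.py?")
    ask("Now refactor it.")   # this request carries both earlier turns again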
I feel it has to be something bigger; if they're just taking client-side input for the system prompt, that seems like a security issue. Doesn't this mean I could reprogram Claude at its core?
You can change the system prompt Claude Code sends, which changes how the agent frames behavior, but Claude still has internal and server-side safety layers, so removing or rewriting the client system prompt won't let you magically bypass those. I think of the client system prompt more as agent configuration than as the primary safety net — it shapes behavior, but it's not the final authority. I'm covering this in Part 2 — breaking down what's actually in the system prompt and how the client-side safety framing is constructed.
I use both Claude Code and Xcode with a local LLM (running with LM Studio) and I noticed they both have system prompts that make it work like magic.
If anyone reading this is interested in setting up Claude Code to run offline, I followed these instructions:
https://medium.com/@luongnv89/setting-up-claude-code-locally...
My personal LLM preference is Qwen3-Next-80B with 4-bit quantization, about ~45 GB in RAM.
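The ~45 GB figure lines up with simple back-of-the-envelope math (weights only; KV cache and runtime overhead add a few more GB):

    # Back-of-the-envelope memory for an 80B model at 4-bit quantization.
    params = 80e9
    bits_per_weight = 4.5    # ~4-bit quant plus scales/zero-points (assumption)
    print(f"~{params * bits_per_weight / 8 / 1e9:.0f} GB for weights alone")   # ~45 GB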