Ask HN: How are you controlling costs and enforcing limits for LLM calls?

I’ve been running into an issue with LLM/agent systems where unexpected loops or repeated calls can quickly drive up costs.

Most tools I’ve seen focus on observability (logs, traces, dashboards), not on actually enforcing limits at runtime, e.g. refusing the next call once a budget is hit.

Curious how people here are handling this in production:

- Are you enforcing hard limits (budget, rate, etc.) or just monitoring?

- Do you handle this at the app level or via some middleware/proxy?

- Have you built something in-house for this?
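For reference, the simplest app-level version of a hard limit I can think of looks roughly like this. It is a minimal sketch, not any existing library: `BudgetGuard`, `BudgetExceeded`, and the per-call dollar figures are hypothetical. It assumes a single-threaded agent loop where you can estimate a call's cost beforehand and learn the real cost from the provider's reported token usage afterward.

```python
import time
from collections import deque


class BudgetExceeded(RuntimeError):
    """Raised when a call would break a hard spend or rate limit (hypothetical)."""


class BudgetGuard:
    """Hypothetical in-process guard: a hard USD budget plus a sliding-window call cap.

    Assumes single-threaded use; concurrent callers would need a lock or shared store.
    """

    def __init__(self, max_usd, max_calls, window_s, clock=time.monotonic):
        self.max_usd = max_usd
        self.max_calls = max_calls
        self.window_s = window_s
        self.clock = clock
        self.spent_usd = 0.0
        self._calls = deque()  # timestamps of calls inside the current window

    def check(self, estimated_usd):
        """Run BEFORE the request goes out; raises instead of just logging."""
        now = self.clock()
        # Forget calls that have slid out of the rate window.
        while self._calls and now - self._calls[0] >= self.window_s:
            self._calls.popleft()
        if len(self._calls) >= self.max_calls:
            raise BudgetExceeded(
                f"rate limit: {self.max_calls} calls per {self.window_s}s")
        if self.spent_usd + estimated_usd > self.max_usd:
            raise BudgetExceeded(
                f"budget: ${self.spent_usd:.4f} spent of ${self.max_usd:.2f}")
        self._calls.append(now)

    def record(self, actual_usd):
        """Charge the actual cost once the provider reports usage."""
        self.spent_usd += actual_usd


def fake_llm_call():
    # Hypothetical stand-in for a real client call; returns (text, cost in USD).
    return "ok", 0.02


# Usage: a runaway agent loop gets cut off by the guard, not by the invoice.
guard = BudgetGuard(max_usd=0.05, max_calls=10, window_s=60)
for step in range(5):
    try:
        guard.check(estimated_usd=0.02)
    except BudgetExceeded as exc:
        print(f"stopped at step {step}: {exc}")
        break
    _, cost = fake_llm_call()
    guard.record(cost)
```

The point of the sketch is that the check happens before the request and fails closed. A middleware/proxy would do the same thing, but with the counters kept somewhere shared so that multiple processes or agents draw from one budget.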

Feels like an unsolved problem, especially with agents.

Would love to hear how others are dealing with it.

2 points | by 8dazo 3 hours ago