Status: pre-launch · building
Day-one providers: all major LLM APIs
Two modes: BYOK or managed tokens
Hosted on: AWS EKS
Beta access: free
FILED · PRE-LAUNCH EDITION
BUILT IN PUBLIC · NO TRACKERS · NO BS

YOUR AI BILL HAS A PROBLEM. IT IS NOT A SECRET.

LLMBurner meters every dollar your AI burns — in the products you ship and across your team. One proxy for your code's LLM calls; a desktop app + Chrome extension for everyone else. Every cost, one panel. Currently being built — get in early.

Solo founder · I personally reply · krupesh.dev
ENTERPRISE LLM SPEND · 6 MONTHS
$3.5B → $8.4B
doubled in half a year · most teams have no idea where it went
· Menlo Ventures · 2025
TEAMS WHOSE LLM BILL EXCEEDED THEIR FORECAST
78%
within the first year of deployment
· a16z survey · 2024
PRODUCT SPEND · TO INSTALL
1 url
proxy your code's LLM calls · no SDK swap · no lock-in
TEAM SPEND · WHO BURNS WHAT
per seat
desktop app + Chrome extension · per person, per machine · coming soon
CLAIM 01.
Spend is exploding.
$8.4B

Enterprise LLM spend more than doubled in six months. The teams writing those cheques mostly can't tell you which feature spent what.

CLAIM 02.
Budgets miss every time.
78%

Of enterprise AI teams report LLM spend exceeded their first-year projections. Surprise bills are the norm, not the exception.

CLAIM 03.
Not re-routing is overpaying.
10×/yr

The cost of equivalent-quality inference has been falling roughly 10× a year. Teams that don't re-route stay locked to last year's prices and pay multiples for the same answer.

CLAIM 04.
The fix is one line.
1 url

To meter your product's own LLM calls — one base URL, no SDK swap, no middleware. For your team's AI spend, the desktop app + Chrome extension are coming.

SRC: How the proxy works.
FROM THE FOUNDER · PRE-LAUNCH · OPEN BUILD

I'm building this because I needed it.

I was shipping a small AI product — a Claude wrapper, the kind every indie builder ships these days. Every user request hit the API. Every reply ate tokens. And every month the bill arrived without telling me which feature, which user, or which prompt ate the most.

I looked for a tool that would just meter the calls and tell me where the money was going. What I found was either an enterprise observability suite priced for Series B, or a logging library I'd have to wire into every SDK. Nothing in between. Nothing for someone who just wanted to see the bill, route the cheap calls to a cheap model, and cap the budget before something runs away overnight.

So I'm building it. Solo, on AWS, in public. If you've ever opened an LLM invoice and squinted — I'd love to talk to you before I write the next line of code.

WHAT IT WILL LOOK LIKE.
ALL THE DATA. ONE SCREEN.

8 panels · design preview
numbers below are illustrative
Telemetry · spend / 24h · USD · LIVE
$84.17 · −12.4%
vs. yesterday · forecast $340/mo · trending down
OpenAI · gpt-5 · 33% · $27.84
Anthropic · opus 4.7 · 31% · $26.22
Anthropic · sonnet 4.6 · 16% · $13.40
Google · gemini 2.5 · 11% · $9.63
Groq · llama 3.3 · 9% · $7.08
AI advisory · this week · RANKED BY $
Route support-bot through Haiku 4.5 instead of Opus 4.7.
opus 4.7 → haiku 4.5
40 prompts tested · quality Δ ≈ 0%
Saved / mo: −$340
Quality drop: ~0%
Alerts · last 4h · 5 EVENTS
  • 14:07 · CRIT · support-bot hit $30/day cap · calls blocked
  • 13:54 · WARN · Opus 4.7 at 80% of monthly budget
  • 13:31 · INFO · Suggested: gpt-5 → nano on /summary · −$58
  • 12:48 · WARN · DeepSeek V3 latency p95 above SLO · 1.84s
  • 12:11 · OK · Gemini Flash repriced · −13.6% / 1M tokens
Playground · same prompt, every model · RAN 14:06
"Reset my password. The link in the email isn't working."
opus 4.7 · current
"I understand how frustrating that is. Let me walk you through resetting your password step by step…"
$0.0186 · 412 ms · 186 tok
haiku 4.5 · ★ PICK
"Sorry, try this fresh reset link: [link]. Expires in 30 min. If it still fails, reply with the error."
$0.0014 · 189 ms · 42 tok
gpt-5
"Apologies for the trouble! I can help you reset your password. Could you click this new link: [link]…"
$0.0098 · 340 ms · 58 tok
Prompt history · 6 EDITS
  • system_v3.2 · @you · 2d · live
  • system_v3.1 · @maya · 5d · pinned
  • system_v3.0 · @you · 7d · rolled back
  • system_v2.9 · @maya · 12d · archived
Live pricing · per 1M tokens · UPDATED 14:00
Model · In · Out · Change
haiku 4.5 · $0.80 · $4.00 · −12%
gpt-5 nano · $0.05 · $0.40 · 0%
gemini flash · $0.38 · $1.50 · −14%
llama 3.3 · $0.09 · $0.59 · 0%
sonnet 4.6 · $3.00 · $15.00 · 0%
opus 4.7 · $15.00 · $75.00
Call heatmap · 7d · BY HOUR
[Heatmap preview: MON to SUN by hour of day (ticks at 0, 3, 6, 9, 12, 15, 18, 21), shaded LOW to HIGH by calls / hr]
PEAK · Tue 11:00 · 48 calls/hr
COSTLIEST · Mon 14:00 · $4.20/hr
QUIETEST · Sun 03:00 · 0 calls
Devices & seats · burn by source · 3 PCS · 2 SEATS
One company, every machine: desktop app + Chrome, per PC, per person.
MacBook Pro · desktop · @krupesh · 41% · $34.50
Mac Studio · desktop · @maya · 22% · $18.40
WIN-DEV-01 · chrome · @krupesh · 19% · $16.00
LINUX-CI · chrome · service · 12% · $10.10
misc · 2 idle seats · 6% · $5.17

SEEN ENOUGH?
HELP ME BUILD THE RIGHT THING.

I'm before v1. Talk to me for 20 minutes — show me your stack, your worst LLM bill, the feature you wish existed — and your priorities go straight into v1. The first 10 callers get free beta access for life.

No sales people · no funnel · your priorities go straight into v1
BOOK 20 MIN —→

WHERE YOUR
BILL COMES FROM.

Every route, every feature flag, every model — attributed in real time. No more looking at a monthly invoice and guessing which feature caused the spike.

  • Attributed per route, per flag, per team
  • Daily, weekly, 30-day rollups
  • Export to CSV or query direct SQL
SPEND BY ROUTE · LAST 24H · $84.17 TOTAL
/support-bot · opus 4.7 · $32.00 · 38%
/summariser · sonnet 4.6 · $20.20 · 24%
/embeddings cron · nightly · $14.31 · 17%
/intent-routing · gpt-5 nano · $9.26 · 11%
/classify · haiku · $5.89 · 7%
misc · under 4 flags · $2.51 · 3%
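
For a feel of what "query direct SQL" could look like, here is a minimal sketch using an in-memory SQLite table. The `calls` table, its column names, and the sample rows are assumptions made up for illustration; they are not the real LLMBurner schema or real data.

```python
import sqlite3

# Hypothetical schema: one row per proxied call. Table and column names
# are assumptions for this sketch, and the rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (route TEXT, model TEXT, cost_usd REAL)")
conn.executemany(
    "INSERT INTO calls VALUES (?, ?, ?)",
    [
        ("/support-bot", "opus 4.7", 0.50),
        ("/support-bot", "opus 4.7", 0.25),
        ("/summariser", "sonnet 4.6", 0.15),
        ("/classify", "haiku", 0.10),
    ],
)


def spend_by_route(conn):
    """Return (route, total_usd, share_pct) rows, costliest route first."""
    total = conn.execute("SELECT SUM(cost_usd) FROM calls").fetchone()[0]
    rows = conn.execute(
        "SELECT route, SUM(cost_usd) AS usd FROM calls "
        "GROUP BY route ORDER BY usd DESC, route ASC"
    ).fetchall()
    return [(route, usd, round(100 * usd / total)) for route, usd in rows]


for route, usd, pct in spend_by_route(conn):
    print(f"{route:<14} ${usd:.2f}  {pct}%")
```

The same GROUP BY works per model or per feature flag if those are stored as columns alongside the route.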

TWO WAYS TO PLUG IT IN.

CHOOSE WHAT FITS · NO LOCK-IN EITHER WAY
MODE 01 · BYOK

Bring your own API keys.

FOR · teams already paying providers directly
  • Keys are stored encrypted on AWS · we never see plaintext
  • You keep your existing billing with OpenAI, Anthropic, etc.
  • LLMBurner only meters · routes · alerts · advises
  • Switch back to the provider direct anytime · just change the URL back
PRICING · flat per tracked request · well below what the meter saves
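
One way to keep the "change the URL back" escape hatch cheap is to read the base URL from configuration instead of hard-coding it. A minimal sketch, assuming a hypothetical `LLM_METERING` environment variable (not something LLMBurner defines); the two URLs are the ones from the before/after snippets on this page.

```python
import os

PROVIDER_URL = "https://api.openai.com/v1"    # provider direct
METERED_URL = "https://api.llmburner.app/v1"  # through the proxy


def resolve_base_url(env=None):
    """Pick the base URL from the hypothetical LLM_METERING variable.

    Any value other than "off" (case-insensitive) routes through the proxy.
    """
    env = os.environ if env is None else env
    if env.get("LLM_METERING", "on").lower() == "off":
        return PROVIDER_URL
    return METERED_URL


# With the OpenAI SDK this would be passed straight through, e.g.:
#   client = OpenAI(api_key="sk-...", base_url=resolve_base_url())
print(resolve_base_url({"LLM_METERING": "off"}))
```

Flipping the variable is then a config change, with no code edit or redeploy of the calling logic.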
MODE 02 · MANAGED TOKENS

Buy tokens through us. One bill.

FOR · solo builders + small teams tired of 5 provider invoices
  • One topup · access to every supported provider at our rates
  • Starting discount when you load tokens · no minimum
  • Single monthly invoice instead of five
  • Same observability, same routing, same caps
PRICING · provider rate + small margin · margin disclosed up front

ONE LINE. EVERY PROVIDER. NO LOCK-IN.

BEFORE.PY · BLIND
# no visibility
# no cost tracking
# no alerts

from openai import OpenAI

client = OpenAI(
  api_key="sk-...",
  base_url="https://api.openai.com/v1",
)
AFTER.PY · METERED
# every call tracked
# costs · tokens · latency
# budgets · alerts · routes

from openai import OpenAI

client = OpenAI(
  api_key="sk-...",
  base_url="https://api.llmburner.app/v1",
)

TWO SURFACES.
ONE SPEND PANEL.

PRODUCT SPEND + TEAM SPEND · ALL COMING SOON
SURFACE 01 · API / TOKEN

Your product's AI spend.

FOR · apps, products & internal tools you ship
  • One base URL · every model · no SDK swap
  • Attributed per route, per feature, per flag
  • Works in any language that can hit an API
  • Hard spend caps + alerts before something runs away
● COMING SOON
SURFACE 02 · DESKTOP + CHROME

Your team's AI spend.

FOR · seeing which employee is burning how much
  • Lightweight desktop app · drops in via npm
  • Chrome extension · tracks browser AI tools your team uses
  • Per-person, per-machine breakdown in dollars
  • Rolls into the same one panel as your product spend
● COMING SOON

WHAT YOU GET.

07 MODULES · ONE PROXY
01.

PER-FEATURE COST

Every call, attributed to the route or feature flag it came from. SQL on top if you want it. CSV export if you don't.

live · $/call
02.

SIDE-BY-SIDE MODELS

Same prompt across GPT, Claude, Gemini, Mistral, Groq. Stacked outputs, tokens, latency, cost. Pick the cheapest one that's correct.

8+ providers
03.

HARD LIMITS

Cap any provider by dollars or requests. When a runaway loop hits the ceiling, the burner stops the burn.

trips < 30s
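
The cap logic itself can be very small. A minimal sketch of a per-provider ceiling in dollars and/or requests; `SpendCap` and its fields are hypothetical names, and this is an illustration of the idea rather than the proxy's actual implementation (which would also need daily resets and shared state across proxy instances).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpendCap:
    """Hypothetical per-provider cap in dollars and/or request count."""
    max_usd: Optional[float] = None
    max_requests: Optional[int] = None
    spent_usd: float = 0.0
    requests: int = 0

    def allow(self, estimated_cost_usd: float) -> bool:
        """Record the call and return True only if it fits under both ceilings."""
        if self.max_requests is not None and self.requests + 1 > self.max_requests:
            return False
        if self.max_usd is not None and self.spent_usd + estimated_cost_usd > self.max_usd:
            return False
        self.spent_usd += estimated_cost_usd
        self.requests += 1
        return True


# A $30 cap: two $12 calls fit, the third would push spend to $36 and is blocked.
cap = SpendCap(max_usd=30.0)
results = [cap.allow(12.0) for _ in range(3)]
print(results)  # [True, True, False]
```

Checking before recording means a blocked call never counts against the budget, so the cap is a hard ceiling, not a soft alert.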
04.

LIVE PRICING

Every model, every provider, current price. Refreshed when providers push. No more stale spreadsheets.

real-time
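
Turning token counts into dollars is simple arithmetic once per-1M-token prices are known. A minimal sketch using two of the illustrative prices from the design preview above (they are placeholders, not real quotes); `PRICES` and `call_cost` are hypothetical names.

```python
# Illustrative prices from the design preview, USD per 1M tokens: (input, output).
PRICES = {
    "haiku 4.5": (0.80, 4.00),
    "opus 4.7": (15.00, 75.00),
}


def call_cost(model, input_tokens, output_tokens):
    """Cost of one call in USD from token counts and per-1M-token prices."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# Same call shape (1,000 tokens in, 200 out) priced on both models.
for model in ("opus 4.7", "haiku 4.5"):
    print(f"{model}: ${call_cost(model, 1000, 200):.4f}")
```

With these placeholder prices the call costs $0.0300 on opus 4.7 and $0.0016 on haiku 4.5, which is the kind of gap the routing advice is meant to surface.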
05.

VERSIONED PROMPTS

Folders per project. Pin the winning system prompt. Diff and roll back when the new one ships a regression.

∞ versions
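
Diff and rollback can be sketched with nothing more than the standard library. `PromptHistory` is a hypothetical in-memory store made up for this example, and the prompt texts are invented; it shows the idea, not the shipped feature.

```python
import difflib


class PromptHistory:
    """Hypothetical in-memory version store for one system prompt."""

    def __init__(self):
        self.versions = []  # list of (label, text), oldest first
        self.live = None    # index of the version currently served

    def publish(self, label, text):
        """Append a new version and make it live."""
        self.versions.append((label, text))
        self.live = len(self.versions) - 1

    def rollback(self):
        """Point 'live' at the previous version (if any) and return its label."""
        if self.live is None:
            raise ValueError("no versions published")
        if self.live > 0:
            self.live -= 1
        return self.versions[self.live][0]

    def diff(self, old, new):
        """Unified diff between two versions, addressed by index."""
        old_label, old_text = self.versions[old]
        new_label, new_text = self.versions[new]
        return list(difflib.unified_diff(
            old_text.splitlines(), new_text.splitlines(),
            fromfile=old_label, tofile=new_label, lineterm="",
        ))


history = PromptHistory()
history.publish("system_v3.1", "You are a support agent.\nBe concise.")
history.publish("system_v3.2", "You are a support agent.\nBe concise.\nAlways apologise twice.")
print("\n".join(history.diff(0, 1)))
print("live after rollback:", history.rollback())
```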
06.

PLAIN-ENGLISH INSIGHTS

"Route support-bot through Haiku 4.5, save ~42%, no quality drop." A sentence. Not a chart you have to interpret.

weekly digest
07.

DEVICE & SEAT BURN

Spend by machine and by person, not just by route. Desktop app and Chrome usage land in one panel — track a single account across three PCs, or a whole company with many seats, and see exactly which device is burning.

per-device · per-seat

WHERE THIS ACTUALLY IS.

I'm not going to fake a green LIVE badge to make this look further along than it is. Here's the real status — built, building, and the two things I'd genuinely like your opinion on.

BUILT · v0
  • The design & spec for the proxy interface (this page)
  • AWS EKS deployment plan, key-encryption approach
  • Two-mode pricing decided: BYOK + managed tokens
  • Confidence in the problem · validated by my own LLM bill
BUILDING NOW · v1 targets
  • Proxy router · OpenAI + Anthropic first
  • Per-call cost logging · batched calc on first ship
  • One dashboard view · spend breakdown by route
  • Hard spend caps · per-provider, per-day
  • Email alerts when caps near or break
HELP ME DECIDE · your input matters
  • Dashboard: when you log in, what's the first view you want? Spend by route? By model? A "what changed today" summary?
  • Streaming responses: the proxy has to handle SSE/streaming cleanly. What's broken about how today's tools do it for you?
Strong opinion? Book 20 min → · or reply to the waitlist email when it lands.

OBJECTIONS.

Is the product actually working right now?

No. LLMBurner is pre-launch — I'm a solo founder, building v1 right now. The console screenshots above are design previews of where I'm headed. The numbers in them are illustrative, not real customers. If you sign up, you'll get a personal email from me, and the first version when it ships. If that's not what you want, please don't sign up — I'd rather have 30 honest signups than 1,000 strangers.

How is this different from Helicone, LangSmith, OpenRouter?

Helicone and LangSmith are observability-first; OpenRouter is routing-first. LLMBurner combines both in one proxy and gives you a second mode where you can buy tokens through us at a small discount instead of juggling five provider accounts. That managed-tokens mode is the differentiator — nobody else does it that I've seen. If they ship it before me, I'll tell you here honestly.

Is this another middleware I have to maintain?

No. LLMBurner will be a hosted proxy on AWS. You change one base URL in your existing OpenAI/Anthropic SDK. No library to install, no service mesh, no agents. Point the URL back to the provider any time you want — there is no lock-in either way.

What about my prompts and data?

The plan: we don't train on your data, ever. By default we'd log sizes, tokens, latency, timestamps — not prompt or response bodies. Body logging will be opt-in per project, encrypted at rest. I'd rather lock this down right than ship it loose, so if you have strong opinions on the privacy model, that's another thing worth a call.
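
To make "sizes, not bodies" concrete, here is a minimal sketch of a metadata-only log record with opt-in body logging. The function name, field names, and the `log_bodies` flag are assumptions for illustration; encryption at rest is deliberately left out of the sketch.

```python
import time


def build_log_record(prompt, response, input_tokens, output_tokens,
                     latency_ms, log_bodies=False):
    """Build a log record that omits bodies unless the project opted in."""
    record = {
        "ts": time.time(),
        "prompt_bytes": len(prompt.encode("utf-8")),
        "response_bytes": len(response.encode("utf-8")),
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "latency_ms": latency_ms,
    }
    if log_bodies:
        # Opt-in only. A real system would encrypt these before storing them.
        record["prompt"] = prompt
        record["response"] = response
    return record


default_record = build_log_record("Reset my password.", "Try this link.", 5, 4, 189)
print(sorted(default_record))
```

The default path never touches the text beyond measuring its length, so a misconfigured dashboard cannot leak what was never stored.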

What will it cost?

Free during beta. After that, two modes: BYOK = flat per tracked request, well below what the meter saves you. Managed tokens = provider rate plus a disclosed margin, with a starting discount on first topup. If it doesn't save you more than it costs, don't pay me.

Does the proxy add latency?

The plan is to deploy in the same regions as the providers, with a target of under 15 ms added latency at p50. I'll publish the real number on a public status page once v1 is live, not before: measuring an unbuilt thing is exactly the kind of fake number I'm trying to keep off this page.
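
For anyone who wants to check the overhead themselves once there is something to measure, "added latency at p50" can be read as the difference between the median proxied latency and the median direct latency. A minimal sketch; the function name is hypothetical and the sample lists are placeholder inputs to show the arithmetic, not measurements of anything.

```python
import statistics


def added_latency_p50(direct_ms, proxied_ms):
    """Median proxied latency minus median direct latency, in milliseconds."""
    return statistics.median(proxied_ms) - statistics.median(direct_ms)


# Placeholder inputs only (NOT measurements): the same request timed
# five times direct to the provider and five times through a proxy.
direct = [400, 410, 420, 405, 415]
proxied = [412, 421, 433, 418, 425]
print(added_latency_p50(direct, proxied))  # 11
```

Comparing medians rather than means keeps one slow outlier from dominating the result; tail behaviour would need a separate p95 or p99 check.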

Who are you and why should I trust you?

I'm Krupesh — solo developer, building in public. krupesh.dev. I'm building this because I ran into the problem myself on a small Claude-wrapper project. I personally read and reply to every waitlist signup. If you book a call, you'll be talking to me, not a salesperson — because there are no salespeople.

STOP GUESSING. START METERING.

Drop your email. You'll get a personal note from me, an honest update when v1 ships, and the chance to shape what gets built first. No drip campaign. No newsletter. Just real signal.

or skip the queue · book a 20-min call →