How to Choose an LLM for Your App
There are more capable language models available today than ever, and picking one can feel paralysing. The good news: you don't need the "best" model. You need the right model for a specific job, and the right model is usually not the biggest or most expensive one. Here's a framework for choosing.
Start from the task, not the model
Before comparing models, describe the job precisely:
- Is it simple (classification, extraction, short rewrites) or hard (multi-step reasoning, coding, nuanced writing)?
- How long is the input? A quick reply, or a whole document?
- Does it run in the background, or is a user waiting for it in real time?
Most app features are simple tasks that small, fast, cheap models handle well. People reach for a flagship model out of habit and overpay for capability they never use.
The five dimensions that matter
1. Quality. Can the model actually do the task reliably? Test it on your real inputs, not a generic benchmark. Benchmarks rarely reflect your specific use case.
2. Latency. How fast does it respond? For anything a user waits on, a smaller model that answers in one second often beats a smarter one that takes eight. For background jobs, latency barely matters.
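3. Cost. Most hosted models are priced per token, and the bill adds up fast at scale. Estimate daily spend as average tokens per request × requests per day × price per token, counting input and output tokens separately if your provider prices them differently. A model that's "only" a few times more expensive per token can blow your budget once you have real traffic.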
4. Context length. How much text can it consider at once? If you're feeding it long documents or lots of retrieved context, you need a model with a large enough context window — but longer context also costs more, so don't pay for headroom you won't use.
5. Privacy and deployment. Can you send this data to a third-party API at all? For sensitive data you may need a provider with strong data guarantees, or an open model you host yourself. This constraint can override everything else.
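To make the cost dimension concrete, here is a minimal sketch of the estimate in Python. The function name, the token counts, and the prices are all hypothetical, picked only to keep the arithmetic easy to follow; plug in your own traffic numbers and your provider's published rates.

```python
def estimate_daily_cost(
    input_tokens_per_request: float,
    output_tokens_per_request: float,
    requests_per_day: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Return estimated daily spend, in the same currency as the prices."""
    # Cost of one request: tokens × price per token, for each direction.
    input_cost = input_tokens_per_request * input_price_per_million / 1_000_000
    output_cost = output_tokens_per_request * output_price_per_million / 1_000_000
    return (input_cost + output_cost) * requests_per_day


# Hypothetical prices per million tokens (not any real provider's rates).
small = estimate_daily_cost(800, 200, 50_000, 0.50, 1.50)
large = estimate_daily_cost(800, 200, 50_000, 5.00, 15.00)
print(f"small: {small:.2f}/day, large: {large:.2f}/day")
```

With these made-up numbers, a model that costs ten times more per token costs ten times more per day, which is easy to overlook when the per-token price looks tiny.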
Use a tiered strategy
The most cost-effective production systems rarely use one model for everything. They route:
- A small, fast, cheap model handles the bulk of simple requests.
- A larger model is called only for the requests that genuinely need more capability.
- The cheapest possible model (or plain code) handles trivial cases — you don't need an LLM to check if a string is empty.
This "right-sizing" can cut costs substantially with little or no visible drop in quality, because most requests were never hard to begin with.
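The routing idea above might look something like this in Python. It is a sketch, not a recommended classifier: the tier names, the `looks_hard` heuristic, and its keyword list are all assumptions made up for illustration. In practice you would tune the "is this hard?" check against your own traffic.

```python
from typing import Callable

# Hypothetical phrases that hint a request needs more capability.
HARD_HINTS = ("step by step", "refactor", "prove", "debug")


def looks_hard(request: str) -> bool:
    """Crude illustrative heuristic: very long input or a 'hard' keyword."""
    text = request.lower()
    return len(text) > 2000 or any(hint in text for hint in HARD_HINTS)


def route(request: str, is_hard: Callable[[str], bool] = looks_hard) -> str:
    """Return the tier that should handle this request."""
    if not request.strip():
        return "code"   # trivial case: handle in plain code, no model call
    if is_hard(request):
        return "large"  # escalate only when the check says so
    return "small"      # default: the cheap, fast model


print(route("   "))
print(route("Is this review positive or negative?"))
print(route("Debug this function and refactor it"))
```

Passing `is_hard` as a parameter keeps the routing logic separate from the heuristic, so you can swap in a better check later without touching the router.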
Don't marry a single provider
Model quality, pricing, and availability change constantly. Build a thin abstraction so swapping the underlying model is a config change, not a rewrite:
your code → llm(prompt, model="small") → provider adapter
Keep provider-specific details behind that one function. When a better or cheaper model appears — and it will — you switch in minutes instead of days.
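Here is one way that single function could be structured in Python. The adapters below are stand-ins that just echo the prompt; the names `MODELS`, `_small_adapter`, and `_large_adapter` are hypothetical, and a real adapter would wrap whichever provider SDK or HTTP API you actually use.

```python
from typing import Callable, Dict

# A provider adapter is any function that takes a prompt and returns text.
Adapter = Callable[[str], str]


def _small_adapter(prompt: str) -> str:
    # Placeholder: a real adapter would call your "small" model's provider.
    return f"[small] {prompt}"


def _large_adapter(prompt: str) -> str:
    # Placeholder: a real adapter would call your "large" model's provider.
    return f"[large] {prompt}"


# The only place that maps an alias to a concrete model.
# Swapping models means editing this mapping (or loading it from config).
MODELS: Dict[str, Adapter] = {
    "small": _small_adapter,
    "large": _large_adapter,
}


def llm(prompt: str, model: str = "small") -> str:
    """Single entry point the rest of the app calls."""
    try:
        adapter = MODELS[model]
    except KeyError:
        raise ValueError(f"unknown model alias: {model!r}") from None
    return adapter(prompt)


print(llm("Summarise this ticket"))
print(llm("Summarise this ticket", model="large"))
```

Application code refers only to aliases like "small" and "large", never to a provider's model name, so a swap touches one mapping rather than every call site.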
Test before you commit
Assemble 20–50 real examples from your actual use case, run your candidate models against them, and compare quality, speed, and cost side by side. This half-day of work will teach you more than any leaderboard, and it gives you a repeatable way to evaluate the next model too.
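A tiny harness is enough to get started. The sketch below, in Python, assumes each candidate is a plain function from prompt to text and scores it by exact match. The stub model and the four examples are invented for illustration, and exact match only suits tasks with a single right answer, such as classification. To compare cost as well, you could record the token usage your provider reports for each call.

```python
import time
from typing import Callable, Dict, List, Tuple


def exact_match(output: str, expected: str) -> bool:
    """Case- and whitespace-insensitive comparison."""
    return output.strip().lower() == expected.strip().lower()


def evaluate(
    model_fn: Callable[[str], str],
    examples: List[Tuple[str, str]],
    is_correct: Callable[[str, str], bool] = exact_match,
) -> Dict[str, float]:
    """Run one candidate over all examples; report accuracy and mean latency."""
    if not examples:
        raise ValueError("need at least one example")
    correct = 0
    start = time.perf_counter()
    for prompt, expected in examples:
        if is_correct(model_fn(prompt), expected):
            correct += 1
    elapsed = time.perf_counter() - start
    return {
        "accuracy": correct / len(examples),
        "avg_seconds": elapsed / len(examples),
    }


def keyword_stub(prompt: str) -> str:
    # Stand-in for a real model call, so the sketch runs offline.
    return "billing" if "refund" in prompt.lower() else "other"


# Invented examples; replace with real inputs and labels from your app.
examples = [
    ("I want a refund", "billing"),
    ("App crashes on login", "bug"),
    ("Refund my order", "billing"),
    ("How do I export data?", "other"),
]

result = evaluate(keyword_stub, examples)
print(result)
```

Because `evaluate` takes any prompt-to-text function, the same example set can be rerun unchanged against the next model you want to try.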
Summary
Choosing an LLM isn't about finding the smartest model — it's about matching capability to the task across quality, latency, cost, context, and privacy. Default to the smallest model that passes your real-world tests, escalate only when needed, and keep the model behind an abstraction so you can always swap in something better.