June 24, 2026·3 min read

How to Add AI to a Flutter App with LLMs

Adding AI to a mobile app used to mean training a model and shipping it inside the binary. Today, most "AI features" are a thin, well-designed layer between your app and a large language model (LLM). The hard part is no longer the model — it's the engineering around it. This guide walks through how to add an LLM-powered feature to a Flutter app in a way that is fast, cheap, and safe to ship.

Never call the model directly from the app

The single most important rule: your app should never talk to the LLM provider directly. If you embed an API key in a Flutter build, it will be extracted — anyone can decompile the app, pull the key, and run up your bill.

Instead, put a small backend between the app and the model:

Flutter app  →  your backend (holds the key)  →  LLM provider

Your backend can be a few lines of Python (FastAPI) or Go. It does three jobs: hold the secret key, enforce rate limits per user, and shape the request/response. This is also where you'll later add caching and logging.
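To make those three jobs concrete, here is a minimal, framework-free Python sketch of what the backend handler might look like. Everything named here is an assumption for illustration: the `LLM_API_KEY` environment variable, the `handle_assist` function, the `MAX_PROMPT_CHARS` cap, and the injected `call_provider` and `allow` callables (which stand in for a real provider SDK and a real rate limiter).

```python
import os
from typing import Callable

MAX_PROMPT_CHARS = 4000  # hypothetical cap; tune for your model and budget


def handle_assist(
    user_id: str,
    body: dict,
    call_provider: Callable[[str, str, list], str],
    allow: Callable[[str], bool],
) -> tuple[int, dict]:
    """Return (http_status, json_body) for one /assist request."""
    # Job 1: the key lives only in the server environment, never in the app.
    api_key = os.environ.get("LLM_API_KEY")
    if not api_key:
        return 500, {"error": "server_misconfigured"}

    # Job 2: enforce the per-user rate limit before spending any tokens.
    if not allow(user_id):
        return 429, {"error": "rate_limited"}

    # Job 3: shape the request. Validate and trim what the app sent.
    prompt = body.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        return 400, {"error": "prompt_required"}
    prompt = prompt.strip()[:MAX_PROMPT_CHARS]
    context = body.get("context") or []

    text = call_provider(api_key, prompt, context)
    return 200, {"summary": text}
```

In a real service this function would sit behind a FastAPI route (or a Go handler); keeping it free of framework types makes it easy to unit-test with fake callables.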

The request/response shape

Keep the contract between app and backend boring and explicit. The app sends the user's input plus any context; the backend returns structured data, not raw model text your UI has to guess at.

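import 'dart:convert';
import 'package:http/http.dart' as http;

final res = await http.post(
  Uri.parse('https://api.yourbackend.com/assist'),
  // Without an explicit Content-Type, a string body is sent as text/plain.
  headers: {'Authorization': 'Bearer $sessionToken', 'Content-Type': 'application/json'},
  body: jsonEncode({'prompt': userText, 'context': recentItems}),
);
if (res.statusCode != 200) throw Exception('assist failed: ${res.statusCode}');
final data = jsonDecode(res.body) as Map<String, dynamic>; // { "summary": "...", "actions": [...] }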

By returning JSON with named fields, your Flutter widgets stay simple and you can change the prompt on the server without shipping a new app version.
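On the backend, one way to keep that contract stable is to coerce whatever the model returns into the agreed shape before it leaves the server. The sketch below is a minimal Python example under two assumptions: the model was prompted to reply with JSON, and `actions` is a list of strings (the article does not specify the element type). The function name `to_contract` is hypothetical.

```python
import json


def to_contract(raw_model_text: str) -> dict:
    """Coerce raw model output into {"summary": str, "actions": [str, ...]}."""
    try:
        parsed = json.loads(raw_model_text)
    except json.JSONDecodeError:
        parsed = None

    if not isinstance(parsed, dict):
        # The model ignored the JSON instruction: treat its text as the summary.
        return {"summary": raw_model_text.strip(), "actions": []}

    summary = parsed.get("summary")
    actions = parsed.get("actions")
    return {
        "summary": summary if isinstance(summary, str) else "",
        # Drop anything that is not a string so the app never has to guess.
        "actions": [a for a in actions if isinstance(a, str)]
        if isinstance(actions, list)
        else [],
    }
```

With this in place the Flutter side can read `data['summary']` and `data['actions']` unconditionally, even when the model misbehaves.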

Stream the response for perceived speed

LLMs are slow to produce a full answer but fast to produce the first token. A response that streams word-by-word feels dramatically faster than a spinner that sits for six seconds. Expose a streaming endpoint (Server-Sent Events works well) and render tokens as they arrive:

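final request = http.Request('POST', uri)
  ..headers['Accept'] = 'text/event-stream'
  ..body = payload;
final response = await request.send();
// SSE frames arrive as "data: <text>" lines; chunks can split mid-line, so split into lines first.
response.stream.transform(utf8.decoder).transform(const LineSplitter())
    .where((line) => line.startsWith('data: '))
    .listen((line) {
  if (mounted) setState(() => _answer += line.substring(6)); // skip updates after dispose
});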

Even if the total time is identical, streaming changes how fast the feature feels — and perceived performance is what users judge.
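For the server half of that stream, the framing itself is small. The following Python sketch shows how a backend might wrap model tokens in Server-Sent Events frames; `sse_events` is a hypothetical helper, and the token source is assumed to be any iterable of strings handed back by your provider client.

```python
from typing import Iterable, Iterator


def sse_events(tokens: Iterable[str]) -> Iterator[str]:
    """Wrap each model token in a Server-Sent Events frame."""
    for token in tokens:
        # A frame is one or more "data:" lines followed by a blank line.
        # A token that contains newlines is split across several data lines.
        lines = token.split("\n")
        yield "".join(f"data: {line}\n" for line in lines) + "\n"
```

A web framework's streaming response can consume this generator directly, sending each frame with the `text/event-stream` content type as soon as it is produced.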

Control cost before it controls you

LLM calls cost money per token, and mobile users tap buttons a lot. Three cheap safeguards:

  • Rate-limit per user on the backend (e.g. N requests per minute). This stops both abuse and runaway loops.
  • Cache identical requests. Many prompts repeat; a simple cache keyed on a hash of the prompt and context lets a repeat request skip the model call entirely.
  • Pick the smallest model that works. Reach for a fast, cheap model by default and only escalate to a larger one for genuinely hard requests.
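The first two safeguards fit in a few dozen lines. Below is a minimal in-memory Python sketch, assuming a single backend process (a multi-instance deployment would need a shared store instead). The class and function names (`SlidingWindowLimiter`, `cache_key`, `TtlCache`) are hypothetical, and the injectable `clock` exists only to make the logic testable.

```python
import hashlib
import json
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` calls per `window_s` seconds for each user."""

    def __init__(self, limit: int, window_s: float = 60.0, clock=time.monotonic):
        self.limit, self.window_s, self.clock = limit, window_s, clock
        self._hits = defaultdict(deque)  # user_id -> timestamps of recent calls

    def allow(self, user_id: str) -> bool:
        now = self.clock()
        hits = self._hits[user_id]
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()  # drop calls that have left the window
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


def cache_key(model: str, prompt: str, context: list) -> str:
    """Stable hash, so identical requests map to the same key."""
    payload = json.dumps([model, prompt.strip(), context], sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class TtlCache:
    """Tiny cache whose entries expire after `ttl_s` seconds."""

    def __init__(self, ttl_s: float, clock=time.monotonic):
        self.ttl_s, self.clock = ttl_s, clock
        self._items = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None or entry[0] <= self.clock():
            self._items.pop(key, None)  # missing or expired
            return None
        return entry[1]

    def put(self, key, value):
        self._items[key] = (self.clock() + self.ttl_s, value)
```

Including the model name in the cache key matters for the third safeguard: if a request is escalated from a small model to a larger one, the two answers are cached separately.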

Design for failure

The network will drop, the provider will occasionally return errors, and the model will sometimes produce nonsense. Treat AI features as fallible by default:

  • Always show a graceful "couldn't generate that, try again" state.
  • Never let an AI response silently overwrite user data — confirm destructive actions.
  • Log failures on the backend so you can see real-world failure rates.
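On the backend, these rules can be folded into one wrapper around the model call. The Python sketch below retries with exponential backoff, logs every failure, and falls back to an explicit error payload that the app can turn into a "try again" state. The function name, the payload fields (`ok`, `error`, `retryable`), and the default delays are all assumptions for illustration.

```python
import logging
import time
from typing import Callable

log = logging.getLogger("assist")


def generate_with_fallback(
    call_model: Callable[[], dict],
    attempts: int = 3,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Try the model a few times; on repeated failure return an error payload."""
    for attempt in range(1, attempts + 1):
        try:
            return {"ok": True, "data": call_model()}
        except Exception:
            # Logged server-side so real-world failure rates are visible.
            log.exception("model call failed (attempt %d/%d)", attempt, attempts)
            if attempt < attempts:
                # Exponential backoff: base, 2x base, 4x base, ...
                sleep(base_delay_s * 2 ** (attempt - 1))
    return {"ok": False, "error": "generation_failed", "retryable": True}
```

Because the failure case is a normal, structured response rather than an exception leaking to the client, the Flutter UI can branch on `ok` and show its graceful error state without special-casing network errors and model errors separately.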

Keep the UX honest

Label AI output as AI-generated, give users an easy way to edit or dismiss it, and never present a model guess as a certain fact. Trust is easy to lose and expensive to rebuild.

Summary

A good AI feature in Flutter is 10% model and 90% plumbing: a backend that holds your keys, a boring JSON contract, streaming for perceived speed, hard cost limits, and a UI that assumes the model can be wrong. Get that scaffolding right and you can swap models and tune prompts on the server without shipping a new app version.