Testing Flutter Apps: Unit, Widget, and Integration
Flutter's testing story is genuinely good — fast, first-party, and layered — but many apps ship with almost no tests because it's not obvious where to start. The framework gives you three distinct tools. Knowing which one to reach for is most of the battle.
Unit tests: your logic, at full speed
Unit tests exercise plain Dart code — no widgets, no device. They run in milliseconds, which means you can have hundreds and run them on every save.
The prerequisite is that your logic is plain Dart. If parsing, validation, calculations, and state transitions live inside widgets, you can't unit-test them without building a widget tree. Pull them into plain classes (whatever your state management flavor calls them) and testing becomes trivial:
test('splits bill with service charge proportionally', () {
  // Assumption: `items` is a fixture (the bill's line items) defined elsewhere in the test file.
  final result = BillSplitter.split(items: items, serviceRate: 0.12);
  expect(result.perPerson['Ali'], closeTo(154.6, 0.01));
});
Where code depends on external services, inject the dependency and substitute a fake in tests. This is the layer where your business rules — the code that would embarrass you if it broke — should be covered exhaustively, including the ugly edge cases: empty lists, zero amounts, malformed input.
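As a rough illustration of injecting a dependency and swapping in a fake, here is a minimal sketch. The names RateSource, PriceConverter, and FakeRateSource are hypothetical and invented for this example; only package:test is a real library. The tests also touch two of the ugly edge cases (zero and negative amounts).

```dart
import 'package:test/test.dart';

// Hypothetical abstraction over an external service (say, an HTTP rates API).
abstract class RateSource {
  Future<double> rateFor(String currency);
}

// Hypothetical logic class: it receives its dependency through the constructor.
class PriceConverter {
  PriceConverter(this._rates);

  final RateSource _rates;

  Future<double> convert(double amount, String currency) async {
    if (amount < 0) {
      throw ArgumentError.value(amount, 'amount', 'must not be negative');
    }
    if (amount == 0) return 0.0; // nothing to convert, so skip the service
    return amount * await _rates.rateFor(currency);
  }
}

// Test double: returns canned rates and never touches the network.
class FakeRateSource implements RateSource {
  FakeRateSource(this._table);

  final Map<String, double> _table;
  int calls = 0;

  @override
  Future<double> rateFor(String currency) async {
    calls++;
    final rate = _table[currency];
    if (rate == null) throw StateError('no rate for $currency');
    return rate;
  }
}

void main() {
  test('converts using the injected rate', () async {
    final converter = PriceConverter(FakeRateSource({'EUR': 0.5}));
    expect(await converter.convert(10.0, 'EUR'), closeTo(5.0, 1e-9));
  });

  test('zero amount does not call the service', () async {
    final fake = FakeRateSource({'EUR': 0.5});
    expect(await PriceConverter(fake).convert(0.0, 'EUR'), 0.0);
    expect(fake.calls, 0);
  });

  test('negative amount is rejected', () async {
    final converter = PriceConverter(FakeRateSource({}));
    await expectLater(
      () => converter.convert(-1.0, 'EUR'),
      throwsArgumentError,
    );
  });
}
```

Because the fake counts its calls, the test can check not just the result but also that the zero-amount path never reached the (fake) service.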
Widget tests: the UI, without a device
Widget tests pump a widget tree into a simulated environment and let you interact with it — no emulator, still fast enough to run hundreds in CI.
testWidgets('shows error when email is invalid', (tester) async {
  await tester.pumpWidget(const MaterialApp(home: LoginScreen()));
  await tester.enterText(find.byKey(const Key('email')), 'not-an-email');
  await tester.tap(find.text('Sign in'));
  await tester.pump();
  expect(find.text('Enter a valid email'), findsOneWidget);
});
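Two things to know. First, pump() renders a single frame (optionally after advancing the test clock by a given duration), while pumpAndSettle() keeps pumping until no more frames are scheduled, that is, until animations finish. Using the wrong one is the classic source of flaky widget tests: a lone pump() can leave you asserting mid-animation, and pumpAndSettle() never settles while something animates indefinitely, such as an indeterminate CircularProgressIndicator, so it eventually fails with a timeout. Second, test behavior, not structure: assert "the error message appears," not "there is a Column with three children." Structural assertions shatter on every refactor and protect nothing.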
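To make the pump() versus pumpAndSettle() distinction concrete, here is a small sketch built around a page transition. HomePage and DetailsPage are hypothetical widgets written only for this example; the test asserts on what the user sees, not on the widget structure.

```dart
import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';

// Hypothetical two-page app used only for this example.
class HomePage extends StatelessWidget {
  const HomePage({super.key});

  @override
  Widget build(BuildContext context) {
    return Scaffold(
      body: Center(
        child: ElevatedButton(
          onPressed: () => Navigator.of(context).push(
            MaterialPageRoute<void>(builder: (_) => const DetailsPage()),
          ),
          child: const Text('Open details'),
        ),
      ),
    );
  }
}

class DetailsPage extends StatelessWidget {
  const DetailsPage({super.key});

  @override
  Widget build(BuildContext context) {
    return const Scaffold(body: Center(child: Text('Details page')));
  }
}

void main() {
  testWidgets('opens the details page', (tester) async {
    await tester.pumpWidget(const MaterialApp(home: HomePage()));

    await tester.tap(find.text('Open details'));
    // The push starts a page transition. A single pump() would leave the
    // test mid-animation; pumpAndSettle() keeps pumping frames until none
    // are scheduled, i.e. until the transition has finished.
    await tester.pumpAndSettle();

    expect(find.text('Details page'), findsOneWidget);
    // The previous page is now offstage, which finders skip by default.
    expect(find.text('Open details'), findsNothing);
  });
}
```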
Widget tests earn their keep on screens with real logic: forms with validation, lists with empty/loading/error states, anything conditional. A static settings page doesn't need one.
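For a screen with loading, error, and empty states, one possible shape is to hand the screen a Future and control it from the test with a Completer. OrdersScreen and pumpOrders are hypothetical names made up for this sketch, and the FutureBuilder approach is just one simple way to model the states, not a recommendation over your own state management.

```dart
import 'dart:async';

import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';

// Hypothetical screen: renders loading, error, empty, or data states
// depending on how the injected future resolves.
class OrdersScreen extends StatelessWidget {
  const OrdersScreen({super.key, required this.orders});

  final Future<List<String>> orders;

  @override
  Widget build(BuildContext context) {
    return Scaffold(
      body: FutureBuilder<List<String>>(
        future: orders,
        builder: (context, snapshot) {
          if (snapshot.hasError) {
            return const Center(child: Text('Could not load orders'));
          }
          if (!snapshot.hasData) {
            return const Center(child: CircularProgressIndicator());
          }
          final items = snapshot.data!;
          if (items.isEmpty) return const Center(child: Text('No orders yet'));
          return ListView(children: [for (final o in items) Text(o)]);
        },
      ),
    );
  }
}

// Hypothetical helper that wraps the screen in a MaterialApp.
Future<void> pumpOrders(WidgetTester tester, Future<List<String>> orders) {
  return tester.pumpWidget(MaterialApp(home: OrdersScreen(orders: orders)));
}

void main() {
  testWidgets('shows a spinner, then the empty state', (tester) async {
    final completer = Completer<List<String>>();
    await pumpOrders(tester, completer.future);
    expect(find.byType(CircularProgressIndicator), findsOneWidget);

    completer.complete(<String>[]);
    // Deliver the completed future to the FutureBuilder and rebuild.
    await tester.pump(Duration.zero);
    expect(find.text('No orders yet'), findsOneWidget);
    expect(find.byType(CircularProgressIndicator), findsNothing);
  });

  testWidgets('shows an error message when loading fails', (tester) async {
    final completer = Completer<List<String>>();
    await pumpOrders(tester, completer.future);

    completer.completeError(StateError('backend unavailable'));
    await tester.pump(Duration.zero);
    expect(find.text('Could not load orders'), findsOneWidget);
  });
}
```

The tests use explicit pump() calls rather than pumpAndSettle() because the spinner animates for as long as it is on screen.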
Integration tests: the whole app, for the critical paths
Integration tests (the integration_test package) run your actual app on a real device or emulator — real rendering, real plugins, real navigation. They're the only tests that catch problems that live between the pieces: broken navigation flows, platform channel issues, startup crashes.
They're also slow and the most maintenance-heavy, so be ruthless about scope. Cover the handful of journeys where a failure means the app is effectively down — sign in, the core flow that makes your app worth using, checkout or sync if you have them. Five solid integration tests protecting the money paths beat fifty brittle ones nobody trusts.
Point them at a staging backend or a fake server, never production.
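A minimal sketch of what such a test file might look like follows. Several things here are assumptions for illustration: DemoApp is a stand-in so the example is self-contained (in a real project you would pump your own app's root widget), API_BASE_URL is a hypothetical build-time variable, and api.example.com is a placeholder for a production host.

```dart
import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';
import 'package:integration_test/integration_test.dart';

// Assumption: the backend URL is supplied at build time via --dart-define.
// The default points at a hypothetical local fake server.
const apiBaseUrl = String.fromEnvironment(
  'API_BASE_URL',
  defaultValue: 'http://localhost:8080',
);

// Stand-in for the real app so this sketch runs on its own.
class DemoApp extends StatelessWidget {
  const DemoApp({super.key});

  @override
  Widget build(BuildContext context) {
    return MaterialApp(
      home: Builder(
        builder: (context) => Scaffold(
          body: SafeArea(
            child: Column(
              children: [
                const TextField(key: Key('email')),
                ElevatedButton(
                  onPressed: () => Navigator.of(context).pushReplacement(
                    MaterialPageRoute<void>(
                      builder: (_) =>
                          const Scaffold(body: Center(child: Text('Welcome'))),
                    ),
                  ),
                  child: const Text('Sign in'),
                ),
              ],
            ),
          ),
        ),
      ),
    );
  }
}

void main() {
  // Binds the test to the real device or emulator instead of the simulated
  // widget-test environment.
  IntegrationTestWidgetsFlutterBinding.ensureInitialized();

  testWidgets('user can sign in and reach the home screen', (tester) async {
    // Guard: refuse to run against the (hypothetical) production host.
    expect(
      Uri.parse(apiBaseUrl).host,
      isNot('api.example.com'),
      reason: 'integration tests must not target production',
    );

    await tester.pumpWidget(const DemoApp());
    await tester.enterText(find.byKey(const Key('email')), 'qa@example.com');
    await tester.tap(find.text('Sign in'));
    await tester.pumpAndSettle();

    expect(find.text('Welcome'), findsOneWidget);
  });
}
```

Placed under the integration_test/ directory, a file like this would typically be run with flutter test integration_test, adding --dart-define=API_BASE_URL=... to select the staging backend.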
A shape that works
The classic test pyramid holds up well in Flutter: many unit tests, a good layer of widget tests, a few integration tests. In practice, for an app with no tests today, the priority order is:
- Unit-test the logic that would silently corrupt data or money if wrong.
- Widget-test the screens with the most conditional behavior.
- Add integration tests for the two or three journeys you manually re-check before every release — that's your gut telling you what matters.
- Run all of it in CI on every pull request. Tests you only run locally decay in weeks.
One honest note on golden (screenshot) tests: they catch visual regressions well, but they fail on every intentional design tweak and can differ across machines. Use them for stable, high-value components — a design system — rather than whole screens.
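For reference, a golden test for a single component can be sketched as below. PrimaryBadge is a hypothetical design-system widget and the golden file path is an arbitrary choice; matchesGoldenFile comes from flutter_test.

```dart
import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';

// Hypothetical design-system component.
class PrimaryBadge extends StatelessWidget {
  const PrimaryBadge({super.key, required this.label});

  final String label;

  @override
  Widget build(BuildContext context) {
    return DecoratedBox(
      decoration: BoxDecoration(
        color: Colors.indigo,
        borderRadius: BorderRadius.circular(12),
      ),
      child: Padding(
        padding: const EdgeInsets.symmetric(horizontal: 8, vertical: 4),
        child: Text(label, style: const TextStyle(color: Colors.white)),
      ),
    );
  }
}

void main() {
  testWidgets('PrimaryBadge matches its golden image', (tester) async {
    await tester.pumpWidget(
      const MaterialApp(
        home: Scaffold(
          body: Center(
            // The RepaintBoundary scopes the captured image to the badge
            // itself rather than the whole screen.
            child: RepaintBoundary(child: PrimaryBadge(label: 'NEW')),
          ),
        ),
      ),
    );

    await expectLater(
      find.byType(PrimaryBadge),
      matchesGoldenFile('goldens/primary_badge.png'),
    );
  });
}
```

The reference image has to exist before the comparison can pass; running flutter test --update-goldens creates or refreshes it, which is also the step you repeat after an intentional design change.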
The takeaway
You don't need heroic coverage. Plain-Dart logic under exhaustive unit tests, behavior-focused widget tests on your conditional screens, and a few integration tests guarding the critical journeys should catch a large share of real regressions, and turn releases from a manual checklist into a button.