At indie scale, tests pay rent. They catch the paywall regression that would cost you a week of revenue. They unblock the 2 AM hotfix because you trust the green check. And they make the difference between shipping every Friday and shipping when you feel brave. This is the 2026 Flutter testing stack that actually ships: mocktail for mocks, bloc_test for state, golden tests for the design system, integration_test for flows, and patrol for real-device end-to-end.
Short version: write a lot of widget tests, a handful of unit tests for pure logic, a small golden suite for your design system, one happy-path integration test per critical flow, and a patrol E2E that boots the real app on iOS and Android in CI. Skip flutter_driver. Skip mockito. Aim for around 60 percent line coverage and 90 percent on the paywall and auth code paths.
The 2026 Flutter testing pyramid
The classic test pyramid still applies, but for Flutter the shape is a little different. Widget tests are cheap and they cover most of what you actually care about (rendering, taps, state wiring), so the pyramid for Flutter is more of a diamond: thin top (E2E), wide middle (widget), medium base (unit), plus a small golden suite as a side car.
| Test type | Speed | Scope | When to use |
|---|---|---|---|
| Unit (test) | ~1 ms | One Dart class or function | Pure logic: parsers, formatters, BLoC reducers, validators. |
| Widget (flutter_test) | ~50 ms | One widget tree, no platform | Rendering, taps, finders, animations, state-to-UI wiring. Your workhorse. |
| Golden (golden_toolkit, alchemist) | ~100 ms | Pixel-exact snapshot | Design system regressions, dark mode, locale rendering. |
| Integration (integration_test) | ~3-10 s | Whole app, no native dialogs | Critical flows like signup-to-paywall on a fake backend. |
| E2E (patrol) | ~30-90 s | Real device, native dialogs, deep links | Permissions, notifications, IAP sheets, biometrics. |
The trap most teams fall into is overweighting integration tests. Each one buys you a few minutes of confidence and costs you a few seconds per CI run. Add ten and your CI is slow and flaky. Widget tests give you 80 percent of the value at 5 percent of the cost.
Unit tests: pure Dart logic with test and mocktail
Use unit tests for code that has no Flutter dependency: parsers, formatters, validation, repository wrappers around HTTP clients, BLoC event reducers. Keep them under one millisecond each so you can run hundreds in a single press of Cmd+R.
Your pubspec.yaml dev dependencies in 2026:
# pubspec.yaml
dev_dependencies:
  flutter_test:
    sdk: flutter
  test: ^1.25.0
  mocktail: ^1.0.4
  bloc_test: ^9.1.7
  integration_test:
    sdk: flutter
  patrol: ^3.13.0
  golden_toolkit: ^0.15.0
  alchemist: ^0.10.0
  fake_async: ^1.3.1
A unit test for a pure function. No widgets, no async, no platform.
// test/utils/price_formatter_test.dart
import 'package:flutter_test/flutter_test.dart';
import 'package:my_app/utils/price_formatter.dart';

void main() {
  group('PriceFormatter', () {
    test('formats cents to USD with 2 decimal places', () {
      // Raw string: an unescaped $ would be parsed as string interpolation.
      expect(PriceFormatter.fromCents(1299, currency: 'USD'), r'$12.99');
    });

    test('formats zero cents as free', () {
      expect(PriceFormatter.fromCents(0, currency: 'USD'), 'Free');
    });

    test('throws on negative cents', () {
      expect(
        () => PriceFormatter.fromCents(-1, currency: 'USD'),
        throwsArgumentError,
      );
    });
  });
}
Three things to notice. One: the group wraps related tests so the runner output stays readable. Two: every test has a single assertion that maps 1:1 to a behavior. Three: the negative case is explicit. Most regressions I have shipped were missing negative cases.
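The test above exercises a PriceFormatter that is not shown here. A minimal sketch of what it could look like follows. The symbol table and the fallback for unknown currencies are assumptions made for illustration, not the real implementation.

```dart
/// Illustrative sketch of the formatter the tests above assume.
class PriceFormatter {
  // Assumed symbol table; a real app would likely use package:intl instead.
  static const Map<String, String> _symbols = {
    'USD': r'$',
    'EUR': '€',
    'GBP': '£',
  };

  /// Formats an integer amount of cents, e.g. 1299 with currency USD.
  static String fromCents(int cents, {required String currency}) {
    if (cents < 0) {
      throw ArgumentError.value(cents, 'cents', 'must not be negative');
    }
    if (cents == 0) return 'Free';

    // Unknown currencies fall back to the ISO code as a prefix (assumption).
    final symbol = _symbols[currency] ?? '$currency ';
    final whole = cents ~/ 100;
    final fraction = (cents % 100).toString().padLeft(2, '0');
    return '$symbol$whole.$fraction';
  }
}
```

Working in integer cents avoids floating-point rounding, which is why the fraction is built with padLeft rather than toStringAsFixed on a double.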
Mocking with mocktail (mockito is legacy)
In 2026, mocktail is the default. Mockito still works, but it requires build_runner code generation, the generated files clutter PRs, and the null-safety story is still awkward. Mocktail needs no codegen and works with sound null safety out of the box.
// test/repository/user_repository_test.dart
import 'package:flutter_test/flutter_test.dart';
import 'package:mocktail/mocktail.dart';
import 'package:my_app/api/api_client.dart';
import 'package:my_app/repository/user_repository.dart';

class _MockApi extends Mock implements ApiClient {}

void main() {
  late _MockApi api;
  late UserRepository repo;

  setUp(() {
    api = _MockApi();
    repo = UserRepository(api);
  });

  test('fetchUser returns parsed user on 200', () async {
    when(() => api.get('/me')).thenAnswer(
      (_) async => {'id': 'u_1', 'email': 'a@b.co'},
    );
    final user = await repo.fetchUser();
    expect(user.id, 'u_1');
    verify(() => api.get('/me')).called(1);
  });

  test('fetchUser throws on 401', () async {
    when(() => api.get('/me')).thenThrow(UnauthorizedException());
    expect(repo.fetchUser, throwsA(isA<UnauthorizedException>()));
  });
}
One mocktail gotcha worth memorizing. If a stubbed method takes a non-primitive argument and you match it with any(), register a fallback value in setUpAll, for example registerFallbackValue(FakeUri()). Otherwise any() throws at runtime.
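A minimal sketch of that fallback registration. The ApiClient interface and its getUri method are hypothetical stand-ins defined inline so the file is self-contained; only Mock, Fake, registerFallbackValue, any, when, and verify come from mocktail.

```dart
import 'package:flutter_test/flutter_test.dart';
import 'package:mocktail/mocktail.dart';

// Hypothetical interface, defined here only to keep the example standalone.
abstract class ApiClient {
  Future<String> getUri(Uri uri);
}

class _MockApi extends Mock implements ApiClient {}

// A dummy Uri that mocktail can hand out internally when any<Uri>() is used.
class _FakeUri extends Fake implements Uri {}

void main() {
  setUpAll(() {
    // Register once per test file, before any when()/verify() that uses any().
    registerFallbackValue(_FakeUri());
  });

  test('any() matches a Uri argument once a fallback is registered', () async {
    final api = _MockApi();
    when(() => api.getUri(any())).thenAnswer((_) async => 'ok');

    final body = await api.getUri(Uri.parse('https://example.com/me'));

    expect(body, 'ok');
    verify(() => api.getUri(any())).called(1);
  });
}
```

Primitive types such as String and int ship with built-in fallbacks, so the registration is only needed for your own classes and for types like Uri.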
Widget tests: pumpWidget, finders, gestures
Widget tests are the highest-ROI tests in Flutter. They render a widget in an in-memory test environment, let you tap and scroll, and assert against finders. No real device, no platform channels, just pure Flutter.
// test/widgets/like_button_test.dart
import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';
import 'package:my_app/widgets/like_button.dart';

void main() {
  testWidgets('LikeButton toggles state and fires onChanged', (tester) async {
    var liked = false;
    await tester.pumpWidget(MaterialApp(
      home: Scaffold(
        body: LikeButton(initial: false, onChanged: (v) => liked = v),
      ),
    ));

    expect(find.byIcon(Icons.favorite_border), findsOneWidget);
    expect(find.byIcon(Icons.favorite), findsNothing);

    await tester.tap(find.byType(LikeButton));
    await tester.pumpAndSettle();

    expect(find.byIcon(Icons.favorite), findsOneWidget);
    expect(liked, isTrue);
  });
}
Three patterns that make widget tests less painful:
- Wrap in MaterialApp or CupertinoApp. Without one, Material widgets throw because they need a Directionality and a default theme.
- Prefer find.byType and find.byKey over find.text. Text is locale-dependent and refactor-fragile. Keys are stable.
- Use pumpAndSettle after gestures. It pumps frames until no more animations are running. Skip it only if you are testing a specific intermediate animation state, in which case use tester.pump(const Duration(milliseconds: 100)).
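The three patterns can be combined in one small test. This is a sketch under stated assumptions: FadeInBanner is a hypothetical widget defined inline, and the 200 ms duration is chosen only so that a 100 ms pump lands mid-animation.

```dart
import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';

// Hypothetical widget: fades a banner in over 200 ms after a button tap.
class FadeInBanner extends StatefulWidget {
  const FadeInBanner({super.key});

  @override
  State<FadeInBanner> createState() => _FadeInBannerState();
}

class _FadeInBannerState extends State<FadeInBanner> {
  bool _visible = false;

  @override
  Widget build(BuildContext context) {
    return Column(
      children: [
        TextButton(
          key: const Key('show_banner'),
          onPressed: () => setState(() => _visible = true),
          child: const Text('Show'),
        ),
        AnimatedOpacity(
          key: const Key('banner'),
          opacity: _visible ? 1.0 : 0.0,
          duration: const Duration(milliseconds: 200),
          child: const Text('Saved'),
        ),
      ],
    );
  }
}

// Reads the live opacity from the FadeTransition that AnimatedOpacity builds.
double currentOpacity(WidgetTester tester) {
  final fade = tester.widget<FadeTransition>(find.descendant(
    of: find.byKey(const Key('banner')),
    matching: find.byType(FadeTransition),
  ));
  return fade.opacity.value;
}

void main() {
  testWidgets('banner is mid-fade at 100 ms and opaque once settled',
      (tester) async {
    // Pattern 1: wrap in MaterialApp so theme and Directionality exist.
    await tester.pumpWidget(
      const MaterialApp(home: Scaffold(body: FadeInBanner())),
    );

    // Pattern 2: find by key rather than by visible text.
    await tester.tap(find.byKey(const Key('show_banner')));
    await tester.pump(); // rebuild and start the animation

    // Pattern 3: pump a fixed duration to inspect an intermediate state.
    await tester.pump(const Duration(milliseconds: 100));
    expect(currentOpacity(tester), inExclusiveRange(0.0, 1.0));

    await tester.pumpAndSettle();
    expect(currentOpacity(tester), 1.0);
  });
}
```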
bloc_test for BLoC and Riverpod state testing
If you use BLoC, bloc_test is essential. It gives you a DSL that asserts on the stream of states emitted in response to events. The same idea works for Riverpod StateNotifier and AsyncNotifier with a slightly different setup.
// test/bloc/auth_bloc_test.dart
import 'package:bloc_test/bloc_test.dart';
import 'package:flutter_test/flutter_test.dart';
import 'package:mocktail/mocktail.dart';
import 'package:my_app/bloc/auth_bloc.dart';
import 'package:my_app/repository/auth_repository.dart';

class _MockRepo extends Mock implements AuthRepository {}

void main() {
  late _MockRepo repo;

  setUp(() => repo = _MockRepo());

  blocTest<AuthBloc, AuthState>(
    'emits [Loading, Authenticated] on successful sign in',
    build: () {
      when(() => repo.signIn(any(), any())).thenAnswer(
        (_) async => const User(id: 'u_1'),
      );
      return AuthBloc(repo);
    },
    act: (bloc) => bloc.add(const SignInRequested('a@b.co', 'pw')),
    expect: () => [
      isA<AuthLoading>(),
      isA<Authenticated>().having((s) => s.user.id, 'user.id', 'u_1'),
    ],
  );

  blocTest<AuthBloc, AuthState>(
    'emits [Loading, Error] on wrong password',
    build: () {
      when(() => repo.signIn(any(), any())).thenThrow(WrongPasswordException());
      return AuthBloc(repo);
    },
    act: (bloc) => bloc.add(const SignInRequested('a@b.co', 'wrong')),
    expect: () => [isA<AuthLoading>(), isA<AuthError>()],
  );
}
The having matcher is the secret weapon. It lets you assert on one specific field of the emitted state without writing a custom equality. For Riverpod, the same pattern uses ProviderContainer plus container.listen to record state transitions and assert against the list.
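A minimal sketch of the Riverpod variant, assuming the riverpod package is added as a dependency (it is not in the pubspec above) and using a hypothetical Counter notifier in place of a real auth notifier.

```dart
import 'package:flutter_test/flutter_test.dart';
import 'package:riverpod/riverpod.dart';

// Hypothetical notifier used only to demonstrate the recording pattern.
class Counter extends Notifier<int> {
  @override
  int build() => 0;

  void increment() => state++;
}

final counterProvider = NotifierProvider<Counter, int>(Counter.new);

void main() {
  test('records every state transition in order', () {
    final container = ProviderContainer();
    addTearDown(container.dispose);

    // Collect each emitted state, including the initial one.
    final states = <int>[];
    container.listen<int>(
      counterProvider,
      (previous, next) => states.add(next),
      fireImmediately: true,
    );

    container.read(counterProvider.notifier).increment();
    container.read(counterProvider.notifier).increment();

    expect(states, [0, 1, 2]);
  });
}
```

The recorded list plays the role of the expect list in blocTest, so isA and having matchers work on its elements in the same way.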
Golden and pixel tests with golden_toolkit and alchemist
Golden tests render a widget to an image and diff it against a checked-in PNG. They are the only fast way to catch regressions in your design system: a font weight change, a stray padding, a dark mode color that broke. The pain point is platform fragility, which is why alchemist beats raw matchesGoldenFile: it generates human-readable platform goldens for local review and separate CI goldens that replace text with solid blocks, so the CI images render the same on any machine.
// test/goldens/paywall_test.dart
import 'package:alchemist/alchemist.dart';
import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';
import 'package:my_app/screens/paywall.dart';

void main() {
  goldenTest(
    'Paywall renders correctly in light and dark modes',
    fileName: 'paywall',
    builder: () => GoldenTestGroup(
      scenarioConstraints: const BoxConstraints(maxWidth: 400),
      children: [
        GoldenTestScenario(
          name: 'light',
          child: MaterialApp(theme: ThemeData.light(), home: const Paywall()),
        ),
        GoldenTestScenario(
          name: 'dark',
          child: MaterialApp(theme: ThemeData.dark(), home: const Paywall()),
        ),
      ],
    ),
  );
}
Three rules to keep golden tests sane:
- Pin font files in the test setup. System fonts vary between macOS and Linux CI, which causes spurious diffs. Use loadAppFonts() from golden_toolkit.
- Run goldens only in CI by default. Tag the test as tags: ['golden'] and gate it with --tags golden so devs do not re-bless images by accident.
- Cover the design system, not every screen. 20 goldens for buttons, cards, paywall variants, and onboarding cards catches most regressions. 200 goldens for every screen slows CI without buying coverage.
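One way to apply the font rule across the whole suite is a flutter_test_config.dart file, which Flutter picks up automatically for the tests in its directory. A minimal sketch, assuming golden_toolkit is installed and your fonts are declared in pubspec.yaml:

```dart
// test/flutter_test_config.dart
import 'dart:async';

import 'package:flutter_test/flutter_test.dart';
import 'package:golden_toolkit/golden_toolkit.dart';

// Flutter calls testExecutable instead of each test file's main(),
// so shared setup placed here runs before every test file.
Future<void> testExecutable(FutureOr<void> Function() testMain) async {
  TestWidgetsFlutterBinding.ensureInitialized();

  // Load the app's bundled fonts so text no longer depends on the fonts
  // installed on the host machine.
  await loadAppFonts();

  await testMain();
}
```

For the tagging rule, pass tags: ['golden'] to testWidgets, then run flutter test --exclude-tags golden for the everyday loop and flutter test --tags golden in the CI job that owns the images.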
Integration tests with integration_test
integration_test is part of the Flutter SDK and replaced flutter_driver in 2021. It runs your real app (or a configured root widget) in a real Flutter environment, lets you tap through full flows, and reports results in the same format as widget tests so coverage tooling works out of the box.
// integration_test/signup_flow_test.dart
import 'package:flutter/widgets.dart'; // for Key
import 'package:flutter_test/flutter_test.dart';
import 'package:integration_test/integration_test.dart';
import 'package:my_app/main.dart' as app;

void main() {
  IntegrationTestWidgetsFlutterBinding.ensureInitialized();

  testWidgets('user can sign up and reach the home feed', (tester) async {
    app.main();
    await tester.pumpAndSettle();

    await tester.tap(find.text('Create account'));
    await tester.pumpAndSettle();

    await tester.enterText(find.byKey(const Key('email')), 'new@user.co');
    await tester.enterText(find.byKey(const Key('password')), 'Sup3rSafe!');
    await tester.tap(find.byKey(const Key('submit')));
    await tester.pumpAndSettle(const Duration(seconds: 3));

    expect(find.byKey(const Key('home_feed')), findsOneWidget);
  });
}
Run with flutter test integration_test on a connected device or simulator. The critical detail: replace network and Firebase clients with fakes before app.main() runs, or your tests become network-flaky. Most teams expose a main_test.dart entry that wires fake clients into the same dependency container the production main.dart uses.
End-to-end on real devices with patrol (the 2026 winner)
patrol is the 2026 winner for real-device E2E because it can interact with native dialogs that integration_test cannot: permission prompts, biometrics sheets, notification taps, deep link cold starts, and IAP purchase sheets. It runs on top of integration_test and adds native automation via XCUITest and UIAutomator under the hood.
// integration_test/patrol/permissions_test.dart
import 'package:flutter_test/flutter_test.dart';
import 'package:my_app/main.dart' as app;
import 'package:patrol/patrol.dart';

void main() {
  patrolTest(
    'user accepts notification permission and sees confirmation',
    ($) async {
      await $.pumpWidgetAndSettle(app.MyApp());
      await $('Enable notifications').tap();
      // Accept the native iOS or Android permission dialog
      await $.native.grantPermissionWhenInUse();
      await $.pumpAndSettle();
      expect($('Notifications enabled'), findsOneWidget);
    },
  );
}
Use patrol for: notification permission flows, photo library access, contact import, biometric unlock, IAP sandbox purchases, and Universal Link cold starts. Use integration_test for everything that does not touch a native sheet. Run patrol on one device per platform in CI (latest iPhone simulator plus a Pixel emulator) and skip the device matrix until you have shipping volume that justifies it.
Testing Firebase, RevenueCat, network: fake the platform boundary
The biggest mistake in Flutter testing is trying to test against real Firebase, real RevenueCat, or real network. Tests become slow, flaky, and a single revoked API key blocks the team. Fix: fake the platform boundary, not the SDK.
- Firebase Auth: use the firebase_auth_mocks package or hand-roll an AuthClient interface that your app calls. Tests inject a fake client; production wires the real FirebaseAuth.instance.
- Firestore: use fake_cloud_firestore, which gives a full in-memory implementation with the real API surface. Read and write in tests; the data lives only for the test.
- RevenueCat: wrap Purchases behind a SubscriptionClient interface. Tests inject a fake that returns canned Offerings and EntitlementInfo. Run real RevenueCat only in patrol E2E.
- HTTP: use http_mock_adapter for Dio or MockClient from package:http/testing.dart. Both let you assert on the request payload and return canned responses without a network round trip.
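The boundary idea can be sketched in plain Dart. Everything below is hypothetical: the SubscriptionClient shape, the 'pro' entitlement id, and PaywallGate are illustrative names, and a production implementation would delegate to the real SDK behind the same interface.

```dart
/// Hypothetical boundary: the app depends on this interface, never on the SDK.
abstract class SubscriptionClient {
  Future<Set<String>> activeEntitlements();
}

/// Test double that returns canned entitlements without touching a platform
/// channel or the network.
class FakeSubscriptionClient implements SubscriptionClient {
  FakeSubscriptionClient(this._entitlements);

  final Set<String> _entitlements;

  @override
  Future<Set<String>> activeEntitlements() async => _entitlements;
}

/// Code under test: decides whether the paywall should be shown.
class PaywallGate {
  PaywallGate(this._client);

  final SubscriptionClient _client;

  Future<bool> shouldShowPaywall() async {
    final entitlements = await _client.activeEntitlements();
    return !entitlements.contains('pro'); // 'pro' is an assumed entitlement id
  }
}

Future<void> main() async {
  final freeUser = PaywallGate(FakeSubscriptionClient(<String>{}));
  final proUser = PaywallGate(FakeSubscriptionClient(<String>{'pro'}));

  print(await freeUser.shouldShowPaywall()); // true
  print(await proUser.shouldShowPaywall()); // false
}
```

Because the gate only sees the interface, the same class runs unchanged against the fake in widget tests and against the real SDK wrapper in a patrol run.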
Test coverage: how much is enough (and why 100% is wrong)
Coverage targets are a tool, not a goal. 100 percent line coverage means you wrote a test that executed every line, not that you wrote a test that asserted every behavior. The 2026 indie target I recommend:
- Overall line coverage: 55-65 percent. Past 70 percent the marginal hour buys you very little.
- Critical paths (auth, paywall, purchase, deep links): 90 percent. These are the bugs that cost real revenue.
- UI widgets: cover the interactive ones. Skip the pure presentational widgets; goldens catch their regressions.
- BLoCs and providers: 80 percent. bloc_test makes this cheap.
Generate coverage with flutter test --coverage. View locally with genhtml coverage/lcov.info -o coverage/html and open coverage/html/index.html. In CI, upload to Codecov or just gate PRs on a minimum percentage with a small script. Do not let coverage drop on a PR without explicit sign-off.
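The small script can be written in Dart itself. A minimal sketch, assuming a hypothetical tool/check_coverage.dart that sums the LF (lines found) and LH (lines hit) records in coverage/lcov.info and fails the build below a threshold passed as the first argument:

```dart
// tool/check_coverage.dart (hypothetical path)
import 'dart:io';

/// Returns overall line coverage as a percentage from lcov tracefile text.
/// LF records carry the instrumented line count per file, LH the hit count.
double lineCoveragePercent(String lcov) {
  var found = 0;
  var hit = 0;
  for (final line in lcov.split('\n')) {
    final trimmed = line.trim();
    if (trimmed.startsWith('LF:')) {
      found += int.parse(trimmed.substring(3));
    } else if (trimmed.startsWith('LH:')) {
      hit += int.parse(trimmed.substring(3));
    }
  }
  return found == 0 ? 0.0 : hit * 100 / found;
}

void main(List<String> args) {
  // Default threshold of 60 matches the overall target suggested above.
  final minimum = args.isEmpty ? 60.0 : double.parse(args.first);
  final report = File('coverage/lcov.info').readAsStringSync();
  final percent = lineCoveragePercent(report);

  stdout.writeln(
    'Line coverage: ${percent.toStringAsFixed(1)}% (minimum $minimum%)',
  );
  if (percent < minimum) {
    exitCode = 1; // a non-zero exit code fails the CI step
  }
}
```

In CI this would run right after flutter test --coverage, for example as dart run tool/check_coverage.dart 60.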
CI: GitHub Actions matrix for iOS and Android tests
The simplest 2026 setup that catches platform-specific bugs without exploding CI minutes: one job runs unit and widget tests on Linux (fastest), one job runs integration tests on macOS with an iPhone simulator, one job runs patrol on a Pixel emulator. Total CI time around 12 minutes per push for a medium app.
# .github/workflows/test.yml
name: test
on: [push, pull_request]
jobs:
  unit_and_widget:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: subosito/flutter-action@v2
        with: { flutter-version: '3.35.0', channel: 'stable' }
      - run: flutter pub get
      - run: flutter analyze
      - run: flutter test --coverage --reporter github
      - uses: codecov/codecov-action@v4
        with: { file: coverage/lcov.info }
  integration_ios:
    runs-on: macos-14
    steps:
      - uses: actions/checkout@v4
      - uses: subosito/flutter-action@v2
        with: { flutter-version: '3.35.0', channel: 'stable' }
      - run: flutter pub get
      - name: Start iOS simulator
        run: |
          xcrun simctl boot "iPhone 16" || true
      - run: flutter test integration_test -d "iPhone 16"
  patrol_android:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: subosito/flutter-action@v2
        with: { flutter-version: '3.35.0', channel: 'stable' }
      - uses: reactivecircus/android-emulator-runner@v2
        with:
          api-level: 34
          script: |
            dart pub global activate patrol_cli
            flutter pub get
            patrol test --target integration_test/patrol
Two rules for keeping CI fast. One: cache ~/.pub-cache and the Flutter SDK between runs. Two: do not run patrol on every PR. Run it on the main branch and on release candidates only.
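Both rules translate into a few lines of workflow configuration. The fragment below is a sketch rather than a complete workflow: the cache key naming is an arbitrary choice, and the condition only covers pushes to main, so release candidates would need an extra trigger of your own (for example a tag pattern).

```yaml
# Fragment of .github/workflows/test.yml
jobs:
  unit_and_widget:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: subosito/flutter-action@v2
        with:
          flutter-version: '3.35.0'
          channel: 'stable'
          cache: true # reuse the downloaded Flutter SDK between runs
      - uses: actions/cache@v4
        with:
          path: ~/.pub-cache
          key: pub-${{ runner.os }}-${{ hashFiles('**/pubspec.lock') }}
          restore-keys: pub-${{ runner.os }}-
      - run: flutter pub get
      - run: flutter test

  patrol_android:
    # Skipped for pull requests and feature branches; runs on pushes to main.
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Emulator and patrol steps stay the same as in the full workflow.
```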
Snapshot tests for screen states
Snapshot tests are a lightweight alternative to full goldens for asserting that a widget tree stays structurally the same. The pattern: pump the widget, capture tester.allWidgets types, compare to a checked-in list. Cheaper than goldens, useful when you want to detect that a card stopped rendering its CTA but you do not care about exact pixels.
testWidgets('OfferCard renders title, price, and CTA', (tester) async {
  await tester.pumpWidget(MaterialApp(
    home: OfferCard(title: 'Pro', priceCents: 4900, ctaLabel: 'Upgrade'),
  ));
  expect(find.text('Pro'), findsOneWidget);
  // Raw string: an unescaped $ would be parsed as string interpolation.
  expect(find.text(r'$49.00'), findsOneWidget);
  expect(find.widgetWithText(FilledButton, 'Upgrade'), findsOneWidget);
});
For a more rigorous snapshot, render the widget to a JSON tree using WidgetTreeSnapshot patterns or simply assert against the count of specific child widget types. The point is to catch "the CTA disappeared" without paying the cost of a pixel diff.
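The allWidgets comparison described above might look like the sketch below. It builds a stand-in card inline instead of the real OfferCard, and it tracks only a hand-picked set of widget types (an assumption of this sketch) so that framework-internal widgets do not make the snapshot brittle.

```dart
import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';

void main() {
  testWidgets('offer card keeps its structural skeleton', (tester) async {
    // Stand-in for OfferCard, built inline to keep the example standalone.
    await tester.pumpWidget(
      MaterialApp(
        home: Card(
          child: Column(
            children: [
              const Text('Pro'),
              const Text(r'$49.00'),
              FilledButton(onPressed: () {}, child: const Text('Upgrade')),
            ],
          ),
        ),
      ),
    );

    // Only these types are part of the snapshot; everything else is ignored.
    const tracked = {'Card', 'Text', 'FilledButton'};

    // allWidgets walks the tree depth-first, so the order mirrors nesting.
    final snapshot = tester.allWidgets
        .map((widget) => widget.runtimeType.toString())
        .where(tracked.contains)
        .toList();

    // The checked-in structure: title, price, then the CTA with its label.
    expect(snapshot, ['Card', 'Text', 'Text', 'FilledButton', 'Text']);
  });
}
```

If the CTA is removed, the list loses its last two entries and the failure message shows exactly which part of the structure changed.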
Common testing mistakes
- Testing the framework instead of your code. A test that calls pumpWidget(Container()) and asserts a container exists is theater.
- Real network in tests. Tests become flaky and slow. Fake the boundary; let patrol or a small smoke suite touch real endpoints.
- Sleeping with Future.delayed. Use fake_async for time-dependent logic or tester.pump(duration) for animations.
- One mega test that asserts ten things. When it fails the failure message is useless. Split into ten tests; each asserts one behavior.
- Skipping setUp. Re-creating mocks inside each test is fine until you duplicate ten lines of setup across 30 tests. Use setUp for shared fixtures.
- Re-blessing goldens without reading the diff. If you run --update-goldens reflexively you have removed the only guardrail. Look at the diff every time.
- No tests at all on the paywall. The single highest-revenue surface in your app and the one most likely to regress. Cover entitlement gating, restore purchases, and the offer fetch failure path.
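For the Future.delayed mistake, fake_async lets a test advance a virtual clock instead of waiting. A minimal sketch with a bare Timer standing in for whatever time-dependent logic you have:

```dart
import 'dart:async';

import 'package:fake_async/fake_async.dart';
import 'package:flutter_test/flutter_test.dart';

void main() {
  test('30 second timer fires without any real waiting', () {
    fakeAsync((async) {
      var fired = false;

      // Timers created inside the fakeAsync zone use the virtual clock.
      Timer(const Duration(seconds: 30), () => fired = true);

      async.elapse(const Duration(seconds: 29));
      expect(fired, isFalse);

      async.elapse(const Duration(seconds: 1));
      expect(fired, isTrue);
    });
  });
}
```

The test finishes in milliseconds because elapse moves the fake clock forward and runs any timers that become due.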
What The Flutter Kit ships
The Flutter Kit ships with a complete testing setup so you start with the green check on day one: mocktail mocks for every dependency, bloc_test coverage on the auth, paywall, and onboarding BLoCs, alchemist golden tests for the design system in light and dark mode, integration_test happy-path tests for signup-to-paywall and onboarding-to-home, patrol E2E for notification permissions and IAP sandbox purchases, and a GitHub Actions workflow that runs the unit and widget tier on every PR plus patrol on main. Coverage starts at 62 percent.
$69 one-time, unlimited commercial projects. See every integration on the features page or jump to checkout.
Final recommendation
If you are starting fresh, invest the first afternoon writing widget tests for your three most important screens, bloc_test for your auth and paywall BLoCs, and one patrol E2E for the critical purchase flow. Add goldens once your design system stabilizes. Skip unit-testing pure presentational widgets. Aim for 60 percent overall coverage and 90 percent on the paywall. Tests are a discipline; the rewards compound every release.