The 2026 generation of mobile AI is no longer chat. It is agents that take actions: book the appointment, edit the spreadsheet, query the user's own data, file the support ticket. The two technologies that made this normal are tool calling (the LLM can request your functions) and MCP, the Model Context Protocol (a standard way to expose tools and data as servers). Here is how to build agentic Flutter apps with both, including the streaming, permissions, and safety patterns that production apps need.
Short version: keep your API keys on a backend proxy, define a small set of typed tools as JSON Schemas, stream responses so the UI feels alive, require user confirmation for any state-changing tool, and add MCP servers when you want third-party tools (Linear, GitHub, Notion) without writing custom integrations. For most indie Flutter apps in 2026, Claude Haiku 4.5 with three to five tools is the sweet spot for cost, latency, and quality.
What is MCP (Model Context Protocol) and why it matters
MCP is an open protocol that lets LLM clients talk to tool servers in a uniform way. Anthropic introduced it in late 2024, OpenAI adopted client support during 2025, and by 2026 a dozen first-party servers exist for things like Linear, GitHub, Google Drive, Notion, Postgres, and local filesystems. The win is composability: any MCP-compatible client (your Flutter app's backend, a desktop assistant, an IDE) can connect to any MCP server and immediately use its tools without bespoke glue code.
For a Flutter app, MCP matters in two ways. First, you can connect to existing third-party MCP servers from your backend and expose those tools to your users. Second, you can build your own MCP server that wraps your product's API, and reuse it across your Flutter app, your web app, your Slack bot, and an internal admin assistant.
Tool calling vs MCP vs function calling (clearing up the confusion)
These three terms get tangled. Here is the clean mental model.
- Function calling is OpenAI's original name (2023) for letting the model request your local function with structured arguments. The model never executes anything; it just emits a typed call you choose to run.
- Tool calling is the generalized 2024 term. Same idea, broader. The model emits a tool-call message, your code runs the tool, you feed the result back as a tool-result message, and the model continues.
- MCP is a wire protocol that standardizes how tools are advertised and invoked across a process boundary. An MCP server lists tools, resources, and prompts. An MCP client connects, lists them, and forwards tool calls. The LLM is unaware of MCP; the client adapts MCP tools into the model's tool-call format.
In short: tool calling is the model behavior. MCP is the transport that lets external systems expose tools to your LLM session. They compose; an MCP-aware backend speaks tool calling to the model and MCP to the servers.
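To make that boundary concrete, here is a minimal TypeScript sketch of the adapter work an MCP-aware backend does. The `tools/call` method and the `inputSchema`, `content`, and `isError` fields follow the MCP specification as I understand it, and the model-side shape uses Anthropic-style `input_schema` and `tool_result` naming. The type names and the three helper functions are hypothetical, chosen only for illustration.

```typescript
// Shapes on the MCP side (JSON-RPC 2.0). Type names are illustrative.
interface McpTool {
  name: string;
  description?: string;
  inputSchema: Record<string, unknown>;
}

interface McpCallResult {
  content: Array<{ type: string; text?: string }>;
  isError?: boolean;
}

interface JsonRpcRequest {
  jsonrpc: '2.0';
  id: number;
  method: string;
  params: Record<string, unknown>;
}

// Shapes on the model side (Anthropic-style naming).
interface ModelTool {
  name: string;
  description: string;
  input_schema: Record<string, unknown>;
}

interface ToolResultBlock {
  type: 'tool_result';
  tool_use_id: string;
  content: string;
  is_error: boolean;
}

// Advertise an MCP tool to the model in the model's own tool format.
function toModelTool(tool: McpTool): ModelTool {
  return {
    name: tool.name,
    description: tool.description ?? '',
    input_schema: tool.inputSchema,
  };
}

// Turn a model tool call into the JSON-RPC request sent to the MCP server.
function buildCallRequest(id: number, name: string, args: Record<string, unknown>): JsonRpcRequest {
  return { jsonrpc: '2.0', id, method: 'tools/call', params: { name, arguments: args } };
}

// Flatten the text parts of an MCP result into the tool-result block the model expects.
function toToolResult(toolUseId: string, result: McpCallResult): ToolResultBlock {
  const text = result.content
    .filter((part) => part.type === 'text' && typeof part.text === 'string')
    .map((part) => part.text)
    .join('\n');
  return { type: 'tool_result', tool_use_id: toolUseId, content: text, is_error: result.isError ?? false };
}
```

The model only ever sees `ModelTool` and `ToolResultBlock`; everything MCP-shaped stays on the backend side of the adapter.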
The 2026 mobile AI architecture
A clean agentic Flutter app has four parts:
- Flutter UI renders the chat, tool-call indicators, and confirmation prompts. It never talks to LLM providers directly.
- Backend proxy (Cloud Function, Cloud Run, Fly.io, or Workers) holds your API keys, applies rate limits, calls the LLM, runs tool calls, and streams results back via SSE.
- LLM provider (OpenAI, Anthropic, Gemini) does the reasoning and emits tool-call messages.
- MCP servers (optional) expose tools from third-party systems or your own services. The backend is the MCP client.
Putting an LLM key in your Flutter binary is an immediate security incident. Strings ship as plain text in app bundles, and a single decompiled APK posted on Reddit can drain your account overnight. Always proxy, even at indie scale.
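The rate-limiting part of the proxy can be very small. Below is a sketch of a fixed-window, per-user limiter, assuming a single long-lived backend instance; on Cloud Functions or Workers the counters would need to live in a shared store instead. The class name, the limits, and the injectable clock are all my own choices for illustration.

```typescript
// Fixed-window, per-user rate limiter. In-memory only: an assumption that
// suits one long-lived instance, not a fleet of serverless workers.
class RateLimiter {
  private readonly windows = new Map<string, { start: number; count: number }>();
  private readonly maxRequests: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(maxRequests: number, windowMs: number, now: () => number = () => Date.now()) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
    this.now = now;
  }

  // Returns true if the request may proceed, false if the user is over the limit.
  allow(userId: string): boolean {
    const t = this.now();
    const current = this.windows.get(userId);
    if (!current || t - current.start >= this.windowMs) {
      // First request, or the previous window has expired: start a new window.
      this.windows.set(userId, { start: t, count: 1 });
      return true;
    }
    if (current.count >= this.maxRequests) return false;
    current.count += 1;
    return true;
  }
}
```

In the request handler, something like `if (!limiter.allow(userId))` would return HTTP 429 before any LLM call is made, so an abusive client costs you nothing.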
Provider comparison: OpenAI, Anthropic Claude, Gemini tool use
All three major providers support tool calling and stream responses. The differences are in quality of tool selection, latency, cost, and MCP integration. This is the 2026 picture for indie Flutter developers.
| Capability | OpenAI (GPT-5 mini) | Anthropic (Claude Haiku 4.5) | Gemini (2.5 Flash) |
|---|---|---|---|
| Tool calling | Mature, parallel tools, JSON Schema | Mature, parallel tools, JSON Schema | Mature, single + parallel, OpenAPI-style schema |
| Streaming tool calls | Yes, deltas across calls | Yes, clean tool-use events | Yes, function-call deltas |
| MCP client built in | Yes (Responses API native MCP) | Yes (native MCP via SDK) | Partial, via Gemini SDK MCP shim |
| Prompt caching | Automatic on long prompts | Explicit, very effective for tool defs | Implicit caching available |
| Indicative input $/1M | ~$0.25 | ~$1.00 | ~$0.10 |
| Indicative output $/1M | ~$2.00 | ~$5.00 | ~$0.40 |
| Tool-selection quality | Strong | Best in class for multi-tool flows | Strong, leans literal |
Pricing moves; treat the numbers as relative. My default in 2026 is Claude Haiku 4.5 because tool-call selection on multi-step flows is the most reliable, and explicit prompt caching makes repeat tool definitions essentially free. Gemini 2.5 Flash wins on raw cost for high-volume consumer apps. GPT-5 mini is the safest pick when your team already runs an OpenAI stack.
Building a simple tool-calling chat in Flutter
The Flutter side is normal chat with one twist: render a small badge whenever the assistant message contains a tool call. That signal alone makes the experience feel agentic.
// lib/features/agent/agent_message.dart
sealed class AgentMessage {
  const AgentMessage();
}

class UserText extends AgentMessage {
  const UserText(this.text);
  final String text;
}

class AssistantText extends AgentMessage {
  AssistantText(this.text);
  String text; // Mutable so streamed text deltas can be appended in place.
}

class ToolCall extends AgentMessage {
  const ToolCall({required this.name, required this.args, required this.id});
  final String name;
  final Map<String, Object?> args;
  final String id;
}

class ToolResult extends AgentMessage {
  const ToolResult({required this.id, required this.result, this.isError = false});
  final String id;
  final String result;
  final bool isError;
}

// lib/features/agent/agent_view.dart
import 'package:flutter/material.dart';

import 'agent_message.dart';

// _Bubble, _ToolBadge, and _Tone are private helpers in this file (not shown).
class AgentView extends StatelessWidget {
  const AgentView({super.key, required this.messages});
  final List<AgentMessage> messages;

  @override
  Widget build(BuildContext context) {
    return ListView.builder(
      itemCount: messages.length,
      // The sealed class makes this switch exhaustive: a new message type
      // will not compile until it is rendered here.
      itemBuilder: (context, i) => switch (messages[i]) {
        UserText(:final text) => _Bubble(text: text, isUser: true),
        AssistantText(:final text) => _Bubble(text: text, isUser: false),
        ToolCall(:final name) => _ToolBadge(label: 'Calling $name'),
        ToolResult(:final isError) => _ToolBadge(
            label: isError ? 'Tool failed' : 'Tool done',
            tone: isError ? _Tone.error : _Tone.ok,
          ),
      },
    );
  }
}

The pattern matters: surface the tool call as a small, distinct UI element instead of hiding it inside the assistant bubble. Users learn to trust the agent faster when they can see what it is doing.
Defining tools as JSON Schemas
Every provider accepts tool definitions as JSON Schema (Gemini takes an OpenAPI-style subset of it). Keep schemas small (under 20 fields total per tool), use enums where possible, and add a one-line description per field. The model uses descriptions to choose between tools and to fill arguments correctly.
// Tool: search_workouts (server-side definition)
{
  "name": "search_workouts",
  "description": "Search the user's saved workouts by name, muscle group, or date range.",
  "input_schema": {
    "type": "object",
    "properties": {
      "query": { "type": "string", "description": "Free text query like 'leg day'." },
      "muscle_group": {
        "type": "string",
        "enum": ["legs", "back", "chest", "arms", "core", "full_body"]
      },
      "since": { "type": "string", "format": "date", "description": "ISO date, inclusive." }
    },
    "required": ["query"]
  }
}

Three rules I follow for every tool schema:
- Name tools in verb_noun form: create_workout, not workouts.
- Mark side-effecting tools clearly in the description ("Creates a new ..."). The model picks safer behavior when it knows.
- Constrain enums and formats. Free-text fields invite hallucinated values.
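These rules are easy to check mechanically. The sketch below defines a hypothetical side-effecting tool, create_workout, and a small lint function I would run in CI. The field names in the tool and the `lintTool` helper are assumptions for illustration, and the name check only verifies snake_case with at least two words; it cannot tell a verb from a noun.

```typescript
interface ToolProperty {
  type: string;
  description?: string;
  enum?: string[];
  format?: string;
}

interface ToolDefinition {
  name: string;
  description: string;
  input_schema: {
    type: 'object';
    properties: Record<string, ToolProperty>;
    required: string[];
  };
}

// Hypothetical state-changing tool that follows the three rules.
const createWorkout: ToolDefinition = {
  name: 'create_workout',
  description: "Creates a new workout in the user's library. Changes state.",
  input_schema: {
    type: 'object',
    properties: {
      title: { type: 'string', description: "Short name, like 'Leg day A'." },
      muscle_group: {
        type: 'string',
        enum: ['legs', 'back', 'chest', 'arms', 'core', 'full_body'],
        description: 'Primary muscle group trained.',
      },
      scheduled_for: { type: 'string', format: 'date', description: 'ISO date of the session.' },
    },
    required: ['title', 'muscle_group'],
  },
};

// Returns a list of human-readable problems; an empty list means the tool passes.
function lintTool(tool: ToolDefinition): string[] {
  const problems: string[] = [];
  if (!/^[a-z]+(_[a-z]+)+$/.test(tool.name)) {
    problems.push(`${tool.name}: name should be verb_noun, like create_workout`);
  }
  const props = Object.entries(tool.input_schema.properties);
  if (props.length > 20) problems.push(`${tool.name}: more than 20 fields`);
  for (const [key, prop] of props) {
    if (!prop.description) problems.push(`${tool.name}.${key}: missing description`);
  }
  for (const key of tool.input_schema.required) {
    if (!(key in tool.input_schema.properties)) {
      problems.push(`${tool.name}: required field ${key} is not defined`);
    }
  }
  return problems;
}
```

Running the lint over every tool at build time catches the slow drift where a schema gains an undocumented field and tool selection quietly gets worse.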
Streaming tool calls and assistant messages
Streaming is essential for perceived latency. The Flutter side reads SSE from the backend proxy and folds events into the message list as they arrive. The Dart side stays small.
// lib/features/agent/agent_stream.dart
import 'dart:async';
import 'dart:convert';
import 'package:http/http.dart' as http;
Stream<Map<String, Object?>> streamAgent({
  required Uri endpoint,
  required Map<String, String> headers,
  required Map<String, Object?> body,
}) async* {
  final req = http.Request('POST', endpoint)
    ..headers.addAll({
      // Declare the JSON body; otherwise package:http sends it as text/plain.
      'content-type': 'application/json',
      ...headers,
      'accept': 'text/event-stream',
    })
    ..body = jsonEncode(body);
  final res = await req.send();
  if (res.statusCode != 200) {
    throw StateError('Agent stream failed: ${res.statusCode}');
  }
  // Decode bytes to text, then split into lines so each SSE field is one item.
  final lines = res.stream
      .transform(utf8.decoder)
      .transform(const LineSplitter());
  await for (final line in lines) {
    // Only data: lines carry payloads; skip comments, event names, and blanks.
    if (!line.startsWith('data:')) continue;
    final payload = line.substring(5).trim();
    if (payload.isEmpty || payload == '[DONE]') continue;
    yield jsonDecode(payload) as Map<String, Object?>;
  }
}

On the BLoC or Riverpod side, fold each event into the message list. Text deltas append to the current assistant bubble. Tool-use events create a new ToolCall entry. Tool-result events flip the badge to done or error. Use a single source of truth for the list so the ListView never flickers.
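For the Dart parser above to work, the backend has to frame each event as a single `data:` line followed by a blank line. Here is a minimal sketch of that framing on the TypeScript side. The event names (`text_delta`, `tool_use`, `tool_result`) and their fields are my own assumptions for the contract between proxy and app, not something any provider mandates.

```typescript
// The event contract between the proxy and the Flutter client.
// Names and fields are assumptions; pick your own and keep both sides in sync.
type AgentEvent =
  | { type: 'text_delta'; text: string }
  | { type: 'tool_use'; id: string; name: string; args: Record<string, unknown> }
  | { type: 'tool_result'; id: string; isError: boolean };

// Frame one event for Server-Sent Events. JSON.stringify escapes newlines
// inside strings, so the payload always fits on a single data: line.
function formatSseEvent(event: AgentEvent): string {
  return `data: ${JSON.stringify(event)}\n\n`;
}

// Sentinel the Dart parser above already skips; send it before closing.
const SSE_DONE = 'data: [DONE]\n\n';
```

Writing `formatSseEvent(...)` to the response for every delta, then `SSE_DONE`, is all the Flutter side needs in order to fold events into the message list.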
Connecting to an MCP server from your backend proxy
The backend is the MCP client. Below is a minimal Node/TypeScript Cloud Function that runs a Claude session with both local tools and tools sourced from an MCP server.
// functions/src/agent.ts
import Anthropic from '@anthropic-ai/sdk';
import { Client as McpClient } from '@modelcontextprotocol/sdk/client/index.js';
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';
const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
// Start an MCP client that talks to the Linear MCP server.
const mcp = new McpClient({ name: 'flutter-kit-agent', version: '1.0.0' });
await mcp.connect(new StdioClientTransport({
  command: 'npx',
  args: ['-y', '@linear/mcp-server'],
  env: { LINEAR_API_KEY: process.env.LINEAR_API_KEY! },
}));
// List MCP tools and convert them to Anthropic tool format.
const { tools: mcpTools } = await mcp.listTools();
const remoteTools = mcpTools.map((t) => ({
  name: t.name,
  description: t.description ?? '',
  input_schema: t.inputSchema,
}));
// Local Flutter-side tools, defined on the backend.
const localTools = [
  {
    name: 'search_workouts',
    description: "Search the user's workouts by name, muscle group, or date.",
    input_schema: { /* as above */ },
  },
];
export async function runAgent(userId: string, messages: any[]) {
  const tools = [...localTools, ...remoteTools];
  let turn = await anthropic.messages.create({
    model: 'claude-haiku-4-5',
    max_tokens: 1024,
    tools,
    messages,
  });
  // Plan, act, observe loop with a hard cap on iterations.
  for (let i = 0; i < 6 && turn.stop_reason === 'tool_use'; i++) {
    const toolUses = turn.content.filter((b: any) => b.type === 'tool_use');
    // Run all tool calls from this turn concurrently.
    const results = await Promise.all(toolUses.map(async (tu: any) => {
      try {
        // Route to the MCP server if it advertised this tool, else run it locally.
        // runLocalTool is your own dispatcher (not shown).
        const out = remoteTools.some((r) => r.name === tu.name)
          ? await mcp.callTool({ name: tu.name, arguments: tu.input })
          : await runLocalTool(userId, tu.name, tu.input);
        return { type: 'tool_result', tool_use_id: tu.id, content: JSON.stringify(out) };
      } catch (err) {
        // Report the failure to the model instead of rejecting the whole batch.
        return { type: 'tool_result', tool_use_id: tu.id, content: String(err), is_error: true };
      }
    }));
    messages = [...messages, { role: 'assistant', content: turn.content }, { role: 'user', content: results }];
    turn = await anthropic.messages.create({
      model: 'claude-haiku-4-5',
      max_tokens: 1024,
      tools,
      messages,
    });
  }
  // If the cap was hit, stop_reason may still be 'tool_use'; callers should check it.
  return turn;
}

For the streaming version, swap messages.create for messages.stream and forward each delta to the Flutter client as SSE. The plan-act-observe loop stays the same.
Multi-step agents: plan, act, observe loop
An agent is a loop, not a single call. The model proposes tool calls, you run them, feed the results back, and let it continue until stop_reason is no longer tool_use. Three guardrails make this safe in production.
- Hard iteration cap (six is a good default). Without a cap, a confused agent can loop forever and you wake up to a thousand-dollar bill.
- Per-tool timeouts. Each tool call wraps in Promise.race with a timeout (8 seconds for HTTP, 20 for heavy work). Slow tools poison the whole session.
- Per-session token budget. Track input + output tokens across the loop and cut off when you hit a ceiling. Render an "agent stopped, want to continue?" prompt to the user.
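The timeout and budget guardrails can be sketched in a few lines. This is a minimal version under my own naming (`withTimeout`, `TokenBudget`), not a library API. Note that `Promise.race` only stops the loop from waiting; it does not cancel the underlying work, so a tool that must be aborted needs its own cancellation mechanism.

```typescript
// Reject if the tool does not settle within `ms`. The underlying work is not
// cancelled; this only stops the agent loop from waiting on it.
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  return Promise.race([work, timeout]).finally(() => {
    // Clear the timer so a fast tool does not leave a pending timeout behind.
    if (timer !== undefined) clearTimeout(timer);
  });
}

// Running total of tokens spent in one agent session.
class TokenBudget {
  private used = 0;
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  // Call once per model turn with the usage numbers from the response.
  add(inputTokens: number, outputTokens: number): void {
    this.used += inputTokens + outputTokens;
  }

  get exhausted(): boolean {
    return this.used >= this.limit;
  }

  get remaining(): number {
    return Math.max(0, this.limit - this.used);
  }
}
```

In the loop, each tool call would be wrapped as `withTimeout(runTool(...), 8000, name)`, and after each model turn you would call `budget.add(turn.usage.input_tokens, turn.usage.output_tokens)` and break out once `budget.exhausted` is true.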
Local tools (Flutter side) vs server tools (backend side)
Not all tools belong on the backend. Some need device context (camera, contacts, current screen state) and must execute in Flutter. The split:
- Server tools. Database queries, API calls, MCP server calls, anything with secrets. The backend runs them and includes the result in the next LLM turn.
- Client tools. Open the camera, navigate to a screen, fill a form, read the current map viewport. The backend forwards the tool call to the Flutter app, the app runs it, the app posts the result back, and the backend continues the loop.
For client tools, the cleanest implementation is a second SSE channel where the backend pushes tool requests to the Flutter client and waits for a POST with the result. Keep client tools deterministic and idempotent so retries are safe.
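On the backend, the "push a request, wait for a POST" handshake boils down to a map of pending promises keyed by tool-use id. The sketch below assumes a single backend instance holding that map in memory, so the app's POST must reach the same instance that is running the loop. `ClientToolBridge` and its method names are hypothetical.

```typescript
interface ClientToolResult {
  toolUseId: string;
  content: string;
  isError: boolean;
}

// Bridges the agent loop (which waits) and the HTTP handler (which delivers).
class ClientToolBridge {
  private readonly pending = new Map<string, (result: ClientToolResult) => void>();

  // Called by the agent loop right after pushing the tool request over SSE.
  // Resolves with an error result if the app does not answer in time.
  waitForResult(toolUseId: string, timeoutMs: number): Promise<ClientToolResult> {
    return new Promise((resolve) => {
      const timer = setTimeout(() => {
        this.pending.delete(toolUseId);
        resolve({ toolUseId, content: 'Client did not respond in time.', isError: true });
      }, timeoutMs);
      this.pending.set(toolUseId, (result) => {
        clearTimeout(timer);
        this.pending.delete(toolUseId);
        resolve(result);
      });
    });
  }

  // Called by the HTTP handler when the app POSTs a result. Returns false for
  // unknown or already-settled ids, which makes duplicate POSTs harmless.
  deliver(result: ClientToolResult): boolean {
    const settle = this.pending.get(result.toolUseId);
    if (!settle) return false;
    settle(result);
    return true;
  }
}
```

Because `deliver` ignores ids it no longer knows about, a retried POST from a flaky mobile connection cannot feed the same result into the loop twice.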
Permissions and user-in-the-loop confirmations
Any tool that changes state (creates, updates, deletes, sends, charges, books) must require explicit user confirmation. This is non-negotiable. The pattern:
// In the Flutter agent BLoC, intercept dangerous tool calls.
final dangerous = {'create_workout', 'send_email', 'charge_card', 'book_appointment'};
if (event is ToolCallProposed && dangerous.contains(event.name)) {
  emit(state.copyWith(awaitingConfirmation: event));
  return;
}

// On user confirm, POST to the backend to resume the agent.
Future<void> onConfirm(ToolCall call, bool approved) async {
  await api.resumeAgent(
    sessionId: state.sessionId,
    toolUseId: call.id,
    approved: approved,
    arguments: call.args,
  );
}

Always show the tool name in plain language, the arguments the model wants to use, and two buttons: Approve and Cancel. For high-stakes tools (payments, deletion), add an editable form so the user can correct the arguments before approving. The model is right most of the time; this UI catches the times it is not.
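I would enforce the same gate on the backend too, since the backend is what actually executes the tool. Here is a sketch of that half: it splits a turn's tool calls into those that can run immediately and those that must wait for approval, and builds the result to feed back when the user cancels. The set mirrors the Dart one above; `partitionToolUses`, `declinedResult`, and the `alwaysAllowed` parameter are hypothetical names.

```typescript
interface ToolUse {
  id: string;
  name: string;
  input: Record<string, unknown>;
}

// Same list as the Flutter side; keep the two in sync or serve it from one place.
const STATE_CHANGING = new Set(['create_workout', 'send_email', 'charge_card', 'book_appointment']);

// Split one turn's tool calls. Tools the user has marked "always allow"
// skip the confirmation step.
function partitionToolUses(
  toolUses: ToolUse[],
  alwaysAllowed: ReadonlySet<string> = new Set<string>(),
): { autoRun: ToolUse[]; needsConfirmation: ToolUse[] } {
  const autoRun: ToolUse[] = [];
  const needsConfirmation: ToolUse[] = [];
  for (const tu of toolUses) {
    if (STATE_CHANGING.has(tu.name) && !alwaysAllowed.has(tu.name)) {
      needsConfirmation.push(tu);
    } else {
      autoRun.push(tu);
    }
  }
  return { autoRun, needsConfirmation };
}

// Tool result to send back to the model when the user taps Cancel.
function declinedResult(tu: ToolUse) {
  return {
    type: 'tool_result' as const,
    tool_use_id: tu.id,
    content: 'The user declined this action.',
    is_error: true,
  };
}
```

Feeding a declined result back, rather than silently dropping the call, lets the model acknowledge the cancellation and suggest an alternative instead of retrying blindly.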
Cost and latency: caching, model choice, request batching
Three levers control the cost and feel of your agent.
- Prompt caching. Tool definitions and system prompts can be cached on Anthropic and OpenAI. A cache hit cuts the cost of the cached input tokens by roughly 90 percent and shaves latency measurably. Put your tool schemas at the start of the prompt so the prefix stays stable; on Anthropic, mark them explicitly as cacheable, while OpenAI caches long prompts automatically.
- Right-sized model. Default to Haiku 4.5, Gemini 2.5 Flash, or GPT-5 mini. Only escalate to Sonnet 4.6, Gemini 2.5 Pro, or GPT-5 when the small model demonstrably fails on your eval set. Most indie apps never need the big models.
- Parallel tool calls. When the model emits multiple tool calls in one turn, run them concurrently with Promise.all on the backend. This cuts wall-clock latency dramatically.
At indie scale, a well-tuned agent costs around $0.002 to $0.01 per session on Haiku and $0.0005 to $0.003 on Gemini Flash. That is the price of a few rich chat turns plus a few tool calls with cached prompts.
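The arithmetic behind those figures is simple enough to keep in a helper. This sketch uses the indicative Haiku prices from the comparison table and the roughly 90 percent cache discount mentioned above; the token counts in the example are invented for illustration, and the formula ignores any surcharge for writing to the cache.

```typescript
interface Pricing {
  inputPerMTok: number;        // USD per 1M fresh input tokens
  outputPerMTok: number;       // USD per 1M output tokens
  cachedInputDiscount: number; // 0.9 means cached input costs 10% of the fresh rate
}

interface SessionUsage {
  freshInputTokens: number;
  cachedInputTokens: number;
  outputTokens: number;
}

// Indicative numbers from the table above; treat them as relative, not exact.
const HAIKU_PRICING: Pricing = { inputPerMTok: 1.0, outputPerMTok: 5.0, cachedInputDiscount: 0.9 };

// Total session cost in USD, summed over every model turn in the loop.
function estimateSessionCost(usage: SessionUsage, pricing: Pricing): number {
  const cachedRate = pricing.inputPerMTok * (1 - pricing.cachedInputDiscount);
  const dollarsTimesMillion =
    usage.freshInputTokens * pricing.inputPerMTok +
    usage.cachedInputTokens * cachedRate +
    usage.outputTokens * pricing.outputPerMTok;
  return dollarsTimesMillion / 1e6;
}

// Hypothetical session: 1,000 fresh input tokens, 3,000 cached, 600 output.
const exampleCost = estimateSessionCost(
  { freshInputTokens: 1000, cachedInputTokens: 3000, outputTokens: 600 },
  HAIKU_PRICING,
);
```

For that hypothetical session the estimate works out to about $0.0043: (1,000 × 1.00 + 3,000 × 0.10 + 600 × 5.00) / 1,000,000. Output tokens dominate, which is why capping max_tokens matters more than trimming the prompt once caching is on.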
Safety: tool sandboxing, schema validation, refusal handling
Three production safeties:
- Validate every tool argument with Zod or Joi before executing. The model mostly emits valid JSON, but hallucinated enum values and out-of-range numbers do happen. Reject early and feed the error back as a tool result so the model can retry.
- Sandbox MCP server processes. Run each MCP server with the minimum env vars and a per-process resource cap. Treat them as untrusted code that runs in your account.
- Handle refusals. When the model refuses (safety filter, policy), do not loop forever. Show the user a clean "I can't help with that" bubble and offer a related action.
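The validate-and-feed-back pattern looks like this in practice. To keep the example dependency-free I have hand-written the checks for the search_workouts schema shown earlier; in a real backend Zod or Joi would replace `validateSearchWorkouts`. All function and type names here are my own, and the date check only verifies the YYYY-MM-DD shape, not that the date exists.

```typescript
const MUSCLE_GROUPS = ['legs', 'back', 'chest', 'arms', 'core', 'full_body'] as const;
type MuscleGroup = (typeof MUSCLE_GROUPS)[number];

interface SearchWorkoutsArgs {
  query: string;
  muscle_group?: MuscleGroup;
  since?: string;
}

type ValidationResult =
  | { ok: true; value: SearchWorkoutsArgs }
  | { ok: false; error: string };

// Hand-rolled stand-in for a Zod/Joi schema. Never trust model-emitted arguments.
function validateSearchWorkouts(input: unknown): ValidationResult {
  if (typeof input !== 'object' || input === null) {
    return { ok: false, error: 'Arguments must be an object.' };
  }
  const raw = input as Record<string, unknown>;
  const query = raw['query'];
  const muscleGroup = raw['muscle_group'];
  const since = raw['since'];

  if (typeof query !== 'string' || query.trim() === '') {
    return { ok: false, error: 'query must be a non-empty string.' };
  }
  const value: SearchWorkoutsArgs = { query };
  if (muscleGroup !== undefined) {
    const match = MUSCLE_GROUPS.find((g) => g === muscleGroup);
    if (match === undefined) {
      return { ok: false, error: `muscle_group must be one of: ${MUSCLE_GROUPS.join(', ')}.` };
    }
    value.muscle_group = match;
  }
  if (since !== undefined) {
    if (typeof since !== 'string' || !/^\d{4}-\d{2}-\d{2}$/.test(since)) {
      return { ok: false, error: 'since must be an ISO date (YYYY-MM-DD).' };
    }
    value.since = since;
  }
  return { ok: true, value };
}

// Wrap a validation failure as a tool result so the model can correct itself.
function validationErrorResult(toolUseId: string, error: string) {
  return { type: 'tool_result' as const, tool_use_id: toolUseId, content: error, is_error: true };
}
```

Error messages that list the allowed values tend to work best here, because the model can fix a hallucinated enum on the very next turn instead of guessing again.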
Anti-patterns to avoid
- Do not put LLM API keys in the Flutter app. Bundle strings are not secret. Proxy through a backend, always.
- Do not auto-run state-changing tools. Even "just an email" should require confirmation the first ten times. Add a per-tool "always allow" toggle once trust is established.
- Do not expose 30 tools to the model. Tool selection quality degrades past roughly eight tools. Use routing or sub-agents instead.
- Do not write your own MCP server when one exists. The official Linear, GitHub, Notion, and Postgres servers cover most use cases. Wrap them, do not reinvent them.
- Do not let the loop run unbounded. Always cap iterations and tokens. One stuck session can burn through your monthly budget overnight.
- Do not skip streaming. A non-streaming agent feels broken. Even a two-second pause without visible progress causes drop-off. Stream the first token within 500 ms or render a thinking indicator.
- Do not log raw tool inputs. They contain user data and sometimes credentials. Redact before logging. Always.
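For the logging rule, a small redaction pass before anything reaches your log sink goes a long way. The sketch below is a heuristic of my own: it masks values whose key name looks sensitive and truncates long strings. The key pattern and the 64-character limit are assumptions, and key-based matching will miss personal data hidden in free-text fields, so an allowlist of loggable fields is the stricter option.

```typescript
// Key names that suggest the value should never be logged. Heuristic only.
const SENSITIVE_KEY = /(password|secret|token|api[_-]?key|authorization|card|email)/i;

// Returns a copy of `value` that is safe to log: sensitive keys are masked,
// long strings are truncated, and very deep structures are cut off.
function redactForLog(value: unknown, depth = 0): unknown {
  if (depth > 6) return '[truncated]';
  if (Array.isArray(value)) {
    return value.map((item) => redactForLog(item, depth + 1));
  }
  if (typeof value === 'object' && value !== null) {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value)) {
      out[key] = SENSITIVE_KEY.test(key) ? '[redacted]' : redactForLog(inner, depth + 1);
    }
    return out;
  }
  if (typeof value === 'string' && value.length > 64) {
    return `${value.slice(0, 64)}... [${value.length} chars]`;
  }
  return value;
}
```

Calling `redactForLog(tu.input)` at the single place where tool calls are logged keeps the rule enforceable; scattering ad hoc redaction across handlers is how raw inputs leak.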
What The Flutter Kit ships
The Flutter Kit ships a working agentic chat template: a streaming chat UI with tool-call badges and confirmation prompts, a Cloud Function backend proxy that calls Claude Haiku 4.5 with tool definitions and one MCP server connection, three example tools (search, create, schedule) wired to Firestore, a per-session token budget enforcement layer, and PostHog events for tool selection, approval rate, and cost per session. The chat-only mode also works with OpenAI and Gemini via an adapter layer, so you can swap providers without touching the Flutter side.
$69 one-time, unlimited commercial projects. See every integration on the features page or jump to checkout.
Final recommendation
For a new agentic Flutter app in 2026, start with Claude Haiku 4.5 behind a backend proxy, define three to five tools as JSON Schemas, stream every response, and require user confirmation for any state change. Add MCP servers when you need third-party integrations, not before. Cap iterations and tokens per session. If you do these six things, your agent will feel fast, behave safely, and cost almost nothing at indie scale. Everything else is tuning.