Building Production-Ready AI Apps with Flutter and Gemini

Architecture overview

Flutter UI Layer → Repository / Use Case Layer → Gemini Service (gemini-1.5-pro) → Response Handler + Streaming

Flutter is a strong choice for building AI apps that run everywhere: one codebase covers Android, iOS, web, and desktop. Gemini's API is also clean enough that integrating it well is genuinely fun. Here's the practical guide I wish I'd had.

Setting up the Gemini dependency

Google's official package is google_generative_ai. Add it to your pubspec.yaml:

dependencies:
  google_generative_ai: ^0.4.3
  flutter_riverpod: ^2.5.1

Never hardcode your API key in source. Load it at startup with flutter_dotenv, or inject it at build time with build flavours or --dart-define, and keep the key file out of version control.
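
One way to do the build-time injection is Dart's --dart-define flag. The snippet below is a minimal sketch, not the only option: the variable name GEMINI_API_KEY and the requireApiKey helper are my own placeholders, not part of the google_generative_ai package.

```dart
// Read at compile time, e.g.:
//   flutter run --dart-define=GEMINI_API_KEY=your-key
// GEMINI_API_KEY is an assumed name; use whatever your build pipeline defines.
const String geminiApiKey = String.fromEnvironment('GEMINI_API_KEY');

/// Hypothetical helper: fail fast with a clear message when the key is
/// missing, instead of sending requests that are bound to be rejected.
String requireApiKey([String key = geminiApiKey]) {
  if (key.isEmpty) {
    throw StateError(
      'GEMINI_API_KEY is not set. Pass it with --dart-define=GEMINI_API_KEY=...',
    );
  }
  return key;
}
```

Bear in mind that anything compiled into a client app can still be extracted from the binary, so this keeps the key out of your repository rather than making it secret.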

The architecture that actually works

The temptation is to call the Gemini API directly from your widget. Resist it. The architecture that scales is:

GeminiService — a plain Dart class that wraps the API
ConversationRepository — manages message history and state
ChatNotifier (Riverpod) — exposes state to the UI
ChatScreen — renders messages and handles input
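
To make the layering concrete, here is a rough sketch of the two bottom layers in plain Dart. The ChatMessage type, the reply method, and EchoService are illustrative names I've made up for this example. A real GeminiService would wrap the SDK's GenerativeModel, and Riverpod is left out so the snippet stands on its own.

```dart
import 'dart:async';

/// One turn in the conversation. `role` is 'user' or 'model'.
class ChatMessage {
  const ChatMessage(this.role, this.text);
  final String role;
  final String text;
}

/// The only layer that would know about the Gemini SDK.
/// A real implementation would wrap GenerativeModel; that part is omitted.
abstract class GeminiService {
  Future<String> reply(List<ChatMessage> history);
}

/// Owns the message history and talks to the service.
class ConversationRepository {
  ConversationRepository(this._service);

  final GeminiService _service;
  final List<ChatMessage> _messages = [];

  /// Read-only snapshot so callers cannot mutate history directly.
  List<ChatMessage> get messages => List.unmodifiable(_messages);

  /// Records the user message, asks the service, records the answer.
  Future<ChatMessage> send(String userText) async {
    _messages.add(ChatMessage('user', userText));
    final answer = ChatMessage('model', await _service.reply(messages));
    _messages.add(answer);
    return answer;
  }
}

/// Stand-in service used only to exercise the sketch without network access.
class EchoService implements GeminiService {
  @override
  Future<String> reply(List<ChatMessage> history) async =>
      'echo: ${history.last.text}';
}
```

The point of the split is that ChatNotifier and ChatScreen only ever see ConversationRepository, so swapping EchoService for a real Gemini-backed service touches one constructor call.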

Streaming responses

This is where most tutorials fall down. Streaming is what makes AI apps feel fast. Here's the pattern:

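import 'package:google_generative_ai/google_generative_ai.dart';

// `apiKey` is assumed to be injected at build time (see the setup section).
final model = GenerativeModel(model: 'gemini-1.5-pro', apiKey: apiKey);

// generateContentStream returns a Stream<GenerateContentResponse>.
final response = model.generateContentStream(
  [Content.text(userMessage)],
);

await for (final chunk in response) {
  final text = chunk.text; // null when a chunk carries no text
  if (text != null) {
    // Update state incrementally. messageProvider is your own notifier.
    ref.read(messageProvider.notifier).appendText(text);
  }
}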

Pair this with a StreamBuilder or a Riverpod StreamNotifier, and your UI updates chunk by chunk as text arrives, much like ChatGPT.
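
If you go the StreamBuilder route, the widget usually wants the text so far rather than individual chunks. A small sketch of that transformation in plain Dart follows; accumulate is a hypothetical helper name, not something the SDK provides.

```dart
import 'dart:async';

/// Turns a stream of partial chunks into a stream of "text so far" snapshots,
/// which is usually what a message bubble wants to render.
/// Null or empty chunks are skipped.
Stream<String> accumulate(Stream<String?> chunks) async* {
  final buffer = StringBuffer();
  await for (final chunk in chunks) {
    if (chunk == null || chunk.isEmpty) continue;
    buffer.write(chunk);
    yield buffer.toString();
  }
}
```

With the SDK, something like accumulate(response.map((chunk) => chunk.text)) should produce a stream you can hand straight to a StreamBuilder<String>, assuming response is the stream returned by generateContentStream.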

Multimodal input: images + text

Gemini 1.5 Pro is natively multimodal. This is a massive opportunity for mobile apps, since your users have a camera. Here's how to send an image:

final image = await ImagePicker().pickImage(source: ImageSource.camera);
if (image == null) return; // The user cancelled the camera.
final bytes = await image.readAsBytes();

final content = [
  Content.multi([
    // Double quotes, so the apostrophe does not end the string.
    TextPart("What's in this image?"),
    DataPart('image/jpeg', bytes),
  ])
];

final response = await model.generateContent(content);
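
The example above hardcodes image/jpeg, which is fine for the camera but not necessarily for gallery picks. Here is a small sketch of guessing the MIME type from the file name. imageMimeType is a hypothetical helper, and the list of formats is my assumption about what Gemini accepts, so check it against the current API documentation.

```dart
/// Hypothetical helper: guess an image MIME type from a file name.
/// Falls back to JPEG, matching the camera example.
String imageMimeType(String fileName) {
  final dot = fileName.lastIndexOf('.');
  final ext = dot == -1 ? '' : fileName.substring(dot + 1).toLowerCase();
  switch (ext) {
    case 'png':
      return 'image/png';
    case 'webp':
      return 'image/webp';
    case 'heic':
      return 'image/heic';
    case 'heif':
      return 'image/heif';
    default:
      return 'image/jpeg';
  }
}
```

You would then pass DataPart(imageMimeType(image.name), bytes) instead of the fixed string.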

Managing context and conversation history

Gemini's API is stateless — you send the full conversation history with each request. For long conversations, you'll hit token limits. The practical approach is a sliding window: keep the last N exchanges, always include a system prompt, and summarise older history if needed.
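
A sliding window can be as simple as the sketch below. ChatMessage, slidingWindow, and the default of 10 exchanges are all illustrative choices of mine. The system prompt is modelled here as just another message. With the real SDK you might supply it separately when constructing the model.

```dart
/// One turn in the conversation. `role` is 'system', 'user' or 'model'.
class ChatMessage {
  const ChatMessage(this.role, this.text);
  final String role;
  final String text;
}

/// Returns the system prompt followed by at most the last [maxExchanges]
/// user/model pairs from [history]. The kept slice always starts on a user
/// message, so a window never opens with an orphaned model reply.
List<ChatMessage> slidingWindow(
  List<ChatMessage> history, {
  required ChatMessage systemPrompt,
  int maxExchanges = 10,
}) {
  final maxMessages = maxExchanges * 2; // one user + one model message each
  var start =
      history.length > maxMessages ? history.length - maxMessages : 0;
  // Skip forward to the next user message if the cut landed mid-exchange.
  while (start < history.length && history[start].role != 'user') {
    start++;
  }
  return [systemPrompt, ...history.sublist(start)];
}
```

Counting messages is a crude proxy for counting tokens. If your messages vary a lot in length, you would want to trim by an estimated token budget instead.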

Error handling and rate limits

Production apps need graceful degradation. Handle GenerativeAIException for API errors, implement exponential backoff for rate limit responses, and always give users feedback when something goes wrong.
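
For the backoff part, here is a minimal sketch in plain Dart. RateLimitException is a hypothetical marker type so the example runs without the SDK. In a real app you would decide which GenerativeAIException cases count as retryable and map them onto it. The 500 ms base and 30 s cap are arbitrary defaults.

```dart
import 'dart:async';
import 'dart:math';

/// Hypothetical marker for "try again later" failures.
class RateLimitException implements Exception {}

/// Delay before retry number [attempt] (0-based): base * 2^attempt, capped.
Duration backoffDelay(
  int attempt, {
  Duration base = const Duration(milliseconds: 500),
  Duration cap = const Duration(seconds: 30),
}) {
  final delay = base * pow(2, attempt).toInt();
  return delay > cap ? cap : delay;
}

/// Runs [action], retrying on RateLimitException up to [maxRetries] times.
/// Any other error, or the final rate-limit failure, propagates to the caller.
Future<T> withBackoff<T>(
  Future<T> Function() action, {
  int maxRetries = 4,
  Duration base = const Duration(milliseconds: 500),
}) async {
  for (var attempt = 0; ; attempt++) {
    try {
      return await action();
    } on RateLimitException {
      if (attempt >= maxRetries) rethrow;
      await Future<void>.delayed(backoffDelay(attempt, base: base));
    }
  }
}
```

Production code usually adds random jitter to each delay so that many clients do not retry in lockstep. I've left it out to keep the sketch deterministic.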

Testing AI features

Don't mock the model. Mock your GeminiService layer instead. This lets you write unit tests for your business logic without hitting the API. For integration tests, call the real API, but tag them so they are excluded from normal CI runs.
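
Here is roughly what mocking the service layer looks like with a hand-written fake. The GeminiService interface, FakeGeminiService, and ChatController below are illustrative names for this sketch, and the fallback strings are placeholders.

```dart
import 'dart:async';

/// The seam to mock: app code depends on this, not on the SDK.
abstract class GeminiService {
  Future<String> reply(String prompt);
}

/// Test double that returns a canned answer or throws a canned error.
class FakeGeminiService implements GeminiService {
  FakeGeminiService({this.answer = '', this.error});

  final String answer;
  final Object? error;
  final List<String> prompts = []; // records what the app sent

  @override
  Future<String> reply(String prompt) async {
    prompts.add(prompt);
    final err = error;
    if (err != null) throw err;
    return answer;
  }
}

/// Hypothetical business logic under test: it never lets a failure reach
/// the UI as an exception and returns a user-facing message instead.
class ChatController {
  ChatController(this._service);

  final GeminiService _service;

  Future<String> ask(String prompt) async {
    final trimmed = prompt.trim();
    if (trimmed.isEmpty) return 'Please enter a message.';
    try {
      return await _service.reply(trimmed);
    } catch (_) {
      return 'Something went wrong. Please try again.';
    }
  }
}
```

A unit test then constructs ChatController(FakeGeminiService(error: Exception('quota'))) and checks that ask returns the fallback message, with no network involved.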

The cleanest Flutter AI apps I've seen treat the LLM as an infrastructure dependency — like a database — not as the main character of the architecture.

At Roboto Systems, we use this exact pattern across all our Roboto AI apps. The code you write once for Roboto Cart AI works in Roboto Notes AI with minimal changes.
