Overview
CREAO gives you access to multiple AI models from different providers. Choose the model that best fits your task, whether you need maximum intelligence, a large context window, or cost efficiency. Select a model from the model dropdown at the top of the chat interface. Your choice persists per thread, so you can use different models for different conversations.

Model comparison
All models have full access to the same tools — code execution, web search, image generation, file handling, and all connected skills and integrations.
| Model | Provider | Context | Cache | Cost tier | Best for |
|---|---|---|---|---|---|
| Claude Opus 4.6 | Anthropic | 1M | Yes | Premium | Most complex reasoning tasks |
| Claude Sonnet 4.6 | Anthropic | 1M | Yes | Standard | Best balance of speed and intelligence |
| Claude Haiku 4.5 | Anthropic | 200K | Yes | Economy | Fast, cost-efficient tasks |
| Gemini 3.1 Pro | Google | 1M | Yes | Standard | Advanced reasoning with multimodal input |
| GPT-5.4 | OpenAI | 1M | Yes | Standard | Capable all-rounder with reasoning |
| MiniMax M2.7 | MiniMax | 196K | Yes | Economy | Agentic coding at very low cost |
| Grok 4.20 | xAI | 2M | Yes | Standard | Largest context window (2M tokens) |
Choosing a model
I want the best quality regardless of cost
Choose Claude Opus 4.6 (requires a paid plan). It has the strongest reasoning and handles the most complex multi-step tasks.
I want a good balance of speed, quality, and cost
Claude Sonnet 4.6 is the default and recommended for most users. It handles coding, analysis, and creative tasks well at moderate cost.
I want to minimize credit usage
MiniMax M2.7 uses the fewest credits per message while scoring near Opus-level on coding benchmarks. Great for iterative coding sessions where you send many messages.
I'm working with a very large codebase or long document
Grok 4.20 has a 2M token context window — twice as large as any other model. Use it when you need the agent to reason over large amounts of code or text in a single conversation.
I need fast responses for simple tasks
Claude Haiku 4.5 is the fastest model. Use it for quick questions, formatting, or lightweight code generation where speed matters more than depth.
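The decision guide above can be summarized as a simple lookup. This is an illustrative sketch only: the model names come from the comparison table, but the priority labels and the `pick_model` helper are hypothetical, not part of any CREAO API.

```python
# Illustrative sketch of the model-selection guide above.
# Priority keys are made-up labels for this example, not CREAO API values.
MODEL_FOR_PRIORITY = {
    "max_quality": "Claude Opus 4.6",    # strongest reasoning (paid plan)
    "balanced": "Claude Sonnet 4.6",     # recommended default
    "min_cost": "MiniMax M2.7",          # fewest credits per message
    "max_context": "Grok 4.20",          # 2M-token context window
    "max_speed": "Claude Haiku 4.5",     # fastest responses
}

def pick_model(priority: str) -> str:
    """Return the suggested model for a priority, defaulting to the balanced pick."""
    return MODEL_FOR_PRIORITY.get(priority, "Claude Sonnet 4.6")

print(pick_model("min_cost"))  # MiniMax M2.7
```

Falling back to Claude Sonnet 4.6 mirrors the documentation's recommendation of it as the default for most users.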
Prompt caching
Most models support prompt caching, which reduces cost and latency on follow-up messages in the same thread. When caching is active, repeated parts of the conversation (system prompt, earlier messages) are served from cache at a reduced rate. Caching happens automatically. You don’t need to configure anything.

Credit costs

Credits are deducted based on actual token usage. The cost tier determines how many credits each message uses:

| Cost tier | Credit usage per message | Models |
|---|---|---|
| Economy | ~0.1–1 credits | MiniMax M2.7, Claude Haiku 4.5 |
| Standard | ~1–6 credits | Claude Sonnet 4.6, GPT-5.4, Gemini 3.1 Pro, Grok 4.20 |
| Premium | ~5–20 credits | Claude Opus 4.6 |
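The per-message ranges above can be used for rough budgeting. The sketch below multiplies the documented low/high bounds by a message count; the `estimate_thread_credits` helper is hypothetical, actual usage depends on token counts and caching, so treat the result as a ballpark, not a quote.

```python
# Hedged sketch: rough credit budgeting from the documented per-message
# ranges. These bounds come from the cost-tier table; real charges depend
# on actual token usage and prompt caching.
TIER_CREDITS = {
    "economy": (0.1, 1.0),    # MiniMax M2.7, Claude Haiku 4.5
    "standard": (1.0, 6.0),   # Sonnet 4.6, GPT-5.4, Gemini 3.1 Pro, Grok 4.20
    "premium": (5.0, 20.0),   # Claude Opus 4.6
}

def estimate_thread_credits(tier: str, messages: int) -> tuple[float, float]:
    """Return a (low, high) credit estimate for a thread of `messages` turns."""
    low, high = TIER_CREDITS[tier]
    return (low * messages, high * messages)

low, high = estimate_thread_credits("standard", 20)
print(f"20 standard-tier messages: roughly {low:.0f}-{high:.0f} credits")
```

Because prompt caching serves repeated context at a reduced rate, long threads will often land nearer the low end of the estimate.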