
F2: LLM Landscape

Choosing the right model is the highest-leverage decision in any AI project. This module maps the model universe and gives you a decision framework. For foundational concepts, see F1: GenAI Foundations.

Three Categories of Models

| Category | Definition | Examples | Self-Host? | Fine-Tune? |
|---|---|---|---|---|
| Proprietary | Closed weights, API-only access | GPT-4o, Claude Opus, Gemini Pro | ❌ | Limited (OpenAI fine-tuning API) |
| Open-Weight | Weights released, restricted license | Llama 3.1, Mistral Large, Gemma 2 | ✅ | ✅ |
| Open-Source | Weights + training code + data, permissive license | OLMo, Pythia, BLOOM | ✅ | ✅ |
:::info
"Open-weight" ≠ "open-source." Llama's license restricts commercial use above 700M MAU. Always check the license before deploying. See the glossary for formal definitions.
:::

OpenAI / Azure OpenAI Family

The default choice for most enterprise workloads via Azure OpenAI Service.

| Model | Context | Strengths | Best For |
|---|---|---|---|
| GPT-4o | 128K | Multimodal (text + image + audio), strong reasoning | General-purpose, complex RAG, agents |
| GPT-4o-mini | 128K | 60× cheaper than GPT-4o, fast | High-volume classification, extraction, routing |
| GPT-4.1 | 1M | Massive context, superior instruction following | Long-document analysis, codebase Q&A |
| GPT-4.1-mini | 1M | Cost-efficient 1M context | Large context at lower cost |
| GPT-4.1-nano | 1M | Fastest, cheapest 4.1 variant | Edge, real-time, high-throughput |
| o1 | 200K | Deep chain-of-thought reasoning | Math, science, complex logic |
| o3 | 200K | Enhanced reasoning with tool use | Multi-step problem solving, coding |
| o3-mini | 200K | Budget reasoning model | Reasoning tasks at lower cost |
| o4-mini | 200K | Latest compact reasoning model | Agentic tasks with tool use |

:::tip Start Small
80% of production workloads run fine on GPT-4o-mini or GPT-4.1-nano. Start there and only upgrade when evaluation metrics show the smaller model fails. FrootAI Play 14 (Cost-Optimized Gateway) implements automatic model routing based on query complexity.
:::
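A complexity-based router of the kind Play 14 describes can be sketched as below. The keyword heuristic and model names are purely illustrative, not FrootAI's actual implementation; a production gateway would typically use a trained classifier rather than keyword matching.

```python
# Hypothetical complexity-based model router (illustrative heuristic only).
CHEAP, MID, REASONING = "gpt-4.1-nano", "gpt-4o", "o3-mini"

def route(query: str) -> str:
    words = query.lower().split()
    # Crude signals that the query needs multi-step reasoning
    if any(w in words for w in ("prove", "derive", "optimize", "plan")):
        return REASONING
    if len(words) > 100:  # long, involved queries go to the mid tier
        return MID
    return CHEAP          # default: smallest viable model

print(route("What is the capital of France?"))                  # gpt-4.1-nano
print(route("Prove that the sum of two even numbers is even"))  # o3-mini
```

The cheap model is the fall-through default, matching the "start small" rule: escalation is the exception, not the norm.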

Anthropic Claude

| Model | Context | Key Differentiator |
|---|---|---|
| Claude Opus 4 | 200K | Strongest reasoning, extended thinking, agentic coding |
| Claude Sonnet 4 | 200K | Best balance of speed and intelligence |
| Claude Haiku 3.5 | 200K | Fastest, cheapest; strong for extraction |

Claude vs GPT key differences:

  • Claude excels at long-form analysis and nuanced instruction following
  • Claude's extended thinking is visible (scratchpad tokens shown in the API)
  • Azure does not host Claude; access requires the direct API or AWS Bedrock
  • Claude supports system prompts as a first-class feature (not just messages[0])
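The system-prompt difference in the last bullet shows up directly in the request shape. A sketch of the two payloads (plain dicts, no SDK calls; the model ids are illustrative):

```python
# Request-shape sketch: Anthropic's Messages API takes the system prompt as
# a top-level field, while OpenAI-style chat APIs pass it as messages[0].
system = "You are a contract-analysis assistant."
user = "Summarize the indemnification clause."

anthropic_payload = {
    "model": "claude-sonnet-4",  # illustrative model id
    "system": system,            # first-class field
    "messages": [{"role": "user", "content": user}],
}

openai_payload = {
    "model": "gpt-4o",
    "messages": [                # system prompt is just the first message
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ],
}
```

Keeping the system prompt out of the message list makes it harder to accidentally drop or reorder during conversation-history truncation.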

Meta Llama

The leading open-weight family. Self-hostable on AKS (see FrootAI Play 12).

| Model | Parameters | Context | Notes |
|---|---|---|---|
| Llama 3.1 | 8B / 70B / 405B | 128K | Workhorse for self-hosting |
| Llama 3.2 | 1B / 3B | 128K | Edge/mobile deployment |
| Llama 3.2 Vision | 11B / 90B | 128K | Multimodal (text + image) |
| Llama 4 Scout | 17B active (109B total) | 10M | Mixture-of-Experts, massive context |
| Llama 4 Maverick | 17B active (400B total) | 1M | MoE, strong multilingual |
:::info
Llama 4 uses Mixture-of-Experts (MoE): only a fraction of parameters activate per token, giving large-model quality at small-model cost. "17B active" means per-token compute (and therefore latency) resembles a 17B dense model; note that all 109B parameters must still fit in memory, so VRAM requirements remain those of the full model.
:::
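The memory-vs-compute split can be made concrete with rough arithmetic. This sketch assumes fp16/bf16 weights (2 bytes per parameter) and counts weights only, ignoring KV cache and activations:

```python
# Rough MoE sizing arithmetic: memory must hold ALL parameters, while
# per-token compute scales with only the active ones.
BYTES_PER_PARAM = 2  # fp16/bf16

def weight_memory_gb(params_billions: float) -> float:
    """VRAM needed just for the weights, in GB (ignores KV cache, activations)."""
    return params_billions * 1e9 * BYTES_PER_PARAM / 1e9

# Llama 4 Scout: 109B total parameters, 17B active per token
print(weight_memory_gb(109))  # ~218 GB of weights to store
print(weight_memory_gb(17))   # but per-token FLOPs like a ~34 GB dense model
```

So MoE buys you throughput and latency, not a smaller GPU footprint.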

Google Gemini

| Model | Context | Key Feature |
|---|---|---|
| Gemini 2.0 Flash | 1M | Fast, multimodal, tool use |
| Gemini 2.5 Pro | 1M | Strongest reasoning, thinking mode |
| Gemini 2.5 Flash | 1M | Cost-efficient with thinking budget control |

Gemini models are accessed through Google AI Studio or Vertex AI rather than Azure's model catalog, so they sit outside the Azure OpenAI deployment path used elsewhere in this playbook.

Microsoft Phi: Small Language Models

| Model | Parameters | Context | Strength |
|---|---|---|---|
| Phi-4 | 14B | 16K | STEM reasoning rivaling larger models |
| Phi-3.5-mini | 3.8B | 128K | Long-context SLM |
| Phi-3.5-MoE | 42B (6.6B active) | 128K | MoE efficiency |
| Phi-Silica | N/A | N/A | On-device (Copilot+ PCs, NPU) |
:::tip
Phi models are ideal for edge deployment (Play 19) and fine-tuning (Play 13): small enough to train on a single GPU, strong enough for focused tasks.
:::

Model Selection Decision Framework

```
Start here: Can GPT-4o-mini / GPT-4.1-nano handle it?
│
├─ YES → Use it. Done. ($0.15–$0.40 / 1M tokens)
│
└─ NO → What's failing?
   │
   ├─ Reasoning quality      → Try o3-mini or GPT-4o
   ├─ Long context needed    → GPT-4.1 (1M) or Gemini 2.5 Pro
   ├─ Multimodal (images)    → GPT-4o or Llama 3.2 Vision
   ├─ Data sovereignty       → Self-host Llama 3.1 on AKS
   ├─ Latency critical      → Phi-4 on edge or GPT-4.1-nano
   └─ Cost critical at scale → Fine-tune a smaller model (Play 13)
```
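The decision tree above can be sketched as a selection function. The flag names, ordering, and model choices mirror the framework but are an illustrative encoding, not a prescribed API; sovereignty wins first because it is a hard constraint rather than a quality trade-off.

```python
# Illustrative encoding of the model-selection decision framework.
def pick_model(needs_reasoning=False, context_tokens=0, multimodal=False,
               data_sovereignty=False, latency_critical=False) -> str:
    if data_sovereignty:                 # hard constraint: must self-host
        return "llama-3.1-70b (self-hosted on AKS)"
    if context_tokens > 128_000:         # beyond GPT-4o-family context
        return "gpt-4.1"                 # 1M context window
    if needs_reasoning:
        return "o3-mini"                 # escalate further if evals still fail
    if multimodal:
        return "gpt-4o"
    if latency_critical:
        return "gpt-4.1-nano"
    return "gpt-4o-mini"                 # smallest viable default

print(pick_model())                        # gpt-4o-mini
print(pick_model(context_tokens=500_000))  # gpt-4.1
```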
:::warning
Never choose a model based on benchmarks alone. Always evaluate on YOUR data using YOUR metrics. FrootAI's evaluation framework (Play 17) automates this with groundedness, relevance, and coherence scoring.
:::
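A minimal sketch of that principle: score each candidate model on your own labeled examples before committing. Here `call_model` is a stub with canned outputs and the metric is exact match; in practice you would call your deployed endpoints and use metrics like groundedness and relevance.

```python
# Minimal model-evaluation sketch: compare candidates on YOUR data.
def call_model(model: str, prompt: str) -> str:
    # Stub with canned outputs, standing in for a real API client.
    canned = {
        "gpt-4o-mini": {"2+2?": "4", "Capital of France?": "Paris"},
        "gpt-4o":      {"2+2?": "4", "Capital of France?": "Paris"},
    }
    return canned[model].get(prompt, "")

def evaluate(model: str, dataset: list[tuple[str, str]]) -> float:
    """Fraction of examples where the model's answer matches the reference."""
    hits = sum(call_model(model, q) == ref for q, ref in dataset)
    return hits / len(dataset)

dataset = [("2+2?", "4"), ("Capital of France?", "Paris")]
scores = {m: evaluate(m, dataset) for m in ["gpt-4o-mini", "gpt-4o"]}
# Keep the cheaper model unless it measurably underperforms:
winner = "gpt-4o-mini" if scores["gpt-4o-mini"] >= scores["gpt-4o"] else "gpt-4o"
```

The shape of the loop matters more than the stub: same dataset, same metric, every candidate, before any model is promoted to production.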

Azure OpenAI Pricing Tiers

| Tier | How It Works | Best For |
|---|---|---|
| Pay-As-You-Go (PAYG) | Per-token billing, shared capacity | Dev/test, variable workloads |
| Provisioned (PTU) | Reserved throughput units, predictable cost | Production with steady traffic |
| Global | Microsoft-managed routing across regions | Highest availability |
| Data Zone | Region-pinned for data residency | Compliance-sensitive workloads |

Practical Comparison Table

| Capability | GPT-4o-mini | GPT-4o | GPT-4.1 | Claude Sonnet 4 | Llama 3.1 70B |
|---|---|---|---|---|---|
| Cost (1M in/out) | $0.15 / $0.60 | $2.50 / $10 | $2 / $8 | $3 / $15 | Self-host |
| Context | 128K | 128K | 1M | 200K | 128K |
| Multimodal | ✅ | ✅ | ✅ | ✅ | Text only |
| Fine-tunable | ✅ | ✅ | ✅ | ❌ | ✅ |
| Self-hostable | ❌ | ❌ | ❌ | ❌ | ✅ |
| Structured Output | ✅ | ✅ | ✅ | ✅ | Via frameworks |
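The per-million-token prices in the table translate directly into monthly spend. This sketch uses the table's list prices with a hypothetical traffic profile; prices change, so always check current rates before budgeting.

```python
# Monthly cost arithmetic from per-1M-token prices (hypothetical traffic).
PRICES = {  # (input $/1M tokens, output $/1M tokens), from the table above
    "gpt-4o-mini": (0.15, 0.60),
    "gpt-4o":      (2.50, 10.00),
}

def monthly_cost(model: str, in_tokens_m: float, out_tokens_m: float) -> float:
    """Cost in dollars, given monthly input/output volume in millions of tokens."""
    p_in, p_out = PRICES[model]
    return in_tokens_m * p_in + out_tokens_m * p_out

# e.g. 1,000M input + 200M output tokens per month:
cheap = monthly_cost("gpt-4o-mini", 1000, 200)  # ~$270/month
big   = monthly_cost("gpt-4o", 1000, 200)       # ~$4,500/month
```

At that volume the gap is roughly 16×, which is why the takeaways below lead with "smallest viable model."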

Key Takeaways

  1. Default to the smallest viable model: GPT-4o-mini or GPT-4.1-nano for 80% of tasks
  2. Use model routing to send complex queries to expensive models and simple ones to cheap models
  3. Open-weight models unlock data sovereignty, customization, and cost control at scale
  4. Evaluate on your data: benchmark rankings don't predict performance on your specific task
  5. Reasoning models (o1/o3) are for math, logic, and multi-step planning, not general chat

โ† F1: GenAI Foundations | F3: AI Glossary โ†’