
# F3: AI Glossary

Quick-reference glossary of the 50 most important AI/ML terms. Each term is tagged with the FROOT layer it belongs to: Foundations, Reasoning, Orchestration, Operations, or Transformation.

:::tip How to Use This Glossary

Use Ctrl+F to search for a specific term. Layers map to FrootAI modules:

- F = Foundations (F1, F2): model mechanics & architecture
- R = Reasoning: prompts, RAG, grounding, deterministic AI
- O = Orchestration: agents, tools, frameworks, memory
- Op = Operations: infrastructure, deployment, monitoring, observability
- T = Transformation: fine-tuning, evaluation, responsible AI, alignment

:::

:::info Layer Legend Quick Reference

| Code | FROOT Layer | Focus Area |
| --- | --- | --- |
| F | Foundations | Model mechanics, architecture, math, tokenization |
| R | Reasoning | Prompts, RAG, grounding, guardrails, search |
| O | Orchestration | Agents, tools, frameworks, memory, delegation |
| Op | Operations | Infrastructure, deployment, monitoring, cost |
| T | Transformation | Fine-tuning, evaluation, responsible AI, alignment |

:::

## Terms A–Z

| Term | Layer | Definition |
| --- | --- | --- |
| Agent | O | An LLM-powered system that can reason, plan, and take actions using tools. Unlike simple chat, agents operate in loops: observe → think → act → observe. |
| Alignment | T | The process of making AI models behave according to human values and intentions. Techniques include RLHF, DPO, and constitutional AI. |
| Attention | F | The mechanism that lets transformers weigh the relevance of every token against every other token. Self-attention is the core of all modern LLMs. |
| Autoregressive | F | A generation strategy where each new token depends on all previously generated tokens. GPT-family models are autoregressive: they generate left to right, one token at a time. |
| BPE | F | Byte-Pair Encoding. The tokenization algorithm used by most LLMs. Iteratively merges the most frequent adjacent byte pairs into single tokens. See F1. |
| Chain-of-Thought (CoT) | R | A prompting technique that instructs the model to show its reasoning step by step before giving a final answer. Dramatically improves accuracy on math, logic, and multi-step problems. |
| Chunking | R | Splitting documents into smaller segments for RAG retrieval. Strategies include fixed-size (e.g., 512 tokens), semantic (by topic), and recursive (by structure). Chunk size profoundly affects retrieval quality. |
| Context Window | F | The maximum number of tokens a model can process in a single request (input + output). GPT-4o: 128K; GPT-4.1: 1M. Exceeding it causes silent truncation. |
| Copilot | O | Microsoft's AI assistant brand. GitHub Copilot assists with code; Microsoft 365 Copilot assists with productivity. Built on GPT-4o with tool integration. |
| Cosine Similarity | F | A metric measuring the angle between two vectors (range: -1 to 1). Used to compare embeddings: 1.0 = identical meaning, 0 = unrelated. Core to semantic search and RAG. |
| Deterministic AI | R | Making AI outputs reproducible and predictable. Achieved via `temperature=0`, seed pinning, structured output schemas, and guardrails. See FrootAI Play 03. |
| DPO | T | Direct Preference Optimization. A simpler alternative to RLHF that fine-tunes models directly on human preference pairs without training a separate reward model. |
| Embeddings | F | Dense vector representations of text (e.g., 1536 or 3072 dimensions). Semantically similar texts have nearby vectors. Used for search, RAG, clustering, and classification. |
| Encoder/Decoder | F | Two transformer architectures. Encoders (BERT) create representations for understanding; decoders (GPT) generate text autoregressively. Encoder-decoder models (T5) do both. |
| Evaluation | T | Systematic measurement of AI quality. Key metrics: groundedness, relevance, coherence, fluency, safety. FrootAI uses automated eval pipelines with thresholds (≥ 4.0/5.0). |
| Few-Shot | R | Providing 2–5 input/output examples in the prompt so the model learns the pattern in context. More reliable than zero-shot for formatting and classification tasks. |
| Fine-Tuning | T | Training a pre-trained model on domain-specific data to specialize its behavior. Cheaper than training from scratch. Methods: full fine-tuning, LoRA, QLoRA. See Play 13. |
| Foundation Model | F | A large pre-trained model (GPT-4o, Llama 3.1, Claude) designed to be adapted for many downstream tasks. Trained on broad data; specialized via prompting or fine-tuning. |
| Function Calling | O | An LLM capability where the model outputs structured JSON to invoke external functions/APIs. The model doesn't execute the function; your code does. Enables tool use. |
| GPU | Op | Graphics Processing Unit. The parallel-computing hardware that powers AI training and inference. Key metric: VRAM (memory). A100: 80 GB; H100: 80 GB; H200: 141 GB. |
| Grounding | R | Connecting AI responses to verified source data (documents, databases, APIs) to reduce hallucination. RAG is the primary grounding technique. See Play 01. |
| Guardrails | R | Constraints applied to AI inputs and outputs: content filters, token limits, schema validation, blocklists. Implemented via Azure AI Content Safety or custom rules in `guardrails.json`. |
| Hallucination | R | When an AI generates plausible-sounding but factually incorrect information. Mitigated by grounding, RAG, low temperature, and groundedness evaluation. |
| Hybrid Search | R | Combining keyword search (BM25) with vector search (embeddings) for retrieval. Typically outperforms either alone. Azure AI Search supports this natively via `search_type: "hybrid"`. |
| Inference | F | Running a trained model to generate predictions/outputs: what happens when you call an API. Contrast with training (learning weights from data). See F1. |
| In-Context Learning | R | The ability of LLMs to learn new tasks from examples provided in the prompt, without any weight updates. Encompasses zero-shot, few-shot, and many-shot prompting. |
| JSON Mode | R | A model setting that guarantees the output is valid JSON: OpenAI's `response_format: { "type": "json_object" }`. More reliable: Structured Outputs with a JSON schema. |
| KV Cache | F | Key-value cache. An optimization that stores previously computed attention keys and values to avoid recomputation during autoregressive generation. Reduces latency but consumes VRAM. |
| Knowledge Cutoff | F | The date after which a model has no training data. GPT-4o: October 2023. Information after this date requires RAG or tool use to access. |
| LangChain | O | A popular open-source framework for building LLM applications. Provides abstractions for chains, agents, tools, and memory. Python and JavaScript versions are available. |
| LoRA | T | Low-Rank Adaptation. A parameter-efficient fine-tuning method that freezes the base model and trains small rank-decomposition matrices. Reduces VRAM by 10–100× vs. full fine-tuning. |
| MCP | O | Model Context Protocol. An open standard (by Anthropic) for connecting AI models to external tools and data sources. FrootAI's MCP server exposes 25 tools. See F4. |
| Memory (Agent) | O | How agents persist information across turns. Short-term: conversation history in context. Long-term: a vector store or database. Semantic Kernel uses `ChatHistory` + plugins. |
| Multi-Agent | O | Systems where multiple specialized AI agents collaborate on complex tasks. Patterns: supervisor, swarm, pipeline, debate. See Play 07 and Play 22. |
| Multi-Modal | F | Models that process multiple input types: text, images, audio, video. GPT-4o and Gemini are natively multimodal; Llama 3.2 Vision adds image understanding. |
| Next-Token Prediction | F | The core training objective of autoregressive LLMs: given all preceding tokens, predict the probability distribution of the next token. This simple objective produces emergent capabilities. |
| ONNX | Op | Open Neural Network Exchange. A cross-platform model format for optimized inference. Used with ONNX Runtime for CPU/GPU deployment without framework dependencies. |
| Parameters | F | The learnable weights in a neural network. "7B" = 7 billion parameters. More parameters ≈ more capability but higher compute cost. See the VRAM formula in F1. |
| Prompt Engineering | R | The practice of designing effective instructions for LLMs. Techniques: system prompts, few-shot examples, chain-of-thought, structured output, role-playing. |
| QLoRA | T | Quantized LoRA. Combines 4-bit quantization of the base model with LoRA adapters. Enables fine-tuning of 70B models on a single 48 GB GPU. |
| Quantization | F | Reducing the numerical precision of model weights (FP32 → FP16 → INT8 → INT4) to shrink VRAM usage and increase inference speed. Trade-off: some quality loss. |
| RAG | R | Retrieval-Augmented Generation. A pattern that retrieves relevant documents from a knowledge base and includes them in the LLM prompt for grounded answers. See Play 01. |
| RLHF | T | Reinforcement Learning from Human Feedback. A training technique where humans rank model outputs and a reward model is trained on those preferences to fine-tune the LLM. |
| Semantic Kernel | O | Microsoft's open-source SDK for AI orchestration. Supports plugins, planners, memory, and multi-model routing. The recommended orchestration layer for Azure AI apps. |
| Structured Output | R | Constraining LLM output to conform to a JSON schema: OpenAI's `response_format: { "type": "json_schema", "json_schema": {...} }` guarantees schema-valid output. |
| Temperature | F | A generation parameter (0–2) controlling output randomness: 0 = greedy/deterministic, 0.7 = balanced, 1.5+ = highly creative. See F1. |
| Tokenization | F | The process of converting text into tokens (sub-word integer IDs) that models can process. Different models use different tokenizers: tiktoken for OpenAI, SentencePiece for Llama. |
| Transformer | F | The neural network architecture (2017, "Attention Is All You Need") underlying all modern LLMs. Uses self-attention to process entire sequences in parallel. |
| Vector Database | R | A database optimized for storing and querying high-dimensional vectors (embeddings). Examples: Azure AI Search, Pinecone, Weaviate, Qdrant, pgvector. Core infrastructure for RAG. |
| Zero-Shot | R | Asking a model to perform a task with no examples, only instructions. Works well for capable models (GPT-4o) on common tasks; switch to few-shot when accuracy drops. |
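The cosine-similarity entry reduces to a one-line formula: dot(a, b) / (|a| · |b|). A minimal stdlib-Python sketch (production code would normally use NumPy or the vector store's built-in scoring):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Vectors pointing the same way score 1.0; orthogonal vectors score 0.0;
# opposite directions score -1.0.
print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))  # 0.0
```

Note that cosine similarity ignores vector length, which is why it works well for comparing embeddings of texts of different sizes.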
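The BPE entry's "iteratively merges the most frequent adjacent pairs" step can be shown on a toy corpus. This is an illustrative sketch of the merge loop only, not the actual tiktoken or SentencePiece implementation:

```python
from collections import Counter

def most_frequent_pair(tokens: list[str]) -> tuple[str, str]:
    """Count every adjacent token pair and return the most common one."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return max(pairs, key=pairs.get)

def merge_pair(tokens: list[str], pair: tuple[str, str]) -> list[str]:
    """Replace each occurrence of `pair` with a single merged token."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

# Start from single characters and apply two merge steps:
# "l"+"o" merges first, then "lo"+"w" becomes "low".
tokens = list("low lower lowest")
for _ in range(2):
    tokens = merge_pair(tokens, most_frequent_pair(tokens))
print(tokens)
```

Real BPE training records the merge order as a vocabulary, so the same merges can be replayed deterministically at inference time.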
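Temperature's effect on next-token prediction is easy to see by rescaling logits before softmax. An illustrative sketch, not any specific model's sampler:

```python
import math

def softmax(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to probabilities; temperature rescales them first."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
cold = softmax(logits, 0.2)  # near-greedy: mass piles onto the top logit
hot = softmax(logits, 2.0)   # flatter: sampling becomes more random
```

As temperature approaches 0, the distribution collapses onto the argmax (greedy decoding); higher temperatures flatten it, which is why high values read as "creative".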
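The fixed-size chunking strategy from the Chunking entry is a sliding window with optional overlap. A minimal sketch over a token list (real pipelines would chunk tokenizer output, not words):

```python
def chunk_fixed(tokens: list[str], size: int, overlap: int = 0) -> list[list[str]]:
    """Slide a window of `size` tokens, stepping by size - overlap."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [tokens[i:i + size] for i in range(0, len(tokens), step)
            if tokens[i:i + size]]

words = "grounded answers need well sized overlapping chunks".split()
print(chunk_fixed(words, size=4, overlap=1))
```

Overlap keeps sentences that straddle a boundary retrievable from both neighboring chunks, at the cost of some index redundancy.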

:::info Didn't find your term?

The full FrootAI glossary in the MCP server covers 200+ terms. Run `npx frootai-mcp@latest` and use the `lookup_term` tool, or browse the Learning Hub.

:::

## Common Confusions

| People Say | They Actually Mean | Correct Term |
| --- | --- | --- |
| "The AI understands me" | Statistical pattern matching on tokens | Next-Token Prediction |
| "The model remembers" | Previous turns are re-sent in the context window | In-Context Learning |
| "Fine-tuning the prompt" | Iterating on the system/user message text | Prompt Engineering |
| "Open-source model" | Weights released, but the license may restrict commercial use | Open-Weight (usually) |
| "The AI is hallucinating" | The model generated ungrounded but plausible text | Hallucination |
| "RAG database" | A vector store used for retrieval-augmented generation | Vector Database |
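The "model remembers" confusion is worth seeing in code: the model is stateless, and "memory" is just the client re-sending the whole conversation each turn. A sketch where the hypothetical `send()` stands in for a real chat-completions API call:

```python
def send(messages: list[dict]) -> str:
    """Stand-in for a chat API call; reports how many user turns it received."""
    user_turns = sum(1 for m in messages if m["role"] == "user")
    return f"(model saw {user_turns} user turns)"

history: list[dict] = []

def chat(user_text: str) -> str:
    history.append({"role": "user", "content": user_text})
    reply = send(history)  # the FULL history travels with every request
    history.append({"role": "assistant", "content": reply})
    return reply

chat("Hi, I'm Ada.")
# The model only "remembers" the name because turn 1 is re-sent here:
print(chat("What's my name?"))
```

Once the history exceeds the context window, the oldest turns must be dropped or summarized, which is when the model visibly "forgets".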

## Further Reading

← F2: LLM Landscape | F4: GitHub Agentic OS →