What is context compaction?

Summarizing old conversation history to fit within the agent's context window limit.

How to guide AI agents and ship web apps: a vocabulary

Q: What is a design token?

A design decision stored as data, like --color-primary: #3cffd0. Tokens enable systematic changes across a product.

A scannable reference of terms for working with AI coding agents and shipping production web apps. Steering, agent architecture, typography, layout, motion, design fundamentals.

TL;DR: A pillar reference of terms for two skills that compound: directing AI coding agents effectively, and knowing the design language to ship apps that look intentional. Consider it the menu for the kitchen: you need to know what dishes you can order before you start cooking.

Key takeaways:

Agent conversation skills and design vocabulary compound: each makes the other more valuable

This is a pillar reference: each section links to deeper posts where they exist on this site

Use these terms as shared language with your agent for faster iterations

How do you steer an AI coding agent effectively?

System prompt. The persistent instructions that define the agent’s role, constraints, and behavior for the entire session. Getting this right prevents more problems than any amount of mid-conversation correction. Write it before you start the task, not during. See the Vertical Agent Method.

Task decomposition. Breaking a complex request into steps the agent can handle one at a time. Instead of “build a todo app”, try “create the data model, then the API routes, then the UI.” Each step gets the agent’s full focus.

Context window. The limited space for conversation history, instructions, and tool outputs. Once full, the agent starts forgetting earlier context. Manage this by keeping instructions tight and compacting history. See the full post.

Chain-of-thought. Asking the agent to reason step by step before answering. Reduces errors on complex tasks significantly. Most coding agents do this implicitly, but for logic-heavy problems, explicitly asking “think through this step by step” improves output quality.

Few-shot prompting. Providing examples of the desired output format before asking the agent to generate its own. Works better than abstract descriptions for code generation, data formatting, and content writing.

Negative prompting. Telling the agent what not to do. “Do not use any external dependencies” is more effective than “prefer a lightweight approach.” Be specific about constraints.

Diff review, Reading the agent’s proposed changes line by line before applying them. The single most effective quality gate, don’t trust, verify. See building a code review agent.

What is the agent loop and how does it work?

Agent loop. The fundamental cycle: receive input → decide action → execute tool → process result → repeat. Every agent harness implements this. Understanding it helps you debug why your agent did something unexpected.

Tool selection. How the agent decides which tool to call based on the task and available tool descriptions. The quality of your tool descriptions directly determines whether the agent picks the right tool. Short descriptions get read. Long ones get ignored. See how agents use your SDK.

Streaming. Receiving output token by token as it’s generated. Gives you early visibility into the agent’s thinking and lets you cancel early if it’s going the wrong direction. Essential for interactive use.

Structured output. Forcing the agent to return data in a specific format like JSON or XML. Use this when the agent’s output feeds into another system or tool. Without it, parsing freeform text is fragile.

Function calling. The agent invoking a predefined function with typed arguments rather than generating freeform text. More reliable than hoping the agent outputs valid code. See the OpenAI function calling tutorial.

Temperature. Controls randomness in output. Low values (0-0.3) produce deterministic, focused responses: use for code generation and factual tasks. High values (0.7-1.0) produce creative, varied responses: use for brainstorming and content variation.

How do agents manage memory and context?

Agent memory, Information the agent retains across conversations. Short-term memory is the current session. Long-term memory persists between sessions, useful for user preferences, past decisions, and learned patterns. See branching sessions.

Context compaction. Pruning or summarizing old conversation history to stay within the context window limit. Without this, long sessions degrade as the agent forgets early instructions. See context window management.

Grounding. Providing specific documents, schemas, or examples the agent must reference instead of relying on its training data. Without grounding, the agent may use outdated information or hallucinate APIs that don’t exist. See RAG patterns.

What are the key components of agent architecture?

Agent harness, The runtime that manages the agent loop, tool execution, credential management, and session persistence. It’s the operating system for your agent, choosing the right one matters more than the model in many cases. See the 15 jobs.

State machine. A model that tracks the agent’s current state and valid transitions between states. Prevents the agent from calling tools in the wrong order or repeating actions. Essential for production reliability. See build one in a weekend.

Policy gate. A security check that runs before every tool call, deciding whether the action is allowed. Fail-closed: if the gate can’t decide, deny the action. Prevents the agent from deleting databases or sending unauthorized emails. See the policy gate post.

Human-in-the-loop. Requiring human approval for certain actions before the agent executes them. Use for destructive operations (deletes, writes, financial transactions) or decisions that need judgment.

Multi-agent delegation. Splitting a complex task across specialized agents, each with its own context and tools. Useful when a task has clearly distinct subtasks that benefit from focused attention. See multi-step workflows and framework comparisons.

How do tools and MCP extend agent capabilities?

MCP, Model Context Protocol: a standardized interface for connecting agents to external tools, data sources, and services. Think of it as USB-C for AI agents, one protocol that any compatible agent can use to access any compatible tool.

Skill, A reusable bundle of instructions and tools that the agent can load on demand. In Hermes, skills live in ~/.hermes/skills/, each skill is a directory with a SKILL.md that tells the agent when and how to use it. See how to set up Hermes.

Tool call. The agent invoking an external function: search the web, read a file, execute code, call an API. Each tool call consumes context window space for both the request and the response. Watch your token budget.

Tool description. The text that tells the agent what a tool does and when to use it. Write these for the agent, not for humans: be explicit about the tool’s input, output, and when it’s appropriate to call it. See how agents consume your SDK.

How do you ensure quality and safety in agents?

Hallucination, The agent generating false information that sounds plausible. Not fixable by better prompting alone, requires grounding in reliable sources, confidence thresholds, and occasional fact-checking. See preventing hallucinations.

Prompt injection. An attacker crafting input that overrides the agent’s system prompt or instructions. The most common AI security vulnerability. Mitigate by validating tool inputs, never exposing raw system prompts, and using policy gates.

Failure mode. Predictable ways agents break: laziness (does minimal work), goal drift (loses focus), hallucination (makes things up), tool looping (calls the same tool repeatedly). Recognizing these patterns helps you intervene early. See dynamic workflows and error handling patterns.

Token budget. The maximum number of tokens allocated for a task. Every tool call, every response, every piece of context costs tokens. Track it per task and set hard limits to prevent runaway costs. See cost optimization tips.

What workflow patterns can agents follow?

Sequential chain, Steps executed one after another, where each step depends on the previous one. The simplest workflow pattern, use it when tasks have clear dependencies. See multi-step workflows.

Parallel fan-out, Multiple agents work on independent subtasks simultaneously, then merge results. Use when you have tasks that don’t depend on each other, researching multiple topics, reviewing different files, generating alternative solutions.

Loop-until-done. The agent repeats a task until a termination condition is met. Useful for iterative tasks like code review feedback loops, content refinement, or search that needs depth.

How do you deploy an AI agent to production?

Docker. Containerizing the agent so it runs consistently across environments. A Dockerfile ensures your agent’s dependencies, runtime, and configuration are the same on your laptop and in production. See the beginner’s hosting guide.

VPS. A virtual private server for hosting agents that have outgrown platforms. More control, lower cost at scale, and the ability to run multiple agents or supporting services on one machine. See deployment server setup.

Health check. An endpoint the agent exposes (typically /health) that returns 200 when running. Your hosting platform pings this every minute and restarts the agent if it fails. The simplest and most important production pattern.

CI/CD. An automated pipeline that runs tests and deploys on every code change. A git push to main can build, test, and deploy your agent with zero downtime. See deployment guide.

Monitoring. Tracking cost per request, error rates, response times, and token usage. Without monitoring, you won’t know your agent is broken until a user tells you. See logging and monitoring.

How should I choose typography for my web app?

Kerning. Space between individual letter pairs. Good typefaces handle this automatically, but logos and large headlines sometimes need manual tuning.

Tracking. Uniform letter-spacing across a range of text. Tighten for headlines, loosen for uppercase text at small sizes to improve readability.

Leading. Vertical space between lines of text. Tighter for headlines (1.0-1.2), looser for body text (1.5-1.7) for comfortable reading.

Type scale. A consistent set of font sizes with harmonic ratios (typically 1.25 or 1.333). Using a scale ensures any combination of text sizes looks intentional.

Weight. Character thickness from thin (100) to black (900). Use weight to create hierarchy without changing size. Regular (400) for body, semibold (600) for subheadings, bold (700) for headings.

Widow. A single word left alone on the last line of a paragraph. text-wrap: balance or text-wrap: pretty in CSS fixes it automatically.

Variable font. A single font file containing multiple weights, widths, or styles along adjustable axes. Reduces page weight and enables smooth transitions between weights.

Clamp. A CSS function that sets a value between a minimum and maximum, scaling smoothly between breakpoints. font-size: clamp(16px, 2vw, 24px) is the standard responsive type pattern.

How do I design a color system for my app?

sRGB, The standard color space for the web since the 90s. All hex color values live here. Its gamut is limited, there are colors modern displays can show that sRGB can’t describe.

P3. A wider color gamut that modern displays support. Use color: color(display-p3 0.5 0.5 0.5) for access to more vibrant colors. Fall back to sRGB for older browsers.

Contrast ratio, The difference between foreground and background luminance. WCAG AA requires 4.5:1 for normal text, 3:1 for large text. Check your color pairs, what looks readable on your monitor may not be for others.

How should I structure my app layout?

Gap. The space between flex or grid children. Use gap instead of margin on children. It’s simpler, more predictable, and handles wrapping correctly.

Negative space, The empty area around and between elements. Not wasted space, it defines relationships. Elements close together are related. Elements far apart are separate. The most underrated design tool.

Flexbox. A one-dimensional layout model for distributing space and aligning content in a row or column. Use for navigation bars, card rows, centered content, and any layout that flows in one direction.

Layout shift. Elements jumping as the page loads because images, fonts, or embeds didn’t reserve space. Fix by setting explicit width and height on images, using aspect-ratio for media, and preloading fonts.

How do interaction patterns affect user experience?

Hover state, Visual feedback when the cursor is over an interactive element. Keep it subtle: a color shift, a shadow change, or a slight scale. Avoid full animations on hover, they feel slow.

Focus state. Visual feedback when an element is selected via keyboard navigation (Tab key). Use :focus-visible instead of :focus so the focus ring only shows for keyboard users, not mouse clicks. Never remove focus styles entirely.

Loading state. What the user sees while content loads. Reserve space for the content before it loads to prevent layout shift. Use skeleton loaders for content-heavy areas, spinners for actions.

Empty state. What the user sees when there’s no data yet. Never show a blank screen: show a helpful message explaining what goes here and a clear call to action. “No invoices yet: create your first one.”

Error state. What the user sees when something goes wrong. Explain the problem in plain language, not an error code. Offer a way to retry or recover. “Something went wrong while loading your data. Try again.”

Disabled state, An element that exists but can’t be interacted with. Reduce opacity to 50-60% and remove pointer events. The disabled reason should be obvious from context, if it’s not, add a tooltip explaining why.

How should I use motion design in my app?

Easing. The acceleration curve of an animation. ease-out for elements entering (fast start, slow end). ease-in for elements exiting (slow start, fast end). ease-in-out for elements that both enter and exit.

Duration. How long an animation takes. UI animations: 150-300ms. Longer than 500ms feels slow and annoying. Shorter than 100ms is too fast to be noticed. Consistency matters more than perfect timing.

Stagger, Delaying animations of sibling elements so they don’t all move at once. A list fading in item by item. Keep the delay small, 20-50ms between each. Too much stagger feels slow.

Spring. A physics-based animation where elements move as if connected to a spring. Feels more natural than duration-based easing for interactive elements like drag-and-drop and toggles. Supported by motion/react and Framer Motion.

Reduced motion. The prefers-reduced-motion: reduce media query that disables animations for users with vestibular disorders. Required for accessibility. Wrap every non-essential animation in this query.

How should I structure information architecture?

Progressive disclosure. Revealing complexity only when the user needs it. Advanced settings behind an “Advanced” toggle. Detailed information behind a “Show more” link. Users should not be overwhelmed by everything at once.

Hierarchy, The visual ordering of importance on a page. Achieved through size, color, spacing, and position. The most important element should be the most visually prominent, if everything is emphasized, nothing is.

Mental model. The user’s internal understanding of how your product works. A task manager should feel like a to-do list, not a database. The closer your interface matches their mental model, the less they need to learn.

Wayfinding. How users understand where they are in your app and how to get where they want. Breadcrumbs, active navigation states, and clear page titles. Users should never have to guess which page they’re on.

What tools should I use for web app development?

Design system. A collection of reusable components, patterns, and guidelines ensuring visual and functional consistency. The source of truth for every UI decision. Without one, every screen is a custom one-off.

Tokens, Design decisions stored as data: --color-primary: #3cffd0. Using tokens instead of hardcoded values makes systematic changes possible, change one token, update the entire product.

Open Graph. The metadata controlling how a link previews when shared on social media and messaging apps. Often an afterthought, but it’s the first impression most people get of your site. Title, description, and image all matter.

FAQ

What is the agent loop? The fundamental cycle every agent follows: receive input, decide action, execute tool, process result, repeat.

What is MCP? Model Context Protocol : a standardized interface for connecting agents to external tools, data sources, and services.

What’s the difference between voice and tone? Voice is your product’s consistent personality. Tone is how that voice adapts to context : apologetic for errors, celebratory for success.

What is a design token? A design decision stored as data, like —color-primary: #3cffd0. Tokens enable systematic changes across a product.

Cursor vs Claude Code vs Copilot. A six-month comparison of the three major AI coding tools on real development tasks
Best open source AI tools for indie hackers in 2026. Production-ready open source tools evaluated for solo developers
How to set up Hermes Agent. Installing, configuring, and extending the open-source CLI agent from Nous Research

O’Reilly’s AI agents stack maps the full vocabulary of agent architecture components for 2026.

This article was published on Agentic Up (https://agenticup.dev): practical guides for developers and founders building with AI agents. Reach me at hello@agenticup.dev