BUILD · Jun 10, 2026

Build an agentic incident triage assistant with AWS Quick and New Relic

Automate the first 5 minutes of incident response: gather context, run diagnostics, and suggest remediation from alerts.


TL;DR: My pager went off at 2am. By the time I logged in, the logs had rotated, the metrics window had passed, and I spent 30 minutes reconstructing what happened. This architecture fixes that. Alert fires, agent gathers context, runs diagnostics, suggests remediation. All before you finish your coffee.

Incident response follows a predictable pattern. An alert fires, and the engineer on call spends the first 5-10 minutes gathering context: checking dashboards, pulling logs, reviewing recent deployments. Only then do they start diagnosing the actual problem. An agentic triage assistant can automate that context-gathering phase.

Key takeaways:

  • The first 5 minutes of incident response go to context gathering, which is highly automatable
  • Architecture: alert → agent orchestration → observability data → diagnostics → remediation suggestion
  • Amazon Quick handles the agent loop; Bedrock AgentCore provides the LLM integration
  • Works best for well-understood failure patterns with clear diagnostic procedures
  • Always include a human-in-the-loop for remediation: agents suggest, humans approve

What is the architecture of an agentic incident triage assistant?

The pattern from the AWS/New Relic reference architecture breaks down into four stages:

1. Alert triggers the agent. A New Relic alert fires and sends a webhook to Amazon Quick. The alert payload includes the incident type, affected service, severity, and a link to the related dashboard.

2. Context gathering. The agent receives the alert and immediately starts collecting context: recent logs from the affected service, error rates from New Relic metrics, recent deployment activity, and related alerts from the past hour.

3. Diagnostic execution. Based on the incident type, the agent runs predefined diagnostic procedures. For a high-latency alert, it checks database query performance, CPU utilization, and upstream dependency latency. For an error rate spike, it looks at recent code deployments and error log patterns.

4. Remediation suggestion. The agent compiles its findings into a structured triage report: what’s happening, what’s changed, likely causes, and suggested remediation steps. This goes to the on-call engineer for review.
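
To make the four stages concrete, here is a minimal sketch in plain Python. It is not the reference architecture's code: the payload field names, the `DIAGNOSTICS` table, the check functions, and the `sources` callables are all hypothetical stand-ins, and the real data would come from your observability platform rather than from in-memory lists.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Every field and function name here is an assumption for illustration.
# A real webhook payload is shaped by your own notification template.


@dataclass
class Alert:
    incident_type: str
    service: str
    severity: str
    dashboard_url: str


@dataclass
class TriageReport:
    alert: Alert
    context: dict
    findings: List[str] = field(default_factory=list)
    suggestions: List[str] = field(default_factory=list)


def parse_alert(payload: dict) -> Alert:
    """Stage 1: turn the webhook payload into a typed alert."""
    return Alert(
        incident_type=payload["incident_type"],
        service=payload["service"],
        severity=payload.get("severity", "unknown"),
        dashboard_url=payload.get("dashboard_url", ""),
    )


def gather_context(alert: Alert, sources: Dict[str, Callable[[str], list]]) -> dict:
    """Stage 2: call every context source with the affected service name."""
    return {name: fetch(alert.service) for name, fetch in sources.items()}


def check_recent_deploy(context: dict) -> Optional[str]:
    deploys = context.get("deployments", [])
    return f"Recent deployment: {deploys[-1]}" if deploys else None


def check_error_logs(context: dict) -> Optional[str]:
    errors = [line for line in context.get("logs", []) if "ERROR" in line]
    return f"{len(errors)} error log line(s) in window" if errors else None


# Stage 3: predefined diagnostic procedures, keyed by incident type.
DIAGNOSTICS = {
    "error_rate_spike": [check_recent_deploy, check_error_logs],
    "high_latency": [check_recent_deploy],
}


def triage(payload: dict, sources: Dict[str, Callable[[str], list]]) -> TriageReport:
    alert = parse_alert(payload)
    context = gather_context(alert, sources)
    report = TriageReport(alert=alert, context=context)
    for check in DIAGNOSTICS.get(alert.incident_type, []):
        finding = check(context)
        if finding:
            report.findings.append(finding)
    # Stage 4: suggestions only; the on-call engineer decides what to run.
    if any(f.startswith("Recent deployment") for f in report.findings):
        report.suggestions.append("Consider rolling back the latest deployment")
    return report
```

Calling `triage(payload, sources)` with a `sources` dict such as `{"logs": fetch_logs, "deployments": fetch_deploys}` returns a report whose `findings` and `suggestions` lists can be rendered for the on-call engineer. Keeping the diagnostics in a lookup table means adding a new incident type is a one-line change rather than a new branch in the agent logic.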

How can I extend this incident triage architecture?

The interesting part is what happens after the initial implementation. Once you have this pattern running, you can extend it:

  • Playbook automation. For well-understood failure modes, the agent can execute remediation steps directly (restarting services, rolling back deployments, scaling resources), with human approval for each action.

  • Post-mortem generation. After the incident is resolved, the agent can automatically generate a post-mortem draft from the timeline, context data, and remediation actions taken.

  • Pattern learning. Over time, the agent can learn which diagnostic steps are most useful for each incident type and prioritize them accordingly.
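
The per-action approval gate from the playbook idea can be sketched in a few lines. This is an illustration under assumptions, not a specific product's API: `Action`, `execute_playbook`, and the `approve` callback are hypothetical names, and in practice `approve` would likely post to chat or a paging tool and wait for a response.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Action:
    name: str                 # e.g. "restart_service" (hypothetical)
    target: str               # the service the action applies to
    run: Callable[[], str]    # performs the step and returns a status line


def execute_playbook(
    actions: List[Action],
    approve: Callable[[Action], bool],
) -> List[Tuple[str, str]]:
    """Run each action only if `approve(action)` returns True.

    Returns an audit log of (action name, outcome) tuples, which could
    later feed a post-mortem draft.
    """
    audit = []
    for action in actions:
        if approve(action):
            audit.append((action.name, action.run()))
        else:
            audit.append((action.name, "skipped: not approved"))
    return audit
```

Because the approval check runs once per action rather than once per playbook, an engineer can approve a restart while declining a rollback in the same incident. The audit log records both outcomes.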

The AWS and New Relic reference architecture post is worth reading for the specific implementation details. But the pattern itself, alert-driven agent orchestration with observability tool integration, is applicable to any stack.

I’ve covered agent deployment patterns and monitoring agents in production; the incident triage pattern fits naturally into both.

FAQ

What is an agentic incident triage assistant? An AI agent that automatically responds to incidents by gathering context (logs, metrics, traces), running diagnostic checks, and suggesting remediation steps. This automates most of the first 5 minutes of manual incident response.

What AWS services are used? Amazon Quick (agent orchestration) and Bedrock AgentCore (LLM integration), with New Relic supplying the observability data.

Can this pattern work outside AWS? Yes: the architecture pattern of alert → context gathering → diagnostics → suggestion works with any observability platform and agent framework.
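
One way to keep the pattern portable is to have the triage logic depend on a small interface instead of a vendor SDK. The sketch below assumes a hypothetical `ObservabilitySource` protocol with two methods; the method names and the `InMemorySource` stand-in are mine, not part of any platform's API.

```python
from typing import Dict, List, Protocol


class ObservabilitySource(Protocol):
    """Hypothetical interface; each platform gets its own adapter."""

    def recent_logs(self, service: str, minutes: int) -> List[str]: ...

    def error_rate(self, service: str, minutes: int) -> float: ...


def summarize(source: ObservabilitySource, service: str, minutes: int = 60) -> Dict:
    """Collect the context a triage report needs from any conforming source."""
    logs = source.recent_logs(service, minutes)
    return {
        "service": service,
        "error_rate": source.error_rate(service, minutes),
        "error_lines": [line for line in logs if "ERROR" in line],
    }


class InMemorySource:
    """Stand-in backend for local testing; a real adapter would call a platform API."""

    def __init__(self, logs: List[str], rate: float) -> None:
        self._logs = logs
        self._rate = rate

    def recent_logs(self, service: str, minutes: int) -> List[str]:
        return list(self._logs)

    def error_rate(self, service: str, minutes: int) -> float:
        return self._rate
```

Swapping platforms then means writing a new adapter class with the same two methods; `summarize` and everything downstream of it stay unchanged.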


This article was published on Agentic Up (https://agenticup.dev): practical guides for developers and founders building with AI agents. Reach me at hello@agenticup.dev.
