All Posts

72 posts on AI agents, coding, and the craft of shipping software.

TASK-001 · THINK
9 min read · Jun 14, 2026
We're treating AI agents like magic tricks instead of software

AI agents fail in five predictable ways, not because the model is bad but because we treat them like magic tricks instead of software. Here's what goes wrong and how to design for it.

TASK-002 · THINK
10 min read · Jun 14, 2026
Continual learning in mid-2026: memory layers, dreaming agents, and the race to fix AI's biggest limitation

Models that forget. Agents that can't learn from experience. The continual learning landscape in 2026 has three competing approaches, and the most promising one sounds like science fiction.

TASK-003 · BUILD
9 min read · Jun 14, 2026
How to build model-agnostic agents that survive a provider shutdown

Anthropic had to pull Fable 5 for all customers. If your agent depends on one model, you're one government order away from a full rewrite. Here's the architecture that prevents that.

TASK-004 · SHIP
14 min read · Jun 13, 2026
The open-source AI model landscape, June 2026

Benchmarks tell you what's technically possible. Production adoption tells you what actually works at scale. These are two different lists. Here's the 2026 open-source landscape ranked for real workloads.

TASK-005 · BUILD
9 min read · Jun 12, 2026
Is Your Agent Extension Actually Working?

Tool invocation looks like success. But if your agent produces the same output without your extension, your extension is drag, not lift. Here's how to measure it.

TASK-006 · SHIP
8 min read · Jun 12, 2026
Your AI Agent Just Scaffolded a Project from 2020

Your AI agent just scaffolded a project from 2020. It saw exit code 0, saw files appear, and moved on. Here's why npm silently downgrades packages and what to do about it.

TASK-007 · SHIP
7 min read · Jun 12, 2026
Apple Just Entered the AI Agent Game. Here's What Changed

Apple announced Core AI for on-device LLMs, Xcode 27 with multi-model agentic coding, and free AI for indie developers. Three announcements that change the AI agent landscape.

TASK-008 · BUILD
8 min read · Jun 12, 2026
Making FlashAttention-4 Faster for Inference

Modal's engineering team made three targeted changes to FlashAttention-4 that improved inference throughput by up to 4.37x: split KV parallelism, FP8 input support, and arbitrary KV page sizes.

TASK-009 · BUILD
6 min read · Jun 12, 2026
Automation Templates: making cron jobs smart with Hermes Agent

Cron jobs are dumb. They run at 2am and dump output to a file you never read. Hermes Agent's Automation Templates fix that: an LLM evaluates the output, decides if it matters, and ships a summary to Telegram. Here's how to set one up in 5 minutes.

TASK-010 · THINK
7 min read · Jun 12, 2026
JetBrains Just Ranked Every Agentic Framework. Here's What They Missed

JetBrains ranked 10 agentic frameworks by orchestration paradigm, multi-agent support, memory, and human-in-the-loop. Their analysis is thorough. But it misses the question that matters most for production.

TASK-011 · SHIP
6 min read · Jun 12, 2026
Kimi K2.7 Code: the first open-source model that competes with Claude Code

Kimi K2.7 Code is the first open-source model I'd actually use in an agent loop. It drops into Claude Code's CLI with one env var change. It scores within striking distance of Fable 5 on coding benchmarks at half the price. And the weights are open.

TASK-012 · SHIP
8 min read · Jun 12, 2026
MiniMax M3: open-weights coding, 1M context, multimodality at 12x less than GPT

MiniMax M3 is the first open-weights model that doesn't make you choose. Frontier coding? 59% on SWE-Bench Pro (beats GPT-5.5). 1M context? Sparse attention makes it affordable. Multimodal? Native, not bolted on. At $0.60/M tokens, it's 12x cheaper than GPT-5.5.

TASK-013 · SHIP
7 min read · Jun 12, 2026
OpenAI Just Turned Codex into an Agent Platform

OpenAI is acquiring Ona to give Codex persistent, secure cloud environments. Codex is no longer just a coding assistant; it's an agent platform. Here's what changes.

TASK-014 · THINK
8 min read · Jun 12, 2026
The Proactive Agent Problem: What Claude Fable Got Right

Claude Fable 5 is relentlessly proactive. Two days with it reveals what this means for AI agent builders. Proactivity is a feature when goals are clear and a liability when they are not.

TASK-015 · THINK
7 min read · Jun 12, 2026
When a 'worse' model beats a frontier model for agent work

I replaced Claude Opus 4.8 with a cheaper model on three production agent loops. Two got faster. One got more reliable. The benchmarks said this shouldn't work. The benchmarks were wrong.

TASK-016 · BUILD
8 min read · Jun 11, 2026
DiffusionGemma: hands-on with Google's 4x faster text model

Google's DiffusionGemma generates text through diffusion, denoising blocks of 256 tokens in parallel and reaching up to 1000 tokens/s on an H100. Here's how it works, how to run it, and what it means for local AI.

TASK-017 · SHIP
9 min read · Jun 11, 2026
Every Anthropic model name, ranked

Sam Wilkinson's satirical HN post extrapolates Anthropic's naming scheme into absurd territory: Aphorism, Marginalia, Diatribe, Terms of Service. But the real names tell a story about strategy. Here's every one ranked.

TASK-018 · THINK
8 min read · Jun 11, 2026
Spec-Driven Development and the Vertical Agent Method

Microsoft just published their Spec-Driven Development framework. It says the same thing the Vertical Agent Method has been saying: specs as shared ground truth between humans and AI agents beat vibe coding every time.

TASK-019 · THINK
6 min read · Jun 10, 2026
Your AI agent's memory is a privacy risk: new ICML research

A new paper from ICML 2026 studies deployment-time memorization in AI agents. Key finding: summarization cuts extraction by 76%, but deleted information stays recoverable in ~20% of cases through derived memory tiers.

TASK-020 · BUILD
4 min read · Jun 10, 2026
Build an agentic incident triage assistant with AWS Quick and New Relic

AWS and New Relic published a guide for building an agentic incident triage assistant. Here's the architecture pattern: automated context gathering, diagnostic execution, and remediation suggestions triggered by alerts.

TASK-021 · BUILD
5 min read · Jun 10, 2026
AI agent deployment server: production infrastructure setup

A production-grade guide to setting up an AI agent deployment server: VPS provisioning, Docker, Nginx, SSL, monitoring, CI/CD, and zero-downtime deploys. For when a platform isn't enough.

TASK-022 · BUILD
15 min read · Jun 10, 2026
How to guide AI agents and ship web apps: a vocabulary

A scannable reference of terms for guiding AI coding agents and shipping production web apps: steering, agent architecture, typography, layout, motion, design systems.

TASK-023 · BUILD
7 min read · Jun 10, 2026
Best AI Agent Frameworks 2026: Production-Tested Comparison

A hands-on comparison of LangGraph, CrewAI, AutoGen, Pi (Factory), and Mastra, ranked by production readiness, debugging, multi-agent support, and developer experience in 2026.

TASK-024 · BUILD
8 min read · Jun 10, 2026
Best AI Coding Agents 2026: Ranked for Real Projects

A hands-on comparison of Claude Code, Cursor, Copilot, OpenCode, and Windsurf in 2026, ranked by agentic capability, cost, speed, and real project fit.

TASK-025 · BUILD
5 min read · Jun 10, 2026
Best MCP Servers & Tools 2026: The Essential List

The MCP ecosystem has exploded in 2026. Here are the servers worth installing, ranked by utility, reliability, and real-world use.

TASK-026 · BUILD
6 min read · Jun 10, 2026
Best Open-Source LLMs for Coding 2026

The open-source LLM landscape for coding has shifted. DeepSeek V4-Pro and Kimi K2.6 lead the benchmarks. Here's what can run locally, what needs cloud, and which model wins for each coding task.

TASK-027 · BUILD
8 min read · Jun 10, 2026
Best Side Projects for AI Engineers in India 2026

The best AI side projects for Indian engineers in 2026, ranked by earning potential, learning value, and time to ship. With INR cost estimates and Indian-market considerations.

TASK-028 · THINK
12 min read · Jun 10, 2026
Claude Fable 5: benchmarks, developer reactions, first look

What developers are saying about Claude Fable 5: Karpathy's review, Stripe's results, benchmark numbers, and what this means for AI engineering.

TASK-029 · THINK
5 min read · Jun 10, 2026
Claude Fable 5 one week in: integrations, impressions, what's next

One week in, Claude Fable 5 has landed on GitLab Duo, Snowflake Cortex AI, and every major platform. Simon Willison's initial impressions, pricing changes coming June 23, and what developers are actually building with it.

TASK-030 · BUILD
6 min read · Jun 10, 2026
Cohere North Mini Code: a 30B MoE model for agentic coding

Cohere dropped North Mini Code, a 30B MoE model (3B active) trained for agentic coding tasks. It tops SWE-Bench for its size class and runs on OpenCode. Here's the architecture breakdown and what it means for agent builders.

TASK-031 · BUILD
9 min read · Jun 10, 2026
How AI coding agents actually use your SDK

You ship an SDK. AI coding agents consume it differently than humans do. Here's the exact step-by-step trace of what happens between 'developer types a prompt' and 'agent generates code with your technology', and how to design for it.

TASK-032 · BUILD
5 min read · Jun 10, 2026
How to host an AI agent: a beginner's guide

A beginner-friendly guide to hosting your first AI agent: the simplest path from local development to a live, accessible agent on a server.

TASK-033 · BUILD
6 min read · Jun 10, 2026
How to set up Hermes Agent: a step-by-step guide

Hermes Agent is an open-source CLI agent by Nous Research. Here's how to install, configure providers, set up skills, and run your first session, from scratch to a working agent.

TASK-034 · SHIP
4 min read · Jun 10, 2026
npm v12 breaking changes: what to know

npm v12 ships in July 2026 with three security-focused breaking changes to npm install: staged publishing, stricter install scripts, and stronger package.json validation. Here's what breaks and how to fix it.

TASK-035 · BUILD
4 min read · Jun 10, 2026
SilverTorch: Meta's Index as Model: a new retrieval paradigm

Meta Engineering published SilverTorch, an 'Index as Model' retrieval paradigm that replaces a microservice mesh with a unified PyTorch neural network. 23.7x higher throughput, 20.9x more compute efficient.

TASK-036 · THINK
4 min read · Jun 10, 2026
Can you fingerprint which LLM wrote that? Multi-agent stylometry

New research shows LLMs can identify which model generated a piece of text by analyzing stylistic fingerprints, with implications for multi-agent security, content attribution, and agent coordination.

TASK-037 · BUILD
5 min read · Jun 10, 2026
Test-case reducers: the debugging tool you're not using

When a test fails with a 500-line input, finding the actual bug is tedious. Test-case reducers automatically shrink failing cases to the minimal reproducing input, saving hours of manual binary search.

TASK-038 · BUILD
22 min read · Jun 6, 2026
The 15 jobs every agent harness must do

Frameworks sell you one decision. The problem is that an agent harness is 15 separate jobs, and when you need to replace one, you're forced to replace all 15. Here's the full list, what each job does, and why the composition model matters.

TASK-039 · BUILD
10 min read · Jun 6, 2026
Why your agent forgets conversations (and how to fix it with a branching tree)

You ask your agent to try something different. It forgets the original conversation. You try to go back and the agent is confused. That's not a memory problem; it's a data structure problem. Here's why sessions are stored wrong and how to fix it.

TASK-040 · BUILD
9 min read · Jun 6, 2026
The policy gate every agent needs before you go to production

Your agent can call any tool. That's the point. But without a policy gate, it can also delete production databases, send emails to the wrong people, and burn through budget on a single runaway loop. Here's how to add the gate that catches all of that.

TASK-041 · BUILD
10 min read · Jun 6, 2026
Build a state machine for your AI agent in a weekend

Your agent crashes mid-conversation and doesn't recover. It runs the same tool call 10 times. It doesn't know when to stop. Those are all state machine problems. Here's how to build the FSM that fixes all of them, in a weekend, with no framework.

TASK-042 · BUILD
11 min read · Jun 6, 2026
Dynamic Workflows in Claude Code: When to Use Them (and When Not To)

Default Claude Code handles most tasks well. But complex, adversarial, or long-running tasks expose three failure modes. Dynamic workflows solve them by letting Claude coordinate a team.

TASK-043 · SHIP
11 min read · Jun 1, 2026
AI agent business models: how to build a sustainable agency

I launched Agentic Up as a solo AI agent studio in Bengaluru and tried 5 pricing models. Some made money. Some lost money. One actually became a sustainable business.

TASK-044 · BUILD
9 min read · Jun 1, 2026
AI agent context window: keeping your agent from forgetting

Your agent remembers everything in the current conversation. That's both its superpower and its biggest weakness. Here's how to manage context windows so your agent stays focused and cost-efficient.

TASK-045 · THINK
8 min read · Jun 1, 2026
AI agent cost optimization: 10 tips to reduce your LLM bill

My first production agent cost ₹12,000/month in API calls. After applying these 10 strategies, the same agent runs on ₹4,500/month. Here's exactly how, with code, expected savings, and tradeoffs.

TASK-046 · BUILD
9 min read · Jun 1, 2026
AI agent deployment guide: from localhost to production

Building an agent that works on your laptop is step one. Making it run reliably in production (cost-controlled, monitored, failure-resilient) is where most attempts fail. Here's the deployment guide I wish existed.

TASK-047 · BUILD
9 min read · Jun 1, 2026
AI agent error handling patterns

Your agent will fail. Not sometimes, but regularly. The difference between a demo agent and a production agent is how it handles those failures. Here are the patterns that kept my agents running.

TASK-048 · BUILD
8 min read · Jun 1, 2026
AI agent logging and monitoring: seeing inside your agent

Your agent is doing something strange and you have no idea why. Here's exactly what to log, how to structure it, and how to debug agents in production.

TASK-049 · BUILD
9 min read · Jun 1, 2026
AI agent multi-step workflows: building complex pipelines

Four workflow patterns every agent developer needs: sequential, parallel fan-out, conditional branching, and loop with human-in-the-loop. With real code and production lessons.

TASK-050 · SHIP
12 min read · Jun 1, 2026
AI agent pricing: how much to charge for custom agents

Pricing is the hardest part of selling AI agent services. Here's exactly how I price: the numbers, the reasoning, and the mistakes that led to this model.

TASK-051 · SHIP
12 min read · Jun 1, 2026
AI agent pricing strategies 2026: what to charge

Three pricing models explained with AI-specific examples, scoping traps to avoid, and a proven method for raising prices over time.

TASK-052 · SHIP
13 min read · Jun 1, 2026
AI developer jobs in Bengaluru 2026: market reality

The AI job market in Bengaluru is real but noisy. Here's what companies are actually paying, which skills command a premium, and how to stand out as a developer.

TASK-053 · SHIP
9 min read · Jun 1, 2026
AI developer jobs in Bengaluru 2026: market reality (Hindi)

The AI job market in Bengaluru is real, but there is also a lot of noise. What companies are actually paying, which skills earn a premium, and how to stand out as a developer.

TASK-054 · THINK
11 min read · Jun 1, 2026
AI tools that accept UPI and Indian payment methods in 2026

Every AI developer in India hits the payment wall: 'This tool needs an international card.' Here's what actually works with UPI, RuPay, and Indian cards in 2026, plus workarounds for the tools that don't.

TASK-055 · THINK
10 min read · Jun 1, 2026
Best AI coding tools for developers in India in 2026

Most AI tool comparisons ignore the fact that Indian developers face different constraints: currency conversion, payment blocks, latency from US servers. Here's what actually works from Bengaluru.

TASK-056 · THINK
8 min read · Jun 1, 2026
Best AI coding tools for Indian developers, 2026 (Hindi)

Most AI tool comparisons ignore the fact that Indian developers face different obstacles: currency conversion, payment blocks, latency from US servers. What actually works from Bengaluru.

TASK-057 · BUILD
11 min read · Jun 1, 2026
Best open source AI tools for indie hackers in 2026

Open source AI tools are proliferating fast. Most of them are not worth your time. These are the ones that survived my brutal evaluation: each one had to be actually useful for a solo developer shipping products.

TASK-058 · BUILD
9 min read · Jun 1, 2026
Building an AI code review agent: lessons from production

I built an AI code review agent that posts comments on GitHub PRs. The architecture was the easy part. The failure modes (hallucinated bugs, missing real issues, arguing with human reviewers) nearly made me scrap the project.

TASK-059 · BUILD
9 min read · Jun 1, 2026
CrewAI vs LangGraph: which agent framework to use

I built the same research agent in both CrewAI and LangGraph. One felt natural from the start. The other made me appreciate why state graphs exist. Here's the honest comparison.

TASK-060 · BUILD
11 min read · Jun 1, 2026
Cursor vs Claude Code vs Copilot: 6 months of daily use

I've been a heavy user of all three for 6 months. They're not interchangeable; each excels at different things. Here's the honest comparison on real coding tasks.

TASK-061 · SHIP
9 min read · Jun 1, 2026
How to build an AI customer support agent that works

I've shipped 3 production support agents for startups in Bengaluru. Here's the architecture, the code, and the hard-won lessons, from RAG to escalation to cost per conversation.

TASK-062 · BUILD
9 min read · Jun 1, 2026
How to build your first AI agent in 2026 (tutorial)

You've used ChatGPT. You've maybe used Claude or Copilot. But building an agent, something that acts on its own, feels like a different skill. It's not. Here's the tutorial I wish existed when I started.

TASK-063 · BUILD
5 min read · Jun 1, 2026
How to build your first AI agent: 2026 tutorial (Hindi)

You've used ChatGPT. Maybe you've also used Claude or Copilot for help with coding. But building an agent, something that acts on its own, feels like a different skill. It isn't. This is the tutorial I wanted when I started.

TASK-064 · SHIP
9 min read · Jun 1, 2026
How to make money with AI as a developer, 2026 (Hindi)

There are four real paths to making money with AI for a solo developer. I've tried them all. Here's what actually generates revenue, and what is a waste of time.

TASK-065 · SHIP
10 min read · Jun 1, 2026
How to make money with AI as a solo developer in 2026

There are four real paths to making money with AI as a solo developer. I've tried all of them. Here's what actually generates revenue, and what's a waste of time.

TASK-066 · SHIP
13 min read · Jun 1, 2026
How to start an AI development business from India

Starting an AI development business from India is different from starting one in SF or London. Different constraints, different opportunities, different mistakes. Here's what I learned.

TASK-067 · BUILD
9 min read · Jun 1, 2026
LangGraph tutorial for beginners: build your first workflow

LangGraph keeps getting recommended, but every tutorial assumes you already know LangChain. Here's a beginner-friendly walkthrough, from state graphs to a working agent, without the chain-of-thought abstraction.

TASK-068 · BUILD
9 min read · Jun 1, 2026
OpenAI function calling tutorial: building tools for GPT

OpenAI's function calling API lets the model request function execution: fetching data, interacting with APIs, computing things. Here's how to use it from scratch.

TASK-069 · THINK
10 min read · Jun 1, 2026
Preventing AI agent hallucinations: 7 techniques that work

I've spent the last year trying to make AI agents tell the truth. Not perfectly, just reliably enough that I don't have to double-check every output. Here are 7 techniques that moved the needle.

TASK-070 · THINK
9 min read · Jun 1, 2026
The Vertical Agent Method: ship AI agents in 14 days

Most AI agent projects fail not because of bad models, but because of bad scoping. The Vertical Agent Method is a framework that forces you to pick one workflow, build one agent, and ship in 14 days.

TASK-071 · THINK
10 min read · Jun 1, 2026
What is an AI agent? A beginner's guide for developers

If you're a developer who's used ChatGPT but never built an agent, the category can feel confusing. Here's a clear explanation of what agents are, how they work, and when they matter.

TASK-072 · THINK
8 min read · Jun 1, 2026
What is an AI agent? A complete guide for beginners (Hindi)

If you're a developer who has used ChatGPT but never built an agent, this category can seem confusing. This guide makes it all clear: what agents are, how they work, and when they matter.


Contact: hello@agenticup.dev