72 posts on AI agents, coding, and the craft of shipping software.
AI agents fail in five predictable ways, not because the model is bad but because we treat them like magic tricks instead of software. Here's what goes wrong and how to design for it.
Models that forget. Agents that can't learn from experience. The continual learning landscape in 2026 has three competing approaches, and the most promising one sounds like science fiction.
Anthropic had to pull Fable 5 for all customers. If your agent depends on one model, you're one government order away from a full rewrite. Here's the architecture that prevents that.
Benchmarks tell you what's technically possible. Production adoption tells you what actually works at scale. These are two different lists. Here's the 2026 open-source landscape ranked for real workloads.
Tool invocation looks like success. But if your agent produces the same output without your extension, your extension is drag, not lift. Here's how to measure it.
Your AI agent just scaffolded a project from 2020. It saw exit code 0, saw files appear, and moved on. Here's why npm silently downgrades packages and what to do about it.
Apple announced Core AI for on-device LLMs, Xcode 27 with multi-model agentic coding, and free AI for indie developers. Three announcements that change the AI agent landscape.
Modal's engineering team made three targeted changes to FlashAttention-4 that improved inference throughput by up to 4.37x: split KV parallelism, FP8 input support, and arbitrary KV page sizes.
Cron jobs are dumb. They run at 2am and dump output to a file you never read. Hermes Agent's Automation Templates fix that: an LLM evaluates the output, decides if it matters, and ships a summary to Telegram. Here's how to set one up in 5 minutes.
JetBrains ranked 10 agentic frameworks by orchestration paradigm, multi-agent support, memory, and human-in-the-loop. Their analysis is thorough. But it misses the question that matters most for production.
Kimi K2.7 Code is the first open-source model I'd actually use in an agent loop. It drops into Claude Code's CLI with one env var change. It scores within striking distance of Fable 5 on coding benchmarks at half the price. And the weights are open.
MiniMax M3 is the first open-weights model that doesn't make you choose. Frontier coding? 59% on SWE-Bench Pro (beats GPT-5.5). 1M context? Sparse attention makes it affordable. Multimodal? Native, not bolted on. At $0.60/M tokens, it's 12x cheaper than GPT-5.5.
OpenAI is acquiring Ona to give Codex persistent, secure cloud environments. Codex is no longer just a coding assistant; it's an agent platform. Here's what changes.
Claude Fable 5 is relentlessly proactive. Two days with it reveals what this means for AI agent builders. Proactivity is a feature when goals are clear and a liability when they are not.
I replaced Claude Opus 4.8 with a cheaper model on three production agent loops. Two got faster. One got more reliable. The benchmarks said this shouldn't work. The benchmarks were wrong.
Google's DiffusionGemma generates text through diffusion, denoising blocks of 256 tokens in parallel and reaching up to 1000 tokens/s on an H100. Here's how it works, how to run it, and what it means for local AI.
Sam Wilkinson's satirical HN post extrapolates Anthropic's naming scheme into absurd territory: Aphorism, Marginalia, Diatribe, Terms of Service. But the real names tell a story about strategy. Here's every one ranked.
Microsoft just published their Spec-Driven Development framework. It says the same thing the Vertical Agent Method has been saying: specs as shared ground truth between humans and AI agents beat vibe coding every time.
A new paper from ICML 2026 studies deployment-time memorization in AI agents. Key finding: summarization cuts extraction by 76%, but deleted information stays recoverable in ~20% of cases through derived memory tiers.
AWS and New Relic published a guide for building an agentic incident triage assistant. Here's the architecture pattern: automated context gathering, diagnostic execution, and remediation suggestions triggered by alerts.
A production-grade guide to setting up an AI agent deployment server: VPS provisioning, Docker, Nginx, SSL, monitoring, CI/CD, and zero-downtime deploys. For when a platform isn't enough.
A scannable reference of terms for guiding AI coding agents and shipping production web apps: steering, agent architecture, typography, layout, motion, design systems.
A hands-on comparison of LangGraph, CrewAI, AutoGen, Pi (Factory), and Mastra, ranked by production readiness, debugging, multi-agent support, and developer experience in 2026.
A hands-on comparison of Claude Code, Cursor, Copilot, OpenCode, and Windsurf in 2026, ranked by agentic capability, cost, speed, and real project fit.
The MCP ecosystem has exploded in 2026. Here are the servers worth installing, ranked by utility, reliability, and real-world use.
The open-source LLM landscape for coding has shifted. DeepSeek V4-Pro and Kimi K2.6 lead the benchmarks. Here's what can run locally, what needs cloud, and which model wins for each coding task.
The best AI side projects for Indian engineers in 2026, ranked by earning potential, learning value, and time to ship. With INR cost estimates and Indian-market considerations.
What developers are saying about Claude Fable 5: Karpathy's review, Stripe's results, benchmark numbers, and what this means for AI engineering.
One week in, Claude Fable 5 has landed on GitLab Duo, Snowflake Cortex AI, and every major platform. Simon Willison's initial impressions, pricing changes coming June 23, and what developers are actually building with it.
Cohere dropped North Mini Code, a 30B MoE model (3B active) trained for agentic coding tasks. It tops SWE-Bench for its size class and runs on OpenCode. Here's the architecture breakdown and what it means for agent builders.
You ship an SDK. AI coding agents consume it differently than humans do. Here's the exact step-by-step trace of what happens between 'developer types a prompt' and 'agent generates code with your technology', and how to design for it.
A beginner-friendly guide to hosting your first AI agent: the simplest path from local development to a live, accessible agent on a server.
Hermes Agent is an open-source CLI agent by Nous Research. Here's how to install, configure providers, set up skills, and run your first session, from scratch to a working agent.
npm v12 ships in July 2026 with three security-focused breaking changes to npm install: staged publishing, stricter install scripts, and stronger package.json validation. Here's what breaks and how to fix it.
Meta Engineering published SilverTorch, an 'Index as Model' retrieval paradigm that replaces a microservice mesh with a unified PyTorch neural network. 23.7x higher throughput, 20.9x more compute efficient.
New research shows LLMs can identify which model generated a piece of text by analyzing stylistic fingerprints, with implications for multi-agent security, content attribution, and agent coordination.
When a test fails with a 500-line input, finding the actual bug is tedious. Test-case reducers automatically shrink failing cases to the smallest reproducing input, saving hours of manual binary search.
Frameworks sell you one decision. The problem is that an agent harness is 15 separate jobs, and when you need to replace one, you're forced to replace all 15. Here's the full list, what each job does, and why the composition model matters.
You ask your agent to try something different. It forgets the original conversation. You try to go back and the agent is confused. That's not a memory problem; it's a data structure problem. Here's why sessions are stored wrong and how to fix it.
Your agent can call any tool. That's the point. But without a policy gate, it can also delete production databases, send emails to the wrong people, and burn through budget on a single runaway loop. Here's how to add the gate that catches all of that.
Your agent crashes mid-conversation and doesn't recover. It runs the same tool call 10 times. It doesn't know when to stop. Those are all state machine problems. Here's how to build the FSM that fixes all of them, in a weekend, with no framework.
Default Claude Code handles most tasks well. But complex, adversarial, or long-running tasks expose three failure modes. Dynamic workflows solve them by letting Claude coordinate a team.
I launched Agentic Up as a solo AI agent studio in Bengaluru and tried 5 pricing models. Some made money. Some lost money. One actually became a sustainable business.
Your agent remembers everything in the current conversation. That's both its superpower and its biggest weakness. Here's how to manage context windows so your agent stays focused and cost-efficient.
My first production agent cost ₹12,000/month in API calls. After applying these 10 strategies, the same agent runs on ₹4,500/month. Here's exactly how, with code, expected savings, and tradeoffs.
Building an agent that works on your laptop is step one. Making it run reliably in production (cost-controlled, monitored, failure-resilient) is where most attempts fail. Here's the deployment guide I wish existed.
Your agent will fail. Not sometimes. Regularly. The difference between a demo agent and a production agent is how it handles those failures. Here are the patterns that kept my agents running.
Your agent is doing something strange and you have no idea why. Here's exactly what to log, how to structure it, and how to debug agents in production.
Four workflow patterns every agent developer needs: sequential, parallel fan-out, conditional branching, and loop with human-in-the-loop. With real code and production lessons.
Pricing is the hardest part of selling AI agent services. Here's exactly how I price: the numbers, the reasoning, and the mistakes that led to this model.
Three pricing models explained with AI-specific examples, scoping traps to avoid, and a proven method for raising prices over time.
The AI job market in Bengaluru is real but noisy. Here's what companies are actually paying, which skills command a premium, and how to stand out as a developer.
The AI job market in Bengaluru is real, but there is also a lot of noise. What companies are actually paying, which skills earn a premium, and how to stand out as a developer. (Hindi edition)
Every AI developer in India hits the payment wall: 'This tool needs an international card.' Here's what actually works with UPI, RuPay, and Indian cards in 2026, plus workarounds for the tools that don't.
Most AI tool comparisons ignore the fact that Indian developers face different constraints: currency conversion, payment blocks, latency from US servers. Here's what actually works from Bengaluru.
Most AI tool comparisons overlook that Indian developers face different obstacles: currency conversion, payment blocks, latency from US servers. What actually works from Bengaluru. (Hindi edition)
Open source AI tools are proliferating fast. Most of them are not worth your time. These are the ones that survived my brutal evaluation: each must be actually useful for a solo developer shipping products.
I built an AI code review agent that posts comments on GitHub PRs. The architecture was the easy part. The failure modes (hallucinated bugs, missed real issues, arguments with human reviewers) nearly made me scrap the project.
I built the same research agent in both CrewAI and LangGraph. One felt natural from the start. The other made me appreciate why state graphs exist. Here's the honest comparison.
I've been a heavy user of all three for 6 months. They're not interchangeable; each excels at different things. Here's the honest comparison on real coding tasks.
I've shipped 3 production support agents for startups in Bengaluru. Here's the architecture, the code, and the hard-won lessons, from RAG to escalation to cost per conversation.
You've used ChatGPT. You've maybe used Claude or Copilot. But building an agent, something that acts on its own, feels like a different skill. It's not. Here's the tutorial I wish existed when I started.
You've used ChatGPT. Maybe you've also used Claude or Copilot for help with coding. But building an agent, something that acts on its own, feels like a different skill. It isn't. This is the tutorial I wanted when I started. (Hindi edition)
There are four real paths to earning money with AI as a solo developer. I've tried them all. Here's what actually generates revenue, and what is a waste of time. (Hindi edition)
There are four real paths to making money with AI as a solo developer. I've tried all of them. Here's what actually generates revenue, and what's a waste of time.
Starting an AI development business from India is different from starting one in SF or London. Different constraints, different opportunities, different mistakes. Here's what I learned.
LangGraph keeps getting recommended but every tutorial assumes you already know LangChain. Here's a beginner-friendly walkthrough, from state graphs to a working agent, without the chain-of-thought abstraction.
OpenAI's function calling API lets the model request function execution: fetch data, interact with APIs, compute things. Here's how to use it from scratch.
I've spent the last year trying to make AI agents tell the truth. Not perfectly, just reliably enough that I don't have to double-check every output. Here are 7 techniques that moved the needle.
Most AI agent projects fail not because of bad models, but because of bad scoping. The Vertical Agent Method is a framework that forces you to pick one workflow, build one agent, and ship in 14 days.
If you're a developer who's used ChatGPT but never built an agent, the category can feel confusing. Here's a clear explanation of what agents are, how they work, and when they matter.
If you're a developer who has used ChatGPT but never built an agent, this category can feel confusing. This guide makes clear what agents are, how they work, and when they matter. (Hindi edition)
Practical posts on shipping agents, automating work, and building in public. No hype, no fluff.
Contact: hello@agenticup.dev