All Posts

Your AI Agent Just Scaffolded a Project from 2020

Your AI agent just scaffolded a project from 2020. It saw exit code 0, saw files appear, and moved on. Here's why npm silently downgrades packages and what to do about it.

TASK-081SHIP

Apple Just Entered the AI Agent Game. Here's What Changed

Apple announced Core AI for on-device LLMs, Xcode 27 with multi-model agentic coding, and free AI for indie developers. Three announcements that change the AI agent landscape.

TASK-082BUILD

Making FlashAttention-4 Faster for Inference

Modal's engineering team made three targeted changes to FlashAttention-4 that improved inference throughput by up to 4.37x: split KV parallelism, FP8 input support, and arbitrary KV page sizes.

TASK-083BUILD

6 min read · Jun 12, 2026

Automation Templates: making cron jobs smart with Hermes Agent

Cron jobs are dumb. They run at 2am and dump output to a file you never read. Hermes Agent's Automation Templates fix that: an LLM evaluates the output, decides if it matters, and ships a summary to Telegram. Here's how to set one up in 5 minutes.

TASK-084THINK

JetBrains Just Ranked Every Agentic Framework. Here's What They Missed

JetBrains ranked 10 agentic frameworks by orchestration paradigm, multi-agent support, memory, and human-in-the-loop. Their analysis is thorough. But it misses the question that matters most for production.

TASK-085SHIP

6 min read · Jun 12, 2026

Kimi K2.7 Code: the first open-source model that competes with Claude Code

Kimi K2.7 Code is the first open-source model I'd actually use in an agent loop. It drops into Claude Code's CLI with one env var change. It scores within striking distance of Fable 5 on coding benchmarks at half the price. And the weights are open.

TASK-086SHIP

MiniMax M3: open-weights coding, 1M context, multimodality at 12x less than GPT

MiniMax M3 is the first open-weights model that doesn't make you choose. Frontier coding? 59% on SWE-Bench Pro (beats GPT-5.5). 1M context? Sparse attention makes it affordable. Multimodal? Native, not bolted on. At $0.60/M tokens, it's 12x cheaper than GPT-5.5.

TASK-087SHIP

OpenAI Just Turned Codex into an Agent Platform

OpenAI is acquiring Ona to give Codex persistent, secure cloud environments. Codex is no longer just a coding assistant; it's an agent platform. Here's what changes.

TASK-088THINK

The Proactive Agent Problem: What Claude Fable Got Right

Claude Fable 5 is relentlessly proactive. Two days with it reveals what this means for AI agent builders. Proactivity is a feature when goals are clear and a liability when they are not.

TASK-089THINK

When a 'worse' model beats a frontier model for agent work

I replaced Claude Opus 4.8 with a cheaper model on three production agent loops. Two got faster. One got more reliable. The benchmarks said this shouldn't work. The benchmarks were wrong.

TASK-090BUILD

8 min read · Jun 11, 2026

DiffusionGemma: hands-on with Google's 4x faster text model

Google's DiffusionGemma generates text through diffusion, denoising blocks of 256 tokens in parallel and reaching up to 1000 tokens/s on an H100. Here's how it works, how to run it, and what it means for local AI.

TASK-091SHIP

9 min read · Jun 11, 2026

Every Anthropic model name, ranked

Sam Wilkinson's satirical HN post extrapolates Anthropic's naming scheme into absurd territory: Aphorism, Marginalia, Diatribe, Terms of Service. But the real names tell a story about strategy. Here's every one ranked.

TASK-092THINK

8 min read · Jun 11, 2026

Spec-Driven Development and the Vertical Agent Method

Microsoft just published their Spec-Driven Development framework. It says the same thing the Vertical Agent Method has been saying: specs as shared ground truth between humans and AI agents beat vibe coding every time.

TASK-093THINK

Your AI agent's memory is a privacy risk: new ICML research

A new paper from ICML 2026 studies deployment-time memorization in AI agents. Key finding: summarization cuts extraction by 76%, but deleted information stays recoverable in ~20% of cases through derived memory tiers.

TASK-094BUILD

Build an agentic incident triage assistant with AWS Quick and New Relic

AWS and New Relic published a guide for building an agentic incident triage assistant. Here's the architecture pattern: automated context gathering, diagnostic execution, and remediation suggestions triggered by alerts.

TASK-095BUILD

AI agent deployment server: production infrastructure setup

A production-grade guide to setting up an AI agent deployment server: VPS provisioning, Docker, Nginx, SSL, monitoring, CI/CD, and zero-downtime deploys. For when a platform isn't enough.

TASK-096BUILD

16 min read · Jun 10, 2026

How to guide AI agents and ship web apps: a vocabulary

A scannable reference of terms for guiding AI coding agents and shipping production web apps: steering, agent architecture, typography, layout, motion, design systems.

TASK-097BUILD

7 min read · Jun 10, 2026

Best AI Agent Frameworks 2026: Production-Tested Comparison

A hands-on comparison of LangGraph, CrewAI, AutoGen, Pi (Factory), and Mastra, ranked by production readiness, debugging, multi-agent support, and developer experience in 2026.

TASK-098BUILD

Best MCP Servers & Tools 2026: The Essential List

The MCP ecosystem has exploded in 2026. Here are the servers worth installing, ranked by utility, reliability, and real-world use.

TASK-099BUILD

Best Open-Source LLMs for Coding 2026

The open-source LLM landscape for coding has shifted. DeepSeek V4-Pro and Kimi K2.6 lead the benchmarks. Here's what can run locally, what needs cloud, and which model wins for each coding task.

TASK-100BUILD

8 min read · Jun 10, 2026

Best Side Projects for AI Engineers in India 2026

The best AI side projects for Indian engineers in 2026, ranked by earning potential, learning value, and time to ship. With INR cost estimates and Indian-market considerations.

TASK-101THINK

12 min read · Jun 10, 2026

Claude Fable 5: benchmarks, developer reactions, first look

What developers are saying about Claude Fable 5: Karpathy's review, Stripe's results, benchmark numbers, and what this means for AI engineering.

TASK-102THINK

Claude Fable 5 one week in: integrations, impressions, what's next

One week in, Claude Fable 5 has landed on GitLab Duo, Snowflake Cortex AI, and every major platform. Simon Willison's initial impressions, pricing changes coming June 23, and what developers are actually building with it.

TASK-103BUILD

Cohere North Mini Code: a 30B MoE model for agentic coding

Cohere dropped North Mini Code, a 30B MoE model (3B active) trained for agentic coding tasks. It tops SWE-Bench for its size class and runs on OpenCode. Here's the architecture breakdown and what it means for agent builders.

TASK-104BUILD

9 min read · Jun 10, 2026

How AI coding agents actually use your SDK

You ship an SDK. AI coding agents consume it differently than humans do. Here's the exact step-by-step trace of what happens between 'developer types a prompt' and 'agent generates code with your technology', and how to design for it.

TASK-105BUILD

How to host an AI agent: a beginner's guide

A beginner-friendly guide to hosting your first AI agent: the simplest path from local development to a live, accessible agent on a server.

TASK-106BUILD

How to set up Hermes Agent: a step-by-step guide

Hermes Agent is an open-source CLI agent by Nous Research. Here's how to install, configure providers, set up skills, and run your first session, from scratch to a working agent.

TASK-107SHIP

npm v12 breaking changes: what to know

npm v12 ships in July 2026 with three security-focused breaking changes to npm install: staged publishing, stricter install scripts, and stronger package.json validation. Here's what breaks and how to fix it.

TASK-108BUILD

SilverTorch: Meta's Index as Model: a new retrieval paradigm

Meta Engineering published SilverTorch, an 'Index as Model' retrieval paradigm that replaces a microservice mesh with a unified PyTorch neural network. 23.7x higher throughput, 20.9x more compute efficient.

TASK-109THINK

Can you fingerprint which LLM wrote that? Multi-agent stylometry

New research shows LLMs can identify which model generated a piece of text by analyzing stylistic fingerprints, with implications for multi-agent security, content attribution, and agent coordination.

TASK-110BUILD

Test-case reducers: the debugging tool you're not using

When a test fails with a 500-line input, finding the actual bug is tedious. Test-case reducers automatically shrink failing cases to a minimal reproducing input, saving hours of manual binary search.

TASK-111BUILD

22 min read · Jun 6, 2026

The 15 jobs every agent harness must do

Frameworks sell you one decision. The problem is that an agent harness is 15 separate jobs, and when you need to replace one, you're forced to replace all 15. Here's the full list, what each job does, and why the composition model matters.

TASK-112BUILD

10 min read · Jun 6, 2026

Why your agent forgets conversations (and how to fix it with a branching tree)

You ask your agent to try something different. It forgets the original conversation. You try to go back and the agent is confused. That's not a memory problem; it's a data structure problem. Here's why sessions are stored wrong and how to fix it.

TASK-113BUILD

9 min read · Jun 6, 2026

The policy gate every agent needs before you go to production

Your agent can call any tool. That's the point. But without a policy gate, it can also delete production databases, send emails to the wrong people, and burn through budget on a single runaway loop. Here's how to add the gate that catches all of that.

TASK-114BUILD

10 min read · Jun 6, 2026

Build a state machine for your AI agent in a weekend

Your agent crashes mid-conversation and doesn't recover. It runs the same tool call 10 times. It doesn't know when to stop. Those are all state machine problems. Here's how to build the FSM that fixes all of them, in a weekend, with no framework.

TASK-115BUILD

12 min read · Jun 6, 2026

Dynamic Workflows in Claude Code: When to Use Them (and When Not To)

Default Claude Code handles most tasks well. But complex, adversarial, or long-running tasks expose three failure modes. Dynamic workflows solve them by letting Claude coordinate a team.

TASK-116SHIP

11 min read · Jun 1, 2026

AI agent business models: how to build a sustainable agency

I launched Agentic Up as a solo AI agent studio in Bengaluru and tried 5 pricing models. Some made money. Some lost money. One actually became a sustainable business.

TASK-117BUILD

AI agent context window: keeping your agent from forgetting

Your agent remembers everything in the current conversation. That's both its superpower and its biggest weakness. Here's how to manage context windows so your agent stays focused and cost-efficient.

TASK-118THINK

AI agent cost optimization: 10 tips to reduce your LLM bill

My first production agent cost ₹12,000/month in API calls. After applying these 10 strategies, the same agent runs on ₹4,500/month. Here's exactly how, with code, expected savings, and tradeoffs.

TASK-119BUILD

AI agent deployment guide: from localhost to production

Building an agent that works on your laptop is step one. Making it run reliably in production (cost-controlled, monitored, failure-resilient) is where most attempts fail. Here's the deployment guide I wish existed.

TASK-120BUILD

AI agent error handling patterns

Your agent will fail. Not sometimes, but regularly. The difference between a demo agent and a production agent is how it handles those failures. Here are the patterns that kept my agents running.

TASK-121BUILD

AI agent logging and monitoring: seeing inside your agent

Your agent is doing something strange and you have no idea why. Here's exactly what to log, how to structure it, and how to debug agents in production.

TASK-122BUILD

AI agent multi-step workflows: building complex pipelines

Four workflow patterns every agent developer needs: sequential, parallel fan-out, conditional branching, and loop with human-in-the-loop. With real code and production lessons.

TASK-123SHIP

12 min read · Jun 1, 2026

AI agent pricing: how much to charge for custom agents

Pricing is the hardest part of selling AI agent services. Here's exactly how I price: the numbers, the reasoning, and the mistakes that led to this model.

TASK-124SHIP

12 min read · Jun 1, 2026

AI agent pricing strategies 2026: what to charge

Three pricing models explained with AI-specific examples, scoping traps to avoid, and a proven method for raising prices over time.

TASK-125SHIP

13 min read · Jun 1, 2026

AI developer jobs in Bengaluru 2026: market reality

The AI job market in Bengaluru is real but noisy. Here's what companies are actually paying, which skills command a premium, and how to stand out as a developer.

TASK-126SHIP

AI developer jobs in Bengaluru 2026: market reality (Hindi)

The AI job market in Bengaluru is real, but there is also a lot of noise. What companies are actually paying, which skills earn a premium, and how to stand out as a developer.

TASK-127THINK

11 min read · Jun 1, 2026

AI tools that accept UPI and Indian payment methods in 2026

Every AI developer in India hits the payment wall: 'This tool needs an international card.' Here's what actually works with UPI, RuPay, and Indian cards in 2026, plus workarounds for the tools that don't.

TASK-128THINK

Best AI coding tools for developers in India in 2026

Most AI tool comparisons ignore the fact that Indian developers face different constraints: currency conversion, payment blocks, latency from US servers. Here's what actually works from Bengaluru.

TASK-129THINK

Best AI coding tools for Indian developers (2026) (Hindi)

Most AI tool comparisons ignore the fact that Indian developers face different constraints: currency conversion, payment blocks, latency from US servers. What actually works from Bengaluru.

TASK-130BUILD

11 min read · Jun 1, 2026

Best open source AI tools for indie hackers in 2026

Open source AI tools are proliferating fast. Most of them are not worth your time. These are the ones that survived my brutal evaluation: each had to be actually useful for a solo developer shipping products.

TASK-131BUILD

Building an AI code review agent: lessons from production

I built an AI code review agent that posts comments on GitHub PRs. The architecture was the easy part. The failure modes (hallucinated bugs, missing real issues, arguing with human reviewers) nearly made me scrap the project.

TASK-132BUILD

CrewAI vs LangGraph: which agent framework to use

I built the same research agent in both CrewAI and LangGraph. One felt natural from the start. The other made me appreciate why state graphs exist. Here's the honest comparison.

TASK-133BUILD

12 min read · Jun 1, 2026

Cursor vs Claude Code vs Copilot: 6 months of daily use

I've been a heavy user of all three for 6 months. They're not interchangeable; each excels at different things. Here's the honest comparison on real coding tasks.

TASK-134SHIP

How to build an AI customer support agent that works

I've shipped 3 production support agents for startups in Bengaluru. Here's the architecture, the code, and the hard-won lessons, from RAG to escalation to cost per conversation.

TASK-135BUILD

How to build your first AI agent in 2026 (tutorial)

You've used ChatGPT. You've maybe used Claude or Copilot. But building an agent, something that acts on its own, feels like a different skill. It's not. Here's the tutorial I wish existed when I started.

TASK-136BUILD

6 min read · Jun 1, 2026

How to build your first AI agent (2026 tutorial) (Hindi)

You've used ChatGPT. You've maybe also used Claude or Copilot for help with coding. But building an agent, something that acts on its own, feels like a different skill. It's not. This is the tutorial I wanted when I started.

TASK-137SHIP

How to make money with AI as a developer (2026) (Hindi)

There are four real paths to making money with AI for a solo developer. I've tried them all. Here's what actually generates revenue, and what's a waste of time.

TASK-138SHIP

How to make money with AI as a solo developer in 2026

There are four real paths to making money with AI as a solo developer. I've tried all of them. Here's what actually generates revenue, and what's a waste of time.

TASK-139SHIP

13 min read · Jun 1, 2026

How to start an AI development business from India

Starting an AI development business from India is different from starting one in SF or London. Different constraints, different opportunities, different mistakes. Here's what I learned.

TASK-140BUILD

LangGraph tutorial for beginners: build your first workflow

LangGraph keeps getting recommended but every tutorial assumes you already know LangChain. Here's a beginner-friendly walkthrough, from state graphs to a working agent, without the chain-of-thought abstraction.

TASK-141BUILD

OpenAI function calling tutorial: building tools for GPT

OpenAI's function calling API lets the model request function execution: fetch data, interact with APIs, compute things. Here's how to use it from scratch.

TASK-142THINK

Preventing AI agent hallucinations: 7 techniques that work

I've spent the last year trying to make AI agents tell the truth. Not perfectly, just reliably enough that I don't have to double-check every output. Here are 7 techniques that moved the needle.

TASK-143THINK

The Vertical Agent Method: ship AI agents in 14 days

Most AI agent projects fail not because of bad models, but because of bad scoping. The Vertical Agent Method is a framework that forces you to pick one workflow, build one agent, and ship in 14 days.

TASK-144THINK

What is an AI agent? beginner's guide for developers

If you're a developer who's used ChatGPT but never built an agent, the category can feel confusing. Here's a clear explanation of what agents are, how they work, and when they matter.

TASK-145THINK