文章歸檔 - AI Coding 實戰分享

2026-04-23

Qwen 3.6-27B Local Deployment: Sonnet 4.6-Class AI Agent Running on a DGX Spark / Mac mini

Qwen 3.6-27B, an open-source dense model, hits 136 tokens/sec on the $4,699 NVIDIA DGX Spark — beating Claude Opus 4.5 on benchmarks and edging out Sonnet 4.6 on Terminal-Bench. This post walks IT architects through hardware options for local Qwen 3.6-27B deployment (DGX Spark vs Mac mini M4 Pro 64GB), 12 official benchmarks, the Dflash + DDTree inference stack, a 3-year TCO comparison ($22,500 vs $4,729 per developer), and the architectural rewrites this triggers for on-prem AI Agent setups.

2026-04-13

"Opus Is Too Smart, So It Shouldn't Be Doing the Planning" — A Paper That Flips the Agent Ops Paradigm

Columbia's AgentOpt paper ran 9 models in 81 combinations and proved it: Ministral 8B as Planner + Opus as Solver hits 74.27% accuracy, while Opus as Planner sits at 31.71%. Putting the most expensive model in the Planner seat is the worst move — because it's so strong it skips the tools and answers raw. Anthropic's own Advisor Tool is course-correcting in the same direction: cheap models run the main loop, Opus steps back as on-call advisor. The unit of agent pipeline optimization isn't single-model capability — it's how the model combo fits the specific task.

2026-04-09

Cracking the Cache: From Gemma4 to Claude Code, Save 80% on Tokens

Open Claude Code in the morning, type one sentence — and 2–10% of your monthly quota is gone. I ran an experiment locally with Gemma4 and watched prompt processing drop from 31 seconds to 0.25 seconds — a 100x speedup. Then I dug into the Claude Code source and unpacked Anthropic's multi-layer cache architecture: DYNAMIC_BOUNDARY splitting, two-tier TTL, cache-break detection. From the KV cache fundamentals in Transformers, to the MLSys 2024 Prompt Cache paper, to the daily money-saving habits — once you understand the mechanism, the same plan can stretch 3–5x further.

2026-03-11

Harness Engineering Architecture: AI Can Write Code, But It Can't Ship to Production on Its Own

Amazon let AI fix a bug; AI deleted the entire production environment. DataTalks.Club: AI wiped the whole database. An e-commerce team lost millions of orders to an AI change. Three incidents, one pattern: reset → rebuild → clean state. This post lays out the full picture of Harness Engineering in one architecture diagram — from Amazon's ban to OpenAI's Control Plane, from three-layer defense to the seven-component reference architecture, from five failure modes to three things your team can start tomorrow.

2026-03-05

Harness Engineering Fully Decoded: When AI Agents Finish Writing Code, Is Your Repo Ready to Catch It Automatically?

3 OpenAI engineers shipped 1 million lines of code in 5 months using Codex—0 lines written by humans. They call this Harness Engineering—not the engineering of writing code, but the engineering of building constraints and feedback loops. Inspired by this, Ryan Carson published a complete Control-Plane Pattern: risk tier contract, preflight gate, SHA discipline, remediation loop. Last time we talked four layers of defense. This time we look at how a full control plane catches Agent output at speed.

2026-03-02

Punching Above Its Weight: Meet Qwen 3.5 and the God-Tier 9B Architecture Everyone's Talking About

A 9B model beating last-gen 80B, and even challenging OpenAI's 120B open-source model. Qwen 3.5-9B pulls it off with three architectural innovations—Gated Delta Network hybrid attention, native multimodal early fusion, and RL pushed down to small models—cramming big-model capability into something that runs on an RTX 3060.

2026-02-26

OpenClaw's Five Ways to Browse the Web: From Search API to Taking Over Your Browser

AI agent 'browsing the web' isn't one thing — it's five. Pick the wrong mode and you're either missing capability or handing your accounts to an AI. OpenClaw's five web access architectures — Search API, Web Fetch, Managed Browser, Remote CDP, Extension Relay — each have wildly different capability ranges, security risks, and appropriate use cases. This post breaks down every layer: from the safest search APIs to the most dangerous full browser takeover, including the Accessibility Tree vs screenshot efficiency gap, the manual-login sweet spot for Managed Browser, and WebMCP's future potential.

2026-02-25

Token Export: China's AI Is No Longer Selling Products—It's Selling Tokens

China's AI export is undergoing a qualitative shift—from selling products to selling Tokens. In February 2026, Chinese models (MiniMax, Kimi, GLM) overtook the US in production-grade Token call volume for the first time. GLM-5 walked away clean from distillation accusations, beat GPT-5.2 on SWE-bench, was trained entirely on Huawei chips, and its API is 5-8x cheaper. Stack together China's electricity price ($0.08/kWh vs US $0.18/kWh), open-source talent density, and hardware autonomy, and you see a new trade paradigm forming: exporting SOTA-90% reasoning capability to the world in a metered, priceable way. This isn't a story about the tech race. It's a story about cost structure.

2026-02-20

Stop Using OpenClaw Like Claude Code: The Moat Is in Single-Agent Depth

Multi-agent is powerful. But if your goal is an AI that truly makes decisions for you and handles your life, you need a single agent with complete, evolving memory. This isn't about technical sophistication — it's about choosing the right direction.

2026-02-20

The Channel War: OpenClaw, Anthropic, and Who Gets to Decide the Future of AI Agents

OpenClaw v2.19 shipped an Apple Watch MVP. Anthropic blocked OAuth to shut out third-party subscribers. Sam Altman recruited Peter Steinberger and embraced open source. Put these three things together and you see something beyond a technical competition — you see the most brutal reality of the AI industry: whoever controls the channel decides the model's fate.

2026-02-17

Lunar New Year 2026: China's Open-Source LLMs Explode — How to Choose Between Kimi, Qwen, GLM, and MiniMax

Lunar New Year 2026 saw China's open-source LLMs go off all at once. Kimi K2.5's Agent Swarm, Qwen3.5's -60% cost, GLM-5's Intelligence Index 50+, MiniMax M2.5's speed crown. A head-to-head comparison of all four contenders with benchmarks, cost data, and real-world selection logic.

2026-02-08

OpenClaw Week: From the Claude Code 1.5 Era to a Digital Jarvis | Weekly Vlog EP8

A week-long deep dive into OpenClaw — from creator Peter's builder philosophy to the Memory architecture (AGENTS.md, SOUL.md), three token-saving tricks (cut 50%+ easily), and the 'new employee' enterprise security strategy. The AI agent that comes closest to a real digital Jarvis. Worth your time to understand it properly.

2026-02-07

How to Deploy OpenClaw in an Enterprise: Treating an AI Agent Like a Brand-New Employee

At first, I didn’t dare to put OpenClaw inside the company network. Then I changed my framing: treat it like a new hire. Employee accounts, permission requests, code-level guardrails, manual email filtering—manage AI the same way you manage a junior employee.

2026-02-06

OpenClaw Token Optimization Guide: How to Cut AI Agent Operating Cost by 97%

Real intelligence isn’t paying for the most expensive model—it’s careful prompt and system design. This post shares five core optimization strategies—session initialization, model routing, local heartbeats, prompt caching, and rate limiting—shown in practice to reduce OpenClaw cost from ~$1,500/month to under $50.

2026-02-05

OpenClaw, Fully Dissected: Six Layers Every Agent Engineer Should Learn

From message in to response out, the pipeline passes through six key components. Understand this chain and you’ll see why OpenClaw feels more like an ‘employee’ than Claude Code.

2026-02-04

I Used OpenClaw Intensively for a Few Days: Why I Decided to Put It Into My Workflow

OpenClaw feels like a real human assistant—very alive. After half a day, I decided to make it part of my workflow. From logging TODOs, sending calendar invites, transcribing meetings, connecting to internal systems, to even editing videos—this is what an agent should be.

2026-02-03

OpenClaw Memory System, Fully Explained: SOUL.md, AGENTS.md, and the Painful Token Bill

When your agent burns 5 million tokens a day, the question isn’t ‘Is it smart?’—it’s ‘Is this context construction efficient?’ This deep dive dissects OpenClaw’s file-first memory architecture, explains the design philosophy behind SOUL.md and AGENTS.md, and why it would rather sacrifice efficiency than adopt RAG.

2026-02-02

From $116M Exit to 100K GitHub Stars in 10 Days: Peter Steinberger's Agentic Engineering Philosophy

"Don't waste your time on stuff like RAG, subagents, Agents 2.0 or other things that are mostly just charade. Just talk to it." — Peter Steinberger. The story of a retired engineer who sold his company for $116M, then hacked a 100K-star GitHub project in an hour.

2026-02-01

Personal AI Assistant Showdown Week: Claude Code vs OpenClaw — Different Paths, Same Destination | Weekly Vlog EP7

Twenty years ago, Unix Power Tools said: 'The command line is the best GUI in the world.' I didn’t get it then. Two decades later, LLMs proved how prophetic that line was. This week Claude Code keeps leading, OpenClaw appears out of nowhere—and the two routes are converging toward the sweet spot of Agent 2.0.

2026-01-30

A Top 0.01% Cursor Power User Switches to Claude Code: A Full Breakdown of the Five Pillars of Agentic Coding

When a globally top-tier user officially recognized by Cursor chooses to leave a familiar tool for Claude Code, it’s not just switching tools—it’s a paradigm shift in how ‘AI programming should be done.’ Silen Naihin’s long essay explains the five pillars of Agentic Coding: Context Management, Planning, Closing the Loop, Verifiability, Debugging.

2026-01-29

Moltbot Security Hardening in Practice: A Complete Four-Layer Defense-in-Depth Guide for AI Agents

You don’t need to be a security expert—just be willing to spend an afternoon reading the docs carefully. This post distills Moltbot community battle-tested experience into a four-layer defense-in-depth playbook: Isolation, Quarantine, Rollback, and Transparency. It covers AI Agent Security, Prompt Injection Defense, LLM Agent Security, and an end-to-end Agentic Security framework.

2026-01-28

500 AI Assistants Exposed on the Public Internet: The Clawdbot 0.0.0.0 Configuration Disaster

When ‘out of the box’ becomes a product feature, you may be opening a backdoor for users. Nearly 1,000 Clawdbot servers were exposed to the public internet due to a default 0.0.0.0 bind, allowing anyone to take over your AI assistant, steal sensitive files, and potentially drain your crypto wallet.

2026-01-26

When Unix Philosophy Meets AI: The Command Line Renaissance

When I was a kid I read a book called Unix Power Tools. There was a line I remembered for almost twenty years: ‘Command line pipeline is the best UI interface in the world.’ Back then I had no idea what it meant. But after Claude Code burst onto the scene in April 2025, I finally understood: a brain that understands the world through text plugged into an interface that exposes the world’s state through text. This isn’t retro—it’s structurally the most reasonable choice.

2026-01-24

The Invisible Crisis in AI Coding: Why Constantly Clicking “Yes” Becomes Your Biggest Security Vulnerability

The biggest risk of AI coding tools isn’t that the model is dumb—it’s that you automated your own judgment. From Cursor RCE to GitHub Copilot flaws, this post explains how prompt injection becomes real-world attacks, and how to use CLAUDE.md to establish security boundaries.

2026-01-07

Implementing the CaMeL Agent Architecture in PostgreSQL: Designing Unbypassable AI Memory and Permission Isolation with RLS

This post shows how to implement Google DeepMind’s CaMeL two-layer agent architecture in PostgreSQL. Using native database Roles and Row-Level Security (RLS), we design an unbypassable AI-memory isolation mechanism to defend against prompt injection and privileged-agent loss of control.

2026-01-05

CaMeL: Google DeepMind’s Prompt-Injection Defense Architecture

Simon Willison called this ‘the first credible prompt injection defense’ he’s seen. CaMeL’s core design splits one agent into two: a low-privilege agent that reads external data, and a high-privilege agent that makes decisions—so ‘reading data’ and ‘taking actions’ are always separated.

2026-01-02

Why AI Guardrails Are Doomed to Fail: From Prompt Injection to Secure Agent Architectures

A structured summary of Lenny’s Podcast interviewing HackAPrompt CEO Sander Schulhoff. From an AI-agent architecture perspective, it explains why guardrails can’t stop prompt injection—why stateless defenses fail structurally, and why least privilege and architectural containment are the only viable AI security path.

2025-12-30

After 630K Lines in Three Months: In the AI Coding Era, What Is an Engineer’s Real Value?

Three months. 630,000 lines of code. This isn’t about bragging output—it’s about what matters after code becomes ‘cheap’: humans’ last core value is defining the work (kickoff/specs) and accepting it (review/QA).

2025-12-05

AI Agent Security: Why It’s Reshaping Enterprise Security Architecture (and It’s Not the Prompt Problem You Think)

From Salesforce ForcedLeak to Microsoft 365 Copilot EchoLeak, this post exposes the security blind spots of the AI-agent era. 94.4% of agents are vulnerable, and traditional WAF/APM becomes effectively useless. This isn’t fear-mongering—it’s what the research says.

2025-12-03

Anthropic Explains: Why Does Claude Code Feel So Good to Use?

Anthropic reveals a dual-agent architecture—Initializer Agent + Coding Agent—so long-running tasks don’t rely on brute model power, but on an engineered workflow.

2025-12-01

[Agent Part 5] Dual-Agent Architecture: Anthropic Explains the Internal Engineering Design Behind Why Claude Code Feels So Good

Anthropic published a dual-agent architecture—Initializer Agent + Coding Agent—using an engineered workflow to solve the ‘memory reset’ problem in long-running tasks.

2025-10-24

[AI Ops] Using an Agent (Claude Code) for Linux System Administration—Worth It? Absolutely.

Today I realized how insanely effective Claude Code is for Linux system admin. It not only writes code—it can also ‘excavate’ legacy programs, uncover hidden system secrets, and even find scripts written by former colleagues so you can leave work early and enjoy the boss life...

2025-10-19

[AI PM in Practice] Say Goodbye to the Project-Doc Maze: Use a Claude Code Agent to Digest Your Predecessor’s Massive Handover Fast

The previous PM left 50+ project docs and hundreds of files. With Claude Code + the GDrive MCP, I produced a full project summary, timeline, and tech-stack analysis in one day. A handover that used to take weeks can now run in the background while you go to meetings.