Claude Code Ignores Skills Unless Forced Otherwise

Everyone using Claude Code knows this phrase: “You’re absolutely right!”

It means you just interrupted Claude over a mistake it keeps making and never learns from. The skill exists. The instructions are clear. Claude ignores them anyway.

The Problem: Skills Are Ignored Half the Time

Skills are markdown files that tell Claude how to do specific things. They’re supposed to turn prompts into programs. State machines with defined transitions.

The problem: LLM reasoning is non-deterministic. There’s no algorithmic skill selection. It’s all prompt-based. The model decides whether to use a skill based on vibes, not rules.

Hook-based triggers achieve about 50% activation at best. Flip a coin. That’s your skill activation rate.

Solutions That Don’t Fully Work

I’ve tried several approaches. All improve reliability. None guarantee it.

Keyword hooks trigger skills when certain words appear in prompts. The problems: collisions (multiple skills match), maintenance burden (updating trigger lists), and false negatives (paraphrased requests miss triggers).
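A minimal sketch of the keyword-hook approach makes both failure modes concrete. Skill names and trigger words here are invented for illustration:

```python
# Hypothetical keyword-trigger table. Skill names and keywords are invented.
KEYWORD_TRIGGERS = {
    "tdd-skill": {"test", "tdd", "red-green"},
    "refactor-skill": {"refactor", "clean up", "restructure"},
}

def match_skills(prompt: str) -> list[str]:
    """Return every skill whose keyword list appears in the prompt."""
    text = prompt.lower()
    return [skill for skill, keys in KEYWORD_TRIGGERS.items()
            if any(k in text for k in keys)]

# Collision: one prompt matches two skills at once.
print(match_skills("refactor this and add a test"))
# False negative: a paraphrase misses every trigger entirely.
print(match_skills("make sure the behaviour is verified first"))
```

The second call returns nothing even though the user clearly wants test-first behavior, which is exactly the false-negative problem.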

Semantic routing uses embeddings to match prompts to skills. My implementation (Iris architecture) achieves 75-85% accuracy. Better than keywords, but still unreliable. The model can still choose to ignore the activated skill.

Middleware corrective loops detect when Claude ignores a skill and inject reminders. This helps, but adds latency and doesn’t solve the fundamental problem. Claude can ignore the reminder too.
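A corrective loop might look like the sketch below. All names are invented, and the compliance check is deliberately crude; the point is that every retry is another round trip, and the final reply can still be non-compliant:

```python
# Hypothetical corrective middleware. `call_model` is any prompt -> reply
# function; `skill_marker` is a string whose presence we treat as evidence
# the skill was followed.
def corrective_loop(call_model, prompt: str, skill_marker: str,
                    max_retries: int = 2) -> str:
    reply = call_model(prompt)
    for _ in range(max_retries):
        if skill_marker in reply:          # crude compliance check
            return reply
        reminder = (f"{prompt}\n\nReminder: you must follow the "
                    f"active skill ({skill_marker}).")
        reply = call_model(reminder)       # extra round trip = added latency
    return reply                           # may still be non-compliant

# Fake model that complies only after a reminder (for illustration).
def fake_model(p: str) -> str:
    return "RED phase: wrote failing test" if "Reminder" in p else "jumped straight to code"

print(corrective_loop(fake_model, "add a feature", "RED"))
```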

Case Study: OpenClaw’s EnforcementHooks

OpenClaw uses a sophisticated hook system that achieves better results. Key features:

  1. Two-tier hooks: plugin-level and internal hooks run separately.
  2. Priority-based execution: System hooks run before User hooks.
  3. Eligibility checking based on OS, binaries, environment, and config.
  4. Fault isolation per hook.
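The pattern can be sketched generically. This is not OpenClaw's actual API; the names and structure below are invented to illustrate priority ordering, eligibility checks, and per-hook fault isolation:

```python
# Generic two-tier hook runner. Hook names, tiers, and checks are invented.
import platform
from dataclasses import dataclass
from typing import Callable

@dataclass
class Hook:
    name: str
    tier: int                                     # 0 = system, 1 = user
    run: Callable[[], str]
    eligible: Callable[[], bool] = lambda: True   # OS/env/config gate

def run_hooks(hooks: list[Hook]) -> list[str]:
    results = []
    for hook in sorted(hooks, key=lambda h: h.tier):   # system tier first
        if not hook.eligible():
            continue
        try:
            results.append(f"{hook.name}: {hook.run()}")
        except Exception as exc:   # fault isolation: one bad hook can't kill the rest
            results.append(f"{hook.name}: failed ({exc})")
    return results

hooks = [
    Hook("user-lint", 1, lambda: "ok"),
    Hook("sys-gate", 0, lambda: "enforced"),
    Hook("mac-only", 1, lambda: "ok",
         eligible=lambda: platform.system() == "Darwin"),
    Hook("flaky", 1, lambda: 1 / 0),
]
print(run_hooks(hooks))
```

The system-tier hook runs first regardless of declaration order, the ineligible hook is skipped silently on non-macOS machines, and the failing hook reports its error without stopping the rest.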

This architecture may explain why chat integrations using similar patterns report higher success rates. The hooks aren’t just suggestions. They’re enforced at multiple levels.

The Real Solution: Sandboxing

The only reliable approach I’ve found: permission-based enforcement with isolated execution.

How it works:

  1. Skills define required phases (like TDD’s RED, GREEN, REFACTOR)
  2. The sandbox restricts available tools until phase requirements are met
  3. State machine enforces transitions: BLOCKED until RED phase complete
  4. Tools unlock only after conditions are verified

This achieves approximately 95% reliability versus 50% with prompting alone.
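The four steps above can be sketched as a small state machine. The tool names and phase-exit conditions are invented; a real sandbox would verify actual test results before advancing:

```python
# Minimal phase-gated sandbox for a hypothetical TDD skill.
PHASE_TOOLS = {
    "RED":      {"write_test", "run_tests"},
    "GREEN":    {"write_code", "run_tests"},
    "REFACTOR": {"write_code", "run_tests", "format_code"},
}
PHASE_ORDER = ["RED", "GREEN", "REFACTOR"]

class PhaseGate:
    def __init__(self):
        self.phase = "RED"

    def allowed(self, tool: str) -> bool:
        """Tools outside the current phase are BLOCKED, not merely discouraged."""
        return tool in PHASE_TOOLS[self.phase]

    def advance(self, condition_met: bool) -> str:
        """Unlock the next phase only after its exit condition is verified."""
        if condition_met and self.phase != PHASE_ORDER[-1]:
            self.phase = PHASE_ORDER[PHASE_ORDER.index(self.phase) + 1]
        return self.phase

gate = PhaseGate()
print(gate.allowed("write_code"))   # blocked during RED
gate.advance(condition_met=True)    # e.g. a failing test now exists
print(gate.allowed("write_code"))   # unlocked in GREEN
```

The model never sees `write_code` as an available tool until the sandbox has verified the RED phase, so skipping the failing test is not a choice it can make.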

The key insight: the LLM should not decide IF it uses a tool. It should only populate PARAMETERS for tools that the system has already determined are required.
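That contract can be made concrete: the system has already decided which tool runs, and the model's output is validated strictly as arguments for that tool. Tool and parameter names here are illustrative:

```python
# Hypothetical "parameters only" contract: the mandated tool is fixed by
# the system; the model supplies arguments, nothing more.
import json

REQUIRED_TOOL = {"name": "run_tests", "params": {"path": str}}

def accept(model_output: str) -> dict:
    """Parse model output strictly as parameters for the mandated tool."""
    args = json.loads(model_output)
    for key, typ in REQUIRED_TOOL["params"].items():
        if not isinstance(args.get(key), typ):
            raise ValueError(f"missing or invalid parameter: {key}")
    return {"tool": REQUIRED_TOOL["name"], "args": args}

print(accept('{"path": "tests/test_auth.py"}'))
```

The model cannot substitute a different tool or skip the call; its only degree of freedom is filling in the parameters.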

Prediction: The Big Unlock

Sandboxing is the architectural shift skills need.

Current state: Auto-Detection works, but Auto-Activation fails half the time.

Future state: Detection triggers automatic enforcement. The model can’t proceed without following the skill. Not because we asked nicely, but because the tools won’t work otherwise.

This is where Claude Code skills become reliable enough for production workflows. Not through better prompts. Through better architecture.

The model should focus on what it’s good at (reasoning, generation, analysis). The system should handle what needs determinism (state management, phase gates, tool availability).

Stop asking. Start enforcing.


Coding With Agents