Self-Improving Claude Code Skill Loop
An automation loop that runs any Claude Code skill 10 times with varied inputs, scores each output against 3-5 binary eval criteria, rewrites the skill prompt to fix the most common failures, and repeats until the score plateaus. It turns inconsistent skills (good 70% of the time, unusable 30%) into reliably production-grade ones. Inspired by Karpathy's auto-research loop, it applies the same method AI labs use to improve their own models to creative and engineering workflows. One loop: 10 test runs, scored against an eval, prompt rewritten, retested, winner kept. A hook-writer skill went from 32/50 to 47/50 overnight. Works on any Claude Code skill: hooks, briefs, ad copy, scripts, reports, agents.

Outputs: a hardened skill prompt, a scored improvement log (exactly what changed and why), reusable eval criteria, and a method that compounds over time.
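The run → score → rewrite → retest loop above can be sketched in a few lines of Python. This is a minimal, hedged sketch, not the author's implementation: `run_skill` and `rewrite_prompt` are hypothetical stand-ins for calls into Claude Code (running the skill, and asking the model to patch the prompt against logged failures), and the binary criteria are passed as named predicates.

```python
# Sketch of the self-improvement loop. run_skill and rewrite_prompt are
# hypothetical placeholders for the real Claude Code calls.

def run_skill(prompt: str, test_input: str) -> str:
    """Placeholder: would invoke the skill with this prompt and input."""
    return f"{prompt}::{test_input}"

def rewrite_prompt(prompt: str, failures: list[str]) -> str:
    """Placeholder: would ask the model to patch the prompt for the
    most common failure modes seen in this round."""
    return prompt + " | fix: " + ", ".join(sorted(set(failures)))

def improve_skill(prompt, inputs, criteria, max_rounds=5):
    """Run the skill on varied inputs, score against binary criteria
    (1 point per pass), rewrite, and repeat until the score plateaus.
    Returns the winning prompt and its score."""
    best_prompt, best_score = prompt, -1
    current = prompt
    for _ in range(max_rounds):
        score, failures = 0, []
        for test_input in inputs:                 # e.g. 10 varied test runs
            out = run_skill(current, test_input)
            for name, check in criteria.items():  # 3-5 binary eval criteria
                if check(out):
                    score += 1
                else:
                    failures.append(name)
        if score > best_score:
            best_prompt, best_score = current, score  # keep the winner
        else:
            break                                 # plateau: stop, keep winner
        if not failures:
            break                                 # perfect score, nothing to fix
        current = rewrite_prompt(current, failures)
    return best_prompt, best_score

# Example with a toy criterion (hypothetical): the eval passes once the
# rewritten prompt makes the output mention "hook".
best_prompt, best_score = improve_skill(
    "Write intros",
    inputs=["a", "b"],
    criteria={"mentions_hook": lambda out: "hook" in out},
)
```

In a real run, the scored improvement log is just the `(prompt, score, failures)` tuples from each round, which is what makes the "exactly what changed + why" output possible.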
Source
https://mikefutia.com/claude-code-self-improving-lm/

Discovered on Twitter via @mikefutia