A Skill Is a Folder, Not a Prompt: What Anthropic Learned Running Hundreds of Them

TL;DR

Anthropic published lessons from running hundreds of Claude Code Skills across its engineering organization, describing Skills as reusable folders that agents can discover, read and run. The company says verification Skills had the largest measured effect on output quality, while questions remain about curation, context cost and how the practice scales outside Anthropic.

Anthropic has published a detailed account of how its engineering organization uses hundreds of Claude Code Skills, arguing that the units are folder-based packages of instructions, scripts, references and templates rather than saved prompts. The development matters because it shows how one major AI company is trying to turn repeated prompting into shared, versioned operating knowledge for coding agents.

The write-up, attributed to Claude Code engineer Thariq Shihipar and published on Anthropic’s Claude blog on June 3, 2026, describes a Skill as a discoverable folder containing a root SKILL.md file plus optional references, scripts, templates, configuration and hooks. According to Anthropic, the agent reads the root instructions first and pulls in deeper material only when the task requires it.

Anthropic’s account says its internal Skills fall into nine broad categories: library and API references, product verification, data fetching and analysis, business-process automation, code scaffolding and templates, code quality and review, CI/CD and deployment, runbooks, and infrastructure operations. The company said verification Skills, which check agent work rather than only guide generation, produced the strongest measured improvement in output quality.

The July 1 industry dispatch from Thorsten Meyer AI frames the post as more than a developer how-to. Its central reading is that Skills can serve as institutional memory: standard procedures, scripts and hard-won caveats that agents can apply repeatedly instead of relying on a user to restate instructions each session.

At a glance
When: published June 3, 2026; discussed in a…
The development: Anthropic published a Claude Code engineering write-up on June 3, 2026, detailing what it learned from using hundreds of reusable Skills inside its own engineering organization.
AI Dispatch · Insights · 1 July 2026

A Skill is a folder, not a prompt

Anthropic published what it learned running hundreds of Skills across its own engineering org. Read as a business memo, the point is bigger than a coding trick: this is how ad-hoc prompting becomes durable institutional capability — the SOPs your agents actually follow, versioned and shared.

✕ The misconception

“A Skill is just a clever markdown prompt you save in a file.”

✓ What it actually is

A folder the agent can discover, read & run — instructions, scripts, references, templates, config & on-demand hooks.

Anatomy of a Skill — the file system is context engineering
my-skill/            the unit you share & version
├─ SKILL.md          root instructions + a description written for the model (its trigger)
├─ references/       deep detail pulled in only when needed (progressive disclosure)
├─ scripts/          real code, so the agent composes instead of rebuilding boilerplate
├─ assets/           templates & files to copy into the output
├─ config.json       setup the agent asks for if it’s missing (e.g. which Slack channel)
└─ hooks + memory    on-demand guardrails + an append-only log so it remembers
Why it matters: the folder itself is the knowledge base. The agent reads the root, then reaches deeper only when the task demands it — the same way you’d hand a new hire a one-pager that points to the detailed docs.
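The read-the-root-then-reach-deeper pattern can be sketched in a few lines of Python. This is an illustration only, not Anthropic's loader: the file `rate-limits.md`, the functions `scaffold_skill` and `load_context`, and the keyword match that stands in for the agent's own judgment are all hypothetical.

```python
import tempfile
from pathlib import Path


def scaffold_skill(root: Path) -> Path:
    """Create a hypothetical Skill folder that mirrors the layout above."""
    skill = root / "my-skill"
    (skill / "references").mkdir(parents=True)
    (skill / "scripts").mkdir()
    (skill / "assets").mkdir()
    (skill / "SKILL.md").write_text(
        "Root instructions.\nFor rate limits, see references/rate-limits.md\n",
        encoding="utf-8",
    )
    (skill / "references" / "rate-limits.md").write_text(
        "Deep detail: back off after three failed calls.\n", encoding="utf-8"
    )
    return skill


def load_context(skill: Path, task: str) -> list[str]:
    """Always load the root file; add a reference only if the task names its topic."""
    loaded = [(skill / "SKILL.md").read_text(encoding="utf-8")]
    for ref in sorted((skill / "references").glob("*.md")):
        topic = ref.stem.replace("-", " ")  # "rate-limits" -> "rate limits"
        if topic in task.lower():  # assumption: keyword match replaces model judgment
            loaded.append(ref.read_text(encoding="utf-8"))
    return loaded


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        skill = scaffold_skill(Path(tmp))
        print(len(load_context(skill, "rename a variable")))                # root only
        print(len(load_context(skill, "debug rate limits in the client")))  # root + reference
```

The point of the sketch is the cost profile: an unrelated task pays only for the root file, and the deeper document is read only when the task calls for it.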
The nine types — a gap-analysis map for your own library
1. Library / API reference
2. Product verification (★ top impact)
3. Data fetching & analysis
4. Business-process automation
5. Code scaffolding & templates
6. Code quality & review
7. CI/CD & deployment
8. Runbooks
9. Infrastructure operations
By Anthropic’s own measurement, verification Skills — the ones that check the work — moved output quality the most. If you build one category well, build that one.
The craft — what separates a good Skill from a useless one
Gotchas = highest-signal section
Describe for the model, not humans (it’s the trigger)
Don’t state the obvious
Ship scripts, not just prose
On-demand guardrail hooks (/careful, /freeze)
Let it remember (log / SQLite)
Don’t railroad; leave room to adapt
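The "let it remember" item could look something like the following append-only log. It is a minimal sketch that assumes a JSON Lines file; the file name `memory.jsonl` and the functions `remember` and `recall` are hypothetical, and the source material does not describe Anthropic's actual log format.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def remember(log_path: Path, note: str) -> None:
    """Append one JSON line; earlier entries are never rewritten."""
    entry = {"at": datetime.now(timezone.utc).isoformat(), "note": note}
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")


def recall(log_path: Path) -> list[str]:
    """Return all notes in the order they were written."""
    if not log_path.exists():
        return []
    with log_path.open(encoding="utf-8") as fh:
        return [json.loads(line)["note"] for line in fh if line.strip()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = Path(tmp) / "memory.jsonl"  # hypothetical file name
        remember(log, "staging deploys need the VPN")
        remember(log, "the export job is slow on Mondays")
        print(recall(log))
```

Opening the file in append mode keeps the history intact, so a later session can read what earlier sessions learned without any of them overwriting each other.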
The take

The knowledge of how your organization actually operates can be captured, versioned, shared & executed — and the thing capturing it is a humble folder with a script and a gotchas list inside. For the builder, that’s context engineering with real tools attached. For whoever owns the budget, it’s the difference between AI that starts from zero every morning and an asset that compounds. Caveats: best practices are still evolving, checked-in Skills cost context, and curation beats accumulation. Start with one Skill, one gotcha, and the category that catches your mistakes.

Source: “Lessons from building Claude Code: How we use skills,” Thariq Shihipar (Anthropic), Claude blog, 3 June 2026. Categories, examples & measured claims are Anthropic’s; framing is the author’s. Docs: code.claude.com/docs/en/skills.
thorstenmeyerai.com

From Prompts to Shared Assets

The main business implication is repeatability. If a team can package its preferred review checks, deployment steps or product-specific rules into a Skill, the agent can apply the same guidance across users and projects. That could reduce variation between a senior engineer’s workflow and a new team member’s agent-assisted work.

The approach also changes how companies may think about AI adoption costs. Anthropic’s framing suggests that reusable Skills can become a maintained asset, closer to internal tooling or documentation than to one-off prompt craft. The Thorsten Meyer AI dispatch says the practical difference is between an agent that starts from zero and one that carries a versioned record of how work is done.

The strongest claim in the source material is tied to quality control. Anthropic says verification Skills had the highest impact in its own measurements. If that holds more broadly, teams may get more value first from Skills that catch mistakes than from Skills that only speed up initial code generation.
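What "checking the work" might mean in practice can be shown with a small script of the kind a verification Skill could ship. This is a sketch under stated assumptions: the two checks, the `CheckResult` type and the `verify` function are invented for illustration and are not taken from Anthropic's post.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str = ""


def no_todo_markers(text: str) -> str:
    """Return '' on success or a short failure message."""
    return "" if "TODO" not in text else "leftover TODO marker"


def ends_with_newline(text: str) -> str:
    return "" if text.endswith("\n") else "missing trailing newline"


# Hypothetical checks; a real Skill would encode the team's own failure modes.
CHECKS: dict[str, Callable[[str], str]] = {
    "no_todo_markers": no_todo_markers,
    "ends_with_newline": ends_with_newline,
}


def verify(text: str, checks: dict[str, Callable[[str], str]] = CHECKS) -> list[CheckResult]:
    """Run every check against the agent's output and collect the results."""
    results = []
    for name, check in checks.items():
        detail = check(text)
        results.append(CheckResult(name, passed=(detail == ""), detail=detail))
    return results


if __name__ == "__main__":
    for result in verify("def pay():\n    pass  # TODO handle refunds"):
        print(result.name, "ok" if result.passed else f"FAILED: {result.detail}")
```

Each check returns a message rather than raising, so the agent gets the full list of failures in one pass and can fix them before handing work back.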

How Anthropic Defines Skills

The report pushes back on the idea that a Skill is simply a markdown prompt. In Anthropic’s model, the folder itself is the unit: it can include instructions, supporting documents, runnable code, reusable assets, configuration questions and hooks that apply while the Skill is active.
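The "configuration questions" part can be pictured as a check for missing settings before the Skill runs. The sketch below assumes a `config.json` with a `slack_channel` key, echoing the Slack-channel example in the dispatch; the key name and the function `missing_settings` are hypothetical.

```python
import json
import tempfile
from pathlib import Path

REQUIRED_KEYS = ("slack_channel",)  # hypothetical setting name


def missing_settings(config_path: Path, required: tuple[str, ...] = REQUIRED_KEYS) -> list[str]:
    """List the settings the agent would still need to ask the user for."""
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    return [key for key in required if not config.get(key)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "config.json"
        print(missing_settings(path))  # nothing configured yet
        path.write_text(json.dumps({"slack_channel": "#deploys"}), encoding="utf-8")
        print(missing_settings(path))  # nothing left to ask
```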

That design reflects a broader shift in agent tooling toward context engineering, where systems decide what information to load, when to load it and which tools to run. Anthropic describes the Skill description as being written for the model because it helps the agent decide when the Skill should be used.
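A rough way to see why the description acts as a trigger: the agent initially sees little more than that line, and it has to match it against the task. The sketch below assumes the description sits in a `description:` line inside a `---` delimited header at the top of SKILL.md, a layout the material above does not spell out. The word-overlap scoring in `pick_skill` is a crude, hypothetical stand-in for the model's own judgment.

```python
import re
from typing import Optional


def read_description(skill_md: str) -> str:
    """Return the description from a '---' delimited header, or '' if absent."""
    match = re.match(r"^---\n(.*?)\n---\n", skill_md, flags=re.DOTALL)
    if not match:
        return ""
    for line in match.group(1).splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "description":
            return value.strip()
    return ""


def _words(text: str) -> set[str]:
    # Ignore very short words so "a" and "the" do not count as matches.
    return set(re.findall(r"[a-z]{4,}", text.lower()))


def pick_skill(task: str, descriptions: dict[str, str]) -> Optional[str]:
    """Pick the Skill whose description shares the most words with the task."""
    best, best_score = None, 0
    for name, description in descriptions.items():
        score = len(_words(task) & _words(description))
        if score > best_score:
            best, best_score = name, score
    return best


if __name__ == "__main__":
    skill_md = (
        "---\n"
        "name: verify-checkout\n"
        "description: Use when verifying the checkout flow after a code change\n"
        "---\n"
        "Steps go here.\n"
    )
    library = {
        "verify-checkout": read_description(skill_md),
        "deploy": "Use when deploying a service to staging",
    }
    print(pick_skill("verify the checkout flow", library))
```

A description written for a human reader ("Checkout helpers") gives this kind of matching nothing to hold on to, which is the practical reason to phrase it as a condition for use.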

The company’s guidance also warns against overloading a Skill with obvious prose. The source material emphasizes scripts over repeated explanation, short “gotchas” over long manuals, and room for the agent to adapt when a task does not fit the pattern exactly.

“A Skill is a folder, not a prompt.”

— Thorsten Meyer AI dispatch, July 1, 2026

Limits Outside Anthropic’s Org

Several points remain unclear. The supplied material does not provide the underlying measurement data behind Anthropic’s claim about verification Skills, including sample size, evaluation method or how quality gains were calculated.

It is also not yet clear how well the pattern transfers to smaller teams, non-engineering departments or companies without mature internal documentation. The source material says best practices are still evolving and warns that checked-in Skills can consume context, meaning teams may need to balance reuse against agent attention and performance.
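Teams that want a first-order feel for the context cost could estimate it per Skill. The sketch below counts only each Skill's root SKILL.md and uses a rule of thumb of about four characters per token. Both are assumptions: the real cost depends on the tokenizer and on which files the tool loads, and the function names are hypothetical.

```python
import math
import tempfile
from pathlib import Path


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate; the 4-characters ratio is a rule of thumb."""
    return math.ceil(len(text) / chars_per_token)


def root_file_costs(library: Path) -> dict[str, int]:
    """Map each Skill folder name to the estimated size of its root SKILL.md."""
    costs = {}
    for skill_md in sorted(library.glob("*/SKILL.md")):
        text = skill_md.read_text(encoding="utf-8")
        costs[skill_md.parent.name] = estimate_tokens(text)
    return costs


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        library = Path(tmp)
        for name, size in (("verify-checkout", 400), ("deploy", 2000)):
            (library / name).mkdir()
            (library / name / "SKILL.md").write_text("x" * size, encoding="utf-8")
        for name, tokens in root_file_costs(library).items():
            print(f"{name}: ~{tokens} tokens")
```

Even a crude table like this makes the curation question concrete, because it shows which Skills take the most room and therefore deserve the closest review.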

Another open issue is governance. A large Skills library could become noisy if teams keep adding folders without review. The dispatch’s caveat is direct: curation matters more than accumulation.

Teams Test Smaller Libraries

The next step for teams following Anthropic’s guidance is likely to be small, targeted Skill creation rather than building large libraries at once. The source material recommends starting with one Skill, one known failure mode and the category that catches the most mistakes.

For engineering groups, that points first to verification workflows: product checks, review rules, deployment checks or recurring edge cases that agents often miss. Anthropic’s published guidance and Claude Code documentation are expected to remain the reference points as teams test whether the folder-based model improves agent consistency in their own environments.

Key Questions

What did Anthropic publish about Claude Code Skills?

Anthropic published a June 3, 2026 Claude Code engineering post describing lessons from using hundreds of Skills across its engineering organization.

What is a Skill in this report?

A Skill is described as a discoverable folder that can contain instructions, scripts, references, templates, configuration and hooks. It is not only a saved prompt.

Which Skill category had the biggest reported impact?

According to the supplied source material, Anthropic said verification Skills had the strongest measured effect on output quality.

Why does this matter for companies using AI agents?

It suggests that teams can turn repeated instructions and internal procedures into shared, versioned assets that agents can reuse across tasks.

What is still unknown?

The available material does not show the full measurement method behind Anthropic’s quality claims, and it remains unclear how the approach scales in organizations with less mature tooling or documentation.

Source: Thorsten Meyer AI
