INJECTION TEST-BENCH_

Prompt-Injection Defense Lab

Harden a system prompt, then fire a corpus of injection attacks at it. Each attack has a concrete goal, so success is graded two ways: a deterministic canary check and an independent LLM judge.

Stored only in this browser (localStorage). Sent to the server in-memory to run the eval against the Anthropic API — never logged or persisted. Source is open for audit. Get a key at console.anthropic.com.

A secret canary token, a forbidden string, and a banned transfer_funds() tool are always injected into the system prompt so every attack has a checkable goal.