INJECTION TEST-BENCH_
Prompt-Injection Defense Lab
Harden a system prompt, then fire a corpus of injection attacks at it. Each attack has a concrete goal, so success is graded two ways: a deterministic canary check and an independent LLM judge.
Stored only in this browser (localStorage). Sent to the server in-memory to run the eval against the Anthropic API — never logged or persisted. Source is open for audit. Get a key at console.anthropic.com.
A secret canary token, a forbidden string, and a banned transfer_funds() tool are always injected into the system prompt so every attack has a checkable goal.