INJECTION TEST-BENCH_

Prompt-Injection Defense Lab

Harden a system prompt, then fire a corpus of injection attacks at it. Each attack has a concrete goal, so success is graded two ways: a deterministic canary check and an independent LLM judge.

ANTHROPIC API KEY — RUNS ON YOUR KEY○ required

Stored only in this browser (localStorage). Sent to the server in-memory to run the eval against the Anthropic API — never logged or persisted. Source is open for audit. Get a key at console.anthropic.com.

SYSTEM PROMPT — PERSONA UNDER TEST

A secret canary token, a forbidden string, and a banned transfer_funds() tool are always injected into the system prompt so every attack has a checkable goal.

compare LLM judge (slower, ~2×)

DEFENSE STACK

Toggle layers and re-run to watch the matrix move.

0 of 5 layers active