Prompt Defend

Guardrails

Configure global security policies for your AI applications.

Protection Layers

Detect and block attempts to override system instructions.

Identify Personally Identifiable Information (Email, Phone, API Keys).
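A PII layer like this is typically regex-driven. The sketch below is a minimal, illustrative version: the pattern names and regexes are assumptions for demonstration, not the product's actual detectors, and real PII coverage needs far broader patterns.

```python
import re

# Illustrative patterns only; production PII detection needs broader coverage.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "api_key": re.compile(r"\b(?:sk|pk|api)[_-][A-Za-z0-9]{16,}\b"),
}

def find_pii(text: str) -> dict[str, list[str]]:
    """Return only the PII categories that matched, with their matches."""
    hits = {name: pat.findall(text) for name, pat in PII_PATTERNS.items()}
    return {name: found for name, found in hits.items() if found}
```

Matched categories can then be masked or used to block the prompt outright.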

Block prompts that mention specific competitors.

Detect gibberish or high-entropy inputs (potential attacks).
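High-entropy detection usually means measuring Shannon entropy per character: random or encoded payloads score high, natural language scores low. A minimal sketch, assuming the 5.5 default threshold from the Sensitivity Thresholds section below:

```python
import math
from collections import Counter

def shannon_entropy(text: str) -> float:
    """Bits of entropy per character of the input string."""
    if not text:
        return 0.0
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def looks_like_gibberish(prompt: str, threshold: float = 5.5) -> bool:
    # High per-character entropy suggests random or encoded payloads.
    return shannon_entropy(prompt) > threshold
```

Natural English typically lands around 3–4.5 bits per character, well under the 5.5 default, while long random Base64-like strings approach it.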

Detect leaked internal prompts using canary tokens.
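The canary-token technique embeds a unique random marker in the system prompt; if that marker ever appears in model output, the prompt has leaked. A minimal sketch (the `CANARY-` marker format is an assumption for illustration):

```python
import secrets

def make_canary() -> str:
    # Hypothetical marker format: a random hex token to embed in the system prompt.
    return f"CANARY-{secrets.token_hex(8)}"

def leaked_canary(output: str, canary: str) -> bool:
    # If the model's output contains the canary, the system prompt leaked.
    return canary in output
```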

Block hate speech, violence, and other harmful content.

Detect LLM prompt delimiters ([INST], ChatML, XML tags).
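Delimiter detection can be a small set of regexes over known chat-template markers. A sketch with an illustrative, deliberately non-exhaustive pattern list:

```python
import re

# Common chat-template delimiters; this list is illustrative, not exhaustive.
DELIMITER_PATTERNS = [
    re.compile(r"\[/?INST\]"),                                    # Llama-style
    re.compile(r"<\|im_(?:start|end)\|>"),                        # ChatML
    re.compile(r"</?(?:system|assistant|user)>", re.IGNORECASE),  # XML role tags
]

def contains_delimiters(prompt: str) -> bool:
    """True if the prompt embeds any known chat-template delimiter."""
    return any(p.search(prompt) for p in DELIMITER_PATTERNS)
```

User input has no legitimate reason to contain these markers, so a match is a strong injection signal.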

Catch DAN jailbreaks, persona attacks, and "pretend" exploits.

Keyword Blocklist

379 keywords · 15 categories
Instruction Override

Ignore/bypass previous instructions

Developer Mode

Debug/admin/test mode activation

DAN Personas

DAN, AIM, and jailbreak personas

Prompt Extraction

Reveal/show system prompt

Safety Bypass

Disable filters/restrictions

Refusal Suppression

"Start with Sure", no disclaimers

Authority Claims

"CEO approved", fake authorization

Hypothetical Framing

"Hypothetically", "imagine if"

Emotional Manipulation

"I'm desperate", urgency appeals

Encoding Requests

Decode Base64/Hex/Rot13

Code Execution

eval(), exec(), run commands

Memory Manipulation

"Remember this", context attacks

Injection Markers

[SYSTEM], [PRIORITY] markers

SQL/XSS Attacks

DROP TABLE, alert(), scripts

Malicious Commands

Malware, phishing, hacking

Fast-Path Allowlist

631 safe phrases active

Reduces false positives by allowing known safe educational and security-related phrases to bypass strict checks.
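The key design point is ordering: the allowlist is consulted before the blocklist, so a known-safe educational phrase never trips a keyword match. A minimal sketch, with sample allowlist and blocklist entries invented for illustration:

```python
# Sample entries for illustration; the real lists are much larger.
ALLOWLIST = {"what is prompt injection", "explain sql injection"}
BLOCKLIST = {"ignore previous instructions", "drop table"}

def screen(prompt: str) -> str:
    text = prompt.lower().strip()
    # Fast path: known-safe phrases bypass the strict keyword checks.
    if text in ALLOWLIST:
        return "allow"
    if any(kw in text for kw in BLOCKLIST):
        return "block"
    return "allow"
```

Without the fast path, an educational question like "what is prompt injection" could be flagged by stricter semantic checks; the allowlist short-circuits that.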

Sensitivity Thresholds

Detection threshold: 0.5

Lower values are stricter. Default is 0.5.

Entropy threshold: 5.5

Higher values allow more randomness. Default is 5.5.

Max prompt length: 4000 chars

Maximum allowed prompt length. Default is 4000.
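The three thresholds above can be grouped into one config object. A sketch, assuming a detector that emits a risk score in [0, 1] which is blocked at or above the detection threshold (the field names are hypothetical):

```python
from dataclasses import dataclass

@dataclass
class Thresholds:
    detection: float = 0.5  # risk-score cutoff; lower = stricter
    entropy: float = 5.5    # bits/char; higher = more randomness tolerated
    max_chars: int = 4000   # maximum allowed prompt length

def within_length(prompt: str, t: Thresholds) -> bool:
    return len(prompt) <= t.max_chars

def should_block(risk_score: float, t: Thresholds) -> bool:
    # Lowering t.detection blocks more prompts, i.e. stricter filtering.
    return risk_score >= t.detection
```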

Custom Rules

Add New Rule

Enter a keyword or a regular expression (regex). Matching is case-insensitive.
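Accepting either a keyword or a regex in one field can be handled by trying to compile the input as a regex and falling back to a literal match. A minimal sketch of that behavior (the fallback strategy is an assumption, not the product's documented logic):

```python
import re

def compile_rule(rule: str) -> re.Pattern:
    # Treat the input as a regex if it compiles; otherwise fall back
    # to an escaped literal keyword. Both paths are case-insensitive.
    try:
        return re.compile(rule, re.IGNORECASE)
    except re.error:
        return re.compile(re.escape(rule), re.IGNORECASE)

def rule_matches(rule: str, prompt: str) -> bool:
    return bool(compile_rule(rule).search(prompt))
```

The fallback matters for inputs like `c++`, which is invalid as a regex but perfectly valid as a literal keyword.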

Active Rules
