Guardrails
Configure global security policies for your AI applications.
Protection Layers
- Detect and block attempts to override system instructions.
- Identify personally identifiable information (emails, phone numbers, API keys).
- Block prompts that mention specific competitors.
- Detect gibberish or high-entropy inputs (potential attacks).
- Detect leaked internal prompts using canary tokens.
- Block hate speech, violence, and other harmful content.
- Detect LLM prompt delimiters ([INST], ChatML, XML tags).
- Catch DAN jailbreaks, persona attacks, and "pretend" exploits.
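As one illustration of how a protection layer operates, PII detection can be sketched as a set of pattern matches over the prompt. The patterns below are simplified examples, not the product's actual rules:

```python
import re

# Illustrative PII patterns only -- a production detector uses far more robust rules.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{8,}\d"),
    "api_key": re.compile(r"\b(?:sk|pk|api)[-_][A-Za-z0-9]{16,}\b"),
}

def detect_pii(text: str) -> list[str]:
    """Return the names of the PII categories found in the text."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(text)]
```

A prompt that trips any category can then be blocked or redacted before it reaches the model.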
Keyword Blocklist
379 keywords · 15 categories:
- Ignore/bypass previous instructions
- Debug/admin/test mode activation
- DAN, AIM, and jailbreak personas
- Reveal/show system prompt
- Disable filters/restrictions
- "Start with Sure", no disclaimers
- "CEO approved", fake authorization
- "Hypothetically", "imagine if"
- "I'm desperate", urgency appeals
- Decode Base64/Hex/Rot13
- eval(), exec(), run commands
- "Remember this", context attacks
- [SYSTEM], [PRIORITY] markers
- DROP TABLE, alert(), scripts
- Malware, phishing, hacking
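A category-grouped blocklist like the one above can be checked with a case-insensitive phrase scan. The keywords and category names below are a hypothetical sample for illustration, not the product's actual 379-entry list:

```python
# Hypothetical sample of a blocklist grouped by category;
# the real deployment ships 379 keywords across 15 categories.
BLOCKLIST = {
    "instruction_override": ["ignore previous instructions", "bypass previous instructions"],
    "prompt_extraction": ["reveal system prompt", "show system prompt"],
    "encoding_tricks": ["decode base64", "decode rot13"],
}

def blocked_categories(prompt: str) -> list[str]:
    """Return the categories whose keywords appear in the prompt (case-insensitive)."""
    lowered = prompt.lower()
    return [
        category
        for category, keywords in BLOCKLIST.items()
        if any(keyword in lowered for keyword in keywords)
    ]
```

Returning the matched categories, rather than a bare yes/no, lets the caller log which policy fired.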
Fast-Path Allowlist
631 safe phrases active. Reduces false positives by allowing known safe educational and security-related phrases to bypass strict checks.
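The fast path can be sketched as an exact-match lookup that short-circuits the strict pipeline. The sample phrases below are hypothetical; the real allowlist has 631 entries:

```python
# Hypothetical allowlist entries; the real deployment ships 631 safe phrases.
FAST_PATH_ALLOWLIST = {
    "what is sql injection",           # common educational security question
    "explain cross-site scripting",    # likewise safe in an educational context
}

def needs_strict_checks(prompt: str) -> bool:
    """Known-safe phrases skip the strict guardrail pipeline entirely."""
    return prompt.strip().lower() not in FAST_PATH_ALLOWLIST
```

Running this lookup first keeps benign security questions from being flagged by the keyword blocklist.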
Sensitivity Thresholds
- Sensitivity: lower values are stricter. Default is 0.5.
- Entropy: higher values allow more randomness. Default is 5.5.
- Prompt length: maximum allowed prompt length. Default is 4000.