Guardrails
Configure global security policies for your AI applications.
Protection Layers
- Detect and block attempts to override system instructions.
- Identify personally identifiable information (emails, phone numbers, API keys).
- Block prompts that mention specific competitors.
- Detect gibberish or high-entropy inputs (potential attacks).
- Detect leaked internal prompts using canary tokens.
- Block hate speech, violence, and other harmful content.
- Detect LLM prompt delimiters ([INST], ChatML, XML tags).
- Catch DAN jailbreaks, persona attacks, and "pretend" exploits.
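As one illustration of how a protection layer operates, PII detection can be sketched as a set of pattern matches over the prompt. The patterns below are simplified examples, not the product's actual rules:

```python
import re

# Illustrative PII patterns only -- a production detector uses far more robust rules.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{8,}\d"),
    "api_key": re.compile(r"\b(?:sk|pk|api)[-_][A-Za-z0-9]{16,}\b"),
}

def detect_pii(text: str) -> list[str]:
    """Return the names of the PII categories found in the text."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(text)]
```

A prompt that trips any category can then be blocked or redacted before it reaches the model.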
Keyword Blocklist
379 keywords · 15 categories:
- Ignore/bypass previous instructions
- Debug/admin/test mode activation
- DAN, AIM, and jailbreak personas
- Reveal/show system prompt
- Disable filters/restrictions
- "Start with Sure", no disclaimers
- "CEO approved", fake authorization
- "Hypothetically", "imagine if"
- "I'm desperate", urgency appeals
- Decode Base64/Hex/Rot13
- eval(), exec(), run commands
- "Remember this", context attacks
- [SYSTEM], [PRIORITY] markers
- DROP TABLE, alert(), scripts
- Malware, phishing, hacking
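A category-grouped blocklist like the one above can be checked with a case-insensitive phrase scan. The keywords and category names below are a hypothetical sample for illustration, not the product's actual 379-entry list:

```python
# Hypothetical sample of a blocklist grouped by category;
# the real deployment ships 379 keywords across 15 categories.
BLOCKLIST = {
    "instruction_override": ["ignore previous instructions", "bypass previous instructions"],
    "prompt_extraction": ["reveal system prompt", "show system prompt"],
    "encoding_tricks": ["decode base64", "decode rot13"],
}

def blocked_categories(prompt: str) -> list[str]:
    """Return the categories whose keywords appear in the prompt (case-insensitive)."""
    lowered = prompt.lower()
    return [
        category
        for category, keywords in BLOCKLIST.items()
        if any(keyword in lowered for keyword in keywords)
    ]
```

Returning the matched categories, rather than a bare yes/no, lets the caller log which policy fired.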
Fast-Path Allowlist
631 safe phrases active. Reduces false positives by allowing known safe educational and security-related phrases to bypass strict checks.
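The fast path can be sketched as an exact-match lookup that short-circuits the strict pipeline. The sample phrases below are hypothetical; the real allowlist has 631 entries:

```python
# Hypothetical allowlist entries; the real deployment ships 631 safe phrases.
FAST_PATH_ALLOWLIST = {
    "what is sql injection",           # common educational security question
    "explain cross-site scripting",    # likewise safe in an educational context
}

def needs_strict_checks(prompt: str) -> bool:
    """Known-safe phrases skip the strict guardrail pipeline entirely."""
    return prompt.strip().lower() not in FAST_PATH_ALLOWLIST
```

Running this lookup first keeps benign security questions from being flagged by the keyword blocklist.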
Sensitivity Thresholds
- Sensitivity: lower values are stricter. Default is 0.5.
- Entropy: higher values allow more randomness. Default is 5.5.
- Prompt length: maximum allowed prompt length. Default is 4000.