Built an AI chat for my portfolio. Cool. Then I realized - what's stopping someone from using it to write their essays, spamming it with garbage, or trying to jailbreak it into saying weird stuff?
Nothing. Unless you add guardrails.
Here's how I protected my chat from abuse without killing the user experience.
The Problem
AI chats are expensive to run. Every message costs tokens. Bad actors can:
- Spam - Send hundreds of messages to run up your bill
- Jailbreak - Try to bypass your instructions and misuse the AI
- Off-topic abuse - Use your chat for homework, coding, or whatever
- Harmful content - Try to get the AI to generate toxic stuff
Layered Defense
One layer isn't enough. You need multiple:
Layer 1: Rate Limiting
Cheapest defense. No LLM calls needed.
// Simple in-memory rate limiter
const rateLimitMap = new Map<string, { count: number; resetTime: number }>();

const RATE_LIMIT_WINDOW = 60 * 1000; // 1 minute
const MAX_REQUESTS = 10; // 10 messages per minute

function checkRateLimit(ip: string): boolean {
  const now = Date.now();
  const record = rateLimitMap.get(ip);

  if (!record || record.resetTime < now) {
    rateLimitMap.set(ip, { count: 1, resetTime: now + RATE_LIMIT_WINDOW });
    return true; // allowed
  }

  if (record.count >= MAX_REQUESTS) {
    return false; // blocked
  }

  record.count++;
  return true;
}
10 messages per minute is plenty for legitimate users. Spammers hit the wall fast.
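One caveat: the Map above never evicts old entries, so on a long-lived server it grows forever. If that worries you, a periodic sweep keeps it bounded - a rough sketch, with the interval picked arbitrarily (not from my actual setup):

// Optional cleanup (sketch): drop entries whose window has already expired.
// On serverless this rarely matters, since instances are short-lived anyway.
setInterval(() => {
  const now = Date.now();
  for (const [ip, record] of rateLimitMap) {
    if (record.resetTime < now) rateLimitMap.delete(ip);
  }
}, RATE_LIMIT_WINDOW);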
Layer 2: Violation Tracking with Redis
Rate limiting stops volume. But what about someone sending 9 jailbreak attempts per minute?
Track violations. Block repeat offenders. And here's the catch - in-memory tracking doesn't work on serverless.
Vercel spins up new function instances constantly. In-memory Maps reset. Your "blocked" users just refresh and they're back.
I use Upstash Redis for persistent violation tracking:
// app/lib/redis.ts
import { Redis } from "@upstash/redis";

export const redis = new Redis({
  url: process.env.UPSTASH_REDIS_URL!,
  token: process.env.UPSTASH_REDIS_TOKEN!,
});

const VIOLATION_PREFIX = "chat:violations:";
const BLOCK_PREFIX = "chat:blocked:";

// Exported so the API route can import it alongside the functions below
export const MAX_VIOLATIONS = 2;
const BLOCK_DURATION = 24 * 60 * 60; // 24 hours in seconds

export async function incrementViolations(ip: string): Promise<number> {
  const key = `${VIOLATION_PREFIX}${ip}`;
  const newCount = await redis.incr(key);
  if (newCount === 1) {
    await redis.expire(key, BLOCK_DURATION);
  }
  return newCount;
}

export async function isUserBlocked(ip: string): Promise<boolean> {
  const blocked = await redis.get<string>(`${BLOCK_PREFIX}${ip}`);
  return blocked === "true";
}

export async function blockUser(ip: string): Promise<void> {
  await redis.set(`${BLOCK_PREFIX}${ip}`, "true", { ex: BLOCK_DURATION });
}
Two strikes and you're blocked for 24 hours. Persists across function restarts, deployments, everything. The block follows you, not the server instance.
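If you'd rather keep the strike logic next to the Redis code instead of in the API route, the two functions above fold into one helper. A sketch - the recordViolation name is mine, not part of the actual module:

// Sketch: record a violation and block once MAX_VIOLATIONS is reached.
// Wraps incrementViolations and blockUser from above; the name is hypothetical.
export async function recordViolation(
  ip: string
): Promise<{ blocked: boolean; warningsLeft: number }> {
  const violations = await incrementViolations(ip);
  const warningsLeft = Math.max(0, MAX_VIOLATIONS - violations);
  if (warningsLeft === 0) {
    await blockUser(ip);
  }
  return { blocked: warningsLeft === 0, warningsLeft };
}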
Layer 3: Input Validation
Don't send garbage to your LLM. Validate first.
const MAX_MESSAGE_LENGTH = 500;

function validateInput(messages: Message[]) {
  return messages.map(msg => ({
    ...msg,
    content: msg.content.slice(0, MAX_MESSAGE_LENGTH),
  }));
}
500 characters is enough for a real question. Not enough for someone trying to paste their entire codebase.
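You can go stricter and also cap how much history gets sent per request, dropping empty messages while you're at it. A sketch, with the limits picked arbitrarily:

// Stricter variant (sketch): cap history length and drop empty messages.
const MAX_MESSAGES = 20; // arbitrary cap, not from my actual config

function validateInputStrict(messages: Message[]): Message[] {
  return messages
    .slice(-MAX_MESSAGES) // keep only the most recent messages
    .filter(msg => typeof msg.content === "string" && msg.content.trim().length > 0)
    .map(msg => ({ ...msg, content: msg.content.slice(0, MAX_MESSAGE_LENGTH) }));
}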
Layer 4: Prompt Injection Detection
This is where Mastra shines. Its built-in PromptInjectionDetector catches:
- Direct injection attempts ("ignore previous instructions...")
- Jailbreak patterns ("pretend you're DAN...")
- System override attempts ("your new instructions are...")
import { PromptInjectionDetector } from "@mastra/core/processors";

const agent = new Agent({
  // ... config
  inputProcessors: [
    new PromptInjectionDetector({
      model: "openai/gpt-4o-mini",
      threshold: 0.7,
      strategy: "block",
      detectionTypes: ["injection", "jailbreak", "system-override"],
    }),
  ],
});
Uses an LLM to detect attacks. Costs a bit per message but catches stuff regex never would.
Layer 5: Content Moderation
Block harmful content before it reaches your main LLM.
import { ModerationProcessor } from "@mastra/core/processors";

const agent = new Agent({
  // ... config
  inputProcessors: [
    new ModerationProcessor({
      model: "openai/gpt-4o-mini",
      threshold: 0.7,
      strategy: "block",
      categories: ["hate", "harassment", "violence", "sexual"],
    }),
  ],
});
Someone sends hate speech? Blocked before it costs you tokens on your main model.
Layer 6: Topic Guard (Custom Processor)
Mastra's built-in processors are great, but I needed something custom: block off-topic messages before they waste tokens on my main LLM.
Someone asks "what's the capital of France?" - that's not about me or my work. Why should I pay to answer it?
// Custom TopicGuard processor
// Assumes Agent and z are imported; exact paths may differ by Mastra version.
import { Agent } from "@mastra/core/agent";
import { z } from "zod";

class TopicGuardProcessor {
  readonly id = "topic-guard" as const;
  private classifierAgent: Agent;
  private threshold: number;

  constructor(options: { model: string; threshold?: number }) {
    this.threshold = options.threshold ?? 0.6;
    this.classifierAgent = new Agent({
      id: "topic-guard-classifier",
      instructions: `You are a topic classifier. Determine if a message is on-topic
for a personal portfolio website chat.

ON-TOPIC: Questions about the owner's skills, experience, projects, blog posts
OFF-TOPIC: General knowledge, homework help, coding requests, random chat

Be LENIENT with greetings and ambiguous questions.
Be STRICT with obvious off-topic requests.`,
      model: options.model,
    });
  }

  async processInput(args: {
    messages: MastraDBMessage[];
    abort: (reason: string) => void;
  }) {
    const { messages, abort } = args;
    const lastMessage = messages[messages.length - 1];
    if (!lastMessage || lastMessage.role !== "user") return messages;

    const result = await this.classifierAgent.generate(
      `Classify: "${lastMessage.content}"`,
      {
        structuredOutput: {
          schema: z.object({
            isOnTopic: z.boolean(),
            confidence: z.number(),
            reason: z.string(),
          }),
        },
      }
    );

    if (!result.object.isOnTopic && result.object.confidence >= this.threshold) {
      abort(`off-topic: ${result.object.reason}`);
    }

    return messages;
  }
}
Add it to your agent's input processors:
inputProcessors: [
  new PromptInjectionDetector({ /* ... */ }),
  new ModerationProcessor({ /* ... */ }),
  new TopicGuardProcessor({
    model: "openai/gpt-4o-mini",
    threshold: 0.7,
  }),
],
Now "help me with my math homework" gets blocked with a polite redirect, not a $0.01 response explaining why I cant help.
Layer 7: System Prompt Guardrails
Last line of defense. Tell the AI what it's NOT supposed to do.
const instructions = `
## GUARDRAILS - What You Will NOT Do
**Stay on topic. You ONLY discuss:**
- My skills, experience, and work history
- My blog posts and technical knowledge
- My side projects
**Politely decline if asked to:**
- Write code, essays, or content unrelated to me
- Answer general knowledge questions
- Help with homework or assignments
- Roleplay as someone else
**Response to off-topic requests:**
"Hey, Im just here to chat about Tawan and his work.
If you wanna know about my skills or projects - ask away!"
`;
Even if someone bypasses all other layers, the AI itself refuses to cooperate.
Putting It All Together
In the API route:
import { isUserBlocked, incrementViolations, blockUser, MAX_VIOLATIONS } from "../lib/redis";

export async function POST(req: Request) {
  const ip = getClientIP(req);

  // Layer 2: Check if blocked (Redis - persists across function instances)
  const blocked = await isUserBlocked(ip);
  if (blocked) {
    return Response.json({ error: "Temporarily blocked" }, { status: 403 });
  }

  // Layer 1: Rate limit (in-memory is fine here - it's per-request)
  if (!checkRateLimit(ip)) {
    return Response.json({ error: "Too many messages" }, { status: 429 });
  }

  // Layer 3: Validate input
  const params = await req.json();
  params.messages = validateInput(params.messages);

  try {
    // Layers 4-7 happen inside Mastra's inputProcessors
    const stream = await handleChatStream({
      mastra,
      agentId: "myAgent",
      params,
    });
    return createUIMessageStreamResponse({ stream });
  } catch (error) {
    // Guardrail triggered - record violation to Redis
    if (isGuardrailError(error)) {
      const violations = await incrementViolations(ip);
      const warningsLeft = MAX_VIOLATIONS - violations;

      if (warningsLeft <= 0) {
        await blockUser(ip);
      }

      return Response.json({
        error: "policy_violation",
        message: warningsLeft > 0
          ? `Message flagged. ${warningsLeft} warning(s) left.`
          : "You've been blocked for 24 hours.",
        blocked: warningsLeft <= 0,
      }, { status: 400 });
    }
    throw error;
  }
}
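Two helpers in that route aren't shown: getClientIP and isGuardrailError. Roughly, they look like this - a sketch, and the error check in particular is an assumption, since the exact error shape depends on how your Mastra version surfaces aborted processors:

// Sketch of the helpers the route assumes.
// getClientIP reads the standard proxy header Vercel sets on incoming requests.
function getClientIP(req: Request): string {
  const forwarded = req.headers.get("x-forwarded-for");
  return forwarded?.split(",")[0].trim() ?? "unknown";
}

// isGuardrailError is a guess: match on the abort/tripwire message text.
// Adjust the check to whatever your processors actually throw.
function isGuardrailError(error: unknown): boolean {
  const message = error instanceof Error ? error.message : String(error);
  return /tripwire|off-topic|flagged|blocked/i.test(message);
}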
Cost vs Protection Tradeoff
Free layers catch most abuse. Rate limiting alone stops 90% of spam. Redis costs pennies for violation tracking.
Paid layers (PromptInjectionDetector, ModerationProcessor, TopicGuard) add ~$0.001-0.01 per message. That's three LLM calls before your main response - but they use gpt-4o-mini, which is cheap.
The system prompt is basically free - it's part of your main LLM call anyway.
Results
Since adding these guardrails:
- Zero successful jailbreak attempts
- Spam attempts blocked at rate limit layer
- Off-topic requests politely declined
- API costs predictable and controlled
The key is layering. Each layer catches what the previous one missed. And the cheap layers run first, so you don't waste money on obvious spam.
Further Reading
- Building an AI Chat That Knows Me - The full chat implementation
- Mastra Guardrails Docs - Official documentation
- OWASP LLM Top 10 - Security risks for LLM apps
