Built an AI chat for my portfolio. Cool. Then I realized - what's stopping someone from using it to write their essays, spamming it with garbage, or trying to jailbreak it into saying weird stuff?
Nothing. Unless you add guardrails.
Here's how I protected my chat from abuse without killing the user experience.
The Problem
AI chats are expensive to run. Every message costs tokens. Bad actors can:
- Spam - Send hundreds of messages to run up your bill
- Jailbreak - Try to bypass your instructions and misuse the AI
- Off-topic abuse - Use your chat for homework, coding, or whatever
- Harmful content - Try to get the AI to generate toxic stuff
Layered Defense
One layer isn't enough. You need multiple:
Layer 1: Rate Limiting
Cheapest defense. No LLM calls needed.
// Simple in-memory rate limiter
const rateLimitMap = new Map<string, { count: number; resetTime: number }>();

const RATE_LIMIT_WINDOW = 60 * 1000; // 1 minute
const MAX_REQUESTS = 10; // 10 messages per minute

function checkRateLimit(ip: string): boolean {
  const now = Date.now();
  const record = rateLimitMap.get(ip);

  if (!record || record.resetTime < now) {
    rateLimitMap.set(ip, { count: 1, resetTime: now + RATE_LIMIT_WINDOW });
    return true; // allowed
  }

  if (record.count >= MAX_REQUESTS) {
    return false; // blocked
  }

  record.count++;
  return true;
}
10 messages per minute is plenty for legitimate users. Spammers hit the wall fast.
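One caveat: the Map above never evicts old entries, so on a long-lived server it grows forever. If that worries you, a periodic sweep keeps it bounded - a rough sketch, with the interval picked arbitrarily (not from my actual setup):

// Optional cleanup (sketch): drop entries whose window has already expired.
// On serverless this rarely matters, since instances are short-lived anyway.
setInterval(() => {
  const now = Date.now();
  for (const [ip, record] of rateLimitMap) {
    if (record.resetTime < now) rateLimitMap.delete(ip);
  }
}, RATE_LIMIT_WINDOW);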
Layer 2: Violation Tracking with Redis
Rate limiting stops volume. But what about someone sending 9 jailbreak attempts per minute?
Track violations. Block repeat offenders. And here's the catch - in-memory tracking doesn't work on serverless.
Vercel spins up new function instances constantly. In-memory Maps reset. Your "blocked" users just refresh and they're back.
I use Upstash Redis for persistent violation tracking:
// app/lib/redis.ts
import { Redis } from "@upstash/redis";

export const redis = new Redis({
  url: process.env.UPSTASH_REDIS_URL!,
  token: process.env.UPSTASH_REDIS_TOKEN!,
});

const VIOLATION_PREFIX = "chat:violations:";
const BLOCK_PREFIX = "chat:blocked:";

// Exported so the API route can import it alongside the functions below
export const MAX_VIOLATIONS = 2;
const BLOCK_DURATION = 24 * 60 * 60; // 24 hours in seconds

export async function incrementViolations(ip: string): Promise<number> {
  const key = `${VIOLATION_PREFIX}${ip}`;
  const newCount = await redis.incr(key);
  if (newCount === 1) {
    await redis.expire(key, BLOCK_DURATION);
  }
  return newCount;
}

export async function isUserBlocked(ip: string): Promise<boolean> {
  const blocked = await redis.get<string>(`${BLOCK_PREFIX}${ip}`);
  return blocked === "true";
}

export async function blockUser(ip: string): Promise<void> {
  await redis.set(`${BLOCK_PREFIX}${ip}`, "true", { ex: BLOCK_DURATION });
}
Two strikes and you're blocked for 24 hours. Persists across function restarts, deployments, everything. The block follows you, not the server instance.
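If you'd rather keep the strike logic next to the Redis code instead of in the API route, the two functions above fold into one helper. A sketch - the recordViolation name is mine, not part of the actual module:

// Sketch: record a violation and block once MAX_VIOLATIONS is reached.
// Wraps incrementViolations and blockUser from above; the name is hypothetical.
export async function recordViolation(
  ip: string
): Promise<{ blocked: boolean; warningsLeft: number }> {
  const violations = await incrementViolations(ip);
  const warningsLeft = Math.max(0, MAX_VIOLATIONS - violations);
  if (warningsLeft === 0) {
    await blockUser(ip);
  }
  return { blocked: warningsLeft === 0, warningsLeft };
}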
Layer 3: Input Validation
Don't send garbage to your LLM. Validate first.
const MAX_MESSAGE_LENGTH = 500;

function validateInput(messages: Message[]) {
  return messages.map(msg => ({
    ...msg,
    content: msg.content.slice(0, MAX_MESSAGE_LENGTH),
  }));
}
500 characters is enough for a real question. Not enough for someone trying to paste their entire codebase.
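You can go stricter and also cap how much history gets sent per request, dropping empty messages while you're at it. A sketch, with the limits picked arbitrarily:

// Stricter variant (sketch): cap history length and drop empty messages.
const MAX_MESSAGES = 20; // arbitrary cap, not from my actual config

function validateInputStrict(messages: Message[]): Message[] {
  return messages
    .slice(-MAX_MESSAGES) // keep only the most recent messages
    .filter(msg => typeof msg.content === "string" && msg.content.trim().length > 0)
    .map(msg => ({ ...msg, content: msg.content.slice(0, MAX_MESSAGE_LENGTH) }));
}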
Layer 4: Prompt Injection Detection
This is where Mastra shines. Its built-in PromptInjectionDetector catches:
- Direct injection attempts ("ignore previous instructions...")
- Jailbreak patterns ("pretend you're DAN...")
- System override attempts ("your new instructions are...")
import { PromptInjectionDetector } from "@mastra/core/processors";

const agent = new Agent({
  // ... config
  inputProcessors: [
    new PromptInjectionDetector({
      model: "openai/gpt-4o-mini",
      threshold: 0.7,
      strategy: "block",
      detectionTypes: ["injection", "jailbreak", "system-override"],
    }),
  ],
});
Uses an LLM to detect attacks. Costs a bit per message but catches stuff regex never would.
Layer 5: Content Moderation
Block harmful content before it reaches your main LLM.
import { ModerationProcessor } from "@mastra/core/processors";

const agent = new Agent({
  // ... config
  inputProcessors: [
    new ModerationProcessor({
      model: "openai/gpt-4o-mini",
      threshold: 0.7,
      strategy: "block",
      categories: ["hate", "harassment", "violence", "sexual"],
    }),
  ],
});
Someone sends hate speech? Blocked before it costs you tokens on your main model.
Layer 6: Topic Guard (Custom Processor)
Mastra's built-in processors are great, but I needed something custom: block off-topic messages before they waste tokens on my main LLM.
Someone asks "what's the capital of France?" - that's not about me or my work. Why should I pay to answer it?
// Custom TopicGuard processor
// Assumes Agent and z are imported; exact paths may differ by Mastra version.
import { Agent } from "@mastra/core/agent";
import { z } from "zod";

class TopicGuardProcessor {
  readonly id = "topic-guard" as const;
  private classifierAgent: Agent;
  private threshold: number;

  constructor(options: { model: string; threshold?: number }) {
    this.threshold = options.threshold ?? 0.6;
    this.classifierAgent = new Agent({
      id: "topic-guard-classifier",
      instructions: `You are a topic classifier. Determine if a message is on-topic
for a personal portfolio website chat.

ON-TOPIC: Questions about the owner's skills, experience, projects, blog posts
OFF-TOPIC: General knowledge, homework help, coding requests, random chat

Be LENIENT with greetings and ambiguous questions.
Be STRICT with obvious off-topic requests.`,
      model: options.model,
    });
  }

  async processInput(args: {
    messages: MastraDBMessage[];
    abort: (reason: string) => void;
  }) {
    const { messages, abort } = args;
    const lastMessage = messages[messages.length - 1];
    if (!lastMessage || lastMessage.role !== "user") return messages;

    const result = await this.classifierAgent.generate(
      `Classify: "${lastMessage.content}"`,
      {
        structuredOutput: {
          schema: z.object({
            isOnTopic: z.boolean(),
            confidence: z.number(),
            reason: z.string(),
          }),
        },
      }
    );

    if (!result.object.isOnTopic && result.object.confidence >= this.threshold) {
      abort(`off-topic: ${result.object.reason}`);
    }

    return messages;
  }
}
Add it to your agent's input processors:
inputProcessors: [
  new PromptInjectionDetector({ /* ... */ }),
  new ModerationProcessor({ /* ... */ }),
  new TopicGuardProcessor({
    model: "openai/gpt-4o-mini",
    threshold: 0.7,
  }),
],
Now "help me with my math homework" gets blocked with a polite redirect, not a $0.01 response explaining why I cant help.
Layer 7: System Prompt Guardrails
Last line of defense. Tell the AI what it's NOT supposed to do.
const instructions = `
## GUARDRAILS - What You Will NOT Do
**Stay on topic. You ONLY discuss:**
- My skills, experience, and work history
- My blog posts and technical knowledge
- My side projects
**Politely decline if asked to:**
- Write code, essays, or content unrelated to me
- Answer general knowledge questions
- Help with homework or assignments
- Roleplay as someone else
**Response to off-topic requests:**
"Hey, Im just here to chat about Tawan and his work.
If you wanna know about my skills or projects - ask away!"
`;
Even if someone bypasses all other layers, the AI itself refuses to cooperate.
Putting It All Together
In the API route:
import { isUserBlocked, incrementViolations, blockUser, MAX_VIOLATIONS } from "../lib/redis";

export async function POST(req: Request) {
  const ip = getClientIP(req);

  // Layer 2: Check if blocked (Redis - persists across function instances)
  const blocked = await isUserBlocked(ip);
  if (blocked) {
    return Response.json({ error: "Temporarily blocked" }, { status: 403 });
  }

  // Layer 1: Rate limit (in-memory is fine here - it's per-request)
  if (!checkRateLimit(ip)) {
    return Response.json({ error: "Too many messages" }, { status: 429 });
  }

  // Layer 3: Validate input
  const params = await req.json();
  params.messages = validateInput(params.messages);

  try {
    // Layers 4-7 happen inside Mastra's inputProcessors
    const stream = await handleChatStream({
      mastra,
      agentId: "myAgent",
      params,
    });
    return createUIMessageStreamResponse({ stream });
  } catch (error) {
    // Guardrail triggered - record violation to Redis
    if (isGuardrailError(error)) {
      const violations = await incrementViolations(ip);
      const warningsLeft = MAX_VIOLATIONS - violations;

      if (warningsLeft <= 0) {
        await blockUser(ip);
      }

      return Response.json({
        error: "policy_violation",
        message: warningsLeft > 0
          ? `Message flagged. ${warningsLeft} warning(s) left.`
          : "You've been blocked for 24 hours.",
        blocked: warningsLeft <= 0,
      }, { status: 400 });
    }
    throw error;
  }
}
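Two helpers in that route aren't shown: getClientIP and isGuardrailError. Roughly, they look like this - a sketch, and the error check in particular is an assumption, since the exact error shape depends on how your Mastra version surfaces aborted processors:

// Sketch of the helpers the route assumes.
// getClientIP reads the standard proxy header Vercel sets on incoming requests.
function getClientIP(req: Request): string {
  const forwarded = req.headers.get("x-forwarded-for");
  return forwarded?.split(",")[0].trim() ?? "unknown";
}

// isGuardrailError is a guess: match on the abort/tripwire message text.
// Adjust the check to whatever your processors actually throw.
function isGuardrailError(error: unknown): boolean {
  const message = error instanceof Error ? error.message : String(error);
  return /tripwire|off-topic|flagged|blocked/i.test(message);
}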
Cost vs Protection Tradeoff
Free layers catch most abuse. Rate limiting alone stops 90% of spam. Redis costs pennies for violation tracking.
Paid layers (PromptInjectionDetector, ModerationProcessor, TopicGuard) add ~$0.001-0.01 per message. That's three LLM calls before your main response - but they use gpt-4o-mini, which is cheap.
The system prompt is basically free - it's part of your main LLM call anyway.
Results
Since adding these guardrails:
- Zero successful jailbreak attempts
- Spam attempts blocked at rate limit layer
- Off-topic requests politely declined
- API costs predictable and controlled
The key is layering. Each layer catches what the previous one missed. And the cheap layers run first, so you don't waste money on obvious spam.
Further Reading
- Building an AI Chat That Knows Me - The full chat implementation
- Mastra Guardrails Docs - Official documentation
- OWASP LLM Top 10 - Security risks for LLM apps
