I wanted visitors to my portfolio to ask questions and get real answers - not generic AI fluff, but actual info about my skills, experience, and what Ive written about. So I built an AI chat that uses my blog posts as its knowledge base.
Heres how it works and why I built it this way.
The Problem
Most portfolio sites are static. You scroll, you read, you leave. If someone wants to know "does this person know about microservices?" they gotta dig through blog posts manually.
I wanted something better - an AI that actually knows me. Not pretending to know me. Actually grounded in my content.
The Thinking Behind It
Before writing any code, I had to make some decisions. Let me walk you through my thought process.
Why RAG Instead of Fine-Tuning?
I considered three approaches:
Fine-tuning means training a model on my content. Sounds cool but its expensive, takes time, and every time I write a new blog post Id need to retrain. Not practical for a portfolio.
Prompt stuffing means shoving all my content into the system prompt. Problem is, I have 35 blog posts. Thats way more than fits in a context window, and even if it did, Id be paying for all those tokens on every single message.
RAG is the sweet spot. I embed my content once, store it in a vector database, and only retrieve whats relevant to each question. New blog post? Just re-run the embedding script. Takes 30 seconds.
Why Upstash Vector?
I started with LibSQL (SQLite with vector extensions) for local development. Zero cost, no external services, just a file on disk. Perfect for development.
Then I deployed to Vercel.
Vercel's serverless functions run on a read-only filesystem. You cant create directories or write files. My local SQLite approach? Dead on arrival.
I already use Upstash Redis for rate limiting and violation tracking, so Upstash Vector was a natural choice. Same dashboard, same billing, works perfectly in serverless.
The trade-off? Network latency for vector queries. But for a portfolio chat, the difference is imperceptible - were talking milliseconds.
The Context Window Problem
Heres something that tripped me up initially. LLMs have limited context windows. Even with 128k tokens, you cant just dump everything in.
With RAG, I only pull in whats relevant. Someone asks about WebSockets? They get my WebSocket content, not my AWS cost optimization post. This keeps responses focused and costs low.
Chunking Strategy
I chunk blog posts into ~1000 character segments with 200 character overlap. Why?
Too-small chunks = lost context. A sentence about "use Redis for this" doesnt help without knowing what "this" is.
Too-large chunks = wasted tokens. Pull in a 5000 character chunk when you only need one paragraph? Youre paying for irrelevant text.
Overlap = continuity. If an important concept spans two chunks, the overlap ensures both chunks capture it.
await doc.chunkRecursive({
  maxSize: 1000, // ~250 words per chunk
  overlap: 200, // 20% overlap for continuity
});
35 posts became 344 chunks. Each chunk has metadata linking back to its source post, so the AI can cite where it got the info.
Memory and Conversation Context
This is where it gets interesting. Right now, the chat has no persistent memory. Each conversation starts fresh.
Why No Persistent Memory (Yet)?
I made a deliberate choice here:
- Privacy - I dont want to store visitor conversations. No database of "what did people ask about me"
- Simplicity - Memory adds complexity. Storage, retrieval, context management
- Use case - Portfolio chat is typically short. Ask a few questions, get answers, done
Conversation Context vs Long-Term Memory
Theres a difference between:
- Conversation context - What we talked about in this session. The AI SDK handles this automatically via message history
- Long-term memory - Remembering you across sessions. "Hey, you asked about microservices last week"
I have conversation context (the AI remembers earlier messages in the same chat) but no long-term memory. For a portfolio, this makes sense.
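Concretely, "conversation context" just means the client sends the running message history with every request - thats all the memory the model gets. A rough sketch of the payload (the exact shape depends on your AI SDK version):
// Every POST to /api/chat carries the full history for this session - nothing is stored server-side
const payload = {
  messages: [
    { role: "user", content: "Do you know WebSockets?" },
    { role: "assistant", content: "Yeah - Ive written about scaling them with Redis pub/sub..." },
    { role: "user", content: "How would you scale that to 10k clients?" }, // the model sees the earlier turns too
  ],
};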
Token Management & Message Compaction
Each message in the conversation adds to the context. GPT-4o-mini has a 128k context window, but you dont want every chat eating through thousands of tokens.
I implemented message compaction on both client and server:
Client-side - Shows a notice when older messages are trimmed:
// Keep ~10 exchanges visible (20 messages)
const MAX_VISIBLE_MESSAGES = 20;

if (apiMessages.length >= COMPACTION_THRESHOLD) {
  const recentMessages = apiMessages.slice(-MAX_VISIBLE_MESSAGES);
  setLocalMessages([WELCOME_MESSAGE, COMPACTION_NOTICE, ...recentMessages]);
}
Server-side - Adds context when trimming for the LLM:
// app/api/chat/route.ts
function compactMessages(messages) {
  if (messages.length <= MAX_MESSAGES_TO_LLM) return messages;

  const recentMessages = messages.slice(-MAX_MESSAGES_TO_LLM);
  const contextSummary = {
    role: "system",
    content: "Note: This is a continued conversation. Earlier messages trimmed.",
  };

  return [contextSummary, ...recentMessages];
}
This keeps costs predictable and prevents the "context window exceeded" error on long conversations. Most portfolio chats are 3-5 messages anyway, but the compaction is there for power users who really dig in.
The Architecture
The secret sauce is RAG - Retrieval Augmented Generation. Instead of hoping the AI magically knows stuff, you give it a knowledge base to search.
My knowledge base? My blog posts. Every post I write becomes searchable knowledge about my skills.
Tech Stack
Heres what powers the chat:
- AI Framework: Mastra - Agent framework with built-in RAG tools
- Frontend: AI SDK React useChat hook - Streaming responses, easy state management
- Vector Store: Upstash Vector - Serverless vector DB, works on Vercel
- Embeddings: OpenAI text-embedding-3-small - Good quality, cheap, fast
- LLM: GPT-4o-mini - Fast responses, good enough for chat
- UI: React + Tailwind - iOS-style slide-up chat panel
- Rate Limiting: Upstash Redis - Violation tracking and blocking
The Mastra Setup
Mastra made this surprisingly easy. Its an AI agent framework that handles a lot of the RAG plumbing for you.
First, I set up the vector store:
// src/mastra/rag/vector-store.ts
import { UpstashVector } from "@mastra/upstash";
export const vectorStore = new UpstashVector({
  id: "blog-vector-store",
  url: process.env.UPSTASH_VECTOR_URL!,
  token: process.env.UPSTASH_VECTOR_TOKEN!,
});
export const BLOG_INDEX_NAME = "blog_posts";
export const EMBEDDING_DIMENSION = 1536;
Upstash Vector is serverless-native. Works on Vercel, Cloudflare Workers, anywhere. No filesystem needed, just environment variables.
Embedding My Blog Posts
Every blog post gets chunked and embedded:
// src/mastra/rag/embed-blogs.ts
import { MDocument } from "@mastra/rag";
import { embedMany } from "ai";
import { openai } from "@ai-sdk/openai";
export async function embedBlogPosts() {
  const posts = getAllBlogPosts(); // 35 posts
  const allChunks = [];

  for (const post of posts) {
    const doc = MDocument.fromMarkdown(
      `# ${post.title}\n\n${post.summary}\n\n${post.content}`
    );

    await doc.chunkRecursive({
      maxSize: 1000,
      overlap: 200,
    });

    const chunks = doc.getDocs();
    // Add metadata: slug, title, tags, url
    allChunks.push(...chunks);
  }

  // Embed and store
  const { embeddings } = await embedMany({
    model: openai.embedding("text-embedding-3-small"),
    values: allChunks.map((c) => c.text),
  });

  await vectorStore.upsert({
    indexName: BLOG_INDEX_NAME,
    vectors: embeddings,
    metadata: allChunks.map((c) => c.metadata),
  });
}
35 blog posts become 344 searchable chunks. Each chunk knows which post it came from, so the AI can cite sources.
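For illustration, the metadata on a single chunk looks roughly like this (field names are indicative, not the exact schema):
// Rough shape of one chunk's metadata - enough for the agent to cite its source
const exampleChunkMetadata = {
  text: "WebSockets keep a persistent connection open, so...",
  slug: "scaling-websockets",
  title: "Scaling WebSockets with Redis",
  tags: ["websockets", "redis"],
  url: "/blog/scaling-websockets",
};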
The Search Tool
Mastra has a createVectorQueryTool that handles semantic search:
// src/mastra/tools/blog-search.ts
import { createVectorQueryTool } from "@mastra/rag";
import { openai } from "@ai-sdk/openai";
export const searchBlogTool = createVectorQueryTool({
  id: "search-blog",
  description: `Search through Tawans blog posts to find relevant
    information about his skills and knowledge.`,
  vectorStore,
  indexName: BLOG_INDEX_NAME,
  model: openai.embedding("text-embedding-3-small"),
  includeSources: true,
});
When someone asks "do you know about WebSockets?", the tool:
- Embeds the question
- Finds similar chunks in the vector store
- Returns relevant content with source URLs
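Under the hood its roughly equivalent to this sketch - written against the raw @upstash/vector client so you can see the steps, not Mastras actual internals (the helper name is made up, and it skips how Mastra maps the index name to a namespace):
import { Index } from "@upstash/vector";
import { embed } from "ai";
import { openai } from "@ai-sdk/openai";

const index = new Index({
  url: process.env.UPSTASH_VECTOR_URL!,
  token: process.env.UPSTASH_VECTOR_TOKEN!,
});

export async function searchBlog(question: string) {
  // 1. Embed the question with the same model used for the chunks
  const { embedding } = await embed({
    model: openai.embedding("text-embedding-3-small"),
    value: question,
  });

  // 2. Find the closest chunks in the vector store
  const results = await index.query({
    vector: embedding,
    topK: 5,
    includeMetadata: true,
  });

  // 3. Return the chunk text plus source info so the agent can cite it
  return results.map((r) => r.metadata);
}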
Making the AI Sound Like Me
The agent needs to respond AS me, not ABOUT me. Heres the key part of the system prompt:
export const tawanAgent = new Agent({
  id: "tawan-agent",
  instructions: `You ARE Tawan - responding as him on his website.
    Talk in first person.

    Voice & Tone:
    - Casual, friendly, direct. Like chatting with a friend.
    - Use contractions without apostrophes: "thats", "dont", "Ive"
    - Short sentences. Get to the point.
    - Say things like "honestly", "pretty much", "the thing is"

    When asked about skills/knowledge:
    - ALWAYS search the blog first using search-blog tool
    - Your blog posts ARE your documented expertise
    - If you havent written about something, be honest about it

    Remember: Youre Tawan. Thai developer in Sydney. Self-taught.
    10+ years experience. Love building things.`,
  model: "openai/gpt-4o-mini",
  tools: { searchBlogTool, listBlogTopicsTool },
});
The AI has my full work history, my personality quirks, and instructions to search my blog before answering skill questions.
Prompt Design Deep Dive
Getting the AI to sound like me - not just answer as me - took some iteration. Heres how I structured the prompt.
Identity vs Instructions vs Constraints
I split the system prompt into three parts:
Identity Block - The foundation. This is who the AI IS:
- My background (Thai, self-taught, moved to Australia)
- Full work history with companies, dates, and what I built
- Hackathon wins and side projects
- Contact info and social links
Instructions Block - How to behave:
- Voice and tone guidelines
- When and how to use tools
- How to handle different question types
Constraints Block - Guardrails:
- Always search blog before answering skill questions
- Be honest if I havent written about something
- Dont make up experience I dont have
The Voice Problem
First attempt? Generic AI speak. "I would be happy to assist you with..." Nope.
I needed specific patterns:
// What I told the AI about my voice
`Voice & Tone:
- Casual, friendly, direct. Like chatting with a friend.
- Use contractions without apostrophes: "thats", "dont", "Ive"
- Short sentences. Get to the point.
- Say things like "honestly", "pretty much", "the thing is"`
The apostrophe thing is deliberate - thats how I actually type. I dont use apostrophes in casual writing. Its a small detail but it makes responses feel authentically me.
Tool-First Approach
Heres the key insight: I made the AI search my blog BEFORE answering skill questions.
Why? Two reasons:
- Grounding - The AI cant hallucinate about my experience because it has to cite real content
- Consistency - My blog is my documented expertise. The AI references what I actually wrote
`When asked about skills/knowledge:
- ALWAYS search the blog first using search-blog tool
- Your blog posts ARE your documented expertise
- If you havent written about something, be honest about it`
Handling Edge Cases
What happens when someone asks about something I havent blogged about? I gave explicit instructions:
`If you havent written about something, be honest:
"I havent blogged about that specifically, but..."
Then share what you know from your experience.`
This prevents the AI from making stuff up while still being helpful.
The Full Prompt Structure
const systemPrompt = `
You ARE Tawan - responding as him on his website.
${IDENTITY_BLOCK} // Who I am, history, personality
${INSTRUCTIONS_BLOCK} // How to respond, tools to use
${CONSTRAINTS_BLOCK} // What not to do, edge cases
Remember: Youre Tawan. Thai developer in Sydney.
Self-taught. 10+ years experience. Love building things.
`;
The result? Visitors chat with "me" - not an AI pretending to be me. The voice is right, the facts are grounded, and it knows when to say "I dont know."
The Chat UI
iOS-style slide-up panel. Nothing fancy, just clean.
Built with @ai-sdk/react, which handles all the streaming complexity:
import { useChat } from "@ai-sdk/react";
import { DefaultChatTransport } from "ai";

const { messages, sendMessage, status } = useChat({
  transport: new DefaultChatTransport({
    api: "/api/chat",
  }),
});

const isLoading = status === "streaming" || status === "submitted";
The API route is dead simple thanks to Mastra:
// app/api/chat/route.ts
import { handleChatStream } from "@mastra/ai-sdk";
import { createUIMessageStreamResponse } from "ai";
export async function POST(req: Request) {
  const params = await req.json();

  const stream = await handleChatStream({
    mastra,
    agentId: "tawanAgent",
    params,
  });

  return createUIMessageStreamResponse({ stream });
}
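The mastra instance the route imports is just the standard registration - roughly this (the file path is illustrative):
// src/mastra/index.ts (illustrative path) - registers the agent the route looks up by id
import { Mastra } from "@mastra/core";
import { tawanAgent } from "./agents/tawan-agent";

export const mastra = new Mastra({
  agents: { tawanAgent },
});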
Why This Approach?
Blog posts = Skills documentation. Every time I write about microservices, WebSockets, or AI agents, that knowledge becomes searchable. The AI can answer "do you know X?" by actually finding my content about X.
No hallucinations about my experience. The AI searches real content instead of making stuff up. If I havent written about something, it says so.
Scales automatically. Write more posts, run the embedding script, done. No manual updates to some skills list.
Sounds like me. Not generic AI speak. Casual, direct, how I actually write.
Cost effective. Upstash Vector has a generous free tier. GPT-4o-mini is cheap. Embeddings are one-time cost. Scales with usage.
Running It Yourself
If you want something similar:
# Install deps
npm install @mastra/core @mastra/rag @mastra/upstash @ai-sdk/react ai
# Set up Upstash Vector (free tier available)
# Get your URL and token from console.upstash.com
# Add to .env:
# UPSTASH_VECTOR_URL=https://xxx.upstash.io
# UPSTASH_VECTOR_TOKEN=xxx
# Embed your content
npm run embed-blogs
# Start dev server
npm run dev
The embedding script runs once at build time. Takes about 30 seconds for 35 posts. Re-run it whenever you add new content.
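The script itself is just a thin entrypoint around embedBlogPosts() - something like this (exact wiring depends on your project):
// scripts/embed.ts (illustrative) - wired up as the "embed-blogs" npm script
import { embedBlogPosts } from "../src/mastra/rag/embed-blogs";

embedBlogPosts()
  .then(() => console.log("Embedded all blog posts"))
  .catch((err) => {
    console.error("Embedding failed:", err);
    process.exit(1);
  });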
Gotchas
Serverless = No local files. If you deploy to Vercel, Cloudflare, or any serverless platform, you cant use file-based vector stores like LibSQL. Use Upstash Vector, Pinecone, or similar cloud solutions.
Re-embed when content changes. Add a new blog post? Run the embedding script again. I might automate this in CI later.
Environment variables. Make sure UPSTASH_VECTOR_URL and UPSTASH_VECTOR_TOKEN are set in your deployment environment, not just locally.
AI SDK v5 breaking changes. If youre upgrading from v4, the useChat API changed significantly. sendMessage({ text }) instead of the old handleSubmit.
Streaming + loading states. Dont show loading dots while streaming. Check status === "submitted" for waiting, status === "streaming" for receiving.
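A minimal sketch of that distinction (the component name is made up):
function ChatStatusIndicator({ status }: { status: "submitted" | "streaming" | "ready" | "error" }) {
  // "submitted" = request sent, no tokens yet -> show typing dots
  // "streaming" = tokens arriving -> the partial message already renders, so no dots needed
  if (status === "submitted") return <span className="typing-dots">...</span>;
  return null;
}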
Guardrails & Security
You cant just let people use your AI chat for whatever they want. Someone will try to jailbreak it, ask it to write their homework, or just spam nonsense to run up your API bill.
I wrote a dedicated post about AI guardrails but heres the quick version of what I implemented:
Rate Limiting - 10 messages per minute per IP. Simple but effective.
Violation Tracking - Get blocked by guardrails twice? Youre blocked for 24 hours.
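Roughly how those two fit together with Upstash Redis (a sketch with made-up helper and key names - the real version is in the guardrails post):
import { Redis } from "@upstash/redis";
import { Ratelimit } from "@upstash/ratelimit";

const redis = Redis.fromEnv();

// 10 messages per minute per IP
const ratelimit = new Ratelimit({
  redis,
  limiter: Ratelimit.slidingWindow(10, "1 m"),
});

export async function checkRateLimit(ip: string) {
  const { success } = await ratelimit.limit(ip);
  return success; // false -> reject the request
}

// Second guardrail violation within 24 hours -> block for 24 hours
export async function recordViolation(ip: string) {
  const count = await redis.incr(`violations:${ip}`);
  await redis.expire(`violations:${ip}`, 60 * 60 * 24);
  if (count >= 2) {
    await redis.set(`blocked:${ip}`, "1", { ex: 60 * 60 * 24 });
  }
}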
LLM-Powered Detection - Mastra has built-in processors:
inputProcessors: [
  new PromptInjectionDetector({
    model: "openai/gpt-4o-mini",
    strategy: "block",
    detectionTypes: ["injection", "jailbreak", "system-override"],
  }),
  new ModerationProcessor({
    model: "openai/gpt-4o-mini",
    strategy: "block",
    categories: ["hate", "harassment", "violence"],
  }),
],
System Prompt Guardrails - The AI only discusses my portfolio. Ask it to write code or do homework? Polite decline.
Whats Next
- Auto-embed on new blog posts via GitHub Action
- Add conversation memory so it remembers context across sessions
- Maybe voice input? Would be cool to literally talk to my portfolio
- Potentially add semantic memory using Mastras memory features
Further Reading
- Mastra Documentation - The AI framework I used
- Vercel AI SDK - React hooks for AI chat
- RAG Systems in Production - My deep dive on RAG patterns
