AI code review is either incredibly useful or incredibly annoying. There's no middle ground.
The annoying version nitpicks variable names, suggests unnecessary refactors, and adds noise to every PR. The useful version catches bugs before they hit production, spots security issues humans miss, and saves senior developers hours of review time.
I've built both. Here's how to build the useful one.
The Problem with Naive AI Review
The obvious approach doesn't work:
// Don't do this
const review = await llm.complete(`
Review this code and provide feedback:
${diff}
`);
This produces:
- Style suggestions nobody asked for
- False positives everywhere
- Generic advice that applies to any code
- No understanding of project context
The model sees code in isolation. It doesn't know your conventions, your architecture, or what the PR is actually trying to accomplish.
Architecture of Useful AI Review
A good system has multiple components working together:
- Context gathering - the diff, full files, linked issues, and project conventions
- Multi-pass analysis - separate focused passes for security, bugs, and performance
- Filtering and ranking - deduplicate, drop low-confidence findings, cap the volume
- Comment posting - actionable feedback with a way to flag false positives
Each step builds on the previous. You can't just throw a diff at an LLM and expect magic.
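At a high level, the wiring is easy to sketch. This isn't the full implementation, just the entry point that ties the steps together (the same runFullReview the CI check calls later):
async function runFullReview(pr: PullRequest): Promise<Issue[]> {
  // Step 1: gather the diff, full files, intent, and project conventions
  const context = await gatherContext(pr);

  // Step 2: run the focused passes in parallel; each returns { issues: [...] }
  const [security, bugs, perf] = await Promise.all([
    securityReview(context),
    bugDetection(context),
    performanceReview(context)
  ]);

  // Step 3: dedupe, drop low-confidence findings, cap the volume
  return filterAndRank([...security.issues, ...bugs.issues, ...perf.issues]);
}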
Step 1: Gather Rich Context
The diff alone is not enough. You need the full picture:
async function gatherContext(pr: PullRequest) {
const [
diff,
prDescription,
linkedIssues,
changedFiles,
projectConfig
] = await Promise.all([
github.getPRDiff(pr.id),
github.getPRDescription(pr.id),
github.getLinkedIssues(pr.id),
github.getChangedFilesWithContent(pr.id),
loadProjectConfig(pr.repo)
]);
// Get full file content, not just changed lines
const filesWithContext = await Promise.all(
changedFiles.map(async (file) => ({
path: file.path,
diff: file.diff,
fullContent: await github.getFileContent(pr.repo, file.path, pr.head),
// Also get related files (imports, tests)
relatedFiles: await findRelatedFiles(file.path, pr.repo)
}))
);
return {
diff,
description: prDescription,
intent: await extractIntent(prDescription, linkedIssues),
files: filesWithContext,
conventions: projectConfig.codeConventions,
ignorePatterns: projectConfig.reviewIgnore
};
}
The key insight here? Get the full file content, not just the diff. The model needs surrounding context to understand what the code is actually doing.
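One helper above, findRelatedFiles, is assumed rather than shown. A minimal sketch, assuming co-located *.test.ts files and the same github wrapper used in gatherContext:
import { posix } from 'node:path';

async function findRelatedFiles(filePath: string, repo: string): Promise<string[]> {
  const related: string[] = [];

  // A sibling test file often documents the expected behavior
  related.push(filePath.replace(/\.tsx?$/, '.test.ts'));

  // Relative imports give the model the callee side of the change
  // ('HEAD' is a stand-in here; in practice pass the PR head ref)
  const content = await github.getFileContent(repo, filePath, 'HEAD');
  for (const match of content.matchAll(/from ['"](\.[^'"]+)['"]/g)) {
    related.push(posix.join(posix.dirname(filePath), match[1]) + '.ts');
  }

  return related;
}
The other half of the context is intent, extracted from the PR description and linked issues: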
async function extractIntent(description: string, issues: Issue[]) {
// Use LLM to understand what this PR is trying to do
const intent = await llm.complete(`
Based on this PR description and linked issues, summarize:
1. What is this change trying to accomplish?
2. What are the acceptance criteria?
3. What areas are most critical to review?
PR Description:
${description}
Linked Issues:
${issues.map(i => `- ${i.title}: ${i.body}`).join('\n')}
Summary:`);
return intent;
}
Understanding intent is huge. A PR that adds user deletion needs very different scrutiny than one that updates button colors.
Step 2: Multi-Pass Analysis
Different types of issues need different prompts. Don't try to catch everything in one shot.
Security Review
async function securityReview(context: ReviewContext) {
const securityPrompt = `You are a security engineer reviewing code for vulnerabilities.
Focus ONLY on security issues:
- SQL injection
- XSS vulnerabilities
- Authentication/authorization flaws
- Secrets in code
- Insecure dependencies
- Input validation issues
- Path traversal
Project security requirements:
${context.conventions.security}
Changed files:
${context.files.map(f => `
=== ${f.path} ===
${f.fullContent}
`).join('\n')}
For each issue found, return JSON:
{
"issues": [
{
"file": "path/to/file.ts",
"line": 42,
"severity": "critical" | "high" | "medium",
"type": "security",
"issue": "brief description",
"suggestion": "how to fix",
"confidence": 0.0-1.0
}
]
}
If no security issues found, return {"issues": []}`;
const result = await llm.complete(securityPrompt);
return JSON.parse(result);
}
Bug Detection
async function bugDetection(context: ReviewContext) {
const bugPrompt = `You are a senior developer reviewing code for bugs.
This PR is trying to: ${context.intent}
Focus on:
- Logic errors
- Off-by-one errors
- Null/undefined handling
- Race conditions
- Resource leaks
- Error handling gaps
- Edge cases not covered
DO NOT comment on:
- Code style
- Naming conventions
- Refactoring opportunities
Changed code with surrounding context:
${context.files.map(f => `
=== ${f.path} ===
FULL FILE:
${f.fullContent}
CHANGES (diff):
${f.diff}
`).join('\n')}
Return JSON with issues found. Include line numbers from the FULL FILE, not the diff.`;
const result = await llm.complete(bugPrompt);
return JSON.parse(result);
}
The "DO NOT comment on" section is critical. Without it the model will nitpick everything and developers will start ignoring all comments.
Performance Review
async function performanceReview(context: ReviewContext) {
// Only for files that might have perf impact
const perfRelevantFiles = context.files.filter(f =>
f.path.includes('api/') ||
f.path.includes('db/') ||
f.path.includes('query') ||
f.fullContent.includes('SELECT') ||
f.fullContent.includes('fetch(')
);
if (perfRelevantFiles.length === 0) {
return { issues: [] };
}
// ... run perf analysis
}
Skip performance review for files that obviously don't need it. No point checking CSS files for N+1 queries.
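The analysis itself follows the same pattern as the other passes. A sketch of what the elided body might look like (the checklist is my assumption; tune it to your stack):
// Inside performanceReview, after the early return:
const perfPrompt = `You are reviewing code for performance problems.
Focus ONLY on:
- N+1 queries and missing pagination
- Sequential awaits that could run in parallel
- Unbounded loops over large collections
- Large responses fetched but mostly unused
Changed files:
${perfRelevantFiles.map(f => `
=== ${f.path} ===
${f.fullContent}
`).join('\n')}
Return the same JSON format as the other passes. If nothing stands out, return {"issues": []}.`;
const result = await llm.complete(perfPrompt);
return JSON.parse(result);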
Step 3: Filter and Rank
This is where most AI review systems fail. They report everything and overwhelm developers.
function filterAndRank(allIssues: Issue[]): Issue[] {
// Remove duplicates
const deduplicated = deduplicateIssues(allIssues);
// Filter by confidence
const confident = deduplicated.filter(issue =>
issue.confidence >= 0.7 ||
(issue.severity === 'critical' && issue.confidence >= 0.5)
);
// Sort by severity and confidence
const sorted = confident.sort((a, b) => {
const severityOrder = { critical: 0, high: 1, medium: 2, low: 3 };
const severityDiff = severityOrder[a.severity] - severityOrder[b.severity];
if (severityDiff !== 0) return severityDiff;
return b.confidence - a.confidence;
});
// Limit total comments to avoid overwhelming
return sorted.slice(0, 10);
}
The confidence threshold is important. I'd rather miss a few real issues than flood every PR with false positives. Trust gets destroyed fast.
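The deduplicateIssues helper isn't shown above. A simple version keys on file, line, and issue type, keeping the highest-confidence report when multiple passes flag the same spot:
function deduplicateIssues(issues: Issue[]): Issue[] {
  const byLocation = new Map<string, Issue>();

  for (const issue of issues) {
    const key = `${issue.file}:${issue.line}:${issue.type}`;
    const existing = byLocation.get(key);

    // Keep whichever report is more confident
    if (!existing || issue.confidence > existing.confidence) {
      byLocation.set(key, issue);
    }
  }

  return [...byLocation.values()];
}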
Step 4: Post Thoughtful Comments
Make comments actionable. Nobody likes "this might be a problem" without a solution.
function formatComment(issue: Issue): string {
const severityEmoji = {
critical: '🚨',
high: '⚠️',
medium: '🟡',
low: 'ℹ️'
};
return `${severityEmoji[issue.severity]} **${issue.severity.toUpperCase()}**: ${issue.issue}
${issue.suggestion}
<details>
<summary>Why this matters</summary>
${issue.explanation || 'This could lead to issues in production.'}
</details>
---
<sub>🤖 AI Review | confidence: ${Math.round(issue.confidence * 100)}% | [false positive?](link-to-feedback)</sub>`;
}
Always include:
- What the problem is
- How to fix it
- Why it matters
- A way to report false positives
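Posting the comment itself is the straightforward part. A sketch using Octokit's review-comment endpoint (the PullRequest field names here are assumptions; note GitHub only accepts inline comments on lines that appear in the diff):
import { Octokit } from '@octokit/rest';

const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });

async function postComment(pr: PullRequest, issue: Issue) {
  await octokit.rest.pulls.createReviewComment({
    owner: pr.owner,          // assumed fields on PullRequest
    repo: pr.repo,
    pull_number: pr.number,
    commit_id: pr.head,       // the head commit the review ran against
    path: issue.file,
    line: issue.line,         // must be a line present in the diff
    side: 'RIGHT',
    body: formatComment(issue)
  });
}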
Handling False Positives
// Track when developers dismiss AI comments
async function handleCommentReaction(event: CommentEvent) {
if (event.reaction === '👎' || event.resolved_without_change) {
await feedbackStore.record({
issueType: event.comment.issueType,
file: event.comment.file,
wasHelpful: false,
context: event.comment.context
});
// If pattern of false positives emerges, adjust
const falsePositiveRate = await feedbackStore.getFalsePositiveRate(
event.comment.issueType
);
if (falsePositiveRate > 0.3) {
await alerting.notify(
`High false positive rate for ${event.comment.issueType}`
);
}
}
}
Use the feedback to improve prompts over time:
async function loadCalibration(issueType: string) {
const examples = await feedbackStore.getFalsePositives(issueType, 10);
if (examples.length > 0) {
return `
IMPORTANT: Avoid false positives like these previous mistakes:
${examples.map(e => `- ${e.file}: ${e.issue} (was NOT actually a problem)`).join('\n')}
`;
}
return '';
}
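The calibration string just gets prepended to the relevant pass, for example:
// Inside securityReview, before calling the model:
const calibration = await loadCalibration('security');
const result = await llm.complete(calibration + securityPrompt);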
CI/CD Integration
Block merges for critical issues, but be smart about it:
async function checkPRStatus(pr: PullRequest): Promise<CheckResult> {
const issues = await runFullReview(pr);
const criticalIssues = issues.filter(i => i.severity === 'critical');
const highConfidenceCritical = criticalIssues.filter(i => i.confidence >= 0.85);
if (highConfidenceCritical.length > 0) {
return {
status: 'failure',
message: `${highConfidenceCritical.length} critical issues found`,
};
}
if (criticalIssues.length > 0) {
return {
status: 'neutral', // Warning but don't block
message: 'Potential critical issues (review recommended)',
};
}
return { status: 'success' };
}
Only block when confidence is really high. Nothing kills trust faster than blocking a PR for a false positive.
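Publishing the verdict is a matter of creating a check run. A sketch with Octokit's Checks API, reusing the octokit client from Step 4 (the conclusion values line up with the CheckResult statuses above):
async function publishCheck(pr: PullRequest, result: CheckResult) {
  await octokit.rest.checks.create({
    owner: pr.owner,
    repo: pr.repo,
    name: 'ai-review',
    head_sha: pr.head,
    status: 'completed',
    conclusion: result.status,   // 'success' | 'failure' | 'neutral'
    output: {
      title: 'AI Code Review',
      summary: result.message ?? 'No blocking issues found.'
    }
  });
}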
Project Configuration
Make your review configurable per project:
# .ai-review.yml
version: 1
review:
security: true
bugs: true
performance: true
tests: false # Don't review test files
thresholds:
block_merge: critical
require_response: high
comment_only: medium
context:
conventions: |
- Use async/await, not callbacks
- All API endpoints need authentication
- Database queries must use parameterized statements
ignore_patterns:
- "*.test.ts"
- "migrations/*"
- "generated/*"
Every codebase is different. The AI needs to know your rules.
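The loadProjectConfig call from Step 1 just reads this file and maps it onto the fields gatherContext expects. A minimal sketch with js-yaml (the defaults are assumptions):
import { load } from 'js-yaml';

async function loadProjectConfig(repo: string) {
  const raw = await github.getFileContent(repo, '.ai-review.yml', 'HEAD');
  const parsed = load(raw) as any;

  return {
    review: parsed?.review ?? { security: true, bugs: true, performance: true },
    thresholds: parsed?.thresholds ?? { block_merge: 'critical' },
    codeConventions: parsed?.context?.conventions ?? '',
    reviewIgnore: parsed?.context?.ignore_patterns ?? []
  };
}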
Wrapping Up
Effective AI code review requires:
- Rich context - Full files, not just diffs
- Focused prompts - Separate passes for different concerns
- High precision - Filter aggressively; better to miss some than spam
- Actionable feedback - Tell devs how to fix it, not just what's wrong
- Learning from mistakes - Track false positives and adapt
Start with security review only. Get that working well. Then add bug detection. Then performance. Each pass should earn its place by providing real value.
The goal isn't to replace human reviewers. It's to let them focus on architecture and design while AI handles the tedious pattern-matching stuff. When done right, everyone wins.
