AI-Assisted Code Reviews with Claude Code: Best Practices (March 2026)
I spent a few months experimenting with AI-assisted code reviews before I found a setup that actually works. Most of my early attempts produced either obvious findings I'd catch myself or confident-sounding hallucinations that wasted verification time. The difference turned out to be structure: the right tools, the right context, and a two-pass workflow that keeps the model honest. Here's the exact setup, prompts, and best practices I rely on for code reviews with Claude Code in 2026.
The Stack
Claude Code (Opus 4.6, 1M context)
├── Jira MCP → ticket context
├── GH CLI → PRs, diffs, file trees
├── Figma MCP → design context for UI reviews
├── Context7 MCP → up-to-date library docs
└── Skills (curated)
    ├── vercel-react-best-practices
    ├── vercel-composition-patterns
    ├── next-best-practices
    ├── nodejs-backend-patterns
    ├── nestjs-best-practices
    └── [+ any framework-specific skills for your stack]
Each piece is there for a reason. Let me walk through the why.
1M Context Window
This is what makes cross-file analysis possible. You can load an entire small-to-medium service — source files, configs, test suites — into a single session. The model sees relationships that humans miss when reading file-by-file: an unvalidated input in middleware that a controller three directories away trusts implicitly, a type assertion that silently breaks a downstream consumer, a race condition between two services that only appears when you read both.
One caveat: auto-compaction degrades quality in long sessions. When context gets compressed, the model works from summaries instead of actual code. I'll cover how to deal with this in the workflow section.
Jira MCP
A code review isn't just "does this code have bugs" — it's "does this code match what was asked for." By loading the Jira ticket directly, the model sees acceptance criteria, edge cases discussed in comments, and scope boundaries. This catches scope drift ("this PR adds a feature nobody asked for") and missing requirements ("the ticket says handle the offline case, but there's no offline handling here").
GH CLI
gh pr view and gh pr diff run directly from Claude Code's bash. The model gets full awareness of what changed, what files were touched, and the PR description. Combined with Jira context, it can cross-reference "what was requested" against "what was implemented."
Context7 MCP
This one eliminates an entire class of false positives. Without current docs, the model might flag a perfectly valid API call as "deprecated" because it was trained on older documentation. Context7 fetches the actual, current documentation for the library version you're using. Critical for fast-moving frameworks like Next.js, React, and anything in the TanStack ecosystem.
Figma MCP
For frontend reviews, this is a game-changer. The Figma MCP pulls design context — component specs, spacing, colors, layout — directly from Figma files. During a review, the model can compare the implementation against the actual design. Does the spacing match? Are the right design tokens used? Is the hover state implemented? It turns "does this match the mockup?" from a manual squint into a structured check.
Curated Skills — The Highest-Leverage Piece
Skills are structured context files that tell the model what "good" looks like for a specific domain. A recent paper on arXiv showed that curated context files significantly outperform LLM-generated ones — hand-picked, opinionated guidance beats auto-generated boilerplate every time.
I source my skills from skills.sh and customize them per stack. These aren't generic linting rules. They encode real architectural opinions — the kind of things a senior engineer would catch in review but a linter never would.
My Recommended Skills
| Skill | Focus |
|---|---|
| vercel-react-best-practices | Component patterns, hooks discipline, rendering performance, server/client boundaries |
| vercel-composition-patterns | Compound components, render props, slot patterns — catches over-engineered or under-composed component trees |
| next-best-practices | App Router conventions, data fetching, caching, middleware, route handlers — keeps reviews aligned with current Next.js idioms |
| nodejs-backend-patterns | Error handling, logging, graceful shutdown, config management, security hardening |
| nestjs-best-practices | Module structure, DI patterns, guards/interceptors/pipes, exception filters |
Pick the ones matching your stack and customize them. The default skills from skills.sh are a solid starting point, but the real value comes from tuning them to your team's conventions.
The Workflow: Two Phases
I split every review into two distinct phases. This isn't arbitrary — it's the most reliable way to get accurate findings from a model with a large context window.
Phase 1: Deep Read
Read deeply:
- Tickets: [links]
- PRs: [links]
- Analyze context, create review plan using skills: [list]
Focus areas:
- Security (auth, input validation, secrets)
- Error handling and failure modes
- Performance implications
- Type safety and strictness
- Dependency health
- Architecture alignment
This is the "load everything" phase. The model reads tickets, diffs, and source files, then builds a structured review plan before writing any findings. Skills constrain what "good" looks like — instead of reviewing against some vague internal standard, it reviews against specific, documented patterns.
The key: don't ask for findings yet. Ask for a plan. This forces the model to organize its analysis before committing to conclusions.
Phase 2: Cross-Validation
Cross-validate findings:
- Re-read flagged files (actual code, not your summary)
- Verify each finding exists as described
- Remove false positives
- Rank: Critical > High > Medium > Low
- Each finding: file path, line range, issue, impact, suggested fix
This is where the magic happens. The first pass catches things; the second pass asks "did I get that right?" Without cross-validation, you get hallucinated findings — the model confidently describes a bug in code that doesn't exist, or references a line number from a compacted summary that's drifted from the actual file.
The output format is intentional. File paths make findings actionable (click, navigate, verify). "Impact" prevents dismissal ("yeah but who cares"). Suggested fixes turn the audit from a list of complaints into a PR-ready action plan.
Real Example: File Upload Service Audit
Here's what this looks like in practice. Imagine a TypeScript file upload service, about 3,500 lines across 25 files, handling upload, virus scanning, thumbnail generation, and storage.
Input prompt:
Read deeply:
- PR: gh pr view 312
- Ticket: UPLOAD-294 (via Jira MCP)
- Analyze full src/ directory
- Skills: nodejs-backend-patterns, nestjs-best-practices
Focus: security, error handling, failure modes, file validation
Findings:
Critical (1)
src/middleware/upload.middleware.ts:41-47 — File type validation relies solely on the Content-Type header from the client. No magic byte verification. Impact: An attacker can upload an executable renamed to .jpg with an image/jpeg header, bypassing the allowlist entirely. Fix: Validate file signatures (magic bytes) server-side using a library like file-type.
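A minimal sketch of the suggested fix, assuming a handful of illustrative signatures (in practice a library like file-type covers far more formats):

```typescript
// Verify a file's real type from its magic bytes instead of trusting the
// client's Content-Type header. Signatures here are illustrative only.
const SIGNATURES: Record<string, number[]> = {
  "image/jpeg": [0xff, 0xd8, 0xff],
  "image/png": [0x89, 0x50, 0x4e, 0x47],
  "image/gif": [0x47, 0x49, 0x46, 0x38],
};

function sniffMimeType(buf: Uint8Array): string | null {
  for (const [mime, sig] of Object.entries(SIGNATURES)) {
    if (sig.every((byte, i) => buf[i] === byte)) return mime;
  }
  return null; // unknown signature: reject, don't guess
}

function isAllowedUpload(
  buf: Uint8Array,
  claimedType: string,
  allowlist: string[]
): boolean {
  const actual = sniffMimeType(buf);
  // The sniffed type must pass the allowlist AND agree with the claimed type.
  return actual !== null && actual === claimedType && allowlist.includes(actual);
}
```

An executable renamed to .jpg fails this check even with a spoofed image/jpeg header, because its leading bytes don't match any allowed signature.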
High (2)
src/services/storage/local.service.ts:23 — Uploaded file path is constructed by concatenating user-supplied filename with the base directory. No path traversal check. Impact: A filename like ../../etc/cron.d/backdoor writes outside the upload directory. Fix: Use path.basename() to strip directory components, or generate UUIDs for stored filenames.
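A sketch of the fix combining both suggestions (basename stripping plus UUID filenames), with a defense-in-depth check that the resolved path stays inside the base directory:

```typescript
import path from "node:path";
import { randomUUID } from "node:crypto";

// Never trust a client-supplied filename when building storage paths.
// Strip directory components, then store under a server-generated UUID,
// keeping only the sanitized extension.
function safeStoragePath(baseDir: string, userFilename: string): string {
  const name = path.basename(userFilename); // drops "../.." and any directories
  const ext = path.extname(name).toLowerCase();
  const stored = `${randomUUID()}${ext}`;   // server-chosen name, not user's
  const full = path.resolve(baseDir, stored);
  // Belt and suspenders: the resolved path must remain inside baseDir.
  if (!full.startsWith(path.resolve(baseDir) + path.sep)) {
    throw new Error("path escapes upload directory");
  }
  return full;
}
```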
src/controllers/download.controller.ts:58-63 — Signed URL generation uses a symmetric key stored in a config constant with no rotation mechanism. The key has been the same since initial commit. Impact: If the key leaks, all "private" files become permanently accessible. Fix: Rotate keys on a schedule, support multiple active keys during transition.
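One way to support multiple active keys during a rotation window is to embed a key ID in the signature: sign with the newest key, but verify against any still-active key. The key IDs and secrets below are placeholders, and a real service would load them from a secrets manager:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Newest key first: sign with it, but accept signatures from any active key
// so previously issued URLs stay valid while rotation is in flight.
const activeKeys = new Map<string, string>([
  ["k2", "new-secret-2026"], // current signing key (placeholder value)
  ["k1", "old-secret-2025"], // still accepted during the rotation window
]);

function sign(payload: string, keyId: string): string {
  const secret = activeKeys.get(keyId);
  if (!secret) throw new Error(`unknown key: ${keyId}`);
  const mac = createHmac("sha256", secret).update(payload).digest("hex");
  return `${keyId}.${mac}`;
}

function verify(payload: string, signature: string): boolean {
  const [keyId, mac] = signature.split(".");
  const secret = keyId ? activeKeys.get(keyId) : undefined;
  if (!secret || !mac) return false;
  const expected = createHmac("sha256", secret).update(payload).digest("hex");
  // Timing-safe compare avoids leaking how many characters matched.
  return (
    mac.length === expected.length &&
    timingSafeEqual(Buffer.from(mac), Buffer.from(expected))
  );
}
```

Retiring a key is then just removing it from the map; URLs signed with it stop verifying without redeploying signing logic.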
Medium (2)
src/services/processing/thumbnail.service.ts:30 — Thumbnail generation spawns a sharp process per request with no concurrency limit. Impact: Under load, the service exhausts available memory and crashes. Fix: Add a processing queue with bounded concurrency (e.g., p-limit or a Bull queue).
src/config/upload.config.ts:12 — Max file size set to 500MB with no per-route override. The avatar upload endpoint shares this limit. Impact: Users can upload 500MB avatars, wasting storage and bandwidth. Fix: Configure per-route size limits.
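The per-route fix can be as simple as a lookup with a global fallback. Route names and limits below are illustrative, not the service's actual config:

```typescript
// Per-route upload limits with a global default as the fallback.
const MB = 1024 * 1024;

const uploadLimits = {
  default: 500 * MB,
  routes: {
    "/avatars": 2 * MB, // nobody needs a 500MB avatar
    "/attachments": 25 * MB,
  } as Record<string, number>,
};

function maxSizeFor(route: string): number {
  return uploadLimits.routes[route] ?? uploadLimits.default;
}
```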
Low (2)
src/services/cleanup/expiry.service.ts:18 — Temporary files are cleaned up by a cron job every 24 hours. If the job fails silently, temp files accumulate indefinitely. Fix: Add health check alerting for the cleanup job and a fallback TTL at the storage layer.
src/types/upload.types.ts:8-14 — Metadata fields (tags, description, category) all typed as string | undefined instead of a strict schema. Fix: Define a FileMetadata type with proper constraints.
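A sketch of what a stricter shape might look like, with a runtime guard; in a NestJS codebase this would more likely be a class-validator DTO or a zod schema, and the category values are invented for illustration:

```typescript
// Closed set of categories instead of a free-form string.
const CATEGORIES = ["document", "image", "video"] as const;
type Category = (typeof CATEGORIES)[number];

interface FileMetadata {
  tags: string[];      // required, possibly empty, never undefined
  description: string; // required, may be ""
  category: Category;
}

// Runtime guard: rejects anything that doesn't match the strict shape.
function parseMetadata(input: unknown): FileMetadata {
  const o = input as Partial<Record<keyof FileMetadata, unknown>> | null;
  if (!Array.isArray(o?.tags) || !o.tags.every((t) => typeof t === "string")) {
    throw new Error("tags must be a string array");
  }
  if (typeof o.description !== "string") {
    throw new Error("description must be a string");
  }
  if (!CATEGORIES.includes(o.category as Category)) {
    throw new Error("unknown category");
  }
  return {
    tags: o.tags as string[],
    description: o.description,
    category: o.category as Category,
  };
}
```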
Time: ~25 minutes for both phases. A manual review of this scope would typically take 2-3 hours.
Prompt Templates
Three ready-to-use prompts. Copy, fill in the blanks, run.
Quick PR Review
Review this PR:
- PR: gh pr view [NUMBER]
- Ticket: [JIRA-KEY or description]
- Read all modified files in full (not just diffs)
- Skills: [list relevant skills]
Check:
1. Do changes match ticket requirements?
2. Security: auth, input validation, secrets exposure
3. Edge cases: nulls, empty states, error paths
4. Test coverage: are new paths tested?
5. Type safety: any `any` types or unsafe casts?
Output: findings grouped by severity with file paths and line numbers.
Full Codebase Audit
Audit this codebase:
- Read: package.json, tsconfig.json, eslint config, full src/ tree
- Skills: [list relevant skills]
Produce a structured audit document:
1. Executive summary (3-5 sentences)
2. Findings by severity (Critical > High > Medium > Low)
- Each: file path, line range, issue, impact, suggested fix
3. Dependency report: outdated, vulnerable, or unused deps
4. Architecture observations: patterns, inconsistencies, tech debt
Use Context7 to verify any library API concerns before flagging.
Cross-Validation Pass
Cross-validate the findings from the previous review:
- Re-read every flagged file (actual file content, not your prior summary)
- For each finding:
- Confirm the code actually exists as described
- Confirm the line numbers are accurate
- Confirm the issue is real (not intentional design)
- Re-assess severity
- Remove any finding you cannot verify
- Add any new issues found during re-read
- Output: updated findings list, noting what changed and why
Lessons Learned
1. Auto-compaction will bite you
In long sessions, Claude Code compresses earlier context to fit new information. This means the model might be working from a summary of the code it read, not the actual code. That's why cross-validation is a separate prompt — it forces a fresh read of the actual files.
For critical reviews, I start a new session for the validation pass entirely. Belt and suspenders.
2. Skills beat generic prompts
"Review this code for best practices" produces generic findings. "Review using the nodejs-backend-patterns skill" produces grounded, specific findings tied to documented patterns. Night and day difference.
Start with skills.sh, pick skills for your stack, and customize them as you learn what your team cares about.
3. Demand file paths and line numbers
A finding without a file path is wasted verification time. It also acts as an honesty check — if the model can't point to the exact location, the finding is likely hallucinated. The cross-validation prompt enforces this, but it's worth stating explicitly in Phase 1 too.
4. Let Context7 handle library docs
Don't assume the model knows current APIs. I've seen reviews flag perfectly valid Next.js 16 patterns as "deprecated" because the model's training data included Next.js 13 docs. Context7 eliminates this entire category of false positives.
5. The human still makes the call
The file upload audit found real issues — that Content-Type bypass would have been bad. But it also flagged a few intentional design choices as problems: a deliberate use of any for a plugin interface, and a "missing" validation that was handled by middleware upstream. Cross-validation caught some of these, but not all.
Use the audit as a high-quality starting point. Apply your own judgment. The goal is to catch what humans miss, not to replace human thinking.
What's Next
I'm packaging this two-phase workflow into a reusable Claude Code skill — the goal is a one-liner: Review [repo/PR] using code-review skill. It's not there yet, but close.
I'm also experimenting with multi-agent orchestration: separate Claude Code instances acting as PM reviewer, frontend reviewer, backend reviewer, and QA — each with different skills and focus areas, merging outputs into a single report. It's promising but the coordination overhead is real. More on that when I have something worth sharing.
Tooling in this space evolves faster than you can write about it, and honestly, that's the fun part. If you've built a review workflow that works for you, I'd love to hear about it.
Written by a frontend/full-stack developer at a fintech company, building with React, TypeScript, and Next.js.