
Claude Code Debugging: A Step-by-Step Workflow for Finding and Fixing Bugs

The Prompt Shelf

Debugging with Claude Code is not just “paste your error and wait.” That approach works on trivial bugs, but on anything real — a race condition in async code, a corrupted database migration, a frontend component that renders differently in Safari — you need a repeatable workflow.

This guide gives you that workflow. Not theory. Steps you can actually follow.

Why Debugging With Claude Code Is Different

When you debug alone, your context lives in your head. You remember what you changed 20 minutes ago, what you already tried, why you reverted that one fix. That implicit context keeps you oriented.

Claude Code has none of that. It starts fresh each session, builds its context from what you give it, and can go confidently down the wrong path if you give it incomplete information. A human debugger reads frustration and slows down when something is off. Claude Code does not — it will happily apply a “fix” that addresses the symptom rather than the cause, then move on.

This is the core challenge: Claude Code is fast and capable, but it needs you to structure the problem correctly. The debugging workflow below is essentially a protocol for giving Claude Code the right inputs at each stage.

There is also a cognitive offloading advantage. Complex bugs involve holding many variables in mind at once: the call stack, the state at failure, the sequence of events that led there. Claude Code is genuinely good at holding this graph and reasoning over it — often better than a tired developer at 11pm. The workflow below is designed to exploit that strength while managing the failure modes.


Step 1: Reproduce the Bug — Give Claude the Error, Not the Symptom

The most common mistake when debugging with Claude Code is describing symptoms instead of providing evidence.

Wrong:

My API is slow sometimes when there are a lot of users.

Right:

Here is the error from the logs. The p99 latency spikes from 200ms to 4000ms
when concurrent connections exceed 50. This started after the connection pool
refactor in commit a3f91b2.

Stack trace:
Error: Connection timeout after 3000ms
  at Pool.acquire (/app/db/pool.js:142)
  at UserService.getProfile (/app/services/user.js:67)
  at async Router.handler (/app/routes/users.js:23)

Relevant config: pool.max=10, pool.idleTimeoutMillis=30000

The difference is reproducibility. Claude Code can reason about a stack trace, a commit hash, and specific configuration values. It cannot reason about “sometimes slow.”

Before starting a debugging session:

  1. Capture the exact error message and full stack trace
  2. Note the conditions that trigger the bug (specific input, concurrency level, time of day, etc.)
  3. Identify when the bug was introduced if possible (last working commit, related PR)
  4. List what you have already tried

Start your Claude Code session with all of this. You are not bothering Claude with too much detail — you are preventing it from wasting 20 minutes solving a different problem.
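A report like the example above can also be sanity-checked with a back-of-envelope model before you hand it to Claude Code. The sketch below uses invented numbers (each query assumed to hold a connection for ~800ms) alongside the reported pool.max=10 and 3000ms acquire timeout to estimate how many of 50 concurrent requests would time out:

```javascript
// Back-of-envelope model of pool saturation. HOLD_MS is an assumption made
// for illustration; the other numbers mirror the example report above.
const POOL_MAX = 10;      // pool.max from the config
const HOLD_MS = 800;      // assumed time each query holds a connection
const CONCURRENT = 50;    // concurrent requests arriving at once
const TIMEOUT_MS = 3000;  // Pool.acquire timeout from the stack trace

// In FIFO order, the k-th request waits for floor(k / POOL_MAX) full "waves"
// of earlier requests to release their connections.
const waits = Array.from({ length: CONCURRENT }, (_, k) =>
  Math.floor(k / POOL_MAX) * HOLD_MS
);

const timedOut = waits.filter((w) => w > TIMEOUT_MS).length;
console.log(`estimated requests exceeding the acquire timeout: ${timedOut}`);
```

Under these assumptions, ten of the fifty requests exceed the timeout, which matches the shape of the reported symptom: most requests are fine, and the tail spikes. Putting a model like this in your opening message gives Claude Code a hypothesis to confirm or refute rather than a blank slate.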

For bugs without error messages

Some bugs are behavioral rather than error-producing: wrong output, missing data, UI that renders incorrectly. For these, give Claude Code a concrete before/after:

Expected: clicking "Save" updates the record and shows a success toast
Actual: the record updates in the database (confirmed via direct query) but the
toast never appears and the UI shows stale data until page refresh

Happens in: Chrome 124, Safari 17. NOT in Firefox.
Reproduce: log in, open any record, make any change, click Save.

Browser-specific bugs are a hint that the issue is in rendering or browser API behavior. Give Claude that context explicitly.


Step 2: Use Plan Mode for Complex Multi-File Bugs

For simple bugs — a typo, a missing null check, a wrong variable name — just let Claude Code fix it. But for bugs that span multiple files, involve state management, or require architectural understanding, use Plan Mode before any code changes.

In Plan Mode, Claude Code produces a written investigation plan and proposed fix strategy without touching any files. You review the plan before anything gets changed.

To activate it, tell Claude Code explicitly:

Before making any changes, I want you to analyze this bug and describe:
1. Your hypothesis about the root cause
2. Which files and functions are involved
3. What changes you would make and in what order
4. What you would test to verify the fix

Do not edit any files until I approve the plan.

This single step prevents the most common debugging failure mode: Claude Code confidently fixing the wrong thing. Without a plan, Claude Code may patch the most visible symptom rather than the root cause. The patch may even make the bug harder to find later because the error message changes.

Plan Mode also exposes misunderstandings early. If Claude Code’s plan mentions a function that does not exist, or misunderstands how your state management works, you catch it before any code changes. Correcting a wrong hypothesis takes 30 seconds. Reverting 15 files of changes takes much longer.

Signs you need Plan Mode:

  • The bug involves more than 2-3 files
  • You are not sure what the root cause is (you are investigating, not confirming)
  • The bug is in a part of the codebase Claude Code has not seen before this session
  • Previous fix attempts made things worse

Step 3: Test-Driven Debugging — Write the Failing Test First

This is the most powerful technique in this guide, and the most underused.

Before asking Claude Code to fix anything, ask it to write a test that reproduces the bug. Run the test. Confirm it fails. Then ask Claude Code to fix the code until the test passes.

Write a test that reproduces this bug. The test should:
- Set up the exact conditions that trigger the failure
- Assert the correct behavior (not the current buggy behavior)
- Be runnable with: npm test

Do not fix the bug yet. Just write the failing test.

Why this works:

Tests are an external source of truth. The test does not care about Claude Code’s hypothesis. It does not care if Claude Code changes its mind halfway through. A failing test is failing; a passing test is passing. This anchors the debugging session to objective reality rather than Claude Code’s confidence.

Tests survive context window pressure. During a long debugging session, Claude Code’s context fills up. Earlier reasoning gets compressed. The test file does not — it is still there in the repo, still runnable, still honest about whether the fix actually worked.

Tests catch regressions introduced by the fix itself. When Claude Code runs your full test suite after the fix, you catch unintended side effects immediately rather than discovering them in production.

This approach is especially effective for:

  • Logic bugs in pure functions (easiest to test)
  • API endpoint bugs (test with a request and assert the response)
  • Database query bugs (seed test data, run query, assert results)

Step 4: Hooks for Automatic Verification

Claude Code’s hooks let you run shell commands automatically at specific points in the workflow. For debugging sessions, the most useful hook is running your linter and relevant tests after every file edit.

Add this to your .claude/settings.json:

{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "npm run lint --silent 2>&1 | head -20"
          }
        ]
      }
    ]
  }
}

With this hook, every time Claude Code edits a file, the linter runs automatically and the output feeds back into the conversation. Claude Code sees lint errors immediately and corrects them in the same turn, rather than you running the linter manually and starting another round.

For more targeted debugging sessions, you can make the hook run only the test file relevant to the bug:

{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "npm test -- --testPathPattern=user.service.test --passWithNoTests 2>&1 | tail -20"
          }
        ]
      }
    ]
  }
}

This creates a tight feedback loop: edit → test → result → next edit. Claude Code sees test failures immediately and adjusts. You spend less time manually running checks and more time reviewing the actual fix.

One caution: do not run your entire test suite on every edit during debugging. If you have 500 tests that take 3 minutes to run, hooking all of them to every file edit will make the session feel like it is running in slow motion. Hook the relevant subset, run the full suite at the end.


Step 5: /rewind When Claude Goes Down the Wrong Path

Claude Code has a /rewind command that rolls back both the conversation history and the code changes made during the conversation. This is your escape hatch when a debugging session has gone sideways.

Use /rewind when:

  • Claude Code has made 5+ file changes and the bug is worse than when you started
  • Claude Code’s explanation of the root cause has shifted several times
  • You realize the initial problem statement was wrong and you need to restart with better information
  • The fix broke tests that were previously passing

/rewind is not a failure. It is a correct response to a debugging session that drifted off course. The alternative — continuing forward from a compromised state — is almost always slower.

After rewinding, do not repeat the same session with the same inputs. Diagnose what went wrong:

  • Did you give Claude Code incomplete error information?
  • Did you skip Plan Mode and let Claude Code dive straight into changes?
  • Did the context window fill up and lose important early context? (Use /compact if this is the issue)

Then restart with corrected inputs.


Debugging Patterns by Bug Type

Frontend bugs

Frontend bugs are tricky because the failure is visual and context-dependent. Coordinate-based clicking in browser automation is fragile — small layout changes break it. Use Playwright with semantic selectors instead:

Use Playwright to reproduce this bug. Select elements by role and accessible name,
not by CSS class or coordinate position. The test should open the page, perform
the user actions that trigger the bug, and assert the expected visual state.

For CSS/rendering bugs, give Claude Code a screenshot of the current state versus the expected state. Describe what is wrong in concrete visual terms (“the button is aligned to the left edge of the card; it should be centered”) rather than abstract terms (“the layout is broken”).

For state management bugs (React, Vue, etc.), provide Claude Code with the Redux/Zustand/Pinia action log or component state snapshot at the moment of failure. Tools like Redux DevTools can export this. Paste it directly into the conversation.

Backend bugs

For API bugs, provide:

  • The full request (method, URL, headers, body)
  • The full response (status code, headers, body)
  • The relevant server logs for that request (include request ID if your logging supports it)

For async bugs (race conditions, deadlocks), the key is sequence. Describe the exact order of operations that causes the failure. Attach timing information from logs if available. Race conditions are hard to reproduce reliably — if you can, add temporary instrumentation to make the sequence visible before asking Claude Code to fix it.
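As a minimal sketch of the read-modify-write race this describes, the invented example below has two handlers read shared state, yield at an await, and write back stale values, so one update is lost. The logging is the kind of temporary instrumentation that makes the sequence visible:

```javascript
// Invented example: two concurrent handlers each do read -> await -> write
// against shared state. One withdrawal is silently lost.
let balance = 100;

async function withdraw(label, amount) {
  const current = balance;                      // read
  console.log(`${label} read balance=${current}`);
  await Promise.resolve();                      // yield: the other handler runs
  balance = current - amount;                   // write back a stale value
  console.log(`${label} wrote balance=${balance}`);
}

const done = Promise.all([withdraw("A", 30), withdraw("B", 50)]).then(() => {
  // Sequential execution would leave 20; the race loses A's write.
  console.log(`final balance: ${balance}`);
  return balance;
});
```

This toy version interleaves deterministically; real races usually do not, which is exactly why logging each read and write with timestamps before asking Claude Code for a fix pays off.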

Database bugs

For migration bugs, always give Claude Code the migration file AND the current schema state. Do not describe the schema — paste the actual \d tablename output or schema dump. Schema descriptions drift from reality; the actual schema does not.

For query performance bugs, provide the EXPLAIN ANALYZE output. Claude Code can read a query plan and identify missing indexes or inefficient join strategies directly from that output.


Anti-Patterns: What Not to Do

Do not let Claude Code debug from memory. If Claude Code says “I remember that function does X,” verify it against the actual code before proceeding. Long sessions compress earlier context, and Claude Code can misremember details.

Do not accept a fix without understanding it. Ask Claude Code to explain what it changed and why. If the explanation does not make sense, the fix may be treating a symptom. Understanding the fix also helps you prevent the same class of bug in the future.

Do not provide too much irrelevant code. If a bug is in the payment service, do not paste your entire codebase. Claude Code’s context window is limited. Filling it with irrelevant files means the relevant code gets less attention. Share the specific files and functions involved.

Do not debug and refactor simultaneously. If Claude Code spots a code quality issue while fixing a bug, note it but do not address it in the same session. Mixing bug fixes with refactoring makes it hard to know which change fixed the bug, and introduces more surface area for new bugs.

Do not ignore flaky test results. If a test passes on the second run after failing on the first, that is signal — likely a timing issue or test isolation problem. Do not dismiss it as “just flaky.” Flag it for Claude Code explicitly.


CLAUDE.md Tips for Better Debugging

Your CLAUDE.md can encode debugging preferences that apply automatically every session. Add a dedicated debugging section:

## Debugging Workflow

When debugging:
1. Read the full stack trace before proposing any fix
2. Use Plan Mode for bugs touching more than 2 files — describe root cause
   hypothesis and affected files before making any changes
3. Write a failing test that reproduces the bug before fixing it
4. Run `npm test` after all changes. If tests fail, fix the tests or explain
   why the test expectation was wrong
5. Never use `console.log` for debugging — use the existing logger at
   `src/utils/logger.ts` with appropriate log levels

## Verification Commands

Before marking any fix complete:
- `npm run lint` — zero errors required
- `npm test` — all tests must pass
- `npm run typecheck` — TypeScript strict mode, zero errors

The verification commands section is particularly valuable. It creates an explicit definition of “done” that Claude Code can check against automatically. Without it, Claude Code may consider a fix complete when it has resolved the immediate error, even if it introduced a TypeScript error or broke an unrelated test.

The prohibition on console.log is also worth including explicitly. Claude Code defaults to adding debug logs when investigating — specifying your logging infrastructure prevents noise commits.


A Note on Context Window Management

Long debugging sessions eat context quickly. Every file Claude Code reads, every error message it processes, every fix it tries — all of it accumulates. When the context window fills up, earlier information gets summarized or dropped. Claude Code may “forget” the original error message, or lose track of constraints you specified at the start.

Watch for signs of context saturation:

  • Claude Code repeats a fix it already tried
  • Claude Code asks for information you already provided
  • Explanations become vaguer and less specific

When you see this, use /compact to summarize the conversation before continuing. Compact the session while the important context is still clear, not after it has already degraded.

For very complex bugs that require long sessions, consider breaking the work into multiple focused sessions: one session to investigate and produce a written diagnosis, a second session to implement the fix based on that diagnosis. The written diagnosis carries over perfectly; the compressed conversation context does not.


The debugging workflow outlined here — structured reproduction, Plan Mode, test-first fixing, hook-based verification, and the /rewind escape hatch — is not overhead. Each step exists because it addresses a specific failure mode that makes debugging sessions go wrong. Use them together and Claude Code becomes a genuinely capable debugging partner rather than a confident source of wrong answers.
