How we extract decisions from AI conversations

Three months into using AI coding tools heavily, you'll encounter a moment that makes you want to throw your laptop. You're looking at a piece of code and wondering: why did we structure it this way? Why not use Redux? Why is this a class instead of a set of functions? The answer exists — the AI explained the reasoning in a session from last Tuesday. But finding it means scrubbing through dozens of messages in a session transcript, if you can even find which session it was.

Every AI coding session is a decision-making session in disguise. The output is code, but the real product is a sequence of architectural choices, library selections, approach rejections, and trade-off evaluations. Most of that reasoning evaporates the moment the session ends.

What counts as a decision

Before building the extraction pipeline, we had to define precisely what we were looking for. Not every exchange in an AI coding session is a decision. 'What's the syntax for a TypeScript generic?' is a lookup. 'Should we use Zustand or Redux for this feature, and why?' is a decision.

We settled on four categories of content worth extracting: architectural choices (state management approach, data model structure, API design), library selections (why this library over alternatives), approach rejections (what we considered and decided against, and the reasoning), and explicit trade-off evaluations (performance vs simplicity, type safety vs flexibility).

The extraction pipeline

Decision extraction runs as a background job after each trace is marked complete. We batch the user prompt and AI response text from the trace and send them to Claude Haiku with a structured extraction prompt. We chose Haiku deliberately: it's fast, cheap, and more than capable of this classification task. We don't need Sonnet or Opus for pattern recognition on conversation chunks.

typescript

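// src/main/engines/decision-extractor.ts
import Anthropic from "@anthropic-ai/sdk";

// The client reads ANTHROPIC_API_KEY from the environment by default.
const anthropic = new Anthropic();
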
const EXTRACTION_PROMPT = `
You are extracting architectural decisions from an AI coding conversation.

Analyze this conversation excerpt and extract any decisions made.
A decision is: an architectural choice, library selection, approach rejection,
or trade-off evaluation that shaped the codebase.

Return a JSON array. Each item must have:
- question: string (the design question being resolved)
- answer: string (the decision made and brief rationale)
- tags: string[] (e.g. ["state-management", "performance", "security"])
- confidence: number (0-1, how confident you are this is a real decision)
- type: "architectural" | "library" | "rejection" | "tradeoff"

Only include items with confidence >= 0.7. Return [] if no decisions found.
`;

async function extractDecisions(trace: Trace): Promise<Decision[]> {
  const content = formatTraceForExtraction(trace);
  const response = await anthropic.messages.create({
    model: "claude-haiku-4-5",
    max_tokens: 1024,
    messages: [
      { role: "user", content: EXTRACTION_PROMPT + "\n\n" + content },
    ],
  });
  return parseDecisionResponse(response);
}

The 0.7 confidence threshold matters. In early testing without one, the extractor flagged routine implementation choices as decisions ('we used a for loop instead of map'). Note that the threshold lives in the prompt, so the model applies it to its own self-reported score. In practice that keeps only the choices that reflect genuine design intent.
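Because the threshold is only requested in the prompt, it can be worth re-checking it when the reply is parsed. The sketch below is one way that might look. It is not the actual `parseDecisionResponse`: the helper name `parseDecisionText` is hypothetical, and it assumes the caller has already pulled the reply text out of the SDK response.

```typescript
type DecisionType = "architectural" | "library" | "rejection" | "tradeoff";

interface ExtractedDecision {
  question: string;
  answer: string;
  tags: string[];
  confidence: number;
  type: DecisionType;
}

const DECISION_TYPES: string[] = ["architectural", "library", "rejection", "tradeoff"];

// Hypothetical helper: validates the model's reply text and re-applies the threshold.
function parseDecisionText(raw: string, minConfidence = 0.7): ExtractedDecision[] {
  // The reply may wrap the JSON in prose or a code fence, so take the outermost [...] span.
  const start = raw.indexOf("[");
  const end = raw.lastIndexOf("]");
  if (start === -1 || end <= start) return [];

  let parsed: unknown;
  try {
    parsed = JSON.parse(raw.slice(start, end + 1));
  } catch {
    return []; // Malformed output is treated as "no decisions" rather than failing the job.
  }
  if (!Array.isArray(parsed)) return [];

  return parsed.filter((item): item is ExtractedDecision => {
    if (typeof item !== "object" || item === null) return false;
    const { question, answer, tags, confidence, type } = item as Record<string, unknown>;
    return (
      typeof question === "string" &&
      typeof answer === "string" &&
      Array.isArray(tags) &&
      tags.every((t) => typeof t === "string") &&
      typeof confidence === "number" &&
      confidence >= minConfidence &&
      confidence <= 1 &&
      typeof type === "string" &&
      DECISION_TYPES.indexOf(type) !== -1
    );
  });
}
```

Dropping malformed items silently is a design choice that suits a background job; a stricter pipeline might log them instead.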

The decision model

Each extracted decision is stored with: the question it resolves, the answer with rationale, a type enum, an array of tags for categorization, a confidence score, a sessionId reference back to where it was made, and a traceId for jumping directly to the conversation that generated it.
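As a rough illustration, the stored record could be typed like this. The field list follows the description above; the string ID types and the `attachOrigin` helper are assumptions made for the sketch, not the real schema.

```typescript
type DecisionType = "architectural" | "library" | "rejection" | "tradeoff";

interface StoredDecision {
  question: string;   // the design question being resolved
  answer: string;     // the decision made, with brief rationale
  type: DecisionType;
  tags: string[];
  confidence: number; // 0-1, as reported by the extractor
  sessionId: string;  // assumed to be a string ID
  traceId: string;    // assumed to be a string ID
}

type ExtractedFields = Omit<StoredDecision, "sessionId" | "traceId">;

// Hypothetical helper: attaches the originating session and trace to an extracted item.
function attachOrigin(
  extracted: ExtractedFields,
  origin: { sessionId: string; traceId: string }
): StoredDecision {
  return { ...extracted, sessionId: origin.sessionId, traceId: origin.traceId };
}
```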

Tags are normalized before storage — 'state management', 'state-management', and 'stateManagement' all become the same canonical form. This matters for search: you want tag:state-management to find everything regardless of how Haiku formatted the tag in a given extraction pass.
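A minimal sketch of that normalization, assuming kebab-case is the canonical form (the exact rules in the real implementation may differ, and both function names are hypothetical):

```typescript
// Convert one tag to lowercase kebab-case.
function normalizeTag(tag: string): string {
  return tag
    .trim()
    .replace(/([a-z0-9])([A-Z])/g, "$1-$2") // split camelCase: stateManagement -> state-Management
    .replace(/[\s_]+/g, "-")                // spaces and underscores become hyphens
    .toLowerCase()
    .replace(/[^a-z0-9-]/g, "")             // drop any other punctuation
    .replace(/-+/g, "-")                    // collapse repeated hyphens
    .replace(/^-|-$/g, "");                 // trim a leading or trailing hyphen
}

// Normalize a list of tags, dropping empties and duplicates while keeping order.
function normalizeTags(tags: string[]): string[] {
  const out: string[] = [];
  for (const raw of tags) {
    const tag = normalizeTag(raw);
    if (tag !== "" && out.indexOf(tag) === -1) out.push(tag);
  }
  return out;
}
```

Deduplicating after normalization means a single extraction pass that emits both 'state management' and 'stateManagement' stores the tag only once.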

Storage: SQLite with FTS5

Decisions are stored in SQLite with a full-text search index using FTS5. The FTS index covers the question, answer, and tags fields. This gives us sub-millisecond search across thousands of decisions with no external search infrastructure. The decisions table also has standard indexes on sessionId, projectId, and type for filtered queries.

sql

-- SQLite FTS5 virtual table for decision search
CREATE VIRTUAL TABLE decisions_fts USING fts5(
  question,
  answer,
  tags,
  content=decisions,
  content_rowid=id
);

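-- Triggers to keep FTS in sync. With an external-content table, deletes
-- and updates must remove the old row from the index using the special
-- 'delete' command, following the pattern in the SQLite FTS5 docs.
CREATE TRIGGER decisions_ai AFTER INSERT ON decisions BEGIN
  INSERT INTO decisions_fts(rowid, question, answer, tags)
  VALUES (new.id, new.question, new.answer, new.tags);
END;

CREATE TRIGGER decisions_ad AFTER DELETE ON decisions BEGIN
  INSERT INTO decisions_fts(decisions_fts, rowid, question, answer, tags)
  VALUES ('delete', old.id, old.question, old.answer, old.tags);
END;

CREATE TRIGGER decisions_au AFTER UPDATE ON decisions BEGIN
  INSERT INTO decisions_fts(decisions_fts, rowid, question, answer, tags)
  VALUES ('delete', old.id, old.question, old.answer, old.tags);
  INSERT INTO decisions_fts(rowid, question, answer, tags)
  VALUES (new.id, new.question, new.answer, new.tags);
END;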
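On the query side, user input needs translating into an FTS5 MATCH expression. The sketch below shows one way a search like `tag:state-management redux` might be handled, using an FTS5 column filter for the tag part. The function names, the `SEARCH_SQL` constant, and the result limit of 20 are assumptions for illustration.

```typescript
// Quote a term as an FTS5 string; embedded double quotes are escaped by doubling.
function quoteFts(term: string): string {
  return '"' + term.replace(/"/g, '""') + '"';
}

// Turn free-text input into an FTS5 MATCH expression.
// "tag:foo" becomes a column filter on tags; other tokens match any indexed column.
function toFtsMatch(input: string): string {
  const parts: string[] = [];
  for (const token of input.trim().split(/\s+/)) {
    if (token === "") continue;
    if (token.toLowerCase().indexOf("tag:") === 0 && token.length > 4) {
      parts.push("tags : " + quoteFts(token.slice(4)));
    } else {
      parts.push(quoteFts(token));
    }
  }
  // An empty result means there is nothing to search for; callers should skip the query.
  return parts.join(" AND ");
}

// Bind the MATCH expression as the single parameter. bm25() scores better matches lower,
// so ascending order puts the best matches first.
const SEARCH_SQL = `
  SELECT d.*
  FROM decisions_fts
  JOIN decisions AS d ON d.id = decisions_fts.rowid
  WHERE decisions_fts MATCH ?
  ORDER BY bm25(decisions_fts)
  LIMIT 20
`;
```

Quoting every token keeps characters that FTS5 treats as query syntax from causing parse errors.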

Team sync

Decisions are part of the metadata sync to Supabase. This is deliberate: decisions don't contain source code — they contain reasoning about code. They're safe to sync. When your teammate makes an architectural decision in their session, it shows up in your Decisions view after the next sync cycle. The whole team's decision history becomes a shared searchable knowledge base.

Smart surfacing

The most valuable moment to surface a decision is when you're about to make a related one. When a new session starts in a project, we query the decisions table for the 5 most relevant past decisions using the project name, recent file paths, and any keywords from the initial prompt. These appear as 'Relevant past decisions' in the Session Intelligence sidebar.
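One plausible way to build that query is to reduce the session context to a handful of keywords and OR them together in an FTS5 MATCH, letting bm25 ranking pick the top five. This is a sketch under assumptions: the `SessionContext` shape, the stopword list, and the 12-term cap are invented for illustration, and the real relevance logic may work differently.

```typescript
interface SessionContext {
  projectName: string;
  recentFilePaths: string[];
  initialPrompt: string;
}

// Small illustrative stopword list; a real one would be longer.
const STOPWORDS: string[] = [
  "the", "and", "for", "with", "this", "that", "from",
  "should", "what", "why", "how", "use", "using", "src",
];

// Extract distinct lowercase keywords from the project name, file paths, and prompt.
function contextKeywords(ctx: SessionContext, maxTerms = 12): string[] {
  const text = [ctx.projectName].concat(ctx.recentFilePaths, [ctx.initialPrompt]).join(" ");
  const out: string[] = [];
  for (const word of text.toLowerCase().split(/[^a-z0-9]+/)) {
    if (word.length < 3 || STOPWORDS.indexOf(word) !== -1 || out.indexOf(word) !== -1) continue;
    out.push(word);
    if (out.length === maxTerms) break;
  }
  return out;
}

// Keywords contain only [a-z0-9], so plain double quoting is safe here.
function surfacingMatch(ctx: SessionContext): string {
  return contextKeywords(ctx).map((w) => '"' + w + '"').join(" OR ");
}

// Parameters: the MATCH expression, then the project ID.
const SURFACING_SQL = `
  SELECT d.*
  FROM decisions_fts
  JOIN decisions AS d ON d.id = decisions_fts.rowid
  WHERE decisions_fts MATCH ? AND d.projectId = ?
  ORDER BY bm25(decisions_fts)
  LIMIT 5
`;
```

Using OR rather than AND is deliberate in this sketch: a new session rarely shares every keyword with a past decision, so ranking does the narrowing instead of the filter.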

The result: a searchable organizational memory of every architectural decision across all sessions, available in the moment you need it most. The reasoning that used to evaporate at session end now accumulates into institutional knowledge.
