Up North AI

How to Build a Karpathy-Style LLM Wiki From Your Meetings (Obsidian Tutorial)

A practical Obsidian workflow for piping meeting transcripts into Karpathy's LLM wiki pattern. Two paths: manual export and an MCP server. Includes the schema, prompts, and example queries.

MCP · obsidian · knowledge-graph · LLM · meetings · agents

Why this article exists

On April 2, 2026, Andrej Karpathy posted a workflow that broke the AI knowledge management space wide open. His core idea: stop using vector databases and RAG pipelines for personal knowledge bases. Instead, dump raw documents into a folder, let an LLM "compile" them into a structured markdown wiki, and use Obsidian as the front-end. Within five days the original tweet had 16M+ views, his GitHub gist had 5,000+ stars, and at least 15 working implementations had appeared on GitHub.

But there's a gap nobody is talking about: Karpathy explicitly names meeting transcripts as a source type, but no tool actually pipes meetings into his wiki pattern automatically. Every meeting tool — Otter, Fireflies, Granola, Read.ai, Circleback — has launched a Model Context Protocol (MCP) server in the last six months, but they're all read-only. None compile meetings into a persistent, structured knowledge base.

This article is the practical tutorial for bridging that gap. If you've read Karpathy's gist and you want your meetings in your Obsidian vault as first-class sources, here are two ways to do it today.

Worth reading first: our deeper analysis of the meeting-to-wiki gap on Proudfrog, which explains why the major meeting tools haven't built this themselves.

A quick recap of Karpathy's wiki pattern

If you haven't read the gist, the architecture is simple:

your-wiki/
├── raw/                    # Immutable source documents
│   ├── articles/
│   ├── papers/
│   └── meetings/           # ← What this tutorial adds
├── wiki/                   # LLM-owned compiled markdown
│   ├── index.md
│   ├── people/
│   ├── projects/
│   └── decisions/
└── CLAUDE.md               # Schema file: how the LLM should compile

Three operations drive the system:

  1. Ingest — Process new sources from raw/, update cross-references across the wiki
  2. Query — Read index.md, drill into relevant pages, synthesize an answer
  3. Lint — Audit for contradictions, orphaned pages, stale claims

The key insight: at moderate scale (~100 sources, hundreds of wiki pages), a well-maintained markdown index outperforms vector search. The LLM navigates structured files directly. No embeddings, no RAG infrastructure, no retrieval pipeline.
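The query operation is easy to picture concretely. A minimal sketch of "read index.md, drill into relevant pages" might look like the following — the file layout matches the tree above, but the function itself is illustrative, not Karpathy's code:

```python
import re
from pathlib import Path

def load_index_pages(wiki_dir: str) -> dict[str, str]:
    """Read index.md, follow its relative markdown links, return page contents."""
    wiki = Path(wiki_dir)
    index = (wiki / "index.md").read_text()
    pages = {}
    # Match relative markdown links like [Acme](projects/acme.md)
    for title, rel_path in re.findall(r"\[([^\]]+)\]\(([^)]+\.md)\)", index):
        target = wiki / rel_path
        if target.exists():
            pages[title] = target.read_text()
    return pages
```

An agent does the same thing with file-read tools instead of a script: the index is just markdown links, so "retrieval" is ordinary file navigation.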

For the canonical version of how this works, read Karpathy's gist and our complete workflow guide on Proudfrog.

The meeting source problem

Karpathy's pattern works beautifully with articles and papers because tools like Obsidian Web Clipper turn web pages into clean markdown in one click. Meetings are different.

A meeting starts as audio. To get it into raw/meetings/ you need:

  1. Recording (the meeting tool)
  2. Transcription with speaker identification (the meeting tool)
  3. Export to markdown (most meeting tools don't do this cleanly)
  4. A way to keep the raw transcript stable while the wiki gets recompiled (most meeting tools store transcripts in their own database, not your filesystem)

This is where most workflows break. Meeting tools optimize for "search past meetings," not "give me a markdown file I own forever." The transcripts live in someone else's database. Even if you can export, the format is usually inconsistent — sometimes JSON, sometimes a PDF, sometimes a UI dump that loses speaker labels.

Two ways to solve this today: a manual export workflow, or an MCP server that exposes meetings as a queryable resource.

Path A: Manual markdown export

The simpler approach. Works with any meeting tool that lets you export a transcript as text or markdown.

Proudfrog, the Nordic meeting transcription tool we built at Up North AI, exports any meeting as a markdown file with speaker labels, timestamps, and frontmatter. The export looks like this:

---
title: "Q1 Roadmap Review with Acme Corp"
date: "2026-04-03"
participants: ["Klara Lindqvist", "Erik Nilsson", "Sarah Chen (Acme)"]
duration_minutes: 47
language: "en"
source: "proudfrog"
---

## Transcript

**Klara Lindqvist** [00:00:14]
Welcome everyone. I'd like to start by walking through the three things we
agreed to ship this quarter and where we are on each.

**Sarah Chen (Acme)** [00:00:31]
Sounds good. I have some concerns about the timeline on the second item but
let's go through them in order.

...

Drop that file into raw/meetings/2026-04-03-acme-q1-roadmap.md. That's it. Your wiki ingest pass will pick it up on the next run.

For other meeting tools, you'll need to clean up the export — strip headers, normalize speaker labels, convert timestamps. A 30-line shell script can usually do it, but the friction adds up if you have meetings every day.
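As a starting point, here is a sketch of such a cleanup script in Python rather than shell. It assumes the raw export puts the speaker and timestamp on their own line (e.g. `Klara Lindqvist  00:01:23`) — your tool's format will differ, so treat the regex as a placeholder:

```python
import re

# Assumed raw format: "Speaker Name  00:01:23" on its own line,
# followed by the utterance text. Adjust for your tool's export.
SPEAKER_LINE = re.compile(
    r"^(?P<name>[A-Z][\w .()'-]+?)\s+(?P<ts>\d{1,2}:\d{2}(?::\d{2})?)\s*$"
)

def normalize_transcript(raw: str, title: str, date: str) -> str:
    """Convert a plain-text export into markdown with frontmatter."""
    lines = ["---", f'title: "{title}"', f'date: "{date}"', "---", "", "## Transcript", ""]
    for line in raw.splitlines():
        m = SPEAKER_LINE.match(line.strip())
        if m:
            ts = m.group("ts")
            if ts.count(":") == 1:  # pad mm:ss exports to hh:mm:ss
                ts = "00:" + ts
            lines.append(f"**{m.group('name')}** [{ts}]")
        elif line.strip():
            lines.append(line.strip())
    return "\n".join(lines) + "\n"
```

Run it over each export and write the result into raw/meetings/ with a date-prefixed filename, and the ingest pass treats it like any other source.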

Path B: An MCP server for meetings

This is the more interesting approach. Instead of manually exporting transcripts, you point your AI agent (Claude Code, Cursor, Codex, etc.) at a Model Context Protocol server that exposes meetings as queryable resources.

We're building this for Proudfrog right now. The Proudfrog MCP server (currently in beta) exposes:

  • list_meetings — List recent meetings, filterable by date, participant, project
  • get_meeting — Fetch a full transcript by ID, formatted as markdown with frontmatter
  • search_meetings — Semantic search across all your meetings
  • list_speakers — Get the canonical list of identified speakers across your library
  • get_decisions — Retrieve extracted decisions from a meeting or date range

Once configured, your agent can ingest meetings into the wiki on demand:

You: Ingest all meetings from this week into the wiki

Claude Code:
- Calls list_meetings(after="2026-04-01")
- Returns 12 meetings
- For each meeting, calls get_meeting(id)
- Writes each transcript to raw/meetings/
- Runs the wiki ingest pass
- Updates index.md, people/, projects/, decisions/

The MCP approach has three advantages over manual export:

  1. No manual step — The agent fetches what it needs when it needs it
  2. Composable with other sources — The same agent can also call your GitHub MCP, Notion MCP, etc., compiling cross-source knowledge in one pass
  3. Idempotent — Re-running the ingest doesn't duplicate; the agent checks raw/ for what's already there

The Proudfrog MCP is in beta. If you want early access, email us or check the Proudfrog roadmap.

The schema file: teaching the LLM what to do with meetings

Karpathy's gist suggests a CLAUDE.md or AGENTS.md file at the root of your wiki to define how the LLM should compile content. For meetings, the schema needs a few additions beyond the default Karpathy template:

# Wiki Compilation Schema

## Source Types
- `raw/articles/` — long-form articles and blog posts
- `raw/papers/` — academic papers (PDFs converted to .md)
- `raw/meetings/` — meeting transcripts with frontmatter

## Meeting Ingest Rules

When processing a file from raw/meetings/:

1. Extract all named participants and update wiki/people/<name>.md
   - Add a new bullet under "## Meetings" with date, title, and key role
   - If the person is new, create the page from a template

2. Extract decisions made in the meeting
   - A decision is any statement matching: "we decided", "we agreed",
     "let's go with", "the plan is", etc.
   - Add each decision to wiki/decisions/<date>-<short-slug>.md
   - Cross-link to the source meeting in raw/

3. Extract action items
   - An action item is any commitment with an assignee
   - Add to wiki/people/<assignee>.md under "## Open commitments"
   - Mark as resolved when a later meeting confirms completion

4. Extract project mentions
   - Update wiki/projects/<project>.md
   - Add a "## Recent activity" entry pointing to the meeting

## Linting Rules

When running the lint pass on meeting-derived content:

- Flag contradictions: if a decision in meeting A conflicts with a
  decision in meeting B, surface both
- Flag stale commitments: action items older than 30 days with no
  resolution
- Flag orphaned people: speakers who appear in only one meeting and
  have no other context

This schema is the control surface. The LLM follows it on every ingest pass, which means your wiki develops a consistent shape over time. Without a schema, the LLM improvises — sometimes well, sometimes erratically.
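Rule 2's phrase matching is simple enough that you can sanity-check it outside the LLM. A minimal sketch — the marker list mirrors the schema, and the naive sentence split is an assumption:

```python
import re

# Decision markers from rule 2 of the schema; extend as needed.
DECISION_MARKERS = ("we decided", "we agreed", "let's go with", "the plan is")

def extract_decisions(transcript: str) -> list[str]:
    """Return sentences that look like decisions, per the ingest rules."""
    decisions = []
    # Rough sentence split on terminal punctuation.
    for sentence in re.split(r"(?<=[.!?])\s+", transcript):
        lowered = sentence.lower()
        if any(marker in lowered for marker in DECISION_MARKERS):
            decisions.append(sentence.strip())
    return decisions
```

The LLM is better than a regex at catching implicit decisions ("fine, ship it Friday"), but a deterministic pass like this is useful as a lint baseline: anything the regex finds that the wiki lacks is a missed extraction.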

Example queries that actually work

Once you have meetings flowing into the wiki, the queries you can run change qualitatively. A few examples from our own wiki:

Tracking commitments across meetings:

Query the wiki: What did Sarah at Acme commit to in Q1, and which of
those commitments are still open?

The agent reads wiki/people/sarah-chen-acme.md, walks through her open commitments, cross-references decision logs, and returns a clean status report. No human had to maintain this list — every commitment was extracted automatically during ingest.

Spotting contradictions:

Query the wiki: Did we make any decisions about the Acme Q1 timeline
that conflict with what we agreed in March?

The lint pass already flagged contradictions if they exist. The agent reads the lint report and surfaces them with citations to the source meetings.
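The stale-commitment rule from the linting section is similarly mechanical. A sketch, assuming each commitment in the wiki carries an `opened` date and a `resolved` flag (field names are ours, not a fixed format):

```python
from datetime import date

def stale_commitments(commitments: list[dict], today: date,
                      max_age_days: int = 30) -> list[dict]:
    """Flag open commitments older than max_age_days, per the linting rules."""
    return [
        c for c in commitments
        if not c["resolved"] and (today - c["opened"]).days > max_age_days
    ]
```

Whether the lint pass runs as a deterministic script like this or as an LLM pass over the wiki pages, the output is the same: a report the agent can cite when you ask about open commitments.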

Cross-meeting synthesis as a persistent artifact:

Query the wiki: Compile everything we know about Acme's pricing concerns
across all meetings into a single wiki page.

The agent queries wiki/projects/acme.md, walks the linked decisions, finds every meeting that mentions pricing, and writes a new wiki page summarizing the thread. The page persists — next week, the next ingest pass updates it incrementally instead of regenerating from scratch.

This is what Karpathy calls "compiling knowledge once and keeping it current, not re-derived on every query." It's also what no major meeting tool currently does.

What this won't solve

Two honest caveats:

1. Cost. Each ingest pass touches multiple wiki pages. Linting passes scale with wiki size. Running this on Claude Sonnet for personal use is affordable; running it on Claude Opus across hundreds of meetings is not. We'll publish real numbers once we have a few months of usage data — for now, set a budget cap on your API key.

2. Hallucination contamination. If the LLM extracts a wrong commitment from a meeting and writes it into wiki/people/<name>.md, that error lives in your wiki and influences future queries. Steph Ango (Obsidian's co-creator) recommends keeping personal notes and agent-maintained content in separate vaults for exactly this reason. We agree. We have more on this in the LLM wiki skeptic's guide on Proudfrog.

Why this is worth building

We've spent the last few months building Proudfrog around a specific hypothesis: meetings are the right scope for an AI-maintained knowledge base because the input is naturally bounded. You don't have to decide what to capture — meetings happen anyway. And the source material is rich enough that LLM extraction is reasonably grounded.

The Karpathy pattern validates the approach from a different direction. He showed that structured markdown + LLM compilation outperforms RAG at practical scale. We've been building the infrastructure to make that pattern work for one specific source: your meetings.

If you want to try it today, Proudfrog gives you the markdown export. The MCP server is in beta — join the list for early access. If you're building something similar or want to talk about how meeting knowledge graphs fit into your enterprise AI stack, get in touch.

Frequently Asked Questions

Do I need to be a developer to use this workflow?

Path A (manual export) requires no coding — drop a markdown file into a folder. Path B (MCP) requires you to configure an MCP server in Claude Code, Cursor, or another agent client. That's a one-time setup, not ongoing development.

Which meeting tools work with this pattern today?

Any tool that lets you export a transcript as text or markdown. Proudfrog exports natively as markdown with frontmatter. Otter, Fireflies, and Granola let you copy transcripts but you'll need to clean up the format. tl;dv exports markdown but without consistent speaker labels.

How is this different from just using Otter's "ask my meetings" feature?

Otter (and Granola, and Fireflies) let you query past meetings via a chat interface. The answers disappear after each session. The Karpathy pattern persists answers as wiki pages — so the next time you ask a similar question, the LLM reads the existing wiki page instead of re-deriving the answer from raw transcripts. Knowledge compounds.

What's the difference between MCP and just calling the Proudfrog API?

MCP standardizes how AI agents discover and call tools. With an MCP server, any agent that speaks MCP (Claude Code, Cursor, Codex, etc.) can use Proudfrog without custom integration code. Without MCP, you'd write a custom client for every agent. MCP is to AI agents what HTTP is to browsers — read more about why MCP is the foundation layer that actually works.

Can I use this with local LLMs instead of Claude or GPT?

Yes, with caveats. Tools like Ollama and LM Studio run open-source models (Gemma, Llama) locally, which solves the cost and privacy questions. But model quality matters for compilation tasks — smaller models tend to miss entity references and produce lower-quality summaries. We recommend starting with cloud models, validating the workflow, then evaluating local models for your specific use case.

What does Up North AI do with this?

Up North AI is a Nordic AI consultancy. Proudfrog is our product — a meeting transcription tool built for Nordic languages with a knowledge graph as the core value proposition. We help enterprises build similar AI-native knowledge systems for their own data. If that's your problem, come talk to us.

Want to go deeper?

We explore the frontier of AI-built software by actually building it. See what we're working on.