Up North AI
5 min read

We're Building an MCP Server for Meeting Transcripts. Here's Why It Changes the LLM Wiki Workflow.

Karpathy's LLM wiki pattern needs raw sources you control. Every meeting tool launched a read-only MCP server in the last six months. We're building the one that closes the loop.

Tags: MCP, knowledge-graph, meetings, LLM, agents, proudfrog

The pattern that exposes the gap

When Andrej Karpathy posted his LLM wiki workflow on April 2, 2026, he described an architecture with two strict folders:

  • raw/ — immutable source documents you control
  • wiki/ — compiled markdown that the LLM owns

The raw/ folder is where the pattern lives or dies. Whatever ends up there becomes the substrate for everything the LLM compiles. Articles via Obsidian Web Clipper. PDFs via a converter script. Web pages via a scraper. Karpathy's gist names a few sources explicitly — articles, papers, web content, and one that nobody is solving cleanly: meeting transcripts.
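Concretely, the two-folder layout looks something like this. The subfolder names are illustrative; Karpathy's gist only fixes the raw/ and wiki/ split:

```text
notes/
├── raw/                  # immutable sources, human-curated
│   ├── articles/
│   ├── papers/
│   └── meetings/         # the gap this post is about
└── wiki/                 # LLM-owned, compiled markdown
    ├── CLAUDE.md         # schema rules for the ingest pass
    └── projects/
```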

In the months leading up to Karpathy's post, every major meeting tool launched a Model Context Protocol (MCP) server: Otter, Fireflies, Granola, Read.ai, Circleback, tl;dv, Fathom. Read our analysis of MCP as the foundation layer that actually works for why this matters in general. But here's the specific problem with meeting MCPs: every single one is read-only.

That's the gap. Meeting transcripts are some of the highest-value source material in any organization, MCP is the standardized way for AI agents to access that material, and yet every implementation stops at "let agents query past meetings." None of them write back. None of them treat the meeting transcript as a first-class artifact you can pipe into a Karpathy-style wiki and have a real knowledge base by next month.

We're building the one that does. This post explains why, what it does, and how it works.

The read-only MCP problem

Let's be specific about what's missing. Imagine you set up a Karpathy-style wiki and you want to ingest a week of client meetings. With the existing meeting MCPs, here's what an agent can do:

Otter MCP (October 2025) — list_conversations, get_conversation, search. You can pull a transcript. You cannot get a structured speaker list. You cannot extract decisions as a queryable resource. You cannot tell Otter "after this meeting, update my project page."

Fireflies MCP (community + official) — Same shape. AskFred is a chat overlay on top of search. The MCP exposes meetings as flat text blobs. Cross-meeting synthesis is on you.

Granola MCP (February 2026) — Folder-based search across your meeting library. You can query within folders. You cannot retrieve a clean markdown export with speaker frontmatter. Meeting → wiki requires a manual reformatting step.

Read.ai MCP (March 2026 beta) — The most ambitious of the bunch. Search Copilot unifies meetings, emails, and chats. Still read-only. The persistent artifact you can build from the data is "a chat history with the agent," not "a wiki page that compounds across sessions."

The pattern is clear. Every MCP was built for retrieval — give an agent a question, let it pull relevant snippets from the meeting database. None were built for compilation — give an agent a corpus, let it write a structured knowledge artifact and maintain it over time.

This isn't a bug in the protocol. MCP supports tools that mutate state. The meeting tool vendors just haven't built that side. They optimize for the demo where an agent answers a question about a past meeting. They don't optimize for the workflow where an agent processes 12 meetings into 47 wiki updates and a flagged contradiction.

What we're building

The Proudfrog MCP server (currently in beta) exposes meetings as a first-class queryable resource designed for the Karpathy pattern. The tool surface is opinionated — we built it from the ingest side, not the chat side.

Tools
├── list_meetings(after, before, participant?, project?)
│   Returns: [{id, title, date, duration, participants}]
│
├── get_meeting(id)
│   Returns: full markdown transcript with frontmatter
│   Format: Karpathy raw/ compatible
│
├── search_meetings(query, limit?, language?)
│   Semantic search across the full library
│
├── list_speakers()
│   Canonical speaker list with merged identities
│
├── get_decisions(meeting_id?, after?, before?)
│   Extracted decisions across one or many meetings
│
└── get_action_items(participant?, status?)
    Open and resolved commitments by person

Resources
├── meetings://raw/{id}.md
│   Markdown transcript, frontmatter-formatted
│   for direct drop into a Karpathy raw/ folder
│
├── meetings://decisions/{date_range}
│   Decision logs as queryable resources
│
└── meetings://entities/{type}
    People, projects, companies as structured data

The two design choices that matter most:

1. Markdown is the native return format. When an agent calls get_meeting, it gets a clean markdown file with YAML frontmatter — the exact shape that drops into a Karpathy raw/meetings/ folder. No JSON-to-markdown conversion step. No reformatting. No information loss between the meeting tool and the wiki ingest pass.
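For illustration, a get_meeting return might look like this. The field names are our sketch of the beta format, not a frozen schema, and the dialogue is invented:

```markdown
---
id: mtg_2026-03-12_client-sync
title: Q2 scope sync
date: 2026-03-12
duration_min: 45
participants: [Sarah Lind, Ola Berg]
language: en
---

**Sarah Lind:** Let's lock the Q2 scope today.
**Ola Berg:** Agreed. Two items carried over from last week.
```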

2. Structured extraction is server-side, not client-side. Decisions, action items, and entities are extracted during transcription and exposed as first-class tools. An agent doesn't have to re-extract them on every query. The meeting "knows" what decisions it contains. This is possible because Proudfrog's pipeline runs entity extraction as part of transcription, not as a downstream chat feature.
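A get_decisions call then returns already-structured records rather than raw text. Something along these lines, where the shape and values are purely illustrative:

```json
[
  {
    "decision": "Ship the MCP beta behind OAuth only",
    "meeting_id": "mtg_2026-03-12_client-sync",
    "decided_by": ["Sarah Lind"],
    "date": "2026-03-12"
  }
]
```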

The combination means you can wire up an ingest workflow in about 20 lines of agent prompt:

Every Monday at 9am:
1. Call list_meetings(after=last_monday) on the Proudfrog MCP
2. For each meeting, call get_meeting(id) and write to raw/meetings/
3. Call get_decisions(after=last_monday) and write to raw/decisions.json
4. Run the wiki ingest pass using CLAUDE.md schema rules
5. Run the lint pass to flag contradictions
6. Email me a summary of what changed in the wiki this week

That's the workflow Karpathy described, applied to meetings, with no manual export step.
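Steps 1 and 2 of that prompt can be sketched in a few lines of Python. Here `call_tool` is a hypothetical stand-in for whatever MCP client you use, stubbed with sample data so the flow is runnable; the real tool names match the surface above, but the client plumbing is an assumption:

```python
# Sketch of the weekly ingest loop (steps 1-2 of the Monday workflow).
# `call_tool` stands in for an MCP client call and is stubbed here.
from datetime import date
from pathlib import Path

SAMPLE_MEETINGS = {
    "m1": "---\ntitle: Client sync\ndate: 2026-03-09\n---\n\nTranscript...",
}

def call_tool(name: str, **kwargs):
    # Hypothetical stub: a real client would dispatch over MCP.
    if name == "list_meetings":
        return [{"id": mid} for mid in SAMPLE_MEETINGS]
    if name == "get_meeting":
        return SAMPLE_MEETINGS[kwargs["id"]]
    raise ValueError(f"unknown tool: {name}")

def ingest_week(raw_dir: Path, after: date) -> list[Path]:
    """Pull each new meeting as markdown into raw/meetings/."""
    written = []
    for meeting in call_tool("list_meetings", after=after.isoformat()):
        path = raw_dir / f"{meeting['id']}.md"
        path.write_text(call_tool("get_meeting", id=meeting["id"]))
        written.append(path)
    return written
```

The remaining steps (decision export, wiki ingest, lint, summary email) hang off the same loop; the point is that no manual export sits between the meeting tool and raw/.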

Why this matters beyond meetings

There's a broader principle at work here. The MCP ecosystem is still figuring out what "good" looks like. Most servers in 2025 were built as retrieval interfaces — wrap an existing API in MCP, expose a search and a get, ship it. That's fine for a database query. It's wrong for a knowledge corpus.

A knowledge corpus needs MCP tools that match the consumption pattern of an LLM doing structured compilation:

  • Bounded enumeration — agents need to know what exists (list_* with predictable filters)
  • Idempotent retrieval — calling the same get_* twice should produce the same artifact
  • Structured extracts as first-class — decisions, entities, relationships should be queryable directly, not re-extracted from raw text on every call
  • Markdown-native output — agents work in markdown; wrapping JSON in another transformation step adds latency, cost, and error surface
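To make "idempotent retrieval" concrete: the server should render each artifact deterministically from stored data, so two identical get_* calls yield byte-identical output that agents can cache and diff. A minimal sketch, with names that are ours rather than any real API:

```python
def render_meeting(record: dict) -> str:
    """Deterministically render a stored meeting record to markdown.

    No render-time timestamps and no nondeterministic key ordering:
    the same record always produces the same bytes.
    """
    front = "\n".join(
        f"{key}: {record[key]}" for key in sorted(record) if key != "transcript"
    )
    return f"---\n{front}\n---\n\n{record['transcript']}\n"

record = {
    "id": "m1",
    "title": "Client sync",
    "date": "2026-03-09",
    "transcript": "Sarah: Let's lock scope.",
}
assert render_meeting(record) == render_meeting(record)  # byte-identical
```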

We applied these principles to meetings because meetings are what we know. But they generalize. If you're building an MCP server for your own corpus — your CRM history, your code review comments, your support tickets — the same shape applies. If your MCP is read-only and returns JSON, you're building for the wrong era.

The Up North AI angle

We're not just building Proudfrog. We're a Nordic AI consultancy at Up North AI, and we help enterprises navigate the shift from monolithic SaaS to agent-orchestrated systems. Read our recent piece on the build vs. buy equation in the agent era for the broader argument.

Proudfrog is the product we built because we kept seeing the same gap on consulting engagements: organizations have years of accumulated meeting knowledge, every employee feels its absence, and every existing tool stops at "search past meetings." We built Proudfrog to test the hypothesis that a meeting-native, write-back-capable knowledge graph is the missing layer.

The MCP server is the bridge between Proudfrog's product and the Karpathy pattern that's now gaining momentum. If you're a developer trying the Karpathy workflow and you want your meetings in your wiki, the tutorial we just published walks through both the manual export path and the MCP path step by step.

If you're an enterprise thinking about how meeting knowledge should flow into your AI stack, get in touch. We've been building this for a while.

What's in the beta and what's coming

The Proudfrog MCP server beta currently supports:

  • list_meetings with date and participant filters
  • get_meeting with markdown export
  • search_meetings with semantic search across the full library
  • list_speakers with canonical identity resolution
  • OAuth authentication for secure agent access

What's coming next:

  • get_decisions and get_action_items as first-class tools (currently extracted but not yet exposed via MCP)
  • Webhooks for real-time wiki updates after each meeting
  • Write-back tools: update_project_context, add_decision_note
  • Multi-tenant support for team wikis where multiple agents share access

If you want early access, email us or join the Proudfrog beta list. And if you'd rather wait for the stable release, the manual export workflow works today with the existing Proudfrog markdown export.

What we hope this is the start of

Meeting transcripts are the largest pool of unstructured organizational knowledge that exists. They contain the actual context behind every decision, the actual phrasing of every commitment, the actual people who said which thing. And until very recently, they were locked in databases optimized for search-and-forget rather than compile-and-keep.

Karpathy's pattern doesn't require any vendor's permission. The architecture is open. The schema files are markdown. The agents are interchangeable. What's been missing is source material that flows in cleanly, with structure intact. We're trying to fix that for meetings. We hope others fix it for the rest of the corpus.

If you're building MCP servers for any high-value source material, write us. The space is wide open and the patterns are still being set.

Frequently Asked Questions

What's wrong with the existing meeting tool MCPs?

Nothing, if you only want retrieval. The problem is that retrieval is the easier half of the workflow. Karpathy's pattern requires sources that an agent can ingest into a structured wiki and have that wiki compound over time. Read-only retrieval forces the agent to re-derive the same answers from the same raw transcripts on every query. The Proudfrog MCP is built around the ingest-to-wiki workflow.

Why Markdown instead of JSON?

LLMs are markdown-native. They write markdown by default, they reason about markdown structure naturally, and the wiki pattern Karpathy describes is entirely markdown-based. Forcing every transcript through a JSON intermediate adds latency and error surface for no benefit. The Proudfrog MCP returns markdown directly so agents can drop it into a raw/meetings/ folder without transformation.

How does this compare to building it yourself with Whisper and a database?

You can build a meeting transcript pipeline with open-source Whisper, a database, and a custom MCP server. We did, twice, before we built Proudfrog. The hard parts are: speaker identification across meetings (not within a single meeting), entity resolution when "Sarah" appears in 14 meetings as 6 different speakers, decision extraction that doesn't hallucinate, multi-language support for Nordic languages, and GDPR-aligned data residency. These are 18 months of work each. If you have an unlimited engineering budget, build it. If not, use Proudfrog and focus on the part of your stack that's actually unique.

What about privacy and EU data residency?

Proudfrog stores transcripts in EU regions only. The MCP server respects the same data residency boundaries — an agent calling the MCP can't pull data out of the EU. If you're an enterprise with data sovereignty requirements, this is a hard requirement, not a marketing line. Read our EU AI Act guide for SaaS for the broader compliance picture.

Will the MCP server be open source?

The schema and tool definitions will be. The implementation depends on Proudfrog's transcription pipeline, so the full server isn't open source — but we're publishing the MCP tool definitions so anyone building a similar server for their own corpus can match the conventions. We believe the patterns matter more than any single implementation.

How do I get into the beta?

Email us or join the Proudfrog beta list. We're rolling out access in batches over the next few weeks.

Want to go deeper?

We explore the frontier of AI-built software by actually building it. See what we're working on.