---
title: "Microsoft GraphRAG: what we learned"
url: https://mdfy.app/6WkjlKgA
updated: 2026-05-14T18:15:49.480Z
source: "mdfy.app"
---
# Microsoft GraphRAG: what we learned

> Read the 2024 paper and the follow-ups (Project NotebookLM, the v1.1 release, the open-source community fork). Notes for the team to reference when "GraphRAG" comes up.

## The thesis, in one sentence

> Build a knowledge graph from a document corpus, run community detection on it, and answer queries by traversing the graph instead of just embedding-matching.
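
Here is a minimal sketch of the traversal half of that sentence. It assumes the graph has already been extracted into an in-memory adjacency map (each entity mapped to its neighboring entities), plus a map from each entity to the documents that mention it. Both structures and the function names are hypothetical. GraphRAG's own query path is more involved, since it also draws on community summaries. The point is the shape of the retrieval: it expands outward from the entities named in a query, so it can reach a document that shares no vocabulary with the query.

```python
from collections import deque


def k_hop_neighborhood(adjacency, seeds, max_hops):
    """BFS outward from the seed entities; returns {entity: hop distance}.

    `adjacency` is a hypothetical dict[str, set[str]] built at extraction time.
    """
    dist = {s: 0 for s in seeds if s in adjacency}
    frontier = deque(dist)
    while frontier:
        node = frontier.popleft()
        if dist[node] == max_hops:
            continue  # do not expand past the hop budget
        for neighbor in adjacency.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                frontier.append(neighbor)
    return dist


def docs_for(entities, entity_docs):
    """Collect the source documents that mention any retrieved entity."""
    return sorted({doc for e in entities for doc in entity_docs.get(e, ())})
```

With `adjacency = {"X": {"Acme"}, "Acme": {"X", "Y"}, "Y": {"Acme"}}` and each entity mentioned in a different document, a two-hop expansion from `X` reaches `Y`'s document through the shared entity `Acme`. A top-k embedding lookup on the query text has no equivalent path.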

## What's good

- **Multi-hop reasoning.** GraphRAG beats naive RAG on the "compare X and Y across documents" class of questions. The community-detection step gives it a structural prior that vector recall misses.
- **Honest about its costs.** The paper is explicit that GraphRAG is 10-50x more expensive to build than naive RAG, because graph extraction requires an LLM call for every chunk. They don't bury this.
- **The open-source release is real.** The Python package works, and the community fork (rust-graphrag) cuts indexing time by roughly 3x.
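
To make "an LLM call per chunk" concrete: build cost scales with the number of chunks times the number of extraction passes per chunk, while a naive RAG build needs only one embedding call per chunk. The sketch below is back-of-envelope arithmetic. The chunk size, overlap, prompt overhead, and pass count are illustrative assumptions, not GraphRAG's configured defaults, and the sketch does not try to reproduce the paper's 10-50x figure.

```python
import math


def chunk_count(corpus_tokens: int, chunk_size: int, overlap: int) -> int:
    """Number of fixed-size, overlapping chunks needed to cover the corpus."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    if corpus_tokens <= 0:
        return 0
    if corpus_tokens <= chunk_size:
        return 1
    stride = chunk_size - overlap  # each new chunk advances by this many tokens
    return 1 + math.ceil((corpus_tokens - chunk_size) / stride)


def extraction_estimate(corpus_tokens: int, chunk_size: int = 1200, overlap: int = 100,
                        prompt_tokens: int = 2000, passes_per_chunk: int = 1) -> dict:
    """Rough upper bound on LLM calls and input tokens for per-chunk graph extraction.

    All default parameter values are illustrative assumptions, not measured values.
    """
    chunks = chunk_count(corpus_tokens, chunk_size, overlap)
    calls = chunks * passes_per_chunk
    # Each call resends the extraction prompt plus a full chunk (an upper bound,
    # because the last chunk may be shorter than chunk_size).
    input_tokens = calls * (prompt_tokens + chunk_size)
    return {"chunks": chunks, "llm_calls": calls, "input_tokens": input_tokens}
```

Under these assumed settings, `extraction_estimate(1_000_000)` gives 909 chunks and 909 LLM calls, and `passes_per_chunk=2` doubles that to 1818. A naive RAG build over the same chunks would need about 909 embedding calls. Embedding calls are typically much cheaper per token than generation calls, and that gap is the intuition behind the multiplier.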

## What's structurally different from us

GraphRAG is a **service**. You hand it a corpus, it builds an index, and it answers queries internally on behalf of whatever system calls it. The graph never leaves the service.

mdfy is a **delivery model**. We don't traverse the graph internally on the receiver's behalf — we ship the graph in the URL response and let the receiving AI inherit it. Same primitive (a knowledge graph over docs), different delivery shape.
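
The sketch below shows what "ship the graph in the URL response" could look like on the wire. It assumes a JSON body that carries the document alongside its extracted entities and relations. Every class and field name here is hypothetical, and this is not mdfy's actual response format. It only illustrates the contrast with GraphRAG, where the equivalent structures stay inside the service.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Entity:  # hypothetical shape
    id: str          # stable identifier the receiver can refer back to
    label: str       # human-readable name
    source_doc: str  # which document the entity was extracted from


@dataclass
class Relation:  # hypothetical shape
    source: str  # Entity.id
    target: str  # Entity.id
    kind: str    # e.g. "uses"


@dataclass
class GraphPayload:  # hypothetical response body: document plus its graph
    url: str
    markdown: str
    entities: list[Entity] = field(default_factory=list)
    relations: list[Relation] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recurses into nested dataclasses, so the whole graph
        # serializes as plain JSON the receiving model can read directly.
        return json.dumps(asdict(self), indent=2)


payload = GraphPayload(
    url="https://mdfy.app/6WkjlKgA",
    markdown="# Microsoft GraphRAG: what we learned",
    entities=[Entity("graphrag", "GraphRAG", "6WkjlKgA"), Entity("leiden", "Leiden", "6WkjlKgA")],
    relations=[Relation("graphrag", "leiden", "uses")],
)
```

The receiver gets the graph as data it can reason over in its own context, not just the answer to a query someone else ran against it.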

## Where the comparison breaks down

The question "is mdfy a GraphRAG implementation?" is the wrong question. We share the substrate (LLM-extracted entities + relations over a markdown corpus). We don't share the API surface. GraphRAG is a Python package you embed in a backend; mdfy is a URL you paste into Claude.

## What we should take

1. **Community detection.** We currently group concepts by simple cosine clustering. GraphRAG's Leiden community detection is more robust across corpora of widely varying size. Worth porting.
2. **Hierarchical summaries.** GraphRAG indexes summaries at multiple zoom levels (community → community-of-communities). We index summaries at the bundle level only. Possible v7 expansion.
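
The sketch below illustrates the modularity objective behind item 1. Leiden is not in Python's standard library, so this shows only the greedy local-moving phase that Louvain and Leiden both begin with: each node joins whichever neighboring community most increases modularity, and the pass repeats until nothing moves. The sketch skips Leiden's refinement and aggregation steps (refinement is what guarantees well-connected communities). It also assumes an unweighted edge list and recomputes modularity from scratch for every candidate move, so treat it as an illustration, not a port. Unlike cosine clustering over embeddings, the grouping comes from link structure alone. For real use, a library implementation would be the starting point; networkx, for instance, ships Louvain as `louvain_communities`.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity of an unweighted, undirected graph under a node->community map."""
    m = len(edges)
    intra = defaultdict(int)   # edges with both endpoints inside community c
    degree = defaultdict(int)  # sum of node degrees inside community c
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            intra[community[u]] += 1
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def local_moving(edges):
    """Greedy local-moving phase only (no Leiden refinement or aggregation)."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    nodes = sorted(neighbors)
    community = {n: i for i, n in enumerate(nodes)}  # start with singletons
    improved = True
    while improved:
        improved = False
        for node in nodes:
            best_c, best_q = community[node], modularity(edges, community)
            for c in {community[nb] for nb in neighbors[node]}:
                original = community[node]
                community[node] = c  # tentatively move the node
                q = modularity(edges, community)
                community[node] = original
                if q > best_q + 1e-12:  # strict gain, so the loop terminates
                    best_c, best_q = c, q
            if best_c != community[node]:
                community[node] = best_c
                improved = True
    groups = defaultdict(set)
    for node, c in community.items():
        groups[c].add(node)
    return sorted(groups.values(), key=sorted)
```

On two triangles joined by a single bridge edge, the pass settles on the two triangles as communities, with modularity 5/14.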

## What we should leave

The whole "build-time is expensive, query-time is fast" promise. Our users won't tolerate a build step. The graph extraction has to be a streaming-friendly background job, not a batch one.
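
One way to read "streaming-friendly background job": each document is extracted and merged into the graph as soon as it arrives, and the graph stays queryable in between, instead of waiting for a full-corpus batch. The sketch assumes a single background worker thread. It uses `toy_extract`, a hypothetical stand-in for the LLM extraction call that just reads `A -> B` lines. Class and method names are illustrative, not an existing mdfy component.

```python
import queue
import threading
from collections import defaultdict


def toy_extract(doc_text):
    """Hypothetical stand-in for the LLM extraction call: reads 'A -> B' lines."""
    triples = []
    for line in doc_text.splitlines():
        if "->" in line:
            head, tail = (part.strip() for part in line.split("->", 1))
            triples.append((head, "related_to", tail))
    return triples


class IncrementalGraph:
    """Merges each submitted document into the graph as soon as it is extracted."""

    def __init__(self, extract=toy_extract):
        self._extract = extract
        self._edges = defaultdict(set)  # toy graph: adjacency only, relation kinds dropped
        self._lock = threading.Lock()
        self._inbox = queue.Queue()
        # One daemon worker drains the inbox in the background.
        threading.Thread(target=self._run, daemon=True).start()

    def submit(self, doc_text):
        """Enqueue a document and return immediately; no build step blocks the caller."""
        self._inbox.put(doc_text)

    def _run(self):
        while True:
            doc_text = self._inbox.get()
            try:
                # The slow call happens outside the lock so readers are not blocked.
                triples = self._extract(doc_text)
            except Exception:
                triples = []  # sketch: skip the failed doc; a real job would log or retry
            with self._lock:
                for head, _relation, tail in triples:
                    self._edges[head].add(tail)
                    self._edges[tail].add(head)
            self._inbox.task_done()

    def wait_idle(self):
        """Block until every submitted document has been merged (handy in tests)."""
        self._inbox.join()

    def neighbors(self, entity):
        with self._lock:
            return set(self._edges.get(entity, ()))
```

Because the slow extraction call runs outside the lock, readers calling `neighbors()` block only for the brief merge. A production version would also need persistence, retries, and entity de-duplication.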