---
title: "mdfy Memory"
url: https://mdfy.app/mdfy-memory
updated: 2026-05-08T18:28:52.772Z
source: "mdfy-explainer"
---
# mdfy Memory

*Your AI memory, owned by you, readable by any AI you paste it to.*

---

## What "memory" means here

Every chat with ChatGPT, Claude, or Cursor produces useful answers. Tomorrow they're gone — the chat is closed, the share link rots, the next session has no idea what you decided last time. Vendors have started building memory layers (ChatGPT memory, Claude Projects, Cursor docs), but each one lives behind a vendor wall. They don't talk to each other, you can't share them, you can't read them outside the app, and you definitely can't paste them into the *other* AI tomorrow.

mdfy Memory is the inverse: a memory layer that lives at a public URL you control. Every captured answer is a markdown page anyone (you, your teammate, any AI agent) can read, and the whole hub is one URL that any AI can fetch as context.

The full architecture below is what makes that work — chunked indexing, hybrid retrieval, automatic refresh — but you only need to know the surface to use it.

---

## The surface (what you actually do)

### 1. Capture

- Paste a ChatGPT or Claude share URL into the editor.
- `/mdfy capture <title>` from inside Claude Code, Cursor, Codex CLI, or Aider.
- Drop a PDF, DOCX, or transcript file.

Each capture lands at `mdfy.app/<id>` as a permanent URL. No signup required.

### 2. Organize (or let mdfy do it)

Captures roll up into your hub at `mdfy.app/hub/<you>`. Bundles group docs by topic. You can curate manually, or let mdfy's auto-synthesis suggest groupings as the cluster forms.

### 3. Recall

Two ways:

- **Paste the hub URL** into any AI. It fetches the markdown index and loads your knowledge as context.
- **Hit the recall endpoint** for question-targeted retrieval — far fewer tokens, much higher precision:

```bash
curl -X POST https://mdfy.app/api/hub/<slug>/recall \
  -H "Content-Type: application/json" \
  -d '{
        "question": "How does mem0 extract memories?",
        "k": 5,
        "level": "chunk",
        "hybrid": true
      }'
```

That's the whole product surface. The rest of this doc is what's underneath.

---

## How the memory layer works (architecture)

mdfy Memory is built on the same shape Karpathy described in his [LLM Wiki gist](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f) — *raw / wiki / schema* — with the AI doing 80% of the curation work that he does by hand.

### Layer 1 — embeddings everywhere, idempotent

Every public doc carries a 1536-dimensional vector embedded with OpenAI `text-embedding-3-small`, indexed with HNSW for cosine similarity. Same for every bundle. Same, at a finer grain, for every chunk inside a doc.

The refresh is **idempotent**. Each artifact carries a sha256 hash of its source. When you save a doc:

1. The frontend debounces for 10 s after the last save.
2. It hits `POST /api/embed/<id>` (fire-and-forget).
3. The route hashes the current source. If the hash matches the stored one, it returns `{skipped: "unchanged"}` without ever calling OpenAI. Cost on a no-op save: zero.
4. If the hash differs, it embeds the doc, writes the vector + new hash, and continues (sketched below).
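
A minimal sketch of steps 3 and 4 in TypeScript. The row shape and the `embedText` helper are illustrative stand-ins, not mdfy's actual schema or embedding client:

```typescript
import { createHash } from "node:crypto";

// Stand-in for the real embedding call (text-embedding-3-small, 1536 dims).
async function embedText(text: string): Promise<number[]> {
  // ...call the embedding provider here...
  return [];
}

// Illustrative row shape: one doc with its source text, vector, and source hash.
interface DocRow {
  id: string;
  source: string;              // title + body
  embeddingSha: string | null; // sha256 of the source that was last embedded
  embedding: number[] | null;
}

async function refreshDocEmbedding(doc: DocRow): Promise<{ skipped?: string }> {
  const sha = createHash("sha256").update(doc.source).digest("hex");

  // Idempotency gate: unchanged source means no embedding call and no write.
  if (sha === doc.embeddingSha) return { skipped: "unchanged" };

  doc.embedding = await embedText(doc.source); // only now pay for the embed
  doc.embeddingSha = sha;                      // persist vector + new hash together
  return {};
}
```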

Same pattern at three levels:

| Level | Source | Trigger |
|---|---|---|
| **Doc** | title + body | doc save (10s debounce) |
| **Chunk** | each markdown heading subtree | runs alongside doc embed; only changed chunks re-embed; deleted sections pruned |
| **Bundle** | title + description + member doc titles | `/api/embed/bundle/<id>` |

Result: the schema layer is always *fresh enough* to retrieve from, without ever paying the full embed cost on an unchanged hub.

### Layer 2 — chunks by markdown structure

A doc isn't one vector. It's split on markdown headings (`#` / `##` / `###`); each chunk is the heading line plus everything until the next heading of equal or higher rank. Any prelude before the first heading becomes chunk 0. Sections longer than ~1800 chars are further split on blank-line boundaries, with the heading re-emitted at the top of each piece.

Each chunk carries a *breadcrumb*:

```
mdfy Memory > How the memory layer works > Layer 1 — embeddings everywhere
```

When recall returns chunks, the breadcrumb tells the LLM — and the human reading the JSON — exactly *where* in the doc the snippet came from.
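
A rough TypeScript sketch of that splitter and breadcrumb logic, under the rules above. The names are illustrative; fenced code blocks and the ~1800-char blank-line re-split are ignored for brevity:

```typescript
interface Chunk {
  breadcrumb: string; // "Doc title > section > subsection"
  markdown: string;   // heading line plus its subtree
}

// Split a markdown body on #/##/### headings. Each chunk is a heading's subtree,
// ending at the next heading of equal or higher rank; text before the first
// heading becomes chunk 0 under the doc title alone.
function chunkMarkdown(docTitle: string, body: string): Chunk[] {
  const lines = body.split("\n");
  const headings: { i: number; depth: number; title: string }[] = [];
  lines.forEach((text, i) => {
    const m = text.match(/^(#{1,3})\s+(.*)/);
    if (m) headings.push({ i, depth: m[1].length, title: m[2] });
  });

  const chunks: Chunk[] = [];

  // Chunk 0: any prelude before the first heading.
  const preludeEnd = headings.length ? headings[0].i : lines.length;
  const prelude = lines.slice(0, preludeEnd).join("\n").trim();
  if (prelude) chunks.push({ breadcrumb: docTitle, markdown: prelude });

  headings.forEach((h, idx) => {
    // The subtree ends at the next heading of equal or higher rank.
    const next = headings.slice(idx + 1).find((n) => n.depth <= h.depth);
    const end = next ? next.i : lines.length;

    // Breadcrumb: doc title, then the heading's ancestors, then the heading itself.
    const trail: string[] = [docTitle];
    for (const a of headings.slice(0, idx + 1)) {
      if (a.depth <= h.depth) {
        trail.length = a.depth;
        trail[a.depth] = a.title;
      }
    }

    chunks.push({
      breadcrumb: trail.filter(Boolean).join(" > "),
      markdown: lines.slice(h.i, end).join("\n"),
    });
  });

  return chunks;
}
```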

### Layer 3 — recall as an HTTP endpoint

The retrieval surface is a single public endpoint. No SDK, no API key, no MCP server.

```
POST mdfy.app/api/hub/<slug>/recall
body:
  {
    "question": "...",
    "k": 5,
    "level": "doc" | "chunk" | "bundle",
    "hybrid": false
  }
```

Three retrieval granularities:

| `level` | Returns | When |
|---|---|---|
| `doc` | Top-K whole docs | "Which docs are about X?" — lowest tokens |
| `chunk` | Paragraph-level chunks with breadcrumb | Default for AI agents — returns the actual answering paragraph, ~10× fewer wasted tokens |
| `bundle` | Top-K curated bundles | "Is there a reading order for this?" — bundle URL pulls full topic context |

**Hybrid (BM25 + vector RRF)** — when `hybrid: true` on `level: "chunk"`:

1. Vector cosine over chunk embeddings (top `k×4`).
2. Postgres full-text search (BM25-style lexical ranking via `tsvector`) over the same chunks (top `k×4`).
3. Reciprocal Rank Fusion: `score = sum( 1 / (60 + rank_in_list) )`.

RRF merges *ranks*, not raw scores, so vector and BM25 (incompatible scales) combine cleanly with no normalization. Each result returns `vector_rank`, `fts_rank`, and `rrf_score` so callers can see why a chunk surfaced.
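
A compact sketch of that fusion step (illustrative names, not mdfy's code): each candidate's score is the sum of `1 / (60 + rank)` over the lists it appears in.

```typescript
interface Ranked { id: string; rank: number } // 1-based rank within one list

// Reciprocal Rank Fusion: merge ranked lists using ranks alone.
// A candidate missing from one list simply contributes nothing for that list.
function rrfFuse(lists: Ranked[][]): { id: string; rrf_score: number }[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    for (const { id, rank } of list) {
      // score = sum( 1 / (60 + rank_in_list) )
      scores.set(id, (scores.get(id) ?? 0) + 1 / (60 + rank));
    }
  }
  return [...scores.entries()]
    .map(([id, rrf_score]) => ({ id, rrf_score }))
    .sort((a, b) => b.rrf_score - a.rrf_score);
}

// e.g. rrfFuse([vectorTop20, ftsTop20]).slice(0, 5) to keep the requested top-5
```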

In practice: the query *"MCP server"* carries weak semantic signal (to the embedding model it's just an acronym) but strong lexical signal (the chunk that *literally mentions MCP* should win). Vector-only retrieval ranks a vague "Why now?" doc first; hybrid promotes the chunk that says "Built the MCP server" to top-1.

### Layer 4 — privacy filters live in SQL

Every public retrieval RPC enforces the same four privacy gates *in SQL*, not in the API route:

```sql
WHERE d.is_draft = FALSE
  AND d.deleted_at IS NULL
  AND d.password_hash IS NULL
  AND (d.allowed_emails IS NULL OR array_length(d.allowed_emails, 1) IS NULL)
```

Drafts, soft-deletes, password-protected, and email-restricted docs *cannot* leak through recall — even by accident, even if the API route has a bug. The schema is the boundary.
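
One way to picture "the schema is the boundary": the API route can stay a thin pass-through to a Postgres function that already contains the WHERE clause above. A hypothetical sketch with a Supabase-style client; the `recall_chunks` RPC and its parameters are invented for illustration:

```typescript
import { createClient } from "@supabase/supabase-js";

const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_KEY!);

// Hypothetical route body: no visibility logic here, only a pass-through.
// The privacy gates live inside the SQL function itself.
export async function recall(slug: string, question: string, k = 5) {
  const { data, error } = await supabase.rpc("recall_chunks", {
    hub_slug: slug,
    question,
    match_count: k,
  });
  if (error) throw error;
  return data; // drafts, soft-deletes, and protected docs were never candidates
}
```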

---

## Layer × Operation matrix

| | Embed | Retrieve | Public? |
|---|---|---|---|
| **Doc** | auto on save (idempotent) | `/recall?level=doc` (vector) | yes |
| **Chunk** | auto alongside doc embed (per-chunk hash) | `/recall?level=chunk` (vector) or `hybrid=true` (BM25 + vector RRF) | yes |
| **Bundle** | `/api/embed/bundle/<id>` | `/recall?level=bundle` (vector) | yes |
| **Hub graph** | precomputed semantic edges (cos < 0.42) between all docs | `/hub/<slug>/graph` (visual) | yes |
| **Cross-refs** | extracted from markdown links across all public hubs | `/api/social/cross-refs` | yes |

Five distinct retrieval surfaces, all reading from the same embedding tables, all behind the same SQL privacy gates.

---

## Why this is different from mem0 / OpenMemory

| | mem0 / OpenMemory | mdfy Memory |
|---|---|---|
| **First user** | AI agent | human (agent reads via URL) |
| **Interface** | MCP server / SDK | HTTP endpoint |
| **Content shape** | atomic memories | long-form docs + bundles |
| **Visibility** | black box | human-readable markdown URL |
| **Sharing** | personal / team | public URL, any AI can fetch |
| **Vendor lock-in** | MCP-compatible only | any AI that can hit a URL |

mdfy Memory isn't a backend store hidden behind an SDK. It's a public HTTP endpoint over content the user can read, edit, and paste. The retrieval pipeline below the surface is comparable to backend-only systems — chunked, hybrid, idempotent — but the *surface* stays human-shaped.

---

## What's deliberately not here (yet)

- **Cross-encoder reranker** on top of RRF. Better ranking, at +50–100 ms of latency. It can wait until users have hubs big enough that the gain matters.
- **Per-bundle automatic re-embed on metadata edits.** Doc-level is wired through auto-save; bundle-level still needs a manual `/api/embed/bundle/<id>` after edits. Auto-trigger on bundle PATCH is the next sprint.
- **Multi-vector / late interaction (ColBERT-style).** Useful at scale; overkill for hubs in the hundreds.

---

## Try it

```bash
curl -X POST https://mdfy.app/api/hub/yc-demo/recall \
  -H "Content-Type: application/json" \
  -d '{
        "question": "How does mem0 extract memories?",
        "k": 5,
        "level": "chunk",
        "hybrid": true
      }'
```

The response carries `results[].markdown` (the actual chunk), `heading_path` (breadcrumb), `doc_url` (link back), and `rrf_score` / `vector_rank` / `fts_rank`, so you can see *why* each chunk surfaced.
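
The same call from TypeScript, with an illustrative (not authoritative) type for the fields listed above:

```typescript
// Illustrative shape for the documented fields; the real payload may carry more.
interface RecallResult {
  markdown: string;      // the actual chunk
  heading_path: string;  // breadcrumb, e.g. "mdfy Memory > ... > Layer 1"
  doc_url: string;       // link back to the source doc
  rrf_score?: number;    // ranking diagnostics, present on hybrid recall
  vector_rank?: number;
  fts_rank?: number;
}

const res = await fetch("https://mdfy.app/api/hub/yc-demo/recall", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    question: "How does mem0 extract memories?",
    k: 5,
    level: "chunk",
    hybrid: true,
  }),
});

const { results } = (await res.json()) as { results: RecallResult[] };
```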

For the wider thesis — what mdfy is and how it sits next to vendor memory layers — see [How mdfy works](/how-mdfy-works).

---

*This page is itself an mdfy memory. Paste it into Claude or ChatGPT and they read the whole pipeline as context.*
