Loom: Local Memory for Your LLMs, Inspired by Karpathy

There is a question that eventually comes up for anyone who starts using local AI models seriously: “Okay, but how do I make them read MY documents?”
You have Markdown notes, research PDFs, code snippets, random thoughts scattered across a dozen folders. The LLM is smart, but it knows nothing about what you wrote. So how do you give it that context?
The standard answer is: build a RAG system. Chunking, embeddings, vector stores, Docker, Qdrant, a separate embedding model… an infrastructure stack that takes real knowledge and real time to install, maintain, and keep working.
I have already gone down that road with Doc Analyzer, and it works extremely well for certain use cases. But I realized that for my personal notes, my work material, my day-to-day knowledge base, all that complexity was often too much.
Then I read Andrej Karpathy’s llm-wiki gist - the same Andrej Karpathy who co-founded OpenAI - and had one of those “why didn’t I think of this earlier?” moments.
That is where Loom came from.
Karpathy’s idea: the persistent wiki
Before talking about Loom itself, it is worth explaining what Karpathy is actually proposing in that gist - because Loom is not a faithful port of the idea, just heavily inspired by it.
Classic RAG has a structural limitation: it does not accumulate anything. Every time you ask a question, the LLM starts from scratch - retrieves chunks, reads them, summarizes them. There is no memory that grows over time.
Karpathy proposes something different: let the LLM build and maintain a persistent wiki made of interconnected Markdown pages. Every time you add a new document, the LLM reads it, extracts the key information, updates the existing pages, and flags contradictions with what it already knows. Knowledge accumulates and gets refined, instead of being re-derived from scratch every single time. The gist is designed to be used with an agent (Claude Code, Codex) on top of an Obsidian-like folder - the LLM writes the wiki, and you browse it.
It is a beautiful and ambitious idea. But it also requires an active agent and a certain amount of workflow discipline.
My variant: summaries + BM25 on SQLite
Loom takes the core intuition - preprocess the documents once instead of rediscovering everything at query time - and implements it in a simpler, more autonomous way.
Instead of building an interconnected wiki, Loom asks the LLM to generate a compact summary for each file (150-250 words, 5-8 keywords). One LLM call per file, only during the initial indexing pass. Summaries and original content go into SQLite, and when you ask a question, Loom runs BM25 across the whole index to find the most relevant files, which are then passed to the LLM for the final answer.
No wiki to maintain. No always-on agent. The filesystem is the source of truth: add a file, run loom scan, done.
What is BM25? It is the ranking algorithm behind Elasticsearch, Lucene, and most serious search engines. The acronym stands for Best Match 25 - and no, the “25” is not some poetic version number, it is literally the iteration of the algorithm the researchers considered good enough to publish. In practice, BM25 scores each document based on how often the query terms appear, while penalizing very long documents that naturally accumulate more matches just by being longer, and taking into account how rare a term is across the whole collection. It is fast, deterministic, and works remarkably well when you already know the terminology you tend to use - which is exactly the case with personal notes.
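To make the scoring description above concrete, here is a minimal, standalone sketch of the classic Okapi BM25 formula. It is not Loom's code: per this article Loom relies on SQLite FTS5 for ranking, whereas this sketch assumes naive whitespace tokenization, no stemming, and the commonly used defaults k1 = 1.2 and b = 0.75. The function name `bm25Scores` and the sample documents are made up for illustration.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// bm25Scores returns one BM25 score per document for the given query.
// k1 controls term-frequency saturation, b controls length normalization.
func bm25Scores(docs []string, query string, k1, b float64) []float64 {
	tokenized := make([][]string, len(docs))
	totalLen := 0
	for i, d := range docs {
		tokenized[i] = strings.Fields(strings.ToLower(d))
		totalLen += len(tokenized[i])
	}
	avgLen := float64(totalLen) / float64(len(docs))
	n := float64(len(docs))

	scores := make([]float64, len(docs))
	for _, term := range strings.Fields(strings.ToLower(query)) {
		// Document frequency: how many documents contain the term at all.
		df := 0.0
		for _, toks := range tokenized {
			for _, t := range toks {
				if t == term {
					df++
					break
				}
			}
		}
		// Rare terms get a higher inverse document frequency.
		idf := math.Log(1 + (n-df+0.5)/(df+0.5))

		for i, toks := range tokenized {
			tf := 0.0
			for _, t := range toks {
				if t == term {
					tf++
				}
			}
			// Documents longer than average are penalized through b.
			norm := 1 - b + b*float64(len(toks))/avgLen
			scores[i] += idf * tf * (k1 + 1) / (tf + k1*norm)
		}
	}
	return scores
}

func main() {
	docs := []string{
		"postgres tuning notes shared buffers and work mem",
		"blog ideas about gardening",
		"vendor contract exclusivity clause and penalties",
	}
	for i, s := range bm25Scores(docs, "postgres tuning", 1.2, 0.75) {
		fmt.Printf("doc %d: %.3f\n", i, s)
	}
}
```

Only the first document contains the query terms, so it is the only one with a non-zero score. In this sketch higher means more relevant; real engines may use a different sign convention.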
Less infrastructure. Less ceremony. Surprisingly good results for personal use - and, as we will see, full freedom in choosing your LLM provider.
What Loom is
Loom is an open-source tool written in Go that indexes the files in your ~/loom/ folder (or wherever you want), makes them queryable in natural language, and integrates with Claude Code, Claude Desktop, and any MCP client.
Technically:
- Go - a single compiled binary, no runtime to install
- SQLite + FTS5 - the database is just one file, and full-text search is built in, with stemming and proper support for Italian text
- Configurable LLM provider - Ollama locally, or Anthropic Claude, or OpenAI GPT-4o: your choice
- Wails - for the optional desktop GUI (with Svelte + TypeScript under the hood)
- MCP server - for direct integration with Claude and other AI tools
No Qdrant. No Docker. No separate embedding model. Just one SQLite file, one binary, and the LLM you already use.
How it works (in three steps)
1. Put your files in the folder
~/loom/
├── project-x-notes.md
├── competitor-research.md
├── vendor-contract.pdf
├── work/
│   ├── api-specs-v2.md
│   └── q1-bug-report.md
└── personal/
    └── blog-ideas.md
The structure is completely up to you. Use subfolders however you want. Loom scans them recursively. Hidden directories (like .git or .obsidian) are skipped automatically.
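The recursive scan with hidden-directory skipping can be sketched with the standard library alone. This is an illustration of the behavior described above, not Loom's actual scanner; `isHidden` and `listFiles` are hypothetical names, and skipping symlinks and other non-regular files is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"strings"
)

// isHidden reports whether a directory name starts with a dot (.git, .obsidian).
func isHidden(name string) bool {
	return strings.HasPrefix(name, ".") && name != "." && name != ".."
}

// listFiles walks root recursively and returns the paths of regular files
// relative to root, without descending into hidden directories.
func listFiles(root string) ([]string, error) {
	var files []string
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err // unreadable entry: abort the walk
		}
		if d.IsDir() {
			if path != root && isHidden(d.Name()) {
				return filepath.SkipDir
			}
			return nil
		}
		if !d.Type().IsRegular() {
			return nil // assumption: ignore symlinks, sockets, devices
		}
		rel, err := filepath.Rel(root, path)
		if err != nil {
			return err
		}
		files = append(files, rel)
		return nil
	})
	return files, err
}

func main() {
	root := "."
	if len(os.Args) > 1 {
		root = os.Args[1]
	}
	files, err := listFiles(root)
	if err != nil {
		fmt.Fprintln(os.Stderr, "scan failed:", err)
		os.Exit(1)
	}
	for _, f := range files {
		fmt.Println(f)
	}
}
```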
2. Index everything with loom scan
loom scan
Loom finds every new or modified file and, for each one, asks the configured LLM to generate a 150-250 word summary plus 5-8 keywords. One LLM call per file, only the first time around - or whenever the file changes. Everything ends up in SQLite.
One detail I particularly like: if you move or rename a file, Loom recognizes the content by hash and reuses the existing summary instead of making another LLM call. The old path is removed from the index automatically.
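A minimal sketch of how hash-based reuse could work, assuming SHA-256 as the content hash and an in-memory map in place of the SQLite index (the article does not say which hash Loom uses). The `entry` and `index` types, `upsert`, `prune`, and the `summarize` callback are all hypothetical stand-ins.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// entry is a hypothetical index row: the content hash plus the cached summary.
type entry struct {
	Hash    string
	Summary string
}

// index maps a relative path to its entry.
type index map[string]entry

func contentHash(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// upsert records path in the index and reports whether summarize (the
// stand-in for the LLM call) had to be invoked.
func (ix index) upsert(path string, data []byte, summarize func([]byte) string) bool {
	h := contentHash(data)
	if e, ok := ix[path]; ok && e.Hash == h {
		return false // same path, same content: nothing to do
	}
	for _, e := range ix {
		if e.Hash == h {
			ix[path] = e // same content under another path: reuse the summary
			return false
		}
	}
	ix[path] = entry{Hash: h, Summary: summarize(data)}
	return true
}

// prune drops every indexed path that is no longer present on disk.
func (ix index) prune(present map[string]bool) {
	for p := range ix {
		if !present[p] {
			delete(ix, p)
		}
	}
}

func main() {
	ix := index{}
	summarize := func(b []byte) string { return fmt.Sprintf("summary of %d bytes", len(b)) }

	fmt.Println("LLM call needed:", ix.upsert("notes.md", []byte("hello"), summarize))
	// The file was moved: same bytes, new path.
	fmt.Println("LLM call needed:", ix.upsert("work/notes.md", []byte("hello"), summarize))
	ix.prune(map[string]bool{"work/notes.md": true})
	fmt.Println("indexed paths:", len(ix))
}
```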
3. Ask questions with loom ask
loom ask "what are the critical points in the vendor contract?"
Loom runs a BM25 search across summaries, keywords, and content, takes the top 5 relevant files, and makes a single LLM call with the summaries plus the original content. The answer streams back with citations in the format [filename.md].
Example answer:
From the documents found, the main critical points are:
1. **Exclusivity clause** (90 days) - potentially restrictive in case of
early renegotiation [vendor-contract.pdf]
2. **Undefined SLAs** for high-priority tickets [q1-bug-report.md]
3. **Asymmetric penalties** - only enforced in one direction [vendor-contract.pdf]
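Because citations follow a fixed [filename.ext] shape, a client can pull them out of an answer with a regular expression. The sketch below is one hypothetical way to do it, not Loom's own parser; the pattern assumes file names contain no whitespace or square brackets and end in an extension.

```go
package main

import (
	"fmt"
	"regexp"
)

// citationRe matches [name.ext] where name has no spaces or brackets.
var citationRe = regexp.MustCompile(`\[([^\[\]\s]+\.[A-Za-z0-9]+)\]`)

// extractCitations returns the cited file names in order of first appearance,
// without duplicates.
func extractCitations(answer string) []string {
	seen := map[string]bool{}
	var out []string
	for _, m := range citationRe.FindAllStringSubmatch(answer, -1) {
		if !seen[m[1]] {
			seen[m[1]] = true
			out = append(out, m[1])
		}
	}
	return out
}

func main() {
	answer := "Exclusivity clause [vendor-contract.pdf], undefined SLAs " +
		"[q1-bug-report.md], asymmetric penalties [vendor-contract.pdf]"
	for _, c := range extractCitations(answer) {
		fmt.Println(c)
	}
}
```

A Markdown link whose label looks like a file name would also match, so a stricter client might check each hit against the list of indexed files.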
Installation
With Homebrew (macOS - the easiest way)
brew tap MatteoAdamo82/loom
brew install loom
This installs both loom (CLI) and loom-mcp (the MCP server). The desktop GUI (Loom.app) is available as a separate download from the releases page.
From source
git clone https://github.com/MatteoAdamo82/loom.git
cd loom
go install ./cmd/loom ./cmd/loom-mcp
Requires Go 1.26+. Nothing else.
Quick start
# Initialize: create ~/.loom/config.toml and the ~/loom/ folder
loom init
# Put your files in ~/loom/
# Index them
loom scan
# Query them
loom ask "what did I write about project X?"
Configuration: choose your LLM provider
This is where Loom stands apart from many similar tools: you are not locked into a single provider. The ~/.loom/config.toml file (created automatically by loom init) already includes all three provider blocks, commented and ready to use:
# Ollama (local, default)
[llm]
provider = "ollama"
model = "llama3.1:8b"
endpoint = "http://localhost:11434"
api_key_env = ""
# Anthropic Claude
# [llm]
# provider = "anthropic"
# model = "claude-sonnet-4-5"
# api_key_env = "ANTHROPIC_API_KEY"
# OpenAI
# [llm]
# provider = "openai"
# model = "gpt-4o"
# api_key_env = "OPENAI_API_KEY"
To switch providers: uncomment the block you want and comment out the others. That is it.
| Provider | provider | Example model | Notes |
|---|---|---|---|
| Ollama (local) | ollama | llama3.1:8b, qwq:32b | No API key, runs on your own machine |
| Anthropic Claude | anthropic | claude-sonnet-4-5 | Requires ANTHROPIC_API_KEY |
| OpenAI | openai | gpt-4o | Requires OPENAI_API_KEY |
Note on api_key_env: the value is the name of the environment variable, not the key itself. The secret is never written to disk. Before starting Loom, export the variable in your shell:
export ANTHROPIC_API_KEY=sk-ant-...
# or
export OPENAI_API_KEY=sk-...
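The indirection is easy to picture in code: the config stores only a variable name, and the secret is looked up at startup. The following is a sketch under that assumption, not Loom's implementation; `resolveAPIKey` is a hypothetical helper, and treating an empty name as "no key needed" mirrors the Ollama block shown earlier.

```go
package main

import (
	"fmt"
	"os"
)

// resolveAPIKey looks up the secret named by apiKeyEnv.
// An empty name means the provider needs no key (for example local Ollama).
func resolveAPIKey(apiKeyEnv string) (string, error) {
	if apiKeyEnv == "" {
		return "", nil
	}
	key, ok := os.LookupEnv(apiKeyEnv)
	if !ok || key == "" {
		return "", fmt.Errorf("environment variable %s is not set", apiKeyEnv)
	}
	return key, nil
}

func main() {
	if _, err := resolveAPIKey("ANTHROPIC_API_KEY"); err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println("API key found") // never print the key itself
}
```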
Which provider makes sense depends on your use case: Ollama if you want maximum privacy and have the hardware for it (or are happy with the :cloud models); Claude or GPT-4o if you want the highest-quality summaries without worrying about hardware constraints and can accept the privacy trade-off of sending your documents to a third-party API.
MCP integration: Loom inside Claude
Loom includes an MCP server (loom-mcp) that exposes your knowledge base to any MCP client - Claude Desktop, Claude Code, and beyond.
With Claude Desktop
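Add this to Claude Desktop's configuration file, claude_desktop_config.json (on macOS: ~/Library/Application Support/Claude/claude_desktop_config.json):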
{
  "mcpServers": {
    "loom": {
      "command": "loom-mcp",
      "args": ["--config", "/Users/yourusername/.loom/config.toml"]
    }
  }
}
From that moment on, Claude Desktop can access your notes directly in conversation. You can ask things like “check my notes about postgres tuning” and Claude will search your files without you having to paste anything manually.
With Claude Code
claude mcp add loom loom-mcp
Loom exposes five MCP tools:
| Tool | What it does |
|---|---|
| loom.ask(question, top_k?) | Full answer with citations (1 LLM call) |
| loom.search(query, limit?) | Raw BM25 search, no LLM call |
| loom.scan(force?) | (Re)index the folder |
| loom.list_files() | List all files with summary and keywords |
| loom.get_file(rel_path) | Full content of a specific file |
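For a sense of what an MCP client sends when it invokes one of these tools, here is a sketch that builds a JSON-RPC tools/call request for loom.search. The tool name and the query and limit arguments come from the table above; the request id and the values are made up, and the transport (normally handled by the client over stdio) is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// toolCall models a JSON-RPC 2.0 request for the MCP "tools/call" method.
type toolCall struct {
	JSONRPC string     `json:"jsonrpc"`
	ID      int        `json:"id"`
	Method  string     `json:"method"`
	Params  callParams `json:"params"`
}

type callParams struct {
	Name      string         `json:"name"`
	Arguments map[string]any `json:"arguments"`
}

// searchRequest encodes a call to the loom.search tool.
func searchRequest(id int, query string, limit int) ([]byte, error) {
	return json.Marshal(toolCall{
		JSONRPC: "2.0",
		ID:      id,
		Method:  "tools/call",
		Params: callParams{
			Name:      "loom.search",
			Arguments: map[string]any{"query": query, "limit": limit},
		},
	})
}

func main() {
	req, err := searchRequest(1, "postgres tuning", 5)
	if err != nil {
		fmt.Println("encode error:", err)
		return
	}
	fmt.Println(string(req))
}
```

In practice you rarely build these by hand: Claude Desktop and Claude Code generate them once the server is registered.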
PDF support and OCR
PDFs are handled in three layers:
- Plain text extraction with ledongthuc/pdf (native Go, zero dependencies)
- If a page is empty and pdftoppm + tesseract are available in the PATH, OCR is used
- If both fail, the file is still indexed with a placeholder - the LLM will not be able to quote it, but nothing crashes
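The three layers above amount to a small fallback chain. The sketch below shows only the decision logic, using the standard library: `ocrAvailable` checks the PATH for the two external tools, while the already-extracted text and the `ocr` callback stand in for the real extraction and OCR steps. All names and the placeholder string are hypothetical.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// placeholder is what gets indexed when no text can be recovered.
const placeholder = "[no extractable text]"

// ocrAvailable reports whether both external tools are on the PATH.
func ocrAvailable() bool {
	for _, tool := range []string{"pdftoppm", "tesseract"} {
		if _, err := exec.LookPath(tool); err != nil {
			return false
		}
	}
	return true
}

// pageText applies the fallback chain to a single page:
// extracted text first, then OCR if allowed, then the placeholder.
func pageText(extracted string, canOCR bool, ocr func() (string, error)) string {
	if strings.TrimSpace(extracted) != "" {
		return extracted
	}
	if canOCR {
		if text, err := ocr(); err == nil && strings.TrimSpace(text) != "" {
			return text
		}
	}
	return placeholder
}

func main() {
	fakeOCR := func() (string, error) { return "text recovered by OCR", nil }

	fmt.Println("OCR tools installed:", ocrAvailable())
	fmt.Println(pageText("plain text layer", false, fakeOCR))
	fmt.Println(pageText("", true, fakeOCR))
	fmt.Println(pageText("", false, fakeOCR))
}
```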
To enable OCR:
# macOS
brew install poppler tesseract tesseract-lang
# Debian/Ubuntu
sudo apt install poppler-utils tesseract-ocr tesseract-ocr-ita
Multiple languages? Set the environment variable:
export TESSERACT_LANGS="eng+ita"
Desktop GUI (optional)
If you prefer something more visual, Loom also includes a desktop GUI built with Wails (Go + Svelte). Download the binary for your platform from the releases page.
The interface has two panes: a file list on the left and the chat on the right. Click a file to open a viewer with Markdown rendering. Click a citation pill like [file.md] in a response and jump straight to that file. Settings - provider, model, endpoint, API key environment variable, folder - all live in a single modal. That is the whole thing.
Use cases: when Loom makes sense
Your work and research notes
Years of Markdown, exports from Obsidian or Notion, random notes sitting in a folder. Loom makes them queryable without migrations. Put the folder in ~/loom/, run loom scan, and ask things like “what did I decide about error handling in project Y?”.
Personal knowledge base
Saved articles, documentation PDFs, technical specs, meeting notes. All the stuff you accumulate over time and then struggle to find again.
Integration into your development workflow
With the MCP server, Claude Code can access your notes while it is working on code. A file like architectural-decisions.md inside the Loom directory becomes part of the toolchain automatically when relevant.
Project documentation
Specs, requirements, bug reports, changelogs. The kind of material that is usually spread across too many places and that you wish you could query in natural language instead of reaching for grep -r.
Loom vs Doc Analyzer: when to use which
Since I built both, I have a pretty clear opinion on where each one makes sense.
| | Loom | Doc Analyzer |
|---|---|---|
| Installation | brew install loom | Docker Compose + Qdrant |
| LLM provider | Ollama / Claude / OpenAI | Ollama (LLM + embeddings) |
| Search | BM25 on FTS5 | Vector similarity search |
| Chunking | No (full files) | Yes (~1000 chars with overlap) |
| UI | CLI + optional desktop GUI | Web app with HTTP Basic auth |
| MCP | Yes (native) | No |
| Import URL/YouTube | No | Yes |
Use Loom if: your documents are already on disk, you want zero infrastructure, you work heavily with Claude Code/Desktop, or you want flexibility in choosing your LLM provider.
Use Doc Analyzer if: your documents are very long and heterogeneous, you want real semantic search, you need a shareable web interface, or you want to import content from URLs and YouTube.
They are not mutually exclusive: I use both, in different contexts.
The technical lesson behind Loom
Classic RAG is often over-engineered for personal use. Chunking, embeddings, vector stores, re-ranking - every piece makes sense in enterprise scenarios with huge corpora. For your own notes, in my experience, precomputed summaries + BM25 cover roughly 95% of the need with 20% of the complexity.
The LLM understands natural language better than a vector index understands semantic similarity. If you feed it the right summaries, it knows what matters. Your job is simply to find “enough” good candidates - and for that, BM25 is often more than enough.
And Go? Three reasons. The first is technical: native concurrency for scanning lots of files in parallel, compilation into a static binary with no runtime to distribute, and performance that in Python would have required workarounds. The second is practical: I rarely use Go at work, and a personal project is the perfect excuse to keep the language fresh. The third is that it reduces project complexity: fewer dependencies, fewer abstraction layers, fewer things that can break. It is not the trendy choice of the moment, but it is the one that makes the most sense for this use case.
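As an illustration of the concurrency point, this is roughly what a bounded worker pool looks like with goroutines and channels. It is a generic sketch rather than Loom's scanner: `processAll` is a hypothetical helper and `fn` stands in for whatever per-file work (hashing, summarizing) needs to happen.

```go
package main

import (
	"fmt"
	"sync"
)

// processAll applies fn to every path using at most `workers` goroutines
// and returns the results in the same order as the input.
func processAll(paths []string, workers int, fn func(string) string) []string {
	if workers < 1 {
		workers = 1
	}
	results := make([]string, len(paths))
	jobs := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				// Each index is handled by exactly one goroutine,
				// so writing to results[i] needs no lock.
				results[i] = fn(paths[i])
			}
		}()
	}
	for i := range paths {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	paths := []string{"project-x-notes.md", "competitor-research.md", "vendor-contract.pdf"}
	out := processAll(paths, 2, func(p string) string { return "processed " + p })
	for _, line := range out {
		fmt.Println(line)
	}
}
```

Capping the number of workers matters when each job ends in an LLM call, since a local model or a rate-limited API will not benefit from unbounded parallelism.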
Resources
- Loom repository: github.com/MatteoAdamo82/loom
- Homebrew tap: github.com/MatteoAdamo82/homebrew-loom
- Karpathy’s llm-wiki: gist.github.com/karpathy/442a6bf555914893e9891c11519de94f
- Doc Analyzer: github.com/MatteoAdamo82/doc-analyzer
P.S. If you were expecting a stack with Kubernetes, a message broker, and at least three acronyms you have never heard before, I am sorry to disappoint you. One SQLite file, one binary, and the LLM you already use are enough more often than not.