Loom Grows Up: Optional Hybrid Semantic Search, Multilingual Retrieval, and an HTTP Server

A few weeks ago I introduced Loom: an open source tool written in Go that turns a folder of files into a queryable memory layer for your LLMs, inspired by Karpathy’s llm-wiki pattern. The core idea was - and still is - radically simple: no embeddings, no vector database, no Docker. Just precomputed summaries and BM25 search on SQLite.
And it worked. It still works. But then the use case showed up that put that simplicity under real pressure.
When keywords are not enough
BM25 is an excellent algorithm, but it has one very specific blind spot: it finds documents that share the words in your query. If you search for “carbonara” and your notes say “carbonara”, perfect. If you search for “pasta with guanciale” and your notes only say “carbonara”, BM25 does not care - as far as it is concerned, those are just different words.
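To see the blind spot in numbers, here is a minimal Go sketch of the textbook Okapi BM25 formula. It is an illustration only: Loom relies on SQLite for BM25, and the whitespace tokenizer, the `bm25K1` / `bm25B` constants and the sample documents below are assumptions made for the example.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// Common default parameters for Okapi BM25 (assumed, not taken from Loom).
const (
	bm25K1 = 1.2
	bm25B  = 0.75
)

// bm25 scores every document against the query using lowercase,
// whitespace-separated tokens. A document that shares no token with
// the query always scores exactly 0.
func bm25(query string, docs []string) []float64 {
	tokenized := make([][]string, len(docs))
	df := map[string]int{} // number of documents containing each term
	total := 0
	for i, d := range docs {
		tokenized[i] = strings.Fields(strings.ToLower(d))
		total += len(tokenized[i])
		seen := map[string]bool{}
		for _, t := range tokenized[i] {
			if !seen[t] {
				seen[t] = true
				df[t]++
			}
		}
	}
	scores := make([]float64, len(docs))
	if len(docs) == 0 || total == 0 {
		return scores
	}
	n := float64(len(docs))
	avgLen := float64(total) / n
	queryTerms := strings.Fields(strings.ToLower(query))
	for i, toks := range tokenized {
		tf := map[string]int{}
		for _, t := range toks {
			tf[t]++
		}
		for _, q := range queryTerms {
			f := float64(tf[q])
			if f == 0 {
				continue // term absent: contributes nothing
			}
			idf := math.Log(1 + (n-float64(df[q])+0.5)/(float64(df[q])+0.5))
			norm := f + bm25K1*(1-bm25B+bm25B*float64(len(toks))/avgLen)
			scores[i] += idf * f * (bm25K1 + 1) / norm
		}
	}
	return scores
}

func main() {
	docs := []string{
		"carbonara eggs pecorino black pepper",
		"notes on sqlite indexes",
	}
	fmt.Println(bm25("carbonara", docs))            // first score is positive
	fmt.Println(bm25("pasta with guanciale", docs)) // every score is zero
}
```

The second query describes the same dish, yet no token overlaps, so every score stays at zero.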
For personal notes, that is rarely a problem: most of the time you already know what terminology you used, because you are the one who wrote it. But I ran into two scenarios where this becomes genuinely annoying:
Paraphrases. You search for a concept using words different from the ones you used in the note. You think “error handling”, the document says “exception handling” or “failure management”. Same thing, different words, zero results.
Multilingual search. This is the one that finally made me cave. Notes written in Italian, question asked in English - or the other way around. BM25 is completely blind here: “check-in time” and “orario di arrivo” have nothing in common for it. And yet this is exactly what happens when you work with mixed technical documentation, which is about as normal as it gets.
Semantic search solves precisely this problem: it does not compare words, it compares meanings. And to do that, you need embeddings - the very thing I had proudly kept out of Loom.
What is an embedding, in a nutshell. It is a way to turn text into a list of numbers - a vector - where texts with similar meanings end up “close” to each other in space. “Dog” and “puppy” end up near each other; “dog” and “refrigerator” end up far apart. By comparing the distance between vectors, you can find relevant things even when they do not share a single word - and that is exactly why multilingual search works: the meaning of “gatto” and “cat” is the same, so their vectors end up close together.
The solution: hybrid search, but off by default
I could have done the easy thing: add embeddings, declare Loom “a real RAG system”, and betray the entire premise of the project. I did not.
Loom’s philosophy stays intact. Embeddings are optional and disabled by default. If you do not explicitly enable them, Loom behaves byte for byte exactly as before: pure BM25 search, zero new dependencies, no extra model to download. The original simplicity is still there, unchanged, for the people who want it.
For the people who actually need semantic search, there is now hybrid search: Loom computes one embedding vector per file during scanning, and at query time it merges semantic similarity with BM25 ranking using an algorithm called Reciprocal Rank Fusion (RRF). In practice: the best of both worlds. Keyword precision when the term matches, meaning-level understanding when it does not.
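For the curious, the fusion step can be sketched in a few lines of Go. This is a minimal illustration of Reciprocal Rank Fusion as described in the paper, not Loom's actual code: the function name `fuseRRF`, the file names, and the constant k = 60 (the value used in the paper) are assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// rrfK is the smoothing constant from the RRF paper.
// Whether Loom uses the same value is an assumption.
const rrfK = 60.0

// fuseRRF merges several ranked lists of document IDs (best first).
// Each document scores the sum of 1/(k + rank) over the lists it
// appears in, with rank starting at 1.
func fuseRRF(rankings ...[]string) []string {
	scores := make(map[string]float64)
	for _, ranking := range rankings {
		for i, id := range ranking {
			scores[id] += 1.0 / (rrfK + float64(i+1))
		}
	}
	ids := make([]string, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	// Highest fused score first; ties broken by ID for deterministic output.
	sort.Slice(ids, func(a, b int) bool {
		if scores[ids[a]] != scores[ids[b]] {
			return scores[ids[a]] > scores[ids[b]]
		}
		return ids[a] < ids[b]
	})
	return ids
}

func main() {
	bm25 := []string{"carbonara.md", "amatriciana.md", "risotto.md"}
	semantic := []string{"guanciale.md", "carbonara.md", "amatriciana.md"}
	// Prints: [carbonara.md amatriciana.md guanciale.md risotto.md]
	fmt.Println(fuseRRF(bm25, semantic))
}
```

RRF only looks at positions, never at raw scores, which is why it can combine BM25 ranks and cosine similarities without having to normalize two very different scales.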
And here comes the part I am most proud of: even with embeddings enabled, Loom stays true to itself.
- The vectors live in the same SQLite file, inside a small file_vectors table. No separate vector database.
- Similarity search is a brute-force cosine implementation written in pure Go: no sqlite-vec, no CGO, no vector server to keep running in the background.
- It is still one binary, still one file. Just with a little more intelligence packed inside.
In other words: I added semantic search without adding infrastructure. Which was the only way it made sense to add it.
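To give an idea of how little code "brute-force cosine" needs, here is a minimal sketch in pure Go. It is not lifted from Loom: the names `cosine`, `nearest` and `scored`, the in-memory map of vectors, and the toy three-dimensional vectors are all assumptions for illustration.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// cosine returns the cosine similarity of two vectors, or 0 when the
// lengths differ or either vector has zero magnitude.
func cosine(a, b []float32) float64 {
	if len(a) != len(b) || len(a) == 0 {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		x, y := float64(a[i]), float64(b[i])
		dot += x * y
		na += x * x
		nb += y * y
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// scored pairs a file path with its similarity to the query.
type scored struct {
	Path  string
	Score float64
}

// nearest compares the query against every stored vector (brute force)
// and returns the top `limit` files, most similar first.
func nearest(query []float32, vectors map[string][]float32, limit int) []scored {
	out := make([]scored, 0, len(vectors))
	for path, v := range vectors {
		out = append(out, scored{Path: path, Score: cosine(query, v)})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Score != out[j].Score {
			return out[i].Score > out[j].Score
		}
		return out[i].Path < out[j].Path
	})
	if limit >= 0 && limit < len(out) {
		out = out[:limit]
	}
	return out
}

func main() {
	// Toy vectors; real embeddings have hundreds of dimensions.
	vectors := map[string][]float32{
		"cat.md":    {1, 0.1, 0},
		"gatto.md":  {0.9, 0.2, 0},
		"fridge.md": {0, 0.1, 1},
	}
	for _, s := range nearest([]float32{1, 0.15, 0}, vectors, 2) {
		fmt.Printf("%s %.3f\n", s.Path, s.Score)
	}
}
```

A linear scan like this costs one pass over all vectors per query, which is a reasonable trade for a corpus with one vector per file.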
How to enable hybrid search
Everything happens in ~/.loom/config.toml, by adding an [embeddings] block:
[embeddings]
enabled = true
provider = "ollama"
model = "embeddinggemma:300m" # multilingual, ~700 MB RAM, runs on CPU
endpoint = "http://localhost:11434"
dim = 768 # optional; 0 = use the model's native dimension
# api_key_env = "OPENAI_API_KEY" # only for provider = "openai"
Then you pull the model and re-index so the vectors get built:
ollama pull embeddinggemma:300m # one-time download, ~620 MB
loom scan --force
A few practical notes:
- Recommended model: embeddinggemma:300m on Ollama. It is multilingual (100+ languages, perfect for the use case that pushed me into all of this), lightweight, and runs on CPU without needing a GPU. Alternatively, provider = "openai" works fine too, as does any endpoint compatible with /v1/embeddings.
- Changing the embedding model changes the vector space. Vectors produced by a different model are simply ignored until you rerun loom scan --force to rebuild them - and in the meantime search degrades gracefully back to BM25 for those files, without ever throwing an error.
- Granularity is per file. Loom computes one vector per file, consistently with its “one row per file” model. For very long documents, it makes sense to split them into smaller files so each vector stays focused.
- Turning embeddings off is trivial: enabled = false (or remove the block). Existing vectors stay on disk but are ignored, and BM25 keeps working exactly as always.
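The "ignore stale vectors, fall back to BM25" behavior described in the notes above can be pictured roughly like this. It is a hypothetical sketch, not Loom's implementation: the `storedVector` type, the idea that each stored vector records the model that produced it, and the `usableVectors` / `searchMode` helpers are all assumptions.

```go
package main

import "fmt"

// storedVector is a hypothetical in-memory view of one stored vector.
// That the model name is kept alongside the vector is an assumption.
type storedVector struct {
	Model  string
	Values []float32
}

// usableVectors keeps only vectors produced by the active model and,
// when dim > 0, with the expected dimension. Everything else is
// silently skipped rather than treated as an error.
func usableVectors(stored map[string]storedVector, activeModel string, dim int) map[string][]float32 {
	usable := make(map[string][]float32)
	for path, v := range stored {
		if v.Model != activeModel {
			continue // built by another model: different vector space
		}
		if dim > 0 && len(v.Values) != dim {
			continue
		}
		usable[path] = v.Values
	}
	return usable
}

// searchMode reports which path a query would take: hybrid only when
// embeddings are enabled and at least one usable vector exists.
func searchMode(enabled bool, usable map[string][]float32) string {
	if !enabled || len(usable) == 0 {
		return "bm25"
	}
	return "hybrid"
}

func main() {
	stored := map[string]storedVector{
		"old.md": {Model: "some-older-model", Values: []float32{0.1, 0.2}},
		"new.md": {Model: "embeddinggemma:300m", Values: []float32{0.3, 0.4}},
	}
	usable := usableVectors(stored, "embeddinggemma:300m", 2)
	fmt.Println(len(usable), searchMode(true, usable))  // 1 hybrid
	fmt.Println(len(usable), searchMode(false, usable)) // 1 bm25
}
```

In this picture a file with a stale vector still shows up through the BM25 ranking; it simply contributes nothing to the semantic one until the next forced scan.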
Under the hood, the database schema moved up to version 3, but in a purely additive way: the new file_vectors table is created next to the existing structures, which continue to work unchanged. Already-built indexes do not need to be rebuilt. The table stays empty until you enable embeddings.
Hybrid search applies everywhere, transparently: loom ask, the loom_search / loom_ask tools via MCP, the HTTP /search endpoint (we are getting there), and the GUI. When embeddings are disabled, it is the old pure-BM25 path, identical down to the last bit.
Backward compatibility is not negotiable
I am repeating this because it was the point I cared about the most: if you update Loom and do not touch the configuration, absolutely nothing changes.
No new model to download. No extra command to learn. No manual database migration. Existing indexes keep working. Search stays pure BM25. Loom 0.6 behaves exactly like the version you were already using.
All the new stuff - embeddings, hybrid search - lives behind a switch that starts in the off position. That is a deliberate choice: people who use Loom for its simplicity should not have to pay the price for features they do not need.
The other new thing: Loom over HTTP
There is a second piece I left out of the previous post that deserves its own space here: loom-http, a small optional REST server.
MCP is perfect when you want to talk to Claude Desktop or Claude Code. But if you have some other application - a backend, a microservice, something that knows absolutely nothing about MCP - and you want to use Loom as its retrieval layer, you needed a more universal interface. That interface is HTTP.
LOOM_HTTP_ADDR=:8080 loom-http --config ~/.loom/config.toml
The API is tiny on purpose:
| Method and path | What it does |
|---|---|
| GET /healthz | Liveness check |
| GET /corpora | List of corpora (multi-corpus mode) |
| POST /search {query, limit?, corpus?} | Returns raw results (rel_path, title, summary, content, rank) - no answer generation (BM25, or hybrid if embeddings are enabled) |
| POST /scan {force?, corpus?} | (Re)indexes the folder (uses the LLM for summaries) |
The /search endpoint is designed exactly for the pattern “retrieve here, answer in my own model”: your app gets the most relevant files and composes the answer with its own prompt, without asking Loom to generate anything.
curl -s localhost:8080/search -d '{"query":"check-in time","limit":3}'
It reuses the same config.toml as loom and loom-mcp (via --config or the LOOM_CONFIG environment variable); the listen address is configured with LOOM_HTTP_ADDR (default :8080). And it is completely additive and optional: it does not change one iota of Loom’s core idea - a folder of files on your disk as the source of truth, queried locally.
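From application code, the "retrieve here, answer in my own model" pattern might look like the following Go sketch. The request and result field names come from the table above; everything else is an assumption: that the response body is a plain JSON array of results, that rank decodes as a number, and the helper names `search` and `newStubServer`. The stub server returns canned data only so the example runs without a Loom instance.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// searchRequest mirrors the documented body of POST /search.
type searchRequest struct {
	Query  string `json:"query"`
	Limit  int    `json:"limit,omitempty"`
	Corpus string `json:"corpus,omitempty"`
}

// searchResult mirrors the documented result fields. The numeric type
// of rank is an assumption.
type searchResult struct {
	RelPath string  `json:"rel_path"`
	Title   string  `json:"title"`
	Summary string  `json:"summary"`
	Content string  `json:"content"`
	Rank    float64 `json:"rank"`
}

// search posts a query to <baseURL>/search and decodes the results.
// Assumes the response is a JSON array; the real envelope may differ.
func search(baseURL string, req searchRequest) ([]searchResult, error) {
	body, err := json.Marshal(req)
	if err != nil {
		return nil, err
	}
	resp, err := http.Post(baseURL+"/search", "application/json", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("loom-http returned %s", resp.Status)
	}
	var results []searchResult
	if err := json.NewDecoder(resp.Body).Decode(&results); err != nil {
		return nil, err
	}
	return results, nil
}

// newStubServer stands in for loom-http and answers with one canned result.
func newStubServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost || r.URL.Path != "/search" {
			http.NotFound(w, r)
			return
		}
		var req searchRequest
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode([]searchResult{{
			RelPath: "hotel/arrivo.md",
			Title:   "Orario di arrivo",
			Summary: "stub result for: " + req.Query,
			Rank:    1,
		}})
	}))
}

func main() {
	srv := newStubServer() // replace srv.URL with your loom-http address
	defer srv.Close()
	results, err := search(srv.URL, searchRequest{Query: "check-in time", Limit: 3})
	if err != nil {
		fmt.Println("search failed:", err)
		return
	}
	for _, r := range results {
		fmt.Printf("%s (%s): %s\n", r.Title, r.RelPath, r.Summary)
	}
}
```

The returned summaries and content can then be pasted into whatever prompt your own application builds.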
Multi-corpus mode (for people hosting more than one knowledge base)
Last addition, and useful in slightly more structured scenarios. By default, a loom-http process serves the single corpus defined in its config file. But if you set the LOOM_CORPUS_ROOT environment variable, the same process can serve many isolated knowledge bases - handy if you have multi-tenant needs (think of an app managing notes for multiple different clients).
LOOM_CORPUS_ROOT=/srv/knowledge LOOM_HTTP_ADDR=:8080 loom-http
At that point each request carries a corpus name, which gets resolved into <root>/<corpus>/{notes,index.db} - with a separate SQLite index for each corpus:
curl -s localhost:8080/scan -d '{"corpus":"acme","force":true}'
curl -s localhost:8080/search -d '{"corpus":"acme","query":"returns policy"}'
curl -s localhost:8080/corpora
Two important things about isolation:
- Corpus names are validated as a single safe path segment ([A-Za-z0-9_-], maximum 64 characters). That way one corpus can never read another corpus’s files - no clever ../, no surprises.
- LLM providers and embeddings providers are shared across corpora (same models, same config). Hybrid search applies per corpus once that corpus’s index has vectors.
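The validation rule in the first point is simple enough to sketch in Go. The character class, the 64-character limit and the `<root>/<corpus>/{notes,index.db}` layout come from the text above; the names `corpusNameRe`, `corpusPaths` and `resolveCorpus`, and the choice to reject empty names, are assumptions rather than Loom's actual code.

```go
package main

import (
	"fmt"
	"path/filepath"
	"regexp"
)

// corpusNameRe accepts a single path segment of 1 to 64 characters
// drawn from [A-Za-z0-9_-]. Dots and slashes never match, so "../"
// style traversal cannot get through.
var corpusNameRe = regexp.MustCompile(`^[A-Za-z0-9_-]{1,64}$`)

// corpusPaths holds the per-corpus locations under the root.
type corpusPaths struct {
	Notes string // <root>/<corpus>/notes
	Index string // <root>/<corpus>/index.db
}

// resolveCorpus validates the name first and only then joins it to the root.
func resolveCorpus(root, name string) (corpusPaths, error) {
	if !corpusNameRe.MatchString(name) {
		return corpusPaths{}, fmt.Errorf("invalid corpus name %q", name)
	}
	dir := filepath.Join(root, name)
	return corpusPaths{
		Notes: filepath.Join(dir, "notes"),
		Index: filepath.Join(dir, "index.db"),
	}, nil
}

func main() {
	for _, name := range []string{"acme", "../etc", "a/b", ""} {
		p, err := resolveCorpus("/srv/knowledge", name)
		if err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("ok:", p.Notes, p.Index)
	}
}
```

Validating with an allow-list before touching the filesystem is easier to reason about than trying to clean up a hostile path after the fact.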
And as usual: if you omit corpus, the one from the config file is used. Existing single-corpus integrations keep working without changing anything.
So, where does that leave us?
This update came with a risk: betraying Loom’s premise. A “minimalist tool without embeddings” that suddenly adds embeddings sounds exactly like the kind of feature creep that bloats a project until it becomes just like all the others.
I tried to avoid that in the only way that felt honest: adding power without taking away simplicity. Embeddings are there for the people who need them - multilingual search and paraphrases are real problems - but they stay off for the people who do not. The HTTP server opens Loom up to a whole world of integrations without touching how you use it from the CLI. Multi-corpus is there for people hosting multiple knowledge bases and irrelevant to everyone else.
In every case, the same rule applies: if you do not enable a thing, it is as if it does not exist. The folder of files is still the truth. The SQLite index is still one regenerable file. And the LLM you already use is still the only engine involved. It just now has a few extra gears in the drawer - ready to be used when, and only when, you actually need them.
Resources
- Loom repository: github.com/MatteoAdamo82/loom
- Homebrew tap: github.com/MatteoAdamo82/homebrew-loom
- Reciprocal Rank Fusion (paper): plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf
- embeddinggemma: ollama.com/library/embeddinggemma
- Previous Loom post: Loom: Local Memory for Your LLMs, Inspired by Karpathy
P.S. Yes, I do realize the irony: I wrote an entire post explaining why Loom did NOT use embeddings, and a few weeks later here I am adding them. But I added them with the switch turned off. Which, in my head, still makes me a consistent person. Your honor, the defense rests.