What is this tool?

Grounded is a small, private, NotebookLM-style tool for retrieval-augmented generation (RAG). You paste your own text — meeting notes, an article, product docs, a lecture transcript — as one or more sources. Then you ask questions in plain language, and the tool answers using only what's in your sources, with inline [1] citations you can click to reveal the exact passage each claim came from. Ask about something that isn't in your notes and it will simply tell you so, instead of making an answer up.

Why "grounded"? The problem RAG solves

A language model on its own answers from a blurry memory of its training data. That's fine for general questions but unreliable foryour material: it can't know your meeting notes, and if you paste a long document into a chatbot it often loses the details or invents plausible-sounding ones (a "hallucination"). Retrieval-augmented generation fixes this by splitting your text into passages, finding the few passages most relevant to your question, and handing only those to the model as the source of truth — so every answer is anchored to something you can see and verify.

How it works, step by step

Chunking. Each source is split into overlapping passages of a few hundred characters, on paragraph and sentence boundaries, so a single idea stays intact.
Embedding (in your browser). Every passage is turned into a 384-number vector — a point in "meaning space" — by a compact neural model calledMiniLM. This model runs entirely on your device via WebAssembly; your sources are never uploaded to do this.
Retrieval. Your question is embedded the same way, and the tool measures cosine similarity between the question vector and every passage vector to pick the top handful that are actually about what you asked.
Grounded generation. Only those few passages — plus your question — are sent to a small language model (the same one that runs this site's AI arcade) with strict instructions: answer from the passages, cite them with [n], and say "that isn't in your notes" when the answer isn't there.

See retrieval do the work: the on/off toggle

Tick "Also answer without my sources" before you ask, and you'll get two answers side by side: the grounded one (with citations) and the bare model's guess from its own memory. On anything specific to your material — a number, a date, a name — the difference is usually stark: the grounded answer is exact and cited, while the bare model either hedges or confabulates. That contrast is the point of RAG, made visible.

Your sources stay on your device

The expensive, private part — reading and embedding your text — happens in your browser and nowhere else. The only thing that leaves your device is a single request per question containing your question and the few short passages that were retrieved for it — never your whole document, never your other notebooks. Notebooks, sources, vectors and chat history are stored locally in your browser (IndexedDB); clear your site data and they're gone. There are no accounts and no cross-device sync.

The paste-time safety layer

Because you might paste anything — including text copied from the wild — three things protect you and the app:

Inert rendering. Pasted text, retrieved passages and citations are always rendered as plain text, never as HTML. A<script> tag or an onerror= handler in your source shows up as literal characters and can never execute. This is the real protection against malicious content.
Prompt-injection neutralization. When passages are sent for generation they are wrapped and labelled as data, not instructions, so a line like "ignore your rules" inside a source is treated as content to read about, not a command to follow.
A content detector. A quick scan flags sources thatlook like a SQL-injection, script or shell string, with a small badge. It's honest about itself: there's no database or shell here for such a string to attack — the badge is a heads-up, and the real safety comes from the two points above.

Limits (and why they're there)

Each source is capped at about 20,000 characters, with up to 10 sources per notebook and a 500-character question — enough for real documents while keeping the in-browser embedder snappy. The language model runs on a modest CPU on this site's own server, so a grounded answer takes a few seconds, identical questions are answered from a cache, and there's a gentle per-minute limit so one visitor can't monopolize it. If the model is ever busy, the tool still shows you the passages it retrieved.

FAQ

Is it free? Yes — free, no sign-up. The generation step has a light rate limit; everything else is unlimited and local.

Is my text uploaded? No. Embedding and retrieval happen in your browser. Only your question and the specific retrieved passages are sent for the final answer, to this site's own model — no third-party AI service is involved.

Why does the first question take a while? The first time you use it, your browser downloads the embedding model (about 25 MB of model plus a WebAssembly runtime) and caches it. After that it's instant and works even offline for the embedding part.

Can I trust the answers? Trust the citations, not the prose. The model is small and can phrase things imperfectly, but every claim links to the passage it came from — click through and check. If something isn't in your sources, it should say so rather than guess.

What's the difference from ChatGPT or NotebookLM? Same core idea as NotebookLM — grounded, cited answers over your own sources — but deliberately tiny, private and free: no uploads, no account, the embeddings run on your device, and the whole thing is one page built for this site.

Built from scratch for this site: a unit-tested chunk-and-retrieve core, on-device MiniLM embeddings via WebAssembly, and a self-hosted language model writing the cited answers — no external AI services.

Grounded

Sources

Chat