Overview
The big picture — a framework-free, deep domain-grouped RAG engine where every external dependency is a Protocol, so the core imports zero SDKs and runs fully offline.
RAGSpine is a backend RAG engine you assemble in plain Python — not a framework you
submit to. There is no Dify, no LangGraph, no graph DSL, and no runtime to host. It is a
coherent, batteries-included library of composable parts — retrieval, agent orchestration,
document extraction, evaluation, and an HTTP service layer — wired together by ordinary
functions and typed Protocols.
This page is the map. It explains the three architectural commitments that shape every folder, then walks the request lifecycle end to end. The deeper pages drill into the request flow, the two channels, and the package layout.
Three commitments
Framework-free
The core is ordinary Python functions and dataclasses. No orchestration runtime, no DSL, no UI
you have to adopt. You call answer_question(...) and you own the process.
Deep, domain-grouped layout
Code is organized by domain, never by technical layer — the folder path locates a file before you read its name. A package splits the moment it holds a second responsibility.
Everything is a Protocol
LLM provider, embeddings, reranker, OCR, retriever, and task queue are all typed Protocols
injected at the edges. The core imports zero SDKs and runs offline.
Framework-free
Most stacks force a choice between hand-rolled glue and heavyweight orchestration platforms
that drag in their own runtime, graph DSL, UI, and lock-in. RAGSpine is the middle path: the
control flow is plain Python you can read top to bottom. The sole public entry to the engine
is answer_question() in agent/agent.py — a function, not a graph you compile.
Deep, domain-grouped layout
The repository follows a screaming architecture / package-by-feature stance: organize by
domain/feature, never by technical layer. Find the file by folder first, then read its name.
There are nine top-level domains under src/ragspine/, and each one is itself split as soon
as it earns a second concern (for example extraction/ carries extractors/, routing/,
color/, and verification/ subtrees). See Package layout
for the full map and the dependency direction.
Everything is a Protocol
Every external dependency (LLM provider, embeddings, reranker, OCR, retriever, task queue)
enters through a typed, @runtime_checkable Protocol injected at the edges, never through an
SDK imported in the core.
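As a rough illustration of the pattern, the sketch below defines one Protocol and injects a deterministic offline implementation. The names LLMProvider, EchoProvider, complete, and summarize are hypothetical and chosen for this example only; they are not taken from the RAGSpine source.

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@runtime_checkable
class LLMProvider(Protocol):
    """Hypothetical port: any object with a complete() method qualifies."""

    def complete(self, prompt: str) -> str: ...


@dataclass
class EchoProvider:
    """Offline stand-in: deterministic, no SDK, no network."""

    prefix: str = "[mock] "

    def complete(self, prompt: str) -> str:
        return self.prefix + prompt


def summarize(question: str, llm: LLMProvider) -> str:
    # Core code sees only the Protocol; the caller injects the concrete provider.
    return llm.complete(f"Answer briefly: {question}")


provider = EchoProvider()
# Structural check: EchoProvider never inherits from LLMProvider.
assert isinstance(provider, LLMProvider)
result = summarize("What is RAG?", provider)
```

Note that isinstance against a @runtime_checkable Protocol only verifies that the named methods exist; it does not check their signatures, so a static type checker is still what enforces the contract.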
Because the core depends on the abstraction and never the SDK, adding a provider, vector store,
reranker, or OCR engine touches one new file. A top-level import of ragspine eagerly loads no
domain and pulls in no third-party SDK: submodules load lazily (PEP 562), and the anthropic SDK
is imported only inside AnthropicProvider.__init__.
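The lazy-loading mechanism from PEP 562 can be sketched in a few lines. This is a self-contained demonstration, not RAGSpine's actual __init__.py: the module name lazy_pkg, the attribute names, and the use of stdlib modules as stand-ins for heavy domains are all assumptions made for the example. In a real package the hook would simply be a module-level def __getattr__(name) in __init__.py.

```python
import importlib
import types

# Hypothetical map from public attribute name to the module that provides it.
# Stdlib modules stand in for heavy domains or third-party SDKs.
_LAZY = {"json_tools": "json", "stats_tools": "statistics"}

pkg = types.ModuleType("lazy_pkg")  # stand-in for a package's __init__ module


def _lazy_getattr(name: str):
    # PEP 562: invoked only when normal attribute lookup on the module fails.
    target = _LAZY.get(name)
    if target is None:
        raise AttributeError(f"module 'lazy_pkg' has no attribute {name!r}")
    module = importlib.import_module(target)
    setattr(pkg, name, module)  # cache so later lookups bypass this hook
    return module


# Storing the hook in the module namespace is what PEP 562 looks for.
pkg.__getattr__ = _lazy_getattr

loaded_before = "json_tools" in vars(pkg)   # nothing resolved yet
encoded = pkg.json_tools.dumps({"ok": True})  # first access triggers the import
loaded_after = "json_tools" in vars(pkg)    # now cached on the module
```

The same idea applied one level down (importing an SDK inside a constructor rather than at module top) keeps the import cost and the dependency out of every code path that never builds that provider.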
The result: the engine runs fully offline with a deterministic MockProvider, the default
narrative retriever is pure CJK-aware BM25, and the bundled demo and 1000+ tests run with no
API key on any platform.
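To make "CJK-aware BM25" concrete, here is a generic Okapi BM25 sketch with a tokenizer that handles unsegmented Chinese text. It is not RAGSpine's retriever: the tokenization strategy (single ideographs plus overlapping bigrams), the function names, and the k1 and b defaults are assumptions for illustration.

```python
import math
import re
from collections import Counter

# Latin/digit runs, or runs of CJK unified ideographs.
_RUN = re.compile(r"[a-z0-9]+|[\u4e00-\u9fff]+")


def tokenize(text: str) -> list[str]:
    tokens: list[str] = []
    for run in _RUN.findall(text.lower()):
        if "\u4e00" <= run[0] <= "\u9fff":
            # CJK has no spaces: emit each ideograph plus overlapping bigrams.
            tokens.extend(run)
            tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
        else:
            tokens.append(run)
    return tokens


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    avg_len = sum(len(t) for t in tokenized) / n if n else 0.0
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


docs = ["营业收入同比增长", "net revenue grew in 2023", "员工人数下降"]
scores = bm25_scores("营业收入", docs)
best = max(range(len(docs)), key=lambda i: scores[i])
```

Because scoring needs only term counts, a retriever of this shape has no model download and no network dependency, which is what lets a default retrieval path run offline.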
The request lifecycle
A question travels a fixed, auditable path. Two guards bracket it: a clarification gate up front that can ask or refuse before any model call, and an anti-fabrication guard at the structured exit that rewrites the answer to "not found" if no fact backs it, regardless of what the model produced.
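The two guards described above can be sketched as plain functions. This is a minimal illustration under stated assumptions, not the code in agent/agent.py: the Answer dataclass, the slot dictionary shape, the known_entities set, and the fact dictionaries with a "source" key are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Answer:
    text: str
    sources: list[str] = field(default_factory=list)


def clarification_gate(slots: dict, known_entities: set[str]) -> str:
    """Decide 'refuse', 'ask', or 'ok' before any model call is made."""
    entity = slots.get("entity")
    if entity is not None and entity not in known_entities:
        return "refuse"  # out-of-scope entity
    if not slots.get("metric"):
        return "ask"  # ambiguous: the metric slot is missing
    return "ok"


def anti_fabrication_guard(draft: Answer, facts: list[dict]) -> Answer:
    # With no backing fact, the model's text is discarded, whatever it said.
    if not facts:
        return Answer(text="not found", sources=[])
    return Answer(text=draft.text, sources=[f["source"] for f in facts])


draft = Answer(text="Revenue was 5.2 billion.")
unbacked = anti_fabrication_guard(draft, facts=[])
backed = anti_fabrication_guard(draft, facts=[{"source": "report-2023.pdf"}])
```

The point of the design is that neither guard consults the model: the gate runs before it, and the exit guard decides purely on whether a stored fact exists.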
question
→ intent parse (metric / entity / period / channel slots)
→ clarification gate ──(ambiguous)→ ask ──(out-of-scope entity)→ refuse
→ FAQ short-circuit (service edge) ──(vetted hit)→ cached answer + provenance
→ route:
    structured → function-calling over the fact store → found / not_found / unrecognized
    narrative → hybrid retrieve → listwise rerank → synthesize with citations
    composite → run both, compare, merge
→ answer + sources (anti-fabrication guard rewrites to "not found" if no fact)
The same flow as a diagram (the ASCII above is the primary; this Mermaid block is an equivalent view for renderers that support it):
flowchart TD
Q[question] --> I[intent parse<br/>metric / entity / period / channel]
I --> C{clarification gate}
C -->|ambiguous: missing metric| ASK[ask first]
C -->|out-of-scope entity| REF[refuse]
C -->|ok / assume| F{FAQ short-circuit<br/>service edge}
F -->|vetted hit| FA[cached answer + provenance]
F -->|miss| R{route}
R -->|structured| S[query_metric over fact store<br/>found / not_found / unrecognized]
R -->|narrative| N[hybrid retrieve → rerank → synthesize]
R -->|composite| B[run both, compare, merge]
S --> G[anti-fabrication guard]
N --> G
B --> G
G --> A[answer + sources]
The FAQ short-circuit is a service-edge optimization (service/faq/), not part of the library
core. The Python answer_question(...) entry begins at intent parsing. When the HTTP service is
in front, a vetted FAQ hit returns before the agent runs at all, but it carries the same
conservative exclusions, so structured-numeric, competitor, real-time, expired, disabled, and
RESTRICTED questions never short-circuit.
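The exclusion rules listed above might look like the following eligibility check. This is a hedged sketch, not the contents of service/faq/: the FaqEntry fields, the tag names, and the idea that question tags arrive from an upstream classifier are all assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class FaqEntry:
    answer: str
    source: str
    enabled: bool = True
    restricted: bool = False          # assumed flag for RESTRICTED content
    expires_on: Optional[date] = None


# Hypothetical question tags that must always reach the agent.
EXCLUDED_TAGS = {"structured_numeric", "competitor", "real_time"}


def faq_short_circuit(
    entry: Optional[FaqEntry], question_tags: set[str], today: date
) -> Optional[dict]:
    """Return a cached answer with provenance, or None to fall through to the agent."""
    if entry is None:
        return None  # no vetted hit
    if question_tags & EXCLUDED_TAGS:
        return None  # question type is never served from cache
    if not entry.enabled or entry.restricted:
        return None
    if entry.expires_on is not None and entry.expires_on < today:
        return None  # expired entry
    return {"answer": entry.answer, "provenance": entry.source}


entry = FaqEntry(answer="Support hours are 9 to 5.", source="faq/hours")
hit = faq_short_circuit(entry, set(), date(2024, 1, 1))
miss = faq_short_circuit(entry, {"real_time"}, date(2024, 1, 1))
```

Returning None for every exclusion keeps the fallback uniform: the service edge either answers from cache with provenance or hands the question to the agent unchanged.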
Where to go next
Request flow
The detailed control flow, step by step, grounded in agent/agent.py.
Channels
Structured vs narrative vs composite — what each one runs.
Package layout
The nine-domain map and the dependency direction between them.
Dual-channel (concept)
Why two mechanisms, and how the router splits a composite question.