Overview

The big picture: a framework-free RAG engine with a deep, domain-grouped layout in which every external dependency is a Protocol, so the core imports zero SDKs and runs fully offline.

RAGSpine is a backend RAG engine you assemble in plain Python — not a framework you submit to. There is no Dify, no LangGraph, no graph DSL, and no runtime to host. It is a coherent, batteries-included library of composable parts — retrieval, agent orchestration, document extraction, evaluation, and an HTTP service layer — wired together by ordinary functions and typed Protocols.

This page is the map. It explains the three architectural commitments that shape every folder, then walks the request lifecycle end to end. The deeper pages drill into the request flow, the two channels, and the package layout.

Three commitments

Framework-free

The core is ordinary Python functions and dataclasses. No orchestration runtime, no DSL, no UI you have to adopt. You call answer_question(...) and you own the process.

Deep, domain-grouped layout

Code is organized by domain, never by technical layer — the folder path locates a file before you read its name. A package splits the moment it holds a second responsibility.

Everything is a Protocol

LLM provider, embeddings, reranker, OCR, retriever, and task queue are all typed Protocols injected at the edges. The core imports zero SDKs and runs offline.

Framework-free

Most stacks force a choice between hand-rolled glue and a heavyweight orchestration platform that drags in its own runtime, graph DSL, UI, and lock-in. RAGSpine takes the middle path: the control flow is plain Python you can read top to bottom. The sole public entry point to the engine is answer_question() in agent/agent.py, which is a function, not a graph you compile.

Deep, domain-grouped layout

The repository follows a screaming-architecture (package-by-feature) stance: code is organized by domain or feature, never by technical layer, so you find a file by its folder first and its name second. There are nine top-level domains under src/ragspine/, and each one is split again as soon as it takes on a second concern (for example, extraction/ carries extractors/, routing/, color/, and verification/ subtrees). See Package layout for the full map and the dependency direction.

Everything is a Protocol

Every external dependency enters through a typed, @runtime_checkable Protocol injected at the edges, never through an SDK imported in the core. This covers the LLM provider, embeddings, reranker, OCR, retriever, and task queue.
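A Protocol-typed dependency can be sketched as follows. This is a minimal illustration of the pattern, not RAGSpine's actual API: the names LLMProvider, complete, EchoProvider, and summarize are hypothetical and chosen only for this example.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class LLMProvider(Protocol):
    """Hypothetical abstraction; the real Protocol in RAGSpine may differ."""

    def complete(self, prompt: str) -> str: ...


class EchoProvider:
    """Offline stand-in: imports no SDK and returns deterministic output."""

    def complete(self, prompt: str) -> str:
        return f"[echo] {prompt}"


def summarize(text: str, provider: LLMProvider) -> str:
    # The core function only sees the Protocol, never a concrete SDK client.
    return provider.complete(f"Summarize: {text}")


provider = EchoProvider()
# Structural typing: EchoProvider satisfies the Protocol without inheriting from it.
assert isinstance(provider, LLMProvider)
result = summarize("quarterly report", provider)
```

Swapping in a real provider would then mean writing one more class with the same method, while summarize stays unchanged.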

Because the core depends on the abstraction and never on the SDK, adding a provider, vector store, reranker, or OCR engine touches one new file. A top-level import ragspine loads no domain eagerly and pulls in no third-party SDK: submodules load lazily (PEP 562), and the anthropic SDK is imported lazily, only inside AnthropicProvider.__init__.
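The lazy loading mentioned above relies on the module-level __getattr__ hook defined by PEP 562. The sketch below shows the mechanism in a single runnable file. It assumes a throwaway module object and a hypothetical _LAZY mapping, with standard-library modules standing in for heavy SDKs, and does not reproduce RAGSpine's own __init__.py.

```python
import importlib
import types

# Hypothetical mapping from a public attribute name to the module behind it.
# Standard-library modules stand in for heavy third-party SDKs.
_LAZY = {"json_tools": "json", "stats_tools": "statistics"}


def _lazy_getattr(name: str):
    # Python calls this hook only when normal attribute lookup on the module fails.
    if name in _LAZY:
        module = importlib.import_module(_LAZY[name])
        setattr(pkg, name, module)  # cache, so the hook runs once per name
        return module
    raise AttributeError(f"module {pkg.__name__!r} has no attribute {name!r}")


# Build a module object by hand so the sketch fits in one file. In a real
# package the function would simply be named __getattr__ in __init__.py.
pkg = types.ModuleType("lazy_pkg")
pkg.__getattr__ = _lazy_getattr

loaded_before = "json_tools" in vars(pkg)   # nothing attached yet
encoded = pkg.json_tools.dumps({"ok": True})  # first access triggers the import
loaded_after = "json_tools" in vars(pkg)    # now cached on the module
```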

The result: the engine runs fully offline with a deterministic MockProvider, the default narrative retriever is pure CJK-aware BM25, and the bundled demo and 1000+ tests run with no API key on any platform.
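To make "CJK-aware BM25" concrete, here is a minimal sketch of how such a scorer could work. It is not RAGSpine's implementation: the character-bigram tokenization for Han runs, the function names tokenize and bm25_scores, the sample documents, and the k1 and b defaults are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Latin/digit words, or runs of Han ideographs (kana and hangul are out of
# scope for this sketch).
_RUN = re.compile(r"[a-z0-9]+|[\u4e00-\u9fff]+")


def tokenize(text: str) -> list[str]:
    tokens: list[str] = []
    for run in _RUN.findall(text.lower()):
        if run.isascii() or len(run) == 1:
            tokens.append(run)  # whole word, or a lone ideograph
        else:
            # Han text has no spaces, so index overlapping character bigrams.
            tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
    return tokens


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    avg_len = sum(len(t) for t in tokenized) / len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    n = len(docs)
    query_terms = set(tokenize(query))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Non-negative IDF variant (the "+1 inside the log" form).
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "营业收入同比增长百分之十",
    "公司治理结构说明",
    "Revenue grew ten percent year over year",
]
zh_scores = bm25_scores("营业收入", docs)
en_scores = bm25_scores("revenue", docs)
```

Because the scorer is pure Python over in-memory text, it needs no API key, embedding model, or network access.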

The request lifecycle

A question travels a fixed, auditable path. Two guards bracket it: a clarification gateway up front that can ask or refuse before any model call, and an anti-fabrication guard at the structured exit that rewrites the answer to "not found" if no fact backs it — regardless of what the model produced.

question
  → intent parse (metric / entity / period / channel slots)
  → clarification gate ──(ambiguous)→ ask  ──(out-of-scope entity)→ refuse
  → FAQ short-circuit (service edge) ──(vetted hit)→ cached answer + provenance
  → route:
       structured → function-calling over the fact store → found / not_found / unrecognized
       narrative  → hybrid retrieve → listwise rerank → synthesize with citations
       composite  → run both, compare, merge
  → answer + sources   (anti-fabrication guard rewrites to "not found" if no fact)

The same flow as a diagram (the ASCII version above is the primary one; this Mermaid block is an equivalent view for renderers that support it):

flowchart TD
  Q[question] --> I[intent parse<br/>metric / entity / period / channel]
  I --> C{clarification gate}
  C -->|ambiguous: missing metric| ASK[ask first]
  C -->|out-of-scope entity| REF[refuse]
  C -->|ok / assume| F{FAQ short-circuit<br/>service edge}
  F -->|vetted hit| FA[cached answer + provenance]
  F -->|miss| R{route}
  R -->|structured| S[query_metric over fact store<br/>found / not_found / unrecognized]
  R -->|narrative| N[hybrid retrieve → rerank → synthesize]
  R -->|composite| B[run both, compare, merge]
  S --> G[anti-fabrication guard]
  N --> G
  B --> G
  G --> A[answer + sources]
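The two guards on the structured route can be sketched in plain Python. This is an illustrative sketch only: Intent, Answer, FACT_SOURCES, KNOWN_ENTITIES, and answer_structured are hypothetical names with toy data, and the model's output is passed in as an argument so the example stays deterministic and offline.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Intent:
    metric: Optional[str]
    entity: Optional[str]


@dataclass
class Answer:
    status: str  # "ask", "refuse", "found", or "not_found"
    text: str
    sources: list[str] = field(default_factory=list)


# Toy data invented for this sketch: (entity, metric) -> citation.
FACT_SOURCES = {("acme", "revenue"): "annual-report.pdf#p3"}
KNOWN_ENTITIES = {"acme"}


def clarification_gate(intent: Intent) -> Optional[Answer]:
    """Return an early ask/refuse answer, or None to let the request continue."""
    if intent.entity is not None and intent.entity not in KNOWN_ENTITIES:
        return Answer("refuse", "That entity is out of scope.")
    if intent.metric is None:
        return Answer("ask", "Which metric do you mean?")
    return None


def anti_fabrication_guard(draft: str, source: Optional[str]) -> Answer:
    """Keep the model's draft only when a stored fact backs it."""
    if source is None:
        return Answer("not_found", "not found")  # the draft is discarded
    return Answer("found", draft, [source])


def answer_structured(intent: Intent, model_draft: str) -> Answer:
    # model_draft stands in for whatever text the LLM produced.
    early = clarification_gate(intent)
    if early is not None:
        return early  # returned before any model call would happen
    source = FACT_SOURCES.get((intent.entity, intent.metric))
    return anti_fabrication_guard(model_draft, source)
```

Called with a metric that has no stored fact, for example answer_structured(Intent("profit", "acme"), "Profit was 999."), the sketch returns a "not found" answer even though the draft contains a confident number.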

The FAQ short-circuit is a service-edge optimization (service/faq/), not part of the library core. The Python answer_question(...) entry begins at intent parsing. When the HTTP service is in front, a vetted FAQ hit returns before the agent runs at all. The short-circuit still applies conservative exclusions, so structured-numeric, competitor, real-time, expired, disabled, and RESTRICTED questions never short-circuit.
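The exclusion rule can be read as a single eligibility check. The sketch below is one hypothetical way to express it and makes several assumptions not stated on this page: the FaqEntry fields, the question-kind labels in BLOCKED_KINDS, and the split between checks on the question and checks on the entry are all invented for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class FaqEntry:
    answer: str
    source: str  # provenance returned alongside the cached answer
    vetted: bool = True
    enabled: bool = True
    restricted: bool = False
    expires_on: Optional[date] = None


# Question kinds that must always reach the agent (labels are assumed).
BLOCKED_KINDS = frozenset({"structured_numeric", "competitor", "real_time"})


def faq_short_circuit(kind: str, entry: Optional[FaqEntry], today: date) -> Optional[FaqEntry]:
    """Return the entry only when every conservative check passes."""
    if entry is None or kind in BLOCKED_KINDS:
        return None
    if not entry.vetted or not entry.enabled or entry.restricted:
        return None
    if entry.expires_on is not None and entry.expires_on < today:
        return None
    return entry


today = date(2025, 1, 15)
live = FaqEntry("Support hours are 9 to 5.", "faq/support.md")
stale = FaqEntry("Old policy.", "faq/old.md", expires_on=date(2024, 12, 31))
```

Returning None on any failed check means the request simply falls through to the normal agent path, so a wrong exclusion costs latency rather than correctness.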
