SWAG: Semantic Weighted Augmented Generation

A retrieval enhancement layer that improves RAG and CRAG ranking without modifying your source documents — or treating generated synonyms as facts.

Most businesses deploying retrieval-augmented generation (RAG) hit the same wall: vector search returns similar text, not necessarily the right text. A query about “three-way matching in accounts payable” may rank a general AP overview above the paragraph that actually explains the matching process — because both chunks share vocabulary and sit close together in embedding space.

SWAG — Semantic Weighted Augmented Generation — is a lightweight augmentation layer we built to address that gap. It does not replace your documents, rewrite chunks with synonyms, or trigger corrective web searches. Instead, it generates structured semantic metadata for each chunk, scores query-to-chunk alignment using weighted term relationships, and reranks vector search results before generation.

Design principle: Original chunk text remains authoritative. SWAG metadata is stored separately and used exclusively for reranking and confidence scoring. Synonyms, adjacent concepts, and domain terms are contextual hints — never source-of-truth content.

The problem with vector-only RAG

Traditional RAG pipelines follow a straightforward path: chunk documents, embed the raw text, store vectors, retrieve top-K by cosine similarity, pass chunks to an LLM. That works until:

  • Multiple chunks share overlapping vocabulary but answer different questions
  • Domain-specific terms collide across unrelated contexts (“Apple silicon” vs. orchard management)
  • Users ask intent-driven questions that do not lexically match the source paragraph
  • Teams inject synonym expansion into chunk text — improving recall but polluting the knowledge base

CRAG (Corrective RAG) addresses retrieval failures after the fact by evaluating confidence and triggering fallback retrieval or web search. That is valuable — but it is reactive. SWAG is complementary: it improves the initial candidate ranking so fewer queries ever reach the correction step.

What SWAG adds

During ingestion, SWAG generates a structured semantic profile for each chunk:

{
  "primary_concepts": ["three-way matching", "invoice reconciliation"],
  "domain_terms": ["accounts payable", "purchase order", "goods receipt"],
  "direct_synonyms": ["PO-GR-IR matching"],
  "adjacent_concepts": ["exception workflows", "straight-through processing"],
  "business_goals": ["reduce manual AP review", "accelerate payment cycles"],
  "user_intents": ["how does three-way matching work"],
  "negative_contexts": ["inventory forecasting", "general ledger reporting"],
  "confidence": 0.91,
  "weighted_terms": [
    {
      "term": "vendor invoice matching",
      "category": "adjacent_concept",
      "weight": 0.86,
      "reason": "Closely related to accounts payable reconciliation"
    }
  ]
}

Every term carries a weight from 0.0 to 1.0 reflecting its contextual relevance to that specific chunk. At query time, SWAG generates a matching profile for the user’s question and scores alignment across weighted categories — with penalties for negative-context matches and domain mismatches.
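One way to hold such a profile in code is sketched below. The field names mirror the JSON example above; the class names, `parse_profile`, and the idea of a separate `profile_store` keyed by chunk ID are illustrative assumptions, not the framework's actual API.

```python
import json
from dataclasses import dataclass, field


@dataclass
class WeightedTerm:
    term: str
    category: str
    weight: float  # contextual relevance to this chunk, 0.0 to 1.0
    reason: str = ""


@dataclass
class ChunkProfile:
    primary_concepts: list[str] = field(default_factory=list)
    domain_terms: list[str] = field(default_factory=list)
    direct_synonyms: list[str] = field(default_factory=list)
    adjacent_concepts: list[str] = field(default_factory=list)
    business_goals: list[str] = field(default_factory=list)
    user_intents: list[str] = field(default_factory=list)
    negative_contexts: list[str] = field(default_factory=list)
    confidence: float = 0.0
    weighted_terms: list[WeightedTerm] = field(default_factory=list)


def parse_profile(raw_json: str) -> ChunkProfile:
    """Parse a profile and reject weights outside the 0.0-1.0 range."""
    data = json.loads(raw_json)
    terms = [WeightedTerm(**item) for item in data.pop("weighted_terms", [])]
    for t in terms:
        if not 0.0 <= t.weight <= 1.0:
            raise ValueError(f"weight out of range for {t.term!r}: {t.weight}")
    return ChunkProfile(weighted_terms=terms, **data)


EXAMPLE = """{
  "primary_concepts": ["three-way matching", "invoice reconciliation"],
  "negative_contexts": ["inventory forecasting"],
  "confidence": 0.91,
  "weighted_terms": [
    {"term": "vendor invoice matching", "category": "adjacent_concept",
     "weight": 0.86, "reason": "Closely related to accounts payable reconciliation"}
  ]
}"""

# Hypothetical layout: profiles are kept in their own store, keyed by chunk ID,
# so the original chunk text is never modified.
profile_store: dict[str, ChunkProfile] = {"chunk-001": parse_profile(EXAMPLE)}
```

Keeping the profile in its own structure makes the design principle easy to enforce: nothing in the generation path needs to read `profile_store` at all.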

Guardrails against semantic pollution

SWAG was designed with explicit protections because naive synonym expansion is how RAG systems quietly degrade:

  • No isolated keyword expansion. Ambiguous single words are not expanded without full chunk context.
  • Concept expansion, not keyword stuffing. Multi-word phrases and domain-scoped concepts only.
  • Domain detection first. Each profile includes a detected domain before terms are assigned.
  • Negative contexts. Explicit terms that would cause false-positive matches in other domains.
  • Synonyms are retrieval hints only. Never appended to chunk text. Never treated as facts during generation.
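Two of these guardrails can be approximated with a simple filter, sketched below. `filter_expansion_terms` is a hypothetical helper. Dropping every single-word candidate is a deliberate simplification of "no isolated keyword expansion"; a real implementation would judge ambiguity against the full chunk context.

```python
def filter_expansion_terms(candidates: list[str], negative_contexts: list[str]) -> list[str]:
    """Keep only multi-word phrases that are not listed as negative contexts."""
    negatives = {" ".join(n.split()).casefold() for n in negative_contexts}
    kept: list[str] = []
    seen: set[str] = set()
    for term in candidates:
        normalized = " ".join(term.split()).casefold()
        if len(normalized.split()) < 2:
            continue  # isolated single words are too ambiguous to expand
        if normalized in negatives or normalized in seen:
            continue  # never expand into a known false-match term, skip duplicates
        seen.add(normalized)
        kept.append(term)
    return kept


hints = filter_expansion_terms(
    ["matching", "vendor invoice matching", "inventory forecasting"],
    negative_contexts=["inventory forecasting"],
)
# hints == ["vendor invoice matching"]
```

The returned terms are meant to go into the profile only, never into the chunk text.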

How SWAG fits into RAG and CRAG

Traditional RAG

Embed query → vector search → top-K → generate answer

SWAG + RAG

Embed query → vector search → SWAG rerank → top-K → generate answer

CRAG alone

Retrieve → evaluate confidence → correct if low (web/re-retrieve) → generate

SWAG + CRAG

Retrieve → SWAG rerank → evaluate confidence → correct if still low → generate

In a combined deployment, SWAG sits between vector retrieval and the CRAG confidence evaluator. Better initial ranking means higher confidence scores, fewer corrective retrieval cycles, and lower latency and cost from unnecessary fallback searches.
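The combined flow can be sketched as a single function that takes each stage as a callable. All names here (`build_context`, the stage parameters, the `min_confidence` threshold of 0.7) are assumptions for illustration; the article does not specify the framework's interfaces or the CRAG threshold.

```python
from typing import Callable

Candidate = tuple[str, float]  # (chunk_id, score)


def build_context(
    query: str,
    retrieve: Callable[[str], list[Candidate]],
    swag_rerank: Callable[[str, list[Candidate]], list[Candidate]],
    confidence: Callable[[str, list[Candidate]], float],
    correct: Callable[[str, list[Candidate]], list[Candidate]],
    top_k: int = 5,
    min_confidence: float = 0.7,  # hypothetical CRAG gate threshold
) -> list[Candidate]:
    """Retrieve -> SWAG rerank -> evaluate confidence -> correct if still low."""
    candidates = retrieve(query)                     # vector search
    ranked = swag_rerank(query, candidates)[:top_k]  # rerank, then cut to top-K
    if confidence(query, ranked) < min_confidence:
        ranked = correct(query, ranked)              # fallback re-retrieval or web search
    return ranked
```

Because reranking happens before the confidence check, the corrective step only runs when the reranked candidates are still weak.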

Scoring formula

Final ranking combines three signals with configurable weights:

final_score = (vector_similarity × 0.65)
            + (swag_score       × 0.30)
            + (metadata_boost   × 0.05)

The vector component preserves the strength of embedding search. The SWAG component rewards semantic profile alignment across weighted categories (primary concepts, user intents, domain terms, business goals, direct synonyms, and adjacent concepts). The metadata boost captures domain alignment and structural signals such as source filename relevance.
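As a minimal sketch of the formula, the function below combines the three signals and reorders candidates. The weights are the ones shown above. The vector similarities (0.732 and 0.715) come from the three-way matching example later in this article, while the SWAG scores of 0.40 and 0.80 are made-up values chosen only to show the mechanics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoreWeights:
    vector: float = 0.65
    swag: float = 0.30
    metadata: float = 0.05


def final_score(vector_similarity: float, swag_score: float,
                metadata_boost: float, w: ScoreWeights = ScoreWeights()) -> float:
    return (vector_similarity * w.vector
            + swag_score * w.swag
            + metadata_boost * w.metadata)


def rerank(candidates: list[tuple[str, float, float, float]]) -> list[tuple[str, float]]:
    """candidates: (chunk_id, vector_similarity, swag_score, metadata_boost)."""
    scored = [(cid, final_score(v, s, m)) for cid, v, s, m in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranked = rerank([
    ("ap-overview", 0.732, 0.40, 0.0),         # SWAG score is hypothetical
    ("three-way-matching", 0.715, 0.80, 0.0),  # SWAG score is hypothetical
])
# ap-overview scores about 0.596, three-way-matching about 0.705,
# so the matching chunk moves to rank 1.
```

With a vector gap this small (0.017), even a modest difference in SWAG score is enough to flip the order, which is the behavior the 0.30 weight is meant to allow.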

Category         | Weight multiplier | Role
Primary concept  | 1.00×             | Core topic alignment
User intent      | 0.90×             | Question the chunk answers
Domain term      | 0.85×             | Specialized vocabulary match
Business goal    | 0.75×             | Operational objective fit
Direct synonym   | 0.70×             | Retrieval hint only, not a fact
Adjacent concept | 0.60×             | Related but unstated context
Negative context | Penalty           | Reduces score on false-match terms
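The sketch below shows one plausible way to turn these multipliers into a `swag_score` between 0 and 1. The multipliers are taken from the table. Everything else is an assumption: exact-phrase matching, normalizing by the categories present in the query, and a flat `NEGATIVE_PENALTY` of 0.25 per hit are illustrative choices, and per-term weights are ignored for brevity.

```python
CATEGORY_MULTIPLIERS = {
    "primary_concepts": 1.00,
    "user_intents": 0.90,
    "domain_terms": 0.85,
    "business_goals": 0.75,
    "direct_synonyms": 0.70,
    "adjacent_concepts": 0.60,
}
NEGATIVE_PENALTY = 0.25  # hypothetical value; the article only says "Penalty"


def _norm(terms: list[str]) -> set[str]:
    return {" ".join(t.split()).casefold() for t in terms}


def swag_score(query_profile: dict, chunk_profile: dict) -> float:
    earned = 0.0
    possible = 0.0
    query_terms: set[str] = set()
    for category, multiplier in CATEGORY_MULTIPLIERS.items():
        q = _norm(query_profile.get(category, []))
        if not q:
            continue  # category absent from the query: no credit, no cost
        c = _norm(chunk_profile.get(category, []))
        earned += multiplier * len(q & c) / len(q)
        possible += multiplier
        query_terms |= q
    base = earned / possible if possible else 0.0
    # Penalize query terms that the chunk explicitly lists as false matches.
    hits = len(query_terms & _norm(chunk_profile.get("negative_contexts", [])))
    return max(0.0, min(1.0, base - NEGATIVE_PENALTY * hits))


query = {"primary_concepts": ["three-way matching"],
         "domain_terms": ["accounts payable"]}
matching_chunk = {"primary_concepts": ["three-way matching"],
                  "domain_terms": ["accounts payable", "purchase order"]}
overview_chunk = {"primary_concepts": ["accounts payable overview"],
                  "domain_terms": ["accounts payable"]}
```

Here `matching_chunk` earns full credit in both categories, while `overview_chunk` only matches on the lower-weighted domain term, so it scores lower even though both chunks share AP vocabulary.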

Baseline improvement numbers

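We benchmarked SWAG against vector-only retrieval on an 8-chunk corpus spanning accounts payable operations documentation and SWAG framework reference material. Eight labeled queries were evaluated against ground-truth relevance judgments, with the correct chunk for each query identified by content match. Precision@K here is the share of queries whose correct chunk appears in the top K results. With eight queries, each query accounts for 12.5 percentage points, so treat these figures as directional rather than statistically robust.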

  • SWAG Precision@1: 62.5% (+12.5 pp vs. baseline)
  • SWAG MRR: 72.9% (+6.2 pp vs. baseline)
  • Queries reranked: 50% (4 of 8 changed order)

Metric               | Baseline RAG | SWAG + RAG | Change
Precision@1          | 50.0%        | 62.5%      | +12.5 pp (+25% relative)
Precision@3          | 87.5%        | 87.5%      | No change (recall already strong)
Precision@5          | 87.5%        | 87.5%      | No change
Mean Reciprocal Rank | 66.7%        | 72.9%      | +6.2 pp (+9.3% relative)

Top-1 result changed: 12.5% of queries (correct chunk promoted where vector alone failed).
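For readers who want to check the arithmetic, the sketch below computes these metrics from the rank of the correct chunk per query. The two rank lists are not the benchmark data. They are hypothetical ranks constructed to be consistent with the reported figures (one query moving from rank 2 to rank 1), and the function names are illustrative.

```python
from typing import Optional, Sequence


def hit_rate_at_k(ranks: Sequence[Optional[int]], k: int) -> float:
    """Share of queries whose correct chunk appears in the top k (None = not retrieved)."""
    return sum(1 for r in ranks if r is not None and r <= k) / len(ranks)


def mean_reciprocal_rank(ranks: Sequence[Optional[int]]) -> float:
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)


# Hypothetical ranks of the correct chunk for 8 queries (illustration only).
baseline_ranks = [1, 1, 1, 1, 2, 3, 3, 6]
swag_ranks = [1, 1, 1, 1, 1, 3, 3, 6]  # the rank-2 query is promoted to rank 1

p1_base = hit_rate_at_k(baseline_ranks, 1)                # 0.5
p1_swag = hit_rate_at_k(swag_ranks, 1)                    # 0.625
mrr_base = round(mean_reciprocal_rank(baseline_ranks), 3)  # 0.667
mrr_swag = round(mean_reciprocal_rank(swag_ranks), 3)      # 0.729
```

A single promotion from rank 2 to rank 1 is enough to account for both the +12.5 pp change at rank 1 and the +6.2 pp change in MRR, which shows how sensitive an eight-query benchmark is to one result.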

Example: where SWAG corrected a miss

Query: “How does three-way matching work in accounts payable?”

Baseline RAG ranked the AP overview chunk first (vector similarity 0.732). The chunk that actually explains the three-way matching process ranked second (0.715) — close in embedding space, wrong in operational intent.

SWAG reranking promoted the correct chunk to rank 1 by scoring semantic alignment between the query profile and the chunk’s weighted term profile (user intents, primary concepts, domain terms). The overview chunk dropped to rank 2. Same corpus. Same embeddings. Better answer context.

What these numbers mean in practice: SWAG’s largest gains appear at Precision@1 — the chunk your LLM actually reads first. For operational knowledge systems (SOPs, policy libraries, AP workflows, customer playbooks), that single ranking correction often determines whether the system gives a precise answer or a plausible-sounding generalization.

Architecture at a glance

SWAG is designed local-first and modular — consistent with how we deploy systems for clients who need cloud, on-prem, or hybrid options:

  • Ingestion: .txt, .md, .pdf, .docx with configurable chunk size and overlap
  • Embeddings: sentence-transformers (local) or OpenAI — computed on original chunk text only
  • Vector store: FAISS by default; modular interface for Qdrant, Pinecone, Chroma
  • SWAG profiles: LLM-generated at ingest, stored separately from chunk text
  • Query path: vector search → SWAG rerank → optional CRAG confidence gate → generation

When SWAG earns its place

SWAG is not a replacement for RAG or CRAG. It is an augmentation layer best suited when:

  • Your knowledge base has dense, overlapping content in the same domain
  • Users ask intent-driven questions that do not mirror document phrasing
  • You need disambiguation guardrails without polluting source documents
  • You run CRAG and want fewer corrective retrieval cycles
  • Operational accuracy at rank 1 matters more than broad recall at rank 5

The framework is open and deployable today: Python 3.11+, FAISS, a FastAPI demo API, and configurable scoring weights. For production deployments that use full LLM-generated profiles rather than the local heuristic fallback, we expect further gains on Precision@1 and MRR, particularly on ambiguous, domain-heavy query sets.

What we are building toward

At automatico.llc, we treat retrieval quality the same way we treat workflow design: start with the operational bottleneck, measure what changes, and expand from there. SWAG is one layer in that stack — rules and structure first, deeper models where they earn their place.

If your team is running RAG or CRAG against operational documentation and getting “close enough” answers that are not quite right, SWAG is worth evaluating on your corpus before adding more corrective complexity downstream.

Next step

Discuss a knowledge workflow.

We can benchmark SWAG against your documents and show you the ranking delta on your highest-value queries.
