SWAG: Semantic Weighted Augmented Generation
A retrieval enhancement layer that improves RAG and CRAG ranking without modifying your source documents — or treating generated synonyms as facts.
Most businesses deploying retrieval-augmented generation (RAG) hit the same wall: vector search returns similar text, not necessarily the right text. A query about “three-way matching in accounts payable” may rank a general AP overview above the paragraph that actually explains the matching process — because both chunks share vocabulary and sit close together in embedding space.
SWAG — Semantic Weighted Augmented Generation — is a lightweight augmentation layer we built to address that gap. It does not replace your documents, rewrite chunks with synonyms, or trigger corrective web searches. Instead, it generates structured semantic metadata for each chunk, scores query-to-chunk alignment using weighted term relationships, and reranks vector search results before generation.
Traditional RAG pipelines follow a straightforward path: chunk documents, embed the raw text, store vectors, retrieve top-K by cosine similarity, pass chunks to an LLM. That works until:
CRAG (Corrective RAG) addresses retrieval failures after the fact by evaluating confidence and triggering fallback retrieval or web search. That is valuable — but it is reactive. SWAG is complementary: it improves the initial candidate ranking so fewer queries ever reach the correction step.
During ingestion, SWAG generates a structured semantic profile for each chunk:
{
  "primary_concepts": ["three-way matching", "invoice reconciliation"],
  "domain_terms": ["accounts payable", "purchase order", "goods receipt"],
  "direct_synonyms": ["PO-GR-IR matching"],
  "adjacent_concepts": ["exception workflows", "straight-through processing"],
  "business_goals": ["reduce manual AP review", "accelerate payment cycles"],
  "user_intents": ["how does three-way matching work"],
  "negative_contexts": ["inventory forecasting", "general ledger reporting"],
  "confidence": 0.91,
  "weighted_terms": [
    {
      "term": "vendor invoice matching",
      "category": "adjacent_concept",
      "weight": 0.86,
      "reason": "Closely related to accounts payable reconciliation"
    }
  ]
}
Every term carries a weight from 0.0 to 1.0 reflecting its contextual relevance to that specific chunk. At query time, SWAG generates a matching profile for the user’s question and scores alignment across weighted categories — with penalties for negative-context matches and domain mismatches.
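To make the profile shape concrete, here is a minimal Python sketch of how such a profile might be loaded and validated. The class names (`WeightedTerm`, `SemanticProfile`) and the `load_profile` helper are illustrative assumptions, not the framework's actual API; only the field names come from the JSON example above.

```python
import json
from dataclasses import dataclass, field


@dataclass
class WeightedTerm:
    term: str
    category: str
    weight: float  # 0.0 to 1.0, relevance to this specific chunk
    reason: str = ""


@dataclass
class SemanticProfile:
    # Field names mirror the JSON keys shown above.
    primary_concepts: list[str] = field(default_factory=list)
    domain_terms: list[str] = field(default_factory=list)
    direct_synonyms: list[str] = field(default_factory=list)
    adjacent_concepts: list[str] = field(default_factory=list)
    business_goals: list[str] = field(default_factory=list)
    user_intents: list[str] = field(default_factory=list)
    negative_contexts: list[str] = field(default_factory=list)
    confidence: float = 0.0
    weighted_terms: list[WeightedTerm] = field(default_factory=list)


def load_profile(raw: str) -> SemanticProfile:
    """Parse a profile from JSON and reject weights outside 0.0 to 1.0."""
    data = json.loads(raw)
    terms = [WeightedTerm(**t) for t in data.pop("weighted_terms", [])]
    for t in terms:
        if not 0.0 <= t.weight <= 1.0:
            raise ValueError(f"weight out of range for {t.term!r}: {t.weight}")
    return SemanticProfile(weighted_terms=terms, **data)
```

Validating the weight range at load time is a design choice in this sketch: it keeps malformed generated profiles from silently skewing scores later.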
SWAG was designed with explicit protections because naive synonym expansion is how RAG systems quietly degrade:
Standard RAG: Embed query → vector search → top-K → generate answer
RAG with SWAG: Embed query → vector search → SWAG rerank → top-K → generate answer
CRAG: Retrieve → evaluate confidence → correct if low (web/re-retrieve) → generate
CRAG with SWAG: Retrieve → SWAG rerank → evaluate confidence → correct if still low → generate
In a combined deployment, SWAG sits between vector retrieval and the CRAG confidence evaluator. Better initial ranking means higher confidence scores, fewer corrective retrieval cycles, and lower latency and cost from unnecessary fallback searches.
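The ordering of the combined flow can be sketched in Python as follows. Every name here is hypothetical: the four callables stand in for whatever vector store, reranker, confidence evaluator, and fallback retriever a deployment uses, and the `fetch_k` and `min_confidence` defaults are placeholder values rather than recommendations.

```python
from collections.abc import Callable

Candidate = tuple[str, float]  # (chunk_id, score)


def retrieve_with_swag(
    query: str,
    vector_search: Callable[[str, int], list[Candidate]],
    swag_rerank: Callable[[str, list[Candidate]], list[Candidate]],
    evaluate_confidence: Callable[[str, list[Candidate]], float],
    corrective_retrieve: Callable[[str], list[Candidate]],
    top_k: int = 3,
    fetch_k: int = 20,           # assumed: over-fetch so the reranker has room to reorder
    min_confidence: float = 0.5,  # assumed threshold for the CRAG evaluator
) -> list[Candidate]:
    # 1. Vector retrieval produces the candidate pool.
    candidates = vector_search(query, fetch_k)
    # 2. SWAG reranks before anything else sees the results.
    ranked = swag_rerank(query, candidates)[:top_k]
    # 3. The CRAG evaluator only triggers correction if confidence is still low.
    if evaluate_confidence(query, ranked) >= min_confidence:
        return ranked
    return corrective_retrieve(query)[:top_k]
```

Because the reranker runs before the confidence check, any query it fixes never pays for the corrective step.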
Final ranking combines three signals with configurable weights:
final_score = (vector_similarity × 0.65)
            + (swag_score × 0.30)
            + (metadata_boost × 0.05)
The vector component preserves the strength of embedding search. The SWAG component rewards semantic profile alignment across weighted categories (primary concepts, user intents, domain terms, adjacent concepts, and direct synonyms). The metadata boost captures domain alignment and structural signals like source filename relevance.
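As a minimal sketch, the blend can be written in Python like this. The `ScoreWeights` class and the `final_score` signature are illustrative assumptions; only the three signals and their default weights come from the formula above, and all inputs are assumed to be normalized to the 0.0 to 1.0 range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoreWeights:
    # Defaults taken from the formula above; intended to be configurable.
    vector: float = 0.65
    swag: float = 0.30
    metadata: float = 0.05


def final_score(
    vector_similarity: float,
    swag_score: float,
    metadata_boost: float,
    weights: ScoreWeights = ScoreWeights(),
) -> float:
    """Linear blend of the three ranking signals."""
    return (
        weights.vector * vector_similarity
        + weights.swag * swag_score
        + weights.metadata * metadata_boost
    )
```

With the default weights summing to 1.0, the final score stays in the same range as its inputs, which keeps scores comparable across queries.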
| Category | Weight multiplier | Role |
|---|---|---|
| Primary concept | 1.00× | Core topic alignment |
| User intent | 0.90× | Question the chunk answers |
| Domain term | 0.85× | Specialized vocabulary match |
| Business goal | 0.75× | Operational objective fit |
| Direct synonym | 0.70× | Retrieval hint only — not a fact |
| Adjacent concept | 0.60× | Related but unstated context |
| Negative context | Penalty | Reduces score on false-match terms |
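One way the category multipliers could feed a single SWAG score is sketched below in Python. The multipliers come from the table; everything else is an assumption made for illustration: the exact-match comparison on lowercased terms, the normalization by the chunk's total possible contribution, and the flat `NEGATIVE_PENALTY` value. The article does not specify SWAG's actual matching or penalty logic.

```python
# Multipliers from the table above, keyed by the singular category names
# used in the "weighted_terms" entries of the profile.
CATEGORY_MULTIPLIER = {
    "primary_concept": 1.00,
    "user_intent": 0.90,
    "domain_term": 0.85,
    "business_goal": 0.75,
    "direct_synonym": 0.70,
    "adjacent_concept": 0.60,
}
NEGATIVE_PENALTY = 0.25  # assumed value; the table only says "Penalty"


def swag_score(
    query_terms: set[str],
    chunk_terms: list[tuple[str, str, float]],  # (term, category, weight)
    negative_contexts: list[str],
) -> float:
    wanted = {t.lower() for t in query_terms}
    matched = 0.0
    possible = 0.0
    for term, category, weight in chunk_terms:
        contribution = CATEGORY_MULTIPLIER.get(category, 0.0) * weight
        possible += contribution
        if term.lower() in wanted:  # assumed: exact match after lowercasing
            matched += contribution
    base = matched / possible if possible else 0.0
    # Each negative-context term present in the query reduces the score.
    hits = sum(1 for n in negative_contexts if n.lower() in wanted)
    return max(0.0, min(1.0, base - NEGATIVE_PENALTY * hits))
```

Under these assumptions, a matched direct synonym contributes less than a matched primary concept of the same weight, which is the behavior the table describes: synonyms act as retrieval hints, not as evidence.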
We benchmarked SWAG against vector-only retrieval on an 8-chunk corpus spanning accounts payable operations documentation and SWAG framework reference material. Eight labeled queries were evaluated with ground-truth relevance judgments (correct chunk identified by content match).
| Metric | Baseline RAG | SWAG + RAG | Change |
|---|---|---|---|
| Precision@1 | 50.0% | 62.5% | +12.5 pp (+25% relative) |
| Precision@3 | 87.5% | 87.5% | No change — recall already strong |
| Precision@5 | 87.5% | 87.5% | No change |
| Mean Reciprocal Rank | 66.7% | 72.9% | +6.2 pp (+9.3% relative) |
| Top-1 result changed | — | 12.5% of queries | Correct chunk promoted where vector alone failed |
Query: “How does three-way matching work in accounts payable?”
Baseline RAG ranked the AP overview chunk first (vector similarity 0.732). The chunk that actually explains the three-way matching process ranked second (0.715) — close in embedding space, wrong in operational intent.
SWAG reranking promoted the correct chunk to rank 1 by scoring semantic alignment between the query profile and the chunk’s weighted term profile (user intents, primary concepts, domain terms). The overview chunk dropped to rank 2. Same corpus. Same embeddings. Better answer context.
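A quick calculation shows how small a SWAG advantage is needed to flip this particular pair. The Python sketch below assumes the default weights (0.65 vector, 0.30 SWAG) were in effect and that both chunks received the same metadata boost; neither is stated for this benchmark run, and the function name is our own.

```python
def required_swag_margin(
    sim_leader: float,
    sim_challenger: float,
    w_vector: float = 0.65,
    w_swag: float = 0.30,
) -> float:
    """Minimum SWAG-score lead the challenger needs to overtake the leader,
    assuming equal metadata boost for both chunks."""
    vector_gap = w_vector * (sim_leader - sim_challenger)
    return vector_gap / w_swag


# Overview chunk (0.732) vs. the chunk that explains the process (0.715).
margin = required_swag_margin(0.732, 0.715)
print(f"{margin:.4f}")
```

Under those assumptions the required margin is about 0.037: the chunk that explains the process only needs a SWAG score roughly four hundredths higher than the overview chunk to take rank 1.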
What these numbers mean in practice: SWAG’s largest gains appear at Precision@1 — the chunk your LLM actually reads first. For operational knowledge systems (SOPs, policy libraries, AP workflows, customer playbooks), that single ranking correction often determines whether the system gives a precise answer or a plausible-sounding generalization.
SWAG is designed local-first and modular — consistent with how we deploy systems for clients who need cloud, on-prem, or hybrid options:
SWAG is not a replacement for RAG or CRAG. It is an augmentation layer best suited when:
The framework is open and deployable today, built on Python 3.11+, FAISS, and a FastAPI demo API, with configurable scoring weights. For production deployments that use full LLM-generated profiles rather than the local heuristic fallback, we expect further gains on Precision@1 and MRR, particularly on ambiguous, domain-heavy query sets.
At automatico.llc, we treat retrieval quality the same way we treat workflow design: start with the operational bottleneck, measure what changes, and expand from there. SWAG is one layer in that stack — rules and structure first, deeper models where they earn their place.
If your team is running RAG or CRAG against operational documentation and getting “close enough” answers that are not quite right, SWAG is worth evaluating on your corpus before adding more corrective complexity downstream.
We can benchmark SWAG against your documents and show you the ranking delta on your highest-value queries.