MRP-VM uses two complementary retrieval strategies to find relevant evidence. Specs: DS009, DS023, DS024.
The first strategy is standard BM25 with per-field weighting. A query token matches only when it is identical to an indexed token after stemming.
score(query, unit) = Σ fieldWeight[f] × BM25(query, unit[f]) × roleBoost
BM25(q,d) = Σ IDF(t) × (tf × (k1+1)) / (tf + k1 × (1 - b + b × dl/avgdl))
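k1 = 1.2, b = 0.75, where tf is the frequency of term t in d, dl is the length of d in tokens, and avgdl is the average length across the collection.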
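The BM25 formula above can be sketched in a few lines of Python. This is a minimal illustration, not the MRP-VM implementation: the source does not say which IDF variant is used, so the smoothed `log(1 + (N - n + 0.5) / (n + 0.5))` form is an assumption, and the names `idf` and `bm25` are hypothetical.

```python
import math
from collections import Counter

K1 = 1.2   # term-frequency saturation, from the text above
B = 0.75   # length normalization strength, from the text above

def idf(term, docs):
    # Assumption: smoothed IDF variant; the source does not specify one.
    n = sum(1 for d in docs if term in d)
    return math.log(1 + (len(docs) - n + 0.5) / (n + 0.5))

def bm25(query, doc, docs):
    """Score one tokenized document (or field) against a tokenized query."""
    avgdl = sum(len(d) for d in docs) / len(docs)
    tf = Counter(doc)
    score = 0.0
    for t in query:
        f = tf[t]
        if f == 0:
            continue  # terms absent from the document contribute nothing
        norm = f + K1 * (1 - B + B * len(doc) / avgdl)
        score += idf(t, docs) * (f * (K1 + 1)) / norm
    return score

docs = [["cach", "evict"], ["cach", "polici", "lru"], ["index", "build"]]
rare = bm25(["evict"], docs[0], docs)    # "evict" occurs in one document
common = bm25(["cach"], docs[0], docs)   # "cach" occurs in two documents
```

With equal term frequency in the same document, the rarer term scores higher because its IDF is larger.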
| Field | Weight | Why |
|---|---|---|
| topic | 1.5× | Most discriminative — what the unit is about |
| claim | 1.0× | Core assertion content |
| procedure | 1.0× | Step content (for Procedure role) |
| utilityActs | 0.8× | Pragmatic act matching |
| utilityNote | 0.6× | Supplementary context |
| condition | 0.6× | Constraints and limitations |
| role | 0.5× | Structural role matching |
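As a rough sketch of how the field weights combine, the function below takes per-field BM25 scores that have already been computed and applies the weights from the table. The function name `unit_score` is hypothetical, and because the source does not define how `roleBoost` is derived, it is treated here as a plain multiplier supplied by the caller.

```python
# Field weights copied from the table above.
FIELD_WEIGHTS = {
    "topic": 1.5,
    "claim": 1.0,
    "procedure": 1.0,
    "utilityActs": 0.8,
    "utilityNote": 0.6,
    "condition": 0.6,
    "role": 0.5,
}

def unit_score(field_bm25, role_boost=1.0):
    """Weighted sum of per-field BM25 scores, scaled by role_boost.

    field_bm25 maps field name -> BM25(query, unit[field]).
    role_boost is an assumption: the source names it but does not define it.
    """
    weighted = sum(FIELD_WEIGHTS[f] * s for f, s in field_bm25.items())
    return role_boost * weighted

example = unit_score({"topic": 2.0, "claim": 1.0})
```

Since `roleBoost` is constant across fields for a given unit, factoring it out of the sum gives the same result as multiplying each term.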
Tokenization: lowercase → split on whitespace → strip edge punctuation → remove possessives ('s) → stopword removal → Porter stemming. Hyphenated terms are kept whole, and their parts are also indexed separately.
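The tokenization steps can be sketched as follows. Several details are assumptions: the stopword list is a tiny placeholder, the stemmer is passed in as a callable (identity by default, so the example has no third-party dependency), and the hyphenated whole term is passed through the stemmer like any other token. A real Porter stemmer could be plugged in through the `stem` parameter.

```python
import string

# Assumption: placeholder stopword list; the real list is not given in the source.
STOPWORDS = {"the", "a", "an", "of", "and", "is", "to", "in"}

def tokenize(text, stem=lambda t: t):
    """Lowercase, split, strip punctuation, drop possessives and stopwords, stem."""
    out = []
    for raw in text.lower().split():
        tok = raw.strip(string.punctuation)      # strip edge punctuation only
        for suffix in ("'s", "\u2019s"):          # straight and curly apostrophes
            if tok.endswith(suffix):
                tok = tok[: -len(suffix)]
        if not tok:
            continue
        candidates = [tok]
        if "-" in tok:
            # Keep the hyphenated term whole and also index its parts.
            candidates += [p for p in tok.split("-") if p]
        for c in candidates:
            if c not in STOPWORDS:
                out.append(stem(c))
    return out

tokens = tokenize("The cache's write-back policy.")
```

Here `tokens` is `["cache", "write-back", "write", "back", "policy"]` with the identity stemmer.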
The second strategy is Hyperdimensional Computing (HDC) with 4096-bit binary vectors. It complements BM25 by capturing structural similarity when lexical overlap is only partial.
| Operation | Implementation | Purpose |
|---|---|---|
| Random HV | Seeded PRNG from string hash → 4096 bits | Unique vector per token/symbol |
| Bind (⊗) | Bitwise XOR | Associate field name with value |
| Bundle (+) | Majority vote per bit | Combine multiple concepts |
| Similarity | 1 − Hamming(a,b) / 4096 | Compare vectors (0.50 = random) |
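The four operations in the table can be sketched with Python integers used as 4096-bit vectors. This is an illustration under stated assumptions: the source does not name the hash or PRNG, so SHA-256 and Python's `random.Random` stand in for them, and ties in the majority vote (possible with an even number of inputs) resolve to 0.

```python
import hashlib
import random

DIM = 4096  # vector width in bits, from the text above

def random_hv(symbol):
    """Deterministic pseudo-random hypervector for a token or symbol."""
    # Assumption: SHA-256 as the string hash, random.Random as the seeded PRNG.
    digest = hashlib.sha256(symbol.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "big")
    return random.Random(seed).getrandbits(DIM)

def bind(a, b):
    """Bind two hypervectors with bitwise XOR (self-inverse)."""
    return a ^ b

def bundle(vectors):
    """Majority vote per bit; ties resolve to 0 (assumption)."""
    out = 0
    for i in range(DIM):
        ones = sum((v >> i) & 1 for v in vectors)
        if 2 * ones > len(vectors):
            out |= 1 << i
    return out

def similarity(a, b):
    """1 - normalized Hamming distance; about 0.50 for unrelated vectors."""
    return 1 - bin(a ^ b).count("1") / DIM

a, b, c = random_hv("topic"), random_hv("cache"), random_hv("claim")
bound = bind(a, b)
mixed = bundle([a, b, c])
```

Binding is reversible (`bind(bound, b)` recovers `a`), while a bundle stays measurably closer to each of its inputs than to an unrelated vector.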
Each unit is encoded as separate field vectors, not one blob:
| Field | Encoding | Notes |
|---|---|---|
| role | randomHV(roleName) | Exact match on role |
| topic | encodeNgrams(tokens) | Positional unigrams + bigrams |
| claim | encodeNgrams(tokens) | Captures word order |
| acts | encodeTokens(actList) | Bag-of-words |

fieldScore = max(0, (similarity(query.field, unit.field) - 0.50) × 2)
finalScore = 0.35×topic + 0.35×claim + 0.20×role + 0.10×acts
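Two unrelated random vectors have a similarity of about 0.50, so subtracting 0.50 removes that chance-level baseline. Multiplying by 2 then rescales the result so that identical vectors score 1.0, and the max(0, ...) clamp discards anything below chance.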
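A small sketch of the per-field rescaling and the weighted final score, using the weights given above. The function names are hypothetical, and treating a missing field as chance-level similarity (0.50, contributing 0) is an assumption rather than documented behavior.

```python
# Per-field weights for the HDC final score, from the formula above.
HDC_WEIGHTS = {"topic": 0.35, "claim": 0.35, "role": 0.20, "acts": 0.10}

def field_score(sim):
    """Map raw similarity in [0, 1] to a score where 0.50 (chance) becomes 0."""
    return max(0.0, (sim - 0.50) * 2)

def final_score(sims):
    """Weighted sum of rescaled field similarities.

    sims maps field name -> raw similarity(query.field, unit.field).
    Assumption: a missing field counts as chance-level similarity (0.50).
    """
    return sum(w * field_score(sims.get(f, 0.50)) for f, w in HDC_WEIGHTS.items())

example = final_score({"topic": 0.75, "claim": 0.50, "role": 1.0, "acts": 0.50})
```

In this example only topic (0.75, rescaled to 0.5) and role (1.0, rescaled to 1.0) contribute, giving 0.35 × 0.5 + 0.20 × 1.0 = 0.375.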
When both strategies find the same unit:
fusedScore = 1.0 × bm25_normalized + 0.7 × hdc_normalized + 0.15 (agreement bonus)
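The fusion rule might look like this in code. The source only gives the formula for units found by both strategies, so two points here are assumptions: a unit found by a single strategy gets 0 for the missing component and no agreement bonus, and the inputs are taken as already normalized (the normalization method is not specified). The name `fuse` is hypothetical.

```python
BM25_WEIGHT = 1.0
HDC_WEIGHT = 0.7
AGREEMENT_BONUS = 0.15

def fuse(bm25_norm, hdc_norm):
    """Combine normalized scores from both strategies, keyed by unit id."""
    fused = {}
    for uid in set(bm25_norm) | set(hdc_norm):
        score = (BM25_WEIGHT * bm25_norm.get(uid, 0.0)
                 + HDC_WEIGHT * hdc_norm.get(uid, 0.0))
        if uid in bm25_norm and uid in hdc_norm:
            score += AGREEMENT_BONUS  # both strategies found this unit
        fused[uid] = score
    return fused

fused = fuse({"u1": 1.0, "u2": 0.5}, {"u1": 0.5, "u3": 1.0})
```

Here `u1` scores 1.0 + 0.35 + 0.15 = 1.5, while `u2` and `u3` keep only their single-strategy contributions (0.5 and 0.7).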
After scoring and sorting, candidates with score below topScore × gapThreshold are removed. This prevents low-relevance noise from reaching synthesis.
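A minimal sketch of the gap filter, assuming candidates arrive as `(unit_id, score)` pairs and that a candidate exactly at the cutoff is kept. The function name is hypothetical.

```python
def apply_gap_threshold(scored, gap_threshold):
    """Sort by score and drop candidates below top_score * gap_threshold.

    scored is a list of (unit_id, score) pairs; gap_threshold is a fraction,
    for example 0.35 for the 35% setting.
    """
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    if not ranked:
        return []
    cutoff = ranked[0][1] * gap_threshold
    return [pair for pair in ranked if pair[1] >= cutoff]

kept = apply_gap_threshold([("b", 0.4), ("a", 1.0), ("c", 0.2)], 0.35)
```

With a top score of 1.0 and a 35% threshold the cutoff is 0.35, so `a` and `b` survive and `c` is removed.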
| KB Plugin | Strategies | maxResults | minScore | Gap Threshold | Use Case |
|---|---|---|---|---|---|
| kb-fast | BM25 only | 3 | 0.3 | 50% | Simple questions, max precision |
| kb-balanced | BM25 + HDC escalation | 7 | 0.15 | 35% | Default, good tradeoff |
| kb-thinkingdb | BM25 + bounded symbolic expansion | 8 | 0.12 | 25% | Multi-hop or relation-sensitive retrieval |
Escalation (balanced): HDC runs only when BM25 returns fewer than minAcceptableCandidates.
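The escalation rule can be sketched as a thin wrapper around the two strategies. Everything about the interface is an assumption: `run_bm25` and `run_hdc` are hypothetical callables returning candidate lists, and the value of `min_acceptable_candidates` is not given in the source, so it is passed in by the caller.

```python
def retrieve_balanced(query, run_bm25, run_hdc, min_acceptable_candidates):
    """Run BM25 first; run HDC only when BM25 returns too few candidates.

    Returns (bm25_candidates, hdc_candidates, escalated).
    """
    bm25_candidates = run_bm25(query)
    if len(bm25_candidates) < min_acceptable_candidates:
        return bm25_candidates, run_hdc(query), True
    return bm25_candidates, [], False  # HDC is skipped entirely

# Stub strategies for illustration only.
sparse = retrieve_balanced("q", lambda q: ["u1"], lambda q: ["u2", "u3"], 3)
enough = retrieve_balanced("q", lambda q: ["u1"], lambda q: ["u2", "u3"], 1)
```

Skipping HDC when BM25 already returns enough candidates avoids the cost of encoding and comparing hypervectors on queries that lexical matching handles well.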
ThinkingDB: bounded symbolic closure over normalized facts as specified by DS025.