ThinkingDB: Symbolic Retrieval (status: NOT IMPLEMENTED)

A bounded symbolic reasoning layer that supplements BM25 and HDC/VSA with local logical closure over structured facts. Full spec: DS025.

What Problem Does It Solve?

BM25 finds units that share words with the query. HDC/VSA finds units with similar structure. Neither can compose facts. If the KB says "AchillesIDE uses Ploinky" and "Ploinky provides sandboxing", neither BM25 nor HDC can conclude that "AchillesIDE has sandboxing capability" — that requires a reasoning step.

ThinkingDB performs bounded symbolic composition: given seed facts near the query, it applies rules to derive new facts and traces which original KB units participated in the proof.

  1. Seed Facts: "A uses B", "B provides C"
  2. Rule Application: if ?x uses ?y AND ?y provides ?z then ?x has_capability ?z
  3. Derived Fact: A has_capability C (conf: 0.90, proof: 2 steps)
  4. Ranked Source Units: original KB units that participated in the proof

Key Design Decisions

Decision | Rationale
No new CNL dialect | Reuses DS005 Context CNL with optional Subject/Relation/Object fields
Rules in code/config, not CNL | Avoids a second public normalization contract in v1
Local closure only | Full KB saturation is too expensive; only the seed neighborhood is expanded
Positive Horn fragment | No negation, no disjunction; keeps reasoning bounded and predictable
Proof-bearing ranking | Returns original KB units with proof traces, not synthetic derived units

Context CNL Extension

A Context Unit MAY carry structured symbolic fields alongside its normal pragmatic fields:

## Context Unit src-001::chunk-000::unit-000
SourceId: src-001
ChunkId: src-001::chunk-000
Role: Explanation
Topic: AchillesIDE and Ploinky
Claim: AchillesIDE uses Ploinky.
Subject: AchillesIDE
Relation: uses
Object: Ploinky
Confidence: 1.00
UtilityActs: explain

Subject, Relation, and Object are all-or-none: a unit carries all three or none of them. Confidence is optional (default 1.0). Units without these fields remain valid DS005 units.
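The all-or-none rule can be illustrated with a small check. This is only a sketch, not the DS007 validator: the helper name checkSymbolicFields, the plain-object shape of a parsed unit, the returned { ok, triple, error } record, and the error strings are all assumptions made for illustration.

```javascript
// Hypothetical helper: checks the symbolic fields of a parsed Context Unit.
// Assumption: the unit is a plain object keyed by CNL field name.
const TRIPLE_FIELDS = ["Subject", "Relation", "Object"];

function checkSymbolicFields(unit) {
  const present = TRIPLE_FIELDS.filter(
    (name) => typeof unit[name] === "string" && unit[name].trim() !== ""
  );

  // No symbolic fields: still a valid plain DS005 unit, nothing to extract.
  if (present.length === 0) {
    return { ok: true, triple: null };
  }

  // A partial triple violates the all-or-none rule.
  if (present.length !== TRIPLE_FIELDS.length) {
    const missing = TRIPLE_FIELDS.filter((name) => !present.includes(name));
    return { ok: false, error: `missing symbolic fields: ${missing.join(", ")}` };
  }

  // Confidence is optional and defaults to 1.0.
  const conf = unit.Confidence === undefined ? 1.0 : Number(unit.Confidence);
  if (!Number.isFinite(conf)) {
    return { ok: false, error: "Confidence is not a number" };
  }

  return {
    ok: true,
    triple: {
      s: unit.Subject.trim(),
      r: unit.Relation.trim(),
      o: unit.Object.trim(),
      conf,
    },
  };
}

const result = checkSymbolicFields({
  Claim: "AchillesIDE uses Ploinky.",
  Subject: "AchillesIDE",
  Relation: "uses",
  Object: "Ploinky",
});
console.log(result.triple);
```

Returning a null triple for units without symbolic fields keeps ordinary DS005 units valid while letting a caller skip them during fact extraction.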

Rule Model

{
  id: "tool_to_capability",
  when: [
    { s: "?x", r: "uses", o: "?y" },
    { s: "?y", r: "provides", o: "?z" }
  ],
  then: { s: "?x", r: "has_capability", o: "?z" },
  weight: 0.95,
  maxDepth: 3
}

Rules are bounded positive Horn clauses: 1–3 premises, one conclusion, and variables marked with a ? prefix. Recursive rules are allowed under the depth budget.
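To make the rule shape concrete, here is a minimal sketch of how one rule might be matched against a set of facts. It is not the planned ThinkingDB class: the function names (bindTerm, matchPremise, applyRule), the fact shape { s, r, o, unitId }, the proof record, and the second unit id are assumptions for illustration. The sketch also assumes that the relation in a premise is always a constant, as in the rule above, and it performs a single application pass without any depth budget or confidence handling.

```javascript
// Sketch only: fact shape and all function names are assumptions.
const rule = {
  id: "tool_to_capability",
  when: [
    { s: "?x", r: "uses", o: "?y" },
    { s: "?y", r: "provides", o: "?z" },
  ],
  then: { s: "?x", r: "has_capability", o: "?z" },
  weight: 0.95,
  maxDepth: 3,
};

const isVar = (term) => term.startsWith("?");

// Match one pattern term against a concrete value under existing bindings.
// Returns the (possibly extended) bindings, or null when the term cannot match.
function bindTerm(term, value, bindings) {
  if (!isVar(term)) return term === value ? bindings : null;
  if (term in bindings) return bindings[term] === value ? bindings : null;
  return { ...bindings, [term]: value };
}

// Match a whole premise pattern against one fact.
function matchPremise(pattern, fact, bindings) {
  if (pattern.r !== fact.r) return null;
  const afterSubject = bindTerm(pattern.s, fact.s, bindings);
  return afterSubject && bindTerm(pattern.o, fact.o, afterSubject);
}

// Substitute bound variables into a conclusion term.
const resolve = (term, bindings) => (isVar(term) ? bindings[term] : term);

// Apply one rule once: join the premises left to right and emit conclusions,
// each carrying the ids of the KB units that served as premises.
function applyRule(rule, facts) {
  let partial = [{ bindings: {}, premises: [] }];
  for (const pattern of rule.when) {
    const next = [];
    for (const state of partial) {
      for (const fact of facts) {
        const bindings = matchPremise(pattern, fact, state.bindings);
        if (bindings) {
          next.push({ bindings, premises: [...state.premises, fact.unitId] });
        }
      }
    }
    partial = next;
  }
  return partial.map(({ bindings, premises }) => ({
    s: resolve(rule.then.s, bindings),
    r: rule.then.r,
    o: resolve(rule.then.o, bindings),
    proof: { ruleId: rule.id, premises },
  }));
}

const facts = [
  { s: "AchillesIDE", r: "uses", o: "Ploinky", unitId: "src-001::chunk-000::unit-000" },
  // Hypothetical second unit, invented for this example.
  { s: "Ploinky", r: "provides", o: "sandboxing", unitId: "src-002::chunk-000::unit-000" },
];
const derived = applyRule(rule, facts);
console.log(derived);
```

Carrying the premise unit ids along with each conclusion is what would let a later ranking step return original KB units rather than synthetic derived ones.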

Built-in Relations

uses, provides, has_capability, depends_on, part_of, instance_of, relevant_for, supports, mentions, about

Query & Closure

  1. Seed resolution — match query terms to symbolic subjects/objects
  2. Local neighborhood — collect facts within maxDepth hops of seeds
  3. Bounded fixpoint — apply rules iteratively until no new facts or budget exhausted
  4. Ranking — score each source unit by the best proof path it participates in
pathScore = ∏(fact.conf) × ∏(rule.weight) × distancePenalty(len) × goalBonus × seedBonus
distancePenalty(n) = 1 / (1 + n × 0.25)
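The scoring formulas above can be written out directly. The following is a sketch under stated assumptions: len is taken to be the number of steps in the proof, goalBonus and seedBonus default to a neutral 1 because their values are not given here, and the rankUnits helper with its { unitIds, score } path records is a hypothetical shape chosen for illustration.

```javascript
// distancePenalty(n) = 1 / (1 + n × 0.25), as in the formula above.
function distancePenalty(n) {
  return 1 / (1 + n * 0.25);
}

// pathScore multiplies fact confidences, rule weights, the distance penalty,
// and the two bonuses. Assumption: both bonuses default to 1 (neutral).
function pathScore({ factConfs, ruleWeights, len, goalBonus = 1, seedBonus = 1 }) {
  const product = (values) => values.reduce((acc, v) => acc * v, 1);
  return (
    product(factConfs) *
    product(ruleWeights) *
    distancePenalty(len) *
    goalBonus *
    seedBonus
  );
}

// Rank source units by the best proof path each one participates in.
// `paths` is a hypothetical list of { unitIds, score } records.
function rankUnits(paths) {
  const best = new Map();
  for (const { unitIds, score } of paths) {
    for (const id of unitIds) {
      best.set(id, Math.max(best.get(id) ?? 0, score));
    }
  }
  // Highest score first.
  return [...best.entries()].sort((a, b) => b[1] - a[1]);
}

// Two facts with confidence 1.0, one rule with weight 0.95, proof length 2.
const score = pathScore({ factConfs: [1.0, 1.0], ruleWeights: [0.95], len: 2 });
console.log(score.toFixed(3)); // 0.633
```

With these inputs the distance penalty is 1 / 1.5, so a two-step proof over fully confident facts already loses about a third of its score, which favors short proofs in the ranking.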

Proposed Retrieval Profile

{
  "thinkingdb": {
    "primaryStrategies": ["bm25-lexical"],
    "secondaryStrategies": ["thinkingdb-symbolic"],
    "maxResults": 8,
    "minScore": 0.12,
    "confidenceGapThreshold": 0.25,
    "targetLatencyMs": 700
  }
}

BM25 remains the precision anchor. ThinkingDB runs as secondary symbolic expansion behind kb-thinkingdb for multi-hop retrieval.

What's Needed to Implement

  1. Add Subject/Relation/Object/Confidence to DS005 allowed fields and DS007 validator
  2. Extend LLM normalization prompts to extract symbolic triples when present
  3. Implement ThinkingDB class (synchronous, no deps, ~200 lines)
  4. Implement thinkingdb-symbolic DS023 strategy wrapper
  5. Add thinkingdb profile to config/retrieval-strategies.json
  6. Create evaluation suite with multi-hop reasoning scenarios

Explicit Non-Goals (v1)