ThinkingDB: Symbolic Retrieval (status: NOT IMPLEMENTED)

A bounded symbolic reasoning layer that supplements BM25 and HDC/VSA with local logical closure over structured facts. Full spec: DS025.

What Problem Does It Solve?

BM25 finds units that share words with the query. HDC/VSA finds units with similar structure. Neither can compose facts. If the KB says "AchillesIDE uses Ploinky" and "Ploinky provides sandboxing", neither BM25 nor HDC/VSA can conclude that "AchillesIDE has sandboxing capability"; that conclusion requires a reasoning step.

ThinkingDB performs bounded symbolic composition: given seed facts near the query, it applies rules to derive new facts and traces which original KB units participated in the proof.

Key Design Decisions

Decision	Rationale
No new CNL dialect	Reuses DS005 Context CNL with optional `Subject/Relation/Object` fields
Rules in code/config, not CNL	Avoids a second public normalization contract in v1
Local closure only	Full KB saturation is too expensive; only the seed neighborhood is expanded
Positive Horn fragment	No negation and no disjunction, which keeps reasoning bounded and predictable
Proof-bearing ranking	Returns original KB units with proof traces, not synthetic derived units

Context CNL Extension

A Context Unit MAY carry structured symbolic fields alongside its normal pragmatic fields:

```text
## Context Unit src-001::chunk-000::unit-000
SourceId: src-001
ChunkId: src-001::chunk-000
Role: Explanation
Topic: AchillesIDE and Ploinky
Claim: AchillesIDE uses Ploinky.
Subject: AchillesIDE
Relation: uses
Object: Ploinky
Confidence: 1.00
UtilityActs: explain
```

`Subject`, `Relation`, and `Object` are all-or-none: a unit carries either all three or none of them. `Confidence` is optional (default 1.0). Units without these fields remain valid DS005 units.
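The all-or-none and default-confidence rules can be checked with a small parser. The sketch below is a hypothetical helper (`parseSymbolicFields` is not part of DS005 or DS007); it assumes one unit per string, a `## Context Unit <id>` header line, and `Key: value` field lines, and it returns the triple in the `{ s, r, o }` shape used by the rule model below.

```javascript
// Hypothetical helper, not an existing DS005/DS007 API: reads the "Key: value"
// lines of one Context Unit and returns its symbolic triple, if any.
function parseSymbolicFields(unitText) {
  const fields = {};
  let unitId = null;
  for (const line of unitText.split(/\r?\n/)) {
    const header = line.match(/^## Context Unit (\S+)/);
    if (header) {
      unitId = header[1];
      continue;
    }
    // Split on the first ": " only, so values may themselves contain colons.
    const idx = line.indexOf(": ");
    if (idx > 0) fields[line.slice(0, idx).trim()] = line.slice(idx + 2).trim();
  }

  const tripleKeys = ["Subject", "Relation", "Object"];
  const present = tripleKeys.filter((k) => k in fields);
  // No symbolic fields: a plain DS005 unit, simply invisible to ThinkingDB.
  if (present.length === 0) return null;
  // All-or-none: a partial triple is an error.
  if (present.length < tripleKeys.length) {
    throw new Error(`Unit ${unitId}: partial triple (${present.join(", ")})`);
  }

  // Confidence is optional and defaults to 1.0.
  const conf = "Confidence" in fields ? Number(fields.Confidence) : 1.0;
  if (Number.isNaN(conf)) throw new Error(`Unit ${unitId}: Confidence is not a number`);
  return { unitId, s: fields.Subject, r: fields.Relation, o: fields.Object, conf };
}

const exampleUnit = [
  "## Context Unit src-001::chunk-000::unit-000",
  "SourceId: src-001",
  "ChunkId: src-001::chunk-000",
  "Role: Explanation",
  "Topic: AchillesIDE and Ploinky",
  "Claim: AchillesIDE uses Ploinky.",
  "Subject: AchillesIDE",
  "Relation: uses",
  "Object: Ploinky",
  "Confidence: 1.00",
  "UtilityActs: explain",
].join("\n");

console.log(parseSymbolicFields(exampleUnit));
```

A unit with only `Subject` and `Relation` throws, while a unit with none of the three returns `null` and stays an ordinary DS005 unit.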

Rule Model

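```javascript
// Premises in `when` jointly imply the single conclusion in `then`;
// terms prefixed with "?" are variables.
const toolToCapability = {
  id: "tool_to_capability",
  when: [
    { s: "?x", r: "uses", o: "?y" },
    { s: "?y", r: "provides", o: "?z" }
  ],
  then: { s: "?x", r: "has_capability", o: "?z" },
  weight: 0.95,
  maxDepth: 3
};
```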

Rules are bounded positive Horn clauses: 1–3 premises, exactly one conclusion, and variables marked with a `?` prefix. Recursive rules are allowed within the depth budget (`maxDepth`).
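To make the rule semantics concrete, here is a minimal sketch of applying `tool_to_capability` to the two facts from the introduction. It uses a naive left-to-right join over the premises and ignores depth budgets. The helper names (`matchPattern`, `applyRule`), the `unit` field on facts, and the second unit id `src-001::chunk-000::unit-001` are hypothetical illustrations, not part of DS025.

```javascript
// Sketch only: facts are { s, r, o, unit }; the second unit id is hypothetical.
const toolToCapability = {
  id: "tool_to_capability",
  when: [
    { s: "?x", r: "uses", o: "?y" },
    { s: "?y", r: "provides", o: "?z" },
  ],
  then: { s: "?x", r: "has_capability", o: "?z" },
  weight: 0.95,
  maxDepth: 3,
};

// Terms starting with "?" are variables; everything else is a constant.
const isVar = (term) => term.startsWith("?");

// Match one premise against one fact, extending the current bindings.
// Returns the extended bindings, or null on a constant mismatch or a conflict.
function matchPattern(pattern, fact, bindings) {
  const out = { ...bindings };
  for (const pos of ["s", "r", "o"]) {
    const term = pattern[pos];
    if (isVar(term)) {
      if (term in out && out[term] !== fact[pos]) return null;
      out[term] = fact[pos];
    } else if (term !== fact[pos]) {
      return null;
    }
  }
  return out;
}

// Join all premises; each complete match yields one derived fact that
// remembers which premise facts (and hence which KB units) produced it.
function applyRule(rule, facts) {
  let partial = [{ bindings: {}, used: [] }];
  for (const premise of rule.when) {
    const next = [];
    for (const { bindings, used } of partial) {
      for (const fact of facts) {
        const extended = matchPattern(premise, fact, bindings);
        if (extended) next.push({ bindings: extended, used: [...used, fact] });
      }
    }
    partial = next;
  }
  // Assumes every conclusion variable also occurs in some premise.
  const subst = (term, bindings) => (isVar(term) ? bindings[term] : term);
  return partial.map(({ bindings, used }) => ({
    s: subst(rule.then.s, bindings),
    r: subst(rule.then.r, bindings),
    o: subst(rule.then.o, bindings),
    rule: rule.id,
    premises: used,
  }));
}

const facts = [
  { s: "AchillesIDE", r: "uses", o: "Ploinky", unit: "src-001::chunk-000::unit-000" },
  { s: "Ploinky", r: "provides", o: "sandboxing", unit: "src-001::chunk-000::unit-001" },
];
const derived = applyRule(toolToCapability, facts);
console.log(derived.map((d) => `${d.s} ${d.r} ${d.o}`));
```

The single derived fact, `AchillesIDE has_capability sandboxing`, keeps both premise facts in `premises`. This is what lets ranking return the original KB units rather than a synthetic unit.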

Built-in Relations

uses, provides, has_capability, depends_on, part_of, instance_of, relevant_for, supports, mentions, about

Query & Closure

1. Seed resolution: match query terms to symbolic subjects and objects.
2. Local neighborhood: collect facts within maxDepth hops of the seeds.
3. Bounded fixpoint: apply rules iteratively until no new facts are derived or the budget is exhausted.
4. Ranking: score each source unit by the best proof path it participates in.
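The first three stages (seed resolution, local neighborhood, bounded fixpoint) can be sketched as follows. This is one possible reading, not the DS025 algorithm. It assumes that a seed is any entity whose lowercased name equals a query token, that neighborhood edges are undirected, and that a derived fact's depth is one more than its deepest premise. The function names, the `maxIterations`/`maxFacts` budgets, and the second unit id are hypothetical.

```javascript
// Hypothetical sketch: base facts are { s, r, o, conf, unit }; derived facts
// carry proof: { rule, premises } instead of a unit id.
const isVar = (t) => t.startsWith("?");
const keyOf = (f) => `${f.s}|${f.r}|${f.o}`;

function matchPattern(p, f, b) {
  const out = { ...b };
  for (const pos of ["s", "r", "o"]) {
    if (isVar(p[pos])) {
      if (p[pos] in out && out[p[pos]] !== f[pos]) return null;
      out[p[pos]] = f[pos];
    } else if (p[pos] !== f[pos]) return null;
  }
  return out;
}

// Naive nested-loop join of the premises; each match records the facts it used.
function applyRule(rule, facts) {
  let partial = [{ b: {}, used: [] }];
  for (const prem of rule.when) {
    partial = partial.flatMap(({ b, used }) =>
      facts.map((f) => ({ b: matchPattern(prem, f, b), used: [...used, f] }))
        .filter((m) => m.b !== null));
  }
  const sub = (t, b) => (isVar(t) ? b[t] : t);
  return partial.map(({ b, used }) => ({
    s: sub(rule.then.s, b), r: sub(rule.then.r, b), o: sub(rule.then.o, b),
    proof: { rule, premises: used },
  }));
}

// Seed resolution: entities whose lowercased name equals a query token.
function resolveSeeds(query, facts) {
  const tokens = new Set(query.toLowerCase().split(/\W+/));
  const seeds = new Set();
  for (const f of facts) {
    for (const e of [f.s, f.o]) if (tokens.has(e.toLowerCase())) seeds.add(e);
  }
  return seeds;
}

// Local neighborhood: facts within maxHops hops of a seed.
function neighborhood(seeds, facts, maxHops) {
  const reached = new Set(seeds);
  const local = new Set();
  let frontier = new Set(seeds);
  for (let hop = 0; hop < maxHops && frontier.size > 0; hop++) {
    const next = new Set();
    for (const f of facts) {
      if (!frontier.has(f.s) && !frontier.has(f.o)) continue;
      local.add(f);
      for (const e of [f.s, f.o]) if (!reached.has(e)) { reached.add(e); next.add(e); }
    }
    frontier = next;
  }
  return [...local];
}

// Bounded fixpoint: stop when nothing new is derived or a budget runs out.
function boundedFixpoint(base, rules, { maxIterations = 10, maxFacts = 1000 } = {}) {
  const depth = new Map(base.map((f) => [keyOf(f), 0]));
  const all = [...base];
  for (let it = 0; it < maxIterations; it++) {
    let added = 0;
    for (const rule of rules) {
      for (const d of applyRule(rule, all)) {
        const dDepth = 1 + Math.max(...d.proof.premises.map((p) => depth.get(keyOf(p))));
        if (depth.has(keyOf(d)) || dDepth > rule.maxDepth || all.length >= maxFacts) continue;
        depth.set(keyOf(d), dDepth);
        all.push(d);
        added++;
      }
    }
    if (added === 0) break;
  }
  return all;
}

// Proof trace: the original KB units a fact ultimately rests on.
function sourceUnits(fact) {
  return fact.proof ? [...new Set(fact.proof.premises.flatMap(sourceUnits))] : [fact.unit];
}

const rules = [{ id: "tool_to_capability", weight: 0.95, maxDepth: 3,
  when: [{ s: "?x", r: "uses", o: "?y" }, { s: "?y", r: "provides", o: "?z" }],
  then: { s: "?x", r: "has_capability", o: "?z" } }];
const kb = [
  { s: "AchillesIDE", r: "uses", o: "Ploinky", conf: 1.0, unit: "src-001::chunk-000::unit-000" },
  { s: "Ploinky", r: "provides", o: "sandboxing", conf: 1.0, unit: "src-001::chunk-000::unit-001" },
];
const seeds = resolveSeeds("Does AchillesIDE have sandboxing?", kb);
const closed = boundedFixpoint(neighborhood(seeds, kb, 2), rules);
```

Only the neighborhood is closed, never the whole KB. With a single hop from `AchillesIDE`, the `Ploinky provides sandboxing` fact would not even be collected.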

```text
pathScore = ∏(fact.conf) × ∏(rule.weight) × distancePenalty(len) × goalBonus × seedBonus
distancePenalty(n) = 1 / (1 + n × 0.25)
```
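The scoring formula and the ranking step can be sketched as follows. `goalBonus` and `seedBonus` are not defined in this section, so the sketch takes them as caller-supplied multipliers that default to 1. It also assumes that `len` counts the base facts in a proof and that a missing `conf` means 1.0. All function names, the 0.9 confidence, and the second unit id are hypothetical. Under these assumptions, a proof built from facts with confidence 1.0 and 0.9 through one rule of weight 0.95 scores 0.9 × 0.95 × 1/1.5 = 0.57 before bonuses.

```javascript
// Hypothetical sketch: base facts are { s, r, o, conf, unit }; derived facts
// carry proof: { rule, premises } instead of a unit id.
const distancePenalty = (n) => 1 / (1 + n * 0.25); // as specified above

// Collect the base facts and the rule applications in one proof tree.
function flattenProof(fact, acc = { facts: [], rules: [] }) {
  if (!fact.proof) {
    acc.facts.push(fact);
    return acc;
  }
  acc.rules.push(fact.proof.rule);
  for (const premise of fact.proof.premises) flattenProof(premise, acc);
  return acc;
}

// Assumption: len = number of base facts; both bonuses default to 1 (no bonus).
function pathScore(fact, { goalBonus = 1, seedBonus = 1 } = {}) {
  const { facts, rules } = flattenProof(fact);
  const confProduct = facts.reduce((acc, f) => acc * (f.conf ?? 1.0), 1);
  const weightProduct = rules.reduce((acc, r) => acc * r.weight, 1);
  return confProduct * weightProduct * distancePenalty(facts.length) * goalBonus * seedBonus;
}

// Each source unit keeps the best score among all proofs it takes part in.
function rankUnits(proofs) {
  const best = new Map();
  for (const { fact, bonuses } of proofs) {
    const score = pathScore(fact, bonuses);
    for (const base of flattenProof(fact).facts) {
      if ((best.get(base.unit) ?? -Infinity) < score) best.set(base.unit, score);
    }
  }
  return [...best.entries()].sort((a, b) => b[1] - a[1]);
}

const rule = { id: "tool_to_capability", weight: 0.95 };
const f1 = { s: "AchillesIDE", r: "uses", o: "Ploinky", conf: 1.0, unit: "src-001::chunk-000::unit-000" };
const f2 = { s: "Ploinky", r: "provides", o: "sandboxing", conf: 0.9, unit: "src-001::chunk-000::unit-001" };
const derived = {
  s: "AchillesIDE", r: "has_capability", o: "sandboxing",
  proof: { rule, premises: [f1, f2] },
};

console.log(pathScore(derived)); // ≈ 0.57
console.log(rankUnits([{ fact: derived, bonuses: { goalBonus: 1.2 } }]));
```

Because a unit keeps only its best proof, a unit that appears in many weak proofs does not outrank one that appears in a single strong proof.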

Proposed Retrieval Profile

```json
{
  "thinkingdb": {
    "primaryStrategies": ["bm25-lexical"],
    "secondaryStrategies": ["thinkingdb-symbolic"],
    "maxResults": 8,
    "minScore": 0.12,
    "confidenceGapThreshold": 0.25,
    "targetLatencyMs": 700
  }
}
```
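One plausible way to apply `maxResults` and `minScore` when combining the two strategies is sketched below. It is an assumption, not the specified behavior of the DS023 wrapper. It assumes that each strategy returns hits sorted by descending score and normalized to [0, 1], and that primary hits are placed first. `confidenceGapThreshold` and `targetLatencyMs` are not modeled. `mergeResults` and the `u1`…`u3` unit ids are hypothetical.

```javascript
// Hypothetical merge step for the "thinkingdb" profile (not the DS023 API).
// confidenceGapThreshold and targetLatencyMs are carried but not used here.
const profile = {
  primaryStrategies: ["bm25-lexical"],
  secondaryStrategies: ["thinkingdb-symbolic"],
  maxResults: 8,
  minScore: 0.12,
  confidenceGapThreshold: 0.25,
  targetLatencyMs: 700,
};

function mergeResults(primaryHits, secondaryHits, { maxResults, minScore }) {
  const seen = new Set();
  const merged = [];
  // Primary (BM25) hits go first, so BM25 stays the precision anchor;
  // symbolic hits only fill the remaining slots.
  for (const hit of [...primaryHits, ...secondaryHits]) {
    if (merged.length >= maxResults) break;
    if (hit.score < minScore || seen.has(hit.unit)) continue;
    seen.add(hit.unit);
    merged.push(hit);
  }
  return merged;
}

const bm25Hits = [{ unit: "u1", score: 0.9 }, { unit: "u2", score: 0.1 }];
const symbolicHits = [{ unit: "u3", score: 0.57 }, { unit: "u1", score: 0.4 }];
console.log(mergeResults(bm25Hits, symbolicHits, profile).map((h) => h.unit));
```

Here `u2` is dropped for falling below `minScore`, and the symbolic copy of `u1` is dropped as a duplicate. The result is `u1` followed by `u3`, which is the multi-hop unit that BM25 alone missed.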

BM25 remains the precision anchor. ThinkingDB runs behind kb-thinkingdb as a secondary symbolic expansion for multi-hop retrieval.

What's Needed to Implement

Add Subject/Relation/Object/Confidence to DS005 allowed fields and DS007 validator
Extend LLM normalization prompts to extract symbolic triples when present
Implement ThinkingDB class (synchronous, no deps, ~200 lines)
Implement thinkingdb-symbolic DS023 strategy wrapper
Add thinkingdb profile to config/retrieval-strategies.json
Create evaluation suite with multi-hop reasoning scenarios

Explicit Non-Goals (v1)

No public rule CNL or query CNL
No negation or contradiction detection
No confidence calibration by LLM
No persistence of derived facts as KB truth
No alias learning from free text
