Working doc · v0.1 · Last revised: 2026.05

Search·Agent Harness

A five-layer pipeline that turns a stated goal into a verifiable artifact, by routing across twenty retrieval sources, four co-installed tools, and one fallback web-search lane — with a critic loop and an eval harness scoring every run. Outcome-driven, not signal-driven.

20 adapters · Phase 1 sources
4 co-install · CLI & MCP
5 layers · L0 → L4
15× · Target leverage
§01 Pipeline

Goal in. Artifact out.

Seven stages along the spine, two cross-cutting concerns on the side, one feedback loop closing back to the router. The harness owns everything between the user's stated goal and the merge-ready artifact — retrieval is just the middle four boxes.

Pipeline diagram, in order:

  • User goal: natural language · issue · repo · log
  • Task Compiler: → JobSpec · stack · freshness · risk budget
  • Source Router: JobSpec → SourcePlan
      DAG planner: two-stage · top-K filter
  • Adapter Mesh: 20 sources → unified Observation schema
      rate · cache · contract: per-source bucket · 3-tier
  • Evidence Graph: claim ↔ source ↔ citation
      UnionFind + NLI: entity layer + claim grounding
  • Agent Runtime: planner · scout · verifier · critic · builder (role-based, with critic loop & revision)
  • Artifact Factory: ADR · patch · PR · brief · matrix
  • Eval Harness: citation coverage · build & test · license · anti-hallucination · human edit-distance
  • Feedback edge: below threshold → revise plan

Layer labels: L0 · L1.0 · L1.1 · L2 · L3 · L4.0 · L4.1 · EVAL
Legend: retrieval spine (router → adapters) · outcome layer (evidence → artifact → eval) · revision loop (eval → router)
§02 Adapters

Twenty sources, four lanes, one Hit schema.

Each adapter exports search, get, and (when supported) batch, normalising every result into a unified Hit record while passing source-unique signals through verbatim. No adapter normalises authority — that is L3's job.
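A minimal sketch of that adapter contract, assuming a Hit carries a few common fields plus a free-form signals bag. The field names, the Adapter protocol, and the sample payload shape are illustrative assumptions, not the harness's actual schema.

```python
from dataclasses import dataclass, field
from typing import Any, Protocol


@dataclass(frozen=True)
class Hit:
    """Unified record. Field names are hypothetical, chosen for illustration."""
    source: str                                            # adapter name, e.g. "npm"
    url: str
    title: str
    signals: dict[str, Any] = field(default_factory=dict)  # source-unique, verbatim


class Adapter(Protocol):
    """Shape every adapter is assumed to expose; batch is optional and omitted here."""
    name: str

    def search(self, query: str) -> list[Hit]: ...
    def get(self, ref: str) -> Hit: ...


def normalise_npm(raw: dict[str, Any]) -> Hit:
    """Map one search result (assumed payload shape) to a Hit without scoring authority."""
    pkg = raw["package"]
    return Hit(
        source="npm",
        url=pkg["links"]["npm"],
        title=pkg["name"],
        # Everything outside the package block passes through untouched.
        signals={k: v for k, v in raw.items() if k != "package"},
    )


sample = {
    "package": {"name": "left-pad", "links": {"npm": "https://www.npmjs.com/package/left-pad"}},
    "downloads": {"weekly": 1234},
    "dependents": 42,
}
hit = normalise_npm(sample)
print(hit.title, hit.signals)
```

The adapter copies signals verbatim and never ranks them, which keeps authority decisions in L3.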

Lane A

Code & Semantic Search

outputs repo signals · code patterns · embeddings
01

GitHub

token
api.github.com/search/{repos,code,issues}
+ /graphql for discussions/gists
stars · forks · license · topic · code search · pushed_at
02

GitLab

private-token
{gitlab.com,gnome,fdo}/api/v4/...
multi-instance; license needs an N+1 lookup
stars · license · pushed_at · cross-instance coverage
19

Sourcegraph

anon · sse
sourcegraph.com/.api/search/stream
Accept: text/event-stream
trigram across 2M+ OSS · facets · symbol type · commit SHA
20

Exa

x-api-key
api.exa.ai/search · /contents
type: auto · fast · deep · deep-reasoning
neural retrieval · outputSchema grounding · livecrawl freshness · vertical search
Lane B

Packages & Security

outputs downloads · license · CVE · SLSA · advisory keys
09

npm registry

anon
registry.npmjs.org/-/v1/search?text=
downloads.weekly · dependents (inline)
10

crates.io

anon · ua
crates.io/api/v1/crates?q=&sort=downloads
recent_downloads (90d rolling) · sort=downloads
11

libraries.io

key for /search
libraries.io/api/{platform}/{name}
+ /api/search?api_key=...
SourceRank · dependent_repos_count
16

deps.dev

anon
api.deps.dev/v3/systems/{npm,pypi,maven,go,cargo,nuget,rubygems}/...
7 ecosystems · SPDX licenses · advisoryKeys → OSV · slsaProvenances · attestations
15

OSV.dev

anon
POST api.osv.dev/v1/{query,querybatch}
GET /v1/vulns/{id}
package+version → vulns · CVSS 3.1 · affected[] version tree
13

CVE / NVD

anon · key+
services.nvd.nist.gov/rest/json/cves/2.0
CVSS · CWE · CPE version tree · references[]
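For the OSV.dev card above, a small sketch of how several package+version pairs might be folded into one querybatch request body. The helper name and the version numbers are hypothetical; the body shape (a queries array of package and version objects) follows the public OSV API as I understand it, so verify against the current docs before relying on it.

```python
import json


def osv_querybatch_body(packages: list[tuple[str, str, str]]) -> str:
    """Build one querybatch body from (ecosystem, name, version) triples."""
    queries = [
        {"package": {"ecosystem": eco, "name": name}, "version": version}
        for eco, name, version in packages
    ]
    return json.dumps({"queries": queries})


# Illustrative inputs only; the versions are placeholders.
body = osv_querybatch_body([("PyPI", "polars", "0.20.0"), ("npm", "left-pad", "1.3.0")])
print(body)
```

One POST then covers every survivor package, which is why the execution plan prefers querybatch over per-package query calls.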
Lane C

Research

outputs papers · citations · peer review · datasets · models
06

arXiv

anon
export.arxiv.org/api/query
Atom XML
abstract · category · time window · phrase needs %22quote%22
07

OpenAlex

anon · polite
api.openalex.org/works?search=&filter=...
fwci · top_1_percent · concepts · counts_by_year · is_retracted · referenced_works
08

OpenReview

anon
api2.openreview.net/notes/search → /notes?forum=<id>
public review · decision · rebuttal · two-stage fetch
12

Hugging Face Hub

anon
huggingface.co/api/{models,datasets,spaces}
+ /api/daily_papers
downloads · likes · pipeline_tag · daily papers
Lane D

Community & Social

outputs verdicts · gotchas · hype · controversies · accepted answers
03

Stack Exchange

anon
api.stackexchange.com/2.3/search/advanced
?site={so,dba,sf,codereview,ai}
vote · is_accepted · score · multi-site
04

HN Algolia

anon
hn.algolia.com/api/v1/search
?numericFilters=created_at_i>...,points>...
created_at_i × points window · vote filter
05

Reddit

anon · oauth+
reddit.com/r/{sub}/search.json
?t=year&restrict_sr=1
subreddit · upvote_ratio · score · 60–100 req/min when oauth'd
17

Lobste.rs

anon · ua
lobste.rs/{hottest,newest}.json
+ /t/<tag>.json
tag · score · comment_count · submitter_user
14

twitterapi.io

x-api-key
api.twitterapi.io/twitter/{advanced_search,user/info,user/last_tweets,replies}
viewCount · isBlueVerified · likeCount · retweetCount · pinnedTweetIds[0] · replies[]
free-tier QPS = 1 req / 5s
18

Bluesky

bearer jwt
POST bsky.social/xrpc/com.atproto.server.createSession
→ app.bsky.feed.{searchPosts,getAuthorFeed}
likeCount · repostCount · replyCount · did · authed search has no rate limit
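Several cards above encode filters in the query string. As one hedged example, the HN Algolia window (created_at_i × points) could be assembled as below; the function name and the tags=story restriction are assumptions added for illustration.

```python
from urllib.parse import urlencode


def hn_search_url(query: str, since_epoch: int, min_points: int) -> str:
    """Build a search URL restricted to a time window and a vote floor."""
    filters = f"created_at_i>{since_epoch},points>{min_points}"
    params = {"query": query, "tags": "story", "numericFilters": filters}
    # urlencode percent-escapes '>' and ',' so the filter survives transport.
    return "https://hn.algolia.com/api/v1/search?" + urlencode(params)


url = hn_search_url("polars", 1700000000, 50)
print(url)
```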
§03 Co-install

The four local hands.

Where an HTTP adapter is not the right shape — rendering JS, walking docs, downloading captions, asking a wiki — the harness shells out to a local CLI, or to a single MCP when no CLI exists. Web search via the host LLM is reserved as graceful degradation.

CLI · global

Context7

Authoritative docs grounding for mainstream libraries. Two-step: resolve a libId, then fetch a specific topic. Replaces "search the docs site" for anything Context7 covers.

npm install -g ctx7 && npx ctx7 setup

ctx7 library <name> <q> · ctx7 docs <libId> <q>
CLI · global

Playwright

Browser automation for JS-rendered pages, login walls, and any GUI surface that refuses to be curled. The escape hatch when sitemap.xml + curl returns shells of pages.

npm install -g playwright && npx playwright install chromium
MCP · http

DeepWiki

AI-generated wiki + Q&A across 50k+ OSS repos. Stands in for "read the entire repo to answer one question". Used when GitHub/Sourcegraph give the code but not the why.

claude mcp add --transport http deepwiki https://mcp.deepwiki.com/mcp
CLI · global

yt-dlp

Search 1000+ video sites, dump JSON metadata, fetch auto-generated subtitles. The fastest path to "what was actually said in that talk" without watching it.

brew install yt-dlp

yt-dlp "ytsearch5:..." · --write-auto-sub --sub-langs en
Last resort · host LLM

web_search

When all 20 adapters and 4 co-install tools come back empty, fall through to whatever web_search the host LLM ships with. Triggering this should fire a metric, not a feature flag.

Claude · Gemini · GPT — all carry one.
Not part of the dispatch table by design.
Official-docs strategy: Mainstream libraries → ctx7 docs. Specialised sites Context7 does not cover (RFC, IETF, lore.kernel.org, Bugzilla, vendor release notes) → direct curl on HTML & sitemap.xml, with host web_search "site:..." as the deepest fallback.
§04 Execution

Cheap breadth, then expensive depth.

L1 compiles the SourcePlan into a two-stage DAG. Stage 1 fans out cheap searches in parallel; a top-K filter passes only survivors to Stage 2 where N+1 enrichments happen. Per-source rate buckets and a three-tier cache do the bookkeeping so adapters stay simple.

Two-stage DAG

STAGE 1 · BREADTH · PARALLEL
  • npm.search: q=polars
  • github.repos: stars sort
  • crates.search: downloads sort
  • libraries.io: SourceRank
  • HN algolia: points > 50
  • SE accepted: is_accepted
  • lobste.rs/t: tag · score
  • sourcegraph: trigram · facet
cached · rate-bucketed · all parallel

Top-K filter (K=5): losers never enter Stage 2

STAGE 2 · DEPTH · ON SURVIVORS
  • depsdev.versions: slsa · license
  • osv.querybatch: 5 pkgs · 1 call
  • github.repo: pushed_at
  • deepwiki MCP: repo Q&A
  • openreview.notes: forum=<id>
  • twitterapi.replies: thread expand
batch endpoints preferred · fan-out width = K, not |hits|
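The two-stage shape can be sketched with asyncio. This is a toy under stated assumptions: hits are plain dicts, ranking uses a hypothetical "score" key (the doc does not say how the top-K filter ranks), and the fake searches stand in for real adapters.

```python
import asyncio
from typing import Awaitable, Callable

Hit = dict  # stand-in for the unified record
Search = Callable[[str], Awaitable[list[Hit]]]
Enrich = Callable[[Hit], Awaitable[Hit]]


async def run_two_stage(query: str, searches: list[Search], enrich: Enrich, k: int = 5) -> list[Hit]:
    # Stage 1: fan out every cheap search in parallel.
    batches = await asyncio.gather(*(search(query) for search in searches))
    hits = [hit for batch in batches for hit in batch]
    # Top-K filter: only survivors reach the expensive stage.
    survivors = sorted(hits, key=lambda h: h["score"], reverse=True)[:k]
    # Stage 2: fan-out width is k, not len(hits).
    return list(await asyncio.gather(*(enrich(hit) for hit in survivors)))


async def fake_search_a(q: str) -> list[Hit]:
    return [{"name": f"{q}-a{i}", "score": float(i)} for i in range(4)]


async def fake_search_b(q: str) -> list[Hit]:
    return [{"name": f"{q}-b{i}", "score": i + 0.5} for i in range(4)]


enriched_calls: list[str] = []


async def fake_enrich(hit: Hit) -> Hit:
    enriched_calls.append(hit["name"])  # records how many N+1 calls were made
    return {**hit, "enriched": True}


result = asyncio.run(run_two_stage("polars", [fake_search_a, fake_search_b], fake_enrich, k=3))
print([hit["name"] for hit in result])
```

Eight hits come back from Stage 1, but only three enrichment calls are made, which is the cost property the plan is after.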

Per-source rate governor

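| source | rps | concurrency | note |
|---|---|---|---|
| twitterapi.io | 0.2 | 1 | free tier: 1 req / 5s |
| sourcegraph | 2 | 4 | SSE stream |
| github | 5 | 8 | retry 403/429 |
| exa | 5 | 8 | cost: $0.005/call |
| openalex | 10 | 8 | polite pool · UA mailto: |
| hn algolia | 10 | 16 | no auth |
| osv | 20 | 16 | querybatch first |
| npm registry | 10 | 16 | cache 1h warm |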
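One way a per-source governor could work is a token bucket that hands out send slots. The sketch below is an assumption about the mechanism, not the harness's implementation; the class and method names are hypothetical, and the clock is injected so the behaviour is deterministic. A concurrency cap would sit alongside it, for example an asyncio.Semaphore per source.

```python
class TokenBucket:
    """Refills at `rps` tokens per second and holds at most `burst` tokens."""

    def __init__(self, rps: float, burst: float = 1.0) -> None:
        self.rps = rps
        self.burst = burst
        self.tokens = burst
        self.updated = 0.0

    def reserve(self, now: float) -> float:
        """Reserve the next send slot; return how long the caller must wait."""
        # Credit tokens earned since the last reservation, capped at burst.
        self.tokens = min(self.burst, self.tokens + (now - self.updated) * self.rps)
        self.updated = now
        # Going negative queues the caller behind earlier reservations.
        self.tokens -= 1.0
        return 0.0 if self.tokens >= 0 else -self.tokens / self.rps


# twitterapi.io free tier: 1 request per 5 seconds, i.e. 0.2 rps.
bucket = TokenBucket(rps=0.2)
waits = [bucket.reserve(now=0.0) for _ in range(3)]
print(waits)
```

Three calls arriving at the same instant get slots roughly five seconds apart, so the adapter itself never has to know about the limit.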

Cache · three tiers

tier · hot
in-mem · per DAG
TTL = ∞ inside one query. Same (source, params) never hits twice.
tier · warm
disk · per day
CVE 24h · papers 7d · twitter 5min · npm 1h · github 6h · exa 1h
tier · cold
archive · replay
Hit payloads stored. Re-run with new weights without re-hitting APIs.
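A rough sketch of the hot and warm tiers, assuming an in-memory dict stands in for the disk store and the cold archive is left out. The TTL values come from the warm-tier list above; the class, method names, and the injected clock are hypothetical.

```python
import json
from typing import Any, Callable

# Warm-tier TTLs in seconds: CVE 24h, papers 7d, twitter 5min, npm 1h, github 6h, exa 1h.
WARM_TTL = {"cve": 86400, "papers": 604800, "twitter": 300, "npm": 3600, "github": 21600, "exa": 3600}


def cache_key(source: str, params: dict[str, Any]) -> str:
    # Sorted keys make the same (source, params) pair hash identically.
    return source + ":" + json.dumps(params, sort_keys=True)


class TieredCache:
    def __init__(self) -> None:
        self.hot: dict[str, Any] = {}                 # lives for one query
        self.warm: dict[str, tuple[float, Any]] = {}  # stand-in for the on-disk tier

    def fetch(self, source: str, params: dict[str, Any], call: Callable[[], Any], now: float) -> Any:
        key = cache_key(source, params)
        if key in self.hot:
            return self.hot[key]
        entry = self.warm.get(key)
        if entry is not None and now - entry[0] < WARM_TTL[source]:
            value = entry[1]
        else:
            value = call()  # the only place the upstream API is hit
            self.warm[key] = (now, value)
        self.hot[key] = value
        return value

    def end_query(self) -> None:
        self.hot.clear()


calls: list[int] = []


def slow_call() -> dict[str, str]:
    calls.append(1)
    return {"name": "polars"}


cache = TieredCache()
cache.fetch("npm", {"text": "polars"}, slow_call, now=0.0)     # miss: hits the API
cache.fetch("npm", {"text": "polars"}, slow_call, now=0.0)     # hot hit
cache.end_query()
cache.fetch("npm", {"text": "polars"}, slow_call, now=1800.0)  # warm hit, 30 min old
cache.end_query()
cache.fetch("npm", {"text": "polars"}, slow_call, now=4000.0)  # warm entry expired
print(len(calls))
```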
§05 Evidence & Eval

Verify on the way in, verify on the way out.

The Evidence Graph wires every claim to the source that supports it — cite or it didn't happen. The Eval Harness checks the harness itself: did the artifact build, did its license clear, did a human have to rewrite it? No eval, no leverage.

L3 · Evidence Graph

claim ↔ source ↔ citation

Three node types, two edge types. Entity layer (UnionFind on canonical ids) collapses duplicate sources of the same thing. Claim layer (NLI / textual entailment) anchors every assertion the agent emits back to one or more original sources.

Example graph:
claims: A "polars has 0 open CVEs" · B "MIT license"
sources: S1 osv.querybatch · S2 depsdev.versions · S3 github.licenses
entity: polars

  • 01 Every claim must edge to ≥1 source; orphan claims hard-fail validation.
  • 02 UnionFind keys: (host,owner,name) for repos, (eco,name,ver) for packages, arXiv↔OpenAlex↔OpenReview for papers.
  • 03 Conflicting sources surface as parallel edges, not silently averaged.
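The entity layer and the orphan rule can be sketched in a few lines. This is a minimal illustration: the paper identifiers are made up, the claim-to-source edges are assumed (the diagram does not survive in enough detail to recover them), and "claim C" is a hypothetical orphan added to show the failure case.

```python
from typing import Hashable


class UnionFind:
    """Merges canonical ids that refer to the same underlying entity."""

    def __init__(self) -> None:
        self.parent: dict[Hashable, Hashable] = {}

    def find(self, x: Hashable) -> Hashable:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: Hashable, b: Hashable) -> None:
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def orphan_claims(edges: dict[str, list[str]]) -> list[str]:
    """Claims with no supporting source; these should fail validation."""
    return [claim for claim, sources in edges.items() if not sources]


uf = UnionFind()
# Hypothetical ids for one paper seen through three sources.
arxiv, openalex, openreview = ("arxiv", "0000.00000"), ("openalex", "W0"), ("openreview", "abc")
uf.union(arxiv, openalex)
uf.union(openalex, openreview)
same_paper = uf.find(arxiv) == uf.find(openreview)

# Assumed edges: A is backed by S1, B by S2 and S3, C by nothing.
edges = {"claim A": ["S1"], "claim B": ["S2", "S3"], "claim C": []}
orphans = orphan_claims(edges)
print(same_paper, orphans)
```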
After-artifact · Eval Harness

The harness tests itself.

Each query emits a trace. Each trace gets scored on five axes. Aggregate scores feed back into router weights and intent definitions — the only place LLM judgment is allowed at write time, and it's done offline.

  • 01 Citation coverage: % of claims with ≥1 supporting source from L3.
  • 02 Build & test: generated patches must pass go build / npm test.
  • 03 License compliance: every recommended package SPDX-resolves to an allowed list.
  • 04 Anti-hallucination: a random sample of claims is re-checked against original sources by NLI.
  • 05 Human edit-distance: bytes the user changed before merging the artifact.
If edit-distance does not trend toward zero, the harness is delivering 3×, not 15×. The metric is a forcing function — not a vanity number.
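Two of the five axes are simple enough to sketch. The functions below are assumptions about how the scores might be computed: citation coverage as a plain ratio, and edit-distance approximated by counting bytes inside non-matching difflib spans, which is a proxy rather than a true Levenshtein distance.

```python
import difflib


def citation_coverage(claim_sources: dict[str, list[str]]) -> float:
    """Fraction of claims with at least one supporting source."""
    if not claim_sources:
        return 1.0  # assumption: an artifact with no claims is vacuously covered
    cited = sum(1 for sources in claim_sources.values() if sources)
    return cited / len(claim_sources)


def changed_bytes(generated: str, merged: str) -> int:
    """Approximate bytes the user changed between generated and merged text."""
    a, b = generated.encode(), merged.encode()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    changed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Count the longer side of each replaced, inserted, or deleted span.
            changed += max(i2 - i1, j2 - j1)
    return changed


coverage = citation_coverage({"claim A": ["S1"], "claim B": []})
untouched = changed_bytes("abc", "abc")
one_edit = changed_bytes("abc", "abd")
print(coverage, untouched, one_edit)
```

Tracking changed_bytes per artifact over time gives the trend line the paragraph above asks for.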
§06 Surfaces

Four shapes the same capability can take.

Skills, CLIs, MCPs, and host web search are not competitors — they are different envelopes for the same atomic operation: "go fetch X". The harness reaches for them in this order, and never higher than necessary.

| | Skill (SKILL.md folder) | CLI (global binary) | MCP (server endpoint) | web_search (host LLM, last resort) |
|---|---|---|---|---|
| Shape | Folder with SKILL.md + adapter code + examples. Loaded by the agent on demand. | Local executable on $PATH. gh, ctx7, yt-dlp, npx playwright. | Long-running server speaking JSON-RPC over HTTP/stdio. Tool list discovered at session start. | The model's built-in browse tool. No setup, no contract. |
| Examples here | Each of the 20 adapters is a skill. So is "how to assemble a SourcePlan". github · arxiv · osv · openalex · … | Context7 · Playwright · yt-dlp · gh CLI, plus raw curl for sitemap walks | DeepWiki, only when no CLI exists | Claude / Gemini / GPT built-in; graceful degradation only |
| Strength | Self-documenting · LLM reads contract before use · cheap to compose | Battle-tested · scriptable · zero RPC overhead · works offline | Stateful sessions · streaming · vendor-managed updates | Universal coverage: anything indexed by Google et al. |
| Cost | Author once per source. Maintenance < CLI. | Install per machine. Version drift across hosts. | Network hop per call. Auth per session. | Opaque ranking · stale snippets · zero independent attribution. |
| When chosen | Default. Every HTTP source becomes a skill first. | Source has no public API, or auth/render needs a real binary. | Server already exists, no CLI equivalent, capability is genuinely stateful. | All twenty adapters and four co-installs returned empty: fire a metric, then fall through. |

Skill schema, abridged

name: github
description: search code, repos,
  issues. Use when the task names a
  repo, or asks for a code pattern.
inputs:
  query: string
  filters: {language, license, stars}
outputs:
  hits[]: {url, title, stars, ...}
when_to_use: |
  Code-finding tasks where the user
  references "GitHub", or any task
  that needs star/fork/issue signals.
when_not_to_use: |
  Pure docs questions → ctx7.
  Whole-repo Q&A → DeepWiki MCP.

Dispatch order

  1. Skill matching the intent
  2. Skill enrichment fan-out
  3. CLI co-install for non-HTTP shapes
  4. MCP for stateful capabilities
  5. Host web_search · last resort, fires alert
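The ordering above can be read as a fall-through chain. The sketch below simplifies it: each rung is a hypothetical callable that returns None when it comes back empty, enrichment fan-out is folded into the skill rung, and the alert is modelled as a callback. None of these names come from the harness itself.

```python
from typing import Callable, Optional

Rung = tuple[str, Callable[[str], Optional[str]]]


def dispatch(
    goal: str,
    rungs: list[Rung],
    web_search: Callable[[str], str],
    on_fallback: Callable[[str], None],
) -> tuple[str, str]:
    """Try each rung in order; fire the alert only if every rung is empty."""
    for name, rung in rungs:
        result = rung(goal)
        if result:
            return name, result
    on_fallback(goal)  # metric fires before the last resort runs
    return "web_search", web_search(goal)


alerts: list[str] = []
rungs: list[Rung] = [
    ("skill", lambda goal: None),                                       # no adapter matched
    ("cli", lambda goal: "ctx7 docs hit" if "docs" in goal else None),  # toy CLI rung
    ("mcp", lambda goal: None),
]

first = dispatch("polars docs", rungs, lambda g: "host result", alerts.append)
second = dispatch("obscure vendor note", rungs, lambda g: "host result", alerts.append)
print(first, second, alerts)
```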