GitHub · token auth, /graphql for discussions/gists
A five-layer pipeline that turns a stated goal into a verifiable artifact, by routing across twenty retrieval sources, four co-installed tools, and one fallback web-search lane — with a critic loop and an eval harness scoring every run. Outcome-driven, not signal-driven.
Seven stages along the spine, two cross-cutting concerns on the side, one feedback loop closing back to the router. The harness owns everything between the user's stated goal and the merge-ready artifact — retrieval is just the middle four boxes.
Each adapter exports search, get, and (when supported) batch, normalising every result into a unified Hit record while passing source-unique signals through verbatim. No adapter normalises authority — that is L3's job.
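A minimal sketch of that contract in Python. The `Hit` fields and `Adapter` protocol here are illustrative assumptions, not the harness's actual schema; the point is that adapters fill a shared record and stash source-unique signals verbatim instead of scoring authority themselves.

```python
from dataclasses import dataclass, field
from typing import Any, Protocol

@dataclass
class Hit:
    # Fields every adapter must fill after normalisation (hypothetical shape).
    url: str
    title: str
    source: str            # adapter name, e.g. "github"
    score: float = 0.0     # source-local relevance only; authority is L3's job
    extra: dict[str, Any] = field(default_factory=dict)  # source-unique signals, verbatim

class Adapter(Protocol):
    def search(self, query: str, **filters: Any) -> list[Hit]: ...
    def get(self, hit_id: str) -> Hit: ...
    # batch() is optional; only adapters whose upstream supports it export one.
```

Keeping `extra` as an opaque dict is the design choice that lets twenty heterogeneous sources share one record without lossy flattening.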
Where an HTTP adapter is not the right shape — rendering JS, walking docs, downloading captions, asking a wiki — the harness shells out to a local CLI, or to a single MCP when no CLI exists. Web search via the host LLM is reserved as graceful degradation.
Authoritative docs grounding for mainstream libraries. Two-step: resolve a libId, then fetch a specific topic. Replaces "search the docs site" for anything Context7 covers.
Browser automation for JS-rendered pages, login walls, and any GUI surface that refuses to be curled. The escape hatch when sitemap.xml + curl returns shells of pages.
AI-generated wiki + Q&A across 50k+ OSS repos. Stands in for "read the entire repo to answer one question". Used when GitHub/Sourcegraph give the code but not the why.
Search 1000+ video sites, dump JSON metadata, fetch auto-generated subtitles. The fastest path to "what was actually said in that talk" without watching it.
When all 20 adapters and 4 co-installed tools come back empty, fall through to whatever web_search the host LLM ships with. Triggering this should fire a metric, not a feature flag.
Specialised sites Context7 does not cover (RFCs, IETF drafts, lore.kernel.org, Bugzilla, vendor release notes) → direct curl on HTML and sitemap.xml, with host web_search "site:..." as the deepest fallback.
L1 compiles the SourcePlan into a two-stage DAG. Stage 1 fans out cheap searches in parallel; a top-K filter passes only survivors to Stage 2 where N+1 enrichments happen. Per-source rate buckets and a three-tier cache do the bookkeeping so adapters stay simple.
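The two-stage shape can be sketched as below, with hits as plain dicts and a hypothetical `run_plan`; rate buckets and the cache are omitted so the fan-out / filter / enrich skeleton stays visible.

```python
import asyncio

async def run_plan(searches, enrich, k=2):
    # Stage 1: fan out the cheap searches in parallel.
    batches = await asyncio.gather(*(s() for s in searches))
    hits = [h for batch in batches for h in batch]
    # Top-K filter: only survivors reach the expensive Stage 2 calls.
    survivors = sorted(hits, key=lambda h: h["score"], reverse=True)[:k]
    # Stage 2: one enrichment per survivor, again in parallel.
    return await asyncio.gather(*(enrich(h) for h in survivors))

# Illustrative fakes standing in for real adapters:
async def search_a():
    return [{"id": "a1", "score": 0.9}, {"id": "a2", "score": 0.1}]

async def search_b():
    return [{"id": "b1", "score": 0.5}]

async def fake_enrich(h):
    return {**h, "enriched": True}

results = asyncio.run(run_plan([search_a, search_b], fake_enrich, k=2))
```

Because the filter sits between the stages, a slow or expensive source only ever pays the Stage 2 cost for hits that survived the cross-source cut.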
| source | rps | concurrency | note |
|---|---|---|---|
| twitterapi.io | 0.2 | 1 | free tier: 1 req / 5s |
| sourcegraph | 2 | 4 | SSE stream |
| github | 5 | 8 | retry 403/429 |
| exa | 5 | 8 | cost: $0.005/call |
| openalex | 10 | 8 | polite pool · UA mailto: |
| hn algolia | 10 | 16 | no auth |
| osv | 20 | 16 | querybatch first |
| npm registry | 10 | 16 | cache 1h warm |
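One way to implement the per-source buckets above is a classic token bucket. Treating the concurrency column as burst capacity is an assumption of this sketch, not something the table states.

```python
import time

class TokenBucket:
    """Refill at `rps` tokens/sec, cap at `burst` (assumed = concurrency column)."""
    def __init__(self, rps: float, burst: int):
        self.rps, self.burst = rps, burst
        self.tokens = float(burst)
        self.last = time.monotonic()

    def try_acquire(self) -> bool:
        # Refill proportionally to elapsed time, then spend one token if available.
        now = time.monotonic()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rps)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# twitterapi.io row: 0.2 rps means a second call within 5 s is refused.
bucket = TokenBucket(rps=0.2, burst=1)
```

Adapters stay simple because they only ever ask `try_acquire()`; the arithmetic of "1 req / 5 s" lives in one place.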
The Evidence Graph wires every claim to the source that supports it — cite or it didn't happen. The Eval Harness checks the harness itself: did the artifact build, did its license clear, did a human have to rewrite it? No eval, no leverage.
Three node types, two edge types. Entity layer (UnionFind on canonical ids) collapses duplicate sources of the same thing. Claim layer (NLI / textual entailment) anchors every assertion the agent emits back to one or more original sources.
Canonical ids: (host, owner, name) for repos, (eco, name, ver) for packages, arXiv ↔ OpenAlex ↔ OpenReview for papers.

Each query emits a trace. Each trace gets scored on five axes. Aggregate scores feed back into router weights and intent definitions: the only place LLM judgment is allowed at write time, and it's done offline.
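The entity-layer collapse on canonical ids is plain union-find. A sketch, with made-up example ids; the real harness's node keys may differ.

```python
class UnionFind:
    """Collapse duplicate sources that resolve to the same canonical entity."""
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

uf = UnionFind()
# Hypothetical ids: one paper seen on arXiv, OpenAlex, and OpenReview
# collapses into a single entity node.
uf.union(("arxiv", "2402.01234"), ("openalex", "W4391"))
uf.union(("openalex", "W4391"), ("openreview", "aBcD"))
```

Once the three ids share a root, every claim edge attached to any of them lands on the same entity node.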
Acceptance check: go build / npm test green.

Skills, CLIs, MCPs, and host web search are not competitors; they are different envelopes for the same atomic operation: "go fetch X". The harness reaches for them in this order, and never higher than necessary.
| | Skill (SKILL.md folder) | CLI (global binary) | MCP (server endpoint) | web_search (host LLM, last resort) |
|---|---|---|---|---|
| Shape | Folder with SKILL.md + adapter code + examples. Loaded by the agent on demand. | Local executable on $PATH: gh, ctx7, yt-dlp, npx playwright. | Long-running server speaking JSON-RPC over HTTP/stdio. Tool list discovered at session start. | The model's built-in browse tool. No setup, no contract. |
| Examples here | Each of the 20 adapters is a skill. So is "how to assemble a SourcePlan": github · arxiv · osv · openalex · … | Context7 · Playwright · yt-dlp · gh CLI, plus raw curl for sitemap walks | DeepWiki (only when no CLI exists) | Claude / Gemini / GPT built-in (graceful degradation only) |
| Strength | Self-documenting · LLM reads contract before use · cheap to compose | Battle-tested · scriptable · zero RPC overhead · works offline | Stateful sessions · streaming · vendor-managed updates | Universal coverage: anything indexed by Google et al. |
| Cost | Author once per source. Maintenance < CLI. | Install per machine. Version drift across hosts. | Network hop per call. Auth per session. | Opaque ranking · stale snippets · zero independent attribution. |
| When chosen | Default. Every HTTP source becomes a skill first. | Source has no public API, or auth/render needs a real binary. | Server already exists, no CLI equivalent, capability is genuinely stateful. | All twenty adapters and four co-installs returned empty: fire a metric, then fall through. |
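The "when chosen" order collapses to a single dispatch loop. A sketch; the function, lane shapes, and metric key are all hypothetical.

```python
from collections import Counter

def fetch(task, lanes, web_search, metrics):
    """lanes: ordered tool lists, skills first, then CLIs, then MCPs."""
    for lane in lanes:
        for tool in lane:
            hits = tool(task)
            if hits:
                return hits  # never escalate higher than necessary
    # Last resort fires a metric, not a feature flag.
    metrics["fallback.web_search"] += 1
    return web_search(task)

metrics = Counter()
# A CLI lane answers here, so the web_search envelope is never touched:
hits = fetch(
    "find repo",
    [[lambda t: []], [lambda t: ["cli hit"]]],
    lambda t: ["web hit"],
    metrics,
)
```

Because the metric increments only on the final fallthrough, a rising `fallback.web_search` count points directly at coverage gaps in the adapter and co-install lanes.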
```yaml
name: github
description: >
  Search code, repos, issues. Use when the task names a
  repo, or asks for a code pattern.
inputs:
  query: string
  filters: {language, license, stars}
outputs:
  hits[]: {url, title, stars, ...}
when_to_use: |
  Code-finding tasks where the user
  references "GitHub", or any task
  that needs star/fork/issue signals.
when_not_to_use: |
  Pure docs questions → ctx7.
  Whole-repo Q&A → DeepWiki MCP.
```