GitHub · token auth, /graphql for discussions/gists
A five-layer pipeline that turns a stated goal into a verifiable artifact, by routing across twenty retrieval sources, four co-installed tools, and one fallback web-search lane — with a critic loop and an eval harness scoring every run. Outcome-driven, not signal-driven.
Seven stages along the spine, two cross-cutting concerns on the side, one feedback loop closing back to the router. The harness owns everything between the user's stated goal and the merge-ready artifact — retrieval is just the middle four boxes.
Each adapter exports search, get, and (when supported) batch, normalising every result into a unified Hit record while passing source-unique signals through verbatim. No adapter normalises authority — that is L3's job.
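A minimal sketch of that contract in Python. The `Hit` fields and `Adapter` protocol here are illustrative assumptions, not the harness's actual schema; the point is that adapters fill a shared record and stash source-unique signals verbatim instead of scoring authority themselves.

```python
from dataclasses import dataclass, field
from typing import Any, Protocol

@dataclass
class Hit:
    # Fields every adapter must fill after normalisation (hypothetical shape).
    url: str
    title: str
    source: str            # adapter name, e.g. "github"
    score: float = 0.0     # source-local relevance only; authority is L3's job
    extra: dict[str, Any] = field(default_factory=dict)  # source-unique signals, verbatim

class Adapter(Protocol):
    def search(self, query: str, **filters: Any) -> list[Hit]: ...
    def get(self, hit_id: str) -> Hit: ...
    # batch() is optional; only adapters whose upstream supports it export one.
```

Keeping `extra` as an opaque dict is the design choice that lets twenty heterogeneous sources share one record without lossy flattening.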
Where an HTTP adapter is not the right shape — rendering JS, walking docs, downloading captions, asking a wiki — the harness shells out to a local CLI, or to a single MCP when no CLI exists. Web search via the host LLM is reserved as graceful degradation.
Authoritative docs grounding for mainstream libraries. Two-step: resolve a libId, then fetch a specific topic. Replaces "search the docs site" for anything Context7 covers.
Browser automation for JS-rendered pages, login walls, and any GUI surface that refuses to be curled. The escape hatch when sitemap.xml + curl returns shells of pages.
AI-generated wiki + Q&A across 50k+ OSS repos. Stands in for "read the entire repo to answer one question". Used when GitHub/Sourcegraph give the code but not the why.
Search 1000+ video sites, dump JSON metadata, fetch auto-generated subtitles. The fastest path to "what was actually said in that talk" without watching it.
When all 20 adapters and 4 co-installed tools come back empty, fall through to whatever web_search the host LLM ships with. Triggering this should fire a metric, not a feature flag.
Specialised sites Context7 does not cover (RFCs, IETF drafts, lore.kernel.org, Bugzilla, vendor release notes) → direct curl on HTML and sitemap.xml, with host web_search "site:..." as the deepest fallback.
L1 compiles the SourcePlan into a two-stage DAG. Stage 1 fans out cheap searches in parallel; a top-K filter passes only survivors to Stage 2 where N+1 enrichments happen. Per-source rate buckets and a three-tier cache do the bookkeeping so adapters stay simple.
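The two-stage shape can be sketched as below, with hits as plain dicts and a hypothetical `run_plan`; rate buckets and the cache are omitted so the fan-out / filter / enrich skeleton stays visible.

```python
import asyncio

async def run_plan(searches, enrich, k=2):
    # Stage 1: fan out the cheap searches in parallel.
    batches = await asyncio.gather(*(s() for s in searches))
    hits = [h for batch in batches for h in batch]
    # Top-K filter: only survivors reach the expensive Stage 2 calls.
    survivors = sorted(hits, key=lambda h: h["score"], reverse=True)[:k]
    # Stage 2: one enrichment per survivor, again in parallel.
    return await asyncio.gather(*(enrich(h) for h in survivors))

# Illustrative fakes standing in for real adapters:
async def search_a():
    return [{"id": "a1", "score": 0.9}, {"id": "a2", "score": 0.1}]

async def search_b():
    return [{"id": "b1", "score": 0.5}]

async def fake_enrich(h):
    return {**h, "enriched": True}

results = asyncio.run(run_plan([search_a, search_b], fake_enrich, k=2))
```

Because the filter sits between the stages, a slow or expensive source only ever pays the Stage 2 cost for hits that survived the cross-source cut.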
| source | rps | concurrency | note |
|---|---|---|---|
| twitterapi.io | 0.2 | 1 | free tier: 1 req / 5s |
| sourcegraph | 2 | 4 | SSE stream |
| github | 5 | 8 | retry 403/429 |
| exa | 5 | 8 | cost: $0.005/call |
| openalex | 10 | 8 | polite pool · UA mailto: |
| hn algolia | 10 | 16 | no auth |
| osv | 20 | 16 | querybatch first |
| npm registry | 10 | 16 | cache 1h warm |
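One way to implement the per-source buckets above is a classic token bucket. Treating the concurrency column as burst capacity is an assumption of this sketch, not something the table states.

```python
import time

class TokenBucket:
    """Refill at `rps` tokens/sec, cap at `burst` (assumed = concurrency column)."""
    def __init__(self, rps: float, burst: int):
        self.rps, self.burst = rps, burst
        self.tokens = float(burst)
        self.last = time.monotonic()

    def try_acquire(self) -> bool:
        # Refill proportionally to elapsed time, then spend one token if available.
        now = time.monotonic()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rps)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# twitterapi.io row: 0.2 rps means a second call within 5 s is refused.
bucket = TokenBucket(rps=0.2, burst=1)
```

Adapters stay simple because they only ever ask `try_acquire()`; the arithmetic of "1 req / 5 s" lives in one place.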
The Evidence Graph wires every claim to the source that supports it — cite or it didn't happen. The Eval Harness checks the harness itself: did the artifact build, did its license clear, did a human have to rewrite it? No eval, no leverage.
Three node types, two edge types. Entity layer (UnionFind on canonical ids) collapses duplicate sources of the same thing. Claim layer (NLI / textual entailment) anchors every assertion the agent emits back to one or more original sources.
Canonical ids: (host, owner, name) for repos, (eco, name, ver) for packages, arXiv ↔ OpenAlex ↔ OpenReview for papers.

Each query emits a trace. Each trace gets scored on five axes. Aggregate scores feed back into router weights and intent definitions: the only place LLM judgment is allowed at write time, and it's done offline.
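The entity-layer collapse on canonical ids is plain union-find. A sketch, with made-up example ids; the real harness's node keys may differ.

```python
class UnionFind:
    """Collapse duplicate sources that resolve to the same canonical entity."""
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

uf = UnionFind()
# Hypothetical ids: one paper seen on arXiv, OpenAlex, and OpenReview
# collapses into a single entity node.
uf.union(("arxiv", "2402.01234"), ("openalex", "W4391"))
uf.union(("openalex", "W4391"), ("openreview", "aBcD"))
```

Once the three ids share a root, every claim edge attached to any of them lands on the same entity node.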
Acceptance check: go build / npm test green.

Skills, CLIs, MCPs, and host web search are not competitors; they are different envelopes for the same atomic operation: "go fetch X". The harness reaches for them in this order, and never higher than necessary.
| | Skill (SKILL.md folder) | CLI (global binary) | MCP (server endpoint) | web_search (host LLM, last resort) |
|---|---|---|---|---|
| Shape | Folder with SKILL.md + adapter code + examples. Loaded by the agent on demand. | Local executable on $PATH: gh, ctx7, yt-dlp, npx playwright. | Long-running server speaking JSON-RPC over HTTP/stdio. Tool list discovered at session start. | The model's built-in browse tool. No setup, no contract. |
| Examples here | Each of the 20 adapters is a skill. So is "how to assemble a SourcePlan": github · arxiv · osv · openalex · … | Context7 · Playwright · yt-dlp · gh CLI, plus raw curl for sitemap walks | DeepWiki (only when no CLI exists) | Claude / Gemini / GPT built-in (graceful degradation only) |
| Strength | Self-documenting · LLM reads contract before use · cheap to compose | Battle-tested · scriptable · zero RPC overhead · works offline | Stateful sessions · streaming · vendor-managed updates | Universal coverage: anything indexed by Google et al. |
| Cost | Author once per source. Maintenance < CLI. | Install per machine. Version drift across hosts. | Network hop per call. Auth per session. | Opaque ranking · stale snippets · zero independent attribution. |
| When chosen | Default. Every HTTP source becomes a skill first. | Source has no public API, or auth/render needs a real binary. | Server already exists, no CLI equivalent, capability is genuinely stateful. | All twenty adapters and four co-installs returned empty: fire a metric, then fall through. |
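The "when chosen" order collapses to a single dispatch loop. A sketch; the function, lane shapes, and metric key are all hypothetical.

```python
from collections import Counter

def fetch(task, lanes, web_search, metrics):
    """lanes: ordered tool lists, skills first, then CLIs, then MCPs."""
    for lane in lanes:
        for tool in lane:
            hits = tool(task)
            if hits:
                return hits  # never escalate higher than necessary
    # Last resort fires a metric, not a feature flag.
    metrics["fallback.web_search"] += 1
    return web_search(task)

metrics = Counter()
# A CLI lane answers here, so the web_search envelope is never touched:
hits = fetch(
    "find repo",
    [[lambda t: []], [lambda t: ["cli hit"]]],
    lambda t: ["web hit"],
    metrics,
)
```

Because the metric increments only on the final fallthrough, a rising `fallback.web_search` count points directly at coverage gaps in the adapter and co-install lanes.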
```yaml
name: github
description: >
  Search code, repos, issues. Use when the task names a
  repo, or asks for a code pattern.
inputs:
  query: string
  filters: {language, license, stars}
outputs:
  hits[]: {url, title, stars, ...}
when_to_use: |
  Code-finding tasks where the user
  references "GitHub", or any task
  that needs star/fork/issue signals.
when_not_to_use: |
  Pure docs questions → ctx7.
  Whole-repo Q&A → DeepWiki MCP.
```