diff --git a/docs/superpowers/specs/2026-04-09-web-search-tools-design.md b/docs/superpowers/specs/2026-04-09-web-search-tools-design.md
new file mode 100644
index 0000000..0f3c015
--- /dev/null
+++ b/docs/superpowers/specs/2026-04-09-web-search-tools-design.md
@@ -0,0 +1,391 @@
+# Web Search Tools Design
+
+**Date:** 2026-04-09
+**Project:** `/home/alex/dotfiles`
+**Target files:**
+- `.pi/agent/extensions/web-search/package.json`
+- `.pi/agent/extensions/web-search/index.ts`
+- `.pi/agent/extensions/web-search/src/schema.ts`
+- `.pi/agent/extensions/web-search/src/config.ts`
+- `.pi/agent/extensions/web-search/src/providers/types.ts`
+- `.pi/agent/extensions/web-search/src/providers/exa.ts`
+- `.pi/agent/extensions/web-search/src/tools/web-search.ts`
+- `.pi/agent/extensions/web-search/src/tools/web-fetch.ts`
+- `.pi/agent/extensions/web-search/src/format.ts`
+- tests alongside the new modules
+
+## Goal
+
+Add two generic pi tools, `web_search` and `web_fetch`, implemented as a modular extension package that uses Exa as the first provider while keeping the internal design extensible for future providers.
+
+## Context
+
+- This dotfiles repo already tracks pi configuration under `.pi/agent/`.
+- The current extension workspace contains a tracked `question` extension and small pure helper tests.
+- Pi extensions can be packaged as directories with `index.ts` and their own `package.json`, which is the best fit when third-party dependencies are needed.
+- The requested feature is explicitly about pi extensions and custom tools, not built-in model providers.
+- The user wants:
+  - generic tool names now
+  - Exa as the first provider
+  - configuration read from a separate global file, not `settings.json`
+  - configuration stored only at the global scope
+
+## User-Approved Requirements
+
+1. Add two generic tools:
+   - `web_search`
+   - `web_fetch`
+2. Use Exa as the initial provider.
+3. Keep the implementation extensible so other providers can be added later.
+4. Do **not** read configuration from environment variables.
+5. Do **not** read configuration from `settings.json`.
+6. Read configuration from a dedicated global file:
+   - `~/.pi/agent/web-search.json`
+7. Use a provider-list-based config shape, not a single-provider-only schema.
+8. Store credentials as literal values in that config file.
+9. `web_search` should return **metadata only** by default.
+10. `web_fetch` should accept **one URL or multiple URLs**.
+11. `web_fetch` should return **text** by default.
+12. The implementation direction should be the modular/package-style structure, not the minimal Exa-shaped shortcut.
+
+## Recommended Architecture
+
+Implement the feature as a dedicated extension package at:
+
+- `/home/alex/dotfiles/.pi/agent/extensions/web-search/`
+
+This package will register two generic tools and route both through a provider registry. At runtime, the extension loads `~/.pi/agent/web-search.json`, validates it, normalizes the provider list into an internal lookup map, resolves the configured default provider, and then executes requests through a provider adapter.
+
+For the first version, the only adapter is Exa. However, the tool-facing layer remains provider-agnostic, so future providers only need to implement the shared provider interface and be added to config validation/registry wiring.
+
+This is intentionally more structured than a single-file Exa wrapper because the user explicitly wants future extensibility without changing tool names or reworking the public API later.
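
The runtime flow described above (validate, normalize the provider list into a lookup map, resolve the default provider) could look roughly like the sketch below. It is only an illustration: `ProviderEntry`, `ResolvedConfig`, and `resolveConfig` are hypothetical names, the function takes the file contents as a string rather than reading `~/.pi/agent/web-search.json` itself, and it uses hand-written checks where the real package is expected to use TypeBox schemas.

```typescript
// Hypothetical shapes for illustration; the real schemas would live in src/schema.ts.
interface ProviderEntry {
  name: string;
  type: string;
  apiKey: string;
  options?: Record<string, unknown>;
}

interface WebSearchConfig {
  defaultProvider: string;
  providers: ProviderEntry[];
}

interface ResolvedConfig {
  defaultProvider: ProviderEntry;
  byName: Map<string, ProviderEntry>;
}

const CONFIG_HINT = "expected at ~/.pi/agent/web-search.json";

// Checks the three required string fields of a provider entry.
function isProviderEntry(value: unknown): value is ProviderEntry {
  if (typeof value !== "object" || value === null) return false;
  const entry = value as Record<string, unknown>;
  return (
    typeof entry.name === "string" && entry.name.length > 0 &&
    typeof entry.type === "string" && entry.type.length > 0 &&
    typeof entry.apiKey === "string" && entry.apiKey.length > 0
  );
}

// Parses raw JSON text, validates it, and builds the name -> provider lookup map.
function resolveConfig(rawJson: string): ResolvedConfig {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawJson);
  } catch (_error) {
    throw new Error(`web-search config is not valid JSON (${CONFIG_HINT})`);
  }
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error(`web-search config must be a JSON object (${CONFIG_HINT})`);
  }
  const { defaultProvider, providers } = parsed as Partial<WebSearchConfig>;
  if (!Array.isArray(providers) || providers.length === 0) {
    throw new Error("web-search config: `providers` must be a non-empty array");
  }
  const byName = new Map<string, ProviderEntry>();
  for (const entry of providers as unknown[]) {
    if (!isProviderEntry(entry)) {
      throw new Error("web-search config: each provider needs `name`, `type`, and `apiKey`");
    }
    byName.set(entry.name, entry);
  }
  const resolved = typeof defaultProvider === "string" ? byName.get(defaultProvider) : undefined;
  if (!resolved) {
    throw new Error(`web-search config: unknown defaultProvider "${String(defaultProvider)}"`);
  }
  return { defaultProvider: resolved, byName };
}
```

Keeping this step free of file I/O is one way to make it testable as a pure function; the caller would read the file and pass the text in.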
+
+## File Structure
+
+### Extension package
+
+- **Create:** `/home/alex/dotfiles/.pi/agent/extensions/web-search/package.json`
+  - declares the extension package
+  - declares `exa-js` as a dependency
+  - points pi at the extension entrypoint
+
+- **Create:** `/home/alex/dotfiles/.pi/agent/extensions/web-search/index.ts`
+  - extension entrypoint
+  - registers `web_search` and `web_fetch`
+  - wires together config loading, provider registry, tool handlers, and shared formatting
+
+### Shared schemas and config
+
+- **Create:** `/home/alex/dotfiles/.pi/agent/extensions/web-search/src/schema.ts`
+  - TypeBox schemas for tool parameters
+  - TypeBox schemas for `web-search.json`
+  - shared TypeScript types derived from the schemas where useful
+
+- **Create:** `/home/alex/dotfiles/.pi/agent/extensions/web-search/src/config.ts`
+  - reads `~/.pi/agent/web-search.json`
+  - validates config shape
+  - normalizes provider list into an internal map keyed by provider name
+  - resolves default provider
+
+### Provider abstraction
+
+- **Create:** `/home/alex/dotfiles/.pi/agent/extensions/web-search/src/providers/types.ts`
+  - generic request and response types for search/fetch
+  - provider interface used by the tool layer
+  - normalized internal result shapes independent of Exa SDK types
+
+- **Create:** `/home/alex/dotfiles/.pi/agent/extensions/web-search/src/providers/exa.ts`
+  - Exa-backed implementation of the provider interface
+  - translates generic search requests into Exa `search(...)`
+  - translates generic fetch requests into Exa `getContents(...)`
+  - isolates all Exa-specific request/response details
+
+### Tool handlers and formatting
+
+- **Create:** `/home/alex/dotfiles/.pi/agent/extensions/web-search/src/tools/web-search.ts`
+  - `web_search` schema, execution logic, and tool rendering helpers
+
+- **Create:** `/home/alex/dotfiles/.pi/agent/extensions/web-search/src/tools/web-fetch.ts`
+  - `web_fetch` schema, execution logic, and tool rendering helpers
+
+- **Create:** `/home/alex/dotfiles/.pi/agent/extensions/web-search/src/format.ts`
+  - shared output shaping
+  - compact text summaries for the LLM
+  - truncation behavior for large results
+  - per-result formatting for batch fetches and partial failures
+
+## Config File Design
+
+The extension will read exactly one file:
+
+- `~/.pi/agent/web-search.json`
+
+Initial conceptual shape:
+
+```json
+{
+  "defaultProvider": "exa-main",
+  "providers": [
+    {
+      "name": "exa-main",
+      "type": "exa",
+      "apiKey": "exa_...",
+      "options": {
+        "defaultSearchLimit": 5,
+        "defaultFetchTextMaxCharacters": 12000
+      }
+    }
+  ]
+}
+```
+
+### Config rules
+
+- `defaultProvider` must match one provider entry by name.
+- `providers` must be a non-empty array.
+- Each provider entry must include:
+  - `name`
+  - `type`
+  - `apiKey`
+- `apiKey` is a literal string in the first version.
+- `type` is validated so the runtime can select the correct adapter.
+- Exa-specific defaults may live under `options`, but they must remain optional.
+
+### Config non-goals
+
+The first version will **not**:
+
+- read provider config from project-local files
+- merge config from multiple files
+- read credentials from env vars
+- support shell-command-based credential resolution
+- write or edit `web-search.json` automatically
+
+If the file is missing or invalid, the tools should return a clear error telling the user where the file belongs and showing a minimal valid example.
+
+## Tool Contract
+
+### `web_search`
+
+Purpose: search the web and return result metadata with a generic surface that can outlive Exa.
+
+Conceptual input shape:
+
+```ts
+{
+  query: string;
+  limit?: number;
+  includeDomains?: string[];
+  excludeDomains?: string[];
+  startPublishedDate?: string;
+  endPublishedDate?: string;
+  category?: string;
+  provider?: string;
+}
+```
+
+### Default behavior
+
+- returns metadata only
+- does not fetch page text by default
+- uses the default configured provider unless `provider` explicitly selects another configured provider
+
+### Result shape intent
+
+Each search result should preserve a normalized subset of provider output such as:
+
+- `title`
+- `url`
+- `publishedDate`
+- `author`
+- `score`
+- provider-specific stable identifiers only if useful for follow-up operations
+
+The tool’s text output should stay compact and easy for the model to scan.
+
+### `web_fetch`
+
+Purpose: fetch contents for one or more URLs with a generic interface.
+
+Conceptual input shape:
+
+```ts
+{
+  urls: string[];
+  text?: boolean;
+  highlights?: boolean;
+  summary?: boolean;
+  textMaxCharacters?: number;
+  provider?: string;
+}
+```
+
+### Input normalization
+
+The canonical tool shape is `urls: string[]`, where a single URL is represented as a one-element array. For robustness, the implementation may also accept a top-level `url` string through argument normalization and fold it into `urls`, but the stable contract exposed in schemas and docs should remain `urls: string[]`.
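
The folding step described above could be sketched as follows. `RawFetchArgs` and `normalizeFetchUrls` are hypothetical names, and the URL check is a deliberately loose `http(s)` pattern chosen for illustration; the real validation rules are not specified here.

```typescript
// Hypothetical raw argument shape: either `urls`, a stray top-level `url`, or both.
interface RawFetchArgs {
  url?: unknown;
  urls?: unknown;
}

// Folds an optional top-level `url` into the canonical `urls` array and
// rejects malformed or empty input before any provider request is made.
function normalizeFetchUrls(args: RawFetchArgs): string[] {
  const candidates: unknown[] = [];
  if (Array.isArray(args.urls)) {
    candidates.push(...args.urls);
  }
  if (typeof args.url === "string") {
    candidates.push(args.url);
  }
  if (candidates.length === 0) {
    throw new Error("web_fetch: provide at least one URL in `urls`");
  }
  return candidates.map((candidate) => {
    // Loose illustrative check: an http(s) scheme followed by non-whitespace.
    if (typeof candidate !== "string" || !/^https?:\/\/\S+$/i.test(candidate.trim())) {
      throw new Error(`web_fetch: malformed URL: ${String(candidate)}`);
    }
    return candidate.trim();
  });
}
```

With this shape, a call such as `normalizeFetchUrls({ url: "https://example.com/a" })` yields a one-element array, so the tool handler only ever deals with `urls: string[]`.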
+
+### Default behavior
+
+- when no content mode is specified, fetch text
+- batch requests are allowed
+- the default configured provider is used unless overridden
+
+### Result shape intent
+
+Each fetched item should preserve normalized per-URL results, including:
+
+- `url`
+- `title` where available
+- `text` by default
+- optional `highlights`
+- optional `summary`
+- per-item failure details for partial batch failures
+
+## Provider Abstraction
+
+The provider interface should express the minimum shared behaviors needed by the tools:
+
+```ts
+// Normalized*Response: provider-agnostic result shapes (type names illustrative).
+interface WebSearchProvider {
+  type: string;
+  search(request: NormalizedSearchRequest): Promise<NormalizedSearchResponse>;
+  fetch(request: NormalizedFetchRequest): Promise<NormalizedFetchResponse>;
+}
+```
+
+### Exa adapter responsibilities
+
+The Exa adapter will:
+
+- instantiate an Exa client from the configured literal API key
+- use Exa search without contents for `web_search` default behavior
+- use Exa `getContents(...)` for `web_fetch`
+- map Exa response fields into normalized provider-agnostic result types
+- keep Exa-only fields contained inside the adapter unless they are intentionally promoted into the shared result model later
+
+This keeps future provider additions focused: implement the same interface, extend config validation, and register the adapter.
+
+## Rendering and Output Design
+
+The extension should provide compact tool rendering so calls and results are readable inside pi.
+
+### `renderCall`
+
+- `web_search`: show tool name and the query
+- `web_fetch`: show tool name and URL count (or the single URL)
+
+### `renderResult`
+
+- `web_search`: show result count and a short numbered list of titles/URLs
+- `web_fetch`: show fetched count, failed count if any, and a concise per-URL summary
+
+### LLM-facing text output
+
+The text returned to the model should be concise and predictable:
+
+- search: compact metadata list only by default
+- fetch: truncated text payloads with enough context to be useful
+- batch fetch: clearly separated per-URL sections
+
+Large outputs must be truncated with the shared truncation utilities pattern used by pi tool examples.
+
+## Error Handling
+
+Expected runtime failures should be handled cleanly and descriptively.
+
+### Config errors
+
+- missing `~/.pi/agent/web-search.json`
+- invalid JSON
+- schema mismatch
+- empty provider list
+- unknown `defaultProvider`
+- unknown explicitly requested provider
+- missing literal API key
+
+These should return actionable errors naming the exact issue.
+
+### Input errors
+
+- empty search query
+- malformed URL(s)
+- empty URL list after normalization
+
+These should be rejected before any provider request is made.
+
+### Provider/runtime errors
+
+- Exa authentication failures
+- network failures
+- rate limits
+- unexpected response shapes
+
+These should return a concise summary in tool content while preserving richer diagnostics in `details`.
+
+### Partial failures
+
+For batch `web_fetch`, mixed outcomes should not fail the entire request unless every target fails. Successful pages should still be returned together with per-URL failure entries.
+
+## Testing Strategy
+
+The design intentionally separates pure logic from pi wiring so most behavior can be tested without loading pi itself.
+
+### Automated tests
+
+Cover:
+
+1. config parsing and normalization
+2. provider-list validation
+3. default-provider resolution
+4. generic request → Exa request mapping
+5. Exa response → normalized response mapping
+6. compact formatting for metadata-only search
+7. truncation for long fetch results
+8. batch fetch formatting with partial failures
+9. helpful error messages when config is absent or invalid
+
+### Test style
+
+- prefer pure module tests for config, normalization, and formatting
+- inject a fake Exa-like client into the Exa adapter instead of making live network calls
+- keep extension entrypoint tests to smoke coverage only
+
+### Manual verification
+
+After implementation:
+
+1. create `~/.pi/agent/web-search.json`
+2. reload pi
+3. run one `web_search` call
+4. run one single-URL `web_fetch` call
+5. run one multi-URL `web_fetch` call
+6. confirm missing/invalid config errors are readable
+
+## Non-Goals
+
+The first version will not add:
+
+- other providers besides Exa
+- project-local web-search config
+- automatic setup commands or interactive config editing
+- provider-specific passthrough options in the public tool API
+- rich snippet/highlight defaults for search
+- live network integration tests in the normal automated suite
+
+## Acceptance Criteria
+
+The work is complete when:
+
+1. pi discovers a new extension package at `.pi/agent/extensions/web-search/`
+2. the agent has two generic tools:
+   - `web_search`
+   - `web_fetch`
+3. the implementation uses an internal provider abstraction
+4. Exa is the first working provider implementation
+5. the runtime reads global config from `~/.pi/agent/web-search.json`
+6. config uses a provider-list shape with a default provider selector
+7. credentials are read as literal values from that file
+8. `web_search` returns metadata only by default
+9. `web_fetch` accepts one or multiple URLs and returns text by default
+10. missing config, invalid config, and provider failures return clean, actionable tool errors
+11. core mapping/formatting/config logic is covered by automated tests
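
The partial-failure rule for batch `web_fetch` (a batch counts as failed only when every URL fails) together with truncation of long text could be covered by a pure formatter along the lines of the sketch below. `FetchOutcome`, `truncate`, and `formatBatch` are hypothetical names, the section layout is only one possible rendering, and the `12000` default simply mirrors the `defaultFetchTextMaxCharacters` value from the example config.

```typescript
// Hypothetical per-URL outcome shape for a batch fetch.
type FetchOutcome =
  | { url: string; ok: true; title?: string; text: string }
  | { url: string; ok: false; error: string };

interface BatchSummary {
  isError: boolean; // true only when every URL in the batch failed
  text: string;     // compact, clearly separated per-URL sections
}

// Cuts text to maxCharacters and notes how much was dropped.
function truncate(text: string, maxCharacters: number): string {
  if (text.length <= maxCharacters) return text;
  return `${text.slice(0, maxCharacters)}\n[truncated ${text.length - maxCharacters} characters]`;
}

// Builds the LLM-facing text for a batch and decides whether the whole call is an error.
function formatBatch(outcomes: FetchOutcome[], maxCharacters = 12000): BatchSummary {
  const failed = outcomes.filter((outcome) => !outcome.ok).length;
  const header =
    `Fetched ${outcomes.length - failed} of ${outcomes.length} URL(s)` +
    (failed > 0 ? `, ${failed} failed` : "");
  const sections = outcomes.map((outcome) =>
    outcome.ok
      ? `## ${outcome.title ?? outcome.url}\n${outcome.url}\n\n${truncate(outcome.text, maxCharacters)}`
      : `## ${outcome.url}\nERROR: ${outcome.error}`,
  );
  return {
    isError: outcomes.length > 0 && failed === outcomes.length,
    text: [header, ...sections].join("\n\n"),
  };
}
```

Because the function takes plain data and returns plain data, the tests for truncation and for batch formatting with partial failures can run without a live provider.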