# Web Search Tools Design

**Date:** 2026-04-09

**Project:** `/home/alex/dotfiles`

**Target files:**

- `.pi/agent/extensions/web-search/package.json`
- `.pi/agent/extensions/web-search/index.ts`
- `.pi/agent/extensions/web-search/src/schema.ts`
- `.pi/agent/extensions/web-search/src/config.ts`
- `.pi/agent/extensions/web-search/src/providers/types.ts`
- `.pi/agent/extensions/web-search/src/providers/exa.ts`
- `.pi/agent/extensions/web-search/src/tools/web-search.ts`
- `.pi/agent/extensions/web-search/src/tools/web-fetch.ts`
- `.pi/agent/extensions/web-search/src/format.ts`
- tests alongside the new modules

## Goal

Add two generic pi tools, `web_search` and `web_fetch`, implemented as a modular extension package that uses Exa as the first provider while keeping the internal design extensible for future providers.

## Context

- This dotfiles repo already tracks pi configuration under `.pi/agent/`.
- The current extension workspace contains a tracked `question` extension and small pure helper tests.
- Pi extensions can be packaged as directories with `index.ts` and their own `package.json`, which is the best fit when third-party dependencies are needed.
- The requested feature is explicitly about pi extensions and custom tools, not built-in model providers.
- The user wants:
  - generic tool names now
  - Exa as the first provider
  - configuration read from a separate global file, not `settings.json`
  - configuration stored only at the global scope

## User-Approved Requirements

1. Add two generic tools:
   - `web_search`
   - `web_fetch`
2. Use Exa as the initial provider.
3. Keep the implementation extensible so other providers can be added later.
4. Do **not** read configuration from environment variables.
5. Do **not** read configuration from `settings.json`.
6. Read configuration from a dedicated global file:
   - `~/.pi/agent/web-search.json`
7. Use a provider-list-based config shape, not a single-provider-only schema.
8. Store credentials as literal values in that config file.
9. `web_search` should return **metadata only** by default.
10. `web_fetch` should accept **one URL or multiple URLs**.
11. `web_fetch` should return **text** by default.
12. The implementation direction should be the modular/package-style structure, not the minimal Exa-shaped shortcut.

## Recommended Architecture

Implement the feature as a dedicated extension package at:

- `/home/alex/dotfiles/.pi/agent/extensions/web-search/`

This package will register two generic tools and route both through a provider registry.

At runtime, the extension loads `~/.pi/agent/web-search.json`, validates it, normalizes the provider list into an internal lookup map, resolves the configured default provider, and then executes requests through a provider adapter.

For the first version, the only adapter is Exa. However, the tool-facing layer remains provider-agnostic, so future providers only need to implement the shared provider interface and be added to config validation/registry wiring.

This is intentionally more structured than a single-file Exa wrapper because the user explicitly wants future extensibility without changing tool names or reworking the public API later.

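The normalization and provider-resolution steps described above could look roughly like the following. This is a minimal sketch, not the planned implementation: the names `ProviderConfig`, `WebSearchConfig`, `ResolvedConfig`, `normalizeConfig`, and `selectProvider` are hypothetical, the field names are taken from the config example in this document, and rejecting duplicate provider names is an added assumption.

```typescript
// Hypothetical types; field names follow the config example in this document.
interface ProviderConfig {
  name: string;
  type: string;
  apiKey: string;
  options?: Record<string, unknown>;
}

interface WebSearchConfig {
  defaultProvider: string;
  providers: ProviderConfig[];
}

interface ResolvedConfig {
  defaultProvider: ProviderConfig;
  byName: Record<string, ProviderConfig>;
}

// Own-property lookup so names like "toString" never hit the prototype chain.
function lookup(
  byName: Record<string, ProviderConfig>,
  name: string,
): ProviderConfig | undefined {
  return Object.prototype.hasOwnProperty.call(byName, name) ? byName[name] : undefined;
}

// Turn the provider list into a lookup map and resolve the default entry.
function normalizeConfig(config: WebSearchConfig): ResolvedConfig {
  if (config.providers.length === 0) {
    throw new Error("web-search.json: `providers` must be a non-empty array");
  }
  const byName: Record<string, ProviderConfig> = {};
  for (const provider of config.providers) {
    // Assumption: duplicate names are rejected rather than silently overwritten.
    if (lookup(byName, provider.name) !== undefined) {
      throw new Error(`web-search.json: duplicate provider name "${provider.name}"`);
    }
    byName[provider.name] = provider;
  }
  const defaultProvider = lookup(byName, config.defaultProvider);
  if (defaultProvider === undefined) {
    throw new Error(`web-search.json: unknown defaultProvider "${config.defaultProvider}"`);
  }
  return { defaultProvider, byName };
}

// Pick the provider for one tool call: explicit override first, then the default.
function selectProvider(resolved: ResolvedConfig, requested?: string): ProviderConfig {
  if (requested === undefined) return resolved.defaultProvider;
  const match = lookup(resolved.byName, requested);
  if (match === undefined) {
    throw new Error(`Unknown provider "${requested}"`);
  }
  return match;
}
```

Keeping these functions free of file I/O and Exa types means they can be exercised with plain objects in tests, while the entrypoint handles reading the file and constructing the adapter.
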
## File Structure

### Extension package

- **Create:** `/home/alex/dotfiles/.pi/agent/extensions/web-search/package.json`
  - declares the extension package
  - declares `exa-js` as a dependency
  - points pi at the extension entrypoint
- **Create:** `/home/alex/dotfiles/.pi/agent/extensions/web-search/index.ts`
  - extension entrypoint
  - registers `web_search` and `web_fetch`
  - wires together config loading, provider registry, tool handlers, and shared formatting

### Shared schemas and config

- **Create:** `/home/alex/dotfiles/.pi/agent/extensions/web-search/src/schema.ts`
  - TypeBox schemas for tool parameters
  - TypeBox schemas for `web-search.json`
  - shared TypeScript types derived from the schemas where useful
- **Create:** `/home/alex/dotfiles/.pi/agent/extensions/web-search/src/config.ts`
  - reads `~/.pi/agent/web-search.json`
  - validates config shape
  - normalizes provider list into an internal map keyed by provider name
  - resolves default provider

### Provider abstraction

- **Create:** `/home/alex/dotfiles/.pi/agent/extensions/web-search/src/providers/types.ts`
  - generic request and response types for search/fetch
  - provider interface used by the tool layer
  - normalized internal result shapes independent of Exa SDK types
- **Create:** `/home/alex/dotfiles/.pi/agent/extensions/web-search/src/providers/exa.ts`
  - Exa-backed implementation of the provider interface
  - translates generic search requests into Exa `search(...)`
  - translates generic fetch requests into Exa `getContents(...)`
  - isolates all Exa-specific request/response details

### Tool handlers and formatting

- **Create:** `/home/alex/dotfiles/.pi/agent/extensions/web-search/src/tools/web-search.ts`
  - `web_search` schema, execution logic, and tool rendering helpers
- **Create:** `/home/alex/dotfiles/.pi/agent/extensions/web-search/src/tools/web-fetch.ts`
  - `web_fetch` schema, execution logic, and tool rendering helpers
- **Create:** `/home/alex/dotfiles/.pi/agent/extensions/web-search/src/format.ts`
  - shared output shaping
  - compact text summaries for the LLM
  - truncation behavior for large results
  - per-result formatting for batch fetches and partial failures

## Config File Design

The extension will read exactly one file:

- `~/.pi/agent/web-search.json`

Initial conceptual shape:

```json
{
  "defaultProvider": "exa-main",
  "providers": [
    {
      "name": "exa-main",
      "type": "exa",
      "apiKey": "exa_...",
      "options": {
        "defaultSearchLimit": 5,
        "defaultFetchTextMaxCharacters": 12000
      }
    }
  ]
}
```

### Config rules

- `defaultProvider` must match one provider entry by name.
- `providers` must be a non-empty array.
- Each provider entry must include:
  - `name`
  - `type`
  - `apiKey`
- `apiKey` is a literal string in the first version.
- `type` is validated so the runtime can select the correct adapter.
- Exa-specific defaults may live under `options`, but they must remain optional.

### Config non-goals

The first version will **not**:

- read provider config from project-local files
- merge config from multiple files
- read credentials from env vars
- support shell-command-based credential resolution
- write or edit `web-search.json` automatically

If the file is missing or invalid, the tools should return a clear error telling the user where the file belongs and showing a minimal valid example.

## Tool Contract

### `web_search`

Purpose: search the web and return result metadata with a generic surface that can outlive Exa.

Conceptual input shape:

```ts
{
  query: string;
  limit?: number;
  includeDomains?: string[];
  excludeDomains?: string[];
  startPublishedDate?: string;
  endPublishedDate?: string;
  category?: string;
  provider?: string;
}
```

### Default behavior

- returns metadata only
- does not fetch page text by default
- uses the default configured provider unless `provider` explicitly selects another configured provider

### Result shape intent

Each search result should preserve a normalized subset of provider output such as:

- `title`
- `url`
- `publishedDate`
- `author`
- `score`
- provider-specific stable identifiers only if useful for follow-up operations

The tool’s text output should stay compact and easy for the model to scan.

### `web_fetch`

Purpose: fetch contents for one or more URLs with a generic interface.

Conceptual input shape:

```ts
{
  urls: string[];
  text?: boolean;
  highlights?: boolean;
  summary?: boolean;
  textMaxCharacters?: number;
  provider?: string;
}
```

### Input normalization

The canonical tool shape is `urls: string[]`, where a single URL is represented as a one-element array.

For robustness, the implementation may also accept a top-level `url` string through argument normalization and fold it into `urls`, but the stable contract exposed in schemas and docs should remain `urls: string[]`.

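The folding step described above might be sketched as follows. The names `RawFetchArgs`, `HTTP_URL`, and `normalizeFetchUrls` are hypothetical, and the regular expression is only a coarse http(s) check chosen to keep the sketch dependency-free; a real implementation could parse URLs more strictly.

```typescript
// Hypothetical raw argument shape: `urls` is the documented contract,
// `url` is the optional convenience alias folded in during normalization.
interface RawFetchArgs {
  url?: string;
  urls?: string[];
}

// Coarse check (assumption): scheme http/https, a non-empty host, no whitespace.
const HTTP_URL = /^https?:\/\/[^\s\/]+\S*$/i;

function normalizeFetchUrls(args: RawFetchArgs): string[] {
  // Start from the canonical array, then append the single-URL alias if present.
  const folded: string[] = (args.urls ?? []).slice();
  if (typeof args.url === "string") folded.push(args.url);

  // Trim and drop blank entries before validating.
  const urls = folded.map((u) => u.trim()).filter((u) => u !== "");
  if (urls.length === 0) {
    throw new Error("web_fetch: no URLs provided after normalization");
  }

  // Reject malformed input before any provider request is made.
  const malformed = urls.filter((u) => !HTTP_URL.test(u));
  if (malformed.length > 0) {
    throw new Error(`web_fetch: malformed URL(s): ${malformed.join(", ")}`);
  }
  return urls;
}
```

With this shape, a call that passes only `url` and a call that passes a one-element `urls` array reach the provider layer as the same normalized request.
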
### Default behavior

- when no content mode is specified, fetch text
- batch requests are allowed
- the default configured provider is used unless overridden

### Result shape intent

Each fetched item should preserve normalized per-URL results, including:

- `url`
- `title` where available
- `text` by default
- optional `highlights`
- optional `summary`
- per-item failure details for partial batch failures

## Provider Abstraction

The provider interface should express the minimum shared behaviors needed by the tools:

```ts
interface WebSearchProvider {
  type: string;
  // Response type names are assumed here; they mirror the request type names.
  search(request: NormalizedSearchRequest): Promise<NormalizedSearchResponse>;
  fetch(request: NormalizedFetchRequest): Promise<NormalizedFetchResponse>;
}
```

### Exa adapter responsibilities

The Exa adapter will:

- instantiate an Exa client from the configured literal API key
- use Exa search without contents for `web_search` default behavior
- use Exa `getContents(...)` for `web_fetch`
- map Exa response fields into normalized provider-agnostic result types
- keep Exa-only fields contained inside the adapter unless they are intentionally promoted into the shared result model later

This keeps future provider additions focused: implement the same interface, extend config validation, and register the adapter.

## Rendering and Output Design

The extension should provide compact tool rendering so calls and results are readable inside pi.

### `renderCall`

- `web_search`: show tool name and the query
- `web_fetch`: show tool name and URL count (or the single URL)

### `renderResult`

- `web_search`: show result count and a short numbered list of titles/URLs
- `web_fetch`: show fetched count, failed count if any, and a concise per-URL summary

### LLM-facing text output

The text returned to the model should be concise and predictable:

- search: compact metadata list only by default
- fetch: truncated text payloads with enough context to be useful
- batch fetch: clearly separated per-URL sections

Large outputs must be truncated with the shared truncation utilities pattern used by pi tool examples.

## Error Handling

Expected runtime failures should be handled cleanly and descriptively.

### Config errors

- missing `~/.pi/agent/web-search.json`
- invalid JSON
- schema mismatch
- empty provider list
- unknown `defaultProvider`
- unknown explicitly requested provider
- missing literal API key

These should return actionable errors naming the exact issue.

### Input errors

- empty search query
- malformed URL(s)
- empty URL list after normalization

These should be rejected before any provider request is made.

### Provider/runtime errors

- Exa authentication failures
- network failures
- rate limits
- unexpected response shapes

These should return a concise summary in tool content while preserving richer diagnostics in `details`.

### Partial failures

For batch `web_fetch`, mixed outcomes should not fail the entire request unless every target fails. Successful pages should still be returned together with per-URL failure entries.

## Testing Strategy

The design intentionally separates pure logic from pi wiring so most behavior can be tested without loading pi itself.

### Automated tests

Cover:

1. config parsing and normalization
2. provider-list validation
3. default-provider resolution
4. generic request → Exa request mapping
5. Exa response → normalized response mapping
6. compact formatting for metadata-only search
7. truncation for long fetch results
8. batch fetch formatting with partial failures
9. helpful error messages when config is absent or invalid

### Test style

- prefer pure module tests for config, normalization, and formatting
- inject a fake Exa-like client into the Exa adapter instead of making live network calls
- keep extension entrypoint tests to smoke coverage only

### Manual verification

After implementation:

1. create `~/.pi/agent/web-search.json`
2. reload pi
3. run one `web_search` call
4. run one single-URL `web_fetch` call
5. run one multi-URL `web_fetch` call
6. confirm missing/invalid config errors are readable

## Non-Goals

The first version will not add:

- other providers besides Exa
- project-local web-search config
- automatic setup commands or interactive config editing
- provider-specific passthrough options in the public tool API
- rich snippet/highlight defaults for search
- live network integration tests in the normal automated suite

## Acceptance Criteria

The work is complete when:

1. pi discovers a new extension package at `.pi/agent/extensions/web-search/`
2. the agent has two generic tools:
   - `web_search`
   - `web_fetch`
3. the implementation uses an internal provider abstraction
4. Exa is the first working provider implementation
5. the runtime reads global config from `~/.pi/agent/web-search.json`
6. config uses a provider-list shape with a default provider selector
7. credentials are read as literal values from that file
8. `web_search` returns metadata only by default
9. `web_fetch` accepts one or multiple URLs and returns text by default
10. missing config, invalid config, and provider failures return clean, actionable tool errors
11. core mapping/formatting/config logic is covered by automated tests
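
As an illustration of the batch-fetch criteria (text by default, truncation, and partial failures that only fail the call when every URL fails), a pure formatter might look like the sketch below. The `FetchItem` shape, the `truncate` helper, and the `isError` flag are hypothetical stand-ins; in particular, `truncate` is not pi's shared truncation utility, whose API is not described in this document.

```typescript
// Hypothetical normalized per-URL result: either fetched text or a failure entry.
type FetchItem =
  | { url: string; title?: string; text: string }
  | { url: string; error: string };

// Stand-in truncation helper (assumption): cut at maxChars and note how much was dropped.
function truncate(text: string, maxChars: number): string {
  if (text.length <= maxChars) return text;
  return text.slice(0, maxChars) + `\n[truncated ${text.length - maxChars} characters]`;
}

// Build the LLM-facing text: a summary line, then one clearly separated section per URL.
function formatFetchResults(
  items: FetchItem[],
  maxChars: number,
): { text: string; isError: boolean } {
  const failed = items.filter((item) => "error" in item).length;
  const sections = items.map((item, index) => {
    const header = `## [${index + 1}] ${item.url}`;
    if ("error" in item) return `${header}\nFAILED: ${item.error}`;
    const title = item.title ? `Title: ${item.title}\n` : "";
    return `${header}\n${title}${truncate(item.text, maxChars)}`;
  });
  const summary =
    `Fetched ${items.length - failed} of ${items.length} URL(s)` +
    (failed > 0 ? `, ${failed} failed` : "");
  return {
    text: [summary, ...sections].join("\n\n"),
    // Only a batch where every target failed is reported as a tool error.
    isError: items.length > 0 && failed === items.length,
  };
}
```

Because the function takes already-normalized items, it can be covered by the pure module tests listed under the testing strategy without a live Exa client.
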