
# Web Search Tools Design
**Date:** 2026-04-09
**Project:** `/home/alex/dotfiles`
**Target files:**
- `.pi/agent/extensions/web-search/package.json`
- `.pi/agent/extensions/web-search/index.ts`
- `.pi/agent/extensions/web-search/src/schema.ts`
- `.pi/agent/extensions/web-search/src/config.ts`
- `.pi/agent/extensions/web-search/src/providers/types.ts`
- `.pi/agent/extensions/web-search/src/providers/exa.ts`
- `.pi/agent/extensions/web-search/src/tools/web-search.ts`
- `.pi/agent/extensions/web-search/src/tools/web-fetch.ts`
- `.pi/agent/extensions/web-search/src/format.ts`
- tests alongside the new modules
## Goal
Add two generic pi tools, `web_search` and `web_fetch`, implemented as a modular extension package that uses Exa as the first provider while keeping the internal design extensible for future providers.
## Context
- This dotfiles repo already tracks pi configuration under `.pi/agent/`.
- The current extension workspace contains a tracked `question` extension and small pure helper tests.
- Pi extensions can be packaged as directories with `index.ts` and their own `package.json`, which is the best fit when third-party dependencies are needed.
- The requested feature is explicitly about pi extensions and custom tools, not built-in model providers.
- The user wants:
  - generic tool names now
  - Exa as the first provider
  - configuration read from a separate global file, not `settings.json`
  - configuration stored only at the global scope
## User-Approved Requirements
1. Add two generic tools:
   - `web_search`
   - `web_fetch`
2. Use Exa as the initial provider.
3. Keep the implementation extensible so other providers can be added later.
4. Do **not** read configuration from environment variables.
5. Do **not** read configuration from `settings.json`.
6. Read configuration from a dedicated global file:
   - `~/.pi/agent/web-search.json`
7. Use a provider-list-based config shape, not a single-provider-only schema.
8. Store credentials as literal values in that config file.
9. `web_search` should return **metadata only** by default.
10. `web_fetch` should accept **one URL or multiple URLs**.
11. `web_fetch` should return **text** by default.
12. The implementation direction should be the modular/package-style structure, not the minimal Exa-shaped shortcut.
## Recommended Architecture
Implement the feature as a dedicated extension package at:
- `/home/alex/dotfiles/.pi/agent/extensions/web-search/`
This package will register two generic tools and route both through a provider registry. At runtime, the extension loads `~/.pi/agent/web-search.json`, validates it, normalizes the provider list into an internal lookup map, resolves the configured default provider, and then executes requests through a provider adapter.
For the first version, the only adapter is Exa. However, the tool-facing layer remains provider-agnostic, so future providers only need to implement the shared provider interface and be added to config validation/registry wiring.
This is intentionally more structured than a single-file Exa wrapper because the user explicitly wants future extensibility without changing tool names or reworking the public API later.
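The runtime flow described above (normalize the provider list into a lookup map, resolve the default or explicitly requested provider, then pick an adapter by `type`) could be sketched roughly as follows. The names `ProviderEntry`, `ProviderHandle`, `adapterFactories`, and `resolveProvider` are illustrative assumptions rather than part of this spec, and the Exa factory is a stub that does not create a real client.

```typescript
// Hypothetical shapes for illustration; only the config field names come from the spec.
interface ProviderEntry {
  name: string;
  type: string;
  apiKey: string;
}

interface ProviderHandle {
  name: string;
  type: string;
}

type AdapterFactory = (entry: ProviderEntry) => ProviderHandle;

// Registry of adapter factories keyed by provider `type`.
// Supporting another provider later means adding one entry here.
const adapterFactories = new Map<string, AdapterFactory>([
  ["exa", (entry) => ({ name: entry.name, type: "exa" })],
]);

function resolveProvider(
  entries: ProviderEntry[],
  defaultProvider: string,
  requested?: string,
): ProviderHandle {
  // Normalize the provider list into a lookup map keyed by provider name.
  const byName = new Map<string, ProviderEntry>();
  for (const entry of entries) byName.set(entry.name, entry);

  // An explicit `provider` argument wins over the configured default.
  const name = requested ?? defaultProvider;
  const entry = byName.get(name);
  if (!entry) {
    throw new Error(`Unknown provider "${name}"`);
  }
  const factory = adapterFactories.get(entry.type);
  if (!factory) {
    throw new Error(`No adapter registered for provider type "${entry.type}"`);
  }
  return factory(entry);
}

const provider = resolveProvider(
  [{ name: "exa-main", type: "exa", apiKey: "exa_..." }],
  "exa-main",
);
console.log(provider.name); // exa-main
```

Keeping the registry keyed by `type` and the lookup keyed by `name` is what would let several configured entries share one adapter implementation.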
## File Structure
### Extension package
- **Create:** `/home/alex/dotfiles/.pi/agent/extensions/web-search/package.json`
  - declares the extension package
  - declares `exa-js` as a dependency
  - points pi at the extension entrypoint
- **Create:** `/home/alex/dotfiles/.pi/agent/extensions/web-search/index.ts`
  - extension entrypoint
  - registers `web_search` and `web_fetch`
  - wires together config loading, provider registry, tool handlers, and shared formatting
### Shared schemas and config
- **Create:** `/home/alex/dotfiles/.pi/agent/extensions/web-search/src/schema.ts`
  - TypeBox schemas for tool parameters
  - TypeBox schemas for `web-search.json`
  - shared TypeScript types derived from the schemas where useful
- **Create:** `/home/alex/dotfiles/.pi/agent/extensions/web-search/src/config.ts`
  - reads `~/.pi/agent/web-search.json`
  - validates config shape
  - normalizes provider list into an internal map keyed by provider name
  - resolves default provider
### Provider abstraction
- **Create:** `/home/alex/dotfiles/.pi/agent/extensions/web-search/src/providers/types.ts`
  - generic request and response types for search/fetch
  - provider interface used by the tool layer
  - normalized internal result shapes independent of Exa SDK types
- **Create:** `/home/alex/dotfiles/.pi/agent/extensions/web-search/src/providers/exa.ts`
  - Exa-backed implementation of the provider interface
  - translates generic search requests into Exa `search(...)`
  - translates generic fetch requests into Exa `getContents(...)`
  - isolates all Exa-specific request/response details
### Tool handlers and formatting
- **Create:** `/home/alex/dotfiles/.pi/agent/extensions/web-search/src/tools/web-search.ts`
  - `web_search` schema, execution logic, and tool rendering helpers
- **Create:** `/home/alex/dotfiles/.pi/agent/extensions/web-search/src/tools/web-fetch.ts`
  - `web_fetch` schema, execution logic, and tool rendering helpers
- **Create:** `/home/alex/dotfiles/.pi/agent/extensions/web-search/src/format.ts`
  - shared output shaping
  - compact text summaries for the LLM
  - truncation behavior for large results
  - per-result formatting for batch fetches and partial failures
## Config File Design
The extension will read exactly one file:
- `~/.pi/agent/web-search.json`
Initial conceptual shape:
```json
{
  "defaultProvider": "exa-main",
  "providers": [
    {
      "name": "exa-main",
      "type": "exa",
      "apiKey": "exa_...",
      "options": {
        "defaultSearchLimit": 5,
        "defaultFetchTextMaxCharacters": 12000
      }
    }
  ]
}
```
### Config rules
- `defaultProvider` must match one provider entry by name.
- `providers` must be a non-empty array.
- Each provider entry must include:
  - `name`
  - `type`
  - `apiKey`
- `apiKey` is a literal string in the first version.
- `type` is validated so the runtime can select the correct adapter.
- Exa-specific defaults may live under `options`, but they must remain optional.
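As a rough illustration of the rules above, the following sketch validates an already-parsed config object. It is a plain-TypeScript stand-in for the TypeBox schemas this spec calls for, and the names `validateConfig`, `SUPPORTED_TYPES`, and the exact error wording are assumptions made for the example.

```typescript
// Plain-TypeScript stand-in for the TypeBox-based validation planned in schema.ts.
interface ProviderConfig {
  name: string;
  type: string;
  apiKey: string;
  options?: Record<string, unknown>;
}

interface WebSearchConfig {
  defaultProvider: string;
  providers: ProviderConfig[];
}

// Only Exa is supported in the first version.
const SUPPORTED_TYPES = ["exa"];

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function nonEmptyString(value: unknown): value is string {
  return typeof value === "string" && value.trim().length > 0;
}

function validateConfig(raw: unknown): WebSearchConfig {
  if (!isRecord(raw)) throw new Error("config must be a JSON object");
  const { defaultProvider, providers } = raw;
  if (!Array.isArray(providers) || providers.length === 0) {
    throw new Error("`providers` must be a non-empty array");
  }
  const validated: ProviderConfig[] = [];
  providers.forEach((entry: unknown, index: number) => {
    if (!isRecord(entry)) throw new Error(`providers[${index}] must be an object`);
    // Required fields: name, type, and a literal apiKey.
    for (const key of ["name", "type", "apiKey"]) {
      if (!nonEmptyString(entry[key])) {
        throw new Error(`providers[${index}].${key} must be a non-empty string`);
      }
    }
    if (SUPPORTED_TYPES.indexOf(entry.type as string) === -1) {
      throw new Error(`providers[${index}].type "${String(entry.type)}" is not supported`);
    }
    // `options` stays optional, but must be an object when present.
    if (entry.options !== undefined && !isRecord(entry.options)) {
      throw new Error(`providers[${index}].options must be an object when present`);
    }
    validated.push(entry as unknown as ProviderConfig);
  });
  if (!nonEmptyString(defaultProvider) || !validated.some((p) => p.name === defaultProvider)) {
    throw new Error("`defaultProvider` must match one provider entry by name");
  }
  return { defaultProvider, providers: validated };
}

const config = validateConfig({
  defaultProvider: "exa-main",
  providers: [{ name: "exa-main", type: "exa", apiKey: "exa_..." }],
});
console.log(config.providers.length); // 1
```

The sketch does not check for duplicate provider names; whether that should be an error is left open by the rules above.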
### Config non-goals
The first version will **not**:
- read provider config from project-local files
- merge config from multiple files
- read credentials from env vars
- support shell-command-based credential resolution
- write or edit `web-search.json` automatically
If the file is missing or invalid, the tools should return a clear error telling the user where the file belongs and showing a minimal valid example.
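One possible way to build that error text is sketched below. The helper name `configErrorMessage` and the wording are hypothetical; the path and the example shape are taken from this spec.

```typescript
const CONFIG_PATH = "~/.pi/agent/web-search.json";

// Minimal valid example, mirroring the conceptual shape shown in this spec.
const MINIMAL_EXAMPLE = {
  defaultProvider: "exa-main",
  providers: [{ name: "exa-main", type: "exa", apiKey: "exa_..." }],
};

// Build an actionable message: what went wrong, where the file belongs,
// and what a minimal valid file looks like.
function configErrorMessage(reason: string): string {
  return [
    `web-search config error: ${reason}`,
    `Expected config file: ${CONFIG_PATH}`,
    "Minimal valid example:",
    JSON.stringify(MINIMAL_EXAMPLE, null, 2),
  ].join("\n");
}

console.log(configErrorMessage("file not found"));
```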
## Tool Contract
### `web_search`
Purpose: search the web and return result metadata with a generic surface that can outlive Exa.
Conceptual input shape:
```ts
{
  query: string;
  limit?: number;
  includeDomains?: string[];
  excludeDomains?: string[];
  startPublishedDate?: string;
  endPublishedDate?: string;
  category?: string;
  provider?: string;
}
```
### Default behavior
- returns metadata only
- does not fetch page text by default
- uses the default configured provider unless `provider` explicitly selects another configured provider
### Result shape intent
Each search result should preserve a normalized subset of provider output such as:
- `title`
- `url`
- `publishedDate`
- `author`
- `score`
- provider-specific stable identifiers only if useful for follow-up operations
The tool's text output should stay compact and easy for the model to scan.
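A compact, scannable rendering of metadata-only results might look like the sketch below. The `SearchResult` shape follows the fields listed above; the function name `formatSearchResults` and the exact layout are assumptions for illustration.

```typescript
// Normalized, provider-agnostic result shape (subset of the fields listed above).
interface SearchResult {
  title?: string;
  url: string;
  publishedDate?: string;
  author?: string;
  score?: number;
}

function formatSearchResults(query: string, results: SearchResult[]): string {
  if (results.length === 0) return `No results for "${query}".`;
  const lines = results.map((r, i) => {
    // Collect only the metadata that is actually present.
    const meta = [
      r.publishedDate ? r.publishedDate.slice(0, 10) : undefined,
      r.author,
      r.score !== undefined ? `score ${r.score.toFixed(2)}` : undefined,
    ].filter((part): part is string => Boolean(part));
    const suffix = meta.length > 0 ? ` (${meta.join(", ")})` : "";
    return `${i + 1}. ${r.title ?? "(untitled)"}${suffix}\n   ${r.url}`;
  });
  return [`${results.length} result(s) for "${query}":`, ...lines].join("\n");
}

console.log(
  formatSearchResults("pi extensions", [
    { title: "Example", url: "https://example.com/a", publishedDate: "2026-04-01T00:00:00.000Z", score: 0.8234 },
    { url: "https://example.com/b" },
  ]),
);
```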
### `web_fetch`
Purpose: fetch contents for one or more URLs with a generic interface.
Conceptual input shape:
```ts
{
  urls: string[];
  text?: boolean;
  highlights?: boolean;
  summary?: boolean;
  textMaxCharacters?: number;
  provider?: string;
}
```
### Input normalization
The canonical tool shape is `urls: string[]`, where a single URL is represented as a one-element array. For robustness, the implementation may also accept a top-level `url` string through argument normalization and fold it into `urls`, but the stable contract exposed in schemas and docs should remain `urls: string[]`.
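The folding step could be sketched as a small pure function that runs before schema validation. The name `normalizeFetchArgs` is hypothetical, and the choice to leave unexpected input untouched (so that validation rejects it) is an assumption of this example.

```typescript
// Fold an optional top-level `url` string into the canonical `urls` array.
function normalizeFetchArgs(args: Record<string, unknown>): Record<string, unknown> {
  const { url, ...rest } = args;
  const current = rest.urls;
  // Leave anything unexpected untouched so schema validation can reject it.
  if (typeof url !== "string") return args;
  if (current !== undefined && !Array.isArray(current)) return args;
  const existing: unknown[] = Array.isArray(current) ? current : [];
  // Append the single URL unless it is already listed.
  const urls = existing.indexOf(url) === -1 ? existing.concat([url]) : existing.slice();
  return { ...rest, urls };
}

console.log(normalizeFetchArgs({ url: "https://example.com" }));
```

After this step the tool handler only ever sees `urls`, which keeps the stable contract and the robustness shim separate.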
### Default behavior
- when no content mode is specified, fetch text
- batch requests are allowed
- the default configured provider is used unless overridden
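The "text by default" rule can be read in more than one way. The sketch below takes one interpretation, stated here as an assumption: the default applies only when none of `text`, `highlights`, or `summary` is provided at all. The name `resolveContentModes` is hypothetical.

```typescript
interface ContentModeInput {
  text?: boolean;
  highlights?: boolean;
  summary?: boolean;
}

interface ContentModes {
  text: boolean;
  highlights: boolean;
  summary: boolean;
}

// Assumption: "no content mode specified" means all three flags are undefined.
function resolveContentModes(input: ContentModeInput): ContentModes {
  const noneSpecified =
    input.text === undefined && input.highlights === undefined && input.summary === undefined;
  return {
    text: noneSpecified ? true : input.text === true,
    highlights: input.highlights === true,
    summary: input.summary === true,
  };
}

console.log(resolveContentModes({}));
console.log(resolveContentModes({ summary: true }));
```

Under this reading, a call that asks only for `summary` does not also receive text, which keeps payloads small when the caller has been explicit.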
### Result shape intent
Each fetched item should preserve normalized per-URL results, including:
- `url`
- `title` where available
- `text` by default
- optional `highlights`
- optional `summary`
- per-item failure details for partial batch failures
## Provider Abstraction
The provider interface should express the minimum shared behaviors needed by the tools:
```ts
interface WebSearchProvider {
  type: string;
  search(request: NormalizedSearchRequest): Promise<NormalizedSearchResponse>;
  fetch(request: NormalizedFetchRequest): Promise<NormalizedFetchResponse>;
}
```
### Exa adapter responsibilities
The Exa adapter will:
- instantiate an Exa client from the configured literal API key
- use Exa search without contents for `web_search` default behavior
- use Exa `getContents(...)` for `web_fetch`
- map Exa response fields into normalized provider-agnostic result types
- keep Exa-only fields contained inside the adapter unless they are intentionally promoted into the shared result model later
This keeps future provider additions focused: implement the same interface, extend config validation, and register the adapter.
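The request-mapping half of the adapter could be sketched as two pure functions. The option names used here (`numResults`, `includeDomains`, `text.maxCharacters`, and so on) reflect my understanding of the `exa-js` SDK and should be checked against its current typings; the function names, the `compact` helper, and the fallback limit of 5 (borrowed from `defaultSearchLimit` in the example config) are assumptions.

```typescript
interface NormalizedSearchRequest {
  query: string;
  limit?: number;
  includeDomains?: string[];
  excludeDomains?: string[];
  startPublishedDate?: string;
  endPublishedDate?: string;
  category?: string;
}

interface NormalizedFetchRequest {
  urls: string[];
  text: boolean;
  highlights: boolean;
  summary: boolean;
  textMaxCharacters?: number;
}

// Drop undefined values so the provider only sees options that were set.
function compact(obj: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const key of Object.keys(obj)) {
    if (obj[key] !== undefined) out[key] = obj[key];
  }
  return out;
}

// Options intended for Exa `search(query, options)`; no contents are requested,
// which keeps `web_search` metadata-only.
function toExaSearchOptions(req: NormalizedSearchRequest, defaultLimit = 5): Record<string, unknown> {
  return compact({
    numResults: req.limit ?? defaultLimit,
    includeDomains: req.includeDomains,
    excludeDomains: req.excludeDomains,
    startPublishedDate: req.startPublishedDate,
    endPublishedDate: req.endPublishedDate,
    category: req.category,
  });
}

// Options intended for Exa `getContents(urls, options)`.
function toExaContentsOptions(req: NormalizedFetchRequest): Record<string, unknown> {
  const text = req.textMaxCharacters !== undefined ? { maxCharacters: req.textMaxCharacters } : true;
  return compact({
    text: req.text ? text : undefined,
    highlights: req.highlights ? true : undefined,
    summary: req.summary ? true : undefined,
  });
}

console.log(toExaSearchOptions({ query: "pi extensions", limit: 3 }));
```

Because both functions are pure, the request-mapping tests listed later in this spec would not need an Exa client at all.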
## Rendering and Output Design
The extension should provide compact tool rendering so calls and results are readable inside pi.
### `renderCall`
- `web_search`: show tool name and the query
- `web_fetch`: show tool name and URL count (or the single URL)
### `renderResult`
- `web_search`: show result count and a short numbered list of titles/URLs
- `web_fetch`: show fetched count, failed count if any, and a concise per-URL summary
### LLM-facing text output
The text returned to the model should be concise and predictable:
- search: compact metadata list only by default
- fetch: truncated text payloads with enough context to be useful
- batch fetch: clearly separated per-URL sections
Large outputs must be truncated, following the shared truncation-utility pattern used by pi tool examples.
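To make the truncation behavior concrete, here is a minimal character-based sketch. It is a stand-in for pi's own truncation utilities, not a use of them, and the name `truncateText` and the marker format are assumptions. A limit such as `defaultFetchTextMaxCharacters` from the example config would be passed as `maxCharacters`.

```typescript
interface Truncated {
  text: string;
  truncated: boolean;
  originalLength: number;
}

// Cut text to at most `maxCharacters` and append a marker saying how much was dropped.
// Note: slicing counts UTF-16 code units, so a cut can land inside a surrogate pair.
function truncateText(text: string, maxCharacters: number): Truncated {
  if (maxCharacters < 0) throw new RangeError("maxCharacters must be non-negative");
  if (text.length <= maxCharacters) {
    return { text, truncated: false, originalLength: text.length };
  }
  const omitted = text.length - maxCharacters;
  return {
    text: `${text.slice(0, maxCharacters)}\n[truncated ${omitted} of ${text.length} characters]`,
    truncated: true,
    originalLength: text.length,
  };
}

console.log(truncateText("abcdefghij", 4).text);
```

Returning the `truncated` flag and `originalLength` alongside the text would let the renderer report truncation without re-measuring the payload.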
## Error Handling
Expected runtime failures should be handled cleanly and descriptively.
### Config errors
- missing `~/.pi/agent/web-search.json`
- invalid JSON
- schema mismatch
- empty provider list
- unknown `defaultProvider`
- unknown explicitly requested provider
- missing literal API key
These should return actionable errors naming the exact issue.
### Input errors
- empty search query
- malformed URL(s)
- empty URL list after normalization
These should be rejected before any provider request is made.
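These pre-flight checks could be sketched as below. The function names are hypothetical, and restricting URLs to `http:` and `https:` is an assumption of this example rather than a stated requirement.

```typescript
function isHttpUrl(candidate: string): boolean {
  try {
    const parsed = new URL(candidate);
    // Assumption: only http(s) URLs are meaningful for web_fetch.
    return parsed.protocol === "http:" || parsed.protocol === "https:";
  } catch (error) {
    return false; // `new URL` throws on malformed input
  }
}

function validateQuery(query: string): string {
  const trimmed = query.trim();
  if (trimmed.length === 0) throw new Error("web_search: `query` must not be empty");
  return trimmed;
}

function validateUrls(urls: string[]): string[] {
  if (urls.length === 0) {
    throw new Error("web_fetch: `urls` must contain at least one URL");
  }
  const invalid = urls.filter((candidate) => !isHttpUrl(candidate));
  if (invalid.length > 0) {
    throw new Error(`web_fetch: malformed URL(s): ${invalid.join(", ")}`);
  }
  return urls;
}

console.log(validateQuery("  pi extensions  ")); // pi extensions
console.log(validateUrls(["https://example.com/docs"]).length); // 1
```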
### Provider/runtime errors
- Exa authentication failures
- network failures
- rate limits
- unexpected response shapes
These should return a concise summary in tool content while preserving richer diagnostics in `details`.
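The split between a concise summary and richer diagnostics might be sketched like this. The result shape is deliberately simplified (a plain `content` string plus a `details` object) and is not meant to mirror pi's actual tool result type; `toToolError` and the 200-character cap are assumptions.

```typescript
interface ToolErrorResult {
  content: string;
  details: { provider: string; name: string; message: string; stack?: string };
}

function toToolError(provider: string, error: unknown): ToolErrorResult {
  // Normalize anything thrown into an Error so the fields below always exist.
  const err = error instanceof Error ? error : new Error(String(error));
  // Keep the model-facing summary to the first line, capped in length.
  const firstLine = err.message.split("\n")[0];
  const summary = firstLine.length > 200 ? `${firstLine.slice(0, 200)}...` : firstLine;
  return {
    content: `${provider} request failed: ${summary}`,
    // The full message and stack stay available for debugging.
    details: { provider, name: err.name, message: err.message, stack: err.stack },
  };
}

console.log(toToolError("exa-main", new Error("401 Unauthorized\nresponse body ...")).content);
```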
### Partial failures
For batch `web_fetch`, mixed outcomes should not fail the entire request unless every target fails. Successful pages should still be returned together with per-URL failure entries.
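The partial-failure rule could be expressed over per-URL outcomes as in the sketch below. The `FetchItem` union, the `summarizeBatch` name, and the section layout are illustrative assumptions.

```typescript
// Per-URL outcome: either fetched text or a failure reason.
type FetchItem =
  | { url: string; ok: true; text: string }
  | { url: string; ok: false; error: string };

interface BatchOutcome {
  isError: boolean;
  fetched: number;
  failed: number;
  text: string;
}

function summarizeBatch(items: FetchItem[]): BatchOutcome {
  const fetched = items.filter((item) => item.ok).length;
  const failed = items.length - fetched;
  // One clearly separated section per URL, successes and failures alike.
  const sections = items.map((item) =>
    item.ok ? `## ${item.url}\n${item.text}` : `## ${item.url}\n[failed: ${item.error}]`,
  );
  return {
    // Only a batch in which every target failed counts as an overall error.
    isError: items.length > 0 && fetched === 0,
    fetched,
    failed,
    text: sections.join("\n\n"),
  };
}

const outcome = summarizeBatch([
  { url: "https://example.com/a", ok: true, text: "Page A" },
  { url: "https://example.com/b", ok: false, error: "timeout" },
]);
console.log(outcome.isError, outcome.fetched, outcome.failed); // false 1 1
```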
## Testing Strategy
The design intentionally separates pure logic from pi wiring so most behavior can be tested without loading pi itself.
### Automated tests
Cover:
1. config parsing and normalization
2. provider-list validation
3. default-provider resolution
4. generic request → Exa request mapping
5. Exa response → normalized response mapping
6. compact formatting for metadata-only search
7. truncation for long fetch results
8. batch fetch formatting with partial failures
9. helpful error messages when config is absent or invalid
### Test style
- prefer pure module tests for config, normalization, and formatting
- inject a fake Exa-like client into the Exa adapter instead of making live network calls
- keep extension entrypoint tests to smoke coverage only
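The fake-client approach could look roughly like the sketch below. `ExaLikeClient` is a hypothetical minimal interface assumed to be a subset of what a real Exa client exposes, and `createExaAdapter` is an illustrative name; only the search path is shown.

```typescript
// Minimal surface the adapter needs; assumed to be a subset of a real Exa client.
interface ExaLikeResult {
  title?: string | null;
  url: string;
  publishedDate?: string;
  author?: string | null;
  score?: number;
}

interface ExaLikeClient {
  search(query: string, options: { numResults: number }): Promise<{ results: ExaLikeResult[] }>;
}

interface NormalizedResult {
  title?: string;
  url: string;
  publishedDate?: string;
  author?: string;
  score?: number;
}

// The client is injected, so tests can pass a fake instead of a network-backed one.
function createExaAdapter(client: ExaLikeClient) {
  return {
    type: "exa",
    async search(request: { query: string; limit?: number }): Promise<NormalizedResult[]> {
      const response = await client.search(request.query, { numResults: request.limit ?? 5 });
      // Map provider fields into the normalized shape, turning nulls into undefined.
      return response.results.map((r) => ({
        title: r.title ?? undefined,
        url: r.url,
        publishedDate: r.publishedDate,
        author: r.author ?? undefined,
        score: r.score,
      }));
    },
  };
}

// Fake client: records calls and returns canned data, so no network is needed.
const calls: Array<{ query: string; numResults: number }> = [];
const fakeClient: ExaLikeClient = {
  async search(query, options) {
    calls.push({ query, numResults: options.numResults });
    return { results: [{ title: null, url: "https://example.com" }] };
  },
};

const pending = createExaAdapter(fakeClient).search({ query: "pi extensions", limit: 3 });
pending.then((results) => console.log(results.length, calls.length));
```

A test can then assert both on the recorded calls (request mapping) and on the resolved results (response mapping) without any live request.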
### Manual verification
After implementation:
1. create `~/.pi/agent/web-search.json`
2. reload pi
3. run one `web_search` call
4. run one single-URL `web_fetch` call
5. run one multi-URL `web_fetch` call
6. confirm missing/invalid config errors are readable
## Non-Goals
The first version will not add:
- other providers besides Exa
- project-local web-search config
- automatic setup commands or interactive config editing
- provider-specific passthrough options in the public tool API
- rich snippet/highlight defaults for search
- live network integration tests in the normal automated suite
## Acceptance Criteria
The work is complete when:
1. pi discovers a new extension package at `.pi/agent/extensions/web-search/`
2. the agent has two generic tools:
   - `web_search`
   - `web_fetch`
3. the implementation uses an internal provider abstraction
4. Exa is the first working provider implementation
5. the runtime reads global config from `~/.pi/agent/web-search.json`
6. config uses a provider-list shape with a default provider selector
7. credentials are read as literal values from that file
8. `web_search` returns metadata only by default
9. `web_fetch` accepts one or multiple URLs and returns text by default
10. missing config, invalid config, and provider failures return clean, actionable tool errors
11. core mapping/formatting/config logic is covered by automated tests