docs: add web search tools design spec
docs/superpowers/specs/2026-04-09-web-search-tools-design.md
# Web Search Tools Design

**Date:** 2026-04-09
**Project:** `/home/alex/dotfiles`
**Target files:**
- `.pi/agent/extensions/web-search/package.json`
- `.pi/agent/extensions/web-search/index.ts`
- `.pi/agent/extensions/web-search/src/schema.ts`
- `.pi/agent/extensions/web-search/src/config.ts`
- `.pi/agent/extensions/web-search/src/providers/types.ts`
- `.pi/agent/extensions/web-search/src/providers/exa.ts`
- `.pi/agent/extensions/web-search/src/tools/web-search.ts`
- `.pi/agent/extensions/web-search/src/tools/web-fetch.ts`
- `.pi/agent/extensions/web-search/src/format.ts`
- tests alongside the new modules

## Goal

Add two generic pi tools, `web_search` and `web_fetch`, implemented as a modular extension package that uses Exa as the first provider while keeping the internal design extensible for future providers.

## Context

- This dotfiles repo already tracks pi configuration under `.pi/agent/`.
- The current extension workspace contains a tracked `question` extension and small pure helper tests.
- Pi extensions can be packaged as directories with `index.ts` and their own `package.json`, which is the best fit when third-party dependencies are needed.
- The requested feature is explicitly about pi extensions and custom tools, not built-in model providers.
- The user wants:
  - generic tool names now
  - Exa as the first provider
  - configuration read from a separate global file, not `settings.json`
  - configuration stored only at the global scope

## User-Approved Requirements

1. Add two generic tools:
   - `web_search`
   - `web_fetch`
2. Use Exa as the initial provider.
3. Keep the implementation extensible so other providers can be added later.
4. Do **not** read configuration from environment variables.
5. Do **not** read configuration from `settings.json`.
6. Read configuration from a dedicated global file:
   - `~/.pi/agent/web-search.json`
7. Use a provider-list-based config shape, not a single-provider-only schema.
8. Store credentials as literal values in that config file.
9. `web_search` should return **metadata only** by default.
10. `web_fetch` should accept **one URL or multiple URLs**.
11. `web_fetch` should return **text** by default.
12. The implementation direction should be the modular/package-style structure, not the minimal Exa-shaped shortcut.

## Recommended Architecture

Implement the feature as a dedicated extension package at:

- `/home/alex/dotfiles/.pi/agent/extensions/web-search/`

This package will register two generic tools and route both through a provider registry. At runtime, the extension loads `~/.pi/agent/web-search.json`, validates it, normalizes the provider list into an internal lookup map, resolves the configured default provider, and then executes requests through a provider adapter.

For the first version, the only adapter is Exa. However, the tool-facing layer remains provider-agnostic, so future providers only need to implement the shared provider interface and be added to config validation/registry wiring.

This is intentionally more structured than a single-file Exa wrapper because the user explicitly wants future extensibility without changing tool names or reworking the public API later.
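
As a rough illustration of the registry wiring described above, the sketch below maps each config `type` to a factory and builds named provider instances from validated config entries. Every name in it (`ProviderFactory`, `createRegistry`, the simplified `search(query)` signature) is hypothetical, and the `exa` factory is a stub standing in for the real adapter.

```ts
// Hypothetical sketch: these names are illustrative, not a committed module API.

// Minimal stand-in for the shared provider interface.
interface WebSearchProvider {
  readonly type: string;
  search(query: string): Promise<Array<{ title: string; url: string }>>;
}

// One validated entry from the `providers` array in web-search.json.
interface ProviderConfig {
  name: string;
  type: string;
  apiKey: string;
  options?: Record<string, unknown>;
}

type ProviderFactory = (config: ProviderConfig) => WebSearchProvider;

// Placeholder: the real factory would build the Exa adapter from config.apiKey.
const exaFactory: ProviderFactory = (config) => ({
  type: "exa",
  search: async (query) => [{ title: `${config.name}: ${query}`, url: "https://example.com/" }],
});

// Adapter factories keyed by config `type`; supporting a new provider means adding one entry.
const factories = new Map<string, ProviderFactory>([["exa", exaFactory]]);

// Build named provider instances from validated config entries.
function createRegistry(entries: ProviderConfig[]): Map<string, WebSearchProvider> {
  const registry = new Map<string, WebSearchProvider>();
  for (const entry of entries) {
    const factory = factories.get(entry.type);
    if (!factory) {
      throw new Error(`Unsupported provider type "${entry.type}" for provider "${entry.name}".`);
    }
    registry.set(entry.name, factory(entry));
  }
  return registry;
}
```

Keying factories in a `Map` rather than a plain object also avoids accidentally matching inherited keys such as `toString` when a config `type` is looked up.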

## File Structure

### Extension package

- **Create:** `/home/alex/dotfiles/.pi/agent/extensions/web-search/package.json`
  - declares the extension package
  - declares `exa-js` as a dependency
  - points pi at the extension entrypoint

- **Create:** `/home/alex/dotfiles/.pi/agent/extensions/web-search/index.ts`
  - extension entrypoint
  - registers `web_search` and `web_fetch`
  - wires together config loading, provider registry, tool handlers, and shared formatting

### Shared schemas and config

- **Create:** `/home/alex/dotfiles/.pi/agent/extensions/web-search/src/schema.ts`
  - TypeBox schemas for tool parameters
  - TypeBox schemas for `web-search.json`
  - shared TypeScript types derived from the schemas where useful

- **Create:** `/home/alex/dotfiles/.pi/agent/extensions/web-search/src/config.ts`
  - reads `~/.pi/agent/web-search.json`
  - validates config shape
  - normalizes provider list into an internal map keyed by provider name
  - resolves default provider

### Provider abstraction

- **Create:** `/home/alex/dotfiles/.pi/agent/extensions/web-search/src/providers/types.ts`
  - generic request and response types for search/fetch
  - provider interface used by the tool layer
  - normalized internal result shapes independent of Exa SDK types

- **Create:** `/home/alex/dotfiles/.pi/agent/extensions/web-search/src/providers/exa.ts`
  - Exa-backed implementation of the provider interface
  - translates generic search requests into Exa `search(...)`
  - translates generic fetch requests into Exa `getContents(...)`
  - isolates all Exa-specific request/response details

### Tool handlers and formatting

- **Create:** `/home/alex/dotfiles/.pi/agent/extensions/web-search/src/tools/web-search.ts`
  - `web_search` schema, execution logic, and tool rendering helpers

- **Create:** `/home/alex/dotfiles/.pi/agent/extensions/web-search/src/tools/web-fetch.ts`
  - `web_fetch` schema, execution logic, and tool rendering helpers

- **Create:** `/home/alex/dotfiles/.pi/agent/extensions/web-search/src/format.ts`
  - shared output shaping
  - compact text summaries for the LLM
  - truncation behavior for large results
  - per-result formatting for batch fetches and partial failures

## Config File Design

The extension will read exactly one file:

- `~/.pi/agent/web-search.json`

Initial conceptual shape:

```json
{
  "defaultProvider": "exa-main",
  "providers": [
    {
      "name": "exa-main",
      "type": "exa",
      "apiKey": "exa_...",
      "options": {
        "defaultSearchLimit": 5,
        "defaultFetchTextMaxCharacters": 12000
      }
    }
  ]
}
```

### Config rules

- `defaultProvider` must match one provider entry by name.
- `providers` must be a non-empty array.
- Each provider entry must include:
  - `name`
  - `type`
  - `apiKey`
- `apiKey` is a literal string in the first version.
- `type` is validated so the runtime can select the correct adapter.
- Exa-specific defaults may live under `options`, but they must remain optional.
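
A minimal sketch of how `config.ts` might enforce these rules and produce the name-keyed map, assuming the JSON has already been parsed. The spec calls for TypeBox schemas; this hand-rolled validation is a dependency-free stand-in, and the function names and the duplicate-name check are assumptions.

```ts
// Hypothetical sketch of config.ts; hand-rolled checks stand in for TypeBox schemas.
interface ProviderConfig {
  name: string;
  type: string;
  apiKey: string;
  options?: Record<string, unknown>;
}

interface NormalizedConfig {
  defaultProviderName: string;
  providers: Map<string, ProviderConfig>; // keyed by provider name
}

// Adapter types this build knows how to construct.
const SUPPORTED_TYPES = new Set(["exa"]);

function isNonEmptyString(value: unknown): value is string {
  return typeof value === "string" && value.trim().length > 0;
}

// Validate parsed JSON against the config rules and build the name-keyed map.
function normalizeConfig(raw: unknown): NormalizedConfig {
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    throw new Error("web-search.json must contain a JSON object.");
  }
  const { defaultProvider, providers } = raw as Record<string, unknown>;
  if (!Array.isArray(providers) || providers.length === 0) {
    throw new Error('"providers" must be a non-empty array.');
  }

  const byName = new Map<string, ProviderConfig>();
  providers.forEach((entry: unknown, index: number) => {
    const e = (typeof entry === "object" && entry !== null ? entry : {}) as Record<string, unknown>;
    for (const field of ["name", "type", "apiKey"]) {
      if (!isNonEmptyString(e[field])) {
        throw new Error(`providers[${index}].${field} must be a non-empty string.`);
      }
    }
    const name = e.name as string;
    const type = e.type as string;
    if (!SUPPORTED_TYPES.has(type)) {
      throw new Error(`providers[${index}].type "${type}" is not supported (expected: ${Array.from(SUPPORTED_TYPES).join(", ")}).`);
    }
    // Duplicate names would make the lookup map ambiguous (an extra rule, not in the list above).
    if (byName.has(name)) {
      throw new Error(`Duplicate provider name "${name}".`);
    }
    const options = e.options;
    if (options !== undefined && (typeof options !== "object" || options === null)) {
      throw new Error(`providers[${index}].options must be an object when present.`);
    }
    byName.set(name, { name, type, apiKey: e.apiKey as string, options: options as Record<string, unknown> | undefined });
  });

  if (!isNonEmptyString(defaultProvider) || !byName.has(defaultProvider)) {
    throw new Error(`"defaultProvider" must name one of: ${Array.from(byName.keys()).join(", ")}.`);
  }
  return { defaultProviderName: defaultProvider, providers: byName };
}

// Pick the explicitly requested provider, falling back to the configured default.
function resolveProvider(config: NormalizedConfig, requested?: string): ProviderConfig {
  const name = requested ?? config.defaultProviderName;
  const provider = config.providers.get(name);
  if (!provider) {
    throw new Error(`Unknown provider "${name}". Configured providers: ${Array.from(config.providers.keys()).join(", ")}.`);
  }
  return provider;
}
```

`resolveProvider` handles both the default case and an explicit `provider` argument, so an unknown requested provider is reported through the same path as an unknown default.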

### Config non-goals

The first version will **not**:

- read provider config from project-local files
- merge config from multiple files
- read credentials from env vars
- support shell-command-based credential resolution
- write or edit `web-search.json` automatically

If the file is missing or invalid, the tools should return a clear error telling the user where the file belongs and showing a minimal valid example.
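
The missing-file and invalid-JSON cases could be handled by a small reader like the one below, which names the expected path and embeds a minimal example in its errors. `readConfigFile` and `MINIMAL_EXAMPLE` are hypothetical names, and shape validation is left to a separate step.

```ts
import { readFileSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

// The one file the extension reads; callers may pass another path (e.g. in tests).
const CONFIG_PATH = join(homedir(), ".pi", "agent", "web-search.json");

// Minimal valid example embedded in error messages (the key is a placeholder).
const MINIMAL_EXAMPLE = JSON.stringify(
  {
    defaultProvider: "exa-main",
    providers: [{ name: "exa-main", type: "exa", apiKey: "<your Exa API key>" }],
  },
  null,
  2,
);

// Read and parse the config file, turning filesystem and JSON failures into
// actionable messages. Shape validation happens separately.
function readConfigFile(path: string = CONFIG_PATH): unknown {
  let text: string;
  try {
    text = readFileSync(path, "utf8");
  } catch (err) {
    if ((err as { code?: string }).code === "ENOENT") {
      throw new Error(`web-search config not found at ${path}.\nCreate it with a minimal config such as:\n${MINIMAL_EXAMPLE}`);
    }
    throw new Error(`Could not read web-search config at ${path}: ${(err as Error).message}`);
  }
  try {
    return JSON.parse(text);
  } catch (err) {
    throw new Error(`web-search config at ${path} is not valid JSON (${(err as Error).message}).\nExpected something like:\n${MINIMAL_EXAMPLE}`);
  }
}
```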

## Tool Contract

### `web_search`

Purpose: search the web and return result metadata with a generic surface that can outlive Exa.

Conceptual input shape:

```ts
{
  query: string;
  limit?: number;
  includeDomains?: string[];
  excludeDomains?: string[];
  startPublishedDate?: string;
  endPublishedDate?: string;
  category?: string;
  provider?: string;
}
```

### Default behavior

- returns metadata only
- does not fetch page text by default
- uses the default configured provider unless `provider` explicitly selects another configured provider

### Result shape intent

Each search result should preserve a normalized subset of provider output such as:

- `title`
- `url`
- `publishedDate`
- `author`
- `score`
- provider-specific stable identifiers only if useful for follow-up operations

The tool’s text output should stay compact and easy for the model to scan.
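
One way to keep metadata-only output compact is a numbered list with the title on one line and the URL plus optional fields on the next. This is an illustrative sketch; the `SearchResult` type and the exact layout are assumptions.

```ts
// Hypothetical normalized search result, using the fields listed above.
interface SearchResult {
  title?: string;
  url: string;
  publishedDate?: string;
  author?: string;
  score?: number;
}

// One numbered title line per result, followed by an indented line holding the URL
// and whichever optional metadata is present.
function formatSearchResults(query: string, results: SearchResult[]): string {
  if (results.length === 0) return `No results for "${query}".`;
  const lines = [`${results.length} result(s) for "${query}":`];
  results.forEach((result, index) => {
    lines.push(`${index + 1}. ${result.title?.trim() || "(untitled)"}`);
    const meta = [result.url];
    // Keep only the date part, assuming ISO 8601 timestamps.
    if (result.publishedDate) meta.push(result.publishedDate.slice(0, 10));
    if (result.author) meta.push(result.author);
    if (typeof result.score === "number") meta.push(`score ${result.score.toFixed(2)}`);
    lines.push(`   ${meta.join(" | ")}`);
  });
  return lines.join("\n");
}
```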

### `web_fetch`

Purpose: fetch contents for one or more URLs with a generic interface.

Conceptual input shape:

```ts
{
  urls: string[];
  text?: boolean;
  highlights?: boolean;
  summary?: boolean;
  textMaxCharacters?: number;
  provider?: string;
}
```

### Input normalization

The canonical tool shape is `urls: string[]`, where a single URL is represented as a one-element array. For robustness, the implementation may also accept a top-level `url` string through argument normalization and fold it into `urls`, but the stable contract exposed in schemas and docs should remain `urls: string[]`.
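
A sketch of that normalization step, assuming raw arguments arrive as `unknown` values: it folds a stray `url` into `urls`, rejects malformed entries using the WHATWG `URL` parser, and fails on an empty list. The http(s)-only restriction and de-duplication are assumptions beyond the spec.

```ts
// Arguments as a model might send them: the canonical `urls` array, or a stray `url` string.
interface RawFetchArgs {
  urls?: unknown;
  url?: unknown;
}

// Hypothetical normalizer: folds `url` into `urls`, validates every entry, and
// de-duplicates, so bad input is rejected before any provider request is made.
function normalizeFetchUrls(args: RawFetchArgs): string[] {
  const candidates: unknown[] = [];
  if (Array.isArray(args.urls)) candidates.push(...args.urls);
  else if (args.urls !== undefined) candidates.push(args.urls);
  if (args.url !== undefined) candidates.push(args.url);

  const urls: string[] = [];
  for (const candidate of candidates) {
    if (typeof candidate !== "string" || candidate.trim() === "") {
      throw new Error(`Invalid URL entry: ${JSON.stringify(candidate)}`);
    }
    let parsed: URL;
    try {
      parsed = new URL(candidate.trim());
    } catch {
      throw new Error(`Malformed URL: ${candidate}`);
    }
    if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
      throw new Error(`Unsupported URL scheme in ${candidate}; only http and https are fetched.`);
    }
    const normalized = parsed.toString();
    if (!urls.includes(normalized)) urls.push(normalized);
  }
  if (urls.length === 0) {
    throw new Error("web_fetch needs at least one URL in `urls`.");
  }
  return urls;
}
```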

### Default behavior

- when no content mode is specified, fetch text
- batch requests are allowed
- the default configured provider is used unless overridden

### Result shape intent

Each fetched item should preserve normalized per-URL results, including:

- `url`
- `title` where available
- `text` by default
- optional `highlights`
- optional `summary`
- per-item failure details for partial batch failures

## Provider Abstraction

The provider interface should express the minimum shared behaviors needed by the tools:
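
```ts
// NormalizedSearchRequest, NormalizedSearchResponse, NormalizedFetchRequest, and
// NormalizedFetchResponse are the provider-agnostic types defined alongside this
// interface in src/providers/types.ts.
interface WebSearchProvider {
  // Adapter type, matching the `type` of a provider entry in web-search.json (e.g. "exa").
  type: string;
  search(request: NormalizedSearchRequest): Promise<NormalizedSearchResponse>;
  fetch(request: NormalizedFetchRequest): Promise<NormalizedFetchResponse>;
}
```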
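
The interface leaves the `Normalized*` types open. One possible shape, sketched here under assumption, models each fetched URL as a success-or-failure union so partial batch failures stay representable; field names such as `ok`, `error`, and `id` are hypothetical. The `fakeProvider` at the end is the kind of network-free stand-in the tool layer could be tested against (separate from the fake Exa-like client used to test the adapter itself).

```ts
interface NormalizedSearchRequest {
  query: string;
  limit: number;
  includeDomains?: string[];
  excludeDomains?: string[];
  startPublishedDate?: string;
  endPublishedDate?: string;
  category?: string;
}

interface NormalizedSearchResult {
  title?: string;
  url: string;
  publishedDate?: string;
  author?: string;
  score?: number;
  id?: string; // provider-specific identifier, kept only if useful for follow-up calls
}

interface NormalizedSearchResponse {
  results: NormalizedSearchResult[];
}

interface NormalizedFetchRequest {
  urls: string[];
  text: boolean;
  highlights: boolean;
  summary: boolean;
  textMaxCharacters?: number;
}

// Each URL either succeeds with content or fails with its own error,
// so one bad URL does not hide the others.
type NormalizedFetchItem =
  | { ok: true; url: string; title?: string; text?: string; highlights?: string[]; summary?: string }
  | { ok: false; url: string; error: string };

interface NormalizedFetchResponse {
  items: NormalizedFetchItem[];
}

interface WebSearchProvider {
  type: string;
  search(request: NormalizedSearchRequest): Promise<NormalizedSearchResponse>;
  fetch(request: NormalizedFetchRequest): Promise<NormalizedFetchResponse>;
}

// Deterministic, network-free provider for tool-layer tests.
const fakeProvider: WebSearchProvider = {
  type: "fake",
  async search(request) {
    return { results: [{ title: `Result for ${request.query}`, url: "https://example.com/" }].slice(0, request.limit) };
  },
  async fetch(request) {
    return {
      items: request.urls.map((url): NormalizedFetchItem =>
        url.includes("fail") ? { ok: false, url, error: "simulated failure" } : { ok: true, url, text: `Text of ${url}` },
      ),
    };
  },
};
```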

### Exa adapter responsibilities

The Exa adapter will:

- instantiate an Exa client from the configured literal API key
- use Exa search without contents for `web_search` default behavior
- use Exa `getContents(...)` for `web_fetch`
- map Exa response fields into normalized provider-agnostic result types
- keep Exa-only fields contained inside the adapter unless they are intentionally promoted into the shared result model later

This keeps future provider additions focused: implement the same interface, extend config validation, and register the adapter.
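
To make the adapter testable with an injected fake client, it can accept a client object instead of constructing one itself; in production that object would wrap an exa-js client built from the configured `apiKey`. The `ExaLikeClient` interface below is a hypothetical subset modeled loosely on exa-js `search` and `getContents`: the option names (`numResults`, `includeDomains`, `excludeDomains`, `text.maxCharacters`) and response fields should be verified against the installed SDK, and date/category filters are omitted for brevity.

```ts
// Hypothetical subset of an Exa-like client; verify against the real exa-js SDK.
interface ExaLikeResult {
  id?: string;
  url: string;
  title?: string | null;
  publishedDate?: string;
  author?: string;
  score?: number;
  text?: string;
}

interface ExaLikeClient {
  search(query: string, options: Record<string, unknown>): Promise<{ results: ExaLikeResult[] }>;
  getContents(urls: string[], options: Record<string, unknown>): Promise<{ results: ExaLikeResult[] }>;
}

// Trimmed-down normalized types for this sketch.
interface SearchRequest { query: string; limit: number; includeDomains?: string[]; excludeDomains?: string[] }
interface SearchHit { id?: string; title?: string; url: string; publishedDate?: string; author?: string; score?: number }
type FetchItem =
  | { ok: true; url: string; title?: string; text?: string }
  | { ok: false; url: string; error: string };

class ExaProvider {
  readonly type = "exa";
  private readonly client: ExaLikeClient;

  // Accepting the client (rather than an API key) lets tests inject a fake with no network.
  constructor(client: ExaLikeClient) {
    this.client = client;
  }

  async search(request: SearchRequest): Promise<{ results: SearchHit[] }> {
    // Metadata only: no content options are passed. Depending on the SDK version,
    // skipping contents may require an explicit option.
    const response = await this.client.search(request.query, {
      numResults: request.limit,
      includeDomains: request.includeDomains,
      excludeDomains: request.excludeDomains,
    });
    return {
      results: response.results.map((r) => ({
        id: r.id,
        title: r.title ?? undefined,
        url: r.url,
        publishedDate: r.publishedDate,
        author: r.author,
        score: r.score,
      })),
    };
  }

  async fetch(urls: string[], textMaxCharacters: number): Promise<{ items: FetchItem[] }> {
    const response = await this.client.getContents(urls, { text: { maxCharacters: textMaxCharacters } });
    // Matching results back to inputs by URL is an assumption; the SDK may normalize
    // URLs or report per-URL status in a separate field.
    const byUrl = new Map(response.results.map((r) => [r.url, r] as const));
    return {
      items: urls.map((url): FetchItem => {
        const hit = byUrl.get(url);
        return hit
          ? { ok: true, url, title: hit.title ?? undefined, text: hit.text }
          : { ok: false, url, error: "no content returned for this URL" };
      }),
    };
  }
}
```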

## Rendering and Output Design

The extension should provide compact tool rendering so calls and results are readable inside pi.

### `renderCall`

- `web_search`: show tool name and the query
- `web_fetch`: show tool name and URL count (or the single URL)

### `renderResult`

- `web_search`: show result count and a short numbered list of titles/URLs
- `web_fetch`: show fetched count, failed count if any, and a concise per-URL summary

### LLM-facing text output

The text returned to the model should be concise and predictable:

- search: compact metadata list only by default
- fetch: truncated text payloads with enough context to be useful
- batch fetch: clearly separated per-URL sections

Large outputs must be truncated following the shared truncation-utility pattern used in pi's tool examples.
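
A possible shape for the truncation and per-URL sectioning, written as standalone helpers because the exact pi truncation utilities are not specified here. `truncateText`, `formatFetchResults`, the `[... truncated]` marker, and the even split of the character budget are all assumptions.

```ts
// Hypothetical helpers: pi's own truncation utilities may differ in name and behavior.
const TRUNCATION_NOTE = "\n[... truncated]";

// Cut text to at most maxChars characters (note included), preferring to stop at a
// line break when one falls in the second half of the budget.
function truncateText(text: string, maxChars: number): string {
  if (text.length <= maxChars) return text;
  if (maxChars <= TRUNCATION_NOTE.length) return text.slice(0, maxChars);
  const budget = maxChars - TRUNCATION_NOTE.length;
  const slice = text.slice(0, budget);
  const lastBreak = slice.lastIndexOf("\n");
  const cut = lastBreak > budget / 2 ? slice.slice(0, lastBreak) : slice;
  return cut + TRUNCATION_NOTE;
}

type FetchItem =
  | { ok: true; url: string; title?: string; text?: string }
  | { ok: false; url: string; error: string };

// One header line with counts, then one delimited section per URL. The character
// budget is split evenly across items.
function formatFetchResults(items: FetchItem[], totalMaxChars: number): string {
  const perItem = Math.floor(totalMaxChars / Math.max(1, items.length));
  const failed = items.filter((item) => !item.ok).length;
  const header = `Fetched ${items.length - failed} of ${items.length} URL(s)` + (failed > 0 ? `, ${failed} failed` : "");
  const sections = items.map((item, index) => {
    const heading = `=== [${index + 1}] ${item.url} ===`;
    if (!item.ok) return `${heading}\nERROR: ${item.error}`;
    const title = item.title ? `Title: ${item.title}\n` : "";
    return `${heading}\n${title}${truncateText(item.text ?? "(no text returned)", perItem)}`;
  });
  return [header, ...sections].join("\n\n");
}
```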

## Error Handling

Expected runtime failures should be handled cleanly and descriptively.

### Config errors

- missing `~/.pi/agent/web-search.json`
- invalid JSON
- schema mismatch
- empty provider list
- unknown `defaultProvider`
- unknown explicitly requested provider
- missing literal API key

These should return actionable errors naming the exact issue.

### Input errors

- empty search query
- malformed URL(s)
- empty URL list after normalization

These should be rejected before any provider request is made.

### Provider/runtime errors

- Exa authentication failures
- network failures
- rate limits
- unexpected response shapes

These should return a concise summary in tool content while preserving richer diagnostics in `details`.
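
A sketch of mapping a thrown provider error to a short tool message plus a `details` object. It assumes HTTP failures expose a numeric `status` and that network failures mention common Node error codes in their message; Exa SDK errors may need different checks. Unexpected response shapes fall through to `unknown`. All names here are hypothetical.

```ts
type ProviderErrorKind = "auth" | "rate_limit" | "network" | "unknown";

interface ProviderErrorReport {
  text: string; // concise message for tool content
  details: { kind: ProviderErrorKind; status?: number; message: string; stack?: string }; // richer diagnostics
}

const SUMMARIES: Record<ProviderErrorKind, string> = {
  auth: "authentication failed; check apiKey in ~/.pi/agent/web-search.json",
  rate_limit: "rate limited by the provider; retry later",
  network: "network error while contacting the provider",
  unknown: "unexpected provider error",
};

// Classify a thrown error. Assumes a numeric `status` on HTTP errors and
// Node-style error codes in network error messages.
function describeProviderError(providerName: string, err: unknown): ProviderErrorReport {
  const e = (typeof err === "object" && err !== null ? err : {}) as { status?: unknown; message?: unknown; stack?: unknown };
  const status = typeof e.status === "number" ? e.status : undefined;
  const message = typeof e.message === "string" ? e.message : String(err);
  let kind: ProviderErrorKind;
  if (status === 401 || status === 403) kind = "auth";
  else if (status === 429) kind = "rate_limit";
  else if (/ECONNREFUSED|ECONNRESET|ENOTFOUND|ETIMEDOUT|fetch failed/i.test(message)) kind = "network";
  else kind = "unknown";
  return {
    text: `${providerName}: ${SUMMARIES[kind]}.`,
    details: { kind, status, message, stack: typeof e.stack === "string" ? e.stack : undefined },
  };
}
```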

### Partial failures

For batch `web_fetch`, mixed outcomes should not fail the entire request unless every target fails. Successful pages should still be returned together with per-URL failure entries.
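
The all-or-nothing rule above reduces to a small check over per-URL results. This sketch assumes the provider has already returned one item per requested URL; `settleBatch` and the item shape are hypothetical.

```ts
type FetchItem =
  | { ok: true; url: string; text?: string }
  | { ok: false; url: string; error: string };

type FetchSuccess = Extract<FetchItem, { ok: true }>;
type FetchFailure = Extract<FetchItem, { ok: false }>;

// Split per-URL results and fail the whole call only when every URL failed.
function settleBatch(items: FetchItem[]): { succeeded: FetchSuccess[]; failed: FetchFailure[] } {
  const succeeded = items.filter((item): item is FetchSuccess => item.ok);
  const failed = items.filter((item): item is FetchFailure => !item.ok);
  if (items.length > 0 && succeeded.length === 0) {
    const reasons = failed.map((f) => `- ${f.url}: ${f.error}`).join("\n");
    throw new Error(`All ${failed.length} URL(s) failed to fetch:\n${reasons}`);
  }
  return { succeeded, failed };
}
```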

## Testing Strategy

The design intentionally separates pure logic from pi wiring so most behavior can be tested without loading pi itself.

### Automated tests

Cover:

1. config parsing and normalization
2. provider-list validation
3. default-provider resolution
4. generic request → Exa request mapping
5. Exa response → normalized response mapping
6. compact formatting for metadata-only search
7. truncation for long fetch results
8. batch fetch formatting with partial failures
9. helpful error messages when config is absent or invalid

### Test style

- prefer pure module tests for config, normalization, and formatting
- inject a fake Exa-like client into the Exa adapter instead of making live network calls
- keep extension entrypoint tests to smoke coverage only

### Manual verification

After implementation:

1. create `~/.pi/agent/web-search.json`
2. reload pi
3. run one `web_search` call
4. run one single-URL `web_fetch` call
5. run one multi-URL `web_fetch` call
6. confirm missing/invalid config errors are readable

## Non-Goals

The first version will not add:

- other providers besides Exa
- project-local web-search config
- automatic setup commands or interactive config editing
- provider-specific passthrough options in the public tool API
- rich snippet/highlight defaults for search
- live network integration tests in the normal automated suite

## Acceptance Criteria

The work is complete when:

1. pi discovers a new extension package at `.pi/agent/extensions/web-search/`
2. the agent has two generic tools:
   - `web_search`
   - `web_fetch`
3. the implementation uses an internal provider abstraction
4. Exa is the first working provider implementation
5. the runtime reads global config from `~/.pi/agent/web-search.json`
6. config uses a provider-list shape with a default provider selector
7. credentials are read as literal values from that file
8. `web_search` returns metadata only by default
9. `web_fetch` accepts one or multiple URLs and returns text by default
10. missing config, invalid config, and provider failures return clean, actionable tool errors
11. core mapping/formatting/config logic is covered by automated tests