

Web Search Tools Design

Date: 2026-04-09

Project: /home/alex/dotfiles

Target files:

  • .pi/agent/extensions/web-search/package.json
  • .pi/agent/extensions/web-search/index.ts
  • .pi/agent/extensions/web-search/src/schema.ts
  • .pi/agent/extensions/web-search/src/config.ts
  • .pi/agent/extensions/web-search/src/providers/types.ts
  • .pi/agent/extensions/web-search/src/providers/exa.ts
  • .pi/agent/extensions/web-search/src/tools/web-search.ts
  • .pi/agent/extensions/web-search/src/tools/web-fetch.ts
  • .pi/agent/extensions/web-search/src/format.ts
  • tests alongside the new modules

Goal

Add two generic pi tools, web_search and web_fetch, implemented as a modular extension package that uses Exa as the first provider while keeping the internal design extensible for future providers.

Context

  • This dotfiles repo already tracks pi configuration under .pi/agent/.
  • The current extension workspace contains a tracked question extension and small pure helper tests.
  • Pi extensions can be packaged as directories with index.ts and their own package.json, which is the best fit when third-party dependencies are needed.
  • The requested feature is explicitly about pi extensions and custom tools, not built-in model providers.
  • The user wants:
    • generic tool names now
    • Exa as the first provider
    • configuration read from a separate global file, not settings.json
    • configuration stored only at the global scope

User-Approved Requirements

  1. Add two generic tools:
    • web_search
    • web_fetch
  2. Use Exa as the initial provider.
  3. Keep the implementation extensible so other providers can be added later.
  4. Do not read configuration from environment variables.
  5. Do not read configuration from settings.json.
  6. Read configuration from a dedicated global file:
    • ~/.pi/agent/web-search.json
  7. Use a provider-list-based config shape, not a single-provider-only schema.
  8. Store credentials as literal values in that config file.
  9. web_search should return metadata only by default.
  10. web_fetch should accept one URL or multiple URLs.
  11. web_fetch should return text by default.
  12. The implementation should follow the modular, package-style structure, not the minimal Exa-shaped shortcut.

Implement the feature as a dedicated extension package at:

  • /home/alex/dotfiles/.pi/agent/extensions/web-search/

This package will register two generic tools and route both through a provider registry. At runtime, the extension loads ~/.pi/agent/web-search.json, validates it, normalizes the provider list into an internal lookup map, resolves the configured default provider, and then executes requests through a provider adapter.

For the first version, the only adapter is Exa. However, the tool-facing layer remains provider-agnostic, so future providers only need to implement the shared provider interface and be added to config validation/registry wiring.

This is intentionally more structured than a single-file Exa wrapper because the user explicitly wants future extensibility without changing tool names or reworking the public API later.

File Structure

Extension package

  • Create: /home/alex/dotfiles/.pi/agent/extensions/web-search/package.json

    • declares the extension package
    • declares exa-js as a dependency
    • points pi at the extension entrypoint
  • Create: /home/alex/dotfiles/.pi/agent/extensions/web-search/index.ts

    • extension entrypoint
    • registers web_search and web_fetch
    • wires together config loading, provider registry, tool handlers, and shared formatting

Shared schemas and config

  • Create: /home/alex/dotfiles/.pi/agent/extensions/web-search/src/schema.ts

    • TypeBox schemas for tool parameters
    • TypeBox schemas for web-search.json
    • shared TypeScript types derived from the schemas where useful
  • Create: /home/alex/dotfiles/.pi/agent/extensions/web-search/src/config.ts

    • reads ~/.pi/agent/web-search.json
    • validates config shape
    • normalizes provider list into an internal map keyed by provider name
    • resolves default provider

Provider abstraction

  • Create: /home/alex/dotfiles/.pi/agent/extensions/web-search/src/providers/types.ts

    • generic request and response types for search/fetch
    • provider interface used by the tool layer
    • normalized internal result shapes independent of Exa SDK types
  • Create: /home/alex/dotfiles/.pi/agent/extensions/web-search/src/providers/exa.ts

    • Exa-backed implementation of the provider interface
    • translates generic search requests into Exa search(...)
    • translates generic fetch requests into Exa getContents(...)
    • isolates all Exa-specific request/response details

Tool handlers and formatting

  • Create: /home/alex/dotfiles/.pi/agent/extensions/web-search/src/tools/web-search.ts

    • web_search schema, execution logic, and tool rendering helpers
  • Create: /home/alex/dotfiles/.pi/agent/extensions/web-search/src/tools/web-fetch.ts

    • web_fetch schema, execution logic, and tool rendering helpers
  • Create: /home/alex/dotfiles/.pi/agent/extensions/web-search/src/format.ts

    • shared output shaping
    • compact text summaries for the LLM
    • truncation behavior for large results
    • per-result formatting for batch fetches and partial failures

Config File Design

The extension will read exactly one file:

  • ~/.pi/agent/web-search.json

Initial conceptual shape:

{
  "defaultProvider": "exa-main",
  "providers": [
    {
      "name": "exa-main",
      "type": "exa",
      "apiKey": "exa_...",
      "options": {
        "defaultSearchLimit": 5,
        "defaultFetchTextMaxCharacters": 12000
      }
    }
  ]
}

Config rules

  • defaultProvider must match one provider entry by name.
  • providers must be a non-empty array.
  • Each provider entry must include:
    • name
    • type
    • apiKey
  • apiKey is a literal string in the first version.
  • type is validated so the runtime can select the correct adapter.
  • Exa-specific defaults may live under options, but they must remain optional.
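
The rules above could be enforced by a small pure function. The following is a minimal sketch that uses hand-written checks instead of the TypeBox schemas planned for schema.ts; `normalizeConfig`, `ProviderEntry`, and `NormalizedConfig` are hypothetical names, and rejecting duplicate provider names is an assumption that follows from keying the lookup map by name.

```typescript
// Hypothetical shapes; the real ones would be derived from the TypeBox schemas.
interface ProviderEntry {
  name: string;
  type: string;
  apiKey: string;
  options?: Record<string, unknown>;
}

interface NormalizedConfig {
  defaultProvider: string;
  providers: Map<string, ProviderEntry>;
}

// Only Exa is supported in the first version.
const KNOWN_TYPES = ["exa"];

function isNonEmptyString(value: unknown): value is string {
  return typeof value === "string" && value.trim().length > 0;
}

// Validates an already-parsed JSON value and builds the name-keyed lookup map.
function normalizeConfig(raw: unknown): NormalizedConfig {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("web-search.json must contain a JSON object");
  }
  const { defaultProvider, providers } = raw as Record<string, unknown>;
  if (!Array.isArray(providers) || providers.length === 0) {
    throw new Error("providers must be a non-empty array");
  }
  const byName = new Map<string, ProviderEntry>();
  for (const item of providers) {
    if (typeof item !== "object" || item === null) {
      throw new Error("each provider entry must be an object");
    }
    const entry = item as Record<string, unknown>;
    for (const field of ["name", "type", "apiKey"]) {
      if (!isNonEmptyString(entry[field])) {
        throw new Error(`provider entry is missing a non-empty "${field}"`);
      }
    }
    const name = entry.name as string;
    if (KNOWN_TYPES.indexOf(entry.type as string) === -1) {
      throw new Error(`provider "${name}" has unknown type "${String(entry.type)}"`);
    }
    // Assumption: names must be unique so the map lookup is unambiguous.
    if (byName.has(name)) {
      throw new Error(`duplicate provider name "${name}"`);
    }
    byName.set(name, entry as unknown as ProviderEntry);
  }
  if (!isNonEmptyString(defaultProvider) || !byName.has(defaultProvider)) {
    throw new Error(`defaultProvider "${String(defaultProvider)}" does not match any provider name`);
  }
  return { defaultProvider, providers: byName };
}
```

Keeping this function free of file I/O means it can be tested with plain object literals.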

Config non-goals

The first version will not:

  • read provider config from project-local files
  • merge config from multiple files
  • read credentials from env vars
  • support shell-command-based credential resolution
  • write or edit web-search.json automatically

If the file is missing or invalid, the tools should return a clear error telling the user where the file belongs and showing a minimal valid example.
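
One way to produce such an error is sketched below. It assumes the caller reads the file and passes the raw text, or `undefined` when the file does not exist; `describeConfigProblem` and `parseConfigText` are hypothetical helper names, and the exact wording of the message is illustrative only.

```typescript
const CONFIG_PATH = "~/.pi/agent/web-search.json";

// Smallest config that satisfies the rules in this design.
const MINIMAL_EXAMPLE = {
  defaultProvider: "exa-main",
  providers: [{ name: "exa-main", type: "exa", apiKey: "exa_..." }],
};

// Hypothetical helper: builds the message shown when the config cannot be used.
function describeConfigProblem(reason: string): string {
  return [
    `web-search config problem: ${reason}`,
    `Expected config file: ${CONFIG_PATH}`,
    "Minimal valid example:",
    JSON.stringify(MINIMAL_EXAMPLE, null, 2),
  ].join("\n");
}

// Parses raw file contents; `undefined` stands for a missing file.
function parseConfigText(text: string | undefined): unknown {
  if (text === undefined) {
    throw new Error(describeConfigProblem("file not found"));
  }
  try {
    return JSON.parse(text);
  } catch (error) {
    const detail = error instanceof Error ? error.message : String(error);
    throw new Error(describeConfigProblem(`invalid JSON (${detail})`));
  }
}
```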

Tool Contract

web_search

Purpose: search the web and return result metadata with a generic surface that can outlive Exa.

Conceptual input shape:

{
  query: string;
  limit?: number;
  includeDomains?: string[];
  excludeDomains?: string[];
  startPublishedDate?: string;
  endPublishedDate?: string;
  category?: string;
  provider?: string;
}

Default behavior

  • returns metadata only
  • does not fetch page text by default
  • uses the default configured provider unless the provider argument explicitly selects another configured provider
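
The provider selection rule can be sketched as a pure lookup. `resolveProviderName` is a hypothetical helper, and the map is assumed to be the name-keyed structure produced by config normalization.

```typescript
// Hypothetical helper: picks the provider named in the call, else the default.
function resolveProviderName(
  configured: ReadonlyMap<string, unknown>,
  defaultProvider: string,
  requested?: string,
): string {
  const name = requested ?? defaultProvider;
  if (!configured.has(name)) {
    const known = Array.from(configured.keys()).join(", ");
    throw new Error(`unknown provider "${name}"; configured providers: ${known}`);
  }
  return name;
}
```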

Result shape intent

Each search result should preserve a normalized subset of provider output such as:

  • title
  • url
  • publishedDate
  • author
  • score
  • provider-specific stable identifiers only if useful for follow-up operations

The tool's text output should stay compact and easy for the model to scan.
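
As an illustration of a compact, scannable layout, the sketch below renders each result as a numbered title line followed by its URL. The exact layout and the `formatSearchResults` name are assumptions, not a settled format.

```typescript
// Normalized subset of provider output, mirroring the fields listed above.
interface SearchResultItem {
  title?: string;
  url: string;
  publishedDate?: string;
  author?: string;
  score?: number;
}

// Hypothetical formatter: one numbered entry per result, metadata in parentheses.
function formatSearchResults(query: string, results: SearchResultItem[]): string {
  if (results.length === 0) {
    return `No results for "${query}".`;
  }
  const lines = results.map((result, index) => {
    const scoreText = result.score !== undefined ? `score ${result.score.toFixed(2)}` : undefined;
    const meta = [result.publishedDate, result.author, scoreText].filter(
      (part): part is string => part !== undefined && part !== "",
    );
    const suffix = meta.length > 0 ? ` (${meta.join(", ")})` : "";
    return `${index + 1}. ${result.title ?? "(untitled)"}${suffix}\n   ${result.url}`;
  });
  return [`${results.length} result(s) for "${query}":`, ...lines].join("\n");
}
```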

web_fetch

Purpose: fetch contents for one or more URLs with a generic interface.

Conceptual input shape:

{
  urls: string[];
  text?: boolean;
  highlights?: boolean;
  summary?: boolean;
  textMaxCharacters?: number;
  provider?: string;
}

Input normalization

The canonical tool shape is urls: string[], where a single URL is represented as a one-element array. For robustness, the implementation may also accept a top-level url string through argument normalization and fold it into urls, but the stable contract exposed in schemas and docs should remain urls: string[].
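
The folding step might look like the sketch below. `normalizeFetchArgs` is a hypothetical name; trimming whitespace and dropping empty strings are assumptions added for robustness.

```typescript
// Hypothetical helper: folds an optional top-level `url` into `urls`.
function normalizeFetchArgs(
  args: Record<string, unknown>,
): Record<string, unknown> & { urls: string[] } {
  const { url, urls, ...rest } = args;
  const merged: string[] = [];
  if (Array.isArray(urls)) {
    for (const candidate of urls) {
      if (typeof candidate === "string") merged.push(candidate);
    }
  }
  if (typeof url === "string") merged.push(url);
  // Assumption: surrounding whitespace and empty entries carry no meaning.
  const cleaned = merged.map((value) => value.trim()).filter((value) => value !== "");
  return { ...rest, urls: cleaned };
}
```

The result never carries a `url` key, so downstream code only has to handle the canonical `urls: string[]` shape.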

Default behavior

  • when no content mode is specified, fetch text
  • batch requests are allowed
  • the default configured provider is used unless overridden

Result shape intent

Each fetched item should preserve normalized per-URL results, including:

  • url
  • title where available
  • text by default
  • optional highlights
  • optional summary
  • per-item failure details for partial batch failures

Provider Abstraction

The provider interface should express the minimum shared behaviors needed by the tools:

interface WebSearchProvider {
  type: string;
  search(request: NormalizedSearchRequest): Promise<NormalizedSearchResponse>;
  fetch(request: NormalizedFetchRequest): Promise<NormalizedFetchResponse>;
}

Exa adapter responsibilities

The Exa adapter will:

  • instantiate an Exa client from the configured literal API key
  • use Exa search without contents for web_search default behavior
  • use Exa getContents(...) for web_fetch
  • map Exa response fields into normalized provider-agnostic result types
  • keep Exa-only fields contained inside the adapter unless they are intentionally promoted into the shared result model later

This keeps future provider additions focused: implement the same interface, extend config validation, and register the adapter.
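
A possible shape for the adapter is sketched below. To stay self-contained it depends on a hypothetical `ExaLikeClient` interface rather than importing exa-js; the option names (`numResults`, `includeDomains`, `text.maxCharacters`) are assumed to match the SDK and should be checked against the installed exa-js version. The normalized request and response types are trimmed-down, hypothetical versions of what providers/types.ts would define.

```typescript
// Hypothetical subset of the Exa client surface that this sketch relies on.
interface ExaLikeResult {
  url: string;
  title?: string | null;
  publishedDate?: string;
  author?: string;
  score?: number;
  text?: string;
}
interface ExaLikeClient {
  search(query: string, options?: Record<string, unknown>): Promise<{ results: ExaLikeResult[] }>;
  getContents(urls: string[], options?: Record<string, unknown>): Promise<{ results: ExaLikeResult[] }>;
}

// Trimmed-down normalized types (hypothetical field selection).
interface NormalizedSearchRequest { query: string; limit?: number; includeDomains?: string[] }
interface NormalizedSearchResponse {
  results: Array<{ url: string; title?: string; publishedDate?: string; author?: string; score?: number }>;
}
interface NormalizedFetchRequest { urls: string[]; textMaxCharacters?: number }
interface NormalizedFetchResponse { results: Array<{ url: string; title?: string; text?: string }> }

interface WebSearchProvider {
  type: string;
  search(request: NormalizedSearchRequest): Promise<NormalizedSearchResponse>;
  fetch(request: NormalizedFetchRequest): Promise<NormalizedFetchResponse>;
}

// The client is injected so tests can pass a fake instead of a live SDK instance.
function createExaProvider(client: ExaLikeClient): WebSearchProvider {
  return {
    type: "exa",
    async search(request) {
      const response = await client.search(request.query, {
        numResults: request.limit,
        includeDomains: request.includeDomains,
      });
      return {
        results: response.results.map((r) => ({
          url: r.url,
          title: r.title ?? undefined,
          publishedDate: r.publishedDate,
          author: r.author,
          score: r.score,
        })),
      };
    },
    async fetch(request) {
      const text =
        request.textMaxCharacters !== undefined ? { maxCharacters: request.textMaxCharacters } : true;
      const response = await client.getContents(request.urls, { text });
      return {
        results: response.results.map((r) => ({ url: r.url, title: r.title ?? undefined, text: r.text })),
      };
    },
  };
}
```

Because only the adapter touches Exa field names, a second provider would implement `WebSearchProvider` without any change to the tool layer.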

Rendering and Output Design

The extension should provide compact tool rendering so calls and results are readable inside pi.

renderCall

  • web_search: show tool name and the query
  • web_fetch: show tool name and URL count (or the single URL)

renderResult

  • web_search: show result count and a short numbered list of titles/URLs
  • web_fetch: show fetched count, failed count if any, and a concise per-URL summary
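
The one-line summaries described above can be produced by plain string helpers, sketched here. How these strings are wired into pi's renderCall and renderResult hooks is not shown, because that API is not specified in this document; all function names are hypothetical.

```typescript
// Hypothetical helpers returning the short labels shown for calls and results.
function summarizeSearchCall(query: string): string {
  return `web_search "${query}"`;
}

function summarizeFetchCall(urls: string[]): string {
  // A single URL is shown in full; batches show only the count.
  return urls.length === 1 ? `web_fetch ${urls[0]}` : `web_fetch ${urls.length} URLs`;
}

function summarizeFetchResult(fetched: number, failed: number): string {
  // The failed count is mentioned only when there is something to report.
  return failed > 0 ? `fetched ${fetched}, failed ${failed}` : `fetched ${fetched}`;
}
```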

LLM-facing text output

The text returned to the model should be concise and predictable:

  • search: compact metadata list only by default
  • fetch: truncated text payloads with enough context to be useful
  • batch fetch: clearly separated per-URL sections

Large outputs must be truncated, following the shared truncation-utilities pattern used by pi tool examples.
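
For illustration only, the sketch below shows character-based truncation with an explicit marker. It is a stand-in, not pi's shared truncation utilities, whose API is not described here; `truncateText` and the marker format are assumptions.

```typescript
interface TruncatedText {
  text: string;
  truncated: boolean;
  omittedCharacters: number;
}

// Stand-in helper: cuts text at a character budget and says how much was dropped.
function truncateText(text: string, maxCharacters: number): TruncatedText {
  if (maxCharacters < 0) {
    throw new RangeError("maxCharacters must not be negative");
  }
  if (text.length <= maxCharacters) {
    return { text, truncated: false, omittedCharacters: 0 };
  }
  const omittedCharacters = text.length - maxCharacters;
  return {
    text: `${text.slice(0, maxCharacters)}\n[truncated ${omittedCharacters} characters]`,
    truncated: true,
    omittedCharacters,
  };
}
```

Stating the omitted amount lets the model decide whether a narrower follow-up fetch is worthwhile.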

Error Handling

Expected runtime failures should be handled cleanly and descriptively.

Config errors

  • missing ~/.pi/agent/web-search.json
  • invalid JSON
  • schema mismatch
  • empty provider list
  • unknown defaultProvider
  • unknown explicitly requested provider
  • missing literal API key

These should return actionable errors naming the exact issue.

Input errors

  • empty search query
  • malformed URL(s)
  • empty URL list after normalization

These should be rejected before any provider request is made.
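
A sketch of these pre-flight checks follows. Restricting URLs to http and https is an assumption, and the helper names are hypothetical.

```typescript
// Returns the trimmed query, or throws when nothing is left to search for.
function validateSearchQuery(query: string): string {
  const trimmed = query.trim();
  if (trimmed === "") {
    throw new Error("query must not be empty");
  }
  return trimmed;
}

// Assumption: only http(s) URLs are meaningful targets for web_fetch.
function isHttpUrl(value: string): boolean {
  try {
    const parsed = new URL(value);
    return parsed.protocol === "http:" || parsed.protocol === "https:";
  } catch {
    return false;
  }
}

// Rejects an empty list and reports every malformed URL in one message.
function validateUrls(urls: string[]): string[] {
  if (urls.length === 0) {
    throw new Error("urls must contain at least one URL");
  }
  const invalid = urls.filter((url) => !isHttpUrl(url));
  if (invalid.length > 0) {
    throw new Error(`malformed URL(s): ${invalid.join(", ")}`);
  }
  return urls;
}
```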

Provider/runtime errors

  • Exa authentication failures
  • network failures
  • rate limits
  • unexpected response shapes

These should return a concise summary in tool content while preserving richer diagnostics in details.
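
The split between summary and diagnostics could be sketched as follows. It assumes that provider errors may expose a numeric HTTP status on a `status` property, which has not been verified against exa-js; `toToolError` and the `kind` labels are hypothetical.

```typescript
interface ToolErrorResult {
  content: string; // concise summary for the model
  details: { kind: "auth" | "rate_limit" | "other"; message: string; status?: number };
}

// Hypothetical mapper from a thrown value to tool content plus diagnostics.
function toToolError(error: unknown): ToolErrorResult {
  const message = error instanceof Error ? error.message : String(error);
  // Assumption: the thrown object may carry an HTTP status code as `status`.
  const rawStatus =
    typeof error === "object" && error !== null ? (error as { status?: unknown }).status : undefined;
  const status = typeof rawStatus === "number" ? rawStatus : undefined;

  if (status === 401 || status === 403) {
    return {
      content: "Provider rejected the API key; check apiKey in ~/.pi/agent/web-search.json.",
      details: { kind: "auth", message, status },
    };
  }
  if (status === 429) {
    return {
      content: "Provider rate limit reached; retry later.",
      details: { kind: "rate_limit", message, status },
    };
  }
  return { content: `Provider request failed: ${message}`, details: { kind: "other", message, status } };
}
```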

Partial failures

For batch web_fetch, mixed outcomes should not fail the entire request unless every target fails. Successful pages should still be returned together with per-URL failure entries.
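
The partial-failure rule might be expressed as in the sketch below, where the request counts as failed only when no URL succeeded. `FetchOutcome`, `summarizeBatch`, and the section layout are assumptions.

```typescript
// One entry per requested URL, either with text or with an error message.
type FetchOutcome =
  | { url: string; ok: true; text: string }
  | { url: string; ok: false; error: string };

interface BatchSummary {
  isError: boolean;
  fetched: number;
  failed: number;
  content: string;
}

// Hypothetical aggregator: keeps successes and reports failures per URL.
function summarizeBatch(outcomes: FetchOutcome[]): BatchSummary {
  const failed = outcomes.filter((outcome) => !outcome.ok).length;
  const fetched = outcomes.length - failed;
  const sections = outcomes.map((outcome) =>
    outcome.ok
      ? `## ${outcome.url}\n${outcome.text}`
      : `## ${outcome.url}\n[fetch failed: ${outcome.error}]`,
  );
  return {
    // The whole request is an error only when every target failed.
    isError: outcomes.length > 0 && fetched === 0,
    fetched,
    failed,
    content: [`Fetched ${fetched} of ${outcomes.length} URL(s).`, ...sections].join("\n\n"),
  };
}
```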

Testing Strategy

The design intentionally separates pure logic from pi wiring so most behavior can be tested without loading pi itself.

Automated tests

Cover:

  1. config parsing and normalization
  2. provider-list validation
  3. default-provider resolution
  4. generic request → Exa request mapping
  5. Exa response → normalized response mapping
  6. compact formatting for metadata-only search
  7. truncation for long fetch results
  8. batch fetch formatting with partial failures
  9. helpful error messages when config is absent or invalid

Test style

  • prefer pure module tests for config, normalization, and formatting
  • inject a fake Exa-like client into the Exa adapter instead of making live network calls
  • keep extension entrypoint tests to smoke coverage only

Manual verification

After implementation:

  1. create ~/.pi/agent/web-search.json
  2. reload pi
  3. run one web_search call
  4. run one single-URL web_fetch call
  5. run one multi-URL web_fetch call
  6. confirm missing/invalid config errors are readable

Non-Goals

The first version will not add:

  • other providers besides Exa
  • project-local web-search config
  • automatic setup commands or interactive config editing
  • provider-specific passthrough options in the public tool API
  • rich snippet/highlight defaults for search
  • live network integration tests in the normal automated suite

Acceptance Criteria

The work is complete when:

  1. pi discovers a new extension package at .pi/agent/extensions/web-search/
  2. the agent has two generic tools:
    • web_search
    • web_fetch
  3. the implementation uses an internal provider abstraction
  4. Exa is the first working provider implementation
  5. the runtime reads global config from ~/.pi/agent/web-search.json
  6. config uses a provider-list shape with a default provider selector
  7. credentials are read as literal values from that file
  8. web_search returns metadata only by default
  9. web_fetch accepts one or multiple URLs and returns text by default
  10. missing config, invalid config, and provider failures return clean, actionable tool errors
  11. core mapping/formatting/config logic is covered by automated tests