alex wiesner
2026-04-12 06:47:14 +01:00
parent 1f0df3ed0d
commit 5d5d0e2d26
17 changed files with 1575 additions and 0 deletions

71
ccc/SKILL.md Normal file

@@ -0,0 +1,71 @@
---
name: ccc
description: "This skill should be used when code search is needed (whether explicitly requested or as part of completing a task), when indexing the codebase after changes, or when the user asks about ccc, cocoindex-code, or the codebase index. Trigger phrases include 'search the codebase', 'find code related to', 'update the index', 'ccc', 'cocoindex-code'."
---
# ccc - Semantic Code Search & Indexing
`ccc` is the CLI for CocoIndex Code, providing semantic search over the current codebase and index management.
## Ownership
The agent owns the `ccc` lifecycle for the current project — initialization, indexing, and searching. Do not ask the user to perform these steps; handle them automatically.
- **Initialization**: If `ccc search` or `ccc index` fails with an initialization error (e.g., "Not in an initialized project directory"), run `ccc init` from the project root directory, then `ccc index` to build the index, then retry the original command.
- **Index freshness**: Keep the index up to date by running `ccc index` (or `ccc search --refresh`) when the index may be stale — e.g., at the start of a session, or after making significant code changes (new files, refactors, renamed modules). There is no need to re-index between consecutive searches if no code was changed in between.
- **Installation**: If `ccc` itself is not found (command not found), refer to [management.md](references/management.md) for installation instructions and inform the user.
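The initialization rule above can be sketched as a small shell wrapper. This is a minimal sketch, not part of `ccc`: the function name `ccc_with_init` is hypothetical, and the error text it matches is the example message quoted above, which may be worded differently in other versions.

```bash
# Hypothetical helper (not part of ccc): run a ccc command and, if it fails
# with the initialization error, run `ccc init` and `ccc index`, then retry.
# Call it from the project root, since `ccc init` is run in the current directory.
ccc_with_init() {
  local output status=0
  output=$("$@" 2>&1) || status=$?
  if [ "$status" -eq 0 ]; then
    printf '%s\n' "$output"
    return 0
  fi
  # Assumption: the error text matches the example message from this document.
  case "$output" in
    *"Not in an initialized project directory"*)
      ccc init && ccc index && "$@"
      ;;
    *)
      # Any other failure is passed through unchanged.
      printf '%s\n' "$output" >&2
      return "$status"
      ;;
  esac
}
```

A call such as `ccc_with_init ccc search database connection pooling` would then behave like a plain search in an initialized project and initialize the project first otherwise.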
## Searching the Codebase
To perform a semantic search:
```bash
ccc search <query terms>
```
The query should describe the concept, functionality, or behavior to find, not exact code syntax. For example:
```bash
ccc search database connection pooling
ccc search user authentication flow
ccc search error handling retry logic
```
### Filtering Results
- **By language** (`--lang`, repeatable): restrict results to specific languages.
```bash
ccc search --lang python --lang markdown database schema
```
- **By path** (`--path`): restrict results to a glob pattern relative to project root. If omitted, defaults to the current working directory (only results under that subdirectory are returned).
```bash
ccc search --path 'src/api/*' request validation
```
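The two filters can plausibly be combined in one call. The sketch below assumes that `--lang` and `--path` may be used together, which the examples above do not show explicitly; the function name `ccc_search_filtered` is hypothetical.

```bash
# Hypothetical helper:
#   ccc_search_filtered <path-glob> "<lang> <lang> ..." <query terms...>
# Assumption: --path and --lang can be combined in a single search.
ccc_search_filtered() {
  local path_glob="$1" langs="$2" lang
  shift 2
  # Prepend one --lang flag per language (word-splitting of $langs is intended).
  for lang in $langs; do
    set -- --lang "$lang" "$@"
  done
  ccc search --path "$path_glob" "$@"
}
```

For example, `ccc_search_filtered 'src/api/*' 'python markdown' request validation` expands to a single `ccc search` call with one `--path` flag and two `--lang` flags. Quoting the glob keeps the shell from expanding it before `ccc` sees it.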
### Pagination
Results default to the first page. To retrieve additional results:
```bash
ccc search --offset 5 --limit 5 database schema
```
If all returned results look relevant, use `--offset` to fetch the next page — there are likely more useful matches beyond the first page.
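Fetching several pages in a row can be sketched as a loop over `--offset`. This assumes `--offset` counts results to skip starting from zero, as the example above suggests; the function name `ccc_search_pages` is hypothetical.

```bash
# Hypothetical helper: ccc_search_pages <pages> <page-size> <query terms...>
# Assumption: --offset is a zero-based count of results to skip.
ccc_search_pages() {
  local pages="$1" limit="$2" page=0
  shift 2
  while [ "$page" -lt "$pages" ]; do
    ccc search --offset "$((page * limit))" --limit "$limit" "$@"
    page=$((page + 1))
  done
}
```

`ccc_search_pages 2 5 database schema` would run the search twice, with offsets 0 and 5.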
### Working with Search Results
Search results include file paths and line ranges. To explore a result in more detail:
- Use the editor's built-in file reading capabilities (e.g., the `Read` tool) to load the matched file and read lines around the returned range for full context.
- When working in a terminal without a file-reading tool, use `sed -n '<start>,<end>p' <file>` to extract a specific line range.
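The `sed` approach can be wrapped so that a few lines of surrounding context are printed as well. This is a minimal sketch; the function name `show_range` and the default of 5 context lines are illustrative choices, not something `ccc` provides.

```bash
# Hypothetical helper: show_range <file> <start> <end> [context-lines]
# Prints the given line range plus some context on both sides.
show_range() {
  local file="$1" start="$2" end="$3" context="${4:-5}" from
  from=$((start - context))
  # Clamp to the first line of the file.
  if [ "$from" -lt 1 ]; then from=1; fi
  sed -n "${from},$((end + context))p" "$file"
}
```

For a result reported at lines 40 to 55 of `src/db.py` (a made-up path), `show_range src/db.py 40 55` prints lines 35 to 60.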
## Settings
To view or edit embedding model configuration, include/exclude patterns, or language overrides, see [settings.md](references/settings.md).
## Management & Troubleshooting
For installation, initialization, daemon management, troubleshooting, and cleanup commands, see [management.md](references/management.md).


@@ -0,0 +1,95 @@
# ccc Management
## Installation
Install CocoIndex Code via pipx:
```bash
pipx install cocoindex-code
```
To upgrade to the latest version:
```bash
pipx upgrade cocoindex-code
```
After installation, the `ccc` command is available globally.
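Before relying on `ccc`, a script can check that the command is actually on the `PATH`. A minimal sketch using the standard `command -v` builtin; the function name `ensure_ccc` is hypothetical.

```bash
# Hypothetical helper: succeed if ccc is available, otherwise print the
# install command from this document and fail.
ensure_ccc() {
  if command -v ccc >/dev/null 2>&1; then
    return 0
  fi
  echo "ccc not found; install it with: pipx install cocoindex-code" >&2
  return 1
}
```

A typical use would be `ensure_ccc && ccc status`, so that later commands are skipped when the tool is missing.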
## Project Initialization
Run the following from the root directory of the project you want to index:
```bash
ccc init
```
This creates:
- `~/.cocoindex_code/global_settings.yml` (user-level settings, e.g., model configuration) if it does not already exist.
- `.cocoindex_code/settings.yml` (project-level settings, e.g., include/exclude file patterns).
If `.git` exists in the directory, `.cocoindex_code/` is automatically added to `.gitignore`.
Use `-f` to skip the confirmation prompt if `ccc init` detects a potential parent project root.
After initialization, edit the settings files if needed (see [settings.md](settings.md) for format details), then run `ccc index` to build the initial index.
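The initialization steps can be made idempotent by checking for the project settings file first. This is a sketch under the assumption that the presence of `.cocoindex_code/settings.yml` is a reliable sign of an initialized project; the function name `ccc_init_if_needed` is hypothetical.

```bash
# Hypothetical helper: ccc_init_if_needed [project-root]
# Assumption: an existing .cocoindex_code/settings.yml means the project
# was already initialized.
ccc_init_if_needed() {
  local root="${1:-.}"
  if [ -f "$root/.cocoindex_code/settings.yml" ]; then
    return 0
  fi
  # Run in a subshell so the caller's working directory is unchanged.
  (cd "$root" && ccc init && ccc index)
}
```

Calling it twice on the same project runs `ccc init` and `ccc index` only the first time.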
## Troubleshooting
### Diagnostics
Run `ccc doctor` to check system health end-to-end:
```bash
ccc doctor
```
This checks global settings, daemon status, and the embedding model (by running a test embedding). When run from within a project, it also checks file matching (walking files with the same logic as the indexer) and index status. Results stream incrementally, and the output always ends by pointing to `daemon.log` for further investigation.
### Checking Project Status
To view the current project's index status:
```bash
ccc status
```
This shows whether indexing is in progress, along with index statistics.
### Daemon Management
The daemon starts automatically on first use. To check its status:
```bash
ccc daemon status
```
This shows whether the daemon is running, its version, uptime, and loaded projects.
To restart the daemon (useful if it gets into a bad state):
```bash
ccc daemon restart
```
To stop the daemon:
```bash
ccc daemon stop
```
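When troubleshooting, it can help to capture the output of the diagnostic commands above in one file. The sketch below makes no assumption about their exit codes; it simply records them. The function name `ccc_collect_diagnostics` and the default report file name are hypothetical.

```bash
# Hypothetical helper: ccc_collect_diagnostics [report-file]
# Runs the diagnostic commands from this document and saves their output.
ccc_collect_diagnostics() {
  local report="${1:-ccc-diagnostics.txt}" sub
  : > "$report"
  for sub in "doctor" "status" "daemon status"; do
    printf '== ccc %s ==\n' "$sub" >> "$report"
    # $sub is intentionally unquoted so "daemon status" becomes two arguments.
    ccc $sub >> "$report" 2>&1 || printf '(exit status %s)\n' "$?" >> "$report"
  done
  printf '%s\n' "$report"
}
```

The function prints the path of the report, which can be read alongside `daemon.log`.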
## Cleanup
To reset a project's index (removes databases, keeps settings):
```bash
ccc reset
```
To fully remove all CocoIndex Code data for a project (including settings):
```bash
ccc reset --all
```
Both commands prompt for confirmation. Use `-f` to skip.
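For unattended use, the reset and the rebuild can be chained. A minimal sketch using only the `-f` flag described above; the function name `ccc_rebuild_index` is hypothetical.

```bash
# Hypothetical helper: drop the index databases without prompting,
# keep the settings, and rebuild the index.
ccc_rebuild_index() {
  ccc reset -f && ccc index
}
```

Because `-f` skips the confirmation prompt, this is best reserved for cases where discarding the index is clearly intended.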

126
ccc/references/settings.md Normal file

@@ -0,0 +1,126 @@
# ccc Settings
Configuration lives in two YAML files, both created automatically by `ccc init`.
## User-Level Settings (`~/.cocoindex_code/global_settings.yml`)
Shared across all projects. Controls the embedding model and extra environment variables for the daemon.
```yaml
embedding:
  provider: sentence-transformers # or "litellm" (default when provider is omitted)
  model: sentence-transformers/all-MiniLM-L6-v2
  device: mps # optional: cpu, cuda, mps (auto-detected if omitted)
  min_interval_ms: 300 # optional: pace LiteLLM embedding requests to reduce 429s; defaults to 5 for LiteLLM
envs: # extra environment variables for the daemon
  OPENAI_API_KEY: your-key # only needed if not already in the shell environment
```
### Fields
| Field | Description |
|-------|-------------|
| `embedding.provider` | `sentence-transformers` for local models, `litellm` (or omit) for cloud/remote models |
| `embedding.model` | Model identifier — format depends on provider (see examples below) |
| `embedding.device` | Optional. `cpu`, `cuda`, or `mps`. Auto-detected if omitted. Only relevant for `sentence-transformers`. |
| `embedding.min_interval_ms` | Optional. Minimum delay between LiteLLM embedding requests in milliseconds. Defaults to `5` for LiteLLM and is ignored by `sentence-transformers`. Set explicitly to override the default. |
| `envs` | Key-value map of environment variables injected into the daemon. Use for API keys not already in the shell environment. |
### Embedding Model Examples
**Local (sentence-transformers, no API key needed):**
```yaml
embedding:
  provider: sentence-transformers
  model: sentence-transformers/all-MiniLM-L6-v2 # default, lightweight
```
```yaml
embedding:
  provider: sentence-transformers
  model: nomic-ai/CodeRankEmbed # better code retrieval, needs GPU (~1 GB VRAM)
```
**Ollama (local):**
```yaml
embedding:
  model: ollama/nomic-embed-text
```
**OpenAI:**
```yaml
embedding:
  model: text-embedding-3-small
  min_interval_ms: 300
envs:
  OPENAI_API_KEY: your-api-key
```
**Gemini:**
```yaml
embedding:
  model: gemini/gemini-embedding-001
envs:
  GEMINI_API_KEY: your-api-key
```
**Voyage (code-optimized):**
```yaml
embedding:
  model: voyage/voyage-code-3
envs:
  VOYAGE_API_KEY: your-api-key
```
For the full list of supported cloud providers and model identifiers, see [LiteLLM Embedding Models](https://docs.litellm.ai/docs/embedding/supported_embedding).
### Important
Switching embedding models changes vector dimensions — you must re-index after changing the model:
```bash
ccc reset && ccc index
```
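To decide whether a re-index is needed, it helps to see which model is currently configured. The sketch below reads the `model` key under `embedding` with `awk`; it assumes the simple layout shown in this document (a top-level `embedding:` key with indented children) and is not a general YAML parser. The function name `embedding_model` is hypothetical.

```bash
# Hypothetical helper: embedding_model [settings-file]
# Prints the value of embedding.model, assuming the simple layout above.
embedding_model() {
  awk '
    /^embedding:/ { in_block = 1; next }
    # Any other top-level key ends the embedding block.
    /^[^ \t#]/    { in_block = 0 }
    in_block && $1 == "model:" { print $2; exit }
  ' "${1:-$HOME/.cocoindex_code/global_settings.yml}"
}
```

Comparing the output before and after editing the settings file shows whether the model changed and the reset above is required.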
## Project-Level Settings (`<project>/.cocoindex_code/settings.yml`)
Per-project. Controls which files to index. Created by `ccc init`; when the project contains `.git`, the `.cocoindex_code/` directory is automatically added to `.gitignore`.
```yaml
include_patterns:
  - "**/*.py"
  - "**/*.js"
  - "**/*.ts"
  # ... (sensible defaults for 28+ file types)
exclude_patterns:
  - "**/.*" # hidden directories
  - "**/__pycache__"
  - "**/node_modules"
  - "**/dist"
  # ...
language_overrides:
  - ext: inc # treat .inc files as PHP
    lang: php
```
### Fields
| Field | Description |
|-------|-------------|
| `include_patterns` | Glob patterns for files to index. Defaults cover common languages (Python, JS/TS, Rust, Go, Java, C/C++, C#, SQL, Shell, Markdown, PHP, Lua, etc.). |
| `exclude_patterns` | Glob patterns for files/directories to skip. Defaults exclude hidden dirs, `node_modules`, `dist`, `__pycache__`, `vendor`, etc. |
| `language_overrides` | List of `{ext, lang}` pairs to override language detection for specific file extensions. |
### Editing Tips
- To index additional file types, append glob patterns to `include_patterns` (e.g. `"**/*.proto"`).
- To exclude a directory, append to `exclude_patterns` (e.g. `"**/generated"`).
- After editing, run `ccc index` to re-index with the new settings.
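Adding an include pattern can also be scripted. The sketch below inserts a new item at the top of the `include_patterns` list with `awk`, reusing the indentation of the first existing item. It assumes the list already has at least one item and that item order does not matter; the function name `add_include_pattern` is hypothetical.

```bash
# Hypothetical helper: add_include_pattern <glob> [settings-file]
# Assumptions: include_patterns has at least one item; item order is irrelevant.
add_include_pattern() {
  local pattern="$1" file="${2:-.cocoindex_code/settings.yml}" tmp
  tmp=$(mktemp) || return 1
  awk -v pat="$pattern" '
    pending && match($0, /^ *- /) {
      # Reuse the indentation of the first existing list item.
      printf "%s- \"%s\"\n", substr($0, 1, RLENGTH - 2), pat
      pending = 0
    }
    { print }
    /^include_patterns:/ { pending = 1 }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

After `add_include_pattern '**/*.proto'`, running `ccc index` picks up the new setting.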