updates
This commit is contained in:
126
ccc/references/settings.md
Normal file
126
ccc/references/settings.md
Normal file
@@ -0,0 +1,126 @@
|
||||
# ccc Settings
|
||||
|
||||
Configuration lives in two YAML files, both created automatically by `ccc init`.
|
||||
|
||||
## User-Level Settings (`~/.cocoindex_code/global_settings.yml`)
|
||||
|
||||
Shared across all projects. Controls the embedding model and extra environment variables for the daemon.
|
||||
|
||||
```yaml
|
||||
embedding:
|
||||
provider: sentence-transformers # or "litellm" (default when provider is omitted)
|
||||
model: sentence-transformers/all-MiniLM-L6-v2
|
||||
device: mps # optional: cpu, cuda, mps (auto-detected if omitted)
|
||||
min_interval_ms: 300 # optional: pace LiteLLM embedding requests to reduce 429s; defaults to 5 for LiteLLM
|
||||
|
||||
envs: # extra environment variables for the daemon
|
||||
OPENAI_API_KEY: your-key # only needed if not already in the shell environment
|
||||
```
|
||||
|
||||
### Fields
|
||||
|
||||
| Field | Description |
|
||||
|-------|-------------|
|
||||
| `embedding.provider` | `sentence-transformers` for local models, `litellm` (or omit) for cloud/remote models |
|
||||
| `embedding.model` | Model identifier — format depends on provider (see examples below) |
|
||||
| `embedding.device` | Optional. `cpu`, `cuda`, or `mps`. Auto-detected if omitted. Only relevant for `sentence-transformers`. |
|
||||
| `embedding.min_interval_ms` | Optional. Minimum delay between LiteLLM embedding requests in milliseconds. Defaults to `5` for LiteLLM and is ignored by `sentence-transformers`. Set explicitly to override the default. |
|
||||
| `envs` | Key-value map of environment variables injected into the daemon. Use for API keys not already in the shell environment. |
|
||||
|
||||
### Embedding Model Examples
|
||||
|
||||
**Local (sentence-transformers, no API key needed):**
|
||||
|
||||
```yaml
|
||||
embedding:
|
||||
provider: sentence-transformers
|
||||
model: sentence-transformers/all-MiniLM-L6-v2 # default, lightweight
|
||||
```
|
||||
|
||||
```yaml
|
||||
embedding:
|
||||
provider: sentence-transformers
|
||||
model: nomic-ai/CodeRankEmbed # better code retrieval, needs GPU (~1 GB VRAM)
|
||||
```
|
||||
|
||||
**Ollama (local):**
|
||||
|
||||
```yaml
|
||||
embedding:
|
||||
model: ollama/nomic-embed-text
|
||||
```
|
||||
|
||||
**OpenAI:**
|
||||
|
||||
```yaml
|
||||
embedding:
|
||||
model: text-embedding-3-small
|
||||
min_interval_ms: 300
|
||||
envs:
|
||||
OPENAI_API_KEY: your-api-key
|
||||
```
|
||||
|
||||
**Gemini:**
|
||||
|
||||
```yaml
|
||||
embedding:
|
||||
model: gemini/gemini-embedding-001
|
||||
envs:
|
||||
GEMINI_API_KEY: your-api-key
|
||||
```
|
||||
|
||||
**Voyage (code-optimized):**
|
||||
|
||||
```yaml
|
||||
embedding:
|
||||
model: voyage/voyage-code-3
|
||||
envs:
|
||||
VOYAGE_API_KEY: your-api-key
|
||||
```
|
||||
|
||||
For the full list of supported cloud providers and model identifiers, see [LiteLLM Embedding Models](https://docs.litellm.ai/docs/embedding/supported_embedding).
|
||||
|
||||
### Important
|
||||
|
||||
Switching embedding models changes vector dimensions — you must re-index after changing the model:
|
||||
|
||||
```bash
|
||||
ccc reset && ccc index
|
||||
```
|
||||
|
||||
## Project-Level Settings (`<project>/.cocoindex_code/settings.yml`)
|
||||
|
||||
Per-project. Controls which files to index. Created by `ccc init` and automatically added to `.gitignore`.
|
||||
|
||||
```yaml
|
||||
include_patterns:
|
||||
- "**/*.py"
|
||||
- "**/*.js"
|
||||
- "**/*.ts"
|
||||
# ... (sensible defaults for 28+ file types)
|
||||
|
||||
exclude_patterns:
|
||||
- "**/.*" # hidden directories
|
||||
- "**/__pycache__"
|
||||
- "**/node_modules"
|
||||
- "**/dist"
|
||||
# ...
|
||||
|
||||
language_overrides:
|
||||
- ext: inc # treat .inc files as PHP
|
||||
lang: php
|
||||
```
|
||||
|
||||
### Fields
|
||||
|
||||
| Field | Description |
|
||||
|-------|-------------|
|
||||
| `include_patterns` | Glob patterns for files to index. Defaults cover common languages (Python, JS/TS, Rust, Go, Java, C/C++, C#, SQL, Shell, Markdown, PHP, Lua, etc.). |
|
||||
| `exclude_patterns` | Glob patterns for files/directories to skip. Defaults exclude hidden dirs, `node_modules`, `dist`, `__pycache__`, `vendor`, etc. |
|
||||
| `language_overrides` | List of `{ext, lang}` pairs to override language detection for specific file extensions. |
|
||||
|
||||
### Editing Tips
|
||||
|
||||
- To index additional file types, append glob patterns to `include_patterns` (e.g. `"**/*.proto"`).
|
||||
- To exclude a directory, append to `exclude_patterns` (e.g. `"**/generated"`).
|
||||
- After editing, run `ccc index` to re-index with the new settings.
|
||||
Reference in New Issue
Block a user