feat: add marketplace skills as local project skills

Move skill definitions from the nanoclaw-skills marketplace plugin
into .claude/skills/ so they're available as unprefixed slash commands
(e.g. /add-whatsapp instead of /nanoclaw-skills:add-whatsapp).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
gavrielc
2026-03-10 02:25:17 +02:00
parent 621fde8c75
commit d572bab5c6
13 changed files with 2301 additions and 0 deletions

View File

@@ -0,0 +1,135 @@
---
name: add-compact
description: Add /compact command for manual context compaction. Solves context rot in long sessions by forwarding the SDK's built-in /compact slash command. Main-group or trusted sender only.
---
# Add /compact Command
Adds a `/compact` session command that compacts conversation history to fight context rot in long-running sessions. Uses the Claude Agent SDK's built-in `/compact` slash command — no synthetic system prompts.
**Session contract:** `/compact` keeps the same logical session alive. The SDK returns a new session ID after compaction (via the `init` system message), which the agent-runner forwards to the orchestrator as `newSessionId`. No destructive reset occurs — the agent retains summarized context.
## Phase 1: Pre-flight
Check if `src/session-commands.ts` exists:
```bash
test -f src/session-commands.ts && echo "Already applied" || echo "Not applied"
```
If already applied, skip to Phase 3 (Verify).
## Phase 2: Apply Code Changes
Merge the skill branch:
```bash
git fetch upstream skill/compact
git merge upstream/skill/compact
```
> **Note:** `upstream` is the remote pointing to `qwibitai/nanoclaw`. If using a different remote name, substitute accordingly.
This adds:
- `src/session-commands.ts` (extract and authorize session commands)
- `src/session-commands.test.ts` (unit tests for command parsing and auth)
- Session command interception in `src/index.ts` (both `processGroupMessages` and `startMessageLoop`)
- Slash command handling in `container/agent-runner/src/index.ts`
### Validate
```bash
npm test
npm run build
```
### Rebuild container
```bash
./container/build.sh
```
### Restart service
```bash
launchctl kickstart -k gui/$(id -u)/com.nanoclaw # macOS
# Linux: systemctl --user restart nanoclaw
```
## Phase 3: Verify
### Integration Test
1. Start NanoClaw in dev mode: `npm run dev`
2. From the **main group** (self-chat), send exactly: `/compact`
3. Verify:
- The agent acknowledges compaction (e.g., "Conversation compacted.")
- The session continues — send a follow-up message and verify the agent responds coherently
- A conversation archive is written to `groups/{folder}/conversations/` (by the PreCompact hook)
- Container logs show `Compact boundary observed` (confirms SDK actually compacted)
- If `compact_boundary` was NOT observed, the response says "compact_boundary was not observed"
4. From a **non-main group** as a non-admin user, send: `@<assistant> /compact`
5. Verify:
- The bot responds with "Session commands require admin access."
- No compaction occurs, no container is spawned for the command
6. From a **non-main group** as the admin (device owner / `is_from_me`), send: `@<assistant> /compact`
7. Verify:
- Compaction proceeds normally (same behavior as main group)
8. While an **active container** is running for the main group, send `/compact`
9. Verify:
- The active container is signaled to close (authorized senders only — untrusted senders cannot kill in-flight work)
- Compaction proceeds via a new container once the active one exits
- The command is not dropped (no cursor race)
10. Send a normal message, then `/compact`, then another normal message in quick succession (same polling batch):
11. Verify:
- Pre-compact messages are sent to the agent first (check container logs for two `runAgent` calls)
- Compaction proceeds after pre-compact messages are processed
- Messages **after** `/compact` in the batch are preserved (cursor advances to `/compact`'s timestamp only) and processed on the next poll cycle
12. From a **non-main group** as a non-admin user, send `@<assistant> /compact`:
13. Verify:
- Denial message is sent ("Session commands require admin access.")
- The `/compact` is consumed (cursor advanced) — it does NOT replay on future polls
- Other messages in the same batch are also consumed (cursor is a high-water mark — this is an accepted tradeoff for the narrow edge case of denied `/compact` + other messages in the same polling interval)
- No container is killed or interrupted
14. From a **non-main group** (with `requiresTrigger` enabled) as a non-admin user, send bare `/compact` (no trigger prefix):
15. Verify:
- No denial message is sent (trigger policy prevents untrusted bot responses)
- The `/compact` is consumed silently
- Note: in groups where `requiresTrigger` is `false`, a denial message IS sent because the sender is considered reachable
16. After compaction, verify **no auto-compaction** behavior — only manual `/compact` triggers it
### Validation on Fresh Clone
```bash
git clone <your-fork> /tmp/nanoclaw-test
cd /tmp/nanoclaw-test
claude # then run /add-compact
npm run build
npm test
./container/build.sh
# Manual: send /compact from main group, verify compaction + continuation
# Manual: send @<assistant> /compact from non-main as non-admin, verify denial
# Manual: send @<assistant> /compact from non-main as admin, verify allowed
# Manual: verify no auto-compaction behavior
```
## Security Constraints
- **Main-group or trusted/admin sender only.** The main group is the user's private self-chat and is trusted (see `docs/SECURITY.md`). Non-main groups are untrusted — a careless or malicious user could wipe the agent's short-term memory. However, the device owner (`is_from_me`) is always trusted and can compact from any group.
- **No auto-compaction.** This skill implements manual compaction only. Automatic threshold-based compaction is a separate concern and should be a separate skill.
- **No config file.** NanoClaw's philosophy is customization through code changes, not configuration sprawl.
- **Transcript archived before compaction.** The existing `PreCompact` hook in the agent-runner archives the full transcript to `conversations/` before the SDK compacts it.
- **Session continues after compaction.** This is not a destructive reset. The conversation continues with summarized context.
## What This Does NOT Do
- No automatic compaction threshold (add separately if desired)
- No `/clear` command (separate skill, separate semantics — `/clear` is a destructive reset)
- No cross-group compaction (each group's session is isolated)
- No changes to the container image, Dockerfile, or build script
## Troubleshooting
- **"Session commands require admin access"**: Only the device owner (`is_from_me`) or main-group senders can use `/compact`. Other users are denied.
- **No compact_boundary in logs**: The SDK may not emit this event in all versions. Check the agent-runner logs for the warning message. Compaction may still have succeeded.
- **Pre-compact failure**: If messages before `/compact` fail to process, the error message says "Failed to process messages before /compact." The cursor advances past sent output to prevent duplicates; `/compact` remains pending for the next attempt.

View File

@@ -0,0 +1,212 @@
---
name: add-discord
description: Add Discord bot channel integration to NanoClaw.
---
# Add Discord Channel
This skill adds Discord support to NanoClaw, then walks through interactive setup.
## Phase 1: Pre-flight
### Check if already applied
Check if `src/channels/discord.ts` exists. If it does, skip to Phase 3 (Setup). The code changes are already in place.
### Ask the user
Use `AskUserQuestion` to collect configuration:
AskUserQuestion: Do you have a Discord bot token, or do you need to create one?
If they have one, collect it now. If not, we'll create one in Phase 3.
## Phase 2: Apply Code Changes
### Ensure channel remote
```bash
git remote -v
```
If `discord` is missing, add it:
```bash
git remote add discord https://github.com/qwibitai/nanoclaw-discord.git
```
### Merge the skill branch
```bash
git fetch discord main
git merge discord/main
```
This merges in:
- `src/channels/discord.ts` (DiscordChannel class with self-registration via `registerChannel`)
- `src/channels/discord.test.ts` (unit tests with discord.js mock)
- `import './discord.js'` appended to the channel barrel file `src/channels/index.ts`
- `discord.js` npm dependency in `package.json`
- `DISCORD_BOT_TOKEN` in `.env.example`
If the merge reports conflicts, resolve them by reading the conflicted files and understanding the intent of both sides.
### Validate code changes
```bash
npm install
npm run build
npx vitest run src/channels/discord.test.ts
```
All tests must pass (including the new Discord tests) and build must be clean before proceeding.
## Phase 3: Setup
### Create Discord Bot (if needed)
If the user doesn't have a bot token, tell them:
> I need you to create a Discord bot:
>
> 1. Go to the [Discord Developer Portal](https://discord.com/developers/applications)
> 2. Click **New Application** and give it a name (e.g., "Andy Assistant")
> 3. Go to the **Bot** tab on the left sidebar
> 4. Click **Reset Token** to generate a new bot token — copy it immediately (you can only see it once)
> 5. Under **Privileged Gateway Intents**, enable:
> - **Message Content Intent** (required to read message text)
> - **Server Members Intent** (optional, for member display names)
> 6. Go to **OAuth2** > **URL Generator**:
> - Scopes: select `bot`
> - Bot Permissions: select `Send Messages`, `Read Message History`, `View Channels`
> - Copy the generated URL and open it in your browser to invite the bot to your server
Wait for the user to provide the token.
### Configure environment
Add to `.env`:
```bash
DISCORD_BOT_TOKEN=<their-token>
```
Channels auto-enable when their credentials are present — no extra configuration needed.
Sync to container environment:
```bash
mkdir -p data/env && cp .env data/env/env
```
The container reads environment from `data/env/env`, not `.env` directly.
### Build and restart
```bash
npm run build
launchctl kickstart -k gui/$(id -u)/com.nanoclaw
```
## Phase 4: Registration
### Get Channel ID
Tell the user:
> To get the channel ID for registration:
>
> 1. In Discord, go to **User Settings** > **Advanced** > Enable **Developer Mode**
> 2. Right-click the text channel you want the bot to respond in
> 3. Click **Copy Channel ID**
>
> The channel ID will be a long number like `1234567890123456`.
Wait for the user to provide the channel ID (format: `dc:1234567890123456`).
### Register the channel
Use the IPC register flow or register directly. The channel ID, name, and folder name are needed.
For a main channel (responds to all messages):
```typescript
registerGroup("dc:<channel-id>", {
name: "<server-name> #<channel-name>",
folder: "discord_main",
trigger: `@${ASSISTANT_NAME}`,
added_at: new Date().toISOString(),
requiresTrigger: false,
isMain: true,
});
```
For additional channels (trigger-only):
```typescript
registerGroup("dc:<channel-id>", {
name: "<server-name> #<channel-name>",
folder: "discord_<channel-name>",
trigger: `@${ASSISTANT_NAME}`,
added_at: new Date().toISOString(),
requiresTrigger: true,
});
```
## Phase 5: Verify
### Test the connection
Tell the user:
> Send a message in your registered Discord channel:
> - For main channel: Any message works
> - For non-main: @mention the bot in Discord
>
> The bot should respond within a few seconds.
### Check logs if needed
```bash
tail -f logs/nanoclaw.log
```
## Troubleshooting
### Bot not responding
1. Check `DISCORD_BOT_TOKEN` is set in `.env` AND synced to `data/env/env`
2. Check channel is registered: `sqlite3 store/messages.db "SELECT * FROM registered_groups WHERE jid LIKE 'dc:%'"`
3. For non-main channels: message must include trigger pattern (@mention the bot)
4. Service is running: `launchctl list | grep nanoclaw`
5. Verify the bot has been invited to the server (check OAuth2 URL was used)
### Bot only responds to @mentions
This is the default behavior for non-main channels (`requiresTrigger: true`). To change:
- Update the registered group's `requiresTrigger` to `false`
- Or register the channel as the main channel
### Message Content Intent not enabled
If the bot connects but can't read messages, ensure:
1. Go to [Discord Developer Portal](https://discord.com/developers/applications)
2. Select your application > **Bot** tab
3. Under **Privileged Gateway Intents**, enable **Message Content Intent**
4. Restart NanoClaw
### Getting Channel ID
If you can't copy the channel ID:
- Ensure **Developer Mode** is enabled: User Settings > Advanced > Developer Mode
- Right-click the channel name in the server sidebar > Copy Channel ID
## After Setup
The Discord bot supports:
- Text messages in registered channels
- Attachment descriptions (images, videos, files shown as placeholders)
- Reply context (shows who the user is replying to)
- @mention translation (Discord `<@botId>` → NanoClaw trigger format)
- Message splitting for responses over 2000 characters
- Typing indicators while the agent processes

View File

@@ -0,0 +1,216 @@
---
name: add-gmail
description: Add Gmail integration to NanoClaw. Can be configured as a tool (agent reads/sends emails when triggered from WhatsApp) or as a full channel (emails can trigger the agent, schedule tasks, and receive replies). Guides through GCP OAuth setup and implements the integration.
---
# Add Gmail Integration
This skill adds Gmail support to NanoClaw — either as a tool (read, send, search, draft) or as a full channel that polls the inbox.
## Phase 1: Pre-flight
### Check if already applied
Check if `src/channels/gmail.ts` exists. If it does, skip to Phase 3 (Setup). The code changes are already in place.
### Ask the user
Use `AskUserQuestion`:
AskUserQuestion: Should incoming emails be able to trigger the agent?
- **Yes** — Full channel mode: the agent listens on Gmail and responds to incoming emails automatically
- **No** — Tool-only: the agent gets full Gmail tools (read, send, search, draft) but won't monitor the inbox. No channel code is added.
## Phase 2: Apply Code Changes
### Ensure channel remote
```bash
git remote -v
```
If `gmail` is missing, add it:
```bash
git remote add gmail https://github.com/qwibitai/nanoclaw-gmail.git
```
### Merge the skill branch
```bash
git fetch gmail main
git merge gmail/main
```
This merges in:
- `src/channels/gmail.ts` (GmailChannel class with self-registration via `registerChannel`)
- `src/channels/gmail.test.ts` (unit tests)
- `import './gmail.js'` appended to the channel barrel file `src/channels/index.ts`
- Gmail credentials mount (`~/.gmail-mcp`) in `src/container-runner.ts`
- Gmail MCP server (`@gongrzhe/server-gmail-autoauth-mcp`) and `mcp__gmail__*` allowed tool in `container/agent-runner/src/index.ts`
- `googleapis` npm dependency in `package.json`
If the merge reports conflicts, resolve them by reading the conflicted files and understanding the intent of both sides.
### Add email handling instructions (Channel mode only)
If the user chose channel mode, append the following to `groups/main/CLAUDE.md` (before the formatting section):
```markdown
## Email Notifications
When you receive an email notification (messages starting with `[Email from ...`), inform the user about it but do NOT reply to the email unless specifically asked. You have Gmail tools available — use them only when the user explicitly asks you to reply, forward, or take action on an email.
```
### Validate code changes
```bash
npm install
npm run build
npx vitest run src/channels/gmail.test.ts
```
All tests must pass (including the new Gmail tests) and build must be clean before proceeding.
## Phase 3: Setup
### Check existing Gmail credentials
```bash
ls -la ~/.gmail-mcp/ 2>/dev/null || echo "No Gmail config found"
```
If `credentials.json` already exists, skip to "Build and restart" below.
### GCP Project Setup
Tell the user:
> I need you to set up Google Cloud OAuth credentials:
>
> 1. Open https://console.cloud.google.com — create a new project or select existing
> 2. Go to **APIs & Services > Library**, search "Gmail API", click **Enable**
> 3. Go to **APIs & Services > Credentials**, click **+ CREATE CREDENTIALS > OAuth client ID**
> - If prompted for consent screen: choose "External", fill in app name and email, save
> - Application type: **Desktop app**, name: anything (e.g., "NanoClaw Gmail")
> 4. Click **DOWNLOAD JSON** and save as `gcp-oauth.keys.json`
>
> Where did you save the file? (Give me the full path, or paste the file contents here)
If user provides a path, copy it:
```bash
mkdir -p ~/.gmail-mcp
cp "/path/user/provided/gcp-oauth.keys.json" ~/.gmail-mcp/gcp-oauth.keys.json
```
If user pastes JSON content, write it to `~/.gmail-mcp/gcp-oauth.keys.json`.
### OAuth Authorization
Tell the user:
> I'm going to run Gmail authorization. A browser window will open — sign in and grant access. If you see an "app isn't verified" warning, click "Advanced" then "Go to [app name] (unsafe)" — this is normal for personal OAuth apps.
Run the authorization:
```bash
npx -y @gongrzhe/server-gmail-autoauth-mcp auth
```
If that fails (some versions don't have an auth subcommand), try `timeout 60 npx -y @gongrzhe/server-gmail-autoauth-mcp || true`. Verify with `ls ~/.gmail-mcp/credentials.json`.
### Build and restart
Clear stale per-group agent-runner copies (they only get re-created if missing, so existing copies won't pick up the new Gmail server):
```bash
rm -r data/sessions/*/agent-runner-src 2>/dev/null || true
```
Rebuild the container (agent-runner changed):
```bash
cd container && ./build.sh
```
Then compile and restart:
```bash
npm run build
launchctl kickstart -k gui/$(id -u)/com.nanoclaw # macOS
# Linux: systemctl --user restart nanoclaw
```
## Phase 4: Verify
### Test tool access (both modes)
Tell the user:
> Gmail is connected! Send this in your main channel:
>
> `@Andy check my recent emails` or `@Andy list my Gmail labels`
### Test channel mode (Channel mode only)
Tell the user to send themselves a test email. The agent should pick it up within a minute. Monitor: `tail -f logs/nanoclaw.log | grep -iE "(gmail|email)"`.
Once verified, offer filter customization via `AskUserQuestion` — by default, only emails in the Primary inbox trigger the agent (Promotions, Social, Updates, and Forums are excluded). The user can keep this default or narrow further by sender, label, or keywords. No code changes needed for filters.
### Check logs if needed
```bash
tail -f logs/nanoclaw.log
```
## Troubleshooting
### Gmail connection not responding
Test directly:
```bash
npx -y @gongrzhe/server-gmail-autoauth-mcp
```
### OAuth token expired
Re-authorize:
```bash
rm ~/.gmail-mcp/credentials.json
npx -y @gongrzhe/server-gmail-autoauth-mcp
```
### Container can't access Gmail
- Verify `~/.gmail-mcp` is mounted: check `src/container-runner.ts` for the `.gmail-mcp` mount
- Check container logs: `cat groups/main/logs/container-*.log | tail -50`
### Emails not being detected (Channel mode only)
- By default, the channel polls unread Primary inbox emails (`is:unread category:primary`)
- Check logs for Gmail polling errors
## Removal
### Tool-only mode
1. Remove `~/.gmail-mcp` mount from `src/container-runner.ts`
2. Remove `gmail` MCP server and `mcp__gmail__*` from `container/agent-runner/src/index.ts`
3. Rebuild and restart
4. Clear stale agent-runner copies: `rm -r data/sessions/*/agent-runner-src 2>/dev/null || true`
5. Rebuild: `cd container && ./build.sh && cd .. && npm run build && launchctl kickstart -k gui/$(id -u)/com.nanoclaw` (macOS) or `systemctl --user restart nanoclaw` (Linux)
### Channel mode
1. Delete `src/channels/gmail.ts` and `src/channels/gmail.test.ts`
2. Remove `import './gmail.js'` from `src/channels/index.ts`
3. Remove `~/.gmail-mcp` mount from `src/container-runner.ts`
4. Remove `gmail` MCP server and `mcp__gmail__*` from `container/agent-runner/src/index.ts`
5. Uninstall: `npm uninstall googleapis`
6. Rebuild and restart
7. Clear stale agent-runner copies: `rm -r data/sessions/*/agent-runner-src 2>/dev/null || true`
8. Rebuild: `cd container && ./build.sh && cd .. && npm run build && launchctl kickstart -k gui/$(id -u)/com.nanoclaw` (macOS) or `systemctl --user restart nanoclaw` (Linux)

View File

@@ -0,0 +1,90 @@
---
name: add-image-vision
description: Add image vision to NanoClaw agents. Resizes and processes WhatsApp image attachments, then sends them to Claude as multimodal content blocks.
---
# Image Vision Skill
Adds the ability for NanoClaw agents to see and understand images sent via WhatsApp. Images are downloaded, resized with sharp, saved to the group workspace, and passed to the agent as base64-encoded multimodal content blocks.
## Phase 1: Pre-flight
1. Check if `src/image.ts` exists — skip to Phase 3 if already applied
2. Confirm `sharp` is installable (native bindings require build tools)
**Prerequisite:** WhatsApp must be installed first (`skill/whatsapp` merged). This skill modifies WhatsApp channel files.
## Phase 2: Apply Code Changes
### Ensure WhatsApp fork remote
```bash
git remote -v
```
If `whatsapp` is missing, add it:
```bash
git remote add whatsapp https://github.com/qwibitai/nanoclaw-whatsapp.git
```
### Merge the skill branch
```bash
git fetch whatsapp skill/image-vision
git merge whatsapp/skill/image-vision
```
This merges in:
- `src/image.ts` (image download, resize via sharp, base64 encoding)
- `src/image.test.ts` (8 unit tests)
- Image attachment handling in `src/channels/whatsapp.ts`
- Image passing to agent in `src/index.ts` and `src/container-runner.ts`
- Image content block support in `container/agent-runner/src/index.ts`
- `sharp` npm dependency in `package.json`
If the merge reports conflicts, resolve them by reading the conflicted files and understanding the intent of both sides.
### Validate code changes
```bash
npm install
npm run build
npx vitest run src/image.test.ts
```
All tests must pass and build must be clean before proceeding.
## Phase 3: Configure
1. Rebuild the container (agent-runner changes need a rebuild):
```bash
./container/build.sh
```
2. Sync agent-runner source to group caches:
```bash
for dir in data/sessions/*/agent-runner-src/; do
cp container/agent-runner/src/*.ts "$dir"
done
```
3. Restart the service:
```bash
launchctl kickstart -k gui/$(id -u)/com.nanoclaw
```
## Phase 4: Verify
1. Send an image in a registered WhatsApp group
2. Check the agent responds with understanding of the image content
3. Check logs for "Processed image attachment":
```bash
tail -50 groups/*/logs/container-*.log
```
## Troubleshooting
- **"Image - download failed"**: Check WhatsApp connection stability. The download may timeout on slow connections.
- **"Image - processing failed"**: Sharp may not be installed correctly. Run `npm ls sharp` to verify.
- **Agent doesn't mention image content**: Check container logs for "Loaded image" messages. If missing, ensure agent-runner source was synced to group caches.

View File

@@ -0,0 +1,153 @@
---
name: add-ollama-tool
description: Add Ollama MCP server so the container agent can call local models for cheaper/faster tasks like summarization, translation, or general queries.
---
# Add Ollama Integration
This skill adds a stdio-based MCP server that exposes local Ollama models as tools for the container agent. Claude remains the orchestrator but can offload work to local models.
Tools added:
- `ollama_list_models` — lists installed Ollama models
- `ollama_generate` — sends a prompt to a specified model and returns the response
## Phase 1: Pre-flight
### Check if already applied
Check if `container/agent-runner/src/ollama-mcp-stdio.ts` exists. If it does, skip to Phase 3 (Configure).
### Check prerequisites
Verify Ollama is installed and running on the host:
```bash
ollama list
```
If Ollama is not installed, direct the user to https://ollama.com/download.
If no models are installed, suggest pulling one:
> You need at least one model. I recommend:
>
> ```bash
> ollama pull gemma3:1b # Small, fast (1GB)
> ollama pull llama3.2 # Good general purpose (2GB)
> ollama pull qwen3-coder:30b # Best for code tasks (18GB)
> ```
## Phase 2: Apply Code Changes
### Ensure upstream remote
```bash
git remote -v
```
If `upstream` is missing, add it:
```bash
git remote add upstream https://github.com/qwibitai/nanoclaw.git
```
### Merge the skill branch
```bash
git fetch upstream skill/ollama-tool
git merge upstream/skill/ollama-tool
```
This merges in:
- `container/agent-runner/src/ollama-mcp-stdio.ts` (Ollama MCP server)
- `scripts/ollama-watch.sh` (macOS notification watcher)
- Ollama MCP config in `container/agent-runner/src/index.ts` (allowedTools + mcpServers)
- `[OLLAMA]` log surfacing in `src/container-runner.ts`
- `OLLAMA_HOST` in `.env.example`
If the merge reports conflicts, resolve them by reading the conflicted files and understanding the intent of both sides.
### Copy to per-group agent-runner
Existing groups have a cached copy of the agent-runner source. Copy the new files:
```bash
for dir in data/sessions/*/agent-runner-src; do
cp container/agent-runner/src/ollama-mcp-stdio.ts "$dir/"
cp container/agent-runner/src/index.ts "$dir/"
done
```
### Validate code changes
```bash
npm run build
./container/build.sh
```
Build must be clean before proceeding.
## Phase 3: Configure
### Set Ollama host (optional)
By default, the MCP server connects to `http://host.docker.internal:11434` (Docker Desktop) with a fallback to `localhost`. To use a custom Ollama host, add to `.env`:
```bash
OLLAMA_HOST=http://your-ollama-host:11434
```
### Restart the service
```bash
launchctl kickstart -k gui/$(id -u)/com.nanoclaw # macOS
# Linux: systemctl --user restart nanoclaw
```
## Phase 4: Verify
### Test via WhatsApp
Tell the user:
> Send a message like: "use ollama to tell me the capital of France"
>
> The agent should use `ollama_list_models` to find available models, then `ollama_generate` to get a response.
### Monitor activity (optional)
Run the watcher script for macOS notifications when Ollama is used:
```bash
./scripts/ollama-watch.sh
```
### Check logs if needed
```bash
tail -f logs/nanoclaw.log | grep -i ollama
```
Look for:
- `Agent output: ... Ollama ...` — agent used Ollama successfully
- `[OLLAMA] >>> Generating` — generation started (if log surfacing works)
- `[OLLAMA] <<< Done` — generation completed
## Troubleshooting
### Agent says "Ollama is not installed"
The agent is trying to run `ollama` CLI inside the container instead of using the MCP tools. This means:
1. The MCP server wasn't registered — check `container/agent-runner/src/index.ts` has the `ollama` entry in `mcpServers`
2. The per-group source wasn't updated — re-copy files (see Phase 2)
3. The container wasn't rebuilt — run `./container/build.sh`
### "Failed to connect to Ollama"
1. Verify Ollama is running: `ollama list`
2. Check Docker can reach the host: `docker run --rm curlimages/curl curl -s http://host.docker.internal:11434/api/tags`
3. If using a custom host, check `OLLAMA_HOST` in `.env`
### Agent doesn't use Ollama tools
The agent may not know about the tools. Try being explicit: "use the ollama_generate tool with gemma3:1b to answer: ..."

View File

@@ -0,0 +1,100 @@
---
name: add-pdf-reader
description: Add PDF reading to NanoClaw agents. Extracts text from PDFs via pdftotext CLI. Handles WhatsApp attachments, URLs, and local files.
---
# Add PDF Reader
Adds PDF reading capability to all container agents using poppler-utils (pdftotext/pdfinfo). PDFs sent as WhatsApp attachments are auto-downloaded to the group workspace.
## Phase 1: Pre-flight
1. Check if `container/skills/pdf-reader/pdf-reader` exists — skip to Phase 3 if already applied
2. Confirm WhatsApp is installed first (`skill/whatsapp` merged). This skill modifies WhatsApp channel files.
## Phase 2: Apply Code Changes
### Ensure WhatsApp fork remote
```bash
git remote -v
```
If `whatsapp` is missing, add it:
```bash
git remote add whatsapp https://github.com/qwibitai/nanoclaw-whatsapp.git
```
### Merge the skill branch
```bash
git fetch whatsapp skill/pdf-reader
git merge whatsapp/skill/pdf-reader
```
This merges in:
- `container/skills/pdf-reader/SKILL.md` (agent-facing documentation)
- `container/skills/pdf-reader/pdf-reader` (CLI script)
- `poppler-utils` in `container/Dockerfile`
- PDF attachment download in `src/channels/whatsapp.ts`
- PDF tests in `src/channels/whatsapp.test.ts`
If the merge reports conflicts, resolve them by reading the conflicted files and understanding the intent of both sides.
### Validate
```bash
npm run build
npx vitest run src/channels/whatsapp.test.ts
```
### Rebuild container
```bash
./container/build.sh
```
### Restart service
```bash
launchctl kickstart -k gui/$(id -u)/com.nanoclaw # macOS
# Linux: systemctl --user restart nanoclaw
```
## Phase 3: Verify
### Test PDF extraction
Send a PDF file in any registered WhatsApp chat. The agent should:
1. Download the PDF to `attachments/`
2. Respond acknowledging the PDF
3. Be able to extract text when asked
### Test URL fetching
Ask the agent to read a PDF from a URL. It should use `pdf-reader fetch <url>`.
### Check logs if needed
```bash
tail -f logs/nanoclaw.log | grep -i pdf
```
Look for:
- `Downloaded PDF attachment` — successful download
- `Failed to download PDF attachment` — media download issue
## Troubleshooting
### Agent says pdf-reader command not found
Container needs rebuilding. Run `./container/build.sh` and restart the service.
### PDF text extraction is empty
The PDF may be scanned (image-based). pdftotext only handles text-based PDFs. Consider using the agent-browser to open the PDF visually instead.
### WhatsApp PDF not detected
Verify the message has `documentMessage` with `mimetype: application/pdf`. Some file-sharing apps send PDFs as generic files without the correct mimetype.

View File

@@ -0,0 +1,113 @@
---
name: add-reactions
description: Add WhatsApp emoji reaction support — receive, send, store, and search reactions.
---
# Add Reactions
This skill adds emoji reaction support to NanoClaw's WhatsApp channel: receive and store reactions, send reactions from the container agent via MCP tool, and query reaction history from SQLite.
## Phase 1: Pre-flight
### Check if already applied
Check if `src/status-tracker.ts` exists:
```bash
test -f src/status-tracker.ts && echo "Already applied" || echo "Not applied"
```
If already applied, skip to Phase 3 (Verify).
## Phase 2: Apply Code Changes
### Ensure WhatsApp fork remote
```bash
git remote -v
```
If `whatsapp` is missing, add it:
```bash
git remote add whatsapp https://github.com/qwibitai/nanoclaw-whatsapp.git
```
### Merge the skill branch
```bash
git fetch whatsapp skill/reactions
git merge whatsapp/skill/reactions
```
This adds:
- `scripts/migrate-reactions.ts` (database migration for `reactions` table with composite PK and indexes)
- `src/status-tracker.ts` (forward-only emoji state machine for message lifecycle signaling, with persistence and retry)
- `src/status-tracker.test.ts` (unit tests for StatusTracker)
- `container/skills/reactions/SKILL.md` (agent-facing documentation for the `react_to_message` MCP tool)
- Reaction support in `src/db.ts`, `src/channels/whatsapp.ts`, `src/types.ts`, `src/ipc.ts`, `src/index.ts`, `src/group-queue.ts`, and `container/agent-runner/src/ipc-mcp-stdio.ts`
### Run database migration
```bash
npx tsx scripts/migrate-reactions.ts
```
### Validate code changes
```bash
npm test
npm run build
```
All tests must pass and build must be clean before proceeding.
## Phase 3: Verify
### Build and restart
```bash
npm run build
```
Linux:
```bash
systemctl --user restart nanoclaw
```
macOS:
```bash
launchctl kickstart -k gui/$(id -u)/com.nanoclaw
```
### Test receiving reactions
1. Send a message from your phone
2. React to it with an emoji on WhatsApp
3. Check the database:
```bash
sqlite3 store/messages.db "SELECT * FROM reactions ORDER BY timestamp DESC LIMIT 5;"
```
### Test sending reactions
Ask the agent to react to a message via the `react_to_message` MCP tool. Check your phone — the reaction should appear on the message.
## Troubleshooting
### Reactions not appearing in database
- Check NanoClaw logs for `Failed to process reaction` errors
- Verify the chat is registered
- Confirm the service is running
### Migration fails
- Ensure `store/messages.db` exists and is accessible
- If "table reactions already exists", the migration already ran — skip it
### Agent can't send reactions
- Check IPC logs for `Unauthorized IPC reaction attempt blocked` — the agent can only react in its own group's chat
- Verify WhatsApp is connected: check logs for connection status

View File

@@ -0,0 +1,216 @@
---
name: add-slack
description: Add Slack as a channel. Can replace WhatsApp entirely or run alongside it. Uses Socket Mode (no public URL needed).
---
# Add Slack Channel
This skill adds Slack support to NanoClaw, then walks through interactive setup.
## Phase 1: Pre-flight
### Check if already applied
Check if `src/channels/slack.ts` exists. If it does, skip to Phase 3 (Setup). The code changes are already in place.
### Ask the user
**Do they already have a Slack app configured?** If yes, collect the Bot Token and App Token now. If no, we'll create one in Phase 3.
## Phase 2: Apply Code Changes
### Ensure channel remote
```bash
git remote -v
```
If `slack` is missing, add it:
```bash
git remote add slack https://github.com/qwibitai/nanoclaw-slack.git
```
### Merge the skill branch
```bash
git fetch slack main
git merge slack/main
```
This merges in:
- `src/channels/slack.ts` (SlackChannel class with self-registration via `registerChannel`)
- `src/channels/slack.test.ts` (46 unit tests)
- `import './slack.js'` appended to the channel barrel file `src/channels/index.ts`
- `@slack/bolt` npm dependency in `package.json`
- `SLACK_BOT_TOKEN` and `SLACK_APP_TOKEN` in `.env.example`
If the merge reports conflicts, resolve them by reading the conflicted files and understanding the intent of both sides.
### Validate code changes
```bash
npm install
npm run build
npx vitest run src/channels/slack.test.ts
```
All tests must pass (including the new Slack tests) and build must be clean before proceeding.
## Phase 3: Setup
### Create Slack App (if needed)
If the user doesn't have a Slack app, share [SLACK_SETUP.md](SLACK_SETUP.md) which has step-by-step instructions with screenshots guidance, troubleshooting, and a token reference table.
Quick summary of what's needed:
1. Create a Slack app at [api.slack.com/apps](https://api.slack.com/apps)
2. Enable Socket Mode and generate an App-Level Token (`xapp-...`)
3. Subscribe to bot events: `message.channels`, `message.groups`, `message.im`
4. Add OAuth scopes: `chat:write`, `channels:history`, `groups:history`, `im:history`, `channels:read`, `groups:read`, `users:read`
5. Install to workspace and copy the Bot Token (`xoxb-...`)
Wait for the user to provide both tokens.
### Configure environment
Add to `.env`:
```bash
SLACK_BOT_TOKEN=xoxb-your-bot-token
SLACK_APP_TOKEN=xapp-your-app-token
```
Channels auto-enable when their credentials are present — no extra configuration needed.
Sync to container environment:
```bash
mkdir -p data/env && cp .env data/env/env
```
The container reads environment from `data/env/env`, not `.env` directly.
### Build and restart
```bash
npm run build
launchctl kickstart -k gui/$(id -u)/com.nanoclaw
```
## Phase 4: Registration
### Get Channel ID
Tell the user:
> 1. Add the bot to a Slack channel (right-click channel → **View channel details** → **Integrations** → **Add apps**)
> 2. In that channel, the channel ID is in the URL when you open it in a browser: `https://app.slack.com/client/T.../C0123456789` — the `C...` part is the channel ID
> 3. Alternatively, right-click the channel name → **Copy link** — the channel ID is the last path segment
>
> The JID format for NanoClaw is: `slack:C0123456789`
Wait for the user to provide the channel ID.
### Register the channel
Use the IPC register flow or register directly. The channel ID, name, and folder name are needed.
For a main channel (responds to all messages):
```typescript
registerGroup("slack:<channel-id>", {
name: "<channel-name>",
folder: "slack_main",
trigger: `@${ASSISTANT_NAME}`,
added_at: new Date().toISOString(),
requiresTrigger: false,
isMain: true,
});
```
For additional channels (trigger-only):
```typescript
registerGroup("slack:<channel-id>", {
name: "<channel-name>",
folder: "slack_<channel-name>",
trigger: `@${ASSISTANT_NAME}`,
added_at: new Date().toISOString(),
requiresTrigger: true,
});
```
## Phase 5: Verify
### Test the connection
Tell the user:
> Send a message in your registered Slack channel:
> - For main channel: Any message works
> - For non-main: `@<assistant-name> hello` (using the configured trigger word)
>
> The bot should respond within a few seconds.
### Check logs if needed
```bash
tail -f logs/nanoclaw.log
```
## Troubleshooting
### Bot not responding
1. Check `SLACK_BOT_TOKEN` and `SLACK_APP_TOKEN` are set in `.env` AND synced to `data/env/env`
2. Check channel is registered: `sqlite3 store/messages.db "SELECT * FROM registered_groups WHERE jid LIKE 'slack:%'"`
3. For non-main channels: message must include trigger pattern
4. Service is running: `launchctl list | grep nanoclaw`
### Bot connected but not receiving messages
1. Verify Socket Mode is enabled in the Slack app settings
2. Verify the bot is subscribed to the correct events (`message.channels`, `message.groups`, `message.im`)
3. Verify the bot has been added to the channel
4. Check that the bot has the required OAuth scopes
### Bot not seeing messages in channels
By default, bots only see messages in channels they've been explicitly added to. Make sure to:
1. Add the bot to each channel you want it to monitor
2. Check the bot has `channels:history` and/or `groups:history` scopes
### "missing_scope" errors
If the bot logs `missing_scope` errors:
1. Go to **OAuth & Permissions** in your Slack app settings
2. Add the missing scope listed in the error message
3. **Reinstall the app** to your workspace — scope changes require reinstallation
4. Copy the new Bot Token (it changes on reinstall) and update `.env`
5. Sync: `mkdir -p data/env && cp .env data/env/env`
6. Restart: `launchctl kickstart -k gui/$(id -u)/com.nanoclaw`
### Getting channel ID
If the channel ID is hard to find:
- In Slack desktop: right-click channel → **Copy link** → extract the `C...` ID from the URL
- In Slack web: the URL shows `https://app.slack.com/client/TXXXXXXX/C0123456789`
- Via API: `curl -s -H "Authorization: Bearer $SLACK_BOT_TOKEN" "https://slack.com/api/conversations.list" | jq '.channels[] | {id, name}'`
## After Setup
The Slack channel supports:
- **Public channels** — Bot must be added to the channel
- **Private channels** — Bot must be invited to the channel
- **Direct messages** — Users can DM the bot directly
- **Multi-channel** — Can run alongside WhatsApp or other channels (auto-enabled by credentials)
## Known Limitations
- **Threads are flattened** — Threaded replies are delivered to the agent as regular channel messages. The agent sees them but has no awareness they originated in a thread. Responses always go to the channel, not back into the thread. Users in a thread will need to check the main channel for the bot's reply. Full thread-aware routing (respond in-thread) requires pipeline-wide changes: database schema, `NewMessage` type, `Channel.sendMessage` interface, and routing logic.
- **No typing indicator** — Slack's Bot API does not expose a typing indicator endpoint. The `setTyping()` method is a no-op. Users won't see "bot is typing..." while the agent works.
- **Message splitting is naive** — Long messages are split at a fixed 4000-character boundary, which may break mid-word or mid-sentence. A smarter split (on paragraph or sentence boundaries) would improve readability.
- **No file/image handling** — The bot only processes text content. File uploads, images, and rich message blocks are not forwarded to the agent.
- **Channel metadata sync is unbounded** — `syncChannelMetadata()` paginates through all channels the bot is a member of, but has no upper bound or timeout. Workspaces with thousands of channels may experience slow startup.
- **Workspace admin policies not detected** — If the Slack workspace restricts bot app installation, the setup will fail at the "Install to Workspace" step with no programmatic detection or guidance. See SLACK_SETUP.md troubleshooting section.

View File

@@ -0,0 +1,231 @@
---
name: add-telegram
description: Add Telegram as a channel. Can replace WhatsApp entirely or run alongside it. Also configurable as a control-only channel (triggers actions) or passive channel (receives notifications only).
---
# Add Telegram Channel
This skill adds Telegram support to NanoClaw, then walks through interactive setup.
## Phase 1: Pre-flight
### Check if already applied
Check if `src/channels/telegram.ts` exists. If it does, skip to Phase 3 (Setup). The code changes are already in place.
### Ask the user
Use `AskUserQuestion` to collect configuration:
AskUserQuestion: Do you have a Telegram bot token, or do you need to create one?
If they have one, collect it now. If not, we'll create one in Phase 3.
## Phase 2: Apply Code Changes
### Ensure channel remote
```bash
git remote -v
```
If `telegram` is missing, add it:
```bash
git remote add telegram https://github.com/qwibitai/nanoclaw-telegram.git
```
### Merge the skill branch
```bash
git fetch telegram main
git merge telegram/main
```
This merges in:
- `src/channels/telegram.ts` (TelegramChannel class with self-registration via `registerChannel`)
- `src/channels/telegram.test.ts` (unit tests with grammy mock)
- `import './telegram.js'` appended to the channel barrel file `src/channels/index.ts`
- `grammy` npm dependency in `package.json`
- `TELEGRAM_BOT_TOKEN` in `.env.example`
If the merge reports conflicts, resolve them by reading the conflicted files and understanding the intent of both sides.
### Validate code changes
```bash
npm install
npm run build
npx vitest run src/channels/telegram.test.ts
```
All tests must pass (including the new Telegram tests) and build must be clean before proceeding.
## Phase 3: Setup
### Create Telegram Bot (if needed)
If the user doesn't have a bot token, tell them:
> I need you to create a Telegram bot:
>
> 1. Open Telegram and search for `@BotFather`
> 2. Send `/newbot` and follow prompts:
> - Bot name: Something friendly (e.g., "Andy Assistant")
> - Bot username: Must end with "bot" (e.g., "andy_ai_bot")
> 3. Copy the bot token (looks like `123456:ABC-DEF1234ghIkl-zyx57W2v1u123ew11`)
Wait for the user to provide the token.
### Configure environment
Add to `.env`:
```bash
TELEGRAM_BOT_TOKEN=<their-token>
```
Channels auto-enable when their credentials are present — no extra configuration needed.
Sync to container environment:
```bash
mkdir -p data/env && cp .env data/env/env
```
The container reads environment from `data/env/env`, not `.env` directly.
### Disable Group Privacy (for group chats)
Tell the user:
> **Important for group chats**: By default, Telegram bots only see @mentions and commands in groups. To let the bot see all messages:
>
> 1. Open Telegram and search for `@BotFather`
> 2. Send `/mybots` and select your bot
> 3. Go to **Bot Settings** > **Group Privacy** > **Turn off**
>
> This is optional if you only want trigger-based responses via @mentioning the bot.
### Build and restart
```bash
npm run build
launchctl kickstart -k gui/$(id -u)/com.nanoclaw # macOS
# Linux: systemctl --user restart nanoclaw
```
## Phase 4: Registration
### Get Chat ID
Tell the user:
> 1. Open your bot in Telegram (search for its username)
> 2. Send `/chatid` — it will reply with the chat ID
> 3. For groups: add the bot to the group first, then send `/chatid` in the group
Wait for the user to provide the chat ID (format: `tg:123456789` or `tg:-1001234567890`).
### Register the chat
Use the IPC register flow or register directly. The chat ID, name, and folder name are needed.
For a main chat (responds to all messages):
```typescript
registerGroup("tg:<chat-id>", {
name: "<chat-name>",
folder: "telegram_main",
trigger: `@${ASSISTANT_NAME}`,
added_at: new Date().toISOString(),
requiresTrigger: false,
isMain: true,
});
```
For additional chats (trigger-only):
```typescript
registerGroup("tg:<chat-id>", {
name: "<chat-name>",
folder: "telegram_<group-name>",
trigger: `@${ASSISTANT_NAME}`,
added_at: new Date().toISOString(),
requiresTrigger: true,
});
```
## Phase 5: Verify
### Test the connection
Tell the user:
> Send a message to your registered Telegram chat:
> - For main chat: Any message works
> - For non-main: `@Andy hello` or @mention the bot
>
> The bot should respond within a few seconds.
### Check logs if needed
```bash
tail -f logs/nanoclaw.log
```
## Troubleshooting
### Bot not responding
Check:
1. `TELEGRAM_BOT_TOKEN` is set in `.env` AND synced to `data/env/env`
2. Chat is registered in SQLite (check with: `sqlite3 store/messages.db "SELECT * FROM registered_groups WHERE jid LIKE 'tg:%'"`)
3. For non-main chats: message includes trigger pattern
4. Service is running: `launchctl list | grep nanoclaw` (macOS) or `systemctl --user status nanoclaw` (Linux)
### Bot only responds to @mentions in groups
Group Privacy is enabled (default). Fix:
1. `@BotFather` > `/mybots` > select bot > **Bot Settings** > **Group Privacy** > **Turn off**
2. Remove and re-add the bot to the group (required for the change to take effect)
### Getting chat ID
If `/chatid` doesn't work:
- Verify token: `curl -s "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/getMe"`
- Check bot is started: `tail -f logs/nanoclaw.log`
## After Setup
If running `npm run dev` while the service is active:
```bash
# macOS:
launchctl unload ~/Library/LaunchAgents/com.nanoclaw.plist
npm run dev
# When done testing:
launchctl load ~/Library/LaunchAgents/com.nanoclaw.plist
# Linux:
# systemctl --user stop nanoclaw
# npm run dev
# systemctl --user start nanoclaw
```
## Agent Swarms (Teams)
After completing the Telegram setup, use `AskUserQuestion`:
AskUserQuestion: Would you like to add Agent Swarm support? Without it, Agent Teams still work — they just operate behind the scenes. With Swarm support, each subagent appears as a different bot in the Telegram group so you can see who's saying what and have interactive team sessions.
If they say yes, invoke the `/add-telegram-swarm` skill.
## Removal
To remove Telegram integration:
1. Delete `src/channels/telegram.ts` and `src/channels/telegram.test.ts`
2. Remove `import './telegram.js'` from `src/channels/index.ts`
3. Remove `TELEGRAM_BOT_TOKEN` from `.env`
4. Remove Telegram registrations from SQLite: `sqlite3 store/messages.db "DELETE FROM registered_groups WHERE jid LIKE 'tg:%'"`
5. Uninstall: `npm uninstall grammy`
6. Rebuild: `npm run build && launchctl kickstart -k gui/$(id -u)/com.nanoclaw` (macOS) or `npm run build && systemctl --user restart nanoclaw` (Linux)

View File

@@ -0,0 +1,144 @@
---
name: add-voice-transcription
description: Add voice message transcription to NanoClaw using OpenAI's Whisper API. Automatically transcribes WhatsApp voice notes so the agent can read and respond to them.
---
# Add Voice Transcription
This skill adds automatic voice message transcription to NanoClaw's WhatsApp channel using OpenAI's Whisper API. When a voice note arrives, it is downloaded, transcribed, and delivered to the agent as `[Voice: <transcript>]`.
## Phase 1: Pre-flight
### Check if already applied
Check if `src/transcription.ts` exists. If it does, skip to Phase 3 (Configure). The code changes are already in place.
### Ask the user
Use `AskUserQuestion` to collect information:
AskUserQuestion: Do you have an OpenAI API key for Whisper transcription?
If yes, collect it now. If no, direct them to create one at https://platform.openai.com/api-keys.
## Phase 2: Apply Code Changes
**Prerequisite:** WhatsApp must be installed first (`skill/whatsapp` merged). This skill modifies WhatsApp channel files.
### Ensure WhatsApp fork remote
```bash
git remote -v
```
If `whatsapp` is missing, add it:
```bash
git remote add whatsapp https://github.com/qwibitai/nanoclaw-whatsapp.git
```
### Merge the skill branch
```bash
git fetch whatsapp skill/voice-transcription
git merge whatsapp/skill/voice-transcription
```
This merges in:
- `src/transcription.ts` (voice transcription module using OpenAI Whisper)
- Voice handling in `src/channels/whatsapp.ts` (isVoiceMessage check, transcribeAudioMessage call)
- Transcription tests in `src/channels/whatsapp.test.ts`
- `openai` npm dependency in `package.json`
- `OPENAI_API_KEY` in `.env.example`
If the merge reports conflicts, resolve them by reading the conflicted files and understanding the intent of both sides.
### Validate code changes
```bash
npm install --legacy-peer-deps
npm run build
npx vitest run src/channels/whatsapp.test.ts
```
All tests must pass and build must be clean before proceeding.
## Phase 3: Configure
### Get OpenAI API key (if needed)
If the user doesn't have an API key:
> I need you to create an OpenAI API key:
>
> 1. Go to https://platform.openai.com/api-keys
> 2. Click "Create new secret key"
> 3. Give it a name (e.g., "NanoClaw Transcription")
> 4. Copy the key (starts with `sk-`)
>
> Cost: ~$0.006 per minute of audio (~$0.003 per typical 30-second voice note)
Wait for the user to provide the key.
### Add to environment
Add to `.env`:
```bash
OPENAI_API_KEY=<their-key>
```
Sync to container environment:
```bash
mkdir -p data/env && cp .env data/env/env
```
The container reads environment from `data/env/env`, not `.env` directly.
### Build and restart
```bash
npm run build
launchctl kickstart -k gui/$(id -u)/com.nanoclaw # macOS
# Linux: systemctl --user restart nanoclaw
```
## Phase 4: Verify
### Test with a voice note
Tell the user:
> Send a voice note in any registered WhatsApp chat. The agent should receive it as `[Voice: <transcript>]` and respond to its content.
### Check logs if needed
```bash
tail -f logs/nanoclaw.log | grep -i voice
```
Look for:
- `Transcribed voice message` — successful transcription with character count
- `OPENAI_API_KEY not set` — key missing from `.env`
- `OpenAI transcription failed` — API error (check key validity, billing)
- `Failed to download audio message` — media download issue
## Troubleshooting
### Voice notes show "[Voice Message - transcription unavailable]"
1. Check `OPENAI_API_KEY` is set in `.env` AND synced to `data/env/env`
2. Verify key works: `curl -s https://api.openai.com/v1/models -H "Authorization: Bearer $OPENAI_API_KEY" | head -c 200`
3. Check OpenAI billing — Whisper requires a funded account
### Voice notes show "[Voice Message - transcription failed]"
Check logs for the specific error. Common causes:
- Network timeout — transient, will work on next message
- Invalid API key — regenerate at https://platform.openai.com/api-keys
- Rate limiting — wait and retry
### Agent doesn't respond to voice notes
Verify the chat is registered and the agent is running. Voice transcription only runs for registered groups.

View File

@@ -0,0 +1,368 @@
---
name: add-whatsapp
description: Add WhatsApp as a channel. Can replace other channels entirely or run alongside them. Uses QR code or pairing code for authentication.
---
# Add WhatsApp Channel
This skill adds WhatsApp support to NanoClaw. It installs the WhatsApp channel code, dependencies, and guides through authentication, registration, and configuration.
## Phase 1: Pre-flight
### Check current state
Check if WhatsApp is already configured. If `store/auth/` exists with credential files, skip to Phase 4 (Registration) or Phase 5 (Verify).
```bash
ls store/auth/creds.json 2>/dev/null && echo "WhatsApp auth exists" || echo "No WhatsApp auth"
```
### Detect environment
Check whether the environment is headless (no display server):
```bash
[[ -z "$DISPLAY" && -z "$WAYLAND_DISPLAY" && "$OSTYPE" != darwin* ]] && echo "IS_HEADLESS=true" || echo "IS_HEADLESS=false"
```
### Ask the user
Use `AskUserQuestion` to collect configuration. **Adapt auth options based on environment:**
If IS_HEADLESS=true AND not WSL → AskUserQuestion: How do you want to authenticate WhatsApp?
- **Pairing code** (Recommended) - Enter a numeric code on your phone (no camera needed, requires phone number)
- **QR code in terminal** - Displays QR code in the terminal (can be too small on some displays)
Otherwise (macOS, desktop Linux, or WSL) → AskUserQuestion: How do you want to authenticate WhatsApp?
- **QR code in browser** (Recommended) - Opens a browser window with a large, scannable QR code
- **Pairing code** - Enter a numeric code on your phone (no camera needed, requires phone number)
- **QR code in terminal** - Displays QR code in the terminal (can be too small on some displays)
If they chose pairing code:
AskUserQuestion: What is your phone number? (Include country code without +, e.g., 1234567890)
## Phase 2: Apply Code Changes
Check if `src/channels/whatsapp.ts` already exists. If it does, skip to Phase 3 (Authentication).
### Ensure channel remote
```bash
git remote -v
```
If `whatsapp` is missing, add it:
```bash
git remote add whatsapp https://github.com/qwibitai/nanoclaw-whatsapp.git
```
### Merge the skill branch
```bash
git fetch whatsapp main
git merge whatsapp/main
```
This merges in:
- `src/channels/whatsapp.ts` (WhatsAppChannel class with self-registration via `registerChannel`)
- `src/channels/whatsapp.test.ts` (41 unit tests)
- `src/whatsapp-auth.ts` (standalone WhatsApp authentication script)
- `setup/whatsapp-auth.ts` (WhatsApp auth setup step)
- `import './whatsapp.js'` appended to the channel barrel file `src/channels/index.ts`
- `'whatsapp-auth'` step added to `setup/index.ts`
- `@whiskeysockets/baileys`, `qrcode`, `qrcode-terminal` npm dependencies in `package.json`
- `ASSISTANT_HAS_OWN_NUMBER` in `.env.example`
If the merge reports conflicts, resolve them by reading the conflicted files and understanding the intent of both sides.
### Validate code changes
```bash
npm install
npm run build
npx vitest run src/channels/whatsapp.test.ts
```
All tests must pass and build must be clean before proceeding.
## Phase 3: Authentication
### Clean previous auth state (if re-authenticating)
```bash
rm -rf store/auth/
```
### Run WhatsApp authentication
For QR code in browser (recommended):
```bash
npx tsx setup/index.ts --step whatsapp-auth -- --method qr-browser
```
(Bash timeout: 150000ms)
Tell the user:
> A browser window will open with a QR code.
>
> 1. Open WhatsApp > **Settings** > **Linked Devices** > **Link a Device**
> 2. Scan the QR code in the browser
> 3. The page will show "Authenticated!" when done
For QR code in terminal:
```bash
npx tsx setup/index.ts --step whatsapp-auth -- --method qr-terminal
```
Tell the user to run `npm run auth` in another terminal, then:
> 1. Open WhatsApp > **Settings** > **Linked Devices** > **Link a Device**
> 2. Scan the QR code displayed in the terminal
For pairing code:
Tell the user to have WhatsApp open on **Settings > Linked Devices > Link a Device**, ready to tap **"Link with phone number instead"** — the code expires in ~60 seconds and must be entered immediately.
Run the auth process in the background and poll `store/pairing-code.txt` for the code:
```bash
rm -f store/pairing-code.txt && npx tsx setup/index.ts --step whatsapp-auth -- --method pairing-code --phone <their-phone-number> > /tmp/wa-auth.log 2>&1 &
```
Then immediately poll for the code (do NOT wait for the background command to finish):
```bash
for i in $(seq 1 20); do [ -f store/pairing-code.txt ] && cat store/pairing-code.txt && break; sleep 1; done
```
Display the code to the user the moment it appears. Tell them:
> **Enter this code now** — it expires in ~60 seconds.
>
> 1. Open WhatsApp > **Settings** > **Linked Devices** > **Link a Device**
> 2. Tap **Link with phone number instead**
> 3. Enter the code immediately
After the user enters the code, poll for authentication to complete:
```bash
for i in $(seq 1 60); do grep -q 'AUTH_STATUS: authenticated' /tmp/wa-auth.log 2>/dev/null && echo "authenticated" && break; grep -q 'AUTH_STATUS: failed' /tmp/wa-auth.log 2>/dev/null && echo "failed" && break; sleep 2; done
```
**If failed:** qr_timeout → re-run. logged_out → delete `store/auth/` and re-run. 515 → re-run. timeout → ask user, offer retry.
### Verify authentication succeeded
```bash
test -f store/auth/creds.json && echo "Authentication successful" || echo "Authentication failed"
```
### Configure environment
Channels auto-enable when their credentials are present — WhatsApp activates when `store/auth/creds.json` exists.
Sync to container environment:
```bash
mkdir -p data/env && cp .env data/env/env
```
## Phase 4: Registration
### Configure trigger and channel type
Get the bot's WhatsApp number: `node -e "const c=require('./store/auth/creds.json');console.log(c.me.id.split(':')[0].split('@')[0])"`
AskUserQuestion: Is this a shared phone number (personal WhatsApp) or a dedicated number (separate device)?
- **Shared number** - Your personal WhatsApp number (recommended: use self-chat or a solo group)
- **Dedicated number** - A separate phone/SIM for the assistant
AskUserQuestion: What trigger word should activate the assistant?
- **@Andy** - Default trigger
- **@Claw** - Short and easy
- **@Claude** - Match the AI name
AskUserQuestion: What should the assistant call itself?
- **Andy** - Default name
- **Claw** - Short and easy
- **Claude** - Match the AI name
AskUserQuestion: Where do you want to chat with the assistant?
**Shared number options:**
- **Self-chat** (Recommended) - Chat in your own "Message Yourself" conversation
- **Solo group** - A group with just you and the linked device
- **Existing group** - An existing WhatsApp group
**Dedicated number options:**
- **DM with bot** (Recommended) - Direct message the bot's number
- **Solo group** - A group with just you and the bot
- **Existing group** - An existing WhatsApp group
### Get the JID
**Self-chat:** JID = your phone number with `@s.whatsapp.net`. Extract from auth credentials:
```bash
node -e "const c=JSON.parse(require('fs').readFileSync('store/auth/creds.json','utf-8'));console.log(c.me?.id?.split(':')[0]+'@s.whatsapp.net')"
```
**DM with bot:** Ask for the bot's phone number. JID = `NUMBER@s.whatsapp.net`
**Group (solo, existing):** Run group sync and list available groups:
```bash
npx tsx setup/index.ts --step groups
npx tsx setup/index.ts --step groups --list
```
The output shows `JID|GroupName` pairs. Present candidates as AskUserQuestion (names only, not JIDs).
### Register the chat
```bash
npx tsx setup/index.ts --step register \
--jid "<jid>" \
--name "<chat-name>" \
--trigger "@<trigger>" \
--folder "whatsapp_main" \
--channel whatsapp \
--assistant-name "<name>" \
--is-main \
--no-trigger-required # Only for main/self-chat
```
For additional groups (trigger-required):
```bash
npx tsx setup/index.ts --step register \
--jid "<group-jid>" \
--name "<group-name>" \
--trigger "@<trigger>" \
--folder "whatsapp_<group-name>" \
--channel whatsapp
```
## Phase 5: Verify
### Build and restart
```bash
npm run build
```
Restart the service:
```bash
# macOS (launchd)
launchctl kickstart -k gui/$(id -u)/com.nanoclaw
# Linux (systemd)
systemctl --user restart nanoclaw
# Linux (nohup fallback)
bash start-nanoclaw.sh
```
### Test the connection
Tell the user:
> Send a message to your registered WhatsApp chat:
> - For self-chat / main: Any message works
> - For groups: Use the trigger word (e.g., "@Andy hello")
>
> The assistant should respond within a few seconds.
### Check logs if needed
```bash
tail -f logs/nanoclaw.log
```
## Troubleshooting
### QR code expired
QR codes expire after ~60 seconds. Re-run the auth command:
```bash
rm -rf store/auth/ && npx tsx src/whatsapp-auth.ts
```
### Pairing code not working
Codes expire in ~60 seconds. To retry:
```bash
rm -rf store/auth/ && npx tsx src/whatsapp-auth.ts --pairing-code --phone <phone>
```
Enter the code **immediately** when it appears. Also ensure:
1. Phone number includes country code without `+` (e.g., `1234567890`)
2. Phone has internet access
3. WhatsApp is updated to the latest version
If pairing code keeps failing, switch to QR-browser auth instead:
```bash
rm -rf store/auth/ && npx tsx setup/index.ts --step whatsapp-auth -- --method qr-browser
```
### "conflict" disconnection
This happens when two instances connect with the same credentials. Ensure only one NanoClaw process is running:
```bash
pkill -f "node dist/index.js"
# Then restart
```
### Bot not responding
Check:
1. Auth credentials exist: `ls store/auth/creds.json`
3. Chat is registered: `sqlite3 store/messages.db "SELECT * FROM registered_groups WHERE jid LIKE '%whatsapp%' OR jid LIKE '%@g.us' OR jid LIKE '%@s.whatsapp.net'"`
4. Service is running: `launchctl list | grep nanoclaw` (macOS) or `systemctl --user status nanoclaw` (Linux)
5. Logs: `tail -50 logs/nanoclaw.log`
### Group names not showing
Run group metadata sync:
```bash
npx tsx setup/index.ts --step groups
```
This fetches all group names from WhatsApp. Runs automatically every 24 hours.
## After Setup
If running `npm run dev` while the service is active:
```bash
# macOS:
launchctl unload ~/Library/LaunchAgents/com.nanoclaw.plist
npm run dev
# When done testing:
launchctl load ~/Library/LaunchAgents/com.nanoclaw.plist
# Linux:
# systemctl --user stop nanoclaw
# npm run dev
# systemctl --user start nanoclaw
```
## Removal
To remove WhatsApp integration:
1. Delete auth credentials: `rm -rf store/auth/`
2. Remove WhatsApp registrations: `sqlite3 store/messages.db "DELETE FROM registered_groups WHERE jid LIKE '%@g.us' OR jid LIKE '%@s.whatsapp.net'"`
3. Sync env: `mkdir -p data/env && cp .env data/env/env`
4. Rebuild and restart: `npm run build && launchctl kickstart -k gui/$(id -u)/com.nanoclaw` (macOS) or `npm run build && systemctl --user restart nanoclaw` (Linux)

View File

@@ -0,0 +1,175 @@
---
name: convert-to-apple-container
description: Switch from Docker to Apple Container for macOS-native container isolation. Use when the user wants Apple Container instead of Docker, or is setting up on macOS and prefers the native runtime. Triggers on "apple container", "convert to apple container", "switch to apple container", or "use apple container".
---
# Convert to Apple Container
This skill switches NanoClaw's container runtime from Docker to Apple Container (macOS-only). It uses the skills engine for deterministic code changes, then walks through verification.
**What this changes:**
- Container runtime binary: `docker``container`
- Mount syntax: `-v path:path:ro``--mount type=bind,source=...,target=...,readonly`
- Startup check: `docker info``container system status` (with auto-start)
- Orphan detection: `docker ps --filter``container ls --format json`
- Build script default: `docker``container`
- Dockerfile entrypoint: `.env` shadowing via `mount --bind` inside the container (Apple Container only supports directory mounts, not file mounts like Docker's `/dev/null` overlay)
- Container runner: main-group containers start as root for `mount --bind`, then drop privileges via `setpriv`
**What stays the same:**
- Mount security/allowlist validation
- All exported interfaces and IPC protocol
- Non-main container behavior (still uses `--user` flag)
- All other functionality
## Prerequisites
Verify Apple Container is installed:
```bash
container --version && echo "Apple Container ready" || echo "Install Apple Container first"
```
If not installed:
- Download from https://github.com/apple/container/releases
- Install the `.pkg` file
- Verify: `container --version`
Apple Container requires macOS. It does not work on Linux.
## Phase 1: Pre-flight
### Check if already applied
```bash
grep "CONTAINER_RUNTIME_BIN" src/container-runtime.ts
```
If it already shows `'container'`, the runtime is already Apple Container. Skip to Phase 3.
## Phase 2: Apply Code Changes
### Ensure upstream remote
```bash
git remote -v
```
If `upstream` is missing, add it:
```bash
git remote add upstream https://github.com/qwibitai/nanoclaw.git
```
### Merge the skill branch
```bash
git fetch upstream skill/apple-container
git merge upstream/skill/apple-container
```
This merges in:
- `src/container-runtime.ts` — Apple Container implementation (replaces Docker)
- `src/container-runtime.test.ts` — Apple Container-specific tests
- `src/container-runner.ts` — .env shadow mount fix and privilege dropping
- `container/Dockerfile` — entrypoint that shadows .env via `mount --bind`
- `container/build.sh` — default runtime set to `container`
If the merge reports conflicts, resolve them by reading the conflicted files and understanding the intent of both sides.
### Validate code changes
```bash
npm test
npm run build
```
All tests must pass and build must be clean before proceeding.
## Phase 3: Verify
### Ensure Apple Container runtime is running
```bash
container system status || container system start
```
### Build the container image
```bash
./container/build.sh
```
### Test basic execution
```bash
echo '{}' | container run -i --entrypoint /bin/echo nanoclaw-agent:latest "Container OK"
```
### Test readonly mounts
```bash
mkdir -p /tmp/test-ro && echo "test" > /tmp/test-ro/file.txt
container run --rm --entrypoint /bin/bash \
--mount type=bind,source=/tmp/test-ro,target=/test,readonly \
nanoclaw-agent:latest \
-c "cat /test/file.txt && touch /test/new.txt 2>&1 || echo 'Write blocked (expected)'"
rm -rf /tmp/test-ro
```
Expected: Read succeeds, write fails with "Read-only file system".
### Test read-write mounts
```bash
mkdir -p /tmp/test-rw
container run --rm --entrypoint /bin/bash \
-v /tmp/test-rw:/test \
nanoclaw-agent:latest \
-c "echo 'test write' > /test/new.txt && cat /test/new.txt"
cat /tmp/test-rw/new.txt && rm -rf /tmp/test-rw
```
Expected: Both operations succeed.
### Full integration test
```bash
npm run build
launchctl kickstart -k gui/$(id -u)/com.nanoclaw
```
Send a message via WhatsApp and verify the agent responds.
## Troubleshooting
**Apple Container not found:**
- Download from https://github.com/apple/container/releases
- Install the `.pkg` file
- Verify: `container --version`
**Runtime won't start:**
```bash
container system start
container system status
```
**Image build fails:**
```bash
# Clean rebuild — Apple Container caches aggressively
container builder stop && container builder rm && container builder start
./container/build.sh
```
**Container can't write to mounted directories:**
Check directory permissions on the host. The container runs as uid 1000.
## Summary of Changed Files
| File | Type of Change |
|------|----------------|
| `src/container-runtime.ts` | Full replacement — Docker → Apple Container API |
| `src/container-runtime.test.ts` | Full replacement — tests for Apple Container behavior |
| `src/container-runner.ts` | .env shadow mount removed, main containers start as root with privilege drop |
| `container/Dockerfile` | Entrypoint: `mount --bind` for .env shadowing, `setpriv` privilege drop |
| `container/build.sh` | Default runtime: `docker``container` |

View File

@@ -0,0 +1,148 @@
---
name: use-local-whisper
description: Use when the user wants local voice transcription instead of OpenAI Whisper API. Switches to whisper.cpp running on Apple Silicon. WhatsApp only for now. Requires voice-transcription skill to be applied first.
---
# Use Local Whisper
Switches voice transcription from OpenAI's Whisper API to local whisper.cpp. Runs entirely on-device — no API key, no network, no cost.
**Channel support:** Currently WhatsApp only. The transcription module (`src/transcription.ts`) uses Baileys types for audio download. Other channels (Telegram, Discord, etc.) would need their own audio-download logic before this skill can serve them.
**Note:** The Homebrew package is `whisper-cpp`, but the CLI binary it installs is `whisper-cli`.
## Prerequisites
- `voice-transcription` skill must be applied first (WhatsApp channel)
- macOS with Apple Silicon (M1+) recommended
- `whisper-cpp` installed: `brew install whisper-cpp` (provides the `whisper-cli` binary)
- `ffmpeg` installed: `brew install ffmpeg`
- A GGML model file downloaded to `data/models/`
## Phase 1: Pre-flight
### Check if already applied
Check if `src/transcription.ts` already uses `whisper-cli`:
```bash
grep 'whisper-cli' src/transcription.ts && echo "Already applied" || echo "Not applied"
```
If already applied, skip to Phase 3 (Verify).
### Check dependencies are installed
```bash
whisper-cli --help >/dev/null 2>&1 && echo "WHISPER_OK" || echo "WHISPER_MISSING"
ffmpeg -version >/dev/null 2>&1 && echo "FFMPEG_OK" || echo "FFMPEG_MISSING"
```
If missing, install via Homebrew:
```bash
brew install whisper-cpp ffmpeg
```
### Check for model file
```bash
ls data/models/ggml-*.bin 2>/dev/null || echo "NO_MODEL"
```
If no model exists, download the base model (148MB, good balance of speed and accuracy):
```bash
mkdir -p data/models
curl -L -o data/models/ggml-base.bin "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin"
```
For better accuracy at the cost of speed, use `ggml-small.bin` (466MB) or `ggml-medium.bin` (1.5GB).
## Phase 2: Apply Code Changes
### Ensure WhatsApp fork remote
```bash
git remote -v
```
If `whatsapp` is missing, add it:
```bash
git remote add whatsapp https://github.com/qwibitai/nanoclaw-whatsapp.git
```
### Merge the skill branch
```bash
git fetch whatsapp skill/local-whisper
git merge whatsapp/skill/local-whisper
```
This modifies `src/transcription.ts` to use the `whisper-cli` binary instead of the OpenAI API.
### Validate
```bash
npm run build
```
## Phase 3: Verify
### Ensure launchd PATH includes Homebrew
The NanoClaw launchd service runs with a restricted PATH. `whisper-cli` and `ffmpeg` are in `/opt/homebrew/bin/` (Apple Silicon) or `/usr/local/bin/` (Intel), which may not be in the plist's PATH.
Check the current PATH:
```bash
grep -A1 'PATH' ~/Library/LaunchAgents/com.nanoclaw.plist
```
If `/opt/homebrew/bin` is missing, add it to the `<string>` value inside the `PATH` key in the plist. Then reload:
```bash
launchctl unload ~/Library/LaunchAgents/com.nanoclaw.plist
launchctl load ~/Library/LaunchAgents/com.nanoclaw.plist
```
### Build and restart
```bash
npm run build
launchctl kickstart -k gui/$(id -u)/com.nanoclaw
```
### Test
Send a voice note in any registered group. The agent should receive it as `[Voice: <transcript>]`.
### Check logs
```bash
tail -f logs/nanoclaw.log | grep -i -E "voice|transcri|whisper"
```
Look for:
- `Transcribed voice message` — successful transcription
- `whisper.cpp transcription failed` — check model path, ffmpeg, or PATH
## Configuration
Environment variables (optional, set in `.env`):
| Variable | Default | Description |
|----------|---------|-------------|
| `WHISPER_BIN` | `whisper-cli` | Path to whisper.cpp binary |
| `WHISPER_MODEL` | `data/models/ggml-base.bin` | Path to GGML model file |
## Troubleshooting
**"whisper.cpp transcription failed"**: Ensure both `whisper-cli` and `ffmpeg` are in PATH. The launchd service uses a restricted PATH — see Phase 3 above. Test manually:
```bash
ffmpeg -f lavfi -i anullsrc=r=16000:cl=mono -t 1 -f wav /tmp/test.wav -y
whisper-cli -m data/models/ggml-base.bin -f /tmp/test.wav --no-timestamps -nt
```
**Transcription works in dev but not as service**: The launchd plist PATH likely doesn't include `/opt/homebrew/bin`. See "Ensure launchd PATH includes Homebrew" in Phase 3.
**Slow transcription**: The base model processes ~30s of audio in <1s on M1+. If slower, check CPU usage — another process may be competing.
**Wrong language**: whisper.cpp auto-detects language. To force a language, you can set `WHISPER_LANG` and modify `src/transcription.ts` to pass `-l $WHISPER_LANG`.