feat: skills as branches, channels as forks

Replace the custom skills engine with standard git operations.
Feature skills are now git branches (on upstream or channel forks)
applied via `git merge`. Channels are separate fork repos.

- Remove skills-engine/ (6,300+ lines), apply/uninstall/rebase scripts
- Remove old skill format (add/, modify/, manifest.yaml) from all skills
- Remove old CI (skill-drift.yml, skill-pr.yml)
- Add merge-forward CI for upstream skill branches
- Add fork notification (repository_dispatch to channel forks)
- Add marketplace config (.claude/settings.json)
- Add /update-skills operational skill
- Update /setup and /customize for marketplace plugin install
- Add docs/skills-as-branches.md architecture doc

Channel forks created: nanoclaw-whatsapp (with 5 skill branches),
nanoclaw-telegram, nanoclaw-discord, nanoclaw-slack, nanoclaw-gmail.

Upstream retains: skill/ollama-tool, skill/apple-container, skill/compact.
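The branch-merge model can be sketched end-to-end in plain git (hypothetical repo and branch names, not the actual forks above):

```shell
# Sketch: a "skill" is just a branch; applying it is a merge.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q -b main demo
cd demo
git config user.email demo@example.com
git config user.name demo
echo base > app.txt
git add app.txt && git commit -qm "base"
# The skill branch carries its changes as ordinary commits on top of main.
git checkout -qb skill/example
echo "skill change" > skill.txt
git add skill.txt && git commit -qm "add skill"
# Applying the skill to a deployment = merging the branch.
git checkout -q main
git merge -q --no-edit skill/example
ls skill.txt
```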

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Author: gavrielc
Date: 2026-03-10 00:18:25 +02:00
Parent: e7852a45a5
Commit: 5118239cea
182 changed files with 1065 additions and 36205 deletions


@@ -1,128 +0,0 @@
---
name: use-local-whisper
description: Use when the user wants local voice transcription instead of OpenAI Whisper API. Switches to whisper.cpp running on Apple Silicon. WhatsApp only for now. Requires voice-transcription skill to be applied first.
---
# Use Local Whisper
Switches voice transcription from OpenAI's Whisper API to local whisper.cpp. Runs entirely on-device — no API key, no network, no cost.
**Channel support:** Currently WhatsApp only. The transcription module (`src/transcription.ts`) uses Baileys types for audio download. Other channels (Telegram, Discord, etc.) would need their own audio-download logic before this skill can serve them.
**Note:** The Homebrew package is `whisper-cpp`, but the CLI binary it installs is `whisper-cli`.
## Prerequisites
- `voice-transcription` skill must be applied first (WhatsApp channel)
- macOS with Apple Silicon (M1+) recommended
- `whisper-cpp` installed: `brew install whisper-cpp` (provides the `whisper-cli` binary)
- `ffmpeg` installed: `brew install ffmpeg`
- A GGML model file downloaded to `data/models/`
## Phase 1: Pre-flight
### Check if already applied
Read `.nanoclaw/state.yaml`. If `use-local-whisper` is in `applied_skills`, skip to Phase 3 (Verify).
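A minimal sketch of that check, assuming `applied_skills` is a flat YAML list (a simple `grep` suffices; a YAML-aware tool would be stricter):

```shell
# Sketch: pre-flight check against .nanoclaw/state.yaml (hypothetical layout).
state_file=".nanoclaw/state.yaml"
if grep -q 'use-local-whisper' "$state_file" 2>/dev/null; then
  status="ALREADY_APPLIED"
else
  status="NOT_APPLIED"
fi
echo "$status"
```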
### Check dependencies are installed
```bash
whisper-cli --help >/dev/null 2>&1 && echo "WHISPER_OK" || echo "WHISPER_MISSING"
ffmpeg -version >/dev/null 2>&1 && echo "FFMPEG_OK" || echo "FFMPEG_MISSING"
```
If missing, install via Homebrew:
```bash
brew install whisper-cpp ffmpeg
```
### Check for model file
```bash
ls data/models/ggml-*.bin 2>/dev/null || echo "NO_MODEL"
```
If no model exists, download the base model (148MB, good balance of speed and accuracy):
```bash
mkdir -p data/models
curl -L -o data/models/ggml-base.bin "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin"
```
For better accuracy at the cost of speed, use `ggml-small.bin` (466MB) or `ggml-medium.bin` (1.5GB).
## Phase 2: Apply Code Changes
```bash
npx tsx scripts/apply-skill.ts .claude/skills/use-local-whisper
```
This modifies `src/transcription.ts` to use the `whisper-cli` binary instead of the OpenAI API.
### Validate
```bash
npm test
npm run build
```
## Phase 3: Verify
### Ensure launchd PATH includes Homebrew
The NanoClaw launchd service runs with a restricted PATH. `whisper-cli` and `ffmpeg` are in `/opt/homebrew/bin/` (Apple Silicon) or `/usr/local/bin/` (Intel), which may not be in the plist's PATH.
Check the current PATH:
```bash
grep -A1 'PATH' ~/Library/LaunchAgents/com.nanoclaw.plist
```
If `/opt/homebrew/bin` is missing, add it to the `<string>` value inside the `PATH` key in the plist. Then reload:
```bash
launchctl unload ~/Library/LaunchAgents/com.nanoclaw.plist
launchctl load ~/Library/LaunchAgents/com.nanoclaw.plist
```
### Build and restart
```bash
npm run build
launchctl kickstart -k gui/$(id -u)/com.nanoclaw
```
### Test
Send a voice note in any registered group. The agent should receive it as `[Voice: <transcript>]`.
### Check logs
```bash
tail -f logs/nanoclaw.log | grep -i -E "voice|transcri|whisper"
```
Look for:
- `Transcribed voice message` — successful transcription
- `whisper.cpp transcription failed` — check model path, ffmpeg, or PATH
## Configuration
Environment variables (optional, set in `.env`):
| Variable | Default | Description |
|----------|---------|-------------|
| `WHISPER_BIN` | `whisper-cli` | Path to whisper.cpp binary |
| `WHISPER_MODEL` | `data/models/ggml-base.bin` | Path to GGML model file |
## Troubleshooting
**"whisper.cpp transcription failed"**: Ensure both `whisper-cli` and `ffmpeg` are in PATH. The launchd service uses a restricted PATH — see Phase 3 above. Test manually:
```bash
ffmpeg -f lavfi -i anullsrc=r=16000:cl=mono -t 1 -f wav -y /tmp/test.wav
whisper-cli -m data/models/ggml-base.bin -f /tmp/test.wav --no-timestamps -nt
```
**Transcription works in dev but not as service**: The launchd plist PATH likely doesn't include `/opt/homebrew/bin`. See "Ensure launchd PATH includes Homebrew" in Phase 3.
**Slow transcription**: The base model processes ~30s of audio in <1s on M1+. If slower, check CPU usage — another process may be competing.
**Wrong language**: whisper.cpp auto-detects language. To force a language, you can set `WHISPER_LANG` and modify `src/transcription.ts` to pass `-l $WHISPER_LANG`.
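A sketch of that modification (hypothetical: `WHISPER_LANG` and `buildWhisperArgs` are not in the shipped module; the flag list matches the existing `execFileAsync` call):

```typescript
// Sketch: build the whisper-cli argument list, adding `-l <lang>` when
// WHISPER_LANG is set. Hypothetical helper, not part of src/transcription.ts.
function buildWhisperArgs(model: string, wavPath: string): string[] {
  const args = ['-m', model, '-f', wavPath, '--no-timestamps', '-nt'];
  const lang = process.env.WHISPER_LANG;
  if (lang) {
    args.push('-l', lang); // e.g. WHISPER_LANG=he forces Hebrew
  }
  return args;
}
```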


@@ -1,12 +0,0 @@
skill: use-local-whisper
version: 1.0.0
description: "Switch voice transcription from OpenAI Whisper API to local whisper.cpp (WhatsApp only)"
core_version: 0.1.0
adds: []
modifies:
- src/transcription.ts
structured: {}
conflicts: []
depends:
- voice-transcription
test: "npx vitest run src/channels/whatsapp.test.ts"


@@ -1,95 +0,0 @@
import { execFile } from 'child_process';
import fs from 'fs';
import os from 'os';
import path from 'path';
import { promisify } from 'util';
import { downloadMediaMessage, WAMessage, WASocket } from '@whiskeysockets/baileys';

const execFileAsync = promisify(execFile);

const WHISPER_BIN = process.env.WHISPER_BIN || 'whisper-cli';
const WHISPER_MODEL =
  process.env.WHISPER_MODEL ||
  path.join(process.cwd(), 'data', 'models', 'ggml-base.bin');
const FALLBACK_MESSAGE = '[Voice Message - transcription unavailable]';

async function transcribeWithWhisperCpp(
  audioBuffer: Buffer,
): Promise<string | null> {
  const tmpDir = os.tmpdir();
  const id = `nanoclaw-voice-${Date.now()}`;
  const tmpOgg = path.join(tmpDir, `${id}.ogg`);
  const tmpWav = path.join(tmpDir, `${id}.wav`);
  try {
    fs.writeFileSync(tmpOgg, audioBuffer);
    // Convert ogg/opus to 16kHz mono WAV (required by whisper.cpp)
    await execFileAsync('ffmpeg', [
      '-i', tmpOgg,
      '-ar', '16000',
      '-ac', '1',
      '-f', 'wav',
      '-y', tmpWav,
    ], { timeout: 30_000 });
    const { stdout } = await execFileAsync(WHISPER_BIN, [
      '-m', WHISPER_MODEL,
      '-f', tmpWav,
      '--no-timestamps',
      '-nt',
    ], { timeout: 60_000 });
    const transcript = stdout.trim();
    return transcript || null;
  } catch (err) {
    console.error('whisper.cpp transcription failed:', err);
    return null;
  } finally {
    for (const f of [tmpOgg, tmpWav]) {
      try { fs.unlinkSync(f); } catch { /* best effort cleanup */ }
    }
  }
}

export async function transcribeAudioMessage(
  msg: WAMessage,
  sock: WASocket,
): Promise<string | null> {
  try {
    const buffer = (await downloadMediaMessage(
      msg,
      'buffer',
      {},
      {
        logger: console as any,
        reuploadRequest: sock.updateMediaMessage,
      },
    )) as Buffer;
    if (!buffer || buffer.length === 0) {
      console.error('Failed to download audio message');
      return FALLBACK_MESSAGE;
    }
    console.log(`Downloaded audio message: ${buffer.length} bytes`);
    const transcript = await transcribeWithWhisperCpp(buffer);
    if (!transcript) {
      return FALLBACK_MESSAGE;
    }
    console.log(`Transcribed voice message: ${transcript.length} chars`);
    return transcript.trim();
  } catch (err) {
    console.error('Transcription error:', err);
    return FALLBACK_MESSAGE;
  }
}

export function isVoiceMessage(msg: WAMessage): boolean {
  return msg.message?.audioMessage?.ptt === true;
}


@@ -1,39 +0,0 @@
# Intent: src/transcription.ts modifications
## What changed
Replaced the OpenAI Whisper API backend with local whisper.cpp CLI execution. Audio is converted from ogg/opus to 16kHz mono WAV via ffmpeg, then transcribed locally using whisper-cpp. No API key or network required.
## Key sections
### Imports
- Removed: `readEnvFile` from `./env.js` (no API key needed)
- Added: `execFile` from `child_process`, `fs`, `os`, `path`, `promisify` from `util`
### Configuration
- Removed: `TranscriptionConfig` interface and `DEFAULT_CONFIG` (no model/enabled/fallback config)
- Added: `WHISPER_BIN` constant (env `WHISPER_BIN` or `'whisper-cli'`)
- Added: `WHISPER_MODEL` constant (env `WHISPER_MODEL` or `data/models/ggml-base.bin`)
- Added: `FALLBACK_MESSAGE` constant
### transcribeWithWhisperCpp (replaces transcribeWithOpenAI)
- Writes audio buffer to temp .ogg file
- Converts to 16kHz mono WAV via ffmpeg
- Runs whisper-cpp CLI with `--no-timestamps -nt` flags
- Cleans up temp files in finally block
- Returns trimmed stdout or null on error
### transcribeAudioMessage
- Same signature: `(msg: WAMessage, sock: WASocket) => Promise<string | null>`
- Same download logic via `downloadMediaMessage`
- Calls `transcribeWithWhisperCpp` instead of `transcribeWithOpenAI`
- Same fallback behavior on error/null
### isVoiceMessage
- Unchanged: `msg.message?.audioMessage?.ptt === true`
## Invariants (must-keep)
- `transcribeAudioMessage` export signature unchanged
- `isVoiceMessage` export unchanged
- Fallback message strings unchanged: `[Voice Message - transcription unavailable]`
- downloadMediaMessage call pattern unchanged
- Error logging pattern unchanged


@@ -1,115 +0,0 @@
import { describe, expect, it } from 'vitest';
import fs from 'fs';
import path from 'path';

describe('use-local-whisper skill package', () => {
  const skillDir = path.resolve(__dirname, '..');

  it('has a valid manifest', () => {
    const manifestPath = path.join(skillDir, 'manifest.yaml');
    expect(fs.existsSync(manifestPath)).toBe(true);
    const content = fs.readFileSync(manifestPath, 'utf-8');
    expect(content).toContain('skill: use-local-whisper');
    expect(content).toContain('version: 1.0.0');
    expect(content).toContain('src/transcription.ts');
    expect(content).toContain('voice-transcription');
  });

  it('declares voice-transcription as a dependency', () => {
    const content = fs.readFileSync(
      path.join(skillDir, 'manifest.yaml'),
      'utf-8',
    );
    expect(content).toContain('depends:');
    expect(content).toContain('voice-transcription');
  });

  it('has no structured operations (no new npm deps needed)', () => {
    const content = fs.readFileSync(
      path.join(skillDir, 'manifest.yaml'),
      'utf-8',
    );
    expect(content).toContain('structured: {}');
  });

  it('has the modified transcription file', () => {
    const filePath = path.join(skillDir, 'modify', 'src', 'transcription.ts');
    expect(fs.existsSync(filePath)).toBe(true);
  });

  it('has an intent file for the modified file', () => {
    const intentPath = path.join(skillDir, 'modify', 'src', 'transcription.ts.intent.md');
    expect(fs.existsSync(intentPath)).toBe(true);
    const content = fs.readFileSync(intentPath, 'utf-8');
    expect(content).toContain('whisper.cpp');
    expect(content).toContain('transcribeAudioMessage');
    expect(content).toContain('isVoiceMessage');
    expect(content).toContain('Invariants');
  });

  it('uses whisper-cli (not OpenAI) for transcription', () => {
    const content = fs.readFileSync(
      path.join(skillDir, 'modify', 'src', 'transcription.ts'),
      'utf-8',
    );
    // Uses local whisper.cpp CLI
    expect(content).toContain('whisper-cli');
    expect(content).toContain('execFileAsync');
    expect(content).toContain('WHISPER_BIN');
    expect(content).toContain('WHISPER_MODEL');
    expect(content).toContain('ggml-base.bin');
    // Does NOT use OpenAI
    expect(content).not.toContain('openai');
    expect(content).not.toContain('OpenAI');
    expect(content).not.toContain('OPENAI_API_KEY');
    expect(content).not.toContain('readEnvFile');
  });

  it('preserves the public API (transcribeAudioMessage and isVoiceMessage)', () => {
    const content = fs.readFileSync(
      path.join(skillDir, 'modify', 'src', 'transcription.ts'),
      'utf-8',
    );
    expect(content).toContain('export async function transcribeAudioMessage(');
    expect(content).toContain('msg: WAMessage');
    expect(content).toContain('sock: WASocket');
    expect(content).toContain('Promise<string | null>');
    expect(content).toContain('export function isVoiceMessage(');
    expect(content).toContain('downloadMediaMessage');
  });

  it('preserves fallback message strings', () => {
    const content = fs.readFileSync(
      path.join(skillDir, 'modify', 'src', 'transcription.ts'),
      'utf-8',
    );
    expect(content).toContain('[Voice Message - transcription unavailable]');
  });

  it('includes ffmpeg conversion step', () => {
    const content = fs.readFileSync(
      path.join(skillDir, 'modify', 'src', 'transcription.ts'),
      'utf-8',
    );
    expect(content).toContain('ffmpeg');
    expect(content).toContain("'-ar', '16000'");
    expect(content).toContain("'-ac', '1'");
  });

  it('cleans up temp files in finally block', () => {
    const content = fs.readFileSync(
      path.join(skillDir, 'modify', 'src', 'transcription.ts'),
      'utf-8',
    );
    expect(content).toContain('finally');
    expect(content).toContain('unlinkSync');
  });
});