Files
nanoclaw/.claude/skills/add-voice-transcription/SKILL.md
Daniel M 8fc1c23925 Migrate setup from bash scripts to cross-platform Node.js modules (#382)
* refactor: migrate setup from bash scripts to cross-platform Node.js modules

Replace 9 bash scripts + qr-auth.html with a two-phase setup system:
a bash bootstrap (setup.sh) for Node.js/npm verification, and TypeScript
modules (src/setup/) for everything else. Resolves cross-platform issues:
sed -i replaced with fs operations, sqlite3 CLI replaced with better-sqlite3,
browser opening made cross-platform, service management supports launchd/
systemd/WSL nohup fallback, SQL injection prevented with parameterized queries.

Add Linux systemctl equivalents alongside macOS launchctl commands in 8 skill
files and CLAUDE.md.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: setup migration issues — pairing code, systemd fallback, nohup escaping

- Emit WhatsApp pairing code immediately when received, before polling
  for auth completion. Previously the code was only shown in the final
  status block after auth succeeded — a catch-22 since the user needs
  the code to authenticate. (whatsapp-auth.ts)

- Add systemd user session pre-check before attempting to write the
  user-level service unit. Falls back to nohup wrapper when user-level
  systemd is unavailable (e.g. su session without login/D-Bus). (service.ts)

- Rewrite nohup wrapper template using array join instead of template
  literal to fix shell variable escaping (\\$ → $). (service.ts)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: detect stale docker group and kill
  orphaned processes on Linux systemd

* fix: remove redundant shell option from execSync to fix TS2769

execSync already runs in a shell by default; the explicit `shell: true`
caused a type error with @types/node which expects string, not boolean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: hide QR browser auth option on headless Linux

Emit IS_HEADLESS from environment step and condition SKILL.md to
only show pairing code + QR terminal when no display server is
available (headless Linux without WSL). WSL is excluded from the
headless gate because browser opening works via Windows interop.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 18:25:11 +02:00

138 lines
4.1 KiB
Markdown

---
name: add-voice-transcription
description: Add voice message transcription to NanoClaw using OpenAI's Whisper API. Automatically transcribes WhatsApp voice notes so the agent can read and respond to them.
---
# Add Voice Transcription
This skill adds automatic voice message transcription to NanoClaw's WhatsApp channel using OpenAI's Whisper API. When a voice note arrives, it is downloaded, transcribed, and delivered to the agent as `[Voice: <transcript>]`.
## Phase 1: Pre-flight
### Check if already applied
Read `.nanoclaw/state.yaml`. If `voice-transcription` is in `applied_skills`, skip to Phase 3 (Configure). The code changes are already in place.
### Ask the user
1. **Do they have an OpenAI API key?** If yes, collect it now. If no, they'll need to create one at https://platform.openai.com/api-keys.
## Phase 2: Apply Code Changes
Run the skills engine to apply this skill's code package.
### Initialize skills system (if needed)
If `.nanoclaw/` directory doesn't exist yet:
```bash
npx tsx scripts/apply-skill.ts --init
```
### Apply the skill
```bash
npx tsx scripts/apply-skill.ts .claude/skills/add-voice-transcription
```
This deterministically:
- Adds `src/transcription.ts` (voice transcription module using OpenAI Whisper)
- Three-way merges voice handling into `src/channels/whatsapp.ts` (isVoiceMessage check, transcribeAudioMessage call)
- Three-way merges transcription tests into `src/channels/whatsapp.test.ts` (mock + 3 test cases)
- Installs the `openai` npm dependency
- Updates `.env.example` with `OPENAI_API_KEY`
- Records the application in `.nanoclaw/state.yaml`
If the apply reports merge conflicts, read the intent files:
- `modify/src/channels/whatsapp.ts.intent.md` — what changed and invariants for whatsapp.ts
- `modify/src/channels/whatsapp.test.ts.intent.md` — what changed for whatsapp.test.ts
### Validate code changes
```bash
npm test
npm run build
```
All tests must pass (including the 3 new voice transcription tests) and build must be clean before proceeding.
## Phase 3: Configure
### Get OpenAI API key (if needed)
If the user doesn't have an API key:
> I need you to create an OpenAI API key:
>
> 1. Go to https://platform.openai.com/api-keys
> 2. Click "Create new secret key"
> 3. Give it a name (e.g., "NanoClaw Transcription")
> 4. Copy the key (starts with `sk-`)
>
> Cost: ~$0.006 per minute of audio (~$0.003 per typical 30-second voice note)
Wait for the user to provide the key.
### Add to environment
Add to `.env`:
```bash
OPENAI_API_KEY=<their-key>
```
Sync to container environment:
```bash
mkdir -p data/env && cp .env data/env/env
```
The container reads environment from `data/env/env`, not `.env` directly.
### Build and restart
```bash
npm run build
launchctl kickstart -k gui/$(id -u)/com.nanoclaw # macOS
# Linux: systemctl --user restart nanoclaw
```
## Phase 4: Verify
### Test with a voice note
Tell the user:
> Send a voice note in any registered WhatsApp chat. The agent should receive it as `[Voice: <transcript>]` and respond to its content.
### Check logs if needed
```bash
tail -f logs/nanoclaw.log | grep -i voice
```
Look for:
- `Transcribed voice message` — successful transcription with character count
- `OPENAI_API_KEY not set` — key missing from `.env`
- `OpenAI transcription failed` — API error (check key validity, billing)
- `Failed to download audio message` — media download issue
## Troubleshooting
### Voice notes show "[Voice Message - transcription unavailable]"
1. Check `OPENAI_API_KEY` is set in `.env` AND synced to `data/env/env`
2. Verify key works: `curl -s https://api.openai.com/v1/models -H "Authorization: Bearer $OPENAI_API_KEY" | head -c 200`
3. Check OpenAI billing — Whisper requires a funded account
### Voice notes show "[Voice Message - transcription failed]"
Check logs for the specific error. Common causes:
- Network timeout — transient, will work on next message
- Invalid API key — regenerate at https://platform.openai.com/api-keys
- Rate limiting — wait and retry
### Agent doesn't respond to voice notes
Verify the chat is registered and the agent is running. Voice transcription only runs for registered groups.