* refactor: migrate setup from bash scripts to cross-platform Node.js modules

  Replace 9 bash scripts + qr-auth.html with a two-phase setup system: a bash bootstrap (setup.sh) for Node.js/npm verification, and TypeScript modules (src/setup/) for everything else. Resolves cross-platform issues: `sed -i` replaced with fs operations, the sqlite3 CLI replaced with better-sqlite3, browser opening made cross-platform, service management supports launchd/systemd/WSL nohup fallback, SQL injection prevented with parameterized queries. Adds Linux systemctl equivalents alongside macOS launchctl commands in 8 skill files and CLAUDE.md.

* fix: setup migration issues — pairing code, systemd fallback, nohup escaping

  - Emit the WhatsApp pairing code immediately when received, before polling for auth completion. Previously the code was only shown in the final status block after auth succeeded — a catch-22, since the user needs the code to authenticate. (whatsapp-auth.ts)
  - Add a systemd user-session pre-check before attempting to write the user-level service unit. Falls back to a nohup wrapper when user-level systemd is unavailable (e.g. a su session without login/D-Bus). (service.ts)
  - Rewrite the nohup wrapper template using an array join instead of a template literal to fix shell variable escaping (\$ → $). (service.ts)

* fix: detect stale docker group and kill orphaned processes on Linux systemd

* fix: remove redundant shell option from execSync to fix TS2769

  execSync already runs in a shell by default; the explicit `shell: true` caused a type error with @types/node, which expects a string, not a boolean.

* feat: hide QR browser auth option on headless Linux

  Emit IS_HEADLESS from the environment step and condition SKILL.md to only show the pairing code + QR terminal when no display server is available (headless Linux without WSL). WSL is excluded from the headless gate because browser opening works via Windows interop.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
| name | description |
|---|---|
| add-voice-transcription | Add voice message transcription to NanoClaw using OpenAI's Whisper API. Automatically transcribes WhatsApp voice notes so the agent can read and respond to them. |
# Add Voice Transcription
This skill adds automatic voice message transcription to NanoClaw's WhatsApp channel using OpenAI's Whisper API. When a voice note arrives, it is downloaded, transcribed, and delivered to the agent as `[Voice: <transcript>]`.
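The `[Voice: <transcript>]` convention can be pictured with a small sketch. The function name and the empty-transcript fallback here are hypothetical illustrations, not NanoClaw's actual code:

```typescript
// Hypothetical sketch: wrap a Whisper transcript in the [Voice: ...] form
// the agent receives. Names are illustrative, not NanoClaw's real API.
function formatVoiceMessage(transcript: string | null): string {
  if (!transcript || transcript.trim() === "") {
    // Fallback label when no transcript is available (assumed behavior)
    return "[Voice Message - transcription unavailable]";
  }
  return `[Voice: ${transcript.trim()}]`;
}

console.log(formatVoiceMessage("running late, be there at 6"));
// → [Voice: running late, be there at 6]
```

The agent then responds to the bracketed text exactly as it would to a typed message.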
## Phase 1: Pre-flight
### Check if already applied
Read `.nanoclaw/state.yaml`. If `voice-transcription` is in `applied_skills`, skip to Phase 3 (Configure). The code changes are already in place.
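A minimal way to script that check, assuming `state.yaml` keeps a flat `applied_skills:` list (the exact schema may differ from this sketch):

```typescript
import { existsSync, readFileSync } from "node:fs";

// Naive YAML scan for an `applied_skills:` list entry. A real YAML parser
// would be safer; this just avoids extra dependencies for the sketch.
function isSkillApplied(statePath: string, skill: string): boolean {
  if (!existsSync(statePath)) return false;
  const afterKey =
    readFileSync(statePath, "utf8").split("applied_skills:")[1] ?? "";
  return new RegExp(`-\\s+${skill}\\b`).test(afterKey);
}
```

If the file is missing entirely, treat the skill as not applied and continue with Phase 2.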
### Ask the user
- Do they have an OpenAI API key? If yes, collect it now. If not, they'll need to create one at https://platform.openai.com/api-keys.
## Phase 2: Apply Code Changes
Run the skills engine to apply this skill's code package.
### Initialize skills system (if needed)
If the `.nanoclaw/` directory doesn't exist yet:

```
npx tsx scripts/apply-skill.ts --init
```
### Apply the skill
```
npx tsx scripts/apply-skill.ts .claude/skills/add-voice-transcription
```
This deterministically:

- Adds `src/transcription.ts` (voice transcription module using OpenAI Whisper)
- Three-way merges voice handling into `src/channels/whatsapp.ts` (isVoiceMessage check, transcribeAudioMessage call)
- Three-way merges transcription tests into `src/channels/whatsapp.test.ts` (mock + 3 test cases)
- Installs the `openai` npm dependency
- Updates `.env.example` with `OPENAI_API_KEY`
- Records the application in `.nanoclaw/state.yaml`
If the apply reports merge conflicts, read the intent files:

- `modify/src/channels/whatsapp.ts.intent.md` — what changed and invariants for whatsapp.ts
- `modify/src/channels/whatsapp.test.ts.intent.md` — what changed for whatsapp.test.ts
### Validate code changes
```
npm test
npm run build
```
All tests must pass (including the 3 new voice transcription tests) and the build must be clean before proceeding.
## Phase 3: Configure
### Get OpenAI API key (if needed)
If the user doesn't have an API key:
I need you to create an OpenAI API key:

- Go to https://platform.openai.com/api-keys
- Click "Create new secret key"
- Give it a name (e.g., "NanoClaw Transcription")
- Copy the key (starts with `sk-`)

Cost: $0.006 per minute of audio ($0.003 per typical 30-second voice note)
Wait for the user to provide the key.
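The quoted cost is simple proportional arithmetic from Whisper's per-minute rate. A throwaway estimate (rate taken from the cost line above; provider-side rounding is not modeled):

```typescript
// Whisper pricing sketch: $0.006 per minute of audio, prorated by seconds.
const WHISPER_RATE_PER_MINUTE = 0.006;

function transcriptionCostUsd(durationSeconds: number): number {
  return (durationSeconds / 60) * WHISPER_RATE_PER_MINUTE;
}

// A typical 30-second voice note comes out to roughly $0.003.
console.log(transcriptionCostUsd(30));
```

Even at a few dozen voice notes a day, the monthly cost stays in the cents-to-dollars range.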
### Add to environment
Add to `.env`:

```
OPENAI_API_KEY=<their-key>
```
Sync to the container environment:

```
mkdir -p data/env && cp .env data/env/env
```
The container reads environment from `data/env/env`, not `.env` directly.
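A quick sanity check that the copy actually happened, sketched under the assumption that the two files should be byte-identical after the sync (paths as in the step above):

```typescript
import { readFileSync } from "node:fs";

// Compare .env against the copy the container actually reads.
function envInSync(sourcePath: string, containerPath: string): boolean {
  try {
    return readFileSync(sourcePath, "utf8") === readFileSync(containerPath, "utf8");
  } catch {
    return false; // either file missing counts as out of sync
  }
}

if (!envInSync(".env", "data/env/env")) {
  console.error("data/env/env is stale; re-run the mkdir/cp sync step");
}
```

A stale `data/env/env` is the most common reason a key set in `.env` never reaches the running service.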
### Build and restart
```
npm run build
launchctl kickstart -k gui/$(id -u)/com.nanoclaw  # macOS
# Linux: systemctl --user restart nanoclaw
```
## Phase 4: Verify
### Test with a voice note
Tell the user:
Send a voice note in any registered WhatsApp chat. The agent should receive it as `[Voice: <transcript>]` and respond to its content.
### Check logs if needed
```
tail -f logs/nanoclaw.log | grep -i voice
```
Look for:
- `Transcribed voice message` — successful transcription with character count
- `OPENAI_API_KEY not set` — key missing from `.env`
- `OpenAI transcription failed` — API error (check key validity, billing)
- `Failed to download audio message` — media download issue
## Troubleshooting
### Voice notes show "[Voice Message - transcription unavailable]"
- Check `OPENAI_API_KEY` is set in `.env` AND synced to `data/env/env`
- Verify the key works: `curl -s https://api.openai.com/v1/models -H "Authorization: Bearer $OPENAI_API_KEY" | head -c 200`
- Check OpenAI billing — Whisper requires a funded account
### Voice notes show "[Voice Message - transcription failed]"
Check logs for the specific error. Common causes:
- Network timeout — transient, will work on next message
- Invalid API key — regenerate at https://platform.openai.com/api-keys
- Rate limiting — wait and retry
### Agent doesn't respond to voice notes
Verify the chat is registered and the agent is running. Voice transcription only runs for registered groups.
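That gating can be pictured as follows; the function and parameter names are hypothetical, since NanoClaw's actual registration check lives in its own code:

```typescript
// Illustrative only: transcription runs when the message is a voice note
// AND the chat is in the registered set; otherwise it is skipped entirely.
function shouldTranscribe(
  registeredChats: Set<string>,
  chatId: string,
  isVoiceMessage: boolean,
): boolean {
  return isVoiceMessage && registeredChats.has(chatId);
}
```

So a voice note from an unregistered chat is silently ignored rather than transcribed: register the chat first, then retest.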