Files
nanoclaw/.claude/skills/add-voice-transcription/SKILL.md
Daniel M 8fc1c23925 Migrate setup from bash scripts to cross-platform Node.js modules (#382)
* refactor: migrate setup from bash scripts to cross-platform Node.js modules

Replace 9 bash scripts + qr-auth.html with a two-phase setup system:
a bash bootstrap (setup.sh) for Node.js/npm verification, and TypeScript
modules (src/setup/) for everything else. Resolves cross-platform issues:
sed -i replaced with fs operations, sqlite3 CLI replaced with better-sqlite3,
browser opening made cross-platform, service management supports launchd/
systemd/WSL nohup fallback, SQL injection prevented with parameterized queries.

Add Linux systemctl equivalents alongside macOS launchctl commands in 8 skill
files and CLAUDE.md.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: setup migration issues — pairing code, systemd fallback, nohup escaping

- Emit WhatsApp pairing code immediately when received, before polling
  for auth completion. Previously the code was only shown in the final
  status block after auth succeeded — a catch-22 since the user needs
  the code to authenticate. (whatsapp-auth.ts)

- Add systemd user session pre-check before attempting to write the
  user-level service unit. Falls back to nohup wrapper when user-level
  systemd is unavailable (e.g. su session without login/D-Bus). (service.ts)

- Rewrite nohup wrapper template using array join instead of template
  literal to fix shell variable escaping (\\$ → $). (service.ts)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: detect stale docker group and kill
  orphaned processes on Linux systemd

* fix: remove redundant shell option from execSync to fix TS2769

execSync already runs in a shell by default; the explicit `shell: true`
caused a type error with @types/node which expects string, not boolean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: hide QR browser auth option on headless Linux

Emit IS_HEADLESS from environment step and condition SKILL.md to
only show pairing code + QR terminal when no display server is
available (headless Linux without WSL). WSL is excluded from the
headless gate because browser opening works via Windows interop.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 18:25:11 +02:00

4.1 KiB

name, description
name description
add-voice-transcription Add voice message transcription to NanoClaw using OpenAI's Whisper API. Automatically transcribes WhatsApp voice notes so the agent can read and respond to them.

Add Voice Transcription

This skill adds automatic voice message transcription to NanoClaw's WhatsApp channel using OpenAI's Whisper API. When a voice note arrives, it is downloaded, transcribed, and delivered to the agent as [Voice: <transcript>].

Phase 1: Pre-flight

Check if already applied

Read .nanoclaw/state.yaml. If voice-transcription is in applied_skills, skip to Phase 3 (Configure). The code changes are already in place.

Ask the user

  1. Do they have an OpenAI API key? If yes, collect it now. If no, they'll need to create one at https://platform.openai.com/api-keys.

Phase 2: Apply Code Changes

Run the skills engine to apply this skill's code package.

Initialize skills system (if needed)

If .nanoclaw/ directory doesn't exist yet:

npx tsx scripts/apply-skill.ts --init

Apply the skill

npx tsx scripts/apply-skill.ts .claude/skills/add-voice-transcription

This deterministically:

  • Adds src/transcription.ts (voice transcription module using OpenAI Whisper)
  • Three-way merges voice handling into src/channels/whatsapp.ts (isVoiceMessage check, transcribeAudioMessage call)
  • Three-way merges transcription tests into src/channels/whatsapp.test.ts (mock + 3 test cases)
  • Installs the openai npm dependency
  • Updates .env.example with OPENAI_API_KEY
  • Records the application in .nanoclaw/state.yaml

If the apply reports merge conflicts, read the intent files:

  • modify/src/channels/whatsapp.ts.intent.md — what changed and invariants for whatsapp.ts
  • modify/src/channels/whatsapp.test.ts.intent.md — what changed for whatsapp.test.ts

Validate code changes

npm test
npm run build

All tests must pass (including the 3 new voice transcription tests) and build must be clean before proceeding.

Phase 3: Configure

Get OpenAI API key (if needed)

If the user doesn't have an API key:

I need you to create an OpenAI API key:

  1. Go to https://platform.openai.com/api-keys
  2. Click "Create new secret key"
  3. Give it a name (e.g., "NanoClaw Transcription")
  4. Copy the key (starts with sk-)

Cost: $0.006 per minute of audio ($0.003 per typical 30-second voice note)

Wait for the user to provide the key.

Add to environment

Add to .env:

OPENAI_API_KEY=<their-key>

Sync to container environment:

mkdir -p data/env && cp .env data/env/env

The container reads environment from data/env/env, not .env directly.

Build and restart

npm run build
launchctl kickstart -k gui/$(id -u)/com.nanoclaw  # macOS
# Linux: systemctl --user restart nanoclaw

Phase 4: Verify

Test with a voice note

Tell the user:

Send a voice note in any registered WhatsApp chat. The agent should receive it as [Voice: <transcript>] and respond to its content.

Check logs if needed

tail -f logs/nanoclaw.log | grep -i voice

Look for:

  • Transcribed voice message — successful transcription with character count
  • OPENAI_API_KEY not set — key missing from .env
  • OpenAI transcription failed — API error (check key validity, billing)
  • Failed to download audio message — media download issue

Troubleshooting

Voice notes show "[Voice Message - transcription unavailable]"

  1. Check OPENAI_API_KEY is set in .env AND synced to data/env/env
  2. Verify key works: curl -s https://api.openai.com/v1/models -H "Authorization: Bearer $OPENAI_API_KEY" | head -c 200
  3. Check OpenAI billing — Whisper requires a funded account

Voice notes show "[Voice Message - transcription failed]"

Check logs for the specific error. Common causes:

Agent doesn't respond to voice notes

Verify the chat is registered and the agent is running. Voice transcription only runs for registered groups.