diff --git a/.claude/skills/add-voice-transcription/SKILL.md b/.claude/skills/add-voice-transcription/SKILL.md
index 7c9141d..740345b 100644
--- a/.claude/skills/add-voice-transcription/SKILL.md
+++ b/.claude/skills/add-voice-transcription/SKILL.md
@@ -3,458 +3,134 @@ name: add-voice-transcription
 description: Add voice message transcription to NanoClaw using OpenAI's Whisper API. Automatically transcribes WhatsApp voice notes so the agent can read and respond to them.
 ---
-# Add Voice Message Transcription
+# Add Voice Transcription
 
-This skill adds automatic voice message transcription using OpenAI's Whisper API. When users send voice notes in WhatsApp, they'll be transcribed and the agent can read and respond to the content.
+This skill adds automatic voice message transcription to NanoClaw's WhatsApp channel using OpenAI's Whisper API. When a voice note arrives, it is downloaded, transcribed, and delivered to the agent as `[Voice: <transcript>]`.
 
-**UX Note:** When asking the user questions, prefer using the `AskUserQuestion` tool instead of just outputting text. This integrates with Claude's built-in question/answer system for a better experience.
+## Phase 1: Pre-flight
 
-## Prerequisites
+### Check if already applied
 
-**USER ACTION REQUIRED**
+Read `.nanoclaw/state.yaml`. If `voice-transcription` is in `applied_skills`, skip to Phase 3 (Configure). The code changes are already in place.
 
-**Use the AskUserQuestion tool** to present this:
+### Ask the user
 
-> You'll need an OpenAI API key for Whisper transcription.
+1. **Do they have an OpenAI API key?** If yes, collect it now. If no, they'll need to create one at https://platform.openai.com/api-keys.
+
+## Phase 2: Apply Code Changes
+
+Run the skills engine to apply this skill's code package.
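Editor's aside (not part of the patch): the Phase 1 pre-flight keys off `applied_skills` in `.nanoclaw/state.yaml`, but this diff never shows that file's schema. A hedged sketch of the shape the check assumes, with all field names other than `applied_skills` being guesses:

```yaml
# .nanoclaw/state.yaml — illustrative only; only `applied_skills`
# is confirmed by the skill doc, the other keys are assumptions.
core_version: 0.1.0
applied_skills:
  - voice-transcription
```

If `voice-transcription` appears in that list, the engine treats the code changes as already merged, and the skill doc jumps straight to configuration.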
+ +### Initialize skills system (if needed) + +If `.nanoclaw/` directory doesn't exist yet: + +```bash +npx tsx scripts/apply-skill.ts --init +``` + +### Apply the skill + +```bash +npx tsx scripts/apply-skill.ts .claude/skills/add-voice-transcription +``` + +This deterministically: +- Adds `src/transcription.ts` (voice transcription module using OpenAI Whisper) +- Three-way merges voice handling into `src/channels/whatsapp.ts` (isVoiceMessage check, transcribeAudioMessage call) +- Three-way merges transcription tests into `src/channels/whatsapp.test.ts` (mock + 3 test cases) +- Installs the `openai` npm dependency +- Updates `.env.example` with `OPENAI_API_KEY` +- Records the application in `.nanoclaw/state.yaml` + +If the apply reports merge conflicts, read the intent files: +- `modify/src/channels/whatsapp.ts.intent.md` — what changed and invariants for whatsapp.ts +- `modify/src/channels/whatsapp.test.ts.intent.md` — what changed for whatsapp.test.ts + +### Validate code changes + +```bash +npm test +npm run build +``` + +All tests must pass (including the 3 new voice transcription tests) and build must be clean before proceeding. + +## Phase 3: Configure + +### Get OpenAI API key (if needed) + +If the user doesn't have an API key: + +> I need you to create an OpenAI API key: > -> Get one at: https://platform.openai.com/api-keys +> 1. Go to https://platform.openai.com/api-keys +> 2. Click "Create new secret key" +> 3. Give it a name (e.g., "NanoClaw Transcription") +> 4. Copy the key (starts with `sk-`) > > Cost: ~$0.006 per minute of audio (~$0.003 per typical 30-second voice note) -> -> Once you have your API key, we'll configure it securely. -Wait for user to confirm they have an API key before continuing. +Wait for the user to provide the key. ---- +### Add to environment -## Implementation - -### Step 1: Add OpenAI Dependency - -Read `package.json` and add the `openai` package to dependencies: - -```json -"dependencies": { - ...existing dependencies... 
- "openai": "^4.77.0" -} -``` - -Then install it. **IMPORTANT:** The OpenAI SDK requires Zod v3 as an optional peer dependency, but NanoClaw uses Zod v4. This conflict is guaranteed, so always use `--legacy-peer-deps`: +Add to `.env`: ```bash -npm install --legacy-peer-deps +OPENAI_API_KEY= ``` -### Step 2: Create Transcription Configuration - -Create a configuration file for transcription settings (without the API key): - -Write to `.transcription.config.json`: - -```json -{ - "provider": "openai", - "openai": { - "apiKey": "", - "model": "whisper-1" - }, - "enabled": true, - "fallbackMessage": "[Voice Message - transcription unavailable]" -} -``` - -Add this file to `.gitignore` to prevent committing API keys: +Sync to container environment: ```bash -echo ".transcription.config.json" >> .gitignore +mkdir -p data/env && cp .env data/env/env ``` -**Use the AskUserQuestion tool** to confirm: +The container reads environment from `data/env/env`, not `.env` directly. -> I've created `.transcription.config.json` in the project root. You'll need to add your OpenAI API key to it manually: -> -> 1. Open `.transcription.config.json` -> 2. Replace the empty `"apiKey": ""` with your key: `"apiKey": "sk-proj-..."` -> 3. Save the file -> -> Let me know when you've added it. - -Wait for user confirmation. 
- -### Step 3: Create Transcription Module - -Create `src/transcription.ts`: - -```typescript -import { downloadMediaMessage } from '@whiskeysockets/baileys'; -import { WAMessage, WASocket } from '@whiskeysockets/baileys'; -import fs from 'fs'; -import path from 'path'; -import { fileURLToPath } from 'url'; -import { dirname } from 'path'; - -// Get __dirname equivalent in ES modules -const __filename = fileURLToPath(import.meta.url); -const __dirname = dirname(__filename); - -// Configuration interface -interface TranscriptionConfig { - provider: string; - openai?: { - apiKey: string; - model: string; - }; - enabled: boolean; - fallbackMessage: string; -} - -// Load configuration -function loadConfig(): TranscriptionConfig { - const configPath = path.join(__dirname, '../.transcription.config.json'); - try { - const configData = fs.readFileSync(configPath, 'utf-8'); - return JSON.parse(configData); - } catch (err) { - console.error('Failed to load transcription config:', err); - return { - provider: 'openai', - enabled: false, - fallbackMessage: '[Voice Message - transcription unavailable]' - }; - } -} - -// Transcribe audio using OpenAI Whisper API -async function transcribeWithOpenAI(audioBuffer: Buffer, config: TranscriptionConfig): Promise { - if (!config.openai?.apiKey || config.openai.apiKey === '') { - console.warn('OpenAI API key not configured'); - return null; - } - - try { - // Dynamic import of openai - const openaiModule = await import('openai'); - const OpenAI = openaiModule.default; - const toFile = openaiModule.toFile; - - const openai = new OpenAI({ - apiKey: config.openai.apiKey - }); - - // Use OpenAI's toFile helper to create a proper file upload - const file = await toFile(audioBuffer, 'voice.ogg', { - type: 'audio/ogg' - }); - - // Call Whisper API - const transcription = await openai.audio.transcriptions.create({ - file: file, - model: config.openai.model || 'whisper-1', - response_format: 'text' - }); - - // Type assertion needed: OpenAI SDK 
types response_format='text' as Transcription object, - // but it actually returns a plain string when response_format is 'text' - return transcription as unknown as string; - } catch (err) { - console.error('OpenAI transcription failed:', err); - return null; - } -} - -// Main transcription function -export async function transcribeAudioMessage( - msg: WAMessage, - sock: WASocket -): Promise { - const config = loadConfig(); - - // Check if transcription is enabled - if (!config.enabled) { - console.log('Transcription disabled in config'); - return config.fallbackMessage; - } - - try { - // Download the audio message - const buffer = await downloadMediaMessage( - msg, - 'buffer', - {}, - { - logger: console as any, - reuploadRequest: sock.updateMediaMessage - } - ) as Buffer; - - if (!buffer || buffer.length === 0) { - console.error('Failed to download audio message'); - return config.fallbackMessage; - } - - console.log(`Downloaded audio message: ${buffer.length} bytes`); - - // Transcribe based on provider - let transcript: string | null = null; - - switch (config.provider) { - case 'openai': - transcript = await transcribeWithOpenAI(buffer, config); - break; - default: - console.error(`Unknown transcription provider: ${config.provider}`); - return config.fallbackMessage; - } - - if (!transcript) { - return config.fallbackMessage; - } - - return transcript.trim(); - } catch (err) { - console.error('Transcription error:', err); - return config.fallbackMessage; - } -} - -// Helper to check if a message is a voice note -export function isVoiceMessage(msg: WAMessage): boolean { - return msg.message?.audioMessage?.ptt === true; -} -``` - -### Step 4: Update Database to Handle Transcribed Content - -Read `src/db.ts` and find the `storeMessage` function. 
Update its signature and implementation to accept transcribed content: - -Change the function signature from: -```typescript -export function storeMessage(msg: proto.IWebMessageInfo, chatJid: string, isFromMe: boolean, pushName?: string): void -``` - -To: -```typescript -export function storeMessage(msg: proto.IWebMessageInfo, chatJid: string, isFromMe: boolean, pushName?: string, transcribedContent?: string): void -``` - -Update the content extraction to use transcribed content if provided: - -```typescript -const content = transcribedContent || - msg.message?.conversation || - msg.message?.extendedTextMessage?.text || - msg.message?.imageMessage?.caption || - msg.message?.videoMessage?.caption || - (msg.message?.audioMessage?.ptt ? '[Voice Message]' : '') || - ''; -``` - -### Step 5: Integrate Transcription into Message Handler - -**Note:** Voice messages are transcribed for all messages in registered groups, regardless of the trigger word. This is because: -1. Voice notes can't easily include a trigger word -2. Users expect voice notes to work the same as text messages -3. The transcribed content is stored in the database for context, even if it doesn't trigger the agent - -Read `src/index.ts` and find the `sock.ev.on('messages.upsert', ...)` event handler. 
- -Change the callback from synchronous to async: - -```typescript -sock.ev.on('messages.upsert', async ({ messages }) => { -``` - -Inside the loop where messages are stored, add voice message detection and transcription: - -```typescript -// Only store full message content for registered groups -if (registeredGroups[chatJid]) { - // Check if this is a voice message - if (msg.message.audioMessage?.ptt) { - try { - // Import transcription module - const { transcribeAudioMessage } = await import('./transcription.js'); - const transcript = await transcribeAudioMessage(msg, sock); - - if (transcript) { - // Store with transcribed content - storeMessage(msg, chatJid, msg.key.fromMe || false, msg.pushName || undefined, `[Voice: ${transcript}]`); - logger.info({ chatJid, length: transcript.length }, 'Transcribed voice message'); - } else { - // Store with fallback message - storeMessage(msg, chatJid, msg.key.fromMe || false, msg.pushName || undefined, '[Voice Message - transcription unavailable]'); - } - } catch (err) { - logger.error({ err }, 'Voice transcription error'); - storeMessage(msg, chatJid, msg.key.fromMe || false, msg.pushName || undefined, '[Voice Message - transcription failed]'); - } - } else { - // Regular message, store normally - storeMessage(msg, chatJid, msg.key.fromMe || false, msg.pushName || undefined); - } -} -``` - -### Step 6: Build and Restart +### Build and restart ```bash npm run build launchctl kickstart -k gui/$(id -u)/com.nanoclaw ``` -Verify it started: +## Phase 4: Verify -```bash -sleep 3 && launchctl list | grep nanoclaw -``` - -### Step 7: Test Voice Transcription +### Test with a voice note Tell the user: -> Voice transcription is ready! Test it by: -> -> 1. Open WhatsApp on your phone -> 2. Go to a registered group chat -> 3. Send a voice note using the microphone button -> 4. 
The agent should receive the transcribed text and respond
->
-> In the database and agent context, voice messages appear as:
-> `[Voice: <transcript>]`
+> Send a voice note in any registered WhatsApp chat. The agent should receive it as `[Voice: <transcript>]` and respond to its content.
 
-Watch for transcription in the logs:
+### Check logs if needed
 
 ```bash
-tail -f logs/nanoclaw.log | grep -i "voice\|transcri"
+tail -f logs/nanoclaw.log | grep -i voice
 ```
-
----
-
-## Configuration Options
-
-### Enable/Disable Transcription
-
-To temporarily disable without removing code, edit `.transcription.config.json`:
-
-```json
-{
-  "enabled": false
-}
-```
-
-### Change Fallback Message
-
-Customize what's stored when transcription fails:
-
-```json
-{
-  "fallbackMessage": "[🎤 Voice note - transcription unavailable]"
-}
-```
-
-### Switch to Different Provider (Future)
-
-The architecture supports multiple providers. To add Groq, Deepgram, or local Whisper:
-
-1. Add provider config to `.transcription.config.json`
-2. Implement provider function in `src/transcription.ts` (similar to `transcribeWithOpenAI`)
-3. Add case to the switch statement
-
----
+Look for:
+- `Transcribed voice message` — successful transcription with character count
+- `OPENAI_API_KEY not set` — key missing from `.env`
+- `OpenAI transcription failed` — API error (check key validity, billing)
+- `Failed to download audio message` — media download issue
 
 ## Troubleshooting
 
-### "Transcription unavailable" or "Transcription failed"
+### Voice notes show "[Voice Message - transcription unavailable]"
 
-Check logs for specific errors:
-```bash
-tail -100 logs/nanoclaw.log | grep -i transcription
-```
+1. Check `OPENAI_API_KEY` is set in `.env` AND synced to `data/env/env`
+2. Verify key works: `curl -s https://api.openai.com/v1/models -H "Authorization: Bearer $OPENAI_API_KEY" | head -c 200`
+3. 
Check OpenAI billing — Whisper requires a funded account -Common causes: -- API key not configured or invalid -- No API credits remaining -- Network connectivity issues -- Audio format not supported by Whisper +### Voice notes show "[Voice Message - transcription failed]" -### Voice messages not being detected +Check logs for the specific error. Common causes: +- Network timeout — transient, will work on next message +- Invalid API key — regenerate at https://platform.openai.com/api-keys +- Rate limiting — wait and retry -- Ensure you're sending actual voice notes (microphone button), not audio file attachments -- Check that `audioMessage.ptt` is `true` in the message object +### Agent doesn't respond to voice notes -### ES Module errors (`__dirname is not defined`) - -The fix is already included in the implementation above using: -```typescript -const __filename = fileURLToPath(import.meta.url); -const __dirname = dirname(__filename); -``` - -### Dependency conflicts (Zod versions) - -The OpenAI SDK requires Zod v3, but NanoClaw uses Zod v4. This conflict is guaranteed — always use: -```bash -npm install --legacy-peer-deps -``` - ---- - -## Security Notes - -- The `.transcription.config.json` file contains your API key and should NOT be committed to version control -- It's added to `.gitignore` by this skill -- Audio files are sent to OpenAI for transcription - review their data usage policy -- No audio files are stored locally after transcription -- Transcripts are stored in the SQLite database like regular text messages - ---- - -## Cost Management - -Monitor usage in your OpenAI dashboard: https://platform.openai.com/usage - -Tips to control costs: -- Set spending limits in OpenAI account settings -- Disable transcription during development/testing with `"enabled": false` -- Typical usage: 100 voice notes/month (~3 minutes average) = ~$1.80 - ---- - -## Removing Voice Transcription - -To remove the feature: - -1. 
Remove from `package.json`: - ```bash - npm uninstall openai - ``` - -2. Delete `src/transcription.ts` - -3. Revert changes in `src/index.ts`: - - Remove the voice message handling block - - Change callback back to synchronous if desired - -4. Revert changes in `src/db.ts`: - - Remove the `transcribedContent` parameter from `storeMessage` - -5. Delete `.transcription.config.json` - -6. Rebuild: - ```bash - npm run build - launchctl kickstart -k gui/$(id -u)/com.nanoclaw - ``` - ---- - -## Future Enhancements - -Potential additions: -- **Local Whisper**: Use `whisper.cpp` or `faster-whisper` for offline transcription -- **Groq Integration**: Free tier with Whisper, very fast -- **Deepgram**: Alternative cloud provider -- **Language Detection**: Auto-detect and transcribe non-English voice notes -- **Cost Tracking**: Log transcription costs per message -- **Speaker Diarization**: Identify different speakers in voice notes +Verify the chat is registered and the agent is running. Voice transcription only runs for registered groups. 
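Editor's aside (not part of the patch): the diff below adds `src/transcription.ts`, but the three-way-merged voice branch in `src/channels/whatsapp.ts` is described only by its intent file. Based on that intent (the `isVoiceMessage` check, the `transcribeAudioMessage` call, the `[Voice: ...]` formatting, and the fallback string the tests assert), here is a simplified, self-contained sketch of that branch. `Msg`, `extractContent`, and the `transcribe` parameter are illustrative stand-ins, not names from the codebase; the real code uses Baileys' `WAMessage`/`WASocket` and imports from `./transcription.js`.

```typescript
// Stand-in for the relevant slice of Baileys' WAMessage.
type Msg = {
  message?: { audioMessage?: { ptt?: boolean }; conversation?: string };
};

// Fallback string matches the one asserted in whatsapp.test.ts.
const FALLBACK = '[Voice Message - transcription unavailable]';

// Mirrors isVoiceMessage from src/transcription.ts: a voice note is an
// audio message with the push-to-talk flag set.
function isVoiceMessage(msg: Msg): boolean {
  return msg.message?.audioMessage?.ptt === true;
}

// `transcribe` stands in for transcribeAudioMessage(msg, sock); it resolves
// to the transcript text, or null when transcription fails.
async function extractContent(
  msg: Msg,
  transcribe: (m: Msg) => Promise<string | null>,
): Promise<string> {
  if (isVoiceMessage(msg)) {
    const transcript = await transcribe(msg);
    // Wrap successful transcripts as [Voice: ...]; fall back otherwise.
    return transcript ? `[Voice: ${transcript}]` : FALLBACK;
  }
  // Non-voice messages keep their plain text content.
  return msg.message?.conversation ?? '';
}
```

This matches the behavior the new test cases pin down: a transcript of `Hello this is a voice message` is delivered as `[Voice: Hello this is a voice message]`, and a `null` transcript yields the unavailable-fallback string.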
diff --git a/.claude/skills/add-voice-transcription/add/src/transcription.ts b/.claude/skills/add-voice-transcription/add/src/transcription.ts
new file mode 100644
index 0000000..91c5e7f
--- /dev/null
+++ b/.claude/skills/add-voice-transcription/add/src/transcription.ts
@@ -0,0 +1,98 @@
+import { downloadMediaMessage } from '@whiskeysockets/baileys';
+import { WAMessage, WASocket } from '@whiskeysockets/baileys';
+
+import { readEnvFile } from './env.js';
+
+interface TranscriptionConfig {
+  model: string;
+  enabled: boolean;
+  fallbackMessage: string;
+}
+
+const DEFAULT_CONFIG: TranscriptionConfig = {
+  model: 'whisper-1',
+  enabled: true,
+  fallbackMessage: '[Voice Message - transcription unavailable]',
+};
+
+async function transcribeWithOpenAI(
+  audioBuffer: Buffer,
+  config: TranscriptionConfig,
+): Promise<string | null> {
+  const env = readEnvFile(['OPENAI_API_KEY']);
+  const apiKey = env.OPENAI_API_KEY;
+
+  if (!apiKey) {
+    console.warn('OPENAI_API_KEY not set in .env');
+    return null;
+  }
+
+  try {
+    const openaiModule = await import('openai');
+    const OpenAI = openaiModule.default;
+    const toFile = openaiModule.toFile;
+
+    const openai = new OpenAI({ apiKey });
+
+    const file = await toFile(audioBuffer, 'voice.ogg', {
+      type: 'audio/ogg',
+    });
+
+    const transcription = await openai.audio.transcriptions.create({
+      file: file,
+      model: config.model,
+      response_format: 'text',
+    });
+
+    // When response_format is 'text', the API returns a plain string
+    return transcription as unknown as string;
+  } catch (err) {
+    console.error('OpenAI transcription failed:', err);
+    return null;
+  }
+}
+
+export async function transcribeAudioMessage(
+  msg: WAMessage,
+  sock: WASocket,
+): Promise<string> {
+  const config = DEFAULT_CONFIG;
+
+  if (!config.enabled) {
+    return config.fallbackMessage;
+  }
+
+  try {
+    const buffer = (await downloadMediaMessage(
+      msg,
+      'buffer',
+      {},
+      {
+        logger: console as any,
+        reuploadRequest: sock.updateMediaMessage,
+      },
+    )) as Buffer;
+
+    if (!buffer || buffer.length === 0) {
+      console.error('Failed to download audio message');
+      return config.fallbackMessage;
+    }
+
+    console.log(`Downloaded audio message: ${buffer.length} bytes`);
+
+    const transcript = await transcribeWithOpenAI(buffer, config);
+
+    if (!transcript) {
+      return config.fallbackMessage;
+    }
+
+    return transcript.trim();
+  } catch (err) {
+    console.error('Transcription error:', err);
+    return config.fallbackMessage;
+  }
+}
+
+export function isVoiceMessage(msg: WAMessage): boolean {
+  return msg.message?.audioMessage?.ptt === true;
+}
diff --git a/.claude/skills/add-voice-transcription/manifest.yaml b/.claude/skills/add-voice-transcription/manifest.yaml
new file mode 100644
index 0000000..cb4d587
--- /dev/null
+++ b/.claude/skills/add-voice-transcription/manifest.yaml
@@ -0,0 +1,17 @@
+skill: voice-transcription
+version: 1.0.0
+description: "Voice message transcription via OpenAI Whisper"
+core_version: 0.1.0
+adds:
+  - src/transcription.ts
+modifies:
+  - src/channels/whatsapp.ts
+  - src/channels/whatsapp.test.ts
+structured:
+  npm_dependencies:
+    openai: "^4.77.0"
+  env_additions:
+    - OPENAI_API_KEY
+conflicts: []
+depends: []
+test: "npx vitest run src/channels/whatsapp.test.ts"
diff --git a/.claude/skills/add-voice-transcription/modify/src/channels/whatsapp.test.ts b/.claude/skills/add-voice-transcription/modify/src/channels/whatsapp.test.ts
new file mode 100644
index 0000000..b56c6c4
--- /dev/null
+++ b/.claude/skills/add-voice-transcription/modify/src/channels/whatsapp.test.ts
@@ -0,0 +1,963 @@
+import { describe, it, expect, beforeEach, vi, afterEach } from 'vitest';
+import { EventEmitter } from 'events';
+
+// --- Mocks ---
+
+// Mock config
+vi.mock('../config.js', () => ({
+  STORE_DIR: '/tmp/nanoclaw-test-store',
+  ASSISTANT_NAME: 'Andy',
+  ASSISTANT_HAS_OWN_NUMBER: false,
+}));
+
+// Mock logger
+vi.mock('../logger.js', () => ({
+  logger: {
+    debug: vi.fn(),
+    info: vi.fn(),
+    warn: vi.fn(),
+    error: vi.fn(),
+  },
+}));
+
+// Mock db
+vi.mock('../db.js', () => ({ + getLastGroupSync: vi.fn(() => null), + setLastGroupSync: vi.fn(), + updateChatName: vi.fn(), +})); + +// Mock transcription +vi.mock('../transcription.js', () => ({ + isVoiceMessage: vi.fn((msg: any) => msg.message?.audioMessage?.ptt === true), + transcribeAudioMessage: vi.fn().mockResolvedValue('Hello this is a voice message'), +})); + +// Mock fs +vi.mock('fs', async () => { + const actual = await vi.importActual('fs'); + return { + ...actual, + default: { + ...actual, + existsSync: vi.fn(() => true), + mkdirSync: vi.fn(), + }, + }; +}); + +// Mock child_process (used for osascript notification) +vi.mock('child_process', () => ({ + exec: vi.fn(), +})); + +// Build a fake WASocket that's an EventEmitter with the methods we need +function createFakeSocket() { + const ev = new EventEmitter(); + const sock = { + ev: { + on: (event: string, handler: (...args: unknown[]) => void) => { + ev.on(event, handler); + }, + }, + user: { + id: '1234567890:1@s.whatsapp.net', + lid: '9876543210:1@lid', + }, + sendMessage: vi.fn().mockResolvedValue(undefined), + sendPresenceUpdate: vi.fn().mockResolvedValue(undefined), + groupFetchAllParticipating: vi.fn().mockResolvedValue({}), + end: vi.fn(), + // Expose the event emitter for triggering events in tests + _ev: ev, + }; + return sock; +} + +let fakeSocket: ReturnType; + +// Mock Baileys +vi.mock('@whiskeysockets/baileys', () => { + return { + default: vi.fn(() => fakeSocket), + Browsers: { macOS: vi.fn(() => ['macOS', 'Chrome', '']) }, + DisconnectReason: { + loggedOut: 401, + badSession: 500, + connectionClosed: 428, + connectionLost: 408, + connectionReplaced: 440, + timedOut: 408, + restartRequired: 515, + }, + makeCacheableSignalKeyStore: vi.fn((keys: unknown) => keys), + useMultiFileAuthState: vi.fn().mockResolvedValue({ + state: { + creds: {}, + keys: {}, + }, + saveCreds: vi.fn(), + }), + }; +}); + +import { WhatsAppChannel, WhatsAppChannelOpts } from './whatsapp.js'; +import { 
getLastGroupSync, updateChatName, setLastGroupSync } from '../db.js'; +import { transcribeAudioMessage } from '../transcription.js'; + +// --- Test helpers --- + +function createTestOpts(overrides?: Partial): WhatsAppChannelOpts { + return { + onMessage: vi.fn(), + onChatMetadata: vi.fn(), + registeredGroups: vi.fn(() => ({ + 'registered@g.us': { + name: 'Test Group', + folder: 'test-group', + trigger: '@Andy', + added_at: '2024-01-01T00:00:00.000Z', + }, + })), + ...overrides, + }; +} + +function triggerConnection(state: string, extra?: Record) { + fakeSocket._ev.emit('connection.update', { connection: state, ...extra }); +} + +function triggerDisconnect(statusCode: number) { + fakeSocket._ev.emit('connection.update', { + connection: 'close', + lastDisconnect: { + error: { output: { statusCode } }, + }, + }); +} + +async function triggerMessages(messages: unknown[]) { + fakeSocket._ev.emit('messages.upsert', { messages }); + // Flush microtasks so the async messages.upsert handler completes + await new Promise((r) => setTimeout(r, 0)); +} + +// --- Tests --- + +describe('WhatsAppChannel', () => { + beforeEach(() => { + fakeSocket = createFakeSocket(); + vi.mocked(getLastGroupSync).mockReturnValue(null); + }); + + afterEach(() => { + vi.restoreAllMocks(); + }); + + /** + * Helper: start connect, flush microtasks so event handlers are registered, + * then trigger the connection open event. Returns the resolved promise. 
+ */ + async function connectChannel(channel: WhatsAppChannel): Promise { + const p = channel.connect(); + // Flush microtasks so connectInternal completes its await and registers handlers + await new Promise((r) => setTimeout(r, 0)); + triggerConnection('open'); + return p; + } + + // --- Connection lifecycle --- + + describe('connection lifecycle', () => { + it('resolves connect() when connection opens', async () => { + const opts = createTestOpts(); + const channel = new WhatsAppChannel(opts); + + await connectChannel(channel); + + expect(channel.isConnected()).toBe(true); + }); + + it('sets up LID to phone mapping on open', async () => { + const opts = createTestOpts(); + const channel = new WhatsAppChannel(opts); + + await connectChannel(channel); + + // The channel should have mapped the LID from sock.user + // We can verify by sending a message from a LID JID + // and checking the translated JID in the callback + }); + + it('flushes outgoing queue on reconnect', async () => { + const opts = createTestOpts(); + const channel = new WhatsAppChannel(opts); + + await connectChannel(channel); + + // Disconnect + (channel as any).connected = false; + + // Queue a message while disconnected + await channel.sendMessage('test@g.us', 'Queued message'); + expect(fakeSocket.sendMessage).not.toHaveBeenCalled(); + + // Reconnect + (channel as any).connected = true; + await (channel as any).flushOutgoingQueue(); + + // Group messages get prefixed when flushed + expect(fakeSocket.sendMessage).toHaveBeenCalledWith( + 'test@g.us', + { text: 'Andy: Queued message' }, + ); + }); + + it('disconnects cleanly', async () => { + const opts = createTestOpts(); + const channel = new WhatsAppChannel(opts); + + await connectChannel(channel); + + await channel.disconnect(); + expect(channel.isConnected()).toBe(false); + expect(fakeSocket.end).toHaveBeenCalled(); + }); + }); + + // --- QR code and auth --- + + describe('authentication', () => { + it('exits process when QR code is emitted 
(no auth state)', async () => { + vi.useFakeTimers(); + const mockExit = vi.spyOn(process, 'exit').mockImplementation(() => undefined as never); + + const opts = createTestOpts(); + const channel = new WhatsAppChannel(opts); + + // Start connect but don't await (it won't resolve - process exits) + channel.connect().catch(() => {}); + + // Flush microtasks so connectInternal registers handlers + await vi.advanceTimersByTimeAsync(0); + + // Emit QR code event + fakeSocket._ev.emit('connection.update', { qr: 'some-qr-data' }); + + // Advance timer past the 1000ms setTimeout before exit + await vi.advanceTimersByTimeAsync(1500); + + expect(mockExit).toHaveBeenCalledWith(1); + mockExit.mockRestore(); + vi.useRealTimers(); + }); + }); + + // --- Reconnection behavior --- + + describe('reconnection', () => { + it('reconnects on non-loggedOut disconnect', async () => { + const opts = createTestOpts(); + const channel = new WhatsAppChannel(opts); + + await connectChannel(channel); + + expect(channel.isConnected()).toBe(true); + + // Disconnect with a non-loggedOut reason (e.g., connectionClosed = 428) + triggerDisconnect(428); + + expect(channel.isConnected()).toBe(false); + // The channel should attempt to reconnect (calls connectInternal again) + }); + + it('exits on loggedOut disconnect', async () => { + const mockExit = vi.spyOn(process, 'exit').mockImplementation(() => undefined as never); + + const opts = createTestOpts(); + const channel = new WhatsAppChannel(opts); + + await connectChannel(channel); + + // Disconnect with loggedOut reason (401) + triggerDisconnect(401); + + expect(channel.isConnected()).toBe(false); + expect(mockExit).toHaveBeenCalledWith(0); + mockExit.mockRestore(); + }); + + it('retries reconnection after 5s on failure', async () => { + const opts = createTestOpts(); + const channel = new WhatsAppChannel(opts); + + await connectChannel(channel); + + // Disconnect with stream error 515 + triggerDisconnect(515); + + // The channel sets a 5s retry — 
just verify it doesn't crash + await new Promise((r) => setTimeout(r, 100)); + }); + }); + + // --- Message handling --- + + describe('message handling', () => { + it('delivers message for registered group', async () => { + const opts = createTestOpts(); + const channel = new WhatsAppChannel(opts); + + await connectChannel(channel); + + await triggerMessages([ + { + key: { + id: 'msg-1', + remoteJid: 'registered@g.us', + participant: '5551234@s.whatsapp.net', + fromMe: false, + }, + message: { conversation: 'Hello Andy' }, + pushName: 'Alice', + messageTimestamp: Math.floor(Date.now() / 1000), + }, + ]); + + expect(opts.onChatMetadata).toHaveBeenCalledWith( + 'registered@g.us', + expect.any(String), + undefined, + 'whatsapp', + true, + ); + expect(opts.onMessage).toHaveBeenCalledWith( + 'registered@g.us', + expect.objectContaining({ + id: 'msg-1', + content: 'Hello Andy', + sender_name: 'Alice', + is_from_me: false, + }), + ); + }); + + it('only emits metadata for unregistered groups', async () => { + const opts = createTestOpts(); + const channel = new WhatsAppChannel(opts); + + await connectChannel(channel); + + await triggerMessages([ + { + key: { + id: 'msg-2', + remoteJid: 'unregistered@g.us', + participant: '5551234@s.whatsapp.net', + fromMe: false, + }, + message: { conversation: 'Hello' }, + pushName: 'Bob', + messageTimestamp: Math.floor(Date.now() / 1000), + }, + ]); + + expect(opts.onChatMetadata).toHaveBeenCalledWith( + 'unregistered@g.us', + expect.any(String), + undefined, + 'whatsapp', + true, + ); + expect(opts.onMessage).not.toHaveBeenCalled(); + }); + + it('ignores status@broadcast messages', async () => { + const opts = createTestOpts(); + const channel = new WhatsAppChannel(opts); + + await connectChannel(channel); + + await triggerMessages([ + { + key: { + id: 'msg-3', + remoteJid: 'status@broadcast', + fromMe: false, + }, + message: { conversation: 'Status update' }, + messageTimestamp: Math.floor(Date.now() / 1000), + }, + ]); + + 
expect(opts.onChatMetadata).not.toHaveBeenCalled(); + expect(opts.onMessage).not.toHaveBeenCalled(); + }); + + it('ignores messages with no content', async () => { + const opts = createTestOpts(); + const channel = new WhatsAppChannel(opts); + + await connectChannel(channel); + + await triggerMessages([ + { + key: { + id: 'msg-4', + remoteJid: 'registered@g.us', + fromMe: false, + }, + message: null, + messageTimestamp: Math.floor(Date.now() / 1000), + }, + ]); + + expect(opts.onMessage).not.toHaveBeenCalled(); + }); + + it('extracts text from extendedTextMessage', async () => { + const opts = createTestOpts(); + const channel = new WhatsAppChannel(opts); + + await connectChannel(channel); + + await triggerMessages([ + { + key: { + id: 'msg-5', + remoteJid: 'registered@g.us', + participant: '5551234@s.whatsapp.net', + fromMe: false, + }, + message: { + extendedTextMessage: { text: 'A reply message' }, + }, + pushName: 'Charlie', + messageTimestamp: Math.floor(Date.now() / 1000), + }, + ]); + + expect(opts.onMessage).toHaveBeenCalledWith( + 'registered@g.us', + expect.objectContaining({ content: 'A reply message' }), + ); + }); + + it('extracts caption from imageMessage', async () => { + const opts = createTestOpts(); + const channel = new WhatsAppChannel(opts); + + await connectChannel(channel); + + await triggerMessages([ + { + key: { + id: 'msg-6', + remoteJid: 'registered@g.us', + participant: '5551234@s.whatsapp.net', + fromMe: false, + }, + message: { + imageMessage: { caption: 'Check this photo', mimetype: 'image/jpeg' }, + }, + pushName: 'Diana', + messageTimestamp: Math.floor(Date.now() / 1000), + }, + ]); + + expect(opts.onMessage).toHaveBeenCalledWith( + 'registered@g.us', + expect.objectContaining({ content: 'Check this photo' }), + ); + }); + + it('extracts caption from videoMessage', async () => { + const opts = createTestOpts(); + const channel = new WhatsAppChannel(opts); + + await connectChannel(channel); + + await triggerMessages([ + { + key: { + 
id: 'msg-7', + remoteJid: 'registered@g.us', + participant: '5551234@s.whatsapp.net', + fromMe: false, + }, + message: { + videoMessage: { caption: 'Watch this', mimetype: 'video/mp4' }, + }, + pushName: 'Eve', + messageTimestamp: Math.floor(Date.now() / 1000), + }, + ]); + + expect(opts.onMessage).toHaveBeenCalledWith( + 'registered@g.us', + expect.objectContaining({ content: 'Watch this' }), + ); + }); + + it('transcribes voice messages', async () => { + const opts = createTestOpts(); + const channel = new WhatsAppChannel(opts); + + await connectChannel(channel); + + await triggerMessages([ + { + key: { + id: 'msg-8', + remoteJid: 'registered@g.us', + participant: '5551234@s.whatsapp.net', + fromMe: false, + }, + message: { + audioMessage: { mimetype: 'audio/ogg; codecs=opus', ptt: true }, + }, + pushName: 'Frank', + messageTimestamp: Math.floor(Date.now() / 1000), + }, + ]); + + expect(transcribeAudioMessage).toHaveBeenCalled(); + expect(opts.onMessage).toHaveBeenCalledTimes(1); + expect(opts.onMessage).toHaveBeenCalledWith( + 'registered@g.us', + expect.objectContaining({ content: '[Voice: Hello this is a voice message]' }), + ); + }); + + it('falls back when transcription returns null', async () => { + vi.mocked(transcribeAudioMessage).mockResolvedValueOnce(null); + + const opts = createTestOpts(); + const channel = new WhatsAppChannel(opts); + + await connectChannel(channel); + + await triggerMessages([ + { + key: { + id: 'msg-8b', + remoteJid: 'registered@g.us', + participant: '5551234@s.whatsapp.net', + fromMe: false, + }, + message: { + audioMessage: { mimetype: 'audio/ogg; codecs=opus', ptt: true }, + }, + pushName: 'Frank', + messageTimestamp: Math.floor(Date.now() / 1000), + }, + ]); + + expect(opts.onMessage).toHaveBeenCalledTimes(1); + expect(opts.onMessage).toHaveBeenCalledWith( + 'registered@g.us', + expect.objectContaining({ content: '[Voice Message - transcription unavailable]' }), + ); + }); + + it('falls back when transcription throws', async () 
=> { + vi.mocked(transcribeAudioMessage).mockRejectedValueOnce(new Error('API error')); + + const opts = createTestOpts(); + const channel = new WhatsAppChannel(opts); + + await connectChannel(channel); + + await triggerMessages([ + { + key: { + id: 'msg-8c', + remoteJid: 'registered@g.us', + participant: '5551234@s.whatsapp.net', + fromMe: false, + }, + message: { + audioMessage: { mimetype: 'audio/ogg; codecs=opus', ptt: true }, + }, + pushName: 'Frank', + messageTimestamp: Math.floor(Date.now() / 1000), + }, + ]); + + expect(opts.onMessage).toHaveBeenCalledTimes(1); + expect(opts.onMessage).toHaveBeenCalledWith( + 'registered@g.us', + expect.objectContaining({ content: '[Voice Message - transcription failed]' }), + ); + }); + + it('uses sender JID when pushName is absent', async () => { + const opts = createTestOpts(); + const channel = new WhatsAppChannel(opts); + + await connectChannel(channel); + + await triggerMessages([ + { + key: { + id: 'msg-9', + remoteJid: 'registered@g.us', + participant: '5551234@s.whatsapp.net', + fromMe: false, + }, + message: { conversation: 'No push name' }, + // pushName is undefined + messageTimestamp: Math.floor(Date.now() / 1000), + }, + ]); + + expect(opts.onMessage).toHaveBeenCalledWith( + 'registered@g.us', + expect.objectContaining({ sender_name: '5551234' }), + ); + }); + }); + + // --- LID ↔ JID translation --- + + describe('LID to JID translation', () => { + it('translates known LID to phone JID', async () => { + const opts = createTestOpts({ + registeredGroups: vi.fn(() => ({ + '1234567890@s.whatsapp.net': { + name: 'Self Chat', + folder: 'self-chat', + trigger: '@Andy', + added_at: '2024-01-01T00:00:00.000Z', + }, + })), + }); + const channel = new WhatsAppChannel(opts); + + await connectChannel(channel); + + // The socket has lid '9876543210:1@lid' → phone '1234567890@s.whatsapp.net' + // Send a message from the LID + await triggerMessages([ + { + key: { + id: 'msg-lid', + remoteJid: '9876543210@lid', + fromMe: 
false, + }, + message: { conversation: 'From LID' }, + pushName: 'Self', + messageTimestamp: Math.floor(Date.now() / 1000), + }, + ]); + + // Should be translated to phone JID + expect(opts.onChatMetadata).toHaveBeenCalledWith( + '1234567890@s.whatsapp.net', + expect.any(String), + undefined, + 'whatsapp', + false, + ); + }); + + it('passes through non-LID JIDs unchanged', async () => { + const opts = createTestOpts(); + const channel = new WhatsAppChannel(opts); + + await connectChannel(channel); + + await triggerMessages([ + { + key: { + id: 'msg-normal', + remoteJid: 'registered@g.us', + participant: '5551234@s.whatsapp.net', + fromMe: false, + }, + message: { conversation: 'Normal JID' }, + pushName: 'Grace', + messageTimestamp: Math.floor(Date.now() / 1000), + }, + ]); + + expect(opts.onChatMetadata).toHaveBeenCalledWith( + 'registered@g.us', + expect.any(String), + undefined, + 'whatsapp', + true, + ); + }); + + it('passes through unknown LID JIDs unchanged', async () => { + const opts = createTestOpts(); + const channel = new WhatsAppChannel(opts); + + await connectChannel(channel); + + await triggerMessages([ + { + key: { + id: 'msg-unknown-lid', + remoteJid: '0000000000@lid', + fromMe: false, + }, + message: { conversation: 'Unknown LID' }, + pushName: 'Unknown', + messageTimestamp: Math.floor(Date.now() / 1000), + }, + ]); + + // Unknown LID passes through unchanged + expect(opts.onChatMetadata).toHaveBeenCalledWith( + '0000000000@lid', + expect.any(String), + undefined, + 'whatsapp', + false, + ); + }); + }); + + // --- Outgoing message queue --- + + describe('outgoing message queue', () => { + it('sends message directly when connected', async () => { + const opts = createTestOpts(); + const channel = new WhatsAppChannel(opts); + + await connectChannel(channel); + + await channel.sendMessage('test@g.us', 'Hello'); + // Group messages get prefixed with assistant name + expect(fakeSocket.sendMessage).toHaveBeenCalledWith('test@g.us', { text: 'Andy: Hello' 
});
+    });
+
+    it('prefixes direct chat messages on shared number', async () => {
+      const opts = createTestOpts();
+      const channel = new WhatsAppChannel(opts);
+
+      await connectChannel(channel);
+
+      await channel.sendMessage('123@s.whatsapp.net', 'Hello');
+      // Shared number: DMs also get prefixed (needed for self-chat distinction)
+      expect(fakeSocket.sendMessage).toHaveBeenCalledWith('123@s.whatsapp.net', { text: 'Andy: Hello' });
+    });
+
+    it('queues message when disconnected', async () => {
+      const opts = createTestOpts();
+      const channel = new WhatsAppChannel(opts);
+
+      // Don't connect — channel starts disconnected
+      await channel.sendMessage('test@g.us', 'Queued');
+      expect(fakeSocket.sendMessage).not.toHaveBeenCalled();
+    });
+
+    it('queues message on send failure', async () => {
+      const opts = createTestOpts();
+      const channel = new WhatsAppChannel(opts);
+
+      await connectChannel(channel);
+
+      // Make sendMessage fail
+      fakeSocket.sendMessage.mockRejectedValueOnce(new Error('Network error'));
+
+      // Should not throw; the failed message is queued for retry on reconnect
+      await expect(channel.sendMessage('test@g.us', 'Will fail')).resolves.toBeUndefined();
+      expect(fakeSocket.sendMessage).toHaveBeenCalledTimes(1);
+
+    });
+
+    it('flushes multiple queued messages in order', async () => {
+      const opts = createTestOpts();
+      const channel = new WhatsAppChannel(opts);
+
+      // Queue messages while disconnected
+      await channel.sendMessage('test@g.us', 'First');
+      await channel.sendMessage('test@g.us', 'Second');
+      await channel.sendMessage('test@g.us', 'Third');
+
+      // Connect — flush happens automatically on open
+      await connectChannel(channel);
+
+      // Give the async flush time to complete
+      await new Promise((r) => setTimeout(r, 50));
+
+      expect(fakeSocket.sendMessage).toHaveBeenCalledTimes(3);
+      // Group messages get prefixed
+      expect(fakeSocket.sendMessage).toHaveBeenNthCalledWith(1, 'test@g.us', { text: 'Andy: First' });
+      expect(fakeSocket.sendMessage).toHaveBeenNthCalledWith(2, 'test@g.us', { text: 'Andy: Second' });
+      
expect(fakeSocket.sendMessage).toHaveBeenNthCalledWith(3, 'test@g.us', { text: 'Andy: Third' }); + }); + }); + + // --- Group metadata sync --- + + describe('group metadata sync', () => { + it('syncs group metadata on first connection', async () => { + fakeSocket.groupFetchAllParticipating.mockResolvedValue({ + 'group1@g.us': { subject: 'Group One' }, + 'group2@g.us': { subject: 'Group Two' }, + }); + + const opts = createTestOpts(); + const channel = new WhatsAppChannel(opts); + + await connectChannel(channel); + + // Wait for async sync to complete + await new Promise((r) => setTimeout(r, 50)); + + expect(fakeSocket.groupFetchAllParticipating).toHaveBeenCalled(); + expect(updateChatName).toHaveBeenCalledWith('group1@g.us', 'Group One'); + expect(updateChatName).toHaveBeenCalledWith('group2@g.us', 'Group Two'); + expect(setLastGroupSync).toHaveBeenCalled(); + }); + + it('skips sync when synced recently', async () => { + // Last sync was 1 hour ago (within 24h threshold) + vi.mocked(getLastGroupSync).mockReturnValue( + new Date(Date.now() - 60 * 60 * 1000).toISOString(), + ); + + const opts = createTestOpts(); + const channel = new WhatsAppChannel(opts); + + await connectChannel(channel); + + await new Promise((r) => setTimeout(r, 50)); + + expect(fakeSocket.groupFetchAllParticipating).not.toHaveBeenCalled(); + }); + + it('forces sync regardless of cache', async () => { + vi.mocked(getLastGroupSync).mockReturnValue( + new Date(Date.now() - 60 * 60 * 1000).toISOString(), + ); + + fakeSocket.groupFetchAllParticipating.mockResolvedValue({ + 'group@g.us': { subject: 'Forced Group' }, + }); + + const opts = createTestOpts(); + const channel = new WhatsAppChannel(opts); + + await connectChannel(channel); + + await channel.syncGroupMetadata(true); + + expect(fakeSocket.groupFetchAllParticipating).toHaveBeenCalled(); + expect(updateChatName).toHaveBeenCalledWith('group@g.us', 'Forced Group'); + }); + + it('handles group sync failure gracefully', async () => { + 
fakeSocket.groupFetchAllParticipating.mockRejectedValue( + new Error('Network timeout'), + ); + + const opts = createTestOpts(); + const channel = new WhatsAppChannel(opts); + + await connectChannel(channel); + + // Should not throw + await expect(channel.syncGroupMetadata(true)).resolves.toBeUndefined(); + }); + + it('skips groups with no subject', async () => { + fakeSocket.groupFetchAllParticipating.mockResolvedValue({ + 'group1@g.us': { subject: 'Has Subject' }, + 'group2@g.us': { subject: '' }, + 'group3@g.us': {}, + }); + + const opts = createTestOpts(); + const channel = new WhatsAppChannel(opts); + + await connectChannel(channel); + + // Clear any calls from the automatic sync on connect + vi.mocked(updateChatName).mockClear(); + + await channel.syncGroupMetadata(true); + + expect(updateChatName).toHaveBeenCalledTimes(1); + expect(updateChatName).toHaveBeenCalledWith('group1@g.us', 'Has Subject'); + }); + }); + + // --- JID ownership --- + + describe('ownsJid', () => { + it('owns @g.us JIDs (WhatsApp groups)', () => { + const channel = new WhatsAppChannel(createTestOpts()); + expect(channel.ownsJid('12345@g.us')).toBe(true); + }); + + it('owns @s.whatsapp.net JIDs (WhatsApp DMs)', () => { + const channel = new WhatsAppChannel(createTestOpts()); + expect(channel.ownsJid('12345@s.whatsapp.net')).toBe(true); + }); + + it('does not own Telegram JIDs', () => { + const channel = new WhatsAppChannel(createTestOpts()); + expect(channel.ownsJid('tg:12345')).toBe(false); + }); + + it('does not own unknown JID formats', () => { + const channel = new WhatsAppChannel(createTestOpts()); + expect(channel.ownsJid('random-string')).toBe(false); + }); + }); + + // --- Typing indicator --- + + describe('setTyping', () => { + it('sends composing presence when typing', async () => { + const opts = createTestOpts(); + const channel = new WhatsAppChannel(opts); + + await connectChannel(channel); + + await channel.setTyping('test@g.us', true); + 
expect(fakeSocket.sendPresenceUpdate).toHaveBeenCalledWith('composing', 'test@g.us'); + }); + + it('sends paused presence when stopping', async () => { + const opts = createTestOpts(); + const channel = new WhatsAppChannel(opts); + + await connectChannel(channel); + + await channel.setTyping('test@g.us', false); + expect(fakeSocket.sendPresenceUpdate).toHaveBeenCalledWith('paused', 'test@g.us'); + }); + + it('handles typing indicator failure gracefully', async () => { + const opts = createTestOpts(); + const channel = new WhatsAppChannel(opts); + + await connectChannel(channel); + + fakeSocket.sendPresenceUpdate.mockRejectedValueOnce(new Error('Failed')); + + // Should not throw + await expect(channel.setTyping('test@g.us', true)).resolves.toBeUndefined(); + }); + }); + + // --- Channel properties --- + + describe('channel properties', () => { + it('has name "whatsapp"', () => { + const channel = new WhatsAppChannel(createTestOpts()); + expect(channel.name).toBe('whatsapp'); + }); + + it('does not expose prefixAssistantName (prefix handled internally)', () => { + const channel = new WhatsAppChannel(createTestOpts()); + expect('prefixAssistantName' in channel).toBe(false); + }); + }); +}); diff --git a/.claude/skills/add-voice-transcription/modify/src/channels/whatsapp.test.ts.intent.md b/.claude/skills/add-voice-transcription/modify/src/channels/whatsapp.test.ts.intent.md new file mode 100644 index 0000000..5856320 --- /dev/null +++ b/.claude/skills/add-voice-transcription/modify/src/channels/whatsapp.test.ts.intent.md @@ -0,0 +1,26 @@ +# Intent: src/channels/whatsapp.test.ts modifications + +## What changed +Added mock for the transcription module and 3 new test cases for voice message handling. 
+ +## Key sections + +### Mocks (top of file) +- Added: `vi.mock('../transcription.js', ...)` with `isVoiceMessage` and `transcribeAudioMessage` mocks +- Added: `import { transcribeAudioMessage } from '../transcription.js'` for test assertions + +### Test cases (inside "message handling" describe block) +- Changed: "handles message with no extractable text (e.g. voice note without caption)" → "transcribes voice messages" + - Now expects `[Voice: Hello this is a voice message]` instead of empty content +- Added: "falls back when transcription returns null" — expects `[Voice Message - transcription unavailable]` +- Added: "falls back when transcription throws" — expects `[Voice Message - transcription failed]` + +## Invariants (must-keep) +- All existing test cases for text, extendedTextMessage, imageMessage, videoMessage unchanged +- All connection lifecycle tests unchanged +- All LID translation tests unchanged +- All outgoing queue tests unchanged +- All group metadata sync tests unchanged +- All ownsJid and setTyping tests unchanged +- All existing mocks (config, logger, db, fs, child_process, baileys) unchanged +- Test helpers (createTestOpts, triggerConnection, triggerDisconnect, triggerMessages, connectChannel) unchanged diff --git a/.claude/skills/add-voice-transcription/modify/src/channels/whatsapp.ts b/.claude/skills/add-voice-transcription/modify/src/channels/whatsapp.ts new file mode 100644 index 0000000..6fb963b --- /dev/null +++ b/.claude/skills/add-voice-transcription/modify/src/channels/whatsapp.ts @@ -0,0 +1,344 @@ +import { exec } from 'child_process'; +import fs from 'fs'; +import path from 'path'; + +import makeWASocket, { + Browsers, + DisconnectReason, + WASocket, + makeCacheableSignalKeyStore, + useMultiFileAuthState, +} from '@whiskeysockets/baileys'; + +import { ASSISTANT_HAS_OWN_NUMBER, ASSISTANT_NAME, STORE_DIR } from '../config.js'; +import { + getLastGroupSync, + setLastGroupSync, + updateChatName, +} from '../db.js'; +import { logger } 
from '../logger.js';
+import { isVoiceMessage, transcribeAudioMessage } from '../transcription.js';
+import { Channel, OnInboundMessage, OnChatMetadata, RegisteredGroup } from '../types.js';
+
+const GROUP_SYNC_INTERVAL_MS = 24 * 60 * 60 * 1000; // 24 hours
+
+export interface WhatsAppChannelOpts {
+  onMessage: OnInboundMessage;
+  onChatMetadata: OnChatMetadata;
+  registeredGroups: () => Record<string, RegisteredGroup>;
+}
+
+export class WhatsAppChannel implements Channel {
+  name = 'whatsapp';
+
+  private sock!: WASocket;
+  private connected = false;
+  private lidToPhoneMap: Record<string, string> = {};
+  private outgoingQueue: Array<{ jid: string; text: string }> = [];
+  private flushing = false;
+  private groupSyncTimerStarted = false;
+
+  private opts: WhatsAppChannelOpts;
+
+  constructor(opts: WhatsAppChannelOpts) {
+    this.opts = opts;
+  }
+
+  async connect(): Promise<void> {
+    return new Promise<void>((resolve, reject) => {
+      this.connectInternal(resolve).catch(reject);
+    });
+  }
+
+  private async connectInternal(onFirstOpen?: () => void): Promise<void> {
+    const authDir = path.join(STORE_DIR, 'auth');
+    fs.mkdirSync(authDir, { recursive: true });
+
+    const { state, saveCreds } = await useMultiFileAuthState(authDir);
+
+    this.sock = makeWASocket({
+      auth: {
+        creds: state.creds,
+        keys: makeCacheableSignalKeyStore(state.keys, logger),
+      },
+      printQRInTerminal: false,
+      logger,
+      browser: Browsers.macOS('Chrome'),
+    });
+
+    this.sock.ev.on('connection.update', (update) => {
+      const { connection, lastDisconnect, qr } = update;
+
+      if (qr) {
+        const msg =
+          'WhatsApp authentication required. 
Run /setup in Claude Code.'; + logger.error(msg); + exec( + `osascript -e 'display notification "${msg}" with title "NanoClaw" sound name "Basso"'`, + ); + setTimeout(() => process.exit(1), 1000); + } + + if (connection === 'close') { + this.connected = false; + const reason = (lastDisconnect?.error as any)?.output?.statusCode; + const shouldReconnect = reason !== DisconnectReason.loggedOut; + logger.info({ reason, shouldReconnect, queuedMessages: this.outgoingQueue.length }, 'Connection closed'); + + if (shouldReconnect) { + logger.info('Reconnecting...'); + this.connectInternal().catch((err) => { + logger.error({ err }, 'Failed to reconnect, retrying in 5s'); + setTimeout(() => { + this.connectInternal().catch((err2) => { + logger.error({ err: err2 }, 'Reconnection retry failed'); + }); + }, 5000); + }); + } else { + logger.info('Logged out. Run /setup to re-authenticate.'); + process.exit(0); + } + } else if (connection === 'open') { + this.connected = true; + logger.info('Connected to WhatsApp'); + + // Announce availability so WhatsApp relays subsequent presence updates (typing indicators) + this.sock.sendPresenceUpdate('available').catch(() => {}); + + // Build LID to phone mapping from auth state for self-chat translation + if (this.sock.user) { + const phoneUser = this.sock.user.id.split(':')[0]; + const lidUser = this.sock.user.lid?.split(':')[0]; + if (lidUser && phoneUser) { + this.lidToPhoneMap[lidUser] = `${phoneUser}@s.whatsapp.net`; + logger.debug({ lidUser, phoneUser }, 'LID to phone mapping set'); + } + } + + // Flush any messages queued while disconnected + this.flushOutgoingQueue().catch((err) => + logger.error({ err }, 'Failed to flush outgoing queue'), + ); + + // Sync group metadata on startup (respects 24h cache) + this.syncGroupMetadata().catch((err) => + logger.error({ err }, 'Initial group sync failed'), + ); + // Set up daily sync timer (only once) + if (!this.groupSyncTimerStarted) { + this.groupSyncTimerStarted = true; + setInterval(() 
=> { + this.syncGroupMetadata().catch((err) => + logger.error({ err }, 'Periodic group sync failed'), + ); + }, GROUP_SYNC_INTERVAL_MS); + } + + // Signal first connection to caller + if (onFirstOpen) { + onFirstOpen(); + onFirstOpen = undefined; + } + } + }); + + this.sock.ev.on('creds.update', saveCreds); + + this.sock.ev.on('messages.upsert', async ({ messages }) => { + for (const msg of messages) { + if (!msg.message) continue; + const rawJid = msg.key.remoteJid; + if (!rawJid || rawJid === 'status@broadcast') continue; + + // Translate LID JID to phone JID if applicable + const chatJid = await this.translateJid(rawJid); + + const timestamp = new Date( + Number(msg.messageTimestamp) * 1000, + ).toISOString(); + + // Always notify about chat metadata for group discovery + const isGroup = chatJid.endsWith('@g.us'); + this.opts.onChatMetadata(chatJid, timestamp, undefined, 'whatsapp', isGroup); + + // Only deliver full message for registered groups + const groups = this.opts.registeredGroups(); + if (groups[chatJid]) { + const content = + msg.message?.conversation || + msg.message?.extendedTextMessage?.text || + msg.message?.imageMessage?.caption || + msg.message?.videoMessage?.caption || + ''; + + const sender = msg.key.participant || msg.key.remoteJid || ''; + const senderName = msg.pushName || sender.split('@')[0]; + + const fromMe = msg.key.fromMe || false; + // Detect bot messages: with own number, fromMe is reliable + // since only the bot sends from that number. + // With shared number, bot messages carry the assistant name prefix + // (even in DMs/self-chat) so we check for that. + const isBotMessage = ASSISTANT_HAS_OWN_NUMBER + ? 
fromMe
+            : content.startsWith(`${ASSISTANT_NAME}:`);
+
+          // Transcribe voice messages before storing
+          let finalContent = content;
+          if (isVoiceMessage(msg)) {
+            try {
+              const transcript = await transcribeAudioMessage(msg, this.sock);
+              if (transcript) {
+                finalContent = `[Voice: ${transcript}]`;
+                logger.info({ chatJid, length: transcript.length }, 'Transcribed voice message');
+              } else {
+                finalContent = '[Voice Message - transcription unavailable]';
+              }
+            } catch (err) {
+              logger.error({ err }, 'Voice transcription error');
+              finalContent = '[Voice Message - transcription failed]';
+            }
+          }
+
+          this.opts.onMessage(chatJid, {
+            id: msg.key.id || '',
+            chat_jid: chatJid,
+            sender,
+            sender_name: senderName,
+            content: finalContent,
+            timestamp,
+            is_from_me: fromMe,
+            is_bot_message: isBotMessage,
+          });
+        }
+      }
+    });
+  }
+
+  async sendMessage(jid: string, text: string): Promise<void> {
+    // Prefix bot messages with assistant name so users know who's speaking.
+    // On a shared number, prefix is also needed in DMs (including self-chat)
+    // to distinguish bot output from user messages.
+    // Skip only when the assistant has its own dedicated phone number.
+    const prefixed = ASSISTANT_HAS_OWN_NUMBER
+      ? text
+      : `${ASSISTANT_NAME}: ${text}`;
+
+    if (!this.connected) {
+      this.outgoingQueue.push({ jid, text: prefixed });
+      logger.info({ jid, length: prefixed.length, queueSize: this.outgoingQueue.length }, 'WA disconnected, message queued');
+      return;
+    }
+    try {
+      await this.sock.sendMessage(jid, { text: prefixed });
+      logger.info({ jid, length: prefixed.length }, 'Message sent');
+    } catch (err) {
+      // If send fails, queue it for retry on reconnect
+      this.outgoingQueue.push({ jid, text: prefixed });
+      logger.warn({ jid, err, queueSize: this.outgoingQueue.length }, 'Failed to send, message queued');
+    }
+  }
+
+  isConnected(): boolean {
+    return this.connected;
+  }
+
+  ownsJid(jid: string): boolean {
+    return jid.endsWith('@g.us') || jid.endsWith('@s.whatsapp.net');
+  }
+
+  async disconnect(): Promise<void> {
+    this.connected = false;
+    this.sock?.end(undefined);
+  }
+
+  async setTyping(jid: string, isTyping: boolean): Promise<void> {
+    try {
+      const status = isTyping ? 'composing' : 'paused';
+      logger.debug({ jid, status }, 'Sending presence update');
+      await this.sock.sendPresenceUpdate(status, jid);
+    } catch (err) {
+      logger.debug({ jid, err }, 'Failed to update typing status');
+    }
+  }
+
+  /**
+   * Sync group metadata from WhatsApp.
+   * Fetches all participating groups and stores their names in the database.
+   * Called on startup, daily, and on-demand via IPC. 
+   */
+  async syncGroupMetadata(force = false): Promise<void> {
+    if (!force) {
+      const lastSync = getLastGroupSync();
+      if (lastSync) {
+        const lastSyncTime = new Date(lastSync).getTime();
+        if (Date.now() - lastSyncTime < GROUP_SYNC_INTERVAL_MS) {
+          logger.debug({ lastSync }, 'Skipping group sync - synced recently');
+          return;
+        }
+      }
+    }
+
+    try {
+      logger.info('Syncing group metadata from WhatsApp...');
+      const groups = await this.sock.groupFetchAllParticipating();
+
+      let count = 0;
+      for (const [jid, metadata] of Object.entries(groups)) {
+        if (metadata.subject) {
+          updateChatName(jid, metadata.subject);
+          count++;
+        }
+      }
+
+      setLastGroupSync();
+      logger.info({ count }, 'Group metadata synced');
+    } catch (err) {
+      logger.error({ err }, 'Failed to sync group metadata');
+    }
+  }
+
+  private async translateJid(jid: string): Promise<string> {
+    if (!jid.endsWith('@lid')) return jid;
+    const lidUser = jid.split('@')[0].split(':')[0];
+
+    // Check local cache first
+    const cached = this.lidToPhoneMap[lidUser];
+    if (cached) {
+      logger.debug({ lidJid: jid, phoneJid: cached }, 'Translated LID to phone JID (cached)');
+      return cached;
+    }
+
+    // Query Baileys' signal repository for the mapping
+    try {
+      const pn = await this.sock.signalRepository?.lidMapping?.getPNForLID(jid);
+      if (pn) {
+        const phoneJid = `${pn.split('@')[0].split(':')[0]}@s.whatsapp.net`;
+        this.lidToPhoneMap[lidUser] = phoneJid;
+        logger.info({ lidJid: jid, phoneJid }, 'Translated LID to phone JID (signalRepository)');
+        return phoneJid;
+      }
+    } catch (err) {
+      logger.debug({ err, jid }, 'Failed to resolve LID via signalRepository');
+    }
+
+    return jid;
+  }
+
+  private async flushOutgoingQueue(): Promise<void> {
+    if (this.flushing || this.outgoingQueue.length === 0) return;
+    this.flushing = true;
+    try {
+      logger.info({ count: this.outgoingQueue.length }, 'Flushing outgoing message queue');
+      while (this.outgoingQueue.length > 0) {
+        const item = this.outgoingQueue.shift()!;
+        // Send directly — queued items are 
already prefixed by sendMessage
+        await this.sock.sendMessage(item.jid, { text: item.text });
+        logger.info({ jid: item.jid, length: item.text.length }, 'Queued message sent');
+      }
+    } finally {
+      this.flushing = false;
+    }
+  }
+}
diff --git a/.claude/skills/add-voice-transcription/modify/src/channels/whatsapp.ts.intent.md b/.claude/skills/add-voice-transcription/modify/src/channels/whatsapp.ts.intent.md
new file mode 100644
index 0000000..0049fed
--- /dev/null
+++ b/.claude/skills/add-voice-transcription/modify/src/channels/whatsapp.ts.intent.md
@@ -0,0 +1,27 @@
+# Intent: src/channels/whatsapp.ts modifications
+
+## What changed
+Added voice message transcription support. When a WhatsApp voice note (PTT audio) arrives, it is downloaded and transcribed via OpenAI Whisper before being stored as message content.
+
+## Key sections
+
+### Imports (top of file)
+- Added: `isVoiceMessage`, `transcribeAudioMessage` from `../transcription.js`
+
+### messages.upsert handler (inside connectInternal)
+- Added: `let finalContent = content` variable to allow voice transcription to override text content
+- Added: `isVoiceMessage(msg)` check after content extraction
+- Added: try/catch block calling `transcribeAudioMessage(msg, this.sock)`
+  - Success: `finalContent = '[Voice: <transcript>]'`
+  - Null result: `finalContent = '[Voice Message - transcription unavailable]'`
+  - Error: `finalContent = '[Voice Message - transcription failed]'`
+- Changed: `this.opts.onMessage()` call uses `finalContent` instead of `content`
+
+## Invariants (must-keep)
+- All existing message handling (conversation, extendedTextMessage, imageMessage, videoMessage) unchanged
+- Connection lifecycle (connect, reconnect, disconnect) unchanged
+- LID translation logic unchanged
+- Outgoing message queue unchanged
+- Group metadata sync unchanged
+- sendMessage prefix logic unchanged
+- setTyping, ownsJid, isConnected — all unchanged
diff --git 
a/.claude/skills/add-voice-transcription/tests/voice-transcription.test.ts b/.claude/skills/add-voice-transcription/tests/voice-transcription.test.ts new file mode 100644 index 0000000..76ebd0d --- /dev/null +++ b/.claude/skills/add-voice-transcription/tests/voice-transcription.test.ts @@ -0,0 +1,123 @@ +import { describe, expect, it } from 'vitest'; +import fs from 'fs'; +import path from 'path'; + +describe('voice-transcription skill package', () => { + const skillDir = path.resolve(__dirname, '..'); + + it('has a valid manifest', () => { + const manifestPath = path.join(skillDir, 'manifest.yaml'); + expect(fs.existsSync(manifestPath)).toBe(true); + + const content = fs.readFileSync(manifestPath, 'utf-8'); + expect(content).toContain('skill: voice-transcription'); + expect(content).toContain('version: 1.0.0'); + expect(content).toContain('openai'); + expect(content).toContain('OPENAI_API_KEY'); + }); + + it('has all files declared in adds', () => { + const transcriptionFile = path.join(skillDir, 'add', 'src', 'transcription.ts'); + expect(fs.existsSync(transcriptionFile)).toBe(true); + + const content = fs.readFileSync(transcriptionFile, 'utf-8'); + expect(content).toContain('transcribeAudioMessage'); + expect(content).toContain('isVoiceMessage'); + expect(content).toContain('transcribeWithOpenAI'); + expect(content).toContain('downloadMediaMessage'); + expect(content).toContain('readEnvFile'); + }); + + it('has all files declared in modifies', () => { + const whatsappFile = path.join(skillDir, 'modify', 'src', 'channels', 'whatsapp.ts'); + const whatsappTestFile = path.join(skillDir, 'modify', 'src', 'channels', 'whatsapp.test.ts'); + + expect(fs.existsSync(whatsappFile)).toBe(true); + expect(fs.existsSync(whatsappTestFile)).toBe(true); + }); + + it('has intent files for modified files', () => { + expect(fs.existsSync(path.join(skillDir, 'modify', 'src', 'channels', 'whatsapp.ts.intent.md'))).toBe(true); + expect(fs.existsSync(path.join(skillDir, 'modify', 
'src', 'channels', 'whatsapp.test.ts.intent.md'))).toBe(true); + }); + + it('modified whatsapp.ts preserves core structure', () => { + const content = fs.readFileSync( + path.join(skillDir, 'modify', 'src', 'channels', 'whatsapp.ts'), + 'utf-8', + ); + + // Core class and methods preserved + expect(content).toContain('class WhatsAppChannel'); + expect(content).toContain('implements Channel'); + expect(content).toContain('async connect()'); + expect(content).toContain('async sendMessage('); + expect(content).toContain('isConnected()'); + expect(content).toContain('ownsJid('); + expect(content).toContain('async disconnect()'); + expect(content).toContain('async setTyping('); + expect(content).toContain('async syncGroupMetadata('); + expect(content).toContain('private async translateJid('); + expect(content).toContain('private async flushOutgoingQueue('); + + // Core imports preserved + expect(content).toContain('ASSISTANT_HAS_OWN_NUMBER'); + expect(content).toContain('ASSISTANT_NAME'); + expect(content).toContain('STORE_DIR'); + }); + + it('modified whatsapp.ts includes transcription integration', () => { + const content = fs.readFileSync( + path.join(skillDir, 'modify', 'src', 'channels', 'whatsapp.ts'), + 'utf-8', + ); + + // Transcription imports + expect(content).toContain("import { isVoiceMessage, transcribeAudioMessage } from '../transcription.js'"); + + // Voice message handling + expect(content).toContain('isVoiceMessage(msg)'); + expect(content).toContain('transcribeAudioMessage(msg, this.sock)'); + expect(content).toContain('finalContent'); + expect(content).toContain('[Voice:'); + expect(content).toContain('[Voice Message - transcription unavailable]'); + expect(content).toContain('[Voice Message - transcription failed]'); + }); + + it('modified whatsapp.test.ts includes transcription mock and tests', () => { + const content = fs.readFileSync( + path.join(skillDir, 'modify', 'src', 'channels', 'whatsapp.test.ts'), + 'utf-8', + ); + + // Transcription mock 
+ expect(content).toContain("vi.mock('../transcription.js'"); + expect(content).toContain('isVoiceMessage'); + expect(content).toContain('transcribeAudioMessage'); + + // Voice transcription test cases + expect(content).toContain('transcribes voice messages'); + expect(content).toContain('falls back when transcription returns null'); + expect(content).toContain('falls back when transcription throws'); + expect(content).toContain('[Voice: Hello this is a voice message]'); + }); + + it('modified whatsapp.test.ts preserves all existing test sections', () => { + const content = fs.readFileSync( + path.join(skillDir, 'modify', 'src', 'channels', 'whatsapp.test.ts'), + 'utf-8', + ); + + // All existing test describe blocks preserved + expect(content).toContain("describe('connection lifecycle'"); + expect(content).toContain("describe('authentication'"); + expect(content).toContain("describe('reconnection'"); + expect(content).toContain("describe('message handling'"); + expect(content).toContain("describe('LID to JID translation'"); + expect(content).toContain("describe('outgoing message queue'"); + expect(content).toContain("describe('group metadata sync'"); + expect(content).toContain("describe('ownsJid'"); + expect(content).toContain("describe('setTyping'"); + expect(content).toContain("describe('channel properties'"); + }); +}); diff --git a/skills-engine/structured.ts b/skills-engine/structured.ts index f7fd119..76ad412 100644 --- a/skills-engine/structured.ts +++ b/skills-engine/structured.ts @@ -192,5 +192,5 @@ export function mergeDockerComposeServices( } export function runNpmInstall(): void { - execSync('npm install', { stdio: 'inherit', cwd: process.cwd() }); + execSync('npm install --legacy-peer-deps', { stdio: 'inherit', cwd: process.cwd() }); }