nanoclaw/.claude/skills/add-image-vision/SKILL.md
glifocat af937d6453 feat(skills): add image vision skill for WhatsApp (#770) 2026-03-06 18:52:59 +02:00

---
name: add-image-vision
description: Add image vision to NanoClaw agents. Resizes and processes WhatsApp image attachments, then sends them to Claude as multimodal content blocks.
---

Image Vision Skill

Adds the ability for NanoClaw agents to see and understand images sent via WhatsApp. Images are downloaded, resized with sharp, saved to the group workspace, and passed to the agent as base64-encoded multimodal content blocks.
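The last step can be sketched as follows. The block shape below follows Anthropic's Messages API base64 image format; NanoClaw's actual `ContentBlock` types and `pushMultimodal` signature in container/agent-runner/src/index.ts may differ, so treat this as an illustration:

```typescript
// Sketch: turn resized image buffers into multimodal content blocks
// alongside the user's text. Field names follow the Anthropic Messages
// API; the real agent-runner types may differ.
type ContentBlock =
  | { type: "text"; text: string }
  | {
      type: "image";
      source: { type: "base64"; media_type: string; data: string };
    };

function pushMultimodal(
  text: string,
  images: { buffer: Buffer; mediaType: string }[],
): ContentBlock[] {
  const blocks: ContentBlock[] = images.map((img) => ({
    type: "image",
    source: {
      type: "base64",
      media_type: img.mediaType,
      data: img.buffer.toString("base64"), // base64-encode the raw bytes
    },
  }));
  blocks.push({ type: "text", text }); // user's text goes after the images
  return blocks;
}

const blocks = pushMultimodal("What is in this photo?", [
  { buffer: Buffer.from([0xff, 0xd8, 0xff]), mediaType: "image/jpeg" },
]);
console.log(blocks[0].type, blocks.length); // image 2
```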

Phase 1: Pre-flight

  1. Check .nanoclaw/state.yaml for add-image-vision — skip if already applied
  2. Confirm sharp is installable (native bindings require build tools)
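
The first check can be scripted. The exact schema of .nanoclaw/state.yaml is not documented here, so this sketch simply looks for the skill name in the file:

```typescript
// Sketch of the pre-flight check. Assumes .nanoclaw/state.yaml mentions
// applied skills by name; the real schema may differ.
import { existsSync, readFileSync } from "node:fs";

function skillApplied(stateFile: string, skill: string): boolean {
  // No state file means the skills system was never initialized
  if (!existsSync(stateFile)) return false;
  return readFileSync(stateFile, "utf8").includes(skill);
}

console.log(skillApplied(".nanoclaw/state.yaml", "add-image-vision"));
```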

Phase 2: Apply Code Changes

  1. Initialize the skills system if not already done:

    npx tsx -e "import { initNanoclawDir } from './skills-engine/init.ts'; initNanoclawDir();"
    
  2. Apply the skill:

    npx tsx skills-engine/apply-skill.ts add-image-vision
    
  3. Install new dependency:

    npm install sharp
    
  4. Validate:

    npm run typecheck
    npm test
    

Phase 3: Configure

  1. Rebuild the container (agent-runner changes need a rebuild):

    ./container/build.sh
    
  2. Sync agent-runner source to group caches:

    for dir in data/sessions/*/agent-runner-src/; do
      cp container/agent-runner/src/*.ts "$dir"
    done
    
  3. Restart the service:

    launchctl kickstart -k gui/$(id -u)/com.nanoclaw
    

Phase 4: Verify

  1. Send an image in a registered WhatsApp group
  2. Check that the agent's response reflects the content of the image
  3. Check logs for "Processed image attachment":
    tail -50 groups/*/logs/container-*.log
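
The log check can also be scripted; a hypothetical sketch that counts the marker line in log text (the marker string is taken from the step above):

```typescript
// Sketch: count "Processed image attachment" lines in container log text.
function countMarker(
  logText: string,
  marker = "Processed image attachment",
): number {
  return logText.split("\n").filter((line) => line.includes(marker)).length;
}

const sample = [
  "2026-03-06 18:52:59 Processed image attachment img-001.jpg",
  "2026-03-06 18:53:01 Agent reply sent",
].join("\n");
console.log(countMarker(sample)); // 1
```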
    

Troubleshooting

  • "Image - download failed": Check WhatsApp connection stability; the download may time out on slow connections.
  • "Image - processing failed": Sharp may not be installed correctly. Run npm ls sharp to verify.
  • Agent doesn't mention image content: Check container logs for "Loaded image" messages. If missing, ensure agent-runner source was synced to group caches.