diff --git a/docs/DEBUG_CHECKLIST.md b/docs/DEBUG_CHECKLIST.md index c1d53f1..a7b3de1 100644 --- a/docs/DEBUG_CHECKLIST.md +++ b/docs/DEBUG_CHECKLIST.md @@ -11,6 +11,34 @@ Both timers fire at the same time, so containers always exit via hard SIGKILL (c ### 3. Cursor advanced before agent succeeds `processGroupMessages` advances `lastAgentTimestamp` before the agent runs. If the container times out, retries find no messages (cursor already past them). Messages are permanently lost on timeout. +### 4. Kubernetes image garbage collection deletes nanoclaw-agent image + +**Symptoms**: `Container exited with code 125: pull access denied for nanoclaw-agent` — the container image disappears overnight or after a few hours, even though you just built it. + +**Cause**: If your container runtime has Kubernetes enabled (Rancher Desktop enables it by default), the kubelet runs image garbage collection when disk usage exceeds 85%. NanoClaw containers are ephemeral (run and exit), so `nanoclaw-agent:latest` is never protected by a running container. The kubelet sees it as unused and deletes it — often overnight when no messages are being processed. Other images (docker-compose services) survive because they have long-running containers referencing them. + +**Fix**: Disable Kubernetes if you don't need it: +```bash +# Rancher Desktop +rdctl set --kubernetes-enabled=false + +# Then rebuild the container image +./container/build.sh +``` + +**Diagnosis**: Check the k3s log for image GC activity: +```bash +grep -i "nanoclaw" ~/Library/Logs/rancher-desktop/k3s.log +# Look for: "Removing image to free bytes" with the nanoclaw-agent image ID +``` + +Check NanoClaw logs for image status: +```bash +grep -E "image found|image NOT found|image missing" logs/nanoclaw.log +``` + +If you need Kubernetes enabled, set `CONTAINER_IMAGE` to an image stored in a registry that the kubelet won't GC, or raise the GC thresholds. + ## Quick Status Check ```bash