From 8935e4f636ced7c94b0a56a28affb4da3581e20f Mon Sep 17 00:00:00 2001 From: James Schindler Date: Fri, 27 Mar 2026 08:28:58 -0400 Subject: [PATCH] docs: add k8s image GC known issue to debug checklist Kubernetes image garbage collection silently deletes the nanoclaw-agent image when disk usage is high because ephemeral containers don't protect the image from GC. Documents symptoms, cause, fix, and diagnosis. --- docs/DEBUG_CHECKLIST.md | 28 ++++++++++++++++++++++++++++ 1 file changed, 28 insertions(+) diff --git a/docs/DEBUG_CHECKLIST.md b/docs/DEBUG_CHECKLIST.md index 5597067..a04d88f 100644 --- a/docs/DEBUG_CHECKLIST.md +++ b/docs/DEBUG_CHECKLIST.md @@ -11,6 +11,34 @@ Both timers fire at the same time, so containers always exit via hard SIGKILL (c ### 3. Cursor advanced before agent succeeds `processGroupMessages` advances `lastAgentTimestamp` before the agent runs. If the container times out, retries find no messages (cursor already past them). Messages are permanently lost on timeout. +### 4. Kubernetes image garbage collection deletes nanoclaw-agent image + +**Symptoms**: `Container exited with code 125: pull access denied for nanoclaw-agent` — the container image disappears overnight or after a few hours, even though you just built it. + +**Cause**: If your container runtime has Kubernetes enabled (Rancher Desktop enables it by default), the kubelet runs image garbage collection when disk usage exceeds 85%. NanoClaw containers are ephemeral (run and exit), so `nanoclaw-agent:latest` is never protected by a running container. The kubelet sees it as unused and deletes it — often overnight when no messages are being processed. Other images (docker-compose services) survive because they have long-running containers referencing them. + +**Fix**: Disable Kubernetes if you don't need it: +```bash +# Rancher Desktop +rdctl set --kubernetes-enabled=false + +# Then rebuild the container image +./container/build.sh +``` + +**Diagnosis**: Check the k3s log for image GC activity: +```bash +grep -i "nanoclaw" ~/Library/Logs/rancher-desktop/k3s.log +# Look for: "Removing image to free bytes" with the nanoclaw-agent image ID +``` + +Check NanoClaw logs for image status: +```bash +grep -E "image found|image NOT found|image missing" logs/nanoclaw.log +``` + +If you need Kubernetes enabled, set `CONTAINER_IMAGE` to an image stored in a registry that the kubelet won't GC, or raise the GC thresholds. + ## Quick Status Check ```bash