Fix/shadow env in container (#646)
* fix: shadow .env file in container to prevent agents from reading secrets The main agent's container mounts the project root read-only, which inadvertently exposed the .env file containing API keys. Mount /dev/null over /workspace/project/.env to shadow it — secrets are already passed via stdin and never need to be read from disk inside the container. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: adapt .env shadowing and runtime for Apple Container Apple Container (VirtioFS) only supports directory mounts, not file mounts. The previous /dev/null host-side mount over .env crashes with VZErrorDomain "A directory sharing device configuration is invalid". - Dockerfile: entrypoint now shadows .env via mount --bind inside the container, then drops privileges via setpriv to the host UID/GID - container-runner: main containers skip --user and pass RUN_UID/RUN_GID env vars so entrypoint starts as root for mount --bind - container-runtime: switch to Apple Container CLI (container), fix cleanupOrphans to use container list --format json - Skill: add Dockerfile and container-runner.ts to convert-to-apple-container skill (v1.1.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: revert src to Docker runtime, keep Apple Container in skill only The source files should remain Docker-compatible. The Apple Container adaptations live in the convert-to-apple-container skill and are applied on demand. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
@@ -13,11 +13,13 @@ This skill switches NanoClaw's container runtime from Docker to Apple Container
|
||||
- Startup check: `docker info` → `container system status` (with auto-start)
|
||||
- Orphan detection: `docker ps --filter` → `container ls --format json`
|
||||
- Build script default: `docker` → `container`
|
||||
- Dockerfile entrypoint: `.env` shadowing via `mount --bind` inside the container (Apple Container only supports directory mounts, not file mounts like Docker's `/dev/null` overlay)
|
||||
- Container runner: main-group containers start as root for `mount --bind`, then drop privileges via `setpriv`
|
||||
|
||||
**What stays the same:**
|
||||
- Dockerfile (shared by both runtimes)
|
||||
- Container runner code (`src/container-runner.ts`)
|
||||
- Mount security/allowlist validation
|
||||
- All exported interfaces and IPC protocol
|
||||
- Non-main container behavior (still uses `--user` flag)
|
||||
- All other functionality
|
||||
|
||||
## Prerequisites
|
||||
@@ -72,11 +74,15 @@ npx tsx scripts/apply-skill.ts .claude/skills/convert-to-apple-container
|
||||
This deterministically:
|
||||
- Replaces `src/container-runtime.ts` with the Apple Container implementation
|
||||
- Replaces `src/container-runtime.test.ts` with Apple Container-specific tests
|
||||
- Updates `src/container-runner.ts` with .env shadow mount fix and privilege dropping
|
||||
- Updates `container/Dockerfile` with entrypoint that shadows .env via `mount --bind`
|
||||
- Updates `container/build.sh` to default to `container` runtime
|
||||
- Records the application in `.nanoclaw/state.yaml`
|
||||
|
||||
If the apply reports merge conflicts, read the intent files:
|
||||
- `modify/src/container-runtime.ts.intent.md` — what changed and invariants
|
||||
- `modify/src/container-runner.ts.intent.md` — .env shadow and privilege drop changes
|
||||
- `modify/container/Dockerfile.intent.md` — entrypoint changes for .env shadowing
|
||||
- `modify/container/build.sh.intent.md` — what changed for build script
|
||||
|
||||
### Validate code changes
|
||||
@@ -172,4 +178,6 @@ Check directory permissions on the host. The container runs as uid 1000.
|
||||
|------|----------------|
|
||||
| `src/container-runtime.ts` | Full replacement — Docker → Apple Container API |
|
||||
| `src/container-runtime.test.ts` | Full replacement — tests for Apple Container behavior |
|
||||
| `src/container-runner.ts` | .env shadow mount removed, main containers start as root with privilege drop |
|
||||
| `container/Dockerfile` | Entrypoint: `mount --bind` for .env shadowing, `setpriv` privilege drop |
|
||||
| `container/build.sh` | Default runtime: `docker` → `container` |
|
||||
|
||||
@@ -1,12 +1,14 @@
|
||||
skill: convert-to-apple-container
|
||||
version: 1.0.0
|
||||
version: 1.1.0
|
||||
description: "Switch container runtime from Docker to Apple Container (macOS)"
|
||||
core_version: 0.1.0
|
||||
adds: []
|
||||
modifies:
|
||||
- src/container-runtime.ts
|
||||
- src/container-runtime.test.ts
|
||||
- src/container-runner.ts
|
||||
- container/build.sh
|
||||
- container/Dockerfile
|
||||
structured: {}
|
||||
conflicts: []
|
||||
depends: []
|
||||
|
||||
@@ -0,0 +1,68 @@
|
||||
# NanoClaw Agent Container
|
||||
# Runs Claude Agent SDK in isolated Linux VM with browser automation
|
||||
|
||||
FROM node:22-slim
|
||||
|
||||
# Install system dependencies for Chromium
|
||||
RUN apt-get update && apt-get install -y \
|
||||
chromium \
|
||||
fonts-liberation \
|
||||
fonts-noto-color-emoji \
|
||||
libgbm1 \
|
||||
libnss3 \
|
||||
libatk-bridge2.0-0 \
|
||||
libgtk-3-0 \
|
||||
libx11-xcb1 \
|
||||
libxcomposite1 \
|
||||
libxdamage1 \
|
||||
libxrandr2 \
|
||||
libasound2 \
|
||||
libpangocairo-1.0-0 \
|
||||
libcups2 \
|
||||
libdrm2 \
|
||||
libxshmfence1 \
|
||||
curl \
|
||||
git \
|
||||
&& rm -rf /var/lib/apt/lists/*
|
||||
|
||||
# Set Chromium path for agent-browser
|
||||
ENV AGENT_BROWSER_EXECUTABLE_PATH=/usr/bin/chromium
|
||||
ENV PLAYWRIGHT_CHROMIUM_EXECUTABLE_PATH=/usr/bin/chromium
|
||||
|
||||
# Install agent-browser and claude-code globally
|
||||
RUN npm install -g agent-browser @anthropic-ai/claude-code
|
||||
|
||||
# Create app directory
|
||||
WORKDIR /app
|
||||
|
||||
# Copy package files first for better caching
|
||||
COPY agent-runner/package*.json ./
|
||||
|
||||
# Install dependencies
|
||||
RUN npm install
|
||||
|
||||
# Copy source code
|
||||
COPY agent-runner/ ./
|
||||
|
||||
# Build TypeScript
|
||||
RUN npm run build
|
||||
|
||||
# Create workspace directories
|
||||
RUN mkdir -p /workspace/group /workspace/global /workspace/extra /workspace/ipc/messages /workspace/ipc/tasks /workspace/ipc/input
|
||||
|
||||
# Create entrypoint script
|
||||
# Secrets are passed via stdin JSON — temp file is deleted immediately after Node reads it
|
||||
# Follow-up messages arrive via IPC files in /workspace/ipc/input/
|
||||
# Apple Container only supports directory mounts (VirtioFS), so .env cannot be
|
||||
# shadowed with a host-side /dev/null file mount. Instead the entrypoint starts
|
||||
# as root, uses mount --bind to shadow .env, then drops to the host user via setpriv.
|
||||
RUN printf '#!/bin/bash\nset -e\n\n# Shadow .env so the agent cannot read host secrets (requires root)\nif [ "$(id -u)" = "0" ] && [ -f /workspace/project/.env ]; then\n mount --bind /dev/null /workspace/project/.env\nfi\n\n# Compile agent-runner\ncd /app && npx tsc --outDir /tmp/dist 2>&1 >&2\nln -s /app/node_modules /tmp/dist/node_modules\nchmod -R a-w /tmp/dist\n\n# Capture stdin (secrets JSON) to temp file\ncat > /tmp/input.json\n\n# Drop privileges if running as root (main-group containers)\nif [ "$(id -u)" = "0" ] && [ -n "$RUN_UID" ]; then\n chown "$RUN_UID:$RUN_GID" /tmp/input.json /tmp/dist\n exec setpriv --reuid="$RUN_UID" --regid="$RUN_GID" --clear-groups -- node /tmp/dist/index.js < /tmp/input.json\nfi\n\nexec node /tmp/dist/index.js < /tmp/input.json\n' > /app/entrypoint.sh && chmod +x /app/entrypoint.sh
|
||||
|
||||
# Set ownership to node user (non-root) for writable directories
|
||||
RUN chown -R node:node /workspace && chmod 777 /home/node
|
||||
|
||||
# Set working directory to group workspace
|
||||
WORKDIR /workspace/group
|
||||
|
||||
# Entry point reads JSON from stdin, outputs JSON to stdout
|
||||
ENTRYPOINT ["/app/entrypoint.sh"]
|
||||
@@ -0,0 +1,31 @@
|
||||
# Intent: container/Dockerfile modifications
|
||||
|
||||
## What changed
|
||||
Updated the entrypoint script to shadow `.env` inside the container and drop privileges at runtime, replacing the Docker-style host-side file mount approach.
|
||||
|
||||
## Why
|
||||
Apple Container (VirtioFS) only supports directory mounts, not file mounts. The Docker approach of mounting `/dev/null` over `.env` from the host causes `VZErrorDomain Code=2 "A directory sharing device configuration is invalid"`. The fix moves the shadowing into the entrypoint using `mount --bind` (which works inside the Linux VM).
|
||||
|
||||
## Key sections
|
||||
|
||||
### Entrypoint script
|
||||
- Added: `mount --bind /dev/null /workspace/project/.env` when running as root and `.env` exists
|
||||
- Added: Privilege drop via `setpriv --reuid=$RUN_UID --regid=$RUN_GID --clear-groups` for main-group containers
|
||||
- Added: `chown` of `/tmp/input.json` and `/tmp/dist` to target user before dropping privileges
|
||||
- Removed: `USER node` directive — main containers start as root to perform the bind mount, then drop privileges in the entrypoint. Non-main containers still get `--user` from the host.
|
||||
|
||||
### Dual-path execution
|
||||
- Root path (main containers): shadow .env → compile → capture stdin → chown → setpriv drop → exec node
|
||||
- Non-root path (other containers): compile → capture stdin → exec node
|
||||
|
||||
## Invariants
|
||||
- The entrypoint still reads JSON from stdin and runs the agent-runner
|
||||
- The compiled output goes to `/tmp/dist` (read-only after build)
|
||||
- `node_modules` is symlinked, not copied
|
||||
- Non-main containers are unaffected (they arrive as non-root via `--user`)
|
||||
|
||||
## Must-keep
|
||||
- The `set -e` at the top
|
||||
- The stdin capture to `/tmp/input.json` (required because setpriv can't forward stdin piping)
|
||||
- The `chmod -R a-w /tmp/dist` (prevents agent from modifying its own runner)
|
||||
- The `chown -R node:node /workspace` in the build step
|
||||
@@ -0,0 +1,694 @@
|
||||
/**
|
||||
* Container Runner for NanoClaw
|
||||
* Spawns agent execution in containers and handles IPC
|
||||
*/
|
||||
import { ChildProcess, exec, spawn } from 'child_process';
|
||||
import fs from 'fs';
|
||||
import path from 'path';
|
||||
|
||||
import {
|
||||
CONTAINER_IMAGE,
|
||||
CONTAINER_MAX_OUTPUT_SIZE,
|
||||
CONTAINER_TIMEOUT,
|
||||
DATA_DIR,
|
||||
GROUPS_DIR,
|
||||
IDLE_TIMEOUT,
|
||||
TIMEZONE,
|
||||
} from './config.js';
|
||||
import { readEnvFile } from './env.js';
|
||||
import { resolveGroupFolderPath, resolveGroupIpcPath } from './group-folder.js';
|
||||
import { logger } from './logger.js';
|
||||
import {
|
||||
CONTAINER_RUNTIME_BIN,
|
||||
readonlyMountArgs,
|
||||
stopContainer,
|
||||
} from './container-runtime.js';
|
||||
import { validateAdditionalMounts } from './mount-security.js';
|
||||
import { RegisteredGroup } from './types.js';
|
||||
|
||||
// Sentinel markers for robust output parsing (must match agent-runner)
|
||||
const OUTPUT_START_MARKER = '---NANOCLAW_OUTPUT_START---';
|
||||
const OUTPUT_END_MARKER = '---NANOCLAW_OUTPUT_END---';
|
||||
|
||||
export interface ContainerInput {
|
||||
prompt: string;
|
||||
sessionId?: string;
|
||||
groupFolder: string;
|
||||
chatJid: string;
|
||||
isMain: boolean;
|
||||
isScheduledTask?: boolean;
|
||||
assistantName?: string;
|
||||
secrets?: Record<string, string>;
|
||||
}
|
||||
|
||||
export interface ContainerOutput {
|
||||
status: 'success' | 'error';
|
||||
result: string | null;
|
||||
newSessionId?: string;
|
||||
error?: string;
|
||||
}
|
||||
|
||||
interface VolumeMount {
|
||||
hostPath: string;
|
||||
containerPath: string;
|
||||
readonly: boolean;
|
||||
}
|
||||
|
||||
function buildVolumeMounts(
|
||||
group: RegisteredGroup,
|
||||
isMain: boolean,
|
||||
): VolumeMount[] {
|
||||
const mounts: VolumeMount[] = [];
|
||||
const projectRoot = process.cwd();
|
||||
const groupDir = resolveGroupFolderPath(group.folder);
|
||||
|
||||
if (isMain) {
|
||||
// Main gets the project root read-only. Writable paths the agent needs
|
||||
// (group folder, IPC, .claude/) are mounted separately below.
|
||||
// Read-only prevents the agent from modifying host application code
|
||||
// (src/, dist/, package.json, etc.) which would bypass the sandbox
|
||||
// entirely on next restart.
|
||||
mounts.push({
|
||||
hostPath: projectRoot,
|
||||
containerPath: '/workspace/project',
|
||||
readonly: true,
|
||||
});
|
||||
|
||||
// Main also gets its group folder as the working directory
|
||||
mounts.push({
|
||||
hostPath: groupDir,
|
||||
containerPath: '/workspace/group',
|
||||
readonly: false,
|
||||
});
|
||||
} else {
|
||||
// Other groups only get their own folder
|
||||
mounts.push({
|
||||
hostPath: groupDir,
|
||||
containerPath: '/workspace/group',
|
||||
readonly: false,
|
||||
});
|
||||
|
||||
// Global memory directory (read-only for non-main)
|
||||
// Only directory mounts are supported, not file mounts
|
||||
const globalDir = path.join(GROUPS_DIR, 'global');
|
||||
if (fs.existsSync(globalDir)) {
|
||||
mounts.push({
|
||||
hostPath: globalDir,
|
||||
containerPath: '/workspace/global',
|
||||
readonly: true,
|
||||
});
|
||||
}
|
||||
}
|
||||
|
||||
// Per-group Claude sessions directory (isolated from other groups)
|
||||
// Each group gets their own .claude/ to prevent cross-group session access
|
||||
const groupSessionsDir = path.join(
|
||||
DATA_DIR,
|
||||
'sessions',
|
||||
group.folder,
|
||||
'.claude',
|
||||
);
|
||||
fs.mkdirSync(groupSessionsDir, { recursive: true });
|
||||
const settingsFile = path.join(groupSessionsDir, 'settings.json');
|
||||
if (!fs.existsSync(settingsFile)) {
|
||||
fs.writeFileSync(
|
||||
settingsFile,
|
||||
JSON.stringify(
|
||||
{
|
||||
env: {
|
||||
// Enable agent swarms (subagent orchestration)
|
||||
// https://code.claude.com/docs/en/agent-teams#orchestrate-teams-of-claude-code-sessions
|
||||
CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS: '1',
|
||||
// Load CLAUDE.md from additional mounted directories
|
||||
// https://code.claude.com/docs/en/memory#load-memory-from-additional-directories
|
||||
CLAUDE_CODE_ADDITIONAL_DIRECTORIES_CLAUDE_MD: '1',
|
||||
// Enable Claude's memory feature (persists user preferences between sessions)
|
||||
// https://code.claude.com/docs/en/memory#manage-auto-memory
|
||||
CLAUDE_CODE_DISABLE_AUTO_MEMORY: '0',
|
||||
},
|
||||
},
|
||||
null,
|
||||
2,
|
||||
) + '\n',
|
||||
);
|
||||
}
|
||||
|
||||
// Sync skills from container/skills/ into each group's .claude/skills/
|
||||
const skillsSrc = path.join(process.cwd(), 'container', 'skills');
|
||||
const skillsDst = path.join(groupSessionsDir, 'skills');
|
||||
if (fs.existsSync(skillsSrc)) {
|
||||
for (const skillDir of fs.readdirSync(skillsSrc)) {
|
||||
const srcDir = path.join(skillsSrc, skillDir);
|
||||
if (!fs.statSync(srcDir).isDirectory()) continue;
|
||||
const dstDir = path.join(skillsDst, skillDir);
|
||||
fs.cpSync(srcDir, dstDir, { recursive: true });
|
||||
}
|
||||
}
|
||||
mounts.push({
|
||||
hostPath: groupSessionsDir,
|
||||
containerPath: '/home/node/.claude',
|
||||
readonly: false,
|
||||
});
|
||||
|
||||
// Per-group IPC namespace: each group gets its own IPC directory
|
||||
// This prevents cross-group privilege escalation via IPC
|
||||
const groupIpcDir = resolveGroupIpcPath(group.folder);
|
||||
fs.mkdirSync(path.join(groupIpcDir, 'messages'), { recursive: true });
|
||||
fs.mkdirSync(path.join(groupIpcDir, 'tasks'), { recursive: true });
|
||||
fs.mkdirSync(path.join(groupIpcDir, 'input'), { recursive: true });
|
||||
mounts.push({
|
||||
hostPath: groupIpcDir,
|
||||
containerPath: '/workspace/ipc',
|
||||
readonly: false,
|
||||
});
|
||||
|
||||
// Copy agent-runner source into a per-group writable location so agents
|
||||
// can customize it (add tools, change behavior) without affecting other
|
||||
// groups. Recompiled on container startup via entrypoint.sh.
|
||||
const agentRunnerSrc = path.join(
|
||||
projectRoot,
|
||||
'container',
|
||||
'agent-runner',
|
||||
'src',
|
||||
);
|
||||
const groupAgentRunnerDir = path.join(
|
||||
DATA_DIR,
|
||||
'sessions',
|
||||
group.folder,
|
||||
'agent-runner-src',
|
||||
);
|
||||
if (!fs.existsSync(groupAgentRunnerDir) && fs.existsSync(agentRunnerSrc)) {
|
||||
fs.cpSync(agentRunnerSrc, groupAgentRunnerDir, { recursive: true });
|
||||
}
|
||||
mounts.push({
|
||||
hostPath: groupAgentRunnerDir,
|
||||
containerPath: '/app/src',
|
||||
readonly: false,
|
||||
});
|
||||
|
||||
// Additional mounts validated against external allowlist (tamper-proof from containers)
|
||||
if (group.containerConfig?.additionalMounts) {
|
||||
const validatedMounts = validateAdditionalMounts(
|
||||
group.containerConfig.additionalMounts,
|
||||
group.name,
|
||||
isMain,
|
||||
);
|
||||
mounts.push(...validatedMounts);
|
||||
}
|
||||
|
||||
return mounts;
|
||||
}
|
||||
|
||||
/**
|
||||
* Read allowed secrets from .env for passing to the container via stdin.
|
||||
* Secrets are never written to disk or mounted as files.
|
||||
*/
|
||||
function readSecrets(): Record<string, string> {
|
||||
return readEnvFile(['CLAUDE_CODE_OAUTH_TOKEN', 'ANTHROPIC_API_KEY']);
|
||||
}
|
||||
|
||||
function buildContainerArgs(
|
||||
mounts: VolumeMount[],
|
||||
containerName: string,
|
||||
isMain: boolean,
|
||||
): string[] {
|
||||
const args: string[] = ['run', '-i', '--rm', '--name', containerName];
|
||||
|
||||
// Pass host timezone so container's local time matches the user's
|
||||
args.push('-e', `TZ=${TIMEZONE}`);
|
||||
|
||||
// Run as host user so bind-mounted files are accessible.
|
||||
// Skip when running as root (uid 0), as the container's node user (uid 1000),
|
||||
// or when getuid is unavailable (native Windows without WSL).
|
||||
const hostUid = process.getuid?.();
|
||||
const hostGid = process.getgid?.();
|
||||
if (hostUid != null && hostUid !== 0 && hostUid !== 1000) {
|
||||
if (isMain) {
|
||||
// Main containers start as root so the entrypoint can mount --bind
|
||||
// to shadow .env. Privileges are dropped via setpriv in entrypoint.sh.
|
||||
args.push('-e', `RUN_UID=${hostUid}`);
|
||||
args.push('-e', `RUN_GID=${hostGid}`);
|
||||
} else {
|
||||
args.push('--user', `${hostUid}:${hostGid}`);
|
||||
}
|
||||
args.push('-e', 'HOME=/home/node');
|
||||
}
|
||||
|
||||
for (const mount of mounts) {
|
||||
if (mount.readonly) {
|
||||
args.push(...readonlyMountArgs(mount.hostPath, mount.containerPath));
|
||||
} else {
|
||||
args.push('-v', `${mount.hostPath}:${mount.containerPath}`);
|
||||
}
|
||||
}
|
||||
|
||||
args.push(CONTAINER_IMAGE);
|
||||
|
||||
return args;
|
||||
}
|
||||
|
||||
export async function runContainerAgent(
|
||||
group: RegisteredGroup,
|
||||
input: ContainerInput,
|
||||
onProcess: (proc: ChildProcess, containerName: string) => void,
|
||||
onOutput?: (output: ContainerOutput) => Promise<void>,
|
||||
): Promise<ContainerOutput> {
|
||||
const startTime = Date.now();
|
||||
|
||||
const groupDir = resolveGroupFolderPath(group.folder);
|
||||
fs.mkdirSync(groupDir, { recursive: true });
|
||||
|
||||
const mounts = buildVolumeMounts(group, input.isMain);
|
||||
const safeName = group.folder.replace(/[^a-zA-Z0-9-]/g, '-');
|
||||
const containerName = `nanoclaw-${safeName}-${Date.now()}`;
|
||||
const containerArgs = buildContainerArgs(mounts, containerName, input.isMain);
|
||||
|
||||
logger.debug(
|
||||
{
|
||||
group: group.name,
|
||||
containerName,
|
||||
mounts: mounts.map(
|
||||
(m) =>
|
||||
`${m.hostPath} -> ${m.containerPath}${m.readonly ? ' (ro)' : ''}`,
|
||||
),
|
||||
containerArgs: containerArgs.join(' '),
|
||||
},
|
||||
'Container mount configuration',
|
||||
);
|
||||
|
||||
logger.info(
|
||||
{
|
||||
group: group.name,
|
||||
containerName,
|
||||
mountCount: mounts.length,
|
||||
isMain: input.isMain,
|
||||
},
|
||||
'Spawning container agent',
|
||||
);
|
||||
|
||||
const logsDir = path.join(groupDir, 'logs');
|
||||
fs.mkdirSync(logsDir, { recursive: true });
|
||||
|
||||
return new Promise((resolve) => {
|
||||
const container = spawn(CONTAINER_RUNTIME_BIN, containerArgs, {
|
||||
stdio: ['pipe', 'pipe', 'pipe'],
|
||||
});
|
||||
|
||||
onProcess(container, containerName);
|
||||
|
||||
let stdout = '';
|
||||
let stderr = '';
|
||||
let stdoutTruncated = false;
|
||||
let stderrTruncated = false;
|
||||
|
||||
// Pass secrets via stdin (never written to disk or mounted as files)
|
||||
input.secrets = readSecrets();
|
||||
container.stdin.write(JSON.stringify(input));
|
||||
container.stdin.end();
|
||||
// Remove secrets from input so they don't appear in logs
|
||||
delete input.secrets;
|
||||
|
||||
// Streaming output: parse OUTPUT_START/END marker pairs as they arrive
|
||||
let parseBuffer = '';
|
||||
let newSessionId: string | undefined;
|
||||
let outputChain = Promise.resolve();
|
||||
|
||||
container.stdout.on('data', (data) => {
|
||||
const chunk = data.toString();
|
||||
|
||||
// Always accumulate for logging
|
||||
if (!stdoutTruncated) {
|
||||
const remaining = CONTAINER_MAX_OUTPUT_SIZE - stdout.length;
|
||||
if (chunk.length > remaining) {
|
||||
stdout += chunk.slice(0, remaining);
|
||||
stdoutTruncated = true;
|
||||
logger.warn(
|
||||
{ group: group.name, size: stdout.length },
|
||||
'Container stdout truncated due to size limit',
|
||||
);
|
||||
} else {
|
||||
stdout += chunk;
|
||||
}
|
||||
}
|
||||
|
||||
// Stream-parse for output markers
|
||||
if (onOutput) {
|
||||
parseBuffer += chunk;
|
||||
let startIdx: number;
|
||||
while ((startIdx = parseBuffer.indexOf(OUTPUT_START_MARKER)) !== -1) {
|
||||
const endIdx = parseBuffer.indexOf(OUTPUT_END_MARKER, startIdx);
|
||||
if (endIdx === -1) break; // Incomplete pair, wait for more data
|
||||
|
||||
const jsonStr = parseBuffer
|
||||
.slice(startIdx + OUTPUT_START_MARKER.length, endIdx)
|
||||
.trim();
|
||||
parseBuffer = parseBuffer.slice(endIdx + OUTPUT_END_MARKER.length);
|
||||
|
||||
try {
|
||||
const parsed: ContainerOutput = JSON.parse(jsonStr);
|
||||
if (parsed.newSessionId) {
|
||||
newSessionId = parsed.newSessionId;
|
||||
}
|
||||
hadStreamingOutput = true;
|
||||
// Activity detected — reset the hard timeout
|
||||
resetTimeout();
|
||||
// Call onOutput for all markers (including null results)
|
||||
// so idle timers start even for "silent" query completions.
|
||||
outputChain = outputChain.then(() => onOutput(parsed));
|
||||
} catch (err) {
|
||||
logger.warn(
|
||||
{ group: group.name, error: err },
|
||||
'Failed to parse streamed output chunk',
|
||||
);
|
||||
}
|
||||
}
|
||||
}
|
||||
});
|
||||
|
||||
container.stderr.on('data', (data) => {
|
||||
const chunk = data.toString();
|
||||
const lines = chunk.trim().split('\n');
|
||||
for (const line of lines) {
|
||||
if (line) logger.debug({ container: group.folder }, line);
|
||||
}
|
||||
// Don't reset timeout on stderr — SDK writes debug logs continuously.
|
||||
// Timeout only resets on actual output (OUTPUT_MARKER in stdout).
|
||||
if (stderrTruncated) return;
|
||||
const remaining = CONTAINER_MAX_OUTPUT_SIZE - stderr.length;
|
||||
if (chunk.length > remaining) {
|
||||
stderr += chunk.slice(0, remaining);
|
||||
stderrTruncated = true;
|
||||
logger.warn(
|
||||
{ group: group.name, size: stderr.length },
|
||||
'Container stderr truncated due to size limit',
|
||||
);
|
||||
} else {
|
||||
stderr += chunk;
|
||||
}
|
||||
});
|
||||
|
||||
let timedOut = false;
|
||||
let hadStreamingOutput = false;
|
||||
const configTimeout = group.containerConfig?.timeout || CONTAINER_TIMEOUT;
|
||||
// Grace period: hard timeout must be at least IDLE_TIMEOUT + 30s so the
|
||||
// graceful _close sentinel has time to trigger before the hard kill fires.
|
||||
const timeoutMs = Math.max(configTimeout, IDLE_TIMEOUT + 30_000);
|
||||
|
||||
const killOnTimeout = () => {
|
||||
timedOut = true;
|
||||
logger.error(
|
||||
{ group: group.name, containerName },
|
||||
'Container timeout, stopping gracefully',
|
||||
);
|
||||
exec(stopContainer(containerName), { timeout: 15000 }, (err) => {
|
||||
if (err) {
|
||||
logger.warn(
|
||||
{ group: group.name, containerName, err },
|
||||
'Graceful stop failed, force killing',
|
||||
);
|
||||
container.kill('SIGKILL');
|
||||
}
|
||||
});
|
||||
};
|
||||
|
||||
let timeout = setTimeout(killOnTimeout, timeoutMs);
|
||||
|
||||
// Reset the timeout whenever there's activity (streaming output)
|
||||
const resetTimeout = () => {
|
||||
clearTimeout(timeout);
|
||||
timeout = setTimeout(killOnTimeout, timeoutMs);
|
||||
};
|
||||
|
||||
container.on('close', (code) => {
|
||||
clearTimeout(timeout);
|
||||
const duration = Date.now() - startTime;
|
||||
|
||||
if (timedOut) {
|
||||
const ts = new Date().toISOString().replace(/[:.]/g, '-');
|
||||
const timeoutLog = path.join(logsDir, `container-${ts}.log`);
|
||||
fs.writeFileSync(
|
||||
timeoutLog,
|
||||
[
|
||||
`=== Container Run Log (TIMEOUT) ===`,
|
||||
`Timestamp: ${new Date().toISOString()}`,
|
||||
`Group: ${group.name}`,
|
||||
`Container: ${containerName}`,
|
||||
`Duration: ${duration}ms`,
|
||||
`Exit Code: ${code}`,
|
||||
`Had Streaming Output: ${hadStreamingOutput}`,
|
||||
].join('\n'),
|
||||
);
|
||||
|
||||
// Timeout after output = idle cleanup, not failure.
|
||||
// The agent already sent its response; this is just the
|
||||
// container being reaped after the idle period expired.
|
||||
if (hadStreamingOutput) {
|
||||
logger.info(
|
||||
{ group: group.name, containerName, duration, code },
|
||||
'Container timed out after output (idle cleanup)',
|
||||
);
|
||||
outputChain.then(() => {
|
||||
resolve({
|
||||
status: 'success',
|
||||
result: null,
|
||||
newSessionId,
|
||||
});
|
||||
});
|
||||
return;
|
||||
}
|
||||
|
||||
logger.error(
|
||||
{ group: group.name, containerName, duration, code },
|
||||
'Container timed out with no output',
|
||||
);
|
||||
|
||||
resolve({
|
||||
status: 'error',
|
||||
result: null,
|
||||
error: `Container timed out after ${configTimeout}ms`,
|
||||
});
|
||||
return;
|
||||
}
|
||||
|
||||
const timestamp = new Date().toISOString().replace(/[:.]/g, '-');
|
||||
const logFile = path.join(logsDir, `container-${timestamp}.log`);
|
||||
const isVerbose =
|
||||
process.env.LOG_LEVEL === 'debug' || process.env.LOG_LEVEL === 'trace';
|
||||
|
||||
const logLines = [
|
||||
`=== Container Run Log ===`,
|
||||
`Timestamp: ${new Date().toISOString()}`,
|
||||
`Group: ${group.name}`,
|
||||
`IsMain: ${input.isMain}`,
|
||||
`Duration: ${duration}ms`,
|
||||
`Exit Code: ${code}`,
|
||||
`Stdout Truncated: ${stdoutTruncated}`,
|
||||
`Stderr Truncated: ${stderrTruncated}`,
|
||||
``,
|
||||
];
|
||||
|
||||
const isError = code !== 0;
|
||||
|
||||
if (isVerbose || isError) {
|
||||
logLines.push(
|
||||
`=== Input ===`,
|
||||
JSON.stringify(input, null, 2),
|
||||
``,
|
||||
`=== Container Args ===`,
|
||||
containerArgs.join(' '),
|
||||
``,
|
||||
`=== Mounts ===`,
|
||||
mounts
|
||||
.map(
|
||||
(m) =>
|
||||
`${m.hostPath} -> ${m.containerPath}${m.readonly ? ' (ro)' : ''}`,
|
||||
)
|
||||
.join('\n'),
|
||||
``,
|
||||
`=== Stderr${stderrTruncated ? ' (TRUNCATED)' : ''} ===`,
|
||||
stderr,
|
||||
``,
|
||||
`=== Stdout${stdoutTruncated ? ' (TRUNCATED)' : ''} ===`,
|
||||
stdout,
|
||||
);
|
||||
} else {
|
||||
logLines.push(
|
||||
`=== Input Summary ===`,
|
||||
`Prompt length: ${input.prompt.length} chars`,
|
||||
`Session ID: ${input.sessionId || 'new'}`,
|
||||
``,
|
||||
`=== Mounts ===`,
|
||||
mounts
|
||||
.map((m) => `${m.containerPath}${m.readonly ? ' (ro)' : ''}`)
|
||||
.join('\n'),
|
||||
``,
|
||||
);
|
||||
}
|
||||
|
||||
fs.writeFileSync(logFile, logLines.join('\n'));
|
||||
logger.debug({ logFile, verbose: isVerbose }, 'Container log written');
|
||||
|
||||
if (code !== 0) {
|
||||
logger.error(
|
||||
{
|
||||
group: group.name,
|
||||
code,
|
||||
duration,
|
||||
stderr,
|
||||
stdout,
|
||||
logFile,
|
||||
},
|
||||
'Container exited with error',
|
||||
);
|
||||
|
||||
resolve({
|
||||
status: 'error',
|
||||
result: null,
|
||||
error: `Container exited with code ${code}: ${stderr.slice(-200)}`,
|
||||
});
|
||||
return;
|
||||
}
|
||||
|
||||
// Streaming mode: wait for output chain to settle, return completion marker
|
||||
if (onOutput) {
|
||||
outputChain.then(() => {
|
||||
logger.info(
|
||||
{ group: group.name, duration, newSessionId },
|
||||
'Container completed (streaming mode)',
|
||||
);
|
||||
resolve({
|
||||
status: 'success',
|
||||
result: null,
|
||||
newSessionId,
|
||||
});
|
||||
});
|
||||
return;
|
||||
}
|
||||
|
||||
// Legacy mode: parse the last output marker pair from accumulated stdout
|
||||
try {
|
||||
// Extract JSON between sentinel markers for robust parsing
|
||||
const startIdx = stdout.indexOf(OUTPUT_START_MARKER);
|
||||
const endIdx = stdout.indexOf(OUTPUT_END_MARKER);
|
||||
|
||||
let jsonLine: string;
|
||||
if (startIdx !== -1 && endIdx !== -1 && endIdx > startIdx) {
|
||||
jsonLine = stdout
|
||||
.slice(startIdx + OUTPUT_START_MARKER.length, endIdx)
|
||||
.trim();
|
||||
} else {
|
||||
// Fallback: last non-empty line (backwards compatibility)
|
||||
const lines = stdout.trim().split('\n');
|
||||
jsonLine = lines[lines.length - 1];
|
||||
}
|
||||
|
||||
const output: ContainerOutput = JSON.parse(jsonLine);
|
||||
|
||||
logger.info(
|
||||
{
|
||||
group: group.name,
|
||||
duration,
|
||||
status: output.status,
|
||||
hasResult: !!output.result,
|
||||
},
|
||||
'Container completed',
|
||||
);
|
||||
|
||||
resolve(output);
|
||||
} catch (err) {
|
||||
logger.error(
|
||||
{
|
||||
group: group.name,
|
||||
stdout,
|
||||
stderr,
|
||||
error: err,
|
||||
},
|
||||
'Failed to parse container output',
|
||||
);
|
||||
|
||||
resolve({
|
||||
status: 'error',
|
||||
result: null,
|
||||
error: `Failed to parse container output: ${err instanceof Error ? err.message : String(err)}`,
|
||||
});
|
||||
}
|
||||
});
|
||||
|
||||
container.on('error', (err) => {
|
||||
clearTimeout(timeout);
|
||||
logger.error(
|
||||
{ group: group.name, containerName, error: err },
|
||||
'Container spawn error',
|
||||
);
|
||||
resolve({
|
||||
status: 'error',
|
||||
result: null,
|
||||
error: `Container spawn error: ${err.message}`,
|
||||
});
|
||||
});
|
||||
});
|
||||
}
|
||||
|
||||
export function writeTasksSnapshot(
|
||||
groupFolder: string,
|
||||
isMain: boolean,
|
||||
tasks: Array<{
|
||||
id: string;
|
||||
groupFolder: string;
|
||||
prompt: string;
|
||||
schedule_type: string;
|
||||
schedule_value: string;
|
||||
status: string;
|
||||
next_run: string | null;
|
||||
}>,
|
||||
): void {
|
||||
// Write filtered tasks to the group's IPC directory
|
||||
const groupIpcDir = resolveGroupIpcPath(groupFolder);
|
||||
fs.mkdirSync(groupIpcDir, { recursive: true });
|
||||
|
||||
// Main sees all tasks, others only see their own
|
||||
const filteredTasks = isMain
|
||||
? tasks
|
||||
: tasks.filter((t) => t.groupFolder === groupFolder);
|
||||
|
||||
const tasksFile = path.join(groupIpcDir, 'current_tasks.json');
|
||||
fs.writeFileSync(tasksFile, JSON.stringify(filteredTasks, null, 2));
|
||||
}
|
||||
|
||||
export interface AvailableGroup {
|
||||
jid: string;
|
||||
name: string;
|
||||
lastActivity: string;
|
||||
isRegistered: boolean;
|
||||
}
|
||||
|
||||
/**
|
||||
* Write available groups snapshot for the container to read.
|
||||
* Only main group can see all available groups (for activation).
|
||||
* Non-main groups only see their own registration status.
|
||||
*/
|
||||
export function writeGroupsSnapshot(
|
||||
groupFolder: string,
|
||||
isMain: boolean,
|
||||
groups: AvailableGroup[],
|
||||
registeredJids: Set<string>,
|
||||
): void {
|
||||
const groupIpcDir = resolveGroupIpcPath(groupFolder);
|
||||
fs.mkdirSync(groupIpcDir, { recursive: true });
|
||||
|
||||
// Main sees all groups; others see nothing (they can't activate groups)
|
||||
const visibleGroups = isMain ? groups : [];
|
||||
|
||||
const groupsFile = path.join(groupIpcDir, 'available_groups.json');
|
||||
fs.writeFileSync(
|
||||
groupsFile,
|
||||
JSON.stringify(
|
||||
{
|
||||
groups: visibleGroups,
|
||||
lastSync: new Date().toISOString(),
|
||||
},
|
||||
null,
|
||||
2,
|
||||
),
|
||||
);
|
||||
}
|
||||
@@ -0,0 +1,33 @@
|
||||
# Intent: src/container-runner.ts modifications
|
||||
|
||||
## What changed
|
||||
Updated `buildContainerArgs` to support Apple Container's .env shadowing mechanism. The function now accepts an `isMain` parameter and uses it to decide how container user identity is configured.
|
||||
|
||||
## Why
|
||||
Apple Container (VirtioFS) only supports directory mounts, not file mounts. The previous approach of mounting `/dev/null` over `.env` from the host causes a `VZErrorDomain` crash. Instead, main-group containers now start as root so the entrypoint can `mount --bind /dev/null` over `.env` inside the Linux VM, then drop to the host user via `setpriv`.
|
||||
|
||||
## Key sections
|
||||
|
||||
### buildContainerArgs (signature change)
|
||||
- Added: `isMain: boolean` parameter
|
||||
- Main containers: passes `RUN_UID`/`RUN_GID` env vars instead of `--user`, so the container starts as root
|
||||
- Non-main containers: unchanged, still uses `--user` flag
|
||||
|
||||
### buildVolumeMounts
|
||||
- Removed: the `/dev/null` → `/workspace/project/.env` shadow mount (was in the committed `37228a9` fix)
|
||||
- The .env shadowing is now handled inside the container entrypoint instead
|
||||
|
||||
### runContainerAgent (call site)
|
||||
- Changed: `buildContainerArgs(mounts, containerName)` → `buildContainerArgs(mounts, containerName, input.isMain)`
|
||||
|
||||
## Invariants
|
||||
- All exported interfaces unchanged: `ContainerInput`, `ContainerOutput`, `runContainerAgent`, `writeTasksSnapshot`, `writeGroupsSnapshot`, `AvailableGroup`
|
||||
- Non-main containers behave identically (still get `--user` flag)
|
||||
- Mount list for non-main containers is unchanged
|
||||
- Secrets still passed via stdin, never mounted as files
|
||||
- Output parsing (streaming + legacy) unchanged
|
||||
|
||||
## Must-keep
|
||||
- The `isMain` parameter on `buildContainerArgs` (consumed by `runContainerAgent`)
|
||||
- The `RUN_UID`/`RUN_GID` env vars for main containers (consumed by entrypoint.sh)
|
||||
- The `--user` flag for non-main containers (file permission compatibility)
|
||||
Reference in New Issue
Block a user