Problem
Workflows that use MCP tools for build/test validation during long file-exploration phases hit the MCP connection inactivity timeout (~5 minutes). When the agent finally attempts validation, the MCP transport has been torn down, resulting in MCP error -32003: context canceled. The workflow then fails at the last step rather than early, wasting the entire preceding session context.
Evidence
- Analysis window: 2026-04-13 to 2026-04-27
- PRs analyzed: 1,000; sessions analyzed: 50 (single-day snapshot)
- Key metrics and examples:
- PR body (merged fix): "MCP server connections (HTTP/WebSocket) time out after ~5 min of inactivity. The Go Logger workflow's long file-exploration turns exceed this threshold, causing
mcpscripts.make build and mcpscripts.go test to fail with MCP error -32003: context canceled when the agent finally attempts validation."
- The fix for the Logger workflow replaced MCP tool calls with direct
bash calls (make build, make test-unit) — confirming the pattern is real and solvable
- This pattern affects any workflow where exploration turns are long (reading many files, grepping large codebases) before a terminal MCP-based validation step
- Session data shows 14/50 sessions (28%) ended with
action_required — indicating workflows requiring intervention, consistent with late-stage failures
Proposed Change
- Audit all workflows that call
mcpscripts.* or other MCP tools only at the end of a session and replace end-of-session validation with equivalent bash commands (make build, make test-unit, make recompile).
- Add an intermediate validation checkpoint at the midpoint of long workflows (after the first major code edit) using bash, not MCP, so failures surface before full context is consumed.
- Document the "bash-over-MCP for validation" rule in AGENTS.md and workflow authoring guidelines to prevent the pattern from re-emerging in new workflows.
Expected Impact
- Prevent full-session failures caused by MCP timeout on any workflow with exploration phases > 5 minutes
- Reduce
action_required intervention rate by catching failures earlier
- Lower median session wall-clock time by catching compile/test errors before exploration completes
Notes
- Distinct root cause category: late-stage MCP validation against long-lived connections
- Data quality caveats:
events.jsonl not accessible; evidence from PR bodies and session conclusion distribution. The exact number of affected workflows beyond the Logger workflow is unknown without log access.
Generated by Copilot Opt · ● 955.5K · ◷
Problem
Workflows that use MCP tools for build/test validation during long file-exploration phases hit the MCP connection inactivity timeout (~5 minutes). When the agent finally attempts validation, the MCP transport has been torn down, resulting in
MCP error -32003: context canceled. The workflow then fails at the last step rather than early, wasting the entire preceding session context.Evidence
mcpscripts.make buildandmcpscripts.go testto fail withMCP error -32003: context canceledwhen the agent finally attempts validation."bashcalls (make build,make test-unit) — confirming the pattern is real and solvableaction_required— indicating workflows requiring intervention, consistent with late-stage failuresProposed Change
mcpscripts.*or other MCP tools only at the end of a session and replace end-of-session validation with equivalentbashcommands (make build,make test-unit,make recompile).Expected Impact
action_requiredintervention rate by catching failures earlierNotes
events.jsonlnot accessible; evidence from PR bodies and session conclusion distribution. The exact number of affected workflows beyond the Logger workflow is unknown without log access.