[copilot-opt] Replace end-of-session MCP validation with bash calls to prevent ~5min inactivity timeouts #28706

@github-actions

Problem

Workflows that rely on MCP tools for build/test validation can hit the MCP connection inactivity timeout (~5 minutes) when a long file-exploration phase precedes the validation call. By the time the agent attempts validation, the MCP transport has been torn down and the call fails with MCP error -32003: context canceled. The workflow therefore fails at its last step instead of early, wasting the context built up over the whole session.

Evidence

  • Analysis window: 2026-04-13 to 2026-04-27
  • PRs analyzed: 1,000; sessions analyzed: 50 (single-day snapshot)
  • Key metrics and examples:
    • PR body (merged fix): "MCP server connections (HTTP/WebSocket) time out after ~5 min of inactivity. The Go Logger workflow's long file-exploration turns exceed this threshold, causing mcpscripts.make build and mcpscripts.go test to fail with MCP error -32003: context canceled when the agent finally attempts validation."
    • The fix for the Logger workflow replaced MCP tool calls with direct bash calls (make build, make test-unit) — confirming the pattern is real and solvable
    • This pattern affects any workflow where exploration turns are long (reading many files, grepping large codebases) before a terminal MCP-based validation step
    • Session data shows 14/50 sessions (28%) ended with action_required — indicating workflows requiring intervention, consistent with late-stage failures

Proposed Change

  1. Audit all workflows that call mcpscripts.* or other MCP tools only at the end of a session and replace end-of-session validation with equivalent bash commands (make build, make test-unit, make recompile).
  2. Add an intermediate validation checkpoint at the midpoint of long workflows (after the first major code edit) using bash, not MCP, so failures surface before full context is consumed.
  3. Document the "bash-over-MCP for validation" rule in AGENTS.md and workflow authoring guidelines to prevent the pattern from re-emerging in new workflows.
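The audit in step 1 could start from a plain text search. The following is a minimal sketch, not an established tool: the function name audit_mcp_validation is hypothetical, and the default location (.github/workflows, Markdown files) is an assumption about where workflow definitions live. It lists every mention of an mcpscripts.* tool; deciding whether a given call is an end-of-session validation step still needs manual review.

```bash
#!/usr/bin/env bash
# Sketch: list workflow files that mention mcpscripts.* tools so they can be
# reviewed for end-of-session MCP validation.
# Assumption: workflows are Markdown files under .github/workflows; pass a
# different directory as the first argument if yours live elsewhere.

audit_mcp_validation() {
  local dir="${1:-.github/workflows}"
  # -r recurse, -n print line numbers, -E extended regex; only *.md files.
  # grep exits 1 when nothing matches, which is not an error for an audit.
  grep -rnE --include='*.md' 'mcpscripts\.[A-Za-z_-]+' "$dir" || true
}

# Example:
#   audit_mcp_validation                # scan the assumed default directory
#   audit_mcp_validation path/to/dir    # scan another directory
```

Each output line has the form path:line:text, which makes it straightforward to open the hit and check whether the call sits after a long exploration phase.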

Expected Impact

  • Prevent full-session failures caused by the MCP inactivity timeout in any workflow whose exploration phases exceed ~5 minutes
  • Reduce action_required intervention rate by catching failures earlier
  • Lower median session wall-clock time by catching compile/test errors before exploration completes
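Catching failures earlier depends on a checkpoint that runs validation directly in the shell and stops at the first failing command. The following is a minimal sketch under stated assumptions: validate_checkpoint is a hypothetical helper name, and the make targets in the usage comment are the ones named in this issue, which your Makefile may or may not provide.

```bash
#!/usr/bin/env bash
# Sketch: run validation commands directly in the shell (no MCP transport
# involved) and stop at the first failure.
# Assumption: each argument is a complete command line, e.g. "make build".

validate_checkpoint() {
  local cmd
  for cmd in "$@"; do
    echo "checkpoint: running ${cmd}"
    # bash -c lets each argument carry its own arguments and quoting.
    if ! bash -c "$cmd"; then
      echo "checkpoint: FAILED at '${cmd}'" >&2
      return 1
    fi
  done
  echo "checkpoint: all validation commands passed"
}

# Example (hypothetical placement: right after the first major code edit):
#   validate_checkpoint "make build" "make test-unit"
```

Because the function returns non-zero as soon as one command fails, later commands are skipped and the caller can react before spending more of the session.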

Notes

  • Distinct root cause category: late-stage MCP validation against long-lived connections
  • Data quality caveats: events.jsonl was not accessible, so the evidence comes from PR bodies and the distribution of session conclusions. The exact number of affected workflows beyond the Logger workflow is unknown without log access.
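If session logs become accessible, the number of affected sessions could be estimated by searching them for the error text quoted above. This is only a sketch: the function name count_mcp_timeouts is hypothetical, and it assumes the logs are .jsonl files that contain the string "MCP error -32003" verbatim, which has not been verified against real events.jsonl files.

```bash
#!/usr/bin/env bash
# Sketch: count log files that contain the MCP timeout error at least once.
# Assumptions: logs are *.jsonl files under the given directory and record
# the error text literally; neither has been verified.

count_mcp_timeouts() {
  local dir="${1:-.}"
  # -r recurse, -l list each matching file once, -F match a fixed string.
  grep -rlF --include='*.jsonl' 'MCP error -32003' "$dir" | wc -l | tr -d ' '
}

# Example:
#   count_mcp_timeouts path/to/session-logs
```

The count is per file, not per occurrence, so a session that logs the error several times is counted once.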

Generated by Copilot Opt · ● 955.5K ·
