Show HN: Keep large tool output out of LLM context: 3x accuracy, 95% fewer tokens https://ift.tt/VDlRvS3

LLM agents often place raw JSON tool outputs directly in the prompt. After a few tool calls, earlier results get compacted or truncated, and answers become incorrect or inconsistent.

I built Sift, a drop-in MCP gateway that stores tool outputs as local artifacts (filesystem blobs indexed in SQLite). When a response is large or paginated, it returns an `artifact_id` plus compact schema hints instead. Rather than reasoning over full JSON in the prompt, the model runs a small Python query:

```python
def run(data, schema, params):
    return max(data, key=lambda x: x["magnitude"])["place"]
```

Query code runs in a constrained subprocess (AST/import guards, plus timeout and memory caps). Only the computed result is returned to the model.

Benchmark (Claude Sonnet 4.6, 103 questions across 12 datasets):

- Baseline (raw JSON in prompt): 34/103 (33%), 10.7M input tokens
- Sift (artifact + code query): 102/103 (99%), 489K input tokens

Open benchmark + MIT code: https://ift.tt/glYUBET

Install:

```
pipx install sift-gateway
sift-gateway init --from claude
```

Works with Claude Code, Cursor, Windsurf, Zed, and VS Code. Existing MCP servers and tools require no changes.
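For intuition, here is a minimal sketch of the artifact-store pattern described above: a large tool output goes to a filesystem blob indexed in SQLite, and the caller gets back only an ID plus schema hints. This is my illustration, not Sift's actual code; every name in it (`store_tool_output`, the table layout, the hint format) is hypothetical.

```python
import hashlib
import json
import sqlite3
from pathlib import Path

BLOB_DIR = Path("artifacts")           # hypothetical blob directory
db = sqlite3.connect("artifacts.db")   # hypothetical SQLite index
db.execute("CREATE TABLE IF NOT EXISTS artifacts (id TEXT PRIMARY KEY, path TEXT)")

def store_tool_output(output):
    """Persist a large tool output; return only an ID and schema hints."""
    raw = json.dumps(output).encode()
    artifact_id = hashlib.sha256(raw).hexdigest()[:16]
    BLOB_DIR.mkdir(exist_ok=True)
    path = BLOB_DIR / f"{artifact_id}.json"
    path.write_bytes(raw)
    db.execute("INSERT OR REPLACE INTO artifacts VALUES (?, ?)",
               (artifact_id, str(path)))
    db.commit()
    # Compact schema hints: the keys of the first record, not the data itself.
    sample = output[0] if isinstance(output, list) and output else output
    hints = sorted(sample) if isinstance(sample, dict) else type(sample).__name__
    return {"artifact_id": artifact_id, "schema_hints": hints}
```

The point is what the model sees: a short `{"artifact_id": ..., "schema_hints": [...]}` dict instead of the full payload.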
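And a rough sketch of the constrained-execution side, again illustrative rather than Sift's implementation: an AST walk rejects imports and dunder attribute access, and the child process gets a wall-clock timeout and an address-space cap (POSIX-only) before running the model-written `run()` function.

```python
import ast
import json
import resource    # POSIX-only; used for the memory cap
import subprocess
import sys

def run_query(code, data, timeout_s=5):
    """Execute model-written query code against data in a capped subprocess."""
    # AST guard: reject import statements and dunder attribute access.
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            raise ValueError("imports are not allowed in query code")
        if isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            raise ValueError("dunder attribute access is not allowed")

    # Child reads the data on stdin, defines run(), prints the result as JSON.
    harness = "\n".join([
        "import json, sys",
        "data = json.load(sys.stdin)",
        code,
        "print(json.dumps(run(data, schema=None, params=None)))",
    ])

    def cap_memory():  # runs in the child just before exec
        resource.setrlimit(resource.RLIMIT_AS, (256 * 2**20, 256 * 2**20))

    proc = subprocess.run(
        [sys.executable, "-I", "-c", harness],
        input=json.dumps(data), capture_output=True, text=True,
        timeout=timeout_s, preexec_fn=cap_memory,
    )
    proc.check_returncode()
    return json.loads(proc.stdout)
```

Feeding it the example query from the post:

```python
quakes = [{"place": "Alaska", "magnitude": 6.1}, {"place": "Chile", "magnitude": 7.4}]
query = 'def run(data, schema, params):\n    return max(data, key=lambda x: x["magnitude"])["place"]'
print(run_query(query, quakes))  # -> Chile
```

Only that small computed result goes back into the context window.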