An ablation framework that idealizes one component of a QA agent at a time: planning, search, or data analysis.
When an LLM agent fails at Exploratory QA over a data lake, which part of its runtime is to blame? An agent answers in three stages. SANA replaces one stage at a time with an oracle built from the task's ground truth; the resulting accuracy gain pinpoints the bottleneck.
Planning: breaks the question into an ordered list of sub-questions.
Oracle: hands over the correct sub-questions directly.
Search: hunts a ~40M-file lake for the datasets each sub-question needs.
Oracle: returns only the datasets the task actually requires.
Execution: writes and runs SQL/Python to compute each intermediate answer.
Oracle: executes the agent's intent with zero implementation bugs.
Swap any oracle for your own implementation and watch how the accuracy delta moves; see the Pipeline section.
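The one-stage-at-a-time ablation can be sketched in a few lines. This is a minimal illustration under stated assumptions, not SANA's code: the stage signatures, the `Pipeline`, `Task`, `accuracy`, and `ablate` names, and the toy ground truth are all hypothetical. Exact string match stands in for semantic match, the last intermediate answer is treated as the final answer, and the execution oracle simply returns the gold intermediate answer, a simplification of "executing the agent's intent without bugs".

```python
from dataclasses import dataclass, replace
from typing import Callable

# Hypothetical stage signatures; SANA's actual interfaces are not shown on this page.
Planner = Callable[[str], list[str]]        # question -> ordered sub-questions
Searcher = Callable[[str], list[str]]       # sub-question -> dataset ids
Executor = Callable[[str, list[str]], str]  # (sub-question, datasets) -> answer


@dataclass(frozen=True)
class Task:
    question: str
    gold_answer: str


@dataclass(frozen=True)
class Pipeline:
    plan: Planner
    search: Searcher
    execute: Executor

    def run(self, question: str) -> str:
        answer = ""
        for sub_q in self.plan(question):
            datasets = self.search(sub_q)
            answer = self.execute(sub_q, datasets)
        # Simplification: the last intermediate answer is the final answer.
        return answer


def accuracy(pipeline: Pipeline, tasks: list[Task]) -> float:
    # Exact string match stands in for the semantic-match metric.
    hits = sum(pipeline.run(t.question) == t.gold_answer for t in tasks)
    return hits / len(tasks)


def ablate(agent: Pipeline, oracles: dict[str, Callable], tasks: list[Task]) -> dict[str, float]:
    """Swap in one oracle at a time and report its accuracy gain over the agent."""
    base = accuracy(agent, tasks)
    return {
        stage: accuracy(replace(agent, **{stage: oracle}), tasks) - base
        for stage, oracle in oracles.items()
    }


# Toy ground truth for a single task (all values hypothetical).
gold_plan = {"q1": ["s1"]}
gold_datasets = {"s1": ["d1"]}
gold_sub_answers = {"s1": "42"}

agent = Pipeline(
    plan=lambda q: [q],  # this toy agent fails to decompose the question
    search=lambda s: ["d1"],
    execute=lambda s, ds: "42" if s == "s1" and "d1" in ds else "?",
)
oracles = {
    "plan": lambda q: gold_plan.get(q, [q]),
    "search": lambda s: gold_datasets.get(s, []),
    "execute": lambda s, ds: gold_sub_answers.get(s, "?"),
}
gains = ablate(agent, oracles, [Task("q1", "42")])
print(gains)  # {'plan': 1.0, 'search': 0.0, 'execute': 0.0}
```

In this toy run only the planning oracle moves accuracy, which is how the ablation localizes the bottleneck. Using `dataclasses.replace` keeps the agent immutable, so each oracle swap starts from the same baseline pipeline.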
The paper link is a placeholder pending the arXiv release.
To appear at the VLDB 2026 Workshop on Systems for Data-centric Agents with Human-in-the-loop (DASHSys).
@inproceedings{sana2026,
title = {SANA: What Matters for QA Agents over Massive Data Lakes?},
author = {Wijaya, Austin Senna and Liu, Jiaxiang and Wang, Haonan and Wu, Eugene},
booktitle = {VLDB 2026 Workshop on Systems for Data-centric Agents with Human-in-the-loop (DASHSys)},
year = {2026}
}
Semantic-match gain from idealizing each component (baseline = first bar of each group).
Idealizing execution gives large gains on both benchmarks (up to +24.1%), even when sources are already found.
On LakeQA's ~40M-file lake, ideal search beats BM25 by +13–14%; on the small KramaBench, search matters far less.
Agents produce near-gold decompositions (~78–82% match) yet follow them only ~28–57% of the time.
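The search finding compares the oracle against a BM25 baseline. For reference, here is a minimal Okapi BM25 scorer, assuming (hypothetically) that the baseline ranks datasets by their textual descriptions with whitespace tokenization and the common defaults `k1=1.5`, `b=0.75`; the page does not say what SANA indexes or which parameters it uses. The IDF uses the `log((N - n + 0.5) / (n + 0.5) + 1)` form, which stays non-negative.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive whitespace tokenizer; real systems normalize and stem.
    return text.lower().split()


class BM25:
    def __init__(self, docs: dict[str, str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.tfs = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
        self.lens = {doc_id: sum(tf.values()) for doc_id, tf in self.tfs.items()}
        self.avgdl = sum(self.lens.values()) / len(self.lens)
        n_docs = len(self.tfs)
        # Document frequency: number of documents containing each term.
        df = Counter(term for tf in self.tfs.values() for term in tf)
        self.idf = {
            t: math.log((n_docs - d + 0.5) / (d + 0.5) + 1.0) for t, d in df.items()
        }

    def score(self, query: str, doc_id: str) -> float:
        tf, dl = self.tfs[doc_id], self.lens[doc_id]
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f == 0:
                continue
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

    def top_k(self, query: str, k: int = 5) -> list[str]:
        # Brute-force ranking over every document.
        ranked = sorted(self.tfs, key=lambda d: self.score(query, d), reverse=True)
        return ranked[:k]


# Hypothetical dataset descriptions.
index = BM25({
    "taxi_2019": "nyc taxi trips 2019 fares",
    "weather": "daily weather temperature nyc",
    "census": "population census by county",
})
print(index.top_k("taxi fares 2019", k=1))  # ['taxi_2019']
```

This sketch scores every document per query, which is fine for a toy example; at the scale of a ~40M-file lake, a BM25 baseline would realistically run on an inverted index rather than a linear scan.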