SANA

An ablation framework that idealizes one stage of an agent at a time: planning, search, or data analysis.

DAP Lab — Columbia University

How SANA works

When an LLM agent fails at Exploratory QA over a data lake, which part of its runtime is to blame? An agent answers in three stages: planning, search, and data analysis. SANA replaces one stage at a time with an oracle built from the task's ground truth; the accuracy gained from each replacement pinpoints the bottleneck.

  1. Planning

     Breaks the question into an ordered list of sub-questions.

     Oracle: hands over the correct sub-questions directly.

  2. Search

     Hunts a ~40M-file lake for the datasets each sub-question needs.

     Oracle: returns only the datasets the task actually requires.

  3. Data Analysis

     Writes and runs SQL/Python to compute each intermediate answer.

     Oracle: executes the agent's intent with zero implementation bugs.

Swap any oracle for your own implementation and watch the delta move — see the Pipeline.
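The swap-one-stage idea can be sketched in a few lines of Python. This is a minimal illustration, not SANA's actual API: the `Pipeline` class, the stage signatures, `accuracy`, and `ablation_deltas` are all hypothetical names, and exact string match stands in for whatever answer-matching metric the framework really uses.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Tuple

# Hypothetical stage signatures (assumptions for illustration only).
Planner = Callable[[str], List[str]]       # question -> ordered sub-questions
Searcher = Callable[[str], List[str]]      # sub-question -> dataset names
Analyst = Callable[[str, List[str]], str]  # (sub-question, datasets) -> answer


@dataclass(frozen=True)
class Pipeline:
    plan: Planner
    search: Searcher
    analyze: Analyst

    def answer(self, question: str) -> str:
        result = ""
        for sub_q in self.plan(question):
            datasets = self.search(sub_q)
            result = self.analyze(sub_q, datasets)
        return result  # the last intermediate answer is treated as final


def accuracy(pipeline: Pipeline, tasks: List[Tuple[str, str]]) -> float:
    # Exact match is a stand-in for a real answer-matching metric.
    correct = sum(pipeline.answer(q) == gold for q, gold in tasks)
    return correct / len(tasks)


def ablation_deltas(baseline: Pipeline,
                    oracles: Dict[str, Callable],
                    tasks: List[Tuple[str, str]]) -> Dict[str, float]:
    # Replace exactly one stage at a time and report the accuracy gain.
    base = accuracy(baseline, tasks)
    return {
        stage: accuracy(replace(baseline, **{stage: oracle}), tasks) - base
        for stage, oracle in oracles.items()
    }


# Toy stand-ins: the agent's search always misses; the other stages are fine.
GOLD_SOURCES = {"total sales?": ["sales.csv"]}

def toy_plan(question: str) -> List[str]:
    return [question]

def bad_search(sub_q: str) -> List[str]:
    return []

def oracle_search(sub_q: str) -> List[str]:
    return GOLD_SOURCES[sub_q]

def toy_analyze(sub_q: str, datasets: List[str]) -> str:
    return "42" if "sales.csv" in datasets else "unknown"

tasks = [("total sales?", "42")]
agent = Pipeline(plan=toy_plan, search=bad_search, analyze=toy_analyze)
deltas = ablation_deltas(agent, {"search": oracle_search, "plan": toy_plan}, tasks)
print(deltas)
```

In this toy setup only the search oracle moves the score, which is the signal an ablation of this kind looks for: the stage whose idealization yields the largest delta is the likely bottleneck.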


The paper link is a placeholder pending the arXiv release.


Citation

To appear at the VLDB 2026 Workshop on Systems for Data-centric Agents with Human-in-the-loop (DASHSys).

@inproceedings{sana2026,
  title     = {SANA: What Matters for QA Agents over Massive Data Lakes?},
  author    = {Wijaya, Austin Senna and Liu, Jiaxiang and Wang, Haonan and Wu, Eugene},
  booktitle = {VLDB 2026 Workshop on Systems for Data-centric Agents with Human-in-the-loop (DASHSys)},
  year      = {2026}
}

Ablation deltas

Semantic-match gain from idealizing each component (baseline = first bar of each group).
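As a rough sketch of how such deltas could be derived from grouped bar data, the snippet below subtracts the first bar of each group from the remaining bars. The function name `deltas_vs_first_bar` and the data layout are assumptions, and the numbers are placeholders for illustration only; they are not results from the paper.

```python
from typing import Dict, List, Tuple

Bar = Tuple[str, float]  # (configuration label, semantic-match score in %)


def deltas_vs_first_bar(groups: Dict[str, List[Bar]]) -> Dict[str, Dict[str, float]]:
    """For each group, report every bar's gain over the group's first bar."""
    out: Dict[str, Dict[str, float]] = {}
    for group, bars in groups.items():
        if not bars:
            continue  # nothing to compare in an empty group
        _, baseline_score = bars[0]
        out[group] = {
            label: round(score - baseline_score, 1)
            for label, score in bars[1:]
        }
    return out


# Placeholder scores, NOT taken from the paper.
example = {
    "benchmark A": [
        ("baseline", 30.0),
        ("+ ideal search", 40.0),
        ("+ ideal analysis", 50.0),
    ],
}
result = deltas_vs_first_bar(example)
print(result)
```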


What we found

Data analysis is the consistent bottleneck

Idealizing execution gives large gains on both benchmarks (up to +24.1%), even when sources are already found.

Search dominates on the large lake

On LakeQA's ~40M-file lake, ideal search beats BM25 by +13–14%; on the small KramaBench it matters far less.

Plans are written, not followed

Agents produce near-gold decompositions (~78–82% match) yet follow them only ~28–57% of the time.