HELIOS: Harmonizing Early Fusion, Late Fusion, and LLM Reasoning for Multi-Granular Table-Text Retrieval

¹POSTECH, Republic of Korea   ²Sungkyunkwan University, Republic of Korea
ACL 2025
*Corresponding author
Highlights:
  • 42.6% AR@2 improvement over the state of the art
  • 39.9% nDCG@50 gain on the OTT-QA benchmark
  • 3-stage multi-granular pipeline: early fusion + late fusion + LLM reasoning

Figure 1: Overview of HELIOS. The initial graph Ginit is early-fused to generate a data graph Gd. (1) Edges are retrieved using the query to form a candidate bipartite subgraph Gc. (2) Query-relevant nodes are identified and expanded to form Gl. (3) LLM performs aggregation and passage verification for final refinement.

Abstract

Table-text retrieval aims to retrieve relevant tables and text to support open-domain question answering. Existing studies use either early or late fusion, but face limitations. Early fusion pre-aligns a table row with its associated passages, forming "stars," which often include irrelevant contexts and miss query-dependent relationships. Late fusion retrieves individual nodes, dynamically aligning them, but risks missing relevant contexts. Both approaches also struggle with advanced reasoning tasks, such as column-wise aggregation and multi-hop reasoning.

To address these issues, we propose HELIOS, which combines the strengths of both approaches through three key stages: (1) Edge-based bipartite subgraph retrieval identifies finer-grained edges between table segments and passages, effectively avoiding irrelevant contexts. (2) Query-relevant node expansion identifies the most promising nodes, dynamically retrieving relevant edges to grow the bipartite subgraph. (3) Star-based LLM refinement performs logical inference at the star graph level, supporting advanced reasoning tasks.

Experimental results show that HELIOS outperforms state-of-the-art models with significant improvements up to 42.6% and 39.9% in recall and nDCG, respectively, on the OTT-QA benchmark.

Motivation

Open-domain question answering over tables and text is challenging due to the need to bridge structured tables and unstructured passages. Existing methods face three key limitations:


Figure 2: Three cases where existing methods struggle: (a) Inadequate granularity of retrieval units leading to inaccurate results. (b) Entity linking cannot capture query-dependent relationships. (c) Inability to perform advanced reasoning such as table aggregation and multi-hop reasoning.

Inadequate Granularity

Early fusion includes query-irrelevant passages, while late fusion may retrieve incomplete information.

Missing Query-Dependent Links

Pre-defined entity links fail to capture relationships that depend on the specific query context.

Lack of Advanced Reasoning

Semantic similarity alone cannot handle column-wise aggregation or multi-hop reasoning tasks.

Proposed Method

HELIOS operates in three carefully designed stages, each with an optimal granularity for its specific purpose:

Stage 1: Edge-based Bipartite Subgraph Retrieval

This stage constructs and retrieves from a bipartite data graph through:

  • Early Fusion: Offline construction of edges between table segments and passages via entity linking
  • Edge-level Multi-vector Encoding: Using ColBERTv2 for fine-grained embeddings that capture richer information
  • Two-stage Retrieval: Initial retrieval followed by reranking for precise edge selection

This edge-level approach strikes a balance: it eliminates irrelevant contexts (the star-graph issue) while avoiding partial information (the node-based issue).
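The two-stage edge retrieval above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the token-overlap scorer is a toy stand-in for ColBERTv2's multi-vector similarity, and all names (`Edge`, `retrieve_edges`) are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Edge:
    segment: str   # a table segment (e.g., a row serialized as text)
    passage: str   # a passage linked to it offline via entity linking

def toy_score(query: str, text: str) -> float:
    """Token-overlap score: a toy stand-in for multi-vector similarity."""
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / (len(q) or 1)

def retrieve_edges(query, edges, k_initial=10, k_final=3):
    """Two-stage retrieval: cheap initial scoring, then reranking."""
    # Stage 1: coarse filter, scoring each edge by its table segment alone.
    coarse = sorted(edges, key=lambda e: toy_score(query, e.segment),
                    reverse=True)[:k_initial]
    # Stage 2: rerank survivors on the full edge text (segment + passage).
    return sorted(coarse,
                  key=lambda e: toy_score(query, e.segment + " " + e.passage),
                  reverse=True)[:k_final]
```

Because the retrieval unit is a single (segment, passage) edge rather than a whole star or a lone node, each returned item carries exactly one segment with its supporting context.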

[Diagram: bipartite graph with table segments (S1–S3) and passages (P1–P3) connected by edges]

Stage 2: Query-relevant Node Expansion


Figure 3: The beam search procedure for query-relevant node expansion with beam width b=2.

This stage enhances the retrieved subgraph by dynamically finding additional query-relevant nodes:

  • Seed Node Selection: Identify top-b nodes most relevant to the query using all-to-all interaction reranking
  • Expanded Query Retrieval: Combine seed nodes with query to retrieve related nodes from the complete bipartite graph
  • Beam Search Optimization: Efficiently explore the search space with controlled beam width
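The seed selection and beam-expansion steps above can be sketched as a short loop. This is an illustrative skeleton under simplifying assumptions (nodes are plain strings, `score` is any relevance function, and query expansion is naive text concatenation); it is not the paper's retriever.

```python
def expand_nodes(query, graph, score, b=2, hops=1):
    """Beam-search node expansion: keep the b most query-relevant nodes,
    then retrieve their neighbors using a seed-expanded query.

    graph: dict mapping each node to the set of its bipartite neighbors
    score: callable (query_text, node_text) -> float
    """
    # Seed selection: top-b nodes by relevance to the original query.
    beam = sorted(graph, key=lambda n: score(query, n), reverse=True)[:b]
    selected = set(beam)
    for _ in range(hops):
        candidates = []
        for seed in beam:
            # Expanded-query retrieval: the seed's text augments the query.
            expanded_query = query + " " + seed
            for nbr in graph[seed]:
                if nbr not in selected:
                    candidates.append((score(expanded_query, nbr), nbr))
        # Keep only the b best newly found nodes (controlled beam width).
        beam = [n for _, n in sorted(candidates, reverse=True)[:b]]
        selected.update(beam)
    return selected
```

The beam width b caps the number of nodes carried between hops, which keeps the expansion cost bounded even on a large bipartite graph.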
Stage 3: Star-based LLM Refinement


Figure 4: Star-based LLM refinement with column-wise aggregation and passage verification.

This stage leverages LLM reasoning for advanced inference beyond semantic similarity:

Column-wise Aggregation

Restores original tables and performs aggregation operations (e.g., finding "most recent" or "highest") using LLM reasoning

Passage Verification

Decomposes the graph into star graphs and uses LLM to verify passage relevance, reducing hallucination risks
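The star decomposition and verification prompt above can be sketched as follows. This is a hedged illustration: `llm` stands for any prompt-to-answer callable (the actual model, prompt wording, and answer parsing are assumptions, not the paper's), and the example mocks the LLM rather than calling one.

```python
def to_stars(edges):
    """Group retrieved (segment, passage) edges into star graphs:
    one table segment at the center, its linked passages as leaves."""
    stars = {}
    for segment, passage in edges:
        stars.setdefault(segment, []).append(passage)
    return stars

def refine(query, edges, llm):
    """Per-star passage verification: ask the LLM which passages are
    actually needed, keeping reasoning local to one star at a time."""
    refined = {}
    for segment, passages in to_stars(edges).items():
        prompt = (
            f"Question: {query}\n"
            f"Table segment: {segment}\n"
            + "\n".join(f"[{i}] {p}" for i, p in enumerate(passages))
            + "\nReturn the indices of passages needed to answer the question."
        )
        keep = llm(prompt)          # assumed to return a list of indices
        refined[segment] = [passages[i] for i in keep]
    return refined
```

Verifying one star at a time keeps each prompt small and grounded in a single table segment, which is the intuition behind the reduced hallucination risk.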

Experimental Results

Retrieval Accuracy on OTT-QA Development Set

Model                   AR@2   AR@5   AR@10  AR@20  AR@50  nDCG@50
OTTeR                   31.4   49.7   62.0   71.8   82.0   25.9
DoTTeR                  31.5   51.0   61.5   71.9   80.8   26.7
CORE                    35.3   50.7   63.1   74.5   83.1   25.4
COS                     44.4   61.6   70.8   79.5   87.8   33.6
COS w/ ColBERT & bge    49.6   68.2   78.7   85.0   91.7   36.5
HELIOS (Ours)           63.3   76.7   85.0   90.4   94.2   47.0

HELIOS consistently outperforms all competitors across every metric, achieving up to a 42.6% relative improvement in AR@2 over the previous state-of-the-art COS model.
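The headline percentages are relative gains over COS, computed directly from the table above:

```python
# Relative gains of HELIOS over COS on the OTT-QA dev set (table above).
helios = {"AR@2": 63.3, "nDCG@50": 47.0}
cos = {"AR@2": 44.4, "nDCG@50": 33.6}

for metric in helios:
    gain = (helios[metric] - cos[metric]) / cos[metric] * 100
    print(f"{metric}: +{gain:.1f}%")
# AR@2: +42.6%
# nDCG@50: +39.9%
```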

End-to-End QA Accuracy on OTT-QA

Model           Dev EM   Dev F1   Test EM   Test F1
OTTeR           37.1     42.8     37.3      43.1
DoTTeR          37.8     43.9     35.9      42.0
CORE            49.0     55.7     47.3      54.1
COS             56.9     63.2     54.9      61.5
HELIOS (Ours)   59.3     65.8     57.0      64.3

HELIOS achieves relative improvements over COS of 4.2% (EM) and 4.1% (F1) on the dev set, and 3.8% (EM) and 4.6% (F1) on the test set, using a Fusion-in-Encoder (FiE) reader with 50 retrieved edges.

Performance Across Different Readers

[Charts: Exact Match (EM) and F1 scores compared across different readers]

HELIOS improves performance across all readers, with an average EM improvement of 7.5% and F1 improvement of 6.6% compared to enhanced COS.

Ablation Study

w/o Query-relevant Node Expansion (QNE)
-2.1% Avg AR@k
-4.2% nDCG@50
-4.2% EM

QNE is crucial for generating query-relevant edges missed by offline entity linking.

w/o Star-based LLM Refinement (SLR)
-5.5% AR@2
-2.0% AR@5
-1.1% nDCG@50

SLR helps accurately identify query-relevant nodes in complex queries requiring logical inference, especially at low k values.

Impact of Retrieval Granularity

Node-level: 23.8 nDCG@50 (individual nodes lack context)

Star graph: 28.5 nDCG@50 (includes irrelevant passages)

Edge-level (Ours): 34.2 nDCG@50 (optimal balance of context)

Qualitative Analysis


Figure 5: Qualitative analysis of Star-based LLM Refinement results, demonstrating how the LLM performs passage verification and column-wise aggregation to identify query-relevant information.

BibTeX

@inproceedings{park-etal-2025-helios,
    title = "{HELIOS}: Harmonizing Early Fusion, Late Fusion, and {LLM} Reasoning for Multi-Granular Table-Text Retrieval",
    author = "Park, Sungho  and
      Yun, Joohyung  and
      Lee, Jongwuk  and
      Han, Wook-Shin",
    editor = "Che, Wanxiang  and
      Nabende, Joyce  and
      Shutova, Ekaterina  and
      Pilehvar, Mohammad Taher",
    booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.acl-long.1559/",
    doi = "10.18653/v1/2025.acl-long.1559",
    pages = "32424--32444",
    ISBN = "979-8-89176-251-0",
    abstract = "Table-text retrieval aims to retrieve relevant tables and text to support open-domain question answering. Existing studies use either early or late fusion, but face limitations. Early fusion pre-aligns a table row with its associated passages, forming ``stars,'' which often include irrelevant contexts and miss query-dependent relationships. Late fusion retrieves individual nodes, dynamically aligning them, but it risks missing relevant contexts. Both approaches also struggle with advanced reasoning tasks, such as column-wise aggregation and multi-hop reasoning. To address these issues, we propose HELIOS, which combines the strengths of both approaches. First, the edge-based bipartite subgraph retrieval identifies finer-grained edges between table segments and passages, effectively avoiding the inclusion of irrelevant contexts. Then, the query-relevant node expansion identifies the most promising nodes, dynamically retrieving relevant edges to grow the bipartite subgraph, minimizing the risk of missing important contexts. Lastly, the star-based LLM refinement performs logical inference at the star graph level rather than the bipartite subgraph, supporting advanced reasoning tasks. Experimental results show that HELIOS outperforms state-of-the-art models with a significant improvement up to 42.6{\%} and 39.9{\%} in recall and nDCG, respectively, on the OTT-QA benchmark."
}