HELIOS: Harmonizing Early Fusion, Late Fusion, and LLM Reasoning for Multi-Granular Table-Text Retrieval

Park, Sungho; Yun, Joohyung; Lee, Jongwuk; Han, Wook-Shin

HELIOS: Harmonizing Early Fusion, Late Fusion, and LLM Reasoning for Multi-Granular Table-Text Retrieval

Sungho Park¹, Joohyung Yun¹, Jongwuk Lee², Wook-Shin Han^1*

¹POSTECH, Republic of Korea ²Sungkyunkwan University, Republic of Korea

ACL 2025

^*Corresponding author

Paper Code arXiv (Coming Soon)

                    
42.6%
AR@2 Improvement
vs. State-of-the-Art

39.9%
nDCG@50 Gain
on OTT-QA Benchmark

3-Stage
Multi-Granular Pipeline
Early + Late + LLM

Figure 1: Overview of HELIOS. The initial graph G_init is early-fused to generate a data graph G_d. (1) Edges are retrieved using the query to form a candidate bipartite subgraph G_c. (2) Query-relevant nodes are identified and expanded to form G_l. (3) LLM performs aggregation and passage verification for final refinement.

Abstract

Table-text retrieval aims to retrieve relevant tables and text to support open-domain question answering. Existing studies use either early or late fusion, but face limitations. Early fusion pre-aligns a table row with its associated passages, forming "stars," which often include irrelevant contexts and miss query-dependent relationships. Late fusion retrieves individual nodes, dynamically aligning them, but risks missing relevant contexts. Both approaches also struggle with advanced reasoning tasks, such as column-wise aggregation and multi-hop reasoning.

To address these issues, we propose HELIOS, which combines the strengths of both approaches through three key stages: (1) Edge-based bipartite subgraph retrieval identifies finer-grained edges between table segments and passages, effectively avoiding irrelevant contexts. (2) Query-relevant node expansion identifies the most promising nodes, dynamically retrieving relevant edges to grow the bipartite subgraph. (3) Star-based LLM refinement performs logical inference at the star graph level, supporting advanced reasoning tasks.

Experimental results show that HELIOS outperforms state-of-the-art models with significant improvements up to 42.6% and 39.9% in recall and nDCG, respectively, on the OTT-QA benchmark.

Motivation

Open-domain question answering over tables and text is challenging due to the need to bridge structured tables and unstructured passages. Existing methods face three key limitations:

Figure 2: Three cases where existing methods struggle: (a) Inadequate granularity of retrieval units leading to inaccurate results. (b) Entity linking cannot capture query-dependent relationships. (c) Inability to perform advanced reasoning such as table aggregation and multi-hop reasoning.

Inadequate Granularity

Early fusion includes query-irrelevant passages, while late fusion may retrieve incomplete information.

Missing Query-Dependent Links

Pre-defined entity links fail to capture relationships that depend on the specific query context.

Lack of Advanced Reasoning

Semantic similarity alone cannot handle column-wise aggregation or multi-hop reasoning tasks.

Proposed Method

HELIOS operates in three carefully designed stages, each with an optimal granularity for its specific purpose:

1

Edge-based Bipartite Subgraph Retrieval

This stage constructs and retrieves from a bipartite data graph through:

Early Fusion: Offline construction of edges between table segments and passages via entity linking
Edge-level Multi-vector Encoding: Using ColBERTv2 for fine-grained embeddings that capture richer information
Two-stage Retrieval: Initial retrieval followed by reranking for precise edge selection

This edge-level approach balances between eliminating irrelevant contexts (star graph issue) and avoiding partial information (node-based issue).

Bipartite graph with edges connecting table segments to passages

2

Query-relevant Node Expansion

Figure 3: The beam search procedure for query-relevant node expansion with beam width b=2.

This stage enhances the retrieved subgraph by dynamically finding additional query-relevant nodes:

Seed Node Selection: Identify top-b nodes most relevant to the query using all-to-all interaction reranking
Expanded Query Retrieval: Combine seed nodes with query to retrieve related nodes from the complete bipartite graph
Beam Search Optimization: Efficiently explore the search space with controlled beam width

3

Star-based LLM Refinement

Figure 4: Star-based LLM refinement with column-wise aggregation and passage verification.

This stage leverages LLM reasoning for advanced inference beyond semantic similarity:

Column-wise Aggregation

Restores original tables and performs aggregation operations (e.g., finding "most recent" or "highest") using LLM reasoning

Passage Verification

Decomposes the graph into star graphs and uses LLM to verify passage relevance, reducing hallucination risks

Experimental Results

Retrieval Accuracy on OTT-QA Development Set

Model	AR@2	AR@5	AR@10	AR@20	AR@50	nDCG@50
OTTeR	31.4	49.7	62.0	71.8	82.0	25.9
DoTTeR	31.5	51.0	61.5	71.9	80.8	26.7
CORE	35.3	50.7	63.1	74.5	83.1	25.4
COS	44.4	61.6	70.8	79.5	87.8	33.6
COS w/ ColBERT & bge	49.6	68.2	78.7	85.0	91.7	36.5
HELIOS (Ours)	63.3	76.7	85.0	90.4	94.2	47.0

HELIOS consistently outperforms all competitors across all metrics, achieving up to 42.6% improvement in AR@2 over the previous state-of-the-art COS model.

End-to-End QA Accuracy on OTT-QA

Model	Dev Set		Test Set
Model	EM	F1	EM	F1
OTTeR	37.1	42.8	37.3	43.1
DoTTeR	37.8	43.9	35.9	42.0
CORE	49.0	55.7	47.3	54.1
COS	56.9	63.2	54.9	61.5
HELIOS (Ours)	59.3	65.8	57.0	64.3

HELIOS achieves 4.2% and 4.1% improvements in EM and F1 on the dev set, and 3.8% and 4.6% on the test set compared to COS, using a Fusion-in-Encoder (FiE) reader with 50 edges.

Performance Across Different Readers

Exact Match (EM) Score

F1 Score

HELIOS improves performance across all readers, with an average EM improvement of 7.5% and F1 improvement of 6.6% compared to enhanced COS.

Ablation Study

w/o Query-relevant Node Expansion (QNE)

-2.1% Avg AR@k

-4.2% nDCG@50

-4.2% EM

QNE is crucial for generating query-relevant edges missed by offline entity linking.

w/o Star-based LLM Refinement (SLR)

-5.5% AR@2

-2.0% AR@5

-1.1% nDCG@50

SLR helps accurately identify query-relevant nodes in complex queries requiring logical inference, especially at low k values.

Impact of Retrieval Granularity

Node-level

23.8 nDCG@50

Individual nodes lack context

Star Graph

28.5 nDCG@50

Includes irrelevant passages

Edge-level (Ours)

34.2 nDCG@50

Optimal balance of context

Qualitative Analysis

Figure 5: Qualitative analysis of Star-based LLM Refinement results, demonstrating how the LLM performs passage verification and column-wise aggregation to identify query-relevant information.

BibTeX

@inproceedings{park-etal-2025-helios,
    title = "{HELIOS}: Harmonizing Early Fusion, Late Fusion, and {LLM} Reasoning for Multi-Granular Table-Text Retrieval",
    author = "Park, Sungho  and
      Yun, Joohyung  and
      Lee, Jongwuk  and
      Han, Wook-Shin",
    editor = "Che, Wanxiang  and
      Nabende, Joyce  and
      Shutova, Ekaterina  and
      Pilehvar, Mohammad Taher",
    booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.acl-long.1559/",
    doi = "10.18653/v1/2025.acl-long.1559",
    pages = "32424--32444",
    ISBN = "979-8-89176-251-0",
    abstract = "Table-text retrieval aims to retrieve relevant tables and text to support open-domain question answering. Existing studies use either early or late fusion, but face limitations. Early fusion pre-aligns a table row with its associated passages, forming ``stars,'' which often include irrelevant contexts and miss query-dependent relationships. Late fusion retrieves individual nodes, dynamically aligning them, but it risks missing relevant contexts. Both approaches also struggle with advanced reasoning tasks, such as column-wise aggregation and multi-hop reasoning. To address these issues, we propose HELIOS, which combines the strengths of both approaches. First, the edge-based bipartite subgraph retrieval identifies finer-grained edges between table segments and passages, effectively avoiding the inclusion of irrelevant contexts. Then, the query-relevant node expansion identifies the most promising nodes, dynamically retrieving relevant edges to grow the bipartite subgraph, minimizing the risk of missing important contexts. Lastly, the star-based LLM refinement performs logical inference at the star graph level rather than the bipartite subgraph, supporting advanced reasoning tasks. Experimental results show that HELIOS outperforms state-of-the-art models with a significant improvement up to 42.6{\%} and 39.9{\%} in recall and nDCG, respectively, on the OTT-QA benchmark."
}