A Novel Approach to Dramatically Enhance LLM Long-Context Performance: Query-Guided ACRE

2025-08-08

Boosting Long-Context Information Seeking via Query-Guided Activation Refilling

Authors and Affiliations

  • Hongjin Qian: Beijing Academy of Artificial Intelligence
  • Zheng Liu: Beijing Academy of Artificial Intelligence
  • Peitian Zhang: Gaoling School of Artificial Intelligence, Renmin University of China
  • Zhicheng Dou: Gaoling School of Artificial Intelligence, Renmin University of China
  • Defu Lian: University of Science and Technology of China

Paper Summary

"Boosting Long-Context Information Seeking via Query-Guided Activation Refilling" addresses the challenge of efficiently processing long texts in information retrieval tasks using Large Language Models (LLMs).

This study overcomes the limitations of LLMs’ native context window and the computational burden from large-scale key-value (KV) activations.
Specifically, it proposes a novel Query-Guided Activation Refilling (ACRE) method to dynamically meet query-driven information needs in long-context information retrieval tasks.
By combining a two-layer KV cache with a query-guided refilling mechanism, it effectively leverages both global information and query-specific local details, resolving shortcomings of previous approaches.

Novelty and Contributions

Key novelties of this work include:

  • Novelty 1: Introduction of Query-Guided Activation Refilling (ACRE) to dynamically address query-based information demands in long-context scenarios.
  • Novelty 2: Integration of a two-layer KV cache (global L1 cache and local L2 cache) with a query-guided refilling mechanism for efficient information utilization.

Important contributions are:

  • Contribution 1: Achieved improved efficiency and performance in long-context information retrieval tasks.
  • Contribution 2: Enabled processing of contexts exceeding LLMs’ native context window, greatly enhancing processing capability.

Details of the Proposed Method

Core Idea

  • Two-layer KV Cache: Stores a compact, global view of the whole context in an L1 KV cache and the full, detailed local activations in an L2 KV cache.
  • Query-Guided Refilling: Dynamically refills the L1 cache with query-relevant entries from the L2 cache, so the working context carries exactly the local detail the query needs (a minimal sketch follows this list).
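
To make the layering concrete, here is a minimal Python/PyTorch sketch of such a cache. The class name BiLayerKVCache, the per-segment dictionary layout, and the refill signature are illustrative choices of ours, not the paper's actual implementation:

```python
from dataclasses import dataclass, field
import torch

@dataclass
class BiLayerKVCache:
    """Toy container mirroring ACRE's two-layer cache (names are ours).

    L1 holds a compact, always-resident global view of the context;
    L2 keeps the full, detailed KV activations per segment so that
    query-relevant segments can be refilled into L1 on demand.
    """
    l1_keys: torch.Tensor                            # [n_l1, d] global cache
    l1_values: torch.Tensor                          # [n_l1, d]
    l2_segments: dict = field(default_factory=dict)  # seg_id -> (K, V)

    def refill(self, seg_ids):
        """Augment L1 with the detailed entries of the chosen segments."""
        ks = [self.l1_keys] + [self.l2_segments[s][0] for s in seg_ids]
        vs = [self.l1_values] + [self.l2_segments[s][1] for s in seg_ids]
        return torch.cat(ks), torch.cat(vs)

# Usage: two detailed segments in L2, one selected for refilling.
cache = BiLayerKVCache(
    l1_keys=torch.randn(8, 64), l1_values=torch.randn(8, 64),
    l2_segments={0: (torch.randn(32, 64), torch.randn(32, 64)),
                 1: (torch.randn(32, 64), torch.randn(32, 64))},
)
k, v = cache.refill([1])
print(k.shape)  # torch.Size([40, 64]): 8 global + 32 refilled entries
```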

System Architecture / Algorithm Overview

  1. Build Two-layer KV Cache: Separate global information (L1) and detailed local information (L2) from the long context.
  2. Query-Guided Refilling: Update L1 cache dynamically by adding relevant information from L2 guided by the query.
  3. Answer Generation: Use the refilled KV cache as input for the LLM to generate the response (an end-to-end toy walkthrough follows this list).
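
The self-contained PyTorch sketch below walks through these three steps on random tensors. The mean-pooling used to compress L1, the attention-mass scoring of segments, and all sizes are assumptions made for illustration; in the paper, compression and scoring happen inside the model's own attention layers:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
D, SEGMENTS, SEG_LEN, COMPRESS = 64, 8, 16, 4  # toy sizes

# Step 1: build the two-layer cache. L2 keeps every token's KV pair;
# L1 keeps a pooled, compressed summary of each segment.
l2_keys = torch.randn(SEGMENTS, SEG_LEN, D)
l2_vals = torch.randn(SEGMENTS, SEG_LEN, D)
l1_keys = l2_keys.reshape(SEGMENTS, COMPRESS, -1, D).mean(dim=2)  # [8, 4, D]
l1_vals = l2_vals.reshape(SEGMENTS, COMPRESS, -1, D).mean(dim=2)

# Step 2: query-guided refilling. Score each segment by the query's
# attention mass over its L1 entries, then pull the full L2 activations
# of the top-k segments back into the working cache.
query = torch.randn(D)
attn = F.softmax(l1_keys.reshape(-1, D) @ query / D ** 0.5, dim=0)
seg_scores = attn.reshape(SEGMENTS, COMPRESS).sum(dim=-1)
top = torch.topk(seg_scores, k=2).indices

refilled_keys = torch.cat([l1_keys.reshape(-1, D), l2_keys[top].reshape(-1, D)])
refilled_vals = torch.cat([l1_vals.reshape(-1, D), l2_vals[top].reshape(-1, D)])

# Step 3: the refilled (keys, values) would now serve as the decoder's
# KV cache for answer generation.
print(refilled_keys.shape)  # [SEGMENTS*COMPRESS + 2*SEG_LEN, D] = [64, D]
```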

Evaluation and Discussion

  • Analysis Method: Performance was evaluated on 12 long-context information retrieval tasks.
  • Metrics: Answer accuracy, computation time, and memory usage.

Main Results

  • Result 1: ACRE demonstrated superior performance and efficiency over prior methods.
  • Result 2: Enabled processing beyond native context window length, significantly boosting capacity.

These findings show the method’s effectiveness in enhancing efficiency and performance for long-context information retrieval.

Applications and Business Outlook

Potential Applications

  • Improved efficiency in long-context information retrieval tasks such as LLM-based chatbots and QA systems.
  • Information extraction and analysis from large text corpora like papers, books, and news articles.
  • Efficient processing for NLP tasks requiring large-scale data, such as speech recognition and machine translation.

Business Prospects

  • Development of new products and services featuring advanced LLM-based information retrieval.
  • Cost reduction and shorter development cycles through system efficiency improvements.
  • Potential to drive major transformations in information retrieval and data analytics markets.

Additionally, ACRE is expected to play a significant role in specialized fields requiring expert knowledge, such as finance, law, and healthcare.

Notes

  • Large Language Models (LLMs): AI models trained on vast text data enabling human-like generation and question answering.
  • Key-Value (KV) Activations: The key and value tensors each attention layer produces for every token; caching them lets the model reuse past context instead of recomputing it (see the sketch after this list).
  • Context Window: Maximum text length an LLM can process at once.
  • Two-layer KV Cache: Combination of a global L1 cache and detailed L2 cache.
  • Query-Guided Refilling: Mechanism dynamically adding relevant information from L2 to L1 cache based on the query.
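
For readers new to KV activations, the toy single-head decoding loop below shows the mechanism the glossary describes: each step appends one key/value pair to a growing cache and attends over everything cached so far, instead of recomputing past tokens (all names and sizes are ours):

```python
import torch
import torch.nn.functional as F

D = 64
keys, values = [], []  # the KV cache grows by one entry per token

def attend(query, new_key, new_value):
    """One decoding step: cache this token's K/V, attend over all cached."""
    keys.append(new_key)
    values.append(new_value)
    K, V = torch.stack(keys), torch.stack(values)  # [t, D] each
    weights = F.softmax(K @ query / D ** 0.5, dim=0)
    return weights @ V  # context vector for this step

for _ in range(5):  # five decoding steps
    q, k, v = (torch.randn(D) for _ in range(3))
    out = attend(q, k, v)
print(len(keys))  # 5 cached KV activations
```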
