Boosting Reasoning in Large Multimodal Models
with Activation Replay

arXiv 2025

Yun Xing
Nanyang Technological University
Xiaobin Hu
National University of Singapore
Qingdong He
Tencent Youtu Lab
Jiangning Zhang
Zhejiang University
Shuicheng Yan
National University of Singapore
Shijian Lu
Nanyang Technological University
Yu-Gang Jiang
Fudan University

Abstract

Recently, Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as an effective approach to incentivizing reasoning capability in Large Multimodal Models (LMMs), yet the underlying mechanisms of this post-training paradigm remain poorly understood. We begin by exploring how input activations are affected by RLVR from the perspective of the logit lens. Our systematic investigations across multiple post-trained LMMs suggest that RLVR unexpectedly shifts low-entropy activations, while high-entropy ones are less affected. Through controlled experiments, we further demonstrate that this phenomenon is associated with LMM reasoning, suggesting a potentially beneficial role for modulating low-entropy activations.

To this end, we propose Activation Replay, a simple yet effective training-free approach that boosts the multimodal reasoning of post-trained LMMs without requiring expensive policy optimization. Our design manipulates visual tokens at test time, replaying low-entropy activations from the input context of base LMMs to regulate their RLVR counterparts. Activation Replay triggers better reasoning across diverse scenarios, including mathematics, o3-like visual agents, and video reasoning. We further show that Activation Replay boosts Pass@K and mitigates the narrower reasoning coverage of RLVR. Our design is compared against alternative choices, such as replaying high-entropy activations instead of low-entropy ones, or direct cross-model intervention instead of manipulating input tokens, demonstrating the superiority of our implementation. Code will be made publicly available.

Method Overview

How Input Activations are Affected

From left to right within each subplot, tokens are ordered from low to high base-LMM entropy. The KL divergence shifts are normalized layer-wise for illustration. Brighter colors indicate larger shifts in KL divergence.
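For concreteness, the sketch below illustrates one way such an analysis can be set up: per-token logit-lens distributions are read out from the base LMM and its RLVR counterpart, token entropy is computed under the base model, and the per-token KL divergence measures how far the RLVR distributions have shifted. This is a minimal sketch assuming HuggingFace-style models that expose hidden_states, a final norm, and an lm_head; it is not the authors' released code.

import torch
import torch.nn.functional as F

@torch.no_grad()
def logit_lens_probs(model, hidden):
    # Project intermediate hidden states [seq, dim] to vocabulary distributions
    # with the model's final norm and LM head (the "logit lens").
    logits = model.lm_head(model.model.norm(hidden))
    return F.softmax(logits.float(), dim=-1)                        # [seq, vocab]

@torch.no_grad()
def entropy_and_kl_shift(base_model, rlvr_model, inputs, layer=-1):
    base_h = base_model(**inputs, output_hidden_states=True).hidden_states[layer][0]
    rlvr_h = rlvr_model(**inputs, output_hidden_states=True).hidden_states[layer][0]
    p_base = logit_lens_probs(base_model, base_h)
    p_rlvr = logit_lens_probs(rlvr_model, rlvr_h)
    # Per-token entropy under the base LMM, used to bucket tokens from low to high.
    entropy = -(p_base * p_base.clamp_min(1e-12).log()).sum(-1)
    # Per-token KL(base || RLVR): how far the RLVR distribution has drifted.
    kl_shift = F.kl_div(p_rlvr.clamp_min(1e-12).log(), p_base, reduction="none").sum(-1)
    return entropy, kl_shift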


Perturbation Study

We synthesize variations over input activations by perturbing the inputs with random noise. We measure the perplexity of the reasoning LMMs over the responses, where lower perplexity indicates higher probability. Four cases from different math domains are evaluated. These cases suggest that when the KL divergence of low-entropy activations shifts less drastically from the base (left in the subplots of the figure), the perplexity of correct responses decreases and that of incorrect responses increases, encouraging the LMMs to produce correct outputs.
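The sketch below shows one possible form of such a perturbation probe: Gaussian noise is injected into the image tensor and the perplexity the reasoning LMM assigns to a fixed (correct or incorrect) response is recorded. The noise model, interface, and function name are illustrative assumptions, not the paper's released code.

import torch
import torch.nn.functional as F

@torch.no_grad()
def response_perplexity(model, input_ids, labels, pixel_values, noise_std=0.0):
    # Perturb the visual input with Gaussian noise to vary the input activations.
    noisy_pixels = pixel_values + noise_std * torch.randn_like(pixel_values)
    logits = model(input_ids=input_ids, pixel_values=noisy_pixels).logits
    # Shift so position t predicts token t+1; labels carry -100 on prompt and
    # visual positions so only the response tokens contribute to the loss.
    nll = F.cross_entropy(logits[:, :-1].reshape(-1, logits.size(-1)).float(),
                          labels[:, 1:].reshape(-1), ignore_index=-100)
    return torch.exp(nll)   # lower perplexity = response judged more probable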

Statistics

Intervention Study

To further dissect the roles of low- and high-entropy activations from base LMMs, we perform a straightforward cross-model intervention study, forcing RLVR post-trained LMMs to reason over a combination of base and RLVR activations. We try two combinations: (i) low-entropy activations from the base model with high-entropy ones from the RLVR model, and (ii) high-entropy activations from the base model with low-entropy ones from the RLVR model.
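A minimal sketch of such an intervention is given below, assuming a HuggingFace-style decoder that exposes a model.layers list: a forward hook overwrites the RLVR model's hidden states at selected token positions with the base model's, where the positions are chosen by an entropy mask computed on the base LMM. The hook mechanism and helper names are assumptions made for illustration.

import torch

def replay_base_activations(rlvr_model, base_hidden, token_mask, layer_idx):
    # base_hidden: [1, seq, dim] hidden states of the base LMM at layer_idx.
    # token_mask:  [1, seq, 1] bool, True where base activations should replace
    #              the RLVR ones (e.g. the low- or high-entropy positions).
    layer = rlvr_model.model.layers[layer_idx]

    def hook(module, args, output):
        hidden = output[0] if isinstance(output, tuple) else output
        mixed = torch.where(token_mask, base_hidden, hidden)
        if isinstance(output, tuple):
            return (mixed,) + output[1:]
        return mixed

    # Applies during the forward pass over the input context; call .remove()
    # on the returned handle once generation is finished.
    return layer.register_forward_hook(hook)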

Evaluation Results

Overview of Activation Replay

Activation Replay starts by feeding the multimodal inputs to the base LMM to obtain its low-entropy input activations. For the inputs to the RLVR LMM, our approach first adds zero-initialized learnable tokens to the visual tokens. We then optimize these learnable tokens to minimize the token-level KL divergence between the low-entropy activations from the base LMM and those from its RLVR post-trained counterpart.
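The following is a hedged sketch of this test-time procedure, not the authors' released implementation: zero-initialized learnable tokens are appended after the visual token embeddings (one possible reading of "adding learnable tokens to visual tokens"), and a few optimization steps minimize the token-level KL divergence between the base LMM's distributions and the RLVR LMM's on the low-entropy positions. For brevity the final-layer distributions stand in for the logit-lens readout, and all function and argument names are assumptions.

import torch
import torch.nn.functional as F

def activation_replay(rlvr_model, visual_embeds, text_embeds,
                      base_probs, low_entropy_mask,
                      num_learnable=8, steps=20, lr=1e-2):
    # visual_embeds, text_embeds: [1, n_vis, dim] and [1, n_txt, dim] input
    # embeddings of the RLVR LMM; base_probs: [1, n_txt, vocab] base-LMM
    # distributions on the text positions; low_entropy_mask: [1, n_txt] bool.
    dim = visual_embeds.size(-1)
    learnable = torch.zeros(1, num_learnable, dim,
                            device=visual_embeds.device, requires_grad=True)
    opt = torch.optim.Adam([learnable], lr=lr)

    for _ in range(steps):
        inputs = torch.cat([visual_embeds, learnable, text_embeds], dim=1)
        logits = rlvr_model(inputs_embeds=inputs).logits
        # RLVR distributions on the text positions, which sit after the visual
        # tokens and the inserted learnable tokens.
        log_q = F.log_softmax(
            logits[:, visual_embeds.size(1) + num_learnable:].float(), dim=-1)
        # Token-level KL(base || RLVR), restricted to the low-entropy positions.
        kl = F.kl_div(log_q, base_probs, reduction="none").sum(-1)
        loss = (kl * low_entropy_mask).sum() / low_entropy_mask.sum().clamp_min(1)
        opt.zero_grad()
        loss.backward()
        opt.step()

    return learnable.detach()   # reused when generating the final answer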


Main Results


Pass@K Results


Qualitative Cases


Citation

Consider citing us if you find this project helpful:

@article{xing2025boosting,
  title={Boosting Reasoning in Large Multimodal Models via Activation Replay},
  author={Yun Xing and Xiaobin Hu and Qingdong He and Jiangning Zhang and Shuicheng Yan and Shijian Lu and Yu-Gang Jiang},
  journal={arXiv preprint arXiv:2511.19972},
  year={2025}
}