Multitasking memory

Machine learning

The abilities and performance of a memory-augmented transformer model are greatly improved by learning several key tasks at once during training.

Mastering long-context multi-task reasoning with transformers and recurrent memory

Recent advances have significantly improved the capabilities and performance of language models, but have also increased computational demands due to growing parameter counts and the quadratic complexity of the attention mechanism. As context sizes expand into millions of tokens, making long-context processing accessible and efficient becomes a critical challenge. Moreover, modern benchmarks such as BABILong [1] highlight the limitations of even the most powerful LLMs in long-context reasoning. In this paper, we employ fine-tuning and multi-task learning to train a model capable of mastering multiple BABILong long-context reasoning skills. We demonstrate that even models with fewer than 140 million parameters can outperform much larger counterparts by learning multiple essential tasks simultaneously. By conditioning the Recurrent Memory Transformer [2] on the task description, we achieve state-of-the-art results on the multi-task BABILong QA1–QA5 set for contexts of up to 32k tokens. The proposed model also generalizes to new lengths and tasks and shows increased robustness to input perturbations.
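
To make the approach concrete, the sketch below shows segment-level recurrence with learnable memory tokens and a task-description prefix, in the spirit of the Recurrent Memory Transformer [2]. This is a minimal illustration under stated assumptions, not the authors' implementation: the module names, sizes, and the plain PyTorch TransformerEncoder backbone are placeholders chosen for clarity.

```python
# Minimal sketch (illustrative, not the authors' code) of recurrent memory
# processing with task-description conditioning. All hyperparameters and the
# TransformerEncoder backbone are assumptions for the example.
import torch
import torch.nn as nn


class RecurrentMemorySketch(nn.Module):
    def __init__(self, vocab_size=32000, d_model=256, num_mem_tokens=16,
                 segment_len=512, n_layers=4, n_heads=4):
        super().__init__()
        self.segment_len = segment_len
        self.embed = nn.Embedding(vocab_size, d_model)
        # Learnable memory tokens whose states are carried across segments.
        self.memory = nn.Parameter(torch.randn(num_mem_tokens, d_model) * 0.02)
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, input_ids, task_ids):
        """input_ids: (batch, long_seq) context tokens.
        task_ids: (batch, task_len) tokens of the task description."""
        batch = input_ids.size(0)
        mem = self.memory.unsqueeze(0).expand(batch, -1, -1)  # initial memory state
        task_emb = self.embed(task_ids)                        # task conditioning
        logits = []
        # Process the long input segment by segment, passing memory forward.
        for start in range(0, input_ids.size(1), self.segment_len):
            seg = self.embed(input_ids[:, start:start + self.segment_len])
            # Layout per segment: [memory | task description | segment tokens].
            hidden = self.backbone(torch.cat([mem, task_emb, seg], dim=1))
            n_mem, n_task = mem.size(1), task_emb.size(1)
            mem = hidden[:, :n_mem]                            # updated memory for next segment
            logits.append(self.lm_head(hidden[:, n_mem + n_task:]))
        return torch.cat(logits, dim=1)
```

In a multi-task training setup along these lines, each example would pair its long context with the token sequence describing its task (e.g., one of QA1–QA5), so a single set of weights is fine-tuned on all tasks simultaneously while the memory tokens carry information across segments of arbitrarily long inputs.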