Colloquium / Seminars

  • Topic: Fast Multipole Attention for Transformer Neural Networks

  • Speaker: Prof. Hans De Sterck
        (University of Waterloo)

  • Date time:

  • Venue:

  • Abstract:

    Transformer-based machine learning models have achieved state-of-the-art performance in many areas. However, the self-attention mechanism in Transformer models has quadratic complexity in the input length, which limits the applicability of these models to long sequences or large images. To address this, we present Fast Multipole Attention (FMA), a new attention mechanism that uses a divide-and-conquer strategy to reduce the time and memory complexity of attention from $O(n^2)$ to $O(n \log n)$ or $O(n)$ while retaining a global receptive field. The hierarchical approach groups queries, keys, and values into $O(\log n)$ levels of resolution: groups at greater distances are progressively larger, and the weights used to compute group quantities are learned. As a result, interactions between tokens that are far apart are handled at lower resolution in an efficient hierarchical manner. This multi-level divide-and-conquer strategy is inspired by fast summation methods from $n$-body physics and the Fast Multipole Method. We evaluate FMA on language modeling and image processing tasks and compare it with other efficient attention variants on medium-size datasets. We find empirically that the Fast Multipole Transformer outperforms other efficient transformers in terms of memory size and accuracy. For large language models, the FMA mechanism has the potential to enable greater sequence lengths, taking the full context into account in an efficient, naturally hierarchical manner during training and when generating long sequences.

  • Download: 演講1150609.png
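
The grouping idea described in the abstract can be illustrated with a small counting sketch. It is not the speaker's FMA implementation: it only partitions key positions into groups whose size doubles with distance from a query, and counts how many groups each query would interact with. The function names, the `near` parameter, the doubling rule, and the query-centred partition are all assumptions made for illustration; the method in the talk uses learned weights to compute group quantities, which this sketch omits.

```python
def distance_groups(i, n, near=2):
    """Partition key positions 0..n-1 into half-open intervals as seen from query i.

    Keys within `near` positions of the query form singleton groups (full
    resolution). Farther out, the group size doubles at each step, so distant
    keys are covered by a few large groups. (Hypothetical scheme for illustration.)
    """
    lo = max(0, i - near)
    hi = min(n, i + near + 1)
    groups = [(k, k + 1) for k in range(lo, hi)]

    size = 2
    left, right = lo, hi
    while left > 0 or right < n:
        if left > 0:
            start = max(0, left - size)
            groups.append((start, left))
            left = start
        if right < n:
            end = min(n, right + size)
            groups.append((right, end))
            right = end
        size *= 2  # coarser resolution at greater distance
    return sorted(groups)


def hierarchical_cost(n, near=2):
    """Total number of query-group interactions over all n queries."""
    return sum(len(distance_groups(i, n, near)) for i in range(n))


def full_cost(n):
    """Number of query-key interactions in standard self-attention."""
    return n * n


if __name__ == "__main__":
    for n in (64, 256, 1024):
        print(n, full_cost(n), hierarchical_cost(n))
```

Every query still covers all n key positions, which corresponds to the global receptive field, while the number of groups per query grows only logarithmically with n. Under these assumptions the total interaction count therefore scales as $O(n \log n)$ rather than $O(n^2)$.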




Department of Applied Mathematics National Yang Ming Chiao Tung University copyright © 2026

2F, Science Bld. 1, 1001 Ta Hsueh Road, Hsinchu, Taiwan 30010, ROC

TEL +886-3-572-2088 TEL +886-3-571-2121 ext. 56401 FAX +886-3-572-4679

Last updated: 2026-02-12 05:12:26 PM (CST)