Outlier Tokens Drive Attention Patterns in Vision Transformers

Nagini S; Karnam Akhil; Mallupeddi Vamsi Krishna; Pathi Sairoop Teja; Swapnika Chowdary Thanikonda

doi:10.5269/bspm.82398


Authors

Nagini S VNR Vignana Jyothi Institute of Engineering and Technology
Karnam Akhil
Mallupeddi Vamsi Krishna
Pathi Sairoop Teja
Swapnika Chowdary Thanikonda

DOI:

https://doi.org/10.5269/bspm.82398

Abstract

Vision Transformers (ViTs) are widely used in computer vision because they can capture global image information. However, a persistent problem appears across ViT models: a number of tokens in their feature representations take abnormally high values in background regions. These tokens capture global information while losing essential local details. This leads to attention maps with sharp peaks that do not correspond to meaningful parts of the image. The problem substantially degrades the model's spatial understanding and hurts performance on tasks that need accurate region-based reasoning. It appears across supervised, self-supervised, and text-supervised settings, and also in small ViT models, revealing a clear gap in current systems. This paper identifies this gap and proposes improved transformer-based models that reduce the effect of these outlier tokens and help maintain proper spatial information. The enhancements focus on stabilizing token behavior and supporting a better attention distribution across the image without relying heavily on background regions. Experimental results show a reduction in abnormal token activity, with smoother and more meaningful attention maps. The improved models also perform better on tasks that depend on spatial accuracy. The aim of this work is to make ViTs more reliable, interpretable, and consistent. By improving spatial reasoning and avoiding misleading attention patterns, the proposed models support stronger and more trustworthy visual understanding in real-world applications.
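
To make the notion of an "outlier token" concrete, the following is a minimal sketch of one way such tokens might be flagged: by comparing each token's L2 norm with the typical norm in the same layer. The abstract does not state the authors' detection criterion, so the function name `find_outlier_tokens`, the MAD-based robust score, and the threshold `k` are all assumptions made for illustration, and the token features here are synthetic rather than taken from a real ViT.

```python
import numpy as np


def find_outlier_tokens(tokens: np.ndarray, k: float = 6.0) -> np.ndarray:
    """Return indices of tokens whose L2 norm is far above the typical norm.

    tokens: array of shape (num_tokens, dim), e.g. patch features from one layer.
    k: hypothetical threshold, in robust (MAD-based) standard deviations.
    """
    norms = np.linalg.norm(tokens, axis=1)
    median = np.median(norms)
    # Median absolute deviation, rescaled to match a standard deviation
    # for normally distributed data; the epsilon avoids division by zero.
    mad = np.median(np.abs(norms - median))
    scale = 1.4826 * mad + 1e-12
    return np.flatnonzero((norms - median) / scale > k)


# Synthetic stand-in for a 14x14 grid of 64-dimensional patch tokens.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(196, 64))

# Plant three artificially large tokens to imitate high-norm outliers.
planted = np.array([5, 42, 130])
tokens[planted] *= 20.0

outliers = find_outlier_tokens(tokens)
print(outliers)
```

A robust score based on the median is used here instead of a mean and standard deviation so that the outliers themselves do not inflate the baseline they are compared against. With real models, `tokens` would be replaced by the patch features of a chosen layer, and the threshold would need tuning.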


Published

2026-06-19

Issue

Vol. 44 No. 17 (2026): Recent Trends in Mathematical Sciences and Technological Applications

Section

Conf. Issue: Recent Trends in Mathematical Sciences and Technological Applic.

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

When the manuscript is accepted for publication, the authors automatically agree to transfer the copyright to the SPM.

The journal uses the Creative Commons Attribution license (CC BY 4.0).

How to Cite

S, N., Karnam Akhil, Mallupeddi Vamsi Krishna, Pathi Sairoop Teja, & Swapnika Chowdary Thanikonda. (2026). Outlier Tokens Drive Attention Patterns in Vision Transformers. Boletim Da Sociedade Paranaense De Matemática, 44(17), 1-32. https://doi.org/10.5269/bspm.82398
