Outlier Tokens Drive Attention Patterns in Vision Transformers

Nagini S; Karnam Akhil; Mallupeddi Vamsi Krishna; Pathi Sairoop Teja; Swapnika Chowdary Thanikonda

doi:10.5269/bspm.82398


Authors

Nagini S, VNR Vignana Jyothi Institute of Engineering and Technology
Karnam Akhil
Mallupeddi Vamsi Krishna
Pathi Sairoop Teja
Swapnika Chowdary Thanikonda

DOI:

https://doi.org/10.5269/bspm.82398

Abstract

Vision Transformers (ViTs) are widely used in computer vision because they can capture global image information. However, these models share a persistent problem: a small number of tokens in their feature representations take abnormally high values, typically in background regions. These tokens aggregate global information while losing essential local detail, which produces attention maps with sharp peaks that do not correspond to meaningful parts of the image. The problem degrades the model's spatial understanding and hurts performance on tasks that need accurate region-based reasoning. It appears across supervised, self-supervised, and text-supervised settings, and also in small ViT models, which points to a clear gap in current systems. This paper identifies that gap and proposes improved transformer-based models that reduce the effect of these outlier tokens and help preserve spatial information. The enhancements aim to stabilize token behavior and to support a better distribution of attention across the image without heavy reliance on background regions. Experimental results show a reduction in abnormal token activity, along with smoother and more meaningful attention maps. The improved models also perform better on tasks that depend on spatial accuracy. The aim of this work is to make ViTs more reliable, interpretable, and consistent. By improving spatial reasoning and avoiding misleading attention patterns, the proposed models are intended to support stronger and more trustworthy visual understanding in real-world applications.
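
As an illustration of what "abnormally high values" can mean in practice, the sketch below flags tokens whose L2 norm is far above that of the other tokens in the same layer. It is not the paper's method: the function name `find_outlier_tokens`, the median-plus-MAD threshold, the sensitivity `k`, and the synthetic token matrix are all assumptions made for this example.

```python
import numpy as np


def find_outlier_tokens(tokens: np.ndarray, k: float = 10.0) -> np.ndarray:
    """Return indices of tokens whose L2 norm is abnormally high.

    tokens: array of shape (num_tokens, dim), e.g. patch tokens from one layer.
    k: hypothetical sensitivity; a token is flagged when its norm exceeds
       median + k * MAD of all token norms (MAD = median absolute deviation).
    """
    norms = np.linalg.norm(tokens, axis=1)
    median = np.median(norms)
    mad = np.median(np.abs(norms - median))
    # Guard against MAD == 0 (all norms identical) so nothing is flagged.
    threshold = median + k * max(mad, 1e-12)
    return np.flatnonzero(norms > threshold)


# Synthetic stand-in for a 14x14 grid of 64-dimensional patch tokens.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(196, 64))
tokens[[5, 120]] *= 20.0  # artificially inflate two tokens

outliers = find_outlier_tokens(tokens)
print(outliers)
```

A robust statistic (median and MAD) is used here rather than mean and standard deviation so that the outliers themselves do not inflate the threshold; with real ViT features, a suitable `k` would need to be chosen empirically.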


Published

2026-06-19

Issue

Vol. 44 No. 17 (2026): Recent Trends in Mathematical Sciences and Technological Applications

Section

Conf. Issue: Recent Trends in Mathematical Sciences and Technological Applic.

License

Copyright 2026 Boletim da Sociedade Paranaense de Matemática

This work is licensed under a Creative Commons Attribution 4.0 International License.

When the manuscript is accepted for publication, the authors automatically agree to transfer the copyright to the Sociedade Paranaense de Matemática (SPM).

The journal uses the Creative Commons Attribution license (CC BY 4.0).

How to cite

S, N., Karnam Akhil, Mallupeddi Vamsi Krishna, Pathi Sairoop Teja, & Swapnika Chowdary Thanikonda. (2026). Outlier Tokens Drive Attention Patterns in Vision Transformers. Boletim Da Sociedade Paranaense De Matemática, 44(17), 1-32. https://doi.org/10.5269/bspm.82398
