
Communications and Signal Processing Seminar

Transformer Meets Nonparametric Kernel Regression

Tan Minh Nguyen, Assistant Professor, National University of Singapore
WHERE:
1008 EECS Building

Abstract: Pairwise dot-product self-attention is key to the success of transformers, which achieve state-of-the-art performance across a variety of applications in language and vision. Dot-product self-attention computes attention weights among the input tokens using Euclidean distance, which makes the model prone to representation collapse and vulnerable to contaminated samples. In this talk, we interpret attention in transformers as a nonparametric kernel regression that uses an isotropic Gaussian kernel for density estimation. From this nonparametric regression perspective, we show that the spherical invariance of the isotropic Gaussian kernel causes the estimator to suffer provably higher variance, leading to both representation collapse and non-robustness. We then propose Elliptical Attention, a new class of self-attention that constructs hyper-ellipsoidal, rather than hyper-spherical, neighborhoods around the attention queries. The key idea is to stretch the neighborhoods around the queries to upweight attention keys in directions of high importance, allowing the self-attention mechanism to learn higher-quality contextual representations that prevent representation collapse while exhibiting stronger robustness.

Building further on the nonparametric regression perspective of attention, we develop FourierFormer, another new class of transformers in which the dot-product kernels are replaced by novel generalized Fourier integral kernels. Unlike dot-product kernels, which require choosing a good covariance matrix to capture the dependency among the features of the data, the generalized Fourier integral kernels capture this dependency automatically and remove the need to tune the covariance matrix.
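To make the kernel-regression reading concrete, here is a minimal NumPy sketch (an illustration, not the speaker's implementation). `kernel_attention` realizes attention as a Nadaraya-Watson estimator, and the three kernels loosely correspond to standard attention, Elliptical Attention, and a FourierFormer-style kernel; the function names, the metric weights `m`, and the bandwidth `R` are all placeholder choices, since the papers estimate these quantities rather than hand-pick them.

```python
import numpy as np

def kernel_attention(Q, K, V, kernel):
    """Nadaraya-Watson estimator: each output is a kernel-weighted
    average of the value vectors, with weights normalized over the keys."""
    W = np.array([[kernel(q, k) for k in K] for q in Q])  # (n_queries, n_keys)
    W = W / W.sum(axis=1, keepdims=True)                  # normalize each row
    return W @ V

def isotropic_gaussian(q, k, sigma=1.0):
    # Softmax dot-product attention corresponds (up to normalization) to an
    # isotropic Gaussian kernel on the Euclidean distance ||q - k||: the
    # neighborhood around each query is a hyper-sphere.
    return np.exp(-np.sum((q - k) ** 2) / (2.0 * sigma ** 2))

def elliptical_gaussian(q, k, m):
    # Sketch of the Elliptical Attention idea: a diagonal Mahalanobis metric
    # m (one weight per coordinate) stretches the neighborhood around q into
    # a hyper-ellipsoid, upweighting keys along important directions. Here m
    # is hand-picked for illustration; the paper estimates coordinate-wise
    # importance from the model itself.
    return np.exp(-np.sum(m * (q - k) ** 2) / 2.0)

def fourier_integral(q, k, R=5.0):
    # Sketch of a generalized Fourier integral kernel in the spirit of
    # FourierFormer: a product of sin(R * d_j) / (pi * d_j) factors, one per
    # coordinate (np.sinc(x) = sin(pi x) / (pi x)). Note it can take negative
    # values, so the simple row normalization above is only illustrative.
    d = q - k
    return np.prod((R / np.pi) * np.sinc(R * d / np.pi))

# Toy usage: 5 tokens of dimension 4.
rng = np.random.default_rng(0)
Q, K, V = rng.standard_normal((3, 5, 4))
out_iso = kernel_attention(Q, K, V, isotropic_gaussian)
out_ell = kernel_attention(
    Q, K, V, lambda q, k: elliptical_gaussian(q, k, m=np.array([4.0, 1.0, 1.0, 0.25]))
)
```

In this sketch, larger entries of `m` shrink the ellipsoid along the corresponding coordinate, so keys that differ from the query in those directions are downweighted more aggressively than under the spherical kernel.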

References
[1] Stefan Nielsen, Laziz Abdullaev, Rachel S.Y. Teo, Tan M. Nguyen. “Elliptical Attention”. Conference on Neural Information Processing Systems (NeurIPS), 2024.
[2] Tan M. Nguyen, Minh Pham, Tam Nguyen, Khai Nguyen, Stanley J. Osher, Nhat Ho. “FourierFormer: Transformer Meets Generalized Fourier Integral Theorem”. Conference on Neural Information Processing Systems (NeurIPS), 2022.

Bio: Dr. Tan Nguyen is an Assistant Professor of Mathematics (Presidential Young Professor) at the National University of Singapore (NUS). Before joining NUS, he was a postdoctoral scholar in the Department of Mathematics at the University of California, Los Angeles, working with Dr. Stanley J. Osher. He obtained his Ph.D. in Machine Learning from Rice University, where he was advised by Dr. Richard G. Baraniuk. Dr. Nguyen was an organizer of the 1st Workshop on Integration of Deep Neural Models and Differential Equations at ICLR 2020, and he completed long internships with Amazon AI and NVIDIA Research. He is the recipient of the Computing Innovation Postdoctoral Fellowship (CIFellows) from the Computing Research Association (CRA), the NSF Graduate Research Fellowship, and the IGERT Neuroengineering Traineeship. He received his M.S. and B.S. in Electrical and Computer Engineering from Rice University in May 2018 and May 2014, respectively.

*** The event will take place in a hybrid format. The location for in-person attendance will be room 1008 EECS. Attendance will also be available via Zoom.

Join Zoom Meeting: https://umich.zoom.us/j/93679028340

Meeting ID: 936 7902 8340

Passcode: XXX (Will be sent via email to attendees)

Zoom Passcode information is available upon request to Kristi Rieger ([email protected]).

See the full seminar by Assistant Professor Tan Minh Nguyen of the National University of Singapore.

Faculty Host

Qing Qu, Assistant Professor, Electrical Engineering and Computer Science