- Title: MAIB-class-013: Mixed Transformer: Spectral and Space Transformer
- Date: 10:00 pm US Eastern time, 04/22/2023
- Date: 10:00 am Beijing time, 04/23/2023
- Zoom ID: 933 1613 9423
- Zoom PWD: 416262
- Zoom: https://uwmadison.zoom.us/meeting/register/tJcudu-prTIuGNda1MsF8PKyRQlnGn06TP2E
Momiao Xiong, Ph.D., Professor in the Department of Biostatistics and Data Science, University of Texas School of Public Health. Dr. Xiong graduated from the Department of Statistics at the University of Georgia in 1993. From 1993 to 1995, he was a postdoctoral fellow at the University of Southern California, working with Michael Waterman.
Research Interests: Causal Inference, Artificial Intelligence, Manifold Learning, Statistical Genetics, and Bioinformatics.
- Theoretical foundations of the transformer
- Viewing the transformer as the response of a system
- Viewing the transformer as nonlinear regression
- Generalized Fourier Integral Theorems and their applications to the transformer
The transformer is a neural network architecture that has become increasingly popular in natural language processing tasks such as machine translation, language modeling, and text classification. Its theoretical foundation can be viewed from several perspectives.
One way to understand the transformer is to view it as the response of a system to an input signal. In this view, the transformer can be seen as a dynamic system that transforms an input sequence of vectors into an output sequence of vectors. Each step of the transformer involves passing the input through a series of nonlinear transformations, which are applied in parallel across all elements of the sequence. This allows the transformer to capture complex dependencies between elements of the input sequence.
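To make this concrete, here is a minimal NumPy sketch of one such step: a single self-attention layer followed by a position-wise MLP, applied in parallel to every element of the input sequence. This is an illustrative toy (unbatched, single-head, randomly initialized weights), not the specific architecture discussed in the talk.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def transformer_block(X, Wq, Wk, Wv, W1, W2):
    """One self-attention + feed-forward step on a sequence X (n x d).

    Every position attends to every other position, so complex
    dependencies across the sequence are captured in a single step.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[1]))   # attention weights, rows sum to 1
    H = X + A @ V                                # residual connection around attention
    return H + np.maximum(H @ W1, 0) @ W2        # residual around position-wise MLP

rng = np.random.default_rng(0)
n, d = 5, 8                                      # toy sequence length and width
X = rng.normal(size=(n, d))
Wq, Wk, Wv, W1, W2 = (0.1 * rng.normal(size=(d, d)) for _ in range(5))
Y = transformer_block(X, Wq, Wk, Wv, W1, W2)
print(Y.shape)                                   # same shape as the input sequence
```

Note that the same weight matrices are applied at every position; only the attention weights couple the positions, which is what makes the computation parallel across the sequence.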
Another way to view the transformer is as a nonlinear regression model. In this view, the transformer can be seen as a function that maps an input sequence to an output sequence. The transformer learns this mapping by minimizing a loss function that measures the discrepancy between the predicted and actual output sequences. This approach allows the transformer to capture complex patterns in the input sequence that may be difficult to model with simpler linear models.
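The regression view can be illustrated on a toy problem: fit a nonlinear sequence-to-sequence map by gradient descent on a mean-squared loss between predicted and actual outputs. The data and the one-layer tanh model below are hypothetical stand-ins for the full transformer, chosen so the gradient can be written by hand.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 4))            # toy input "sequence" (hypothetical data)
W_true = 0.5 * rng.normal(size=(4, 3))
Y = np.tanh(X @ W_true)                 # nonlinear target mapping to recover

W = np.zeros((4, 3))
for _ in range(2000):
    pred = np.tanh(X @ W)
    # gradient of the squared discrepancy between predicted and actual outputs
    grad = X.T @ ((pred - Y) * (1 - pred**2)) / len(X)
    W -= 0.3 * grad

loss = np.mean((np.tanh(X @ W) - Y) ** 2)
print(loss)                             # small after training
```

A transformer is trained the same way in principle, just with a far richer function class and automatic differentiation in place of the hand-derived gradient.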
The transformer can also be viewed from the perspective of kernel methods. In this view, the transformer can be seen as a kernel function that maps pairs of input sequences to a high-dimensional feature space. This mapping allows the transformer to capture complex nonlinear relationships between elements of the input sequence.
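This connection can be seen directly in code: softmax attention has the form of Nadaraya–Watson kernel smoothing, with exp(q·k / h) playing the role of an unnormalised kernel on query–key pairs. The function below is a sketch of that reading, with an assumed bandwidth parameter standing in for the usual 1/√d scaling.

```python
import numpy as np

def kernel_attention(queries, keys, values, bandwidth=1.0):
    """Softmax attention read as kernel smoothing.

    exp(q.k / h) is an (unnormalised) kernel comparing queries to keys;
    normalising each row yields a Nadaraya-Watson style weighted
    average of the values.
    """
    logits = queries @ keys.T / bandwidth
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)    # kernel weights, each row sums to 1
    return w @ values

rng = np.random.default_rng(3)
Q = rng.normal(size=(3, 4))              # 3 queries in a 4-dim feature space
K = rng.normal(size=(6, 4))              # 6 keys
V = rng.normal(size=(6, 2))              # values attached to the keys
out = kernel_attention(Q, K, V)
print(out.shape)                         # one smoothed value per query
```

Because each output row is a convex combination of the value rows, every output entry stays within the range spanned by the values, exactly as in kernel regression.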
Generalized Fourier Integral Theorems can also be applied to the transformer. These theorems allow us to express the transformer as a sum of basis functions, similar to the Fourier series expansion of periodic signals. This view of the transformer allows us to better understand its properties and behavior.
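The classical one-dimensional case underlying these theorems states that f(x) = lim_{R→∞} (1/π) ∫ sin(R(x−y))/(x−y) f(y) dy, i.e. a function is recovered from a sinc-kernel average of itself. The sketch below checks this numerically for a Gaussian test function on a finite grid; the grid, the cutoff R, and the test function are all assumptions of the toy, not details from the talk.

```python
import numpy as np

def f(y):
    return np.exp(-y**2 / 2)             # Gaussian test function; f(0) = 1

def fourier_integral_estimate(x, R, grid):
    """Truncated Fourier integral (1/pi) * integral of sin(R(x-y))/(x-y) f(y) dy,
    approximated by a Riemann sum on the grid."""
    # np.sinc(t) = sin(pi t)/(pi t), so this equals sin(R(x-y)) / (pi (x-y))
    # and handles the removable singularity at y = x.
    kern = (R / np.pi) * np.sinc(R * (x - grid) / np.pi)
    dy = grid[1] - grid[0]
    return np.sum(kern * f(grid)) * dy

y = np.linspace(-10.0, 10.0, 2001)
est = fourier_integral_estimate(0.0, 50.0, y)
print(est)                               # close to f(0) = 1 for large R
```

The sinc kernel here is the "basis function" view in miniature: increasing R adds higher-frequency components, and the reconstruction converges to f.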
Finally, the transformer can be viewed as a functional model that operates on entire sequences rather than individual elements. This allows the transformer to capture higher-level properties of the input sequence, such as its overall structure and context.
In addition, the transformer architecture often incorporates a mixing MLP (multilayer perceptron) that combines information from different parts of the input sequence. This helps the transformer to capture long-range dependencies and enables it to model more complex relationships between elements of the input sequence.
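A minimal sketch of such a mixing layer, in the spirit of MLP-Mixer: one matrix acts along the sequence axis so every position exchanges information with every other, and a second MLP acts independently on each position's channels. Dimensions and initialization are illustrative assumptions.

```python
import numpy as np

def mixing_layer(X, W_tok, W1, W2):
    """One simplified mixing-MLP layer on a sequence X (n_tokens x n_channels)."""
    # Token mixing: W_tok acts along the sequence axis, so every position
    # receives information from every other (long-range interaction).
    X = X + W_tok @ X
    # Channel mixing: an MLP applied independently at each position.
    return X + np.maximum(X @ W1, 0) @ W2

rng = np.random.default_rng(4)
n_tokens, n_channels, hidden = 6, 4, 16
X = rng.normal(size=(n_tokens, n_channels))
W_tok = 0.1 * rng.normal(size=(n_tokens, n_tokens))
W1 = 0.1 * rng.normal(size=(n_channels, hidden))
W2 = 0.1 * rng.normal(size=(hidden, n_channels))
Y = mixing_layer(X, W_tok, W1, W2)
print(Y.shape)                           # sequence shape is preserved
```

Unlike attention, the token-mixing weights here do not depend on the input content; the contrast between the two is one way to see what self-attention adds.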