 Title：MAIBclass013:Mixed Transformer: Spectral and Space Transformer
 Date：10:00pm US East time, 04/22/2023
 Date：10:00am Beijing time, 04/23/2023
 Zoom ID：933 1613 9423
 Zoom PWD：416262
 Zoom: https://uwmadison.zoom.us/meeting/register/tJcuduprTIuGNda1MsF8PKyRQlnGn06TP2E

Momiao Xiong, Ph. D, Professor in Department of Biostatistics snd Data Science , University of Texas, School of Public Health. Dr. Xiong graduated from the Department of Statistics at the University of Georgia in 1993. From 1993 to 1995, Dr. Xiong was postdoctoral fellow at the University of Southern California working with Michael Waterman.

Research Interest： Causal Inference, Artificial Intelligence , Manifold Learning, Statistic Genetics and Bioinformatics .
Background
Theoretic foundation of transformer
View transformer as response of system
View transformer as nonlinear regression
Kernel transformer
Generalized Fourier Integral Theorems and their applications to transformer
Functional Model
Mixing MLP
变换器是自然语言处理，图象和视频分析的核心。它的性能决定了这些分析的好坏。如何发展性能优异的变换器仍然是现在人工智能研究的热门课题之一。变换器可以看作是一个动态系统对输入信号的反应，可以看作是对输入信号的滤波，可以看作是非线性回归分析，可以视作核分析，函数分析。推广的付里叶定理可应用导出各种形式的变换器。因此我们就要分几次介绍变换器。介绍它们在分类，描述图象，视频，图象切割，检查产品的质量，诊断疾病，寻找癌组织的位置，遗传流行病学，分子生物学，药物开发和因果分析中的应用。这方面的理论对于我们理解和应用现代人工智能是十分重要的。
变换器是现代人工智能中的重要组成部分，广泛应用于自然语言处理，图像和视频分析等领域。在自然语言处理中，变换器被用于机器翻译、文本摘要、问答系统等任务中。在图像和视频分析中，变换器则被用于目标检测、图像识别、视频跟踪等任务中。其性能的优劣决定了这些任务的质量，因此如何发展性能优异的变换器一直是人工智能研究的热门课题之一。
变换器可以被看作是一个动态系统对输入信号的反应，从而可以被看作是对输入信号的滤波、非线性回归分析、核分析、函数分析等。在现代人工智能的研究中，变换器可以被看作是一个具有许多层的神经网络结构，其中每一层都对输入数据进行特定的转换和处理，以提取出对任务有用的信息。通过使用深度学习等技术，可以训练这些变换器来对输入数据进行高效的处理和分析，从而实现各种任务的自动化处理。
在实际应用中，变换器被广泛应用于分类、描述图像、视频、图像切割、检查产品的质量、诊断疾病、寻找癌组织的位置、遗传流行病学、分子生物学、药物开发和因果分析等领域。这些应用展示了变换器在现代人工智能中的重要性和广泛性，也为未来的人工智能研究和应用提供了巨大的发展空间。
The transformer is a neural network architecture that has become increasingly popular in natural language processing tasks such as machine translation, language modeling, and text classification. Its theoretical foundation can be viewed from several perspectives.
One way to understand the transformer is to view it as the response of a system to an input signal. In this view, the transformer can be seen as a dynamic system that transforms an input sequence of vectors into an output sequence of vectors. Each step of the transformer involves passing the input through a series of nonlinear transformations, which are applied in parallel across all elements of the sequence. This allows the transformer to capture complex dependencies between elements of the input sequence.
Another way to view the transformer is as a nonlinear regression model. In this view, the transformer can be seen as a function that maps an input sequence to an output sequence. The transformer learns this mapping by minimizing a loss function that measures the discrepancy between the predicted and actual output sequences. This approach allows the transformer to capture complex patterns in the input sequence that may be difficult to model with simpler linear models.
The transformer can also be viewed from the perspective of kernel methods. In this view, the transformer can be seen as a kernel function that maps pairs of input sequences to a highdimensional feature space. This mapping allows the transformer to capture complex nonlinear relationships between elements of the input sequence.
Generalized Fourier Integral Theorems can also be applied to the transformer. These theorems allow us to express the transformer as a sum of basis functions, similar to the Fourier series expansion of periodic signals. This view of the transformer allows us to better understand its properties and behavior.
Finally, the transformer can be viewed as a functional model that operates on entire sequences rather than individual elements. This allows the transformer to capture higherlevel properties of the input sequence, such as its overall structure and context.
In addition, the transformer architecture often incorporates a mixing MLP (multilayer perceptron) that combines information from different parts of the input sequence. This helps the transformer to capture longrange dependencies and enables it to model more complex relationships between elements of the input sequence.