06 May 2023

  • Momiao Xiong, Ph.D., Professor in the Department of Biostatistics and Data Science, University of Texas School of Public Health. Dr. Xiong graduated from the Department of Statistics at the University of Georgia in 1993. From 1993 to 1995, Dr. Xiong was a postdoctoral fellow at the University of Southern California, working with Michael Waterman.

  • Research Interests: Causal Inference, Artificial Intelligence, Manifold Learning, Statistical Genetics, and Bioinformatics.


Develop a genotype language model as a foundation model for genetic studies of complex diseases. Generative AI raises a great challenge in both philosophy and practice “on a scale not experienced since the beginning of the Enlightenment.” AI-powered sequencers are now capable of sequencing a whole genome at $100 per individual, which allows a large amount of sequence data to be generated. The exponential growth of DNA and protein sequence data is paving the way for DNA and protein language models in genomics and biomedicine. DNA and protein sequences contain rich information about their evolution, fitness, protein structure and stability, mutation semantics, and mechanisms of disease.

Information about the biological properties of the sequences is encoded in the representations learned by such language models. The representations can be used for association and causal analysis of genetic variants, including QTL and eQTL analysis. One limitation of foundation models is the lack of hypothesis testing, which leads to non-transparent and unexplainable results. To overcome this limitation, I will first develop a general framework for hypothesis testing theory in artificial intelligence in general and in foundation models in particular. I will view the transformer as a universal approximator of functions from sequences to sequences and use nonlinear testing theory in statistics to define the null hypothesis and test statistics and to derive their distributions. The resulting testing theory will then be applied to genome-wide association studies.
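As a point of reference for what a hypothesis test in an association study looks like, the following is a minimal sketch of a permutation test for a single variant. It is a generic stand-in, not the nonlinear testing theory described above: the function names `permutation_pvalue` and `simulate`, the additive 0/1/2 genotype coding, the effect size, and the simulated data are all assumptions made for illustration.

```python
import random


def permutation_pvalue(genotypes, phenotypes, n_perm=2000, seed=0):
    """Two-sided permutation p-value for genotype-phenotype association."""
    rng = random.Random(seed)
    n = len(genotypes)
    g_mean = sum(genotypes) / n
    centered_g = [g - g_mean for g in genotypes]

    def stat(y):
        # Absolute unnormalized covariance; larger means stronger association.
        return abs(sum(cg * yi for cg, yi in zip(centered_g, y)))

    observed = stat(phenotypes)
    shuffled = list(phenotypes)
    exceed = 0
    for _ in range(n_perm):
        # Shuffling phenotypes breaks any link to genotype, simulating the null.
        rng.shuffle(shuffled)
        if stat(shuffled) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)


def simulate(n, effect, seed):
    """Hypothetical data: additive genotype effect plus Gaussian noise."""
    rng = random.Random(seed)
    genotypes = [rng.choice([0, 1, 2]) for _ in range(n)]
    phenotypes = [effect * g + rng.gauss(0.0, 1.0) for g in genotypes]
    return genotypes, phenotypes


g, y = simulate(n=300, effect=0.8, seed=1)
p_causal = permutation_pvalue(g, y)

g0, y0 = simulate(n=300, effect=0.0, seed=2)
p_null = permutation_pvalue(g0, y0)

print("variant with effect:", p_causal)
print("variant without effect:", p_null)
```

The permutation approach is used here only because it needs no distributional derivation; the abstract instead proposes deriving the distribution of the test statistic analytically.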



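Artificial intelligence has made enormous progress over the past decade, so much so that some scientists now assess its impact on modern scientific research mainly in terms of its negative aspects. Opacity, unreliability, and lack of explainability are among the main arguments they raise against it. One of the main tools of AI research is prediction, and it is precisely prediction that gives rise to these frequently criticized shortcomings. Prediction amounts to computing the probability that an event occurs. An event involves many factors: some play a role and some do not. Because a neural network is in many cases a black box, it generally does not, and often cannot, identify which factors played an important role in the prediction. In statistics, hypothesis testing is as important as prediction. Lehmann wrote two books for graduate students in statistics: the first on estimation, the second on hypothesis testing. Hypothesis testing is also one of the cornerstones that Fisher laid for statistics. Hypothesis testing identifies the factors that cause an event to occur. In classical statistics, the theory of hypothesis testing is developed entirely in Euclidean space.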
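To make the contrast with prediction concrete, here is a small sketch of classical hypothesis testing in Euclidean space used to identify which factors matter. Everything in it is an illustrative assumption: the simulated factors, the choice that only factors 0 and 2 affect the outcome, the helper name `marginal_test`, the normal approximation to the t statistic, and the Bonferroni threshold.

```python
import math
import random


def marginal_test(x, y):
    """Return (r, p) for H0: no linear association between x and y.

    r is the Pearson correlation; p is a two-sided p-value from a normal
    approximation to the t statistic, which is reasonable for large n.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    p = math.erfc(abs(t) / math.sqrt(2.0))
    return r, p


rng = random.Random(0)
n, n_factors = 500, 5
factors = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n_factors)]

# Assumed ground truth: only factors 0 and 2 influence the outcome.
outcome = [
    1.0 * factors[0][i] - 0.8 * factors[2][i] + rng.gauss(0.0, 1.0)
    for i in range(n)
]

alpha = 0.05 / n_factors  # Bonferroni correction for testing 5 factors
p_values = [marginal_test(f, outcome)[1] for f in factors]
selected = [j for j, p in enumerate(p_values) if p < alpha]

print("factors flagged as significant:", selected)
```

A predictor trained on all five factors would return only a number for each sample, whereas the tests above attach a significance statement to each individual factor.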

