
What can we learn from signals and systems in a transformer? Insights for probabilistic modeling and inference architecture

Speaker: Prashant G. Mehta

Coordinated Science Laboratory (CSL)

University of Illinois at Urbana-Champaign (UIUC)

The transformer is the core algorithm inside a large language model (LLM). In the so-called decoder-only transformer, a finite sequence of symbols (tokens) is mapped to the conditional probability distribution of the next token.
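As a minimal formalization of this statement (the notation below is mine, not the speaker's): writing V for the finite token vocabulary, the decoder-only transformer implements a map from token prefixes to next-token laws.

```latex
% Sketch: the decoder-only transformer as a prefix-to-distribution map.
% \theta collects the learned parameters; \Delta(\mathcal{V}) denotes the
% probability simplex over the finite vocabulary \mathcal{V}.
\[
  (x_1, x_2, \dots, x_n) \;\longmapsto\;
  p_\theta(\,\cdot \mid x_1, x_2, \dots, x_n) \in \Delta(\mathcal{V}).
\]
% Sampling x_{n+1} from this conditional law and appending it to the
% prefix yields autoregressive generation.
```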

In this talk, Mehta situates the transformer within the broader history of prediction theory. In the early 1940s, Wiener introduced the linear predictor, in which the conditional expectation of future data is computed as a linear combination of the past data. Mehta argues that the decoder-only transformer generalizes this idea and is best understood as a causal nonlinear predictor. The technical results for causal nonlinear prediction are described for the special case where the data is discrete-valued and generated by an underlying hidden Markov model (HMM).
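To make the contrast concrete, here is a minimal sketch, not taken from the paper: Wiener's predictor forms a linear combination of past observations, whereas in the HMM setting the causal nonlinear predictor is the exact conditional distribution of the next token, computed recursively by the standard forward (filtering) recursion. The matrix names A and B and the toy dimensions below are illustrative assumptions.

```python
# Minimal sketch (illustrative, not from the talk): the causal nonlinear
# predictor for discrete-valued data generated by a hidden Markov model.
import numpy as np

def hmm_next_token_probs(tokens, A, B, pi0):
    """Return p(next token | tokens) for an HMM.

    A[i, j] : P(S_{t+1} = j | S_t = i)   (state transition matrix)
    B[i, x] : P(X_t = x | S_t = i)       (emission matrix)
    pi0[i]  : prior P(S_1 = i)
    """
    # Forward filter: pi[i] = P(S_t = i | X_1..X_t), updated causally.
    pi = pi0 * B[:, tokens[0]]
    pi /= pi.sum()
    for x in tokens[1:]:
        pi = (pi @ A) * B[:, x]   # predict the hidden state, correct with x
        pi /= pi.sum()            # normalize the posterior
    # Next-token law: propagate the filter one step, then emit.
    return (pi @ A) @ B

# Toy example: 2 hidden states, 3 tokens.
A = np.array([[0.9, 0.1],
              [0.2, 0.8]])
B = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.3, 0.6]])
pi0 = np.array([0.5, 0.5])
print(hmm_next_token_probs([0, 0, 2, 1], A, B, pi0))  # sums to 1
```

Each filter update uses only past observations, so the map from a token prefix to the next-token law is causal; in this framing, the HMM filter plays the role of the exact causal nonlinear predictor that a trained transformer would approximate.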

The aim of this ongoing research is to bridge classical nonlinear filtering theory with modern inference architectures inspired by transformers. The work is carried out jointly with Heng-Sheng Chang and Jin Won Kim, and the talk is based on the paper: https://www.arxiv.org/abs/2508.20211.
