Choose timezone
Your profile timezone:
Title: "Transformers as a dynamical system"
We start by noting that the repeated filtering of information through the attention layers of a transformer can be interpreted as a discrete dynamical system. The continuous limit of this system satisfies a differential equation, which can be interpreted under appropriate assumptions as a gradient flow. After explaining this, we will see how this gradient flow has a tendency to form (often metastable) clusters. This perhaps offers a partial explanation of why a chatbot can extract "meaning" from input text.
https://sites.google.com/view/informal-math-ai-bonn/home