Demystifying the Confusing Dimensions in the Attention Mechanism
Many people have a solid grasp of the high-level ideas behind Transformers and attention. But the finer details, such as the shapes of the tensors and how each component transforms them, are a common source of confusion. This post traces how data actually flows through the attention mechanism and what happens to its dimensions at each step.
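As a preview, here is a minimal sketch of scaled dot-product attention with the shape of every intermediate tensor annotated. The concrete sizes (batch of 2, sequence length 10, model dimension 512, a single head) are illustrative assumptions, not values from this post:

```python
import torch

# Illustrative sizes (assumptions): 2 sequences, 10 tokens each, d_model = 512.
batch, seq_len, d_model = 2, 10, 512
x = torch.randn(batch, seq_len, d_model)            # (2, 10, 512)

# Learned projections map the input into query/key/value spaces.
W_q = torch.nn.Linear(d_model, d_model, bias=False)
W_k = torch.nn.Linear(d_model, d_model, bias=False)
W_v = torch.nn.Linear(d_model, d_model, bias=False)

Q = W_q(x)                                          # (2, 10, 512)
K = W_k(x)                                          # (2, 10, 512)
V = W_v(x)                                          # (2, 10, 512)

# Scores compare every token with every other token,
# so the sequence length appears twice.
scores = Q @ K.transpose(-2, -1) / d_model ** 0.5   # (2, 10, 10)
weights = scores.softmax(dim=-1)                    # (2, 10, 10)

# The weighted sum over values restores the input shape.
out = weights @ V                                   # (2, 10, 512)
print(out.shape)                                    # torch.Size([2, 10, 512])
```

Keep these shapes in mind; the rest of the post unpacks where each one comes from.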