This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/Status-Shock-880 on 2024-09-05 14:06:55+00:00.


Concerning information loss in transformers, the paper below proposes an interesting alternative. Would love to hear what you think about it!

Masked Mixers for Language Generation and Retrieval