Retentive Network: A Successor to Transformer for Large Language Models

noneabove1182@sh.itjust.works · 2 years ago

rufus · 2 years ago

Nice. Seems like it outperforms everything prior in every way possible. Now we have to see whether that still holds true at scale. I'd love to see it trained on a trillion tokens with more than 7B parameters and compared against a fully optimized and quantized LLaMA implementation.