This New Optimizer Called Lion Could Replace Adam as the Go-To for Training Neural Nets

Google Brain researchers have discovered a new optimization algorithm for training deep neural networks, called Lion (EvoLved Sign Momentum), that outperforms the popular Adam optimizer on a variety of computer vision and natural language processing tasks.

In a paper published on arXiv, the researchers describe how they used evolutionary search over a large space of candidate programs to discover Lion. While Adam and other adaptive optimizers such as Adafactor and AdaGrad scale each parameter's update individually based on its gradient history, Lion takes a different approach.

What Makes Lion Different

The key difference is that Lion tracks only momentum, not the second-moment gradient statistics that Adam also maintains, which makes it more memory efficient. Lion then computes each update from the sign of an interpolation between the momentum and the current gradient, so every coordinate of the update has the same magnitude.

This uniform update magnitude acts as a form of regularization, helping the model generalize better. It also allows Lion to work well with larger batch sizes than Adam.
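The update rule described above can be sketched in a few lines of NumPy. This is a minimal single-tensor sketch of the step from the paper, not an optimized implementation; the defaults follow the paper's suggested beta1 = 0.9, beta2 = 0.99, and the `lion_update` function name is our own.

```python
import numpy as np

def lion_update(param, grad, m, lr=1e-4, beta1=0.9, beta2=0.99, weight_decay=0.0):
    """One Lion step for a single parameter tensor (NumPy sketch).

    Interpolates the stored momentum with the current gradient, takes the
    sign, and applies the update with decoupled weight decay. Only the
    momentum `m` is carried between steps -- no second-moment statistics.
    """
    update = np.sign(beta1 * m + (1 - beta1) * grad)  # each entry is -1, 0, or +1
    param = param - lr * (update + weight_decay * param)
    m = beta2 * m + (1 - beta2) * grad                # momentum updated after the sign step
    return param, m
```

Because the update is `sign(...)`, every coordinate moves by exactly `lr` (plus the weight-decay term), which is the uniform-magnitude property discussed above.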

Outperforming Adam on Vision and Language Tasks

Experiments across computer vision and NLP tasks show Lion consistently matches or improves upon Adam:

  • Lion boosts the accuracy of Vision Transformer models on ImageNet classification by up to 2%

  • It reduces the pre-training compute on JFT-300M by up to 5x

  • On text-to-image generation, Lion improves the FID score and cuts training time by 2.3x

  • For language modeling, Lion provides similar perplexity to Adam but with up to 2x less compute

Lion also improves vision-language models. When used to train BASIC, a state-of-the-art contrastive vision-language model, Lion achieves 88.3% top-1 accuracy on ImageNet zero-shot classification and 91.1% with fine-tuning, surpassing prior SOTA by 2% and 0.1% respectively.

Simple Yet Powerful

Despite its simplicity, Lion consistently matches or outperforms more complex adaptive methods such as Adam and Adafactor across models, datasets, and tasks. This demonstrates the power of automatically discovering algorithms rather than hand-engineering them.

The researchers do point out some limitations of Lion, such as reduced gains when the batch size is small or when training uses little regularization. Nonetheless, the strong empirical results suggest that Lion could become the new go-to optimizer for training neural networks.

The implementation of Lion is open-sourced on GitHub so anyone can try it out in their own projects. Just be sure to adjust the hyperparameters: the authors recommend a learning rate several times smaller than Adam's, with weight decay scaled up correspondingly. Lion promises to make neural net training more efficient and improve generalization, so it will be exciting to see whether it gets widely adopted!
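That hyperparameter adjustment can be captured as a simple rule of thumb. The helper below is a hypothetical sketch (not from the paper's code): it shrinks the learning rate and grows the weight decay by the same factor, keeping the effective decay strength `lr * wd` roughly constant, in line with the paper's guidance that Lion wants a 3-10x smaller learning rate than Adam.

```python
def lion_hparams_from_adam(adam_lr, adam_wd, scale=10.0):
    """Hypothetical rule-of-thumb conversion of AdamW hyperparameters for Lion.

    Divides the learning rate by `scale` and multiplies the decoupled weight
    decay by the same factor, so the product lr * wd stays roughly unchanged.
    """
    return adam_lr / scale, adam_wd * scale
```

For example, starting from a typical AdamW setup of `lr=3e-4, wd=0.1`, this yields `lr=3e-5, wd=1.0` as a first guess for Lion; tuning around that point is still advisable.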