Google has updated Android Bench, its benchmark for evaluating AI models on Android app development. OpenAI's GPT 5.4 now shares first place with Gemini 3.1 Pro Preview, and GPT 5.3-Codex follows in third. The benchmark evaluates models on criteria such as compatibility with Jetpack Compose, use of Coroutines and Flows, and integration with Room and Hilt. The latest rankings are as follows:
1. GPT 5.4: 72.4%
2. Gemini 3.1 Pro Preview: 72.4%
3. GPT 5.3-Codex: 67.7%
4. Claude Opus 4.6: 66.6%
5. GPT-5.2 Codex: 62.5%
6. Claude Opus 4.5: 61.9%
7. Gemini 3 Pro Preview: 60.4%
8. Claude Sonnet 4.6: 58.4%
9. Claude Sonnet 4.5: 54.2%
10. Gemini 3 Flash Preview: 42%
11. Gemini 2.5 Flash: 16.1%
Apart from the newest models, which were evaluated in mid-March, the rankings have not changed since the initial assessment in late February. The findings should be interpreted cautiously, as real-world performance may vary with specific workflows and project requirements.
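
Because the two leading models have identical scores, their positions are easier to read with standard competition ranking, where tied entries share a rank and the next rank is skipped. The following is only a minimal sketch of that arithmetic: the scores are copied from the list above, while the `competition_ranks` helper is a hypothetical name introduced for illustration and is not part of Android Bench.

```python
# Scores copied from the ranking list in this article (percent).
scores = {
    "GPT 5.4": 72.4,
    "Gemini 3.1 Pro Preview": 72.4,
    "GPT 5.3-Codex": 67.7,
    "Claude Opus 4.6": 66.6,
    "GPT-5.2 Codex": 62.5,
    "Claude Opus 4.5": 61.9,
    "Gemini 3 Pro Preview": 60.4,
    "Claude Sonnet 4.6": 58.4,
    "Claude Sonnet 4.5": 54.2,
    "Gemini 3 Flash Preview": 42.0,
    "Gemini 2.5 Flash": 16.1,
}


def competition_ranks(scores):
    """Hypothetical helper: map each model to its rank, with ties sharing a rank."""
    # sorted() is stable, so tied models keep their original relative order.
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    ranks = {}
    previous_score = None
    previous_rank = 0
    for position, (model, score) in enumerate(ordered, start=1):
        # A model with the same score as the one before it inherits that rank;
        # otherwise its rank is its position in the sorted order.
        rank = previous_rank if score == previous_score else position
        ranks[model] = rank
        previous_score, previous_rank = score, rank
    return ranks


ranks = competition_ranks(scores)
for model, rank in ranks.items():
    print(f"{rank:>2}. {model}: {scores[model]}%")
```

Under this scheme, GPT 5.4 and Gemini 3.1 Pro Preview both receive rank 1 and GPT 5.3-Codex receives rank 3, which matches its third position in the list.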