Android Bench

AppWizard
April 9, 2026
The "Android Bench," Google's benchmark for evaluating AI models in Android app development, has been updated. OpenAI's GPT 5.4 now shares the top ranking with Gemini 3.1 Pro Preview, and GPT 5.3 Codex places third. The benchmark evaluates models on criteria such as compatibility with Jetpack Compose, use of Coroutines and Flows, and integration with Room and Hilt. The latest rankings are as follows:

1. GPT 5.4: 72.4%
2. Gemini 3.1 Pro Preview: 72.4%
3. GPT 5.3-Codex: 67.7%
4. Claude Opus 4.6: 66.6%
5. GPT-5.2 Codex: 62.5%
6. Claude Opus 4.5: 61.9%
7. Gemini 3 Pro Preview: 60.4%
8. Claude Sonnet 4.6: 58.4%
9. Claude Sonnet 4.5: 54.2%
10. Gemini 3 Flash Preview: 42%
11. Gemini 2.5 Flash: 16.1%

The relative order of the models from the initial late-February assessment has not changed; the latest models were evaluated in mid-March. The findings should be interpreted cautiously, as real-world performance may vary based on specific workflows and project requirements.
AppWizard
March 6, 2026
Google has introduced Android Bench, a tool for assessing AI model performance in Android app development. The top performer is Gemini 3.1 Pro, scoring 72.2%, followed by Claude Opus 4.6 at 66.6% and GPT 5.2 Codex at 62.5%. The benchmark tests models on real-world Android coding challenges, and task completion rates across the evaluated models range from 16% to 72%. Google aims to facilitate the creation of Android applications from user prompts and has made the benchmark's methodology and tools available on GitHub.