Google is testing AI models for Android app development through a new platform called “Android Bench,” which evaluates leading large language models (LLMs) against benchmarks specific to Android development. The benchmarks assess capabilities in areas such as Jetpack Compose, asynchronous programming, data persistence, dependency injection, navigation migrations, Gradle and build configuration, and interaction with Android components. Google identified Gemini 3.1 Pro Preview as the top-performing model with a score of 72.4%, followed by Claude Opus 4.6 at 66.6% and OpenAI’s GPT 5.2 Codex at 62.5%. Gemini 2.5 Flash scored lowest, at 16.1%.