Google launched the Android Bench benchmarking portal in March to help software developers evaluate AI models for Android app development. The leaderboard was updated last week to add open-weight models and new metrics for latency, token usage, and cost. Matthew McCullough, Google's VP of Product for Android Development, said the goal is to give developers a benchmark for evaluating large language models (LLMs) on Android development work.

As of May 18, GPT 5.5 holds the top spot on the leaderboard, with Gemini 3.1 Pro and GPT 5.4 also listed as joint leaders. Android Bench evaluates LLMs on real-world challenges and tasks sourced from public GitHub repositories. The overall benchmark score is calculated using four core values: Confidence Interval Range, Average Latency Score, Average Total Tokens Score, and Average Cost. The test harness for Android Bench is publicly available on GitHub.

Other benchmarking tools in the Android ecosystem include Jetpack Microbenchmark, Jetpack Macrobenchmark, Firebase Performance Monitoring, Android Vitals, Apptim, and Android Performance Analyzer.
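
To make the four leaderboard values more concrete, the sketch below stores them per model and shows one way a confidence interval range could lead to several models being treated as joint leaders. This is an illustration only. The field names, the overlap rule, and all numbers are assumptions, since the text above does not give Android Bench's actual formula, and the model names are placeholders rather than real leaderboard entries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LeaderboardEntry:
    """One hypothetical leaderboard row; field names are assumptions."""
    model: str
    score: float             # headline score (assumed scale: 0-100)
    ci_low: float            # lower bound of the confidence interval
    ci_high: float           # upper bound of the confidence interval
    avg_latency_s: float     # average latency per task, in seconds
    avg_total_tokens: float  # average total tokens per task
    avg_cost_usd: float      # average cost per task, in US dollars


def intervals_overlap(a: LeaderboardEntry, b: LeaderboardEntry) -> bool:
    """Return True when the two confidence intervals share any value."""
    return a.ci_low <= b.ci_high and b.ci_low <= a.ci_high


def joint_leaders(entries: list[LeaderboardEntry]) -> list[str]:
    """Models whose interval overlaps that of the highest-scoring model.

    Assumed tie rule for illustration, not the benchmark's documented method.
    """
    top = max(entries, key=lambda e: e.score)
    return [e.model for e in entries if intervals_overlap(top, e)]


# Made-up data with placeholder model names.
entries = [
    LeaderboardEntry("model-a", 70.0, 65.0, 75.0, 40.0, 120_000, 0.90),
    LeaderboardEntry("model-b", 68.0, 63.0, 73.0, 25.0, 90_000, 0.40),
    LeaderboardEntry("model-c", 55.0, 50.0, 60.0, 15.0, 60_000, 0.10),
]

print(joint_leaders(entries))
```

Under this assumed rule, a model with a slightly lower score can still count as a joint leader when its interval overlaps the top model's, which is why latency, tokens, and cost are useful as secondary comparisons.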