At its recent I/O 2025 event, Google offered an intriguing glimpse into the future of mobile technology with Project Astra. The research prototype demonstrated how Gemini, Google’s AI model, could potentially take control of Android devices, performing tasks such as retrieving web content, playing YouTube videos, managing emails, and even making phone calls on behalf of users.
Gemini’s Capabilities Unveiled
The nearly two-minute demonstration highlighted Gemini’s ability to navigate a PDF in Chrome for Android before seamlessly transitioning to the YouTube app to search for and select videos. The functionality is part of Google’s broader ambition to bring these capabilities to Gemini Live, letting people interact with their devices more naturally.
In October, Google introduced a Computer Use model for developers that allows Gemini to interact with user interfaces the way a person would: scrolling, clicking, and typing. The model is currently optimized for web browsers, but Google has expressed optimism about its potential for mobile UI control tasks. The company views this as a significant step toward powerful, general-purpose digital agents, since many digital tasks still require direct interaction with graphical interfaces.
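For developers, the model runs as an agent loop: the client sends a goal plus a screenshot, the model replies with a proposed UI action (a click, some text to type, a scroll), and the client performs it and sends the updated screen back. The sketch below illustrates that loop with the google-genai Python SDK; the tool configuration and preview model name follow Google’s developer announcement but may change, and execute_action is a hypothetical placeholder for real browser or device automation.

```python
# Minimal sketch of the screenshot -> action loop behind a computer-use
# agent, using the google-genai Python SDK. Tool configuration and model
# name follow Google's public preview but may change; execute_action()
# is a hypothetical stand-in for real automation (e.g. Playwright).
from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

# Enable the Computer Use tool; the current preview is browser-oriented.
config = types.GenerateContentConfig(
    tools=[types.Tool(computer_use=types.ComputerUse(
        environment=types.Environment.ENVIRONMENT_BROWSER))],
)

def execute_action(name: str, args: dict) -> bytes:
    """Hypothetical driver: perform the requested click/type/scroll in a
    real UI and return a fresh screenshot as PNG bytes."""
    raise NotImplementedError

contents = [types.Content(role="user", parts=[
    types.Part(text="Open the linked PDF and scroll to the summary."),
    # A real client would also attach an initial screenshot here.
])]

# The loop: the model proposes a UI action, the client performs it,
# then reports the resulting screen back until the task completes.
while True:
    response = client.models.generate_content(
        model="gemini-2.5-computer-use-preview-10-2025",  # preview name
        contents=contents,
        config=config,
    )
    part = response.candidates[0].content.parts[0]
    if part.function_call is None:
        print(part.text)  # model finished, or is asking the user something
        break
    screenshot = execute_action(part.function_call.name,
                                dict(part.function_call.args or {}))
    contents.append(response.candidates[0].content)
    contents.append(types.Content(role="user", parts=[
        types.Part.from_function_response(
            name=part.function_call.name,
            # A real client attaches the screenshot bytes in the response.
            response={"status": "ok"},
        ),
    ]))
```

Because the model only ever sees pixels and emits generic actions, the same loop can in principle drive any interface, which is what makes the extension from browsers to mobile plausible.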
Comparative Approaches to Voice Control
Meanwhile, Apple is advancing its own voice control capabilities, with plans for Siri to let users perform actions across multiple apps by voice. The vision Apple presented in 2024 suggests that tasks which previously required navigating through several applications could be accomplished in seconds through a series of voice prompts. Google, however, has yet to announce a comparable system for Android.
…Siri can take actions across apps, so after you ask Siri to enhance a photo for you by saying “Make this photo pop,” you can ask Siri to drop it in a specific note in the Notes app — without lifting a finger.
Google’s approach appears more generalized, not reliant on prior integrations, which may prove pragmatic, especially if Android developers are slow to update their applications. Nor is this Google’s first attempt at this kind of interaction: the new Google Assistant introduced in 2019 moved voice processing on-device and promised to make traditional tapping through your phone seem outdated.
This next-generation Assistant will let you instantly operate your phone with your voice, multitask across apps, and complete complex actions, all with nearly zero latency.
Despite its ambitious goals, the 2019 initiative never gained widespread traction and remained exclusive to Pixel devices, facing the same challenge as earlier voice assistants: the need for regimented voice commands.
Looking Ahead: The Role of Generative AI
With advances in large language models (LLMs), the hope is that users will be able to issue commands conversationally. LLMs could also address an earlier limitation by taking actions across apps or websites the system has never encountered before, something Apple’s approach, which depends on apps adopting its integrations, cannot do.
The integration of generative AI seems poised to resolve many of the criticisms leveled at Google’s earlier attempts. How the technology will be received overall remains uncertain, but scenarios where hands-free use is clearly beneficial, like the one in the Astra demo, seem the most likely route to mainstream adoption.
The implications for wearable technology, such as smart glasses and watches, are significant. Those devices cannot run phone-sized applications, so a future in which your phone stays in your pocket, screen off, while a secondary device controls it and relays information back opens up exciting possibilities.
Ultimately, the question is whether this kind of voice control, assuming it achieves perfect accuracy, could become the primary way people interact with their smartphones, eventually surpassing touch-based input.