Gemini on Android becomes more capable and works with Gmail, Messages, YouTube and more | TechCrunch

Enhancing Android Experience with Gemini’s AI Capabilities

Google’s Gemini on Android is gaining new ways to work with the company’s suite of apps. The AI, which is poised to replace Google Assistant, uses its deep integration with the Android operating system to offer new, more intuitive ways to interact with applications.

One of the most anticipated features is the ability for users to drag and drop AI-generated images directly into their communications, such as Gmail and Google Messages. This seamless integration promises to streamline workflows and add a touch of personalization to digital correspondence. Additionally, YouTube aficionados will soon enjoy the “Ask this video” feature, enabling them to extract specific information from videos without the need to scrub through the timeline manually.

For those who opt for the premium Gemini Advanced service, an “Ask this PDF” function will be available. This feature lets users query the content of a PDF document instead of sifting through pages to find answers. Gemini Advanced costs $19.99 per month; subscribers get these advanced AI capabilities along with 2TB of storage and other Google One benefits.

Gemini can already generate photo captions, answer questions about articles, and carry out other generative AI tasks. However, it faces competition from OpenAI’s newly announced GPT-4o, a generative AI model that can work with text, speech, and video inputs. Even so, Gemini’s tight integration with Android devices gives it an edge on mobile.

Google has announced that the latest Gemini features on Android will roll out to hundreds of millions of compatible devices over the coming months. Over time, Gemini is expected to evolve to offer contextually relevant suggestions based on what is currently on the user’s screen.

Moreover, the on-device foundation model, Gemini Nano, is set to receive an upgrade that will introduce multimodality. This enhancement will enable the AI to process not only text but also interpret a variety of inputs such as visual cues, auditory signals, and spoken words, paving the way for a more holistic and immersive user experience.

