multimodal capabilities Archives

AppWizard

May 20, 2025

AI Mode levels up with new AI features, stealing Google Search’s spotlight

Google's advancements in AI Mode and Search include the rollout of a custom version of the Gemini 2.5 model, enabling users to pose complex queries. Project Astra introduces multimodal capabilities, allowing real-time conversations with Gemini using device cameras. The AI can process website elements and perform tasks like adding items to shopping carts or creating travel itineraries. Users can ask AI Mode to find specific tickets or activities tailored to their interests, with curated suggestions provided. AI Mode will enhance online shopping by integrating with the Shopping Graph, offering visually engaging product listings and a virtual "try-on" feature. Users can upload photos to experiment with outfits and track prices for products. The virtual try-on experience is being rolled out in Search Labs for U.S. users, with full availability of AI Mode in the U.S. starting today.

AppWizard

May 2, 2025

How to use Gemini Live’s camera and screen sharing features

Gemini Live has transitioned from a voice-based AI assistant to a multimodal platform that can process camera feeds and screen-sharing inputs, enhancing user interactions with visual context. It requires an Android device with at least 2 GB of RAM and Android 10 or later, along with a Google One AI Premium subscription for access to camera and screen-sharing features. These features are complimentary for Google Pixel 9 and Samsung Galaxy S25 users, and newer Pixel devices may offer a trial for Gemini Advanced. To share a live video feed, users must launch Gemini, tap the Live icon, select the Camera button, and ensure the desired items are visible. For screen sharing, users open the relevant app or screen, activate Gemini, and select Share screen with Live. Gemini can summarize content and answer questions based on the shared screen. The multimodal capabilities are particularly beneficial for scenarios requiring detailed descriptions, positioning Gemini Live competitively alongside other AI platforms.

AppWizard

April 14, 2025

Google tests a new home for AI Mode in its Search bar for Android app users

Google is experimenting with a redesign of its AI Mode search launcher, integrating it into the main Search bar of its Android app. The change is intended to increase user engagement with AI features. In the updated interface, users reach AI Mode by tapping the Search bar, and the Lens and voice icons are removed. AI Mode was first introduced in early March for a select group of early access users. The new AI Mode button will be distinct from the traditional Search function and is meant to help users understand complex subjects through advanced reasoning and analytical capabilities. It delivers succinct answers with relevant links and detailed explanations when needed. Google plans to integrate its Gemini technology into AI Mode to introduce multimodal capabilities for more comprehensive responses to complex queries.

AppWizard

February 4, 2025

Perplexity launches an assistant for Android

AI-powered search engine Perplexity has launched Perplexity Assistant, a new feature for Android devices that combines reasoning, search, and app integrations. The assistant can perform multi-app actions, such as hailing rides and searching for songs, and can set reminders by creating calendar entries. It uses the phone's camera for contextual questions and maintains context across tasks, such as researching restaurants and then making a reservation. The assistant is free at launch and supports 15 languages. CEO Aravind Srinivas acknowledged that some features may not perform as expected and said improvements are planned. Perplexity has also introduced Sonar, an API service for enterprises, and acquired Read.cv. Founded in 2022, Perplexity has raised over $500 million in funding and processes over 100 million queries weekly. The company faces legal challenges from publishers, including a lawsuit from News Corp and a cease-and-desist notice from The New York Times, but emphasizes its commitment to respecting publisher content through a revenue-sharing program.

AppWizard

February 3, 2025

Perplexity’s AI assistant goes mobile on Android

Perplexity AI has launched a new Android app called the Perplexity Assistant, available on the Google Play Store. The app is designed to assist with various tasks through voice, text, and camera interactions, and it can converse in 15 languages. It utilizes Perplexity’s proprietary search engine to provide real-time web information and maintain context across multiple tasks. Users can perform activities such as booking rides, identifying objects, and making restaurant reservations through voice commands. The app is free and aims to integrate Perplexity’s AI into users' daily workflows. Perplexity has also introduced an API called Sonar for businesses and acquired the professional social media platform Read.cv.

AppWizard

December 5, 2024

Google App ‘AI Mode’ For Android Lets Users ‘Talk’ Directly to Search, Multimodal Capabilities Available

Google is developing an 'AI Mode' for its Search app on Android, allowing users to interact in a conversational manner. This feature, discovered in an APK teardown, will enable voice inputs and the submission of photos and videos, creating a more intuitive search experience. The AI Mode will be accessible through a dedicated tab represented by a 'wink' icon in the app. This development follows Google's earlier integration of generative AI into Search, including the Search Generative Experience and AI Overviews, which have been refined based on user feedback.

Winsage

November 20, 2024

Microsoft readies Copilot Studio for agentic AI

At its Ignite 2024 conference, Microsoft introduced new services and products to expand its AI agent portfolio, including significant upgrades to Copilot Studio with improved knowledge sources and tuning capabilities. The autonomous agents in Copilot Studio, currently in public preview, now feature multimodal capabilities for voice and image analysis. Updated security measures have been implemented, including encryption and data loss prevention, to ensure data protection. Microsoft plans to roll out autonomous capabilities in Copilot Studio by November. A Capgemini survey indicates that over 80% of executives intend to integrate AI agents within the next three years, with Toyota Motor Corporation already using generative AI agents. Gartner's Avivah Litan warned that by 2028, one in four enterprise breaches may be linked to AI agent misuse. KPMG is exploring AI agents but prioritizes establishing security measures before production. The deployment of agentic AI will require increased computing capacity, prompting Microsoft to develop customized chips and an Azure Boost DPU for enhanced security and workload optimization. Additionally, the Azure Integrated Hardware Security Module has been created to improve data center security.