multimodal

BetaBeacon
May 30, 2025
Niantic Spatial, a company founded by the creators of Pokémon GO, is shifting its focus toward artificial intelligence. It is building a large geospatial model to merge the digital and physical worlds, with the goal of improving how machines perceive and understand the world.
AppWizard
May 28, 2025
The One UI 8 beta is now available for Galaxy S25 models, excluding the Galaxy S25 Edge, in select regions including Germany, Korea, the U.K., and the U.S. A stable version is expected to launch with new foldable devices this summer. The beta brings enhanced AI capabilities, a user experience tailored to different device types, and proactive suggestions. Key features include multimodal capabilities, enhanced Now Bar and Now Brief features, local data processing options, and improvements to the Auracast feature. The Reminder app will consolidate tasks into a single interface, and Quick Share will receive enhancements. Additional changes include improved file search, a redesign of Samsung Internet, multitasking and accessibility improvements, new Calendar features, and social health management options through Samsung Health. More features may be revealed as the beta progresses.
AppWizard
May 22, 2025
OPPO has expanded its collaboration with Google Gemini to improve the user experience on the Reno 14 series by creating seamless "multi-app" journeys. The partnership will allow Gemini to interact with OPPO applications, including Calendar, Notes, and the Clock app, so that tasks spanning several apps take fewer steps. The integration of Gemini into OPPO's app ecosystem coincides with the introduction of ColorOS 15 (Android 15) and follows the AI enhancements seen in the Reno 11 series. OPPO aims to provide monthly AI updates for its devices and emphasizes the importance of AI in understanding complex user inputs. The collaboration also involves work with MediaTek and uses Gemini 1.5 Pro and Google's Flash model to improve communication between OPPO's native applications.
AppWizard
May 21, 2025
A tipster discovered hints of a Material 3 Expressive redesign in Google Photos' code, featuring a new card for creating memories with a collage-style layout. Following I/O 2025, Google released Android 16 QPR1 Beta 1 for Pixels, showcasing the new design language. The redesign includes a revamped user interface, an updated notification shade, an expanded quick toggle section, and a significant redesign of Gmail. Additionally, advancements in Gemini AI and collaboration between Google Search and Project Astra were highlighted.
AppWizard
May 21, 2025
Google has announced enhancements to its Gemini 2.5 models, including 2.5 Pro and the new 2.5 Flash, which improves speed and efficiency. 2.5 Pro will include native audio output, with controls that let developers customize speech. Enhanced security measures will protect against malicious instructions and prompt injection attacks. Project Mariner's functionality will be integrated into Gemini and Vertex AI. Google is also introducing thought summaries to help developers debug, along with cost control through a "thinking budget." A generally available model will be released, and support for Model Context Protocol (MCP) will make it easier to integrate open-source tools into Gemini projects.
AppWizard
May 20, 2025
Google's advancements in AI Mode and Search include the rollout of a custom version of the Gemini 2.5 model, enabling users to pose complex queries. Project Astra introduces multimodal capabilities, allowing real-time conversations with Gemini using device cameras. The AI can process website elements and perform tasks like adding items to shopping carts or creating travel itineraries. Users can ask AI Mode to find specific tickets or activities tailored to their interests, with curated suggestions provided. AI Mode will enhance online shopping by integrating with the Shopping Graph, offering visually engaging product listings and a virtual "try-on" feature. Users can upload photos to experiment with outfits and track prices for products. The virtual try-on experience is being rolled out in Search Labs for U.S. users, with full availability of AI Mode in the U.S. starting today.
AppWizard
May 16, 2025
Google is expanding access to its Gemini Nano AI model by introducing new ML Kit GenAI APIs, expected to be unveiled at the I/O 2025 event. These APIs will allow developers to integrate text summarization, proofreading, rewriting, and image description generation into their applications. Gemini Nano runs on-device, which enhances privacy because data is processed locally. The ML Kit GenAI APIs will support various languages and can generate concise summaries, correct grammar, rewrite chat messages, and describe images. Unlike the experimental AI Edge SDK, the GenAI APIs will be in beta and will support a broader range of Android devices beyond the Pixel 9 series. Public documentation for the ML Kit GenAI APIs is now available for developers.
AppWizard
May 2, 2025
Gemini Live has grown from a voice-based AI assistant into a multimodal platform that can process camera feeds and screen-sharing inputs, adding visual context to user interactions. It requires an Android device with at least 2 GB of RAM and Android 10 or later, along with a Google One AI Premium subscription for access to the camera and screen-sharing features. These features are free for Google Pixel 9 and Samsung Galaxy S25 users, and newer Pixel devices may offer a trial of Gemini Advanced. To share a live video feed, users launch Gemini, tap the Live icon, select the Camera button, and make sure the desired items are in view. To share a screen, users open the relevant app or screen, activate Gemini, and select Share screen with Live. Gemini can then summarize content and answer questions based on the shared screen. The multimodal capabilities are particularly useful in scenarios that call for detailed descriptions, and they keep Gemini Live competitive with other AI platforms.