multimodal Archives

AppWizard

July 22, 2026

3 Google updates from Galaxy Unpacked 2026

Gemini allows users to monitor task progress on their screens while multitasking across applications or letting it operate in the background. Users can intervene for adjustments before Gemini finalizes tasks and notifies them for review. Gemini Notebook, available on Samsung's latest foldable devices, enhances project management by allowing users to drag and drop various elements into a side-by-side workspace. It features a chat function for discussions about sources and enables users to create custom resources like slide decks, videos, flash cards, quizzes, and podcasts.

BetaBeacon

June 17, 2026

Get it reserved: Xreal’s Aura smart glasses refresh the game with powerful Android XR and AI

Xreal announced the launch of its Aura smart glasses with Google's Android XR and Qualcomm's Snapdragon Reality Elite, featuring a 70-degree FOV, hand-tracking, and strong multimodal AI.

AppWizard

May 20, 2026

Google thinks Gemini 3.5 Flash can finally make AI agents more useful

Google has rolled out its AI model, Gemini 3.5 Flash, across various platforms, claiming it outperforms its predecessor, Gemini 3.1 Pro, in key benchmarks. According to Google, Gemini 3.5 Flash generates responses four times faster than competing AI systems and is designed for complex workflows and coding tasks. Google plans to introduce Gemini 3.5 Pro next month, which it says excels in decision-making and coding tests. The model is particularly effective for "long-horizon" tasks, aiding app development and document preparation. Google Antigravity, an agentic development platform, integrates with Gemini 3.5 Flash to manage large workloads. The company also introduced Gemini Spark, a personal AI agent for managing digital tasks, with a beta rollout for select testers. Gemini 3.5 was developed under the Frontier Safety Framework, incorporating enhanced safety measures and interpretability tools.

AppWizard

May 19, 2026

Google is giving Search its biggest overhaul in 25 years

Google is implementing a major AI-driven transformation of its Search platform, marking the end of traditional keyword searches. The new AI Mode, powered by Gemini 3.5 Flash, introduces a conversational interface and supports multimodal inputs, including text, images, videos, and files. Continuous background Search agents will provide real-time updates and facilitate bookings for local services. The agents will initially be available only to Google AI Pro and Ultra subscribers, while booking features will be accessible to all U.S. users this summer. Google is also enhancing the search experience with Antigravity technology and generative UI elements, which will be free for all users this summer. Additionally, Personal Intelligence will be available in 98 languages across nearly 200 countries without a subscription, allowing users to link applications like Gmail and Google Photos for personalized assistance while maintaining control over their data.

AppWizard

May 12, 2026

Google brings agentic AI and vibe-coded widgets to Android

Google introduced new AI features under the Gemini Intelligence brand at its Android Show: I/O Edition event. These features allow users to perform tasks across applications, navigate the web, fill out forms, dictate text, and create personalized Android widgets using natural language. Gemini's capabilities now include managing multi-step processes, such as copying a grocery list and adding items to a shopping cart, with user confirmation required before checkout. A web browsing feature that allows Gemini to book appointments is being rolled out to Android devices, and by late June it will be integrated into Chrome on Android. Gemini can also fill out forms using insights from Personal Intelligence if users opt in. Additionally, Gemini will be integrated into Android's Gboard keyboard, featuring a tool called Rambler that transcribes speech while removing filler words. Users can create Android widgets through natural language descriptions, and the widgets Gemini generates will follow Google's Material 3 design language. The rollout of these features is expected to start this summer on Samsung Galaxy and Google Pixel devices, with wider availability later in the year.

AppWizard

March 19, 2026

ChatGPT Android App Hints At Sora Video Integration

OpenAI's generative video model, Sora, is likely to be integrated into the ChatGPT Android app, as indicated by discoveries in the beta version 1.2026.076. Testers found in-app text suggesting end-to-end video generation capabilities, allowing users to convert text prompts and images into short videos with dialogue, voiceovers, soundtracks, and customizable styles that are easy to share on social media. The language used in the app is polished and consumer-ready, indicating a transition towards user-facing integration. Previous reports have indicated OpenAI's intention to incorporate Sora's video capabilities into ChatGPT, consolidating multimodal creation within a single platform. OpenAI's demonstrations have shown Sora's ability to create intricate 1080p videos, potentially turning ChatGPT into a mobile video studio. The integration would likely handle intensive tasks in the cloud, with possible limitations on file size and resolution for free users. It would also give Sora access to a large user base, accelerating the mainstream adoption of AI video creation. The competitive landscape includes rivals like Runway and Google, which are also developing video capabilities. The introduction of mobile video generation raises challenges such as misinformation and copyright issues, prompting OpenAI to emphasize safety measures and content provenance strategies. While the beta strings do not confirm a launch date, features typically undergo final refinements late in development. Indicators to watch for include a new “Video” option in input modes and prompts for camera roll access. If Sora is launched in ChatGPT for Android, it will mark a significant shift for the app, making video creation an integral part of the user experience.

AppWizard

March 19, 2026

Google Labs’ Stitch is a design canvas that turns your voice into an app

Google has launched an upgraded version of Stitch, a tool from Google Labs aimed at improving user interface (UI) design through a concept called “vibe design,” which allows users to create designs using simple text prompts. Stitch utilizes Google’s Gemini models to interpret both text and visual inputs, enabling real-time design adjustments. It can produce editable design files and front-end code, integrating into existing engineering workflows. Currently in the experimental phase, Stitch aims to democratize design, allowing individuals without extensive expertise to contribute to UI development. Concerns have been raised about the potential for uniformity in design due to its streamlined approach.

AppWizard

March 18, 2026

ChatGPT’s free tier gets GPT 5.4 mini model with improved coding capabilities

OpenAI has introduced the GPT 5.4 mini and nano models, making advanced AI capabilities accessible to free users of the ChatGPT platform. The GPT 5.4 mini operates more than twice as fast as its predecessor and closely matches the performance of the larger GPT 5.4 model in key evaluations. These models are designed for environments where latency is critical, excelling in coding, reasoning, multimodal understanding, and tool utilization. The GPT 5.4 mini is available in ChatGPT’s free and Go tiers, as well as in OpenAI’s API and Codex, while the nano variant is accessible exclusively through the API. Both cost less than the full GPT 5.4 model.

AppWizard

February 26, 2026

Google details MCP-like ‘AppFunctions’ that let Gemini use Android apps

Google has introduced early-stage developer capabilities for Android aimed at connecting applications with intelligent agents and personalized assistants, specifically Google Gemini, while prioritizing privacy and security. A key feature of this initiative is AppFunctions, introduced with Android 16, which allows applications to expose specific capabilities for access by agent apps, enabling seamless task execution on devices. Developers can define app functionalities for AI assistants, facilitating various use cases such as task management, media creation, cross-app workflows, and calendar scheduling. A practical example includes the Samsung Gallery app, where users can request specific photos through Gemini, which triggers the appropriate function to retrieve them. Additionally, Google is advancing a UI automation framework for AI agents, allowing for the execution of generic tasks across applications with minimal coding. Future expansions of these capabilities are planned for Android 17, with ongoing collaboration with select app developers to enhance user experiences.

AppWizard

February 24, 2026

Circle to Search has 3 secret powers most people don’t know about

Circle to Search has reached its second anniversary. Introduced to Android as a practical application of artificial intelligence, it has since gained additional functionality. Users can access the generative AI model Nano Banana directly through Circle to Search for image creation and editing, streamlining the remixing process. The tool also features a full-screen translation capability that instantly translates text displayed on screen across various apps and websites, supporting multiple languages and continuing to translate as the user scrolls. Additionally, Circle to Search can scan QR codes and barcodes displayed on screens, functioning similarly to the Camera app. Its capabilities now span text selection, image searching, generative AI, code scanning, song recognition, and on-screen translation. The Google Pixel 10, with its AI-powered tools, is highlighted as an ideal companion for Circle to Search.