text recognition

Winsage
June 2, 2026
Microsoft unveiled a series of enhancements for developers at Build 2026, aiming to retain its existing developer base and attract new ones to Windows 11. Key offerings include:
- Windows Developer Configuration: A feature that creates a distraction-free environment for software development, now generally available.
- Windows Developer Skills: Introduction of WinApp CLI with AI agents for creating native Windows applications, also generally available.
- Terminal Improvements: An experimental preview of an Intelligent Terminal mode that features a dual-pane display.
- Enhanced Linux Capabilities: Windows Subsystem for Linux (WSL) will support containers in public preview and has native support for Coreutils, now generally available.
- Agentic Capabilities: Microsoft Execution Containers (MXC) SDK in early preview, allowing resource specification for agents, with integration for security protections.
- On-device AI: Introduction of Aion 1.0 Instruct and Aion 1.0 Plan for local AI tasks, with a preview available through Edge Insider channels and an open-source model expected in July.
- Surface RTX Dev Box: A desk-based datacenter focused on AI capabilities set to launch later this year.
AppWizard
May 17, 2026
Oppo's Multi-X team has introduced X-OmniClaw, an open-source AI agent for Android that operates on the device without cloud processing. It uses the camera, screen, and voice functionalities to perform tasks across applications. Unlike cloud-based platforms, X-OmniClaw processes information locally, with the cloud serving as a supplementary resource. The architecture integrates three perception channels into a unified pipeline, allowing it to interpret scenes and user requests effectively. It transforms local data into semantic entries for long-term memory, processes gallery photos into descriptions, and filters out sensitive information. X-OmniClaw captures user behavior into reusable skills, enabling direct navigation to app pages through deeplinks. Demonstrations show its ability to retrieve product prices, assist with homework, and create highlight albums from photos. The project is built on the open-source HermesApp codebase and is accessible on GitHub. It draws inspiration from existing models, including Google's local model and ByteDance's UI-TARS, while enhancing functionality through on-device execution and structural XML data integration.
Winsage
April 6, 2026
AI PCs with Neural Processing Units (NPUs) are becoming common, enabling applications that run AI on the device. Microsoft's Windows AI APIs facilitate easy integration of AI into applications, requiring only a Copilot+ PC with a capable NPU. Lance McCarthy, a Microsoft MVP, highlights tools available for developers, including Phi Silica for local language modeling, AI Text Recognition for OCR, and AI Imaging tools for image processing. McCarthy's Xkcd Viewer app exemplifies these tools, featuring AI-powered image descriptions for visually impaired users, which enhance the experience beyond simple text readouts. Modifying the app took ten minutes and significantly improved its accessibility.
AppWizard
November 16, 2025
TapScanner is a document management app that transforms smartphones into document scanners and PDF toolkits, with over 100 million downloads and a rating of 4.6 stars. It allows users to capture, organize, and share documents easily, featuring high-quality scanning with automatic edge detection and smart image correction. The app includes a comprehensive PDF workspace for merging, splitting, reordering, signing, and annotating PDFs, along with OCR text recognition for over 110 languages. It offers batch scanning, one-tap renaming, automated file organization, and secure cloud backup options with platforms like Google Drive, Dropbox, and OneDrive. Multi-page support is available, along with image enhancements such as brightness adjustments and shadow removal. Users can share and print documents via email or messaging apps, and the app has a user-friendly interface. TapScanner is free with a trial period that transitions into a subscription model.
Winsage
November 30, 2024
The Windows Snipping Tool is preinstalled on Windows 11, allowing users to access it easily with the keyboard shortcut Windows + Shift + S. It features a user-friendly interface for easy annotation of screenshots, including tools for drawing, shapes, emojis, and handwriting. The tool offers a visual search feature that enables users to search for information about images using Bing. It includes text recognition capabilities for copying text from images and a redact feature for protecting sensitive information. Additionally, the Snipping Tool can record screens with system and microphone audio, providing a built-in solution for creating tutorials or guides.