local inference

AppWizard
December 30, 2025
A recent Google Cloud survey found that 90% of game developers are exploring generative AI. Millennium Whisper, a dating sim by Parable Studios, is the first game on Steam to run a large language model entirely on-device rather than relying on cloud-based inference. Running the model locally allows for unique in-game conversations. The game also addresses ethical data usage: character behavior is built from actor-led role-play sessions, which ensures high-quality data, and the actors retain ownership of their data and receive royalties. Local inference improves energy efficiency and sustainability, and it avoids the ongoing costs of traditional server-based models. Millennium Whisper received grant funding from Innovate UK and is currently in Early Access. Ambrose Robinson, the founder of Parable Studios, emphasizes the energy savings and sustainability of the studio's AI approach.
Winsage
November 18, 2025
Microsoft Windows is evolving to incorporate AI agents that act autonomously, resembling digital coworkers. The shift is built on the Model Context Protocol (MCP), which standardizes how agents interact with tools and data sources and provides secure access to system resources. Windows introduces an on-device registry of "agent connectors" for functions such as file access and system settings, managed through an OS-level proxy that handles identity, permissions, consent, and audit logging. The initial connectors cover File Explorer and System Settings and define clear capabilities and restrictions for agents. A transparent consent model lets users manage permissions easily. A new Agent Workspace gives agents a dedicated environment in which they operate in isolation and with least-privilege access. Security measures include signed connectors and a standardized proxy for authentication and auditing, which gives visibility into agent actions. Windows is also expanding on-device AI processing with new APIs, allowing agents to use local models securely. Windows is not becoming an agent-first operating system, but it is establishing a framework for human and agent interactions and positioning itself as a safe environment for AI operations. The foundations of this evolution are standard interfaces, clear permissions, isolated environments, and system-level observability.
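The registry-and-proxy arrangement described above can be illustrated with a minimal Python sketch. It is not the Windows API: every name in it (`Connector`, `AgentProxy`, `register`, `grant`, `call`) is hypothetical, and it assumes only what the summary states, namely that connectors must be signed, that access requires user consent, and that every request is recorded for auditing.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable


@dataclass
class Connector:
    """Hypothetical agent connector exposing one named capability."""
    name: str
    signed: bool
    handler: Callable[[str], str]


@dataclass
class AgentProxy:
    """Hypothetical proxy: admits signed connectors, checks consent, logs calls."""
    registry: dict[str, Connector] = field(default_factory=dict)
    consent: set[tuple[str, str]] = field(default_factory=set)  # (agent, connector)
    audit_log: list[dict] = field(default_factory=list)

    def register(self, connector: Connector) -> None:
        # Only signed connectors are admitted to the registry.
        if not connector.signed:
            raise ValueError(f"unsigned connector rejected: {connector.name}")
        self.registry[connector.name] = connector

    def grant(self, agent_id: str, connector_name: str) -> None:
        # Record that the user consented to this agent using this connector.
        self.consent.add((agent_id, connector_name))

    def call(self, agent_id: str, connector_name: str, request: str) -> str:
        allowed = (
            connector_name in self.registry
            and (agent_id, connector_name) in self.consent
        )
        # Every attempt is logged, whether or not it is allowed.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "agent": agent_id,
            "connector": connector_name,
            "allowed": allowed,
        })
        if not allowed:
            raise PermissionError(f"{agent_id} may not use {connector_name}")
        return self.registry[connector_name].handler(request)
```

In this sketch an agent that has not been granted consent gets a `PermissionError`, and the refused attempt still appears in `audit_log`, which is the kind of visibility into agent actions the summary describes.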
Winsage
October 10, 2025
Microsoft is promoting neural processing units (NPUs) as a way to enhance Windows intelligence, although NPUs are not yet part of the official hardware requirements. NPUs are designed to accelerate local inference at lower power consumption and are found in devices ranging from smartphones to Copilot+ PCs. Microsoft claims that NPUs make sophisticated AI experiences more affordable, stating that tasks that once required expensive computing can now be performed on less costly devices. For now, however, local AI processing is of limited use to users: few OS features require it, and its impact on productivity is minimal. Microsoft has integrated AI capabilities into applications such as Notepad and Photos, but these changes are not driven by user demand. Despite the lukewarm reception, Microsoft continues to advocate for NPUs, highlighting their ability to run multiple AI applications concurrently. There are concerns about future hardware requirements, as the company may eventually include NPUs in its specifications, echoing past experiences with Windows 10. As of early September, AI-enabled notebooks with NPUs made up 40.5 percent of the European distribution channel, a figure expected to rise.
Winsage
August 6, 2025
OpenAI has released gpt-oss-20b, a new, free, open-weight GPT model that can run on personal computers. Microsoft is facilitating its integration for Windows users through the Windows AI Foundry, with plans to extend support to macOS. The model requires a PC or laptop with at least 16 GB of VRAM and is optimized for code execution and tool use. Microsoft has pre-optimized the model for local inference, which suggests that support for more devices may follow. This is the first time an OpenAI model has run locally on Windows, and the release coincides with Amazon's adoption of the new open-weight GPT-OSS models for its cloud services.