Microsoft and NVIDIA Supercharge AI Development on RTX AI PCs

Generative AI-powered laptops and PCs are driving advances in gaming, content creation, productivity, and software development. More than 600 Windows applications and games already run AI locally on over 100 million GeForce RTX AI PCs worldwide, delivering fast, reliable, low-latency performance.

At the recent Microsoft Ignite conference, NVIDIA and Microsoft unveiled new tools to help Windows developers quickly build and optimize AI-powered applications on RTX AI PCs. The goal is to make local AI more accessible, letting developers tap the performance of RTX GPUs to accelerate complex AI workflows for applications such as AI agents, app assistants, and digital humans.

RTX AI PCs Power Digital Humans With Multimodal Small Language Models

Meet James, an interactive digital human knowledgeable about NVIDIA and its products. James uses a collection of NVIDIA NIM microservices, NVIDIA ACE, and ElevenLabs digital human technologies to provide natural and immersive responses.

NVIDIA ACE, a suite of digital human technologies, brings agents, assistants, and avatars to life. To engage with users more effectively, these digital humans need humanlike visual perception. Understanding what they see lets them respond in context, which makes interactions richer.

To make digital human interactions more realistic, NVIDIA has developed multimodal small language models that process both text and images. These models excel at role-playing and are optimized for fast response times. The upcoming NVIDIA Nemovision-4B-Instruct model, built with the latest NVIDIA VILA and NVIDIA NeMo frameworks, is small enough to run on RTX GPUs while maintaining the accuracy developers need.

The model lets digital humans interpret imagery from both the real world and on-screen content, so they can respond relevantly. Multimodality also lays the groundwork for agentic workflows, pointing to a future where digital humans can reason and act with minimal user intervention.

NVIDIA will also introduce the Mistral NeMo Minitron 128k Instruct family, a set of large-context small language models designed for efficient digital human interactions. The family will come in 8B-, 4B-, and 2B-parameter versions, letting developers balance speed, memory usage, and accuracy on RTX AI PCs. Thanks to their large context window, these models can take in large amounts of data in a single pass, eliminating the need to split data into chunks and reassemble the results. Packaged in the GGUF format, they run efficiently on low-power devices and are compatible with multiple programming languages.
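GGUF is the single-file model format used by llama.cpp and related runtimes. As a rough illustration of what shipping a model "in the GGUF format" involves, here is a minimal sketch that reads only the fixed-size header at the start of a GGUF file. It assumes the little-endian version 2/3 layout (the 4-byte magic `GGUF`, a uint32 version, then uint64 tensor and metadata key/value counts) and does not parse the metadata or tensor-info sections that follow. The names `read_gguf_header` and `GGUFHeader`, and the file name `model.gguf`, are hypothetical and not part of any NVIDIA or llama.cpp API.

```python
import struct
from typing import BinaryIO, NamedTuple

GGUF_MAGIC = b"GGUF"


class GGUFHeader(NamedTuple):
    version: int
    tensor_count: int
    metadata_kv_count: int


def read_gguf_header(f: BinaryIO) -> GGUFHeader:
    """Read the fixed-size GGUF header (assumes the little-endian v2/v3 layout)."""
    magic = f.read(4)
    if magic != GGUF_MAGIC:
        raise ValueError(f"not a GGUF file (magic={magic!r})")
    # uint32 format version, little-endian.
    (version,) = struct.unpack("<I", f.read(4))
    if version < 2:
        # GGUF v1 stored these counts as 32-bit values; this sketch does not handle it.
        raise ValueError(f"GGUF v{version} is not handled by this sketch")
    # uint64 tensor count followed by uint64 metadata key/value count.
    tensor_count, kv_count = struct.unpack("<QQ", f.read(16))
    return GGUFHeader(version, tensor_count, kv_count)


if __name__ == "__main__":
    import io

    # Synthetic header bytes for illustration: version 3, 291 tensors, 24 metadata entries.
    blob = GGUF_MAGIC + struct.pack("<IQQ", 3, 291, 24)
    print(read_gguf_header(io.BytesIO(blob)))
    # With a real (hypothetical) file on disk:
    # with open("model.gguf", "rb") as f:
    #     print(read_gguf_header(f))
```

Because the header is tiny and sits at a fixed offset, a loader can check the format and size of a model before committing memory to it.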

Turbocharge Gen AI With NVIDIA TensorRT Model Optimizer for Windows

As developers bring models to PCs, they often run into limited memory and compute for running AI locally. They want to make these models available to as many users as possible while keeping accuracy loss to a minimum.
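To make the memory constraint concrete, a back-of-the-envelope estimate helps: weight storage is roughly the parameter count times bits per weight, divided by eight. The sketch below applies that formula to the 8B, 4B, and 2B sizes mentioned earlier. The function name `weight_memory_gb` is hypothetical, and the estimate deliberately ignores activations, the KV cache, and runtime buffers.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate storage for model weights alone, in GB (1 GB = 10**9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    for name, params in [("8B", 8e9), ("4B", 4e9), ("2B", 2e9)]:
        fp16 = weight_memory_gb(params, 16)  # 16-bit floating-point weights
        int4 = weight_memory_gb(params, 4)   # idealized 4-bit weights, no scale overhead
        print(f"{name}: FP16 ~{fp16:.1f} GB, 4-bit ~{int4:.1f} GB (weights only)")
```

Real 4-bit formats also store per-group scales, so the effective bits per weight is somewhat above 4, and total memory includes more than the weights. End-to-end savings are therefore typically smaller than the raw 16-to-4-bit ratio suggests.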

To address these challenges, NVIDIA announced updates to the NVIDIA TensorRT Model Optimizer (ModelOpt) that give Windows developers an improved way to optimize models for ONNX Runtime deployment. With the latest updates, models can be optimized into an ONNX checkpoint and deployed in ONNX Runtime using GPU execution providers such as CUDA, TensorRT, and DirectML.
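On the deployment side, ONNX Runtime selects a backend from an ordered list of execution providers passed to `InferenceSession`. Here is a minimal sketch of picking providers for an exported ONNX checkpoint. The provider strings are ONNX Runtime's names for the TensorRT, CUDA, and DirectML backends, but the preference order and the helper `pick_providers` are assumptions for illustration, not a ModelOpt requirement. The path `model.onnx` is hypothetical.

```python
from typing import List, Sequence

# ONNX Runtime execution-provider names, in the order this sketch prefers them.
# The ordering is an illustrative assumption, not a rule from ModelOpt.
PREFERRED_PROVIDERS = (
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "DmlExecutionProvider",  # DirectML
    "CPUExecutionProvider",  # always-available fallback
)


def pick_providers(available: Sequence[str]) -> List[str]:
    """Return the preferred providers that are available, always ending with CPU."""
    chosen = [p for p in PREFERRED_PROVIDERS if p in available]
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen


if __name__ == "__main__":
    # With onnxruntime installed, the list would come from the runtime itself:
    # import onnxruntime as ort
    # session = ort.InferenceSession(
    #     "model.onnx", providers=pick_providers(ort.get_available_providers())
    # )
    print(pick_providers(["DmlExecutionProvider", "CPUExecutionProvider"]))
```

In practice the CUDA and TensorRT providers ship in the `onnxruntime-gpu` package and DirectML in `onnxruntime-directml`, so a given installation usually exposes only one of those GPU paths. Filtering against the available list keeps the same code working on either.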

ModelOpt includes advanced quantization algorithms such as INT4 Activation-aware Weight Quantization (AWQ). Compared with other tools such as Olive, this approach further reduces a model's memory footprint and improves throughput on RTX GPUs. At deployment, models can use up to 2.6x less memory than their FP16 counterparts, delivering faster throughput with minimal accuracy loss and allowing them to run on a wider range of PC configurations.
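The core idea behind activation-aware weight quantization is that a few input channels carry much larger activations than the rest, so their weights matter more. AWQ scales those salient channels up before 4-bit quantization and undoes the scale afterward, choosing the scaling strength on calibration data. Here is a minimal pure-Python sketch of that idea, not ModelOpt's implementation or API. It assumes symmetric round-to-nearest INT4 with a toy group size of 4, per-input-channel scales `s = mean|x| ** alpha` with `alpha` found by grid search, and it folds `1/s` into the dequantized weights rather than into a preceding layer. All function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def quantize_int4_groups(row: List[float], group_size: int) -> List[float]:
    """Symmetric round-to-nearest INT4 per group of weights; returns dequantized values."""
    out: List[float] = []
    for start in range(0, len(row), group_size):
        group = row[start:start + group_size]
        max_abs = max(abs(v) for v in group)
        scale = max_abs / 7 if max_abs > 0 else 1.0  # largest magnitude maps to +/-7
        out.extend(max(-8, min(7, round(v / scale))) * scale for v in group)
    return out


def matvec(w: Matrix, x: List[float]) -> List[float]:
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def output_mse(w: Matrix, w_hat: Matrix, xs: Matrix) -> float:
    """Mean squared error between full-precision and quantized layer outputs."""
    total, count = 0.0, 0
    for x in xs:
        for y, y_hat in zip(matvec(w, x), matvec(w_hat, x)):
            total += (y - y_hat) ** 2
            count += 1
    return total / count


def scale_quantize_unscale(row: List[float], s: List[float], group_size: int) -> List[float]:
    # Multiply each input channel by s_j, quantize, then divide by s_j again.
    scaled = [v * sj for v, sj in zip(row, s)]
    return [q / sj for q, sj in zip(quantize_int4_groups(scaled, group_size), s)]


def awq_style_quantize(w: Matrix, xs: Matrix, group_size: int = 4,
                       grid: int = 10) -> Tuple[float, float, Matrix]:
    """Grid-search alpha in [0, 1]; alpha = 0 reduces to plain round-to-nearest."""
    n_in = len(w[0])
    # Average activation magnitude per input channel over the calibration set.
    act_mag = [max(sum(abs(x[j]) for x in xs) / len(xs), 1e-8) for j in range(n_in)]
    best = None
    for step in range(grid + 1):
        alpha = step / grid
        s = [m ** alpha for m in act_mag]
        w_hat = [scale_quantize_unscale(row, s, group_size) for row in w]
        err = output_mse(w, w_hat, xs)
        if best is None or err < best[0]:
            best = (err, alpha, w_hat)
    return best


if __name__ == "__main__":
    import random

    random.seed(0)
    w = [[random.gauss(0, 1) for _ in range(8)] for _ in range(4)]
    # Synthetic calibration activations with two high-magnitude channels (0 and 5).
    xs = [[random.gauss(0, 10 if j in (0, 5) else 1) for j in range(8)] for _ in range(32)]
    err, alpha, _ = awq_style_quantize(w, xs)
    rtn = [quantize_int4_groups(row, 4) for row in w]
    print(f"alpha={alpha:.1f}  AWQ-style MSE={err:.4f}  plain RTN MSE={output_mse(w, rtn, xs):.4f}")
```

Because `alpha = 0` makes every scale 1.0, the search always includes plain round-to-nearest, so the chosen setting is never worse than it on the calibration data. Production AWQ adds refinements (zero points, scale normalization, clipping search) that this sketch omits.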

Learn how developers on Microsoft systems, from Windows RTX AI PCs to NVIDIA Blackwell-powered Azure servers, are transforming the way people interact with AI every day.

Winsage