Speechify Launches with On-Device Voice AI for 1B+ Windows Users Worldwide

Speechify has launched its Windows application as part of its continued push to improve productivity through Voice AI. The app provides real-time text-to-speech and speech-to-text, with a choice of cloud-based or fully on-device processing. When on-device processing is selected, the user's voice data stays on the machine.

Innovative Technology Integration

The engineering team used the Windows ML stack and platform APIs to run on both x64 and Arm64 architectures, and on chipsets with NPUs and GPUs, all from a single codebase. Raheel Kazi, Engineering Leader at Speechify, said: “The Windows ML stack provided us with a singular runtime to deploy three production AI models on-device across multiple architectures, including NPUs and GPUs, all via ONNX Runtime.”

Kazi added that support for Qualcomm’s Snapdragon platform shortened the company's time to market. “The QNN execution provider enabled our transcription model to run on Qualcomm’s NPU with FP16 precision, allowing us to maintain continuity without starting from scratch. This foundational approach to inference has proven scalable across both x64 and ARM64 platforms, including Intel and AMD chipsets.”

Real-Time Performance on Snapdragon

When Speechify moved to Windows on Arm, the main challenge was making on-device transcription run in real time, particularly on fanless Snapdragon laptops. The solution was ONNX Runtime’s QNN execution provider, which runs a split encoder-decoder architecture on Qualcomm’s Hexagon Tensor Processor (HTP) in burst mode.
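
As a rough illustration of how an application might ask ONNX Runtime for the NPU first and fall back to other hardware, here is a minimal Python sketch. It is not Speechify's code. The preference order (QNN, then DirectML for GPUs, then CPU) is an assumption, as are the helper names build_provider_list and create_session. The QNN option names follow the QNN execution provider documentation as we understand it and should be checked against the ONNX Runtime version in use.

```python
# Hypothetical sketch: choose ONNX Runtime execution providers in priority order.

# Options for the QNN execution provider (verify names against your ORT version).
QNN_OPTIONS = {
    "backend_path": "QnnHtp.dll",        # HTP (NPU) backend library
    "htp_performance_mode": "burst",     # request HTP burst mode
    "enable_htp_fp16_precision": "1",    # run float32 graphs with FP16 precision
}

# Assumed preference order: NPU, then GPU via DirectML, then CPU fallback.
PREFERRED = [
    ("QNNExecutionProvider", QNN_OPTIONS),
    ("DmlExecutionProvider", {}),
    ("CPUExecutionProvider", {}),
]


def build_provider_list(available):
    """Return (name, options) pairs for every preferred provider that is available."""
    chosen = [(name, dict(opts)) for name, opts in PREFERRED if name in available]
    if not chosen:
        raise RuntimeError("no supported execution provider is available")
    return chosen


def create_session(model_path):
    """Create an inference session; onnxruntime is imported only when needed."""
    import onnxruntime as ort

    providers = build_provider_list(ort.get_available_providers())
    return ort.InferenceSession(model_path, providers=providers)
```

ONNX Runtime tries the providers in the order given, so graph nodes the NPU cannot run fall through to the next entry in the list.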

This setup lets the NPU handle the heavy inference work while the CPU stays available for the rest of the application. At startup, Speechify’s dependency injection container detects the processor architecture and selects the appropriate engine. Users can also switch to cloud-based transcription, and the app can hot-swap engines while it is running. Pressing a hotkey starts transcription immediately.
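
The startup selection and hot-swap behavior described above can be sketched in a few lines of Python. This is an illustrative outline only: the class names (OnDeviceEngine, CloudEngine, EngineSwitcher) and the stub transcribe methods are hypothetical, and the real application uses a dependency injection container rather than this hand-rolled holder.

```python
import platform
import threading


class OnDeviceEngine:
    """Stand-in for a local transcription engine (hypothetical)."""

    def __init__(self, arch):
        self.name = f"on-device ({arch})"

    def transcribe(self, audio):
        return f"[{self.name}] {len(audio)} bytes"


class CloudEngine:
    """Stand-in for a cloud transcription engine (hypothetical)."""

    name = "cloud"

    def transcribe(self, audio):
        return f"[{self.name}] {len(audio)} bytes"


def normalize_arch(machine):
    """Map platform.machine() values to 'x64' or 'arm64'."""
    m = machine.lower()
    if m in ("arm64", "aarch64"):
        return "arm64"
    if m in ("amd64", "x86_64"):
        return "x64"
    raise ValueError(f"unsupported architecture: {machine}")


def default_engine(machine=None):
    """Pick the on-device engine for the detected (or given) architecture."""
    return OnDeviceEngine(normalize_arch(machine or platform.machine()))


class EngineSwitcher:
    """Holds the active engine and lets callers replace it while running."""

    def __init__(self, engine):
        self._lock = threading.Lock()
        self._engine = engine

    def swap(self, engine):
        with self._lock:
            self._engine = engine

    def transcribe(self, audio):
        # Read the current engine under the lock, then run it outside the lock
        # so a long transcription does not block a swap.
        with self._lock:
            engine = self._engine
        return engine.transcribe(audio)
```

A caller would build the switcher once, for example EngineSwitcher(default_engine()), and call swap(CloudEngine()) when the user changes the setting. A request already in flight finishes on the old engine, and the next one uses the new engine.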

A Seamless User Experience

On-device AI is only as effective as the pipeline around it, and Speechify built each component to keep the experience smooth and uninterrupted. The Windows platform stack, including NAudio for audio input/output, ONNX Runtime for inference, and CsWin32 for native API access, provides the building blocks.

Designed for Windows

Speechify’s Windows application is not a port; it is a WinUI 3 application with deep native integration through CsWin32’s source-generated P/Invoke bindings for over 30 Win32 APIs. Key features include:

  • RegisterHotKey: Enables system-wide shortcuts that function regardless of the active application.
  • SendInput: Facilitates auto-pasting of transcribed text directly into the user’s current field.
  • GDI Screen Capture: Powers OCR functionality for enhanced accessibility.
  • DwmSetWindowAttribute: Ensures the floating microphone indicator remains on top with appropriate DPI scaling.
  • Windows DPAPI: Encrypts authentication tokens for secure data handling.
  • Custom speechify:// Protocol Handler: Enables OAuth callbacks through the MSIX package manifest.
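
To make the last item concrete: when the OAuth provider redirects to a speechify:// URI, the operating system hands that URI to the app, which then has to extract the authorization code. The Python sketch below shows one way such a callback could be validated. The URI layout (auth/callback), the parameter names code and state, and the function name parse_oauth_callback are assumptions for illustration, not details from Speechify.

```python
import hmac
from urllib.parse import parse_qs, urlsplit


def parse_oauth_callback(uri, expected_state):
    """Return the authorization code from a speechify:// callback URI.

    Raises ValueError if the scheme is wrong, the state does not match,
    or no code is present.
    """
    parts = urlsplit(uri)
    if parts.scheme != "speechify":
        raise ValueError("unexpected URI scheme")

    params = parse_qs(parts.query)

    # Compare the state in constant time to guard against forged callbacks.
    state = params.get("state", [""])[0]
    if not hmac.compare_digest(state.encode(), expected_state.encode()):
        raise ValueError("state mismatch")

    codes = params.get("code")
    if not codes:
        raise ValueError("missing authorization code")
    return codes[0]
```

For example, parse_oauth_callback("speechify://auth/callback?code=abc123&state=xyz", "xyz") returns "abc123", while a callback with a different state value is rejected.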

These capabilities distinguish an application that merely runs on Windows from one that is inherently designed for the platform. Cliff Weitzman, Founder & CEO of Speechify, remarked, “With over a billion users on Windows, our launch aims to eliminate barriers to reading and writing, regardless of the device or user preference. We are particularly enthusiastic about the enterprise opportunities, given the demand from professionals for Speechify on their PCs.”

The Speechify Windows application is now available for both x64 and Arm64 devices, exclusively through the Microsoft Store.

About Speechify: Speechify stands as the world’s most utilized Voice AI productivity app, allowing users to have a variety of content read aloud, including books, websites, PDFs, emails, and more. Additionally, Speechify offers voice typing capabilities, enabling real-time dictation with either on-device or cloud-based transcription. The app was initially developed by CEO & Founder Cliff Weitzman to assist him in overcoming dyslexia during his studies at Brown University.

Media Contact: Rohan P., Speechify, 1 (747) 302-4454, [email protected], speechify.com
