The Gemini Android app beta now lets users attach audio files, such as MP3s, to their chat conversations. Android Authority spotted the feature in version 16.30.59.sa.arm64 of the Google app beta, where attaching a file brings up a “Talk live about this” prompt. Audio processing in this beta, however, is still at an early stage of development.
After attaching an audio file, users can either type a question or tap the “talk live” prompt. Early observations show that Gemini does not handle the audio reliably: sometimes it ignores the attached file altogether, and other times it produces responses unrelated to the audio content, much like a chatbot hallucination.
Despite the Android beta's current limitations, the Gemini API already supports audio input. Developers can use the API to submit audio files for a variety of tasks, including:
- Generating descriptions of audio content
- Summarizing spoken information
- Transcribing speech
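To make the API side concrete, here is a minimal sketch of the three tasks above, assuming the `google-genai` Python SDK (`pip install google-genai`) and an API key in the `GEMINI_API_KEY` environment variable. The model name, the prompt wording, and the `TASK_PROMPTS`/`ask_about_audio` names are illustrative assumptions, not part of the original report.

```python
"""Sketch: ask Gemini to describe, summarize, or transcribe an audio file.

Assumes the google-genai SDK and a GEMINI_API_KEY environment variable.
The model name and prompt wording are assumptions; check current docs.
"""

# Hypothetical prompt text for each task in the list above.
TASK_PROMPTS = {
    "describe": "Describe this audio clip.",
    "summarize": "Summarize the spoken information in this audio clip.",
    "transcribe": "Generate a transcript of the speech in this audio clip.",
}


def ask_about_audio(path: str, task: str, model: str = "gemini-2.5-flash") -> str:
    """Upload an audio file and run one of the TASK_PROMPTS tasks on it."""
    if task not in TASK_PROMPTS:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(TASK_PROMPTS)}")

    # Imported lazily so the prompt table works without the SDK installed.
    from google import genai

    client = genai.Client()  # picks up the API key from the environment
    audio_file = client.files.upload(file=path)  # upload via the Files API
    response = client.models.generate_content(
        model=model,
        contents=[TASK_PROMPTS[task], audio_file],
    )
    return response.text
```

Calling `ask_about_audio("sample.mp3", "summarize")` would upload the file and return the model's text reply; an unknown task name fails fast with `ValueError` before any network call is made.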
The API also accepts timestamps in prompts, so developers can limit a request to one segment of the audio, such as “2:30 to 3:29.” Supported audio formats include MP3, WAV, and FLAC.
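The timestamp and format details above can be sketched as two small helpers: one that phrases a request for a single segment in MM:SS form, matching the “2:30 to 3:29” example, and one that maps a file extension to a MIME type. The helper names are hypothetical, and the MIME strings are an assumed mapping covering only the three formats named here; verify them against the current Gemini API documentation.

```python
"""Sketch: build a timestamped audio request and pick a MIME type.

Helper names are hypothetical; the MIME mapping is an assumption that
covers only the formats mentioned in the article.
"""
from pathlib import Path

# Assumed MIME types for the formats named in the article.
AUDIO_MIME_TYPES = {
    ".mp3": "audio/mp3",
    ".wav": "audio/wav",
    ".flac": "audio/flac",
}


def mime_type_for(path: str) -> str:
    """Return the MIME type for a supported audio file, or raise ValueError."""
    suffix = Path(path).suffix.lower()
    try:
        return AUDIO_MIME_TYPES[suffix]
    except KeyError:
        raise ValueError(f"unsupported audio format: {suffix or '(none)'}") from None


def to_mmss(seconds: int) -> str:
    """Format a non-negative number of seconds as zero-padded MM:SS."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    minutes, secs = divmod(seconds, 60)
    return f"{minutes:02d}:{secs:02d}"


def segment_prompt(task: str, start_s: int, end_s: int) -> str:
    """Build a prompt that restricts a task to one segment of the audio."""
    if end_s <= start_s:
        raise ValueError("end must be after start")
    return f"{task} from {to_mmss(start_s)} to {to_mmss(end_s)}."
```

For example, `segment_prompt("Provide a transcript of the speech", 150, 209)` yields `"Provide a transcript of the speech from 02:30 to 03:29."`, which can be sent alongside the uploaded audio file as the text part of the request.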
Ongoing Development
Audio file attachment in the Gemini Android app appears to be part of a broader development effort by Google. Google has not announced a launch date for the feature, but since the app already supports image uploads, audio support looks like a natural next step for its capabilities.