Google Adds 10-Minute Audio Transcription and Task Extraction to Gemini AI

New Delhi: Google has introduced new audio processing features in its Gemini AI assistant, allowing users to upload and analyze recordings of up to 10 minutes. The update enables transcription, summarization, and task identification from pre-recorded files, broadening how Gemini can be used for work and study.

The new option, available through Gemini’s web and mobile platforms, processes recordings such as meetings, lectures, and interviews. Once uploaded, the audio is converted into searchable text, and the assistant can also isolate statements by speaker, generate simplified summaries, and extract action points. This makes it useful for students preparing study notes, professionals drafting follow-ups, and content creators repurposing material.

According to Google, audio uploads were added in response to user demand. Josh Woodward, VP of Gemini, said that transcription accuracy remained high in testing, though recognition of specific names was inconsistent. Unlike Gemini Live, which handles real-time conversations, this feature is designed for users who want to review past recordings.

The feature comes with constraints. Uploads are capped at 10 minutes, and free-tier users face daily limits. Audio processing counts against existing Gemini usage quotas, and Google has not disclosed whether larger-scale transcription will require a separate pricing model.

The update reflects Google’s broader effort to expand Gemini beyond text-based interactions and position it as a multipurpose productivity tool. By integrating transcription with task extraction, the company is targeting practical use cases in education, workplace collaboration, and content management.