Microsoft Unveils VibeVoice for Longer Conversational AI Audio

Tags: audio digital google media microsoft new tech technology

Date posted: September 2, 2025

Feed: PYMNTS.com

Microsoft has released VibeVoice, a new open-source artificial intelligence (AI) model that lets users create podcasts and other audio — a counter to Google’s popular NotebookLM.

But there are notable differences. Microsoft’s text-to-speech model can generate up to four voices and up to 90 minutes of podcast-quality speech. NotebookLM supports two voices.

Additionally, VibeVoice reads and organizes text, while NotebookLM ingests documents and turns them into two-person podcasts. NotebookLM users can also query their documents and get summaries, according to tech firm Hugging Face.

That means VibeVoice doesn’t try to understand the text but rather performs it audibly, ostensibly to replace a recording studio.

VibeVoice is the latest offering in voice AI technology, which has been attracting venture capital funding.

In 2024, voice AI startups raised $2.1 billion, up eightfold from the prior year, according to market research firm CB Insights. Interest in voice shopping is also rising: A PYMNTS Intelligence report shows that 30.4% of Gen Z consumers already shop by voice every week, followed by millennials. Across all age groups, an average of 17.9% of consumers use voice to shop.

VibeVoice has 1.5 billion parameters, relatively small for a model capable of sustaining dialogue across multiple speakers.

It is built on Alibaba’s open-source Qwen2.5, a large language model that helps orchestrate natural turn-taking and contextually aware speech patterns during dialogues.

Microsoft claims this means VibeVoice can produce fluid conversations among up to four voices while maintaining each voice’s distinct characteristics, even in longer conversations.

How to use VibeVoice

Potential research applications of VibeVoice include the following:

Prototyping podcasts and training content

Creators could generate mock podcasts, panel discussions or training modules with multiple AI voices. Instead of hiring four voice actors to test dialogue flow, users can create a synthetic version in minutes using text.

Accessibility and education

Educational material, textbooks or research papers could be turned into long-form audio with distinct narrators. This could help people who learn better by listening, or make dense material more engaging.

Game and media development

Game developers or storytellers could use VibeVoice to prototype dialogue between characters. Because it handles four speakers, they can stage a full in-game conversation without recording sessions.
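
As a rough illustration of the text-first workflow these use cases share, the sketch below prepares a multi-speaker script and checks it against the four-voice limit reported above. The "Speaker N:" line format, the function name and the constant are assumptions made for this example. They are not taken from VibeVoice’s documentation, and the code does not call VibeVoice itself.

```python
MAX_SPEAKERS = 4  # four-voice limit reported in the article


def build_script(turns):
    """Turn (speaker_name, text) pairs into 'Speaker N: text' lines.

    Hypothetical format: the 'Speaker N:' labels are an assumption for
    this sketch, not VibeVoice's documented input format.
    """
    speaker_ids = {}  # name -> 1-based id, in order of first appearance
    lines = []
    for name, text in turns:
        text = text.strip()
        if not text:
            continue  # skip empty turns
        if name not in speaker_ids:
            if len(speaker_ids) == MAX_SPEAKERS:
                raise ValueError(
                    f"more than {MAX_SPEAKERS} speakers: {name!r}"
                )
            speaker_ids[name] = len(speaker_ids) + 1
        lines.append(f"Speaker {speaker_ids[name]}: {text}")
    return "\n".join(lines)


if __name__ == "__main__":
    demo = [
        ("Host", "Welcome to the show."),
        ("Guest", "Thanks for having me."),
        ("Host", "Let's get started."),
    ]
    print(build_script(demo))
```

A script with a fifth distinct speaker raises an error rather than being silently truncated, which makes the limit visible while a draft is still text.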

Recognizing the risks of deepfakes, Microsoft said VibeVoice’s safeguards embed two markers in every audio file: a disclaimer, such as “This segment was generated by AI,” and a hidden digital watermark.

Microsoft bars the use of VibeVoice for impersonation, disinformation and live deepfakes such as real-time voice conversion on calls. The model supports only English and Chinese speech for now and is available for research, not commercial deployment.

The post Microsoft Unveils VibeVoice for Longer Conversational AI Audio appeared first on PYMNTS.com.
