Microsoft Unveils VibeVoice for Longer Conversational AI Audio


Microsoft has released VibeVoice, a new open-source artificial intelligence (AI) model that lets users create podcasts and other audio, a counter to Google's popular NotebookLM.

But there are notable differences. Microsoft's text-to-speech model can generate four voices and up to 90 minutes of podcast-quality speech. NotebookLM can produce two voices.

In addition, VibeVoice reads and performs text, while NotebookLM ingests documents and turns them into two-person podcasts. NotebookLM users can also query documents and obtain summaries of them, according to the company's Hugging Face page.

This means VibeVoice does not try to understand the text but rather performs it aloud, ostensibly to replace a recording studio.

VibeVoice is the latest offering in voice AI technology, a sector that has been attracting venture capital funding.

In 2024, voice AI startups raised $2.1 billion, eight times the previous year's total, according to market research firm CB Insights. There is also growing interest in voice shopping: a PYMNTS Intelligence report shows that 30.4% of Generation Z consumers already shop by voice every week, followed by millennials. Across all ages, an average of 17.9% of consumers use voice to shop.

VibeVoice runs on 1.5 billion parameters, relatively small for a model capable of sustaining dialogue among several speakers.

It was built using Alibaba's open-source Qwen2.5, a large language model that helps orchestrate natural turn-taking and contextually aware speech during dialogues.

Microsoft says this means VibeVoice can produce fluid conversations among four voices while retaining each voice's distinct characteristics, even in longer conversations.

How to Use VibeVoice

Potential research applications of VibeVoice include the following:

Prototyping podcasts and training content

  • Creators can generate simulated podcasts, group discussions or training modules with several AI voices. Instead of hiring four voice actors to test the flow of a dialogue, users can create a synthetic version from text in minutes.

Accessibility and education

  • Educational materials, manuals or research articles could be turned into long-form audio with distinct narrators. This could help people who learn better by listening, or make dense material more engaging.

Game and media development

  • Game developers or storytellers can use VibeVoice to prototype dialogue between characters. Because it handles four speakers, they can stage a complete in-game conversation without recording sessions.

Recognizing the risks of deepfakes, Microsoft said VibeVoice's safeguards include ensuring that every audio file carries both an audible disclaimer, such as “This segment was generated by AI,” and a hidden digital watermark.

Microsoft prohibits impersonation, disinformation and live deepfake uses such as real-time voice conversion in calls. The model supports only English and Chinese speech for now. It is available for research, not for commercial deployment.

