Dubai-based CAMB.AI open-sources MARS5 speech emulator
AR <> NAR model introduces a new level of prosody to speech emulation
#UAE #syntheticspeech - Dubai-based live video dubbing platform Camb.ai has announced the arrival of its new synthetic speech emulation model, MARS5. Whilst the model can accurately replicate vocal performances in over 140 languages, the company has chosen to open-source the English language version of MARS5. Developed in the United Arab Emirates for Camb.ai's video dubbing platform, MARS5 combines a Mistral-style autoregressive model with a novel non-autoregressive model to capture emotion, performance, and meaning in its synthetic voice outputs.
SO WHAT? - Text-to-speech (TTS) platforms have become more sophisticated over the past few years, and a number of them now offer easy and affordable 'voice-cloning', where an original voice is emulated to create new content in a similar-sounding voice. However, synthetic speech platforms tend to struggle to create cloned voices that have the right rhythm, intonation and emotion. According to Camb.ai, its new MARS5 model is able to capture these nuances and recreate voices with far more convincing prosody, allowing it to create better voice tracks for dubbing audio and video. The open-source code released this month will give other developers insight into how this was achieved and so could have an impact far beyond the product itself.
Here are some key details about the new MARS5 speech emulator:
Dubai-based live video dubbing and translation platform Camb.ai has announced the latest version of its synthetic speech emulation model. MARS5 is able to emulate voices in 140 different languages, although only the English-language model has been released as open source.
MARS5 underpins Camb.ai’s automatic voice dubbing platform, which allows large media enterprises, sports leagues, movie production companies and content creators to transform their stories, videos and live streams into multiple languages.
The new model can emulate a performer's original voice, together with the rhythm, stress and intonation - or prosody - required for the circumstance, and then generate an output in any of 140 languages in near real-time.
MARS5 can create a high level of prosody and realism from just a few seconds of audio input. Camb.ai has achieved this by combining a Mistral-style autoregressive model with a novel non-autoregressive model, allowing it to capture emotion, performance, and meaning via one integrated process.
The practical benefit of MARS5’s ability to handle prosody is that it can create convincing synthetic speech outputs for challenging voice performances such as sports commentary, movies, and anime, which other TTS models, both closed-source and open-source, are not able to capture well.
MARS5 combines two AI models to create this new level of prosody. Camb.ai says that a Mistral-style autoregressive model of roughly 750M parameters has been integrated with a non-autoregressive multinomial diffusion model of roughly 450M parameters, both operating on 6 kbps EnCodec tokens.
Camb.ai used AWS (Amazon Web Services) for GPU compute resources and NVIDIA infrastructure.
MARS5’s English-language model is now open-sourced via GitHub, under the Free Software Foundation's GNU Affero General Public Licence (v3.0). By open-sourcing it, Camb.ai aims to encourage developer and research communities to build on and learn about the new model.
MARS5 was announced at the Dubai AI Retreat 2024, organised by the Dubai Centre for Artificial Intelligence Applications, in collaboration with the UAE’s National Programme for Artificial Intelligence.
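To put the model sizes and the 6 kbps token rate quoted above into perspective, here is a rough back-of-envelope sketch in Python. The parameter counts and bitrate come from Camb.ai's description. The frame rate (75 frames per second) and codebook size (1,024 entries, or 10 bits per token) are assumptions based on the standard 24 kHz EnCodec configuration and are not confirmed in the announcement. The function names are illustrative only and are not part of the MARS5 codebase.

```python
# Back-of-envelope arithmetic for the MARS5 figures quoted in the article.
# Parameter counts (approximate, in millions) are from Camb.ai's description.
AR_PARAMS_M = 750    # Mistral-style autoregressive model
NAR_PARAMS_M = 450   # non-autoregressive multinomial diffusion model

# ASSUMPTION: standard 24 kHz EnCodec settings, not confirmed by the article.
FRAME_RATE_HZ = 75       # codec frames per second of audio
BITS_PER_TOKEN = 10      # log2(1024) for a 1,024-entry codebook
BITRATE_BPS = 6000       # 6 kbps, as stated by Camb.ai


def codebooks_per_frame(bitrate_bps: int = BITRATE_BPS) -> int:
    """Number of codebook tokens emitted for each codec frame."""
    return bitrate_bps // (FRAME_RATE_HZ * BITS_PER_TOKEN)


def tokens_for_clip(seconds: float) -> int:
    """Total codec tokens needed to represent a clip of the given length."""
    frames = int(seconds * FRAME_RATE_HZ)
    return frames * codebooks_per_frame()


def total_params_m() -> int:
    """Combined size of the two models, in millions of parameters."""
    return AR_PARAMS_M + NAR_PARAMS_M


if __name__ == "__main__":
    print(f"Codebooks per frame: {codebooks_per_frame()}")
    print(f"Tokens in a 5-second reference clip: {tokens_for_clip(5)}")
    print(f"Combined parameters: ~{total_params_m()}M")
```

Under these assumptions, a reference clip of a few seconds amounts to a few thousand codec tokens, and the two models together total roughly 1.2 billion parameters.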
ZOOM OUT - The UAE is not well-known for deep tech, and it is only very recently that it has begun to build a reputation for creating software intellectual property (IP) that compares and competes with software from the United States and Europe. Today, the outcomes of government-supported research and development in AI, such as Falcon LLM, are starting to be joined by artificial intelligence platforms developed by the private sector. Camb.ai's open-sourcing of its MARS5 synthetic speech emulator stands to raise not only the profile of the Dubai-based startup, but also recognition of the UAE as a country that is able to produce world-class software.
LINKS
View the source code and information (GitHub)
Watch the demo video of MARS5 (YouTube)
Also check out February’s live show with Camb.ai CEO:
🎧 Listen to the podcast of Middle East AI News LIVE on Thursday 15th February, with Avneesh Prakash, co-founder and CEO of Camb.ai