HUMAIN reveals Arabic TTS benchmark at Interspeech 2025
SawtArabi benchmark addresses dialectal speech quality, code-switching challenges

#SaudiArabia #AI - HUMAIN, Saudi Arabia’s national AI company, and the Saudi Data and Artificial Intelligence Authority (SDAIA) presented a new Arabic text-to-speech (TTS) evaluation benchmark at Interspeech 2025 in Rotterdam last week. The SawtArabi benchmark is the first Arabic-English TTS corpus to address dialectal and code-switching TTS in the Arabic language. Developed in collaboration with KTH Royal Institute of Technology (Sweden) and Qatar Computing Research Institute (QCRI), the four-hour dataset covers Modern Standard Arabic, Egyptian Arabic, and English recordings from a single voice talent, tackling the vowelisation and phonemisation issues that have limited Arabic speech quality in AI applications.
SO WHAT? - SawtArabi is the first comprehensive Arabic dialectal and code-switching TTS evaluation dataset, addressing critical gaps in Arabic language AI capabilities. As AI adoption in the Arab world continues to grow, Arabic AI app development is hindered by the lack of natural-sounding Arabic speech synthesis. Previous evaluation datasets covered neither dialectal TTS, which is essential for capturing Arabic’s linguistic diversity, nor code-switching TTS, which is crucial for handling the mixed-language speech common in everyday conversations. SawtArabi could pave the way for more effective voice assistants, accessibility tools, and multimedia applications for over 400 million Arabic speakers.
Here are some key details about SawtArabi:
HUMAIN and the Saudi Data and Artificial Intelligence Authority (SDAIA) presented SawtArabi, a new Arabic text-to-speech (TTS) evaluation benchmark, at Interspeech 2025 in Rotterdam last week.
Developed in collaboration with KTH Royal Institute of Technology (Sweden) and Qatar Computing Research Institute (QCRI), the SawtArabi benchmark is the first Arabic-English TTS corpus to address dialectal and multilingual code-switching TTS in the Arabic language.
The four-hour evaluation dataset includes recordings in Modern Standard Arabic (MSA), the Egyptian Arabic dialect, English, and Egyptian-English code-switching, all from a single professional 34-year-old male speaker.
Researchers modified the widely used espeak-ng phonemizer to handle Arabic text irregularities, resolving issues such as the following (see the illustrative sketch after this list):
Tāʾ marbūṭa pronunciation (an Arabic letter that appears only at the end of nouns and adjectives to indicate the feminine gender);
Hamzat Al-Wasl (an Arabic character found in specific nouns and verbs) in definite articles; and
Shaddah gemination (Shaddah is a diacritical mark that indicates a doubled consonant), whose representation previously caused mispronunciations in Arabic speech synthesis.
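For context, the snippet below is a minimal sketch of how Arabic text is typically passed through an espeak-ng backend using the open-source phonemizer Python package. It uses the stock espeak-ng Arabic voice and made-up example phrases exercising the three cases above; it does not reproduce the team's modified phonemizer, whose exact interface is not described here.
```python
# Minimal sketch: phonemising Arabic text through an espeak-ng backend.
# Uses the stock `phonemizer` package and stock espeak-ng "ar" voice purely to
# illustrate the interface such modifications target; it does NOT reproduce the
# SawtArabi team's modified phonemizer.
from phonemizer import phonemize
from phonemizer.separator import Separator

# Example phrases exercising the three problem cases listed above.
samples = [
    "مدرسة جميلة",  # tāʾ marbūṭa at the end of "madrasa" and "jamīla"
    "في البيت",      # hamzat al-wasl in the definite article "al-"
    "الشمس",         # shaddah (gemination after the assimilated "sun letter")
]

for text in samples:
    ipa = phonemize(
        text,
        language="ar",        # espeak-ng's Arabic voice
        backend="espeak",
        separator=Separator(phone=" ", word=" | "),
        strip=True,
    )
    print(f"{text} -> {ipa}")
```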
The dataset addresses the code-switching challenges common in everyday Arabic conversations, where speakers blend Arabic with English. This was previously a significant gap in multilingual speech technology development for Arabic-speaking markets.
Extensive subjective evaluations using Mean Opinion Score (MOS) methodology with 25 listeners proficient in different Arabic dialects validated the effectiveness of the modified phonemizer, showing consistent improvements over standard implementations across all evaluation criteria.
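As a point of reference, MOS is simply the average of 1-5 listener ratings per system, usually reported with a confidence interval. The sketch below shows that standard computation with made-up placeholder ratings; it is not the paper's data or evaluation code.
```python
# Minimal sketch of a standard Mean Opinion Score (MOS) computation:
# average 1-5 listener ratings per system and report a 95% confidence interval.
# The ratings below are placeholder values, not results from the paper.
import numpy as np
from scipy import stats

ratings = {
    "baseline_phonemizer": [3, 4, 3, 4, 3, 4, 4, 3, 3, 4],
    "modified_phonemizer": [4, 5, 4, 4, 5, 4, 4, 5, 4, 4],
}

for system, scores in ratings.items():
    scores = np.asarray(scores, dtype=float)
    mos = scores.mean()
    # 95% confidence interval using the t-distribution over listener ratings
    low, high = stats.t.interval(0.95, df=len(scores) - 1,
                                 loc=mos, scale=stats.sem(scores))
    print(f"{system}: MOS = {mos:.2f} (95% CI {low:.2f}-{high:.2f})")
```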
The SawtArabi corpus, modified espeak-ng phonemizer, and baseline checkpoints are available for public access, supporting broader Arabic speech technology research and development across academic and commercial applications.
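For readers who want to inspect the corpus, the sketch below shows the usual pattern for loading a speech dataset from the Hugging Face Hub with the datasets library. The repository id, split name, and column names are hypothetical placeholders; substitute the actual values from the Hugging Face link in the LINKS section.
```python
# Minimal sketch of loading the corpus with the Hugging Face `datasets` library.
# NOTE: the repository id, split, and column names below are HYPOTHETICAL
# placeholders; replace them with the actual values from the dataset card.
from datasets import load_dataset, Audio

ds = load_dataset("HUMAIN/SawtArabi", split="test")       # hypothetical id/split
ds = ds.cast_column("audio", Audio(sampling_rate=22050))  # assumed column name and rate

example = ds[0]
print(example["audio"]["array"].shape)  # decoded waveform as a NumPy array
print(example.get("text"))              # assumed transcript column
```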
The SawtArabi research team includes: Vasista Sai Lodagala (HUMAIN), Lamya Alkanhal (SDAIA), Daniel Izham (HUMAIN), Shivam Mehta (KTH Royal Institute of Technology, Sweden), Shammur Chowdhury (QCRI, Qatar), Aqeelah Makki (HUMAIN), Hamdy S. Hussein (QCRI), Gustav Eje Henter (KTH), and Ahmed Ali (HUMAIN).
HUMAIN and SDAIA also presented the following research projects at Interspeech 2025:
CS-FLEURS: A Massively Multilingual and Code-Switched Speech Dataset
Towards a Unified Benchmark for Arabic Pronunciation Assessment: Qur’anic Recitation as a Case Study
ZOOM OUT - One of the reasons that Arabic language AI development has lagged behind English and other major languages is the language's unique linguistic challenges. Arabic has a complex vowelisation system and wide dialectal variation, while the number of high-quality datasets remains limited. Arabic AI speech platforms to date offer understandable pronunciation, but not necessarily the linguistically correct dialectal speech that Arabic speakers are familiar with. If conversational AI in the Arabic language is to become significant in commerce, finance and public services, benchmarks such as SawtArabi will be crucial for developers of Arabic speech models.
[Written and edited with the assistance of AI]
LINKS
SawtArabi research paper - PDF download (ISCA)
SawtArabi datasets (Hugging Face)
Read more about Arabic language AI benchmarks:
Abu Dhabi's TII releases new Arabic STEM AI benchmark (Middle East AI News)
Inception & MBZUAI launch new Arabic LLM leaderboard (Middle East AI News)
MBZUAI launches multimodal Arabic AI benchmark (Middle East AI News)
Arabic LLM index launched at GAIN (Middle East AI News)
Hugging Face introduces Open Arabic LLM Leaderboard (Middle East AI News)