MBZUAI, Inception launch enhanced Nanda Hindi LLM
Open-source model targets India’s 600 million Hindi speakers
#UAE #LLMs - Abu Dhabi AI powerhouse G42 has released Nanda 87B, an upgraded version of its open-source Hindi-English large language model Llama-3-Nanda, developed by Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) in collaboration with Inception and California-based AI infrastructure provider Cerebras. The 87-billion parameter model is built upon Meta’s open-source Llama-3.1 70B and was trained on over 65 billion Hindi-language tokens. Developers built a custom Hindi-centric tokeniser that reduced training and inference time, whilst delivering fluency across formal Hindi, casual speech and Hinglish. Available as an open-weight model via AI community Hugging Face, Nanda 87B targets India’s 600 million Hindi speakers and fast-growing digital economy.
SO WHAT? - The release of Nanda 87B reinforces the commitment of G42 and the government-backed Abu Dhabi AI ecosystem to advancing AI model development for underserved world languages with non-Latin scripts. With its origins in UAE-India bilateral agreements, Nanda aims to bridge a significant gap in AI language capabilities for one of the world’s largest linguistic communities. By making the model available as open-weight software, G42 enables Indian developers, enterprises and content creators to build locally relevant applications without licensing restrictions or dependence on proprietary platforms.
Here are some key points about the release of the new Nanda version:
G42 has released Nanda 87B, a major upgrade to its open-source Hindi-English large language model, developed by Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) in collaboration with Inception and California-based AI infrastructure provider Cerebras.
The 87-billion parameter model is built upon Meta’s Llama-3.1 70B and trained on a curated Hindi-English dataset containing over 65 billion Hindi tokens, making it the largest and one of the most capable Hindi-centric models available in open weights.
Developers built a custom Hindi-centric tokeniser that allowed them to boost efficiency by reducing both training and inference time, whilst delivering fluency across formal Hindi written in Devanagari script, casual conversational speech and Hinglish code-switching between Hindi and English.
The model demonstrates strong performance across translation, summarisation, instruction-following and transliteration tasks, with safety and cultural alignment integrated into its design to generate context-aware, responsible responses appropriate for Indian cultural contexts.
The model was trained on Condor Galaxy, one of the world’s most powerful AI supercomputers for training and inference built by G42 and Cerebras, and is now available as an open-weight model via Hugging Face.
India represents a crucial market for AI innovation with over 600 million Hindi speakers and one of the world’s fastest-growing digital economies, where over 80% of new internet users prefer local languages over English for digital interactions.
The release follows G42’s first Nanda model announced in 2024, with the upgraded version representing a substantial advancement in scale and capabilities as G42 continues to expand its operations across India’s technology sector.
Inception and MBZUAI have developed three bilingual large language models designed to ensure language accessibility: Jais for Arabic, Nanda for Hindi and Sherkala for Kazakh, reflecting the company’s commitment to creating positive societal impact by removing language barriers to innovation.
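To illustrate why a Hindi-centric tokeniser can cut training and inference time, the sketch below compares token counts for a short Devanagari phrase under two simplified schemes. It is not Nanda's tokeniser: the greedy longest-match segmenter, the tiny vocabulary and the function names are all hypothetical, and the byte count stands in for the worst case of a tokeniser with no Devanagari entries that falls back to one token per UTF-8 byte.

```python
# Toy illustration only: neither function reflects Nanda's actual tokeniser.

def byte_fallback_tokens(text: str) -> int:
    """Worst case without Devanagari vocabulary: one token per UTF-8 byte."""
    return len(text.encode("utf-8"))


def greedy_tokens(text: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match segmentation over a fixed vocabulary.

    Characters not covered by the vocabulary become single-character tokens.
    """
    max_len = max(map(len, vocab))
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical vocabulary containing whole Hindi words.
HINDI_VOCAB = {"नमस्ते", "भारत", " "}

text = "नमस्ते भारत"  # "Hello India"
hindi_aware = greedy_tokens(text, HINDI_VOCAB)

print("byte-fallback tokens:", byte_fallback_tokens(text))
print("Hindi-aware tokens:  ", len(hindi_aware), hindi_aware)
```

Fewer tokens per sentence means fewer model steps for the same text, which is the general mechanism behind the efficiency gain described above. The actual savings depend on the real vocabulary and corpus.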
ZOOM OUT - The first version of the Hindi large language model Nanda was launched in September 2024, a 13-billion parameter model trained on 2.13 trillion tokens. The development of a Hindi LLM was originally added to a bilateral agreement on digital infrastructure between India’s Ministry of Electronics and Information Technology and the UAE’s Ministry of Investment earlier in 2024. The Nanda launch became G42 group’s first significant expansion of its language AI capabilities beyond Arabic, coming just over one year after the successful release of the Jais Arabic large language model family. Aligned with India’s Digital India and Startup India initiatives, Nanda offered an openly accessible model with code for a market of over 500 million Hindi speakers.
[Written and edited with the assistance of AI]
LINKS
MBZUAI-IFM/Llama-3.1-Nanda-87B-Chat (Hugging Face)
Read more about G42-MBZUAI language models:
Inception, Cerebras and MBZUAI release Jais 2 Arabic LLM (Middle East AI News)
Inception & MBZUAI unveil Kazakh LLM (Middle East AI News)
Inception launches new JAIS Chat mobile app (Middle East AI News)
MBZUAI open-sources NANDA LLM (Middle East AI News)
G42 launches new Hindi LLM in Mumbai (Middle East AI News)


