The powerful upgrade to Qatar’s national AI platform, explained
Research paper confirms major capability leap for Qatar’s Fanar Arabic AI platform
#Qatar #LLMs – Qatar Computing Research Institute (QCRI) at Doha-based Hamad Bin Khalifa University (HBKU) has published a research paper confirming the capabilities of Fanar 2.0, the second generation of Qatar’s sovereign Arabic-centric generative AI platform. According to the new paper, Fanar 2.0 delivers significant benchmark improvements over its predecessor despite using approximately eight times fewer training tokens. Launched at the second World Summit AI in Doha in December 2025, the model was built by a domestic team on 256 NVIDIA H100 GPUs with no dependency on external AI providers. The Fanar 2.0 platform covers the full generative AI spectrum for Arabic, including language, speech, vision, Islamic knowledge, poetry, translation and agentic reasoning, making it one of the most comprehensive sovereign Arabic AI stacks in existence.
SO WHAT? – Fanar 2.0 is a direct challenge to the assumption that competitive AI requires vast compute budgets and access to foreign infrastructure. The persistent problem for Arabic language model developers is the availability of training data. Despite having over 400 million native speakers, Arabic represents just 0.5 percent of web content. QCRI’s solution was to focus on data quality rather than scale, building a platform that outperforms its predecessor across every benchmark. The model development strategy has implications for model builders far beyond Qatar.
Here are some key points from new data made available on Fanar 2.0:
Fanar 2.0 was designed, built and is operated entirely at Qatar Computing Research Institute (QCRI), Hamad Bin Khalifa University (HBKU), with no dependency on external AI providers. Sovereignty is described in the research paper as a “first-class design principle” rather than a policy aspiration, giving Qatar full control over data governance and culturally sensitive components.
The platform’s core language model, Fanar-27B, is a 27-billion parameter transformer built through continual pre-training of the open-weight Gemma-3-27B backbone on approximately 120 billion carefully curated tokens. That is around eight times fewer tokens than were used to train Fanar 1.0, yet the model delivers consistently better results across all benchmarks.
Benchmark improvements over Fanar 1.0 include a 9.1-point gain in Arabic world knowledge, a 7.3-point gain in general Arabic comprehension, a 7.6-point gain in English capability and a 3.5-point gain in dialectal Arabic comprehension — a significant across-the-board improvement achieved with a fraction of the compute used by frontier model providers.
The entire Fanar 2.0 development effort ran on 256 NVIDIA H100 GPUs (32 nodes of eight GPUs each), which is a fraction of the compute available to the world’s leading AI labs, making the benchmark gains all the more notable as a demonstration of resource-constrained sovereign AI development.
FanarGuard, a new 4-billion parameter bilingual moderation filter, was trained on 468,000 annotated Arabic and English prompt-response pairs. It achieves state-of-the-art Arabic safety and cultural alignment performance at a fraction of the parameter cost of competing systems.
Fanar-Sadiq, the platform’s Islamic AI component, has been upgraded from a single-pipeline system to a multi-agent architecture. Specialised handlers have been developed for Fiqh reasoning, Quranic retrieval, zakat and inheritance calculations, prayer times and the Hijri calendar. The platform has been designed to play a role as the liturgical language AI for over two billion Muslims worldwide.
New speech capabilities include Aura-STT-LF, the first Arabic-centric bilingual long-form speech recognition model capable of processing hours-long recordings with speaker-change handling. QCRI has also released Aura-STT-BenchLF, the first publicly available Arabic long-form speech recognition benchmark.
Additional new components include Fanar-Diwan for classical Arabic poetry generation, FanarShaheen for LLM-powered bilingual Arabic-English translation, and Oryx-IVU for Arabic-aware image and video understanding — together covering modalities that most Arabic AI efforts have not yet addressed.
ISLAMIC KNOWLEDGE – Of all Fanar 2.0’s specialised components, Fanar-Sadiq may be the most culturally significant. It is a multi-agent Islamic knowledge system that routes queries across nine specialised handlers, combining neural retrieval, symbolic reasoning and deterministic validation to deliver grounded, citation-backed answers. It is already deployed in production on IslamWeb and IslamOnline, two of the world’s largest Islamic information platforms, where it has processed millions of queries.
Fanar-Sadiq handles nine distinct Islamic query types through dedicated specialist agents: Fiqh jurisprudential reasoning, Quranic verse retrieval, Hadith verification across more than 51,000 Hadith, supplication lookup, zakat calculation, inheritance distribution, prayer times, Qibla direction and Hijri calendar conversion.
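To make the routing idea concrete, here is a minimal sketch of dispatching a query to one of several specialist agents. The handler names follow the categories listed in the article, but the keyword-matching logic and all function names are illustrative assumptions; Fanar-Sadiq’s actual hybrid routing classifier is far more sophisticated and is not public.

```python
# Hypothetical sketch of query routing across specialist agents.
# Keyword matching stands in for Fanar-Sadiq's real hybrid classifier.

SPECIALISTS = {
    "zakat": "zakat_calculator",
    "inherit": "inheritance_distributor",
    "prayer": "prayer_times",
    "qibla": "qibla_direction",
    "hijri": "hijri_calendar",
    "hadith": "hadith_verifier",
    "verse": "quran_retriever",
    "supplication": "supplication_lookup",
}

def route(query: str) -> str:
    """Return the specialist agent for a query; fall back to Fiqh reasoning."""
    q = query.lower()
    for keyword, agent in SPECIALISTS.items():
        if keyword in q:
            return agent
    # No specialist keyword found: default to general jurisprudential reasoning.
    return "fiqh_reasoner"

print(route("How much zakat do I owe on my savings?"))  # -> zakat_calculator
print(route("Is this kind of contract permissible?"))   # -> fiqh_reasoner
```

In the production system this dispatch step is what the 90.1% routing-accuracy figure below measures: the classifier’s ability to pick the right specialist for a real user query.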
A dedicated Quranic text validation pipeline detects any Quranic content in generated responses and automatically replaces it with verified canonical verses, using pattern detection, fuzzy matching and reference verification to eliminate the risk of misquotation. This is a critical safeguard given the religious significance of precise Quranic text.
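The replace-with-canonical-text idea can be sketched as a fuzzy match against a verified verse store. This is an assumption-laden illustration, not the actual pipeline: the verse text here is an English rendering used as a stand-in, the threshold is arbitrary, and the real system adds pattern detection and reference verification on top.

```python
# Hypothetical sketch: if generated text closely matches a verified verse,
# substitute the canonical wording. Verse store and threshold are illustrative.
import difflib

CANONICAL_VERSES = {
    "1:1": "In the name of God, the Most Gracious, the Most Merciful",
}

def validate_quote(generated: str, threshold: float = 0.8) -> str:
    """Return the canonical verse if the text is a near-match, else unchanged."""
    for ref, verse in CANONICAL_VERSES.items():
        similarity = difflib.SequenceMatcher(
            None, generated.lower(), verse.lower()
        ).ratio()
        if similarity >= threshold:
            return verse  # replace with the verified canonical wording
    return generated  # no verse detected; leave the text unchanged

# A slightly misquoted verse is swapped for the verified text;
# unrelated text passes through untouched.
print(validate_quote("In the name of God the most gracious the most merciful"))
print(validate_quote("The weather in Doha is warm today."))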
The system’s hybrid routing classifier, tested against 705 real user queries from production logs, achieved 90.1% accuracy in correctly identifying query intent and directing it to the right specialist agent. According to the researchers, this outperforms standard large language model baselines and demonstrates reliable performance across the full range of Islamic query types.
Fanar-Sadiq addresses a well-documented failure mode of general-purpose AI: hallucination of religious sources. By separating retrieval, reasoning and validation into distinct processes, and requiring structured outputs that include rulings, evidence, explanations and citation tags, the system is designed to meet the standard of accuracy that users of Islamic knowledge platforms expect.
ZOOM OUT – The new research paper also hints at what is yet to come. Researchers expect Fanar 3.0 to move away from continual pre-training on an external model backbone and instead train a new architecture from scratch using a Mixture-of-Experts design. This could prove to be a more efficient approach that delivers greater model capacity without proportional increases in inference cost. The team also acknowledges that the quality-over-quantity strategy has limits, and that a much larger, systematically curated Arabic corpus spanning diverse dialects, domains and registers will be essential. Multi-turn safety is also flagged as a top priority, in order to ensure the model stays culturally and religiously aligned across extended conversations and resists gradual manipulation. QCRI’s stated ambition is a shift from a resource-efficient sovereign stack to a genuinely frontier Arabic AI platform.
[Written and edited with the assistance of AI]
LINKS
Fanar LLM (website)
Read more about Fanar LLM:
Qatar announces Fanar 2.0 Arabic AI model (Middle East AI News)
Qatar launches Fanar sovereign large language model (Middle East AI News)


