Inception & MBZUAI share Arabic AI Leaderboards Space
Leaderboards to measure accuracy, instruction following, and usability
#UAE #LLMs - G42’s applied research arm Inception, in collaboration with Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), has launched the Arabic Leaderboards Space on community platform Hugging Face, a dedicated evaluation hub for Arabic large language models (LLMs). Positioned as a go-to hub for developers to assess AI model performance, the space introduces two key leaderboards: AraGen, which evaluates generative Arabic AI models, and Arabic Instruction Following, which measures how well models respond to complex Arabic-language commands. The goal is to democratise AI evaluation tools, empowering researchers and developers to build more effective and reliable Arabic AI models with improved accuracy, usability, and instruction-following ability.
SO WHAT? - Arabic AI models have lacked robust evaluation benchmarks, making it difficult to measure performance and drive innovation. However, a variety of Arabic benchmarks, evaluation datasets and leaderboards have been released over the past year, giving developers tools to assess their models. The initiative by Inception and MBZUAI to aggregate evaluation benchmarks and leaderboards in one space is another solid step towards filling a critical gap in Arabic-centric AI development.
Here are some key points about the new Arabic Leaderboards Space:
Applied AI research lab Inception, in collaboration with Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), has released the Arabic Leaderboards Space on Hugging Face, a space that offers a number of leaderboards for benchmarking Arabic large language models (LLMs).
The Arabic Leaderboards Space features two main leaderboards:
AraGen Leaderboard, which evaluates generative Arabic models based on accuracy, coherence, and usability.
Arabic Instruction Following Leaderboard, which assesses how well AI models follow complex Arabic instructions.
Launched last December, AraGen introduced the 3C3H Measure, a new evaluation metric for natural language generation (NLG) in Arabic.
The platform includes the Arabic IFEval dataset, designed to test Arabic-specific linguistic features, such as diacritization and contextual understanding (see the illustrative sketch after this list).
Meanwhile, the 3C3H-HeatMap tool offers a detailed breakdown of model strengths and weaknesses, improving AI model transparency.
Inception and MBZUAI plan to add more benchmarks in the future, including visual question-answering and other real-world AI applications.
This initiative aims to democratise AI evaluation tools, making benchmarking resources widely available to researchers and developers.
The developers hope that by establishing standardised Arabic AI benchmarks, the Arabic Leaderboards Space will support greater AI adoption across government, business, and academia.
Researchers on this project include: Ali El Filali (Inception); Neha Sengupta (Inception); Arwa Abouelseoud (Inception); Sarah Albarri (Inception); and Preslav Nakov (MBZUAI).
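For readers curious how a "verifiable" Arabic instruction might be checked automatically, below is a minimal Python sketch. It is not the official Arabic IFEval code; the regular expressions, threshold, and function names are illustrative assumptions showing how one Arabic-specific constraint (diacritization) could be scored programmatically.

```python
# Illustrative sketch only: NOT the official Arabic IFEval implementation.
# It shows how one "verifiable" Arabic instruction ("diacritize your answer")
# could be checked programmatically. The threshold and function names are
# assumptions made for demonstration.
import re

# Arabic diacritic marks (harakat) occupy the Unicode range U+064B-U+0652;
# base Arabic letters fall roughly in U+0621-U+064A.
DIACRITICS = re.compile(r"[\u064B-\u0652]")
ARABIC_LETTERS = re.compile(r"[\u0621-\u064A]")

def diacritization_ratio(text: str) -> float:
    """Rough ratio of diacritic marks to Arabic letters in a response."""
    letters = len(ARABIC_LETTERS.findall(text))
    marks = len(DIACRITICS.findall(text))
    return marks / letters if letters else 0.0

def follows_diacritization_instruction(response: str, threshold: float = 0.5) -> bool:
    """Hypothetical pass/fail check: did the model diacritize most of its answer?"""
    return diacritization_ratio(response) >= threshold

# A densely diacritized sentence passes; the bare version does not.
print(follows_diacritization_instruction("كَتَبَ الوَلَدُ الدَّرْسَ"))  # True
print(follows_diacritization_instruction("كتب الولد الدرس"))            # False
```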
ZOOM OUT - While Arabic AI development is now accelerating rapidly, Arabic AI models have lacked dedicated evaluation benchmarks, unlike English and Chinese LLMs, which benefit from well-established performance metrics. However, recent developments have significantly expanded Arabic AI benchmarking efforts. The Open Arabic LLM Leaderboard (OALL), launched on Hugging Face by 2A2I and the Technology Innovation Institute (TII), provides a community-driven platform for evaluating Arabic LLMs. Meanwhile, MBZUAI’s CAMEL-Bench has introduced a multimodal Arabic AI benchmark, covering areas such as OCR, medical imaging, and remote sensing, revealing performance gaps in even top-tier models.
In addition, the launch of AraGen (now found in the new Arabic Leaderboards Space) by Inception, MBZUAI, and Hugging Face in December 2024 offers a holistic approach to evaluation. AraGen’s 3C3H Measure introduced a comprehensive framework that evaluates six dimensions: Correctness, Completeness, Conciseness, Helpfulness, Honesty, and Harmlessness.
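As a rough illustration of how such a multi-dimensional rubric can collapse into a single leaderboard number, the sketch below assumes binary Correctness and Completeness verdicts, 0-5 judge scores for the other four dimensions, and a plain unweighted average; the actual AraGen scoring pipeline may normalise, weight, or gate the dimensions differently.

```python
# Minimal illustrative sketch of a 3C3H-style composite score.
# Assumptions (not the official AraGen scoring pipeline): Correctness and
# Completeness are binary judge verdicts, the remaining four dimensions are
# judged on a 0-5 scale, and all six are normalized to [0, 1] and averaged.
from dataclasses import dataclass

@dataclass
class JudgeScores:
    correct: bool    # Correctness
    complete: bool   # Completeness
    concise: int     # Conciseness, 0-5
    helpful: int     # Helpfulness, 0-5
    honest: int      # Honesty, 0-5
    harmless: int    # Harmlessness, 0-5

def composite_3c3h(s: JudgeScores) -> float:
    """Average the six normalized dimensions into one score in [0, 1]."""
    dims = [
        float(s.correct),
        float(s.complete),
        s.concise / 5,
        s.helpful / 5,
        s.honest / 5,
        s.harmless / 5,
    ]
    return sum(dims) / len(dims)

# Example: a correct, complete answer with mixed style scores.
print(round(composite_3c3h(JudgeScores(True, True, 4, 5, 5, 3)), 3))  # 0.9
```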
[Written and edited with the assistance of AI]
LINKS
Arabic Leaderboards Space (Hugging Face)
Read more about Arabic LLM leaderboards:
New benchmark challenges inclusivity of global models (Middle East AI News)
Inception & MBZUAI launch new Arabic LLM leaderboard (Middle East AI News)
MBZUAI launches multimodal Arabic AI benchmark (Middle East AI News)
Hugging Face brings new Open Arabic LLM Leaderboard (Middle East AI News)