Abu Dhabi's TII releases new Arabic STEM AI benchmark
New TII benchmark addresses critical Arabic language gaps in STEM and coding
#UAE #ArtificialIntelligence - Technology Innovation Institute (TII), the global applied research centre of the Advanced Technology Research Council (ATRC), has released 3LM (علم), a comprehensive benchmark designed to evaluate Arabic large language models (LLMs) on STEM reasoning and code generation capabilities. The open-source benchmark addresses a critical gap in Arabic natural language processing by providing structured evaluation tools for science and coding applications. The evaluation suite includes 865 native Arabic multiple-choice questions extracted from real educational materials, 1,744 synthetically generated STEM questions, and fully translated coding benchmarks covering mathematics, physics, chemistry, biology and programming domains.
SO WHAT? - Despite 380 million Arabic speakers worldwide, the language remains underrepresented in artificial intelligence development. There has been a recent surge in the development of proprietary and open-source Arabic GenAI models; however, existing Arabic benchmarks focus primarily on cultural or linguistic content rather than technical domains. The 3LM benchmark addresses this critical shortage by providing the first native, scientifically grounded evaluation framework for Arabic AI models in STEM fields, potentially incentivising developers to prioritise quality Arabic language model development.
Here are some key points about the new 3LM Arabic benchmark:
Technology Innovation Institute (TII) researchers have developed and released 3LM (علم), a comprehensive Arabic benchmark specifically targeting STEM reasoning and code generation evaluation for large language models (LLMs).
The benchmark suite comprises three components:
Native STEM: 865 native Arabic multiple-choice questions from real textbooks.
Synthetic STEM: 1,744 synthetically generated STEM questions.
Arabic code benchmarks: the HumanEval+ and MBPP+ coding assessments, fully translated into Arabic and verified.
The research team evaluated over 40 state-of-the-art models, including Arabic-specific, multilingual and bilingual systems, to establish comprehensive performance baselines across scientific domains.
Native STEM questions span mathematics, physics, chemistry, biology and general science, sourced from Arabic educational materials across multiple countries and regions.
Code generation benchmarks include carefully translated versions of established HumanEval+ and MBPP+ assessments with human-in-the-loop validation processes ensuring translation quality.
All three benchmarks are released as fully open-source resources with complete datasets, evaluation code and documentation to support reproducible research.
The evaluation revealed insights into cross-task correlations, robustness testing and relationships between different cognitive capabilities in Arabic language models.
The 3LM research team consists of Basma El Amel Boussaha, Leen AlQadi, Mugariya Farooq, Shaikha Alsuwaidi, Giulia Campesan, Ahmed Alzubaidi, Mohammed Alyafeai, and Hakim Hacid.
Technology Innovation Institute serves as the applied research pillar of Abu Dhabi's Advanced Technology Research Council (ATRC), driving the UAE’s research and development strategy.
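For readers curious how multiple-choice components such as the Native and Synthetic STEM sets are commonly scored, the sketch below computes per-subject accuracy. It is an illustration only and not the 3LM evaluation code: the function name `accuracy_by_subject`, the dictionary keys `subject` and `answer`, the choice labels, and the toy items are all assumptions invented for this example.

```python
from collections import defaultdict


def accuracy_by_subject(questions, predictions):
    """Return {subject: fraction of correct answers}.

    `questions` is a list of dicts with the hypothetical keys "subject"
    and "answer"; `predictions` holds the model's chosen labels in the
    same order. This schema is an assumption, not the 3LM format.
    """
    if len(questions) != len(predictions):
        raise ValueError("questions and predictions must have the same length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for question, predicted in zip(questions, predictions):
        subject = question["subject"]
        total[subject] += 1
        if predicted == question["answer"]:
            correct[subject] += 1
    return {subject: correct[subject] / total[subject] for subject in total}


# Toy items invented for illustration (not drawn from 3LM).
toy_questions = [
    {"subject": "physics", "answer": "A"},
    {"subject": "physics", "answer": "C"},
    {"subject": "biology", "answer": "B"},
]
toy_predictions = ["A", "B", "B"]

scores = accuracy_by_subject(toy_questions, toy_predictions)
print(scores)
```

Reporting accuracy per subject rather than as a single number makes it easier to see whether a model is weaker in, say, physics than in biology.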
ZOOM OUT - Last year Technology Innovation Institute and open-source platform Hugging Face supported the development of the Open Arabic LLM Leaderboard (OALL) by a group of AI researchers from the community-driven Arabic AI Initiative (2A2I). The leaderboard provides a platform specifically for evaluating and comparing the performance of Arabic large language models (LLMs), thus promoting research and development in Arabic natural language processing (NLP).
[Written and edited with the assistance of AI]
LINKS
3LM Arabic benchmark data (Hugging Face)
3LM Arabic benchmark code (GitHub)
Read more about Technology Innovation Institute’s LLM research
Falcon 3 LLM series gets first Arabic model (Middle East AI News)
TII releases Falcon-Edge 1.58bit language models (Middle East AI News)
TII launches most powerful SLMs under 13B parameters (Middle East AI News)
TII launches Falcon's first SSLM (Middle East AI News)
TII debuts multimodal Falcon 2 Series (Middle East AI News)