Hugging Face introduces new Open Arabic LLM Leaderboard
New leaderboard evaluates and compares the performance of Arabic LLMs
#Arabic #LLMs - After months of planning and development, the Open Arabic LLM Leaderboard (OALL) has been released on Hugging Face, the data science platform and open-source AI community that strives to advance and democratise artificial intelligence. The new leaderboard provides a platform specifically for evaluating and comparing the performance of Arabic large language models (LLMs), thereby promoting research and development in Arabic natural language processing (NLP). The OALL was developed by a group of AI researchers from the community-driven Arabic AI Initiative (2A2I), with the support of the Abu Dhabi-based Technology Innovation Institute (TII) and Hugging Face.
SO WHAT? - The public release of OpenAI's ChatGPT in late 2022 inspired a huge upswing in large language model development across the world, but the majority of the best-funded projects have developed models centred on English-language content and usage. Evaluation and benchmarking have therefore also been heavily skewed towards English-language models.
Development of Arabic models in particular has been limited, due to the scarcity of Arabic-language content, the fragmented distribution of top Arabic data science developers, and the lack of community resources. The new Open Arabic LLM Leaderboard focuses on evaluating and benchmarking Arabic models that prioritise Arabic language, culture and heritage. In doing so, the leaderboard will help developers see how their Arabic models compare with competitors and raise the overall level of understanding of Arabic LLMs.
Some key details about Hugging Face and the new Open Arabic LLM Leaderboard:
The Open Arabic LLM Leaderboard (OALL) went live on the Hugging Face community website this week to serve as a key platform for benchmarking Arabic large language models (LLMs), helping to address the resource gap for Arabic NLP (natural language processing).
The new leaderboard was developed by a team of researchers and developers from the community-driven Arabic AI Initiative (2A2I) and the Technology Innovation Institute (TII), together with Hugging Face.
The OALL has been designed to address the growing need for specialised benchmarks for Arabic language processing. The platform specifically focuses on evaluating and comparing the performance of Arabic large language models.
The leaderboard will empower Arabic LLM developers to accurately evaluate and improve their Arabic models. It will play a key role in helping to evaluate the nuances of the Arabic language, culture and heritage present in LLMs.
The Open Arabic LLM Leaderboard is built around Hugging Face's LightEval, a lightweight framework designed to streamline the evaluation process.
To ensure comprehensive model evaluation, the OALL draws on an extensive and diverse collection of datasets, including the TII-supported AlGhafa benchmark with 11 native Arabic datasets, and the ACVA benchmark, which features 58 datasets introduced in the AceGPT paper by FreedomIntelligence. Additionally, translated versions of MMLU, EXAMS and other benchmarks were used from AceGPT (50 datasets) and from TII (including 11 translated datasets).
The leaderboard primarily uses 'normalised log likelihood accuracy' for all evaluation tasks (see the sketch after this list). This metric was chosen for its ability to provide a clear and fair measurement of model performance across different types of questions.
Hugging Face provides open-source libraries containing pre-trained models, an AI community and evaluation platforms.
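To make the evaluation metric concrete, here is a minimal Python sketch of how length-normalised log-likelihood accuracy is commonly computed for multiple-choice tasks: the model scores each answer choice, the score is divided by the choice's length, and the highest-scoring choice is compared with the gold answer. This is a generic illustration, not LightEval's or the OALL's actual code; the function names and example data are hypothetical.

def normalised_loglikelihood_accuracy(examples, loglikelihood_fn):
    # Hypothetical sketch: for each example, score every answer choice with the
    # model, divide the summed log-probability by the choice's length, pick the
    # highest-scoring choice, and report the fraction that matches the gold answer.
    # `loglikelihood_fn(context, continuation)` is assumed to return the total
    # log-probability a model assigns to `continuation` given `context`.
    correct = 0
    for ex in examples:
        scores = []
        for choice in ex["choices"]:
            logprob = loglikelihood_fn(ex["question"], choice)
            scores.append(logprob / max(len(choice), 1))  # length normalisation
        predicted = scores.index(max(scores))
        correct += int(predicted == ex["gold"])
    return correct / len(examples)

# Toy usage with a stand-in scoring function (a real run would query an LLM).
def fake_loglikelihood(context, continuation):
    # Placeholder, not a real model: pretend the model strongly prefers "4".
    return 0.0 if continuation == "4" else -5.0 * len(continuation)

examples = [
    {"question": "2 + 2 = ?", "choices": ["3", "4", "22"], "gold": 1},
]
print(normalised_loglikelihood_accuracy(examples, fake_loglikelihood))  # 1.0

The length normalisation step matters because, without it, a model would tend to favour shorter answer choices simply because they accumulate fewer per-token log-probability penalties.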
ZOOM OUT - The Arabic-speaking world has long suffered from the limited volume of Arabic-language content, applications and development tools. Whilst there are some 380 million Arabic speakers around the world, Arabic digital content remains scarce, and LLM developers struggle to obtain the volume of quality Arabic-language content that they need for pre-training.
The fast rise of generative AI and the dominance of the English language in these new technologies risk leaving Arabic digital development far behind, as software developers and data scientists are deprived of high-quality Arabic platforms. For this reason, having a dedicated platform for the evaluation and comparison of Arabic language models could provide a huge incentive for developers to overcome challenges and focus on quality.
IMO - It is still early days for Arabic large language model development. We have already seen some notable new Arabic models developed over the past year or so, but both developers and end-users have found it difficult to compare and evaluate them.
Following the example of trailblazers such as G42's Jais Arabic and AceGPT Arabic, we can expect to see more Arabic models developed during the next 12-18 months. The Open Arabic LLM Leaderboard should provide an invaluable platform to list, evaluate and compare all models within one framework. Although any leaderboard has limitations in terms of what it evaluates, the layer of transparency it provides can only be good for investors, researchers, developers and end-users.
LINKS
Hugging Face Open Arabic LLM Leaderboard (Hugging Face)
Introducing the Open Arabic LLM Leaderboard (Hugging Face blog)
Introducing the Open Arabic LLM Leaderboard (TII, LinkedIn)
Read more about Arabic large language models:
First LLM trained exclusively on Saudi data sets (Middle East AI News)
Will GenAI champion the Arabic language? (Middle East AI News)