Arabic LLM index launched at GAIN

New Saudi-developed Balsam Index to measure Arabic LLMs

Sep 13, 2024

Balsam Index launch panel discussion at GAIN (Image credit: SPA)

#Saudi #LLMs - King Salman Global Academy for Arabic Language (KSGAAL) and Saudi Data and Artificial Intelligence Authority (SDAIA) have launched the Balsam Index, an Arabic language index to evaluate and benchmark AI models. Developed by the academy’s Arabic Intelligence Centre, the index provides a set of specialised data that can be used to assess large language models (LLMs) and a variety of other natural language processing (NLP) models. The evaluation model already has 50,000 questions that can be used to test Arabic models.

In its current form, the evaluation model tests the Arabic language text produced by AI models, but Balsam could be developed further to test Arabic AI voice pronunciation and even the sign language used by the deaf and hard of hearing.

SO WHAT? - A growing number of Arabic-centric large language models are in development across the Arab world and beyond, and all of them face challenges that are specific to Arabic language development. Developers of natural language processing systems for Arabic must deal with a wide variety of linguistic differences across the formal and colloquial forms of Arabic. There is also a well-known shortage of Arabic digital data, in particular for specialised domains and colloquial usage.

Therefore, improvements to Arabic AI models have to be pursued across many different factors, depending on how much quality data can be acquired to enhance each particular facet of performance. The upshot is that different Arabic-centric models improve in different ways, and many important improvements may be nuanced and therefore difficult to measure. Benchmarks that measure the quality and performance of Arabic LLMs will be vital if Arabic models are to offer utility and performance comparable to English models.

Here are the key points about the launch of the Balsam Index:

King Salman Global Academy for Arabic Language (KSGAAL) and Saudi Data and Artificial Intelligence Authority (SDAIA) have announced the Balsam Index to evaluate and measure Arabic large language models (LLMs).
The Balsam Index launch took place on the third day of the Global AI Summit (GAIN) in Riyadh.
The new index is part of SDAIA's and its strategic partners' efforts to develop and improve advanced Arabic LLMs, and contribute to the development of global standards for measuring the performance of models executing tasks in Arabic.
The Balsam Index includes more than 1,400 data sets, consisting of 50,000 test questions, covering 67 diverse tasks, including grammar and spell-checking, rephrasing, cause and effect classification, and text understanding.
SDAIA and KSGAAL also created an AI dictionary and glossary. The glossary compiles the most important technical terms related to data and AI in both Arabic and English.
In addition to SDAIA and KSGAAL, a variety of partners contributed to the development of Balsam, including aiXplain, King Abdulaziz University, King Saud University, NYU Abu Dhabi, Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), Qatar Computing Research Institute (QCRI), Qatar University and University of Bisha.
The Balsam project also aims to strengthen research collaboration between research groups globally in the field of Arabic AI.
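
To illustrate why a benchmark that spans many tasks is useful when models improve in different ways, here is a minimal Python sketch of per-task scoring. It is not Balsam's scoring method, which the announcement does not describe. The function names, the task labels and the sample results are all hypothetical and made up for illustration.

```python
from collections import defaultdict


def per_task_accuracy(results):
    """Return accuracy per task.

    `results` is an iterable of (task_name, is_correct) pairs,
    one pair per test question. Hypothetical format, not Balsam's.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for task, is_correct in results:
        totals[task] += 1
        correct[task] += int(is_correct)
    return {task: correct[task] / totals[task] for task in totals}


def macro_average(task_scores):
    """Unweighted mean across tasks, so tasks with many questions do not dominate."""
    return sum(task_scores.values()) / len(task_scores)


# Made-up results for illustration only; these are not Balsam data.
sample = [
    ("grammar_check", True), ("grammar_check", False),
    ("rephrasing", True), ("rephrasing", True),
    ("text_understanding", False), ("text_understanding", True),
    ("text_understanding", True), ("text_understanding", True),
]

scores = per_task_accuracy(sample)
overall = macro_average(scores)

for task, score in sorted(scores.items()):
    print(f"{task}: {score:.2f}")
print(f"macro average: {overall:.2f}")
```

Reporting a score per task alongside a single overall figure is one way to make uneven progress visible: a model can be strong at rephrasing and weak at grammar checking, and an overall number on its own would hide that.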

Middle East AI News
