#Egypt #LLMs - Huawei Cloud became the first global cloud services company to launch a public cloud in Egypt, with an announcement made at the Huawei Cloud Egypt Summit in Cairo. The Chinese technology giant also announced a new Huawei-developed Arabic large language model (LLM) with Arabic speech recognition. Trained with Modern Standard Arabic language (fusha) data, the 100 billion parameter multi-modal model is the largest Arabic language-focused LLM to be announced to date.
SO WHAT? - Huawei's new Arabic LLM is the first specialised Arabic language AI model announced by a global technology vendor. The Arabic LLM is a 100 billion parameter model based on PanGu, a large language model originally trained on Chinese language data by Huawei and Recurrent AI. Although the model is not open-source, it will be made available to customers via Huawei Cloud services.
The number of large, high-quality Arabic language-capable LLMs has been very limited, so even the announcement of one new model such as this could have a significant impact.
Here are a few details announced by Huawei:
The PanGu 100+ billion parameter Arabic large language model was pre-trained on Modern Standard Arabic data sets and so can be used across the Arab world.
The model is based on the PanGu large language model, originally developed by Huawei and Recurrent AI, and announced in 2021 as the world’s largest pre-trained Chinese language LLM.
The PanGu Arabic 100B has Arabic automatic speech recognition (ASR), which has shown 96 percent accuracy in tests, according to Huawei.
The new Arabic LLM has been trained on Arabic data from across the Arab world to allow it to understand and reference data about local culture, history, customs and other region-specific knowledge. It was also trained on industry data including oil and gas, and financial services.
Huawei's Arabic PanGu model will allow enterprise customers to more easily create their own models in local Arabic dialects and for a variety of industry domains, such as education and banking. Huawei Cloud Stack makes it easy for customers to deploy large AI models on-premises.
The LLM announcement was made during the launch of Huawei Cloud's Cairo region, which covers 28 African countries. It is the cloud services company's 33rd region globally, and offers AI platforms, data platforms, and development platforms.
The Arabic LLM will be provided as a service via Huawei Cloud's new data centre in Cairo.
ZOOM OUT - High-performance large language models with a high-quality Arabic language capability have been slow to arrive, compared with model development by global technology companies focused on Western languages. Abu Dhabi-based G42 group open-sourced its 13 billion parameter Jais Arabic-capable large language model in August last year, followed by Jais 30B in November. Earlier this year a collective of Arab and Chinese researchers open-sourced AceGPT, as 7 billion and 13 billion parameter models. Other specialised Arabic models are proprietary, built to serve the particular needs of the developer.
Meanwhile, global models (notably Meta’s Llama) offer Arabic language capabilities but lack the cultural context, Arabic dialects and Arabic language nuances that specialised Arabic models can offer. The Huawei PanGu Arabic LLM arrives while the field is still quite open, although more specialised Arabic language models are now in development.
IMO - Huawei’s launch of its new 100B+ LLM as a cloud service will certainly attract the interest of enterprise customers in the region. However, many questions remain. The model is being made available via the Huawei Cloud Cairo region, which has been built to serve Africa. This being the case, the likely first users of the PanGu Arabic language LLM are organisations in Egypt, plus the countries in the Maghreb, which have very different Arabic dialects to Egypt. This model will also be available for Huawei customers to customise and build on, but details of licensing have not yet been shared publicly.
Read more about Arabic large language models:
SDAIA's Arabic LLM now live on watsonx (Middle East AI News)
New Hugging Face Open Arabic LLM Leaderboard (Middle East AI News)
First LLM trained exclusively on Saudi data sets (Middle East AI News)
Will GenAI champion the Arabic language? (Middle East AI News)