AtlasIA releases smarter, faster Moroccan darija AI models
AtlasIA's open-source Terjman v2 LLM family bridges critical language gap
#Morocco #LLMs - AtlasIA, a non-profit community focused on developing Moroccan-centric AI, has released a new open-source family of Terjman large language models (LLMs). The four models are trained to translate between English and the Moroccan Arabic dialect (darija) and improve on the Terjman version one models released in May and June 2024. The new Terjman v2 models are trained on larger, better datasets and provide improved efficiency and accuracy. All the latest models are available via open-source community Hugging Face, together with a chat preview window.
SO WHAT? - Although Modern Standard Arabic (MSA) is relatively well-documented, with a large volume of data available to train LLMs, much less data is available for regional Arabic dialects. As a result, AI models trained on Arabic language data can provide translations and generate text for written use, but they struggle with spoken or colloquial Arabic as used in chat or messaging. The ability to understand and translate Arabic dialects, or darija, is particularly important for models supporting call centres, CRM (customer relationship management) and customer service. AtlasIA is focused on overcoming the challenge of scarce data resources for Moroccan darija and building accurate, high-performance models.
AtlasIA, a non-profit community focused on developing Moroccan-centric AI, has released new versions of its open-source family of Terjman large language models. The models are trained to translate between English and the Moroccan Arabic dialect (also known as darija).
Four new Moroccan Arabic models have been released via open-source AI community Hugging Face:
Terjman v2-Nano (77 million parameters)
Terjman v2-Large (240 million parameters)
Terjman v2-Ultra (1.3 billion parameters)
Terjman v2-Supreme (3.3 billion parameters)
The new v2 models use a powerful transformer architecture, have been trained on a larger, more refined dataset and have been optimised to improve translation performance.
According to AtlasIA’s evaluation using the TerjamaBench benchmark, the Terjman v2 model achieves results on par with OpenAI’s GPT-4o (gpt-4o-2024-08-06).
Meanwhile, the new 77 million parameter Terjman v2-Nano outperforms the 1.3 billion parameter Terjman v1 Ultra (2024) in the TerjamaBench evaluation, a result that AtlasIA credits to improvements in both data and optimisation.
AtlasIA has also released a chat demo to showcase the abilities of all four Terjman v2 open-source models (see LINKS below).
AtlasIA released its first family of Terjman large language models in May and June 2024.
AtlasIA is also working on Atlas-Chat together with MBZUAI France Lab, the Paris-based lab of Mohamed bin Zayed University of Artificial Intelligence (MBZUAI). The collaboration released its first open-source model in September.
ZOOM OUT - AtlasIA has also worked together with MBZUAI France Lab to develop the Atlas-Chat family of large language models. The first Atlas-Chat models were released last September, and AtlasIA and MBZUAI are working on new versions for 2025. Although a number of projects from various developers aim to make AI more accessible for darija speakers, many challenges remain. The Arabic language is considered by developers to be a low-resource language, and training data resources for local dialects are even scarcer. Meanwhile, there are an estimated 40 million speakers of Moroccan colloquial Arabic alone.
[ 100% human written and edited ]
LINKS
Terjman v2-Nano 77M (Hugging Face)
Terjman v2-Large 240M (Hugging Face)
Terjman v2-Ultra 1.3B (Hugging Face)
Terjman v2-Supreme 3.3B (Hugging Face)
Terjman v2 chat demo (Hugging Face)
Read more about North African Arabic language LLM development:
Atlas-Chat 9B demo goes live (Middle East AI News)
MBZUAI-led research team builds Moroccan AI models (Middle East AI News)
Algerian AI researchers crowdsource local language data (Middle East AI News)
Huawei reveals 100B Arabic LLM (Middle East AI News)
SDAIA's Arabic LLM now live on watsonx (Middle East AI News)
Hugging Face introduces Open Arabic LLM Leaderboard (Middle East AI News)