GSMA releases telecom AI benchmarks ahead of MWC 2025
New telecom benchmarks test LLMs for real-world performance and sustainability
#Spain #UAE #MWC25 - Global telecom industry association GSMA has launched the first benchmarking framework for real-world telecom use cases, to assess the performance of large language models (LLMs) in handling telecom-specific knowledge. As part of a new GSMA Open-Telco LLM Benchmarks initiative, the evaluation framework was developed by open-source community platform Hugging Face, Abu Dhabi’s Khalifa University and The Linux Foundation, in collaboration with global operators and vendors. The standardised AI benchmarking framework uses four benchmark tests to assess LLMs across a variety of criteria.
SO WHAT? - This benchmarking initiative addresses a critical gap in the rapidly evolving AI landscape. The telecom industry is investing billions of dollars in artificial intelligence, but current Generative AI models show significant deficiencies when handling telecommunications knowledge. Today’s models struggle with telecoms jargon and standards, lack specialised telecom knowledge, and cannot effectively support telecom decision makers, engineers and researchers. In real-world scenarios, this means that current AI models cannot yet be used by telecom operators to troubleshoot telecom network or service issues.
GSMA Foundry, the innovation hub of the global telecom industry association GSMA, has launched GSMA Open-Telco LLM Benchmarks, the first open-source community and benchmarking framework for the telecom industry. The announcement was made a few days ahead of the world’s largest and most influential connectivity event, the Mobile World Congress.
Recent tests have exposed significant limitations in AI models' telecom knowledge, with GPT-4 scoring less than 75% on TeleQnA and below 40% on 3GPP standards documentation classification tasks.
GSMA Open-Telco LLM Benchmarks will provide transparent, open evaluations of AI models across capabilities, energy efficiency, and safety parameters specifically tailored to telecom applications.
Meanwhile, the new community enables mobile network operators, AI researchers, and developers to submit use cases, datasets, and models for evaluation against real-world telecom challenges.
The evaluation framework, which consists of four specific benchmark tests (TeleQnA, 3GPPTdocs Classification, MATH500, and FOLIO), was developed by open-source community platform Hugging Face, Abu Dhabi’s Khalifa University and The Linux Foundation. Each test focuses on a different aspect of AI performance in telecom.
Other launch partners include technology companies and mobile operators such as Deutsche Telekom, LG Uplus, SK Telecom, Turkcell, and Huawei.
The four specific datasets for benchmark tests are:
TeleQnA – Telecom domain knowledge & technical understanding
3GPPTdocs Classification – Telecom standards comprehension & documentation parsing
MATH500 – Mathematical reasoning & modeling
FOLIO – Logic & reasoning
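To make the benchmarking idea concrete, the sketch below scores a model on TeleQnA-style multiple-choice questions. The question format, the sample items and the toy "model" are illustrative assumptions for this article, not the official GSMA evaluation harness or the real TeleQnA dataset.

```python
# Hedged sketch: accuracy scoring on TeleQnA-style multiple-choice questions.
# The items and the stand-in "model" below are invented for illustration.

def score_mcq(model, questions):
    """Return the fraction of questions the model answers correctly."""
    correct = 0
    for q in questions:
        if model(q["question"], q["options"]) == q["answer"]:
            correct += 1
    return correct / len(questions)

# Toy TeleQnA-style items (assumed format: question, options, answer).
sample_questions = [
    {"question": "Which 3GPP release first specified 5G NR?",
     "options": ["Release 13", "Release 15", "Release 17"],
     "answer": "Release 15"},
    {"question": "What does RAN stand for?",
     "options": ["Radio Access Network", "Remote Area Node"],
     "answer": "Radio Access Network"},
]

# Naive baseline that always picks the first option.
baseline = lambda question, options: options[0]

print(f"accuracy: {score_mcq(baseline, sample_questions):.2f}")  # → accuracy: 0.50
```

In the real initiative, the submitted LLM replaces the naive baseline and the published datasets replace the toy items; the reported leaderboard score is this same accuracy idea computed at scale.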
The initiative builds upon last year's industry commitment to exploring ethical and sustainable telco AI use cases, including the GSMA's Responsible AI Maturity Roadmap.
The initiative also introduces three new telecom-specific AI models:
TelBench (developed by SK Telecom) evaluates technical queries.
Telco-RAG is built for retrieval-augmented AI tasks for telecom.
TelecomGPT (developed by the 6G Centre of Khalifa University) is a first-of-its-kind telecom LLM.
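Telco-RAG follows the general retrieval-augmented generation pattern: fetch the most relevant standards snippet first, then hand it to the model as context. The minimal sketch below illustrates that pattern only; the corpus and the keyword-overlap scoring are toy assumptions, not Telco-RAG's actual pipeline.

```python
# Minimal sketch of the retrieval-augmented pattern behind tools like Telco-RAG.
# Corpus contents and word-overlap retrieval are illustrative assumptions.

def retrieve(query, corpus):
    """Pick the document sharing the most words with the query."""
    q_words = set(query.lower().split())
    return max(corpus, key=lambda doc: len(q_words & set(doc.lower().split())))

# Toy "standards corpus" (two invented one-line summaries).
corpus = [
    "3GPP TS 38.300 describes the overall NR radio access architecture.",
    "3GPP TS 23.501 defines the 5G system architecture and core functions.",
]

question = "Which spec defines the 5g core system architecture?"
context = retrieve(question, corpus)

# The retrieved snippet is prepended to the question before calling the LLM.
prompt = f"Context: {context}\nQuestion: {question}"
print(prompt)
```

Production systems replace the word-overlap step with embedding-based vector search over the full 3GPP document set, but the shape of the pipeline (retrieve, then prompt) is the same.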
Khalifa University’s work on the new GSMA benchmark initiative follows earlier work by the university, the Huawei Paris Research Centre and the GSMA to develop the first open leaderboard for telecom-focused large language models, announced at the 6G Summit in Abu Dhabi last year.
Mobile network operators, vendors, startups, and researchers can now contribute by submitting interest and LLM telecoms use cases to aiusecase(at)gsma(dot)com.
ZOOM OUT - AI is expected to have a profound effect on the development, implementation and management of telecom networks. With organisations around the world currently working on the development of standards and technologies for 6G (sixth generation mobile networks), AI is expected to revolutionise mobile communications. Beyond conventional deep learning models, LLMs and other transformer models are likely to facilitate more sophisticated and adaptive communication protocols, boosting potential network efficiency, resilience and intelligence. So, the ability to develop reliable, domain-specific models for the sector is vital to the successful development of 6G.
LINKS
GSMA Open-Telco LLM Benchmarks (Hugging Face)
Read more about telecom AI Models
New leaderboard to support development of telecom LLMs (Middle East AI News)
Testing phase begins for TelecomGPT (Middle East AI News)
Abu Dhabi researchers create first-of-its-kind telecom LLM (Middle East AI News)