Khalifa University announces new telecom AI model benchmarks
Khalifa University contributes to global telecoms LLM standards
#UAE #telecom - The GSMA Foundry community of global telecom industry association GSMA has launched GSMA Open-Telco LLM Benchmarks 2.0 on the Hugging Face AI community platform. Announced this week by Abu Dhabi research university Khalifa University, the new benchmark evaluates large language model performance on real-world telecommunications use cases including network configuration, troubleshooting and standards interpretation. The initiative involves 15 mobile network operators including AT&T, China Telecom, Deutsche Telekom, Orange, Telefónica, Vodafone and du.
The benchmarks assess AI models on telecom-specific knowledge, including 5G core network configuration generation, root-cause analysis from network logs, mathematical reasoning for engineering calculations, and standards comprehension. Results show general-purpose frontier models including GPT-5, Grok-4-fast and Claude Sonnet 4.5 lead overall performance, whilst domain-tuned models demonstrate competitive results on specialised tasks.
SO WHAT? - The framework addresses a critical gap in evaluating whether AI models possess the deep domain expertise required for mission-critical network operations, where errors can cost time, money and service quality. Although the telecom sector is investing billions of dollars in AI, generative AI models still fall short on telecommunications knowledge. Systematic, industry-wide benchmarks can enable telecommunications companies to make better, evidence-based decisions about deploying AI for network automation rather than relying on vendor claims or generic model capabilities.
Here are some key points about the new telco LLM benchmarks:
The GSMA Foundry community of global telecom industry association GSMA has launched GSMA Open-Telco LLM Benchmarks 2.0 on the Hugging Face AI community platform. This follows the initial release of the benchmark in February.
Announced this week by Abu Dhabi research university Khalifa University, the new benchmark evaluates large language model performance on real-world telecommunications use cases. It includes 34 use cases submitted by telecom operators, spanning eight strategic domains including RAN optimisation, forecasting, customer support and knowledge retrieval.
The benchmark uses five complementary dimensions to assess AI models: TeleYAML (intent generation), TeleLogs (network troubleshooting), TeleMath (mathematical reasoning), 3GPP-TSG (standards comprehension), and TeleQnA (domain question answering).
Fifteen mobile network operators were involved in the Open-Telco LLM development project, including AT&T, China Telecom, Deutsche Telekom, Orange, Telefónica, Vodafone and the UAE’s du. There was also participation from research institutions including University of Texas, Queen’s University and Universitat Pompeu Fabra, plus technology companies including The Linux Foundation and Huawei GTS.
The Open-Telco LLM initiative falls under the GSMA Foundry’s Network Management & Configuration track, which is co-led by Khalifa University’s 6G Research Centre. The group developed the TeleYAML dataset, the intent-to-configuration benchmark that tests how well models translate operator intents into standards-aligned YAML for 5G Core functions, subscriber provisioning, and network slicing.
The two dedicated working groups under the Open-Telco LLM initiative focus on network management and configuration, co-led by Khalifa University, and network troubleshooting, co-led by US telecommunications operator AT&T and Chinese technology company Huawei.
The TeleLogs dataset assesses root-cause analysis capabilities using synthetic yet realistic datasets seeded from real network traces, measuring models’ ability to interpret complex telemetry data, correlate symptoms with underlying causes and support autonomous decision-making during network incidents.
The TeleQnA dataset provides 10,000 multiple-choice questions covering telecommunications terminology, research trends and technical details from standards bodies including IEEE and 3GPP. TeleMath evaluates quantitative reasoning through 500 expert-curated telecom-specific mathematical problems.
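As an illustration of how a multiple-choice benchmark such as TeleQnA can be scored, the sketch below computes simple accuracy over a handful of questions. It is a minimal example under stated assumptions, not the benchmark's actual evaluation code: the `Item` structure, the `accuracy` function and the three sample questions are hypothetical and are not taken from the dataset.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """One multiple-choice question (hypothetical structure, not the TeleQnA schema)."""
    question: str
    options: list[str]
    answer: int  # index of the correct option


def accuracy(items: list[Item], predictions: list[int]) -> float:
    """Return the percentage of predictions that match the correct option index."""
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")
    if not items:
        return 0.0
    correct = sum(1 for item, pred in zip(items, predictions) if pred == item.answer)
    return 100.0 * correct / len(items)


# Illustrative questions written for this sketch, not drawn from the dataset.
sample_items = [
    Item("Which 5G Core function handles access and mobility management?",
         ["SMF", "AMF", "UPF", "PCF"], 1),
    Item("What does RAN stand for?",
         ["Radio Access Network", "Remote Area Node", "Routing And Naming"], 0),
    Item("Which standards body publishes the 802.11 family?",
         ["IEEE", "3GPP", "ETSI"], 0),
]

# Hypothetical model output: the third answer is wrong.
sample_predictions = [1, 0, 2]

score = accuracy(sample_items, sample_predictions)
print(f"Accuracy: {score:.2f}%")
```

With two of the three answers correct, the sketch reports an accuracy of roughly two-thirds. The article does not say how the published leaderboard handles unanswered or malformed responses, so that detail is left out here.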
Benchmark results to date
Benchmark results show GPT-5 achieving the highest overall score of 65.55%, leading across most benchmarks including network troubleshooting (80%), mathematical reasoning (70.27%) and domain question answering (82.51%), followed by Grok-4-fast (61.52%) and Claude Sonnet 4.5 (60.64%).
Domain-specific fine-tuned models demonstrated competitive performance on targeted tasks, with AT&T’s customised Gemma model leading all systems on network troubleshooting scenarios, whilst TSLAM-18B approached frontier model performance on mathematical reasoning and standards comprehension.
Performance on intent-to-configuration tasks remained relatively low across all models, with even top systems scoring below 28%, highlighting ongoing challenges in translating natural language intents into valid, standards-compliant configurations required for network automation workflows.
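To give a sense of why intent-to-configuration is hard to get right, the sketch below checks a model-generated network slice configuration for structural validity. It is a simplified illustration under stated assumptions, not the TeleYAML scoring method, which the article does not describe. The field names (`slice_name`, `sst`, `sd`, `dnn`) and the `validate_slice_config` function are hypothetical, and the configuration is assumed to have already been parsed from YAML into a Python dictionary. The range checks reflect the 3GPP S-NSSAI format, in which the Slice/Service Type is an 8-bit value and the Slice Differentiator is a 24-bit value.

```python
import re

# Hypothetical required fields for a slice configuration (not the TeleYAML schema).
REQUIRED_FIELDS = {"slice_name": str, "sst": int, "sd": str, "dnn": str}


def validate_slice_config(config: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the config passed."""
    errors = []

    # Every required field must be present with the expected type.
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in config:
            errors.append(f"missing field: {field}")
        elif not isinstance(config[field], expected_type):
            errors.append(f"wrong type for {field}")

    # Slice/Service Type is an 8-bit value.
    sst = config.get("sst")
    if isinstance(sst, int) and not 0 <= sst <= 255:
        errors.append("sst out of range 0-255")

    # Slice Differentiator is a 24-bit value, written here as six hex digits.
    sd = config.get("sd")
    if isinstance(sd, str) and not re.fullmatch(r"[0-9A-Fa-f]{6}", sd):
        errors.append("sd must be 6 hex digits")

    return errors


# Hypothetical model outputs, already parsed from YAML.
good_config = {"slice_name": "embb-video", "sst": 1, "sd": "000001", "dnn": "internet"}
bad_config = {"slice_name": "embb-video", "sst": 300, "sd": "12"}

print(validate_slice_config(good_config))
print(validate_slice_config(bad_config))
```

A configuration can read plausibly and still fail checks like these, which is one reason a fluent answer does not necessarily translate into a usable network configuration.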
ZOOM OUT - The initiative builds upon Open-Telco LLM Benchmarks 1.0 launched by GSMA in February 2025 ahead of the Mobile World Congress, which established the first systematic evaluation framework for telecom-specific AI performance. The framework was developed by Hugging Face, Khalifa University and The Linux Foundation with operators including Deutsche Telekom, LG Uplus, SK Telecom and Turkcell. The original framework consisted of four benchmark tests: TeleQnA for domain knowledge, 3GPPTdocs Classification for standards comprehension, MATH500 for mathematical reasoning, and FOLIO for logic assessment.
[Written and edited with the assistance of AI]
LINKS
GSMA Open-Telco LLM Benchmarks (Hugging Face)
Read the researchers' article (Hugging Face)
GSMA Foundry (GSMA)