Khalifa University announces new telecom AI model benchmarks

Khalifa University contributes to global telecoms LLM standards

Oct 22, 2025

#UAE #telecom - The GSMA Foundry community of global telecom industry association GSMA has launched GSMA Open-Telco LLM Benchmarks 2.0 on the Hugging Face AI community platform. Announced this week by Abu Dhabi research university Khalifa University, the new benchmark evaluates large language model performance on real-world telecommunications use cases including network configuration, troubleshooting and standards interpretation. The initiative involves 15 mobile network operators including AT&T, China Telecom, Deutsche Telekom, Orange, Telefónica, Vodafone and du.

The benchmark rigorously assesses AI models on telecom-specific knowledge, including 5G core network configuration generation, root-cause analysis from network logs, mathematical reasoning for engineering calculations, and standards comprehension. Results show that general-purpose frontier models, including GPT-5, Grok-4-fast and Claude Sonnet 4.5, lead overall performance, whilst domain-tuned models demonstrate competitive results on specialised tasks.

SO WHAT? - The framework addresses a critical gap in evaluating whether AI models possess the deep domain expertise required for mission-critical network operations, where errors can cost time and money and degrade service quality. Although the telecom sector is investing billions of dollars in AI, Generative AI models still show a shortfall in telecommunications knowledge. Systematic, industry-wide benchmarks can help telecommunications companies make better, evidence-based decisions about deploying AI for network automation, rather than relying on vendor claims or generic model capabilities.

Here are some key points about the new telco LLM benchmarks:

The GSMA Foundry community of global telecom industry association GSMA has launched GSMA Open-Telco LLM Benchmarks 2.0 on the Hugging Face AI community platform. This follows the initial release of the benchmark in February.
Announced this week by Abu Dhabi research university Khalifa University, the new benchmark evaluates large language model performance on real-world telecommunications tasks. It includes 34 use cases submitted by telecom operators, spanning eight strategic domains including RAN optimisation, forecasting, customer support and knowledge retrieval.
The benchmark assesses AI models along five complementary dimensions: TeleYAML (intent-to-configuration generation), TeleLogs (network troubleshooting), TeleMATH (mathematical reasoning), 3GPP-TSG (standards comprehension) and TeleQnA (domain question answering).
Fifteen mobile network operators were involved in the Open-Telco LLM development project, including AT&T, China Telecom, Deutsche Telekom, Orange, Telefónica, Vodafone and the UAE’s du. There was also participation from research institutions including University of Texas, Queen’s University and Universitat Pompeu Fabra, plus technology organisations including The Linux Foundation and Huawei GTS.
The Open-Telco LLM initiative falls under the GSMA Foundry’s Network Management & Configuration track, which is co-led by Khalifa University’s 6G Research Centre. The group developed TeleYAML, the intent-to-configuration benchmark that tests how well models translate operator intents into standards-aligned YAML for 5G Core functions, subscriber provisioning and network slicing.
The Open-Telco LLM initiative has two dedicated working groups: network management and configuration, co-led by Khalifa University, and network troubleshooting, co-led by US telecommunications operator AT&T and Chinese technology company Huawei.
The TeleLogs dataset assesses root-cause analysis capabilities using synthetic yet realistic datasets seeded from real network traces, measuring models’ ability to interpret complex telemetry data, correlate symptoms with underlying causes and support autonomous decision-making during network incidents.
The TeleQnA dataset provides 10,000 multiple-choice questions covering telecommunications terminology, research trends and technical details from standards bodies including IEEE and 3GPP. TeleMATH evaluates quantitative reasoning through 500 expert-curated, telecom-specific mathematical problems.
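
Multiple-choice benchmarks such as TeleQnA are commonly scored by accuracy, the share of questions where the model selects the keyed option. The Python sketch below illustrates that calculation under stated assumptions: the `MCQuestion` record, the `accuracy` helper and the two sample questions are hypothetical illustrations, not the GSMA or Hugging Face evaluation code, and the official TeleQnA data format and scoring rules may differ.

```python
from dataclasses import dataclass


@dataclass
class MCQuestion:
    """One multiple-choice item (hypothetical layout, not the real TeleQnA schema)."""
    question: str
    options: list[str]
    answer_index: int  # position of the correct option in `options`


def accuracy(questions: list[MCQuestion], predictions: list[int]) -> float:
    """Return the fraction of questions whose predicted option index matches the key."""
    if len(questions) != len(predictions):
        raise ValueError("expected exactly one prediction per question")
    if not questions:
        return 0.0
    # True counts as 1 and False as 0 when summed.
    correct = sum(q.answer_index == p for q, p in zip(questions, predictions))
    return correct / len(questions)


# Usage with two toy questions and a model's chosen option indices.
questions = [
    MCQuestion(
        "Which 3GPP release first specified 5G New Radio?",
        ["Release 14", "Release 15", "Release 16"],
        1,
    ),
    MCQuestion(
        "In mobile networks, what does RAN stand for?",
        ["Radio Access Network", "Remote Antenna Node", "Routed Access Nexus"],
        0,
    ),
]
predictions = [1, 2]  # first answer correct, second wrong
print(f"accuracy: {accuracy(questions, predictions):.2%}")  # accuracy: 50.00%
```

Real evaluation harnesses also have to map a model's free-text reply to an option index before scoring, a step this sketch deliberately leaves out.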

Benchmark results to date

Benchmark results show GPT-5 achieving the highest overall score of 65.55% and leading on most benchmarks, with scores of 80% on network troubleshooting, 70.27% on mathematical reasoning and 82.51% on domain question answering, followed by Grok-4-fast (61.52%) and Claude Sonnet 4.5 (60.64%).
Domain-specific fine-tuned models demonstrated competitive performance on targeted tasks, with AT&T’s customised Gemma model leading all systems on network troubleshooting scenarios, whilst TSLAM-18B approached frontier model performance on mathematical reasoning and standards comprehension.
Performance on intent-to-configuration tasks remained relatively low across all models, with even top systems scoring below 28%, highlighting ongoing challenges in translating natural language intents into valid, standards-compliant configurations required for network automation workflows.

Open-Telco LLM research team

The Open-Telco LLM research team includes:
- Lina Bariah, Adjunct Professor, Khalifa University
- Antonio De Domenico, Principal Research Engineer, Huawei Technologies
- Louis Powell, Director of AI Initiatives, GSMA
- Mohamed Sana, Senior Research Engineer, Huawei Paris Research Centre
- Merouane Debbah, Founder & Director of KU 6G Research Centre
- Mark Austin, Vice President Data Science, AT&T
- Farbod Tavakkoli, Data Scientist, AT&T
- George Hotelling, Senior Software Engineer, RocketReach
- Nicola Piovesan, Senior Researcher, Huawei
- Simone Mangiante, Research & Standards Specialist, Vodafone
- Sihem Cherrared, Research Engineer, Orange
- Sümeyye Baş, Next Generation Researcher, Turkcell
- Ghada Soliman, Head of Software Engineering, Orange Innovation Egypt
- Dilara Zeynep Gurer, 6G Researcher, Turkcell
- Laszlo Suto, Lead Mobile Core Architect, Liberty Global
- Pierre Wang,

ZOOM OUT - The initiative builds upon Open-Telco LLM Benchmarks 1.0 launched by GSMA in February 2025 ahead of the Mobile World Congress, which established the first systematic evaluation framework for telecom-specific AI performance. The framework was developed by Hugging Face, Khalifa University and The Linux Foundation with operators including Deutsche Telekom, LG Uplus, SK Telecom and Turkcell. The original framework consisted of four benchmark tests: TeleQnA for domain knowledge, 3GPPTdocs Classification for standards comprehension, MATH500 for mathematical reasoning, and FOLIO for logic assessment.

[Written and edited with the assistance of AI]

Middle East AI News
