GSMA & Khalifa University test AI agents on real telecom tasks
TelcoAgent benchmark reveals AI still struggles with structured troubleshooting
#UAE #telecom – Global telecom association GSMA, US telco AT&T and the Digital Future Institute of Abu Dhabi-based Khalifa University have released TelcoAgent-Bench, a specialised benchmark designed to test whether AI agents can reliably handle real telecom network troubleshooting. The framework evaluates AI across 15 troubleshooting intents, 49 scenario blueprints, and approximately 1,470 dialogues in both English and Arabic. Key findings show that while current AI models understand telecom problems reasonably well, they consistently struggle to follow correct diagnostic sequences, particularly across scenario variations and in bilingual settings.
SO WHAT? – There is a big difference between an AI that sounds like a telecom engineer and one that can actually perform like one. As telecoms operators move toward autonomous network management, the reliability of AI agents in operational settings becomes a safety-critical question. TelcoAgent-Bench is one of the first frameworks to test that distinction rigorously. Findings using the benchmark suggest the industry should be cautious about deploying current models in live network environments without significant guardrails.
Here are some key points regarding the research:
Global telecom association GSMA, US telco giant AT&T and the Digital Future Institute of Khalifa University have released TelcoAgent-Bench, a specialised benchmark designed to test how reliably AI agents perform telecom network troubleshooting.
TelcoAgent-Bench evaluates AI agents across four core capabilities under realistic operational constraints:
correctly identifying the troubleshooting intent;
selecting the right diagnostic tools;
executing them in the correct sequence; and
generating an accurate final resolution summary.
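To make the "correct sequence" criterion concrete, here is a minimal sketch of how an agent's tool-call order could be scored against a reference diagnostic path. This is illustrative only: the actual TelcoAgent-Bench scoring method is not reproduced here, and the function and tool names below are hypothetical.

```python
# Illustrative sketch only; TelcoAgent-Bench's real metrics and tool
# names are not shown in the article, so everything here is hypothetical.

def sequence_accuracy(predicted: list[str], reference: list[str]) -> float:
    """Fraction of the reference diagnostic path followed in order.

    Credit stops at the first step where the agent deviates from the
    reference sequence, reflecting that a mis-ordered diagnostic step
    invalidates the steps that depend on it.
    """
    correct = 0
    for p, r in zip(predicted, reference):
        if p != r:
            break
        correct += 1
    return correct / len(reference) if reference else 1.0

# Example: the agent picks the right tools but swaps two steps.
reference = ["check_signal_strength", "ping_gateway",
             "inspect_dhcp_lease", "restart_ont"]
predicted = ["check_signal_strength", "inspect_dhcp_lease",
             "ping_gateway", "restart_ont"]

print(sequence_accuracy(predicted, reference))  # 0.25
```

A metric like this captures why "right tools, wrong order" is penalised: the swapped agent above matches all four tools yet scores only 0.25, because it left the reference path at step two.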
The benchmark covers 15 telecom troubleshooting intents and 49 scenario blueprints, generating approximately 1,470 dialogues. Each blueprint captures variations of the same scenario across different parameter ranges, testing whether AI agents remain consistent when the same problem is presented differently.
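The blueprint-to-dialogue relationship described above can be sketched as a parameter sweep: one blueprint, many concrete dialogues. The blueprint format, field names, and scenario below are hypothetical; the real TelcoAgent-Bench data format is in the project's GitHub repository.

```python
# Hypothetical sketch of expanding one scenario blueprint into dialogue
# variants; the actual TelcoAgent-Bench blueprint schema is not shown
# in the article, so this structure is assumed for illustration.
from itertools import product

blueprint = {
    "intent": "slow_mobile_data",
    "template": "Customer on a {band} cell reports speeds of {speed} Mbps.",
    "params": {
        "band": ["700MHz", "1800MHz", "3500MHz"],
        "speed": [1, 5, 20],
    },
}

def expand(blueprint: dict) -> list[str]:
    """Produce one dialogue opening per combination of parameter values."""
    keys = list(blueprint["params"])
    combos = product(*(blueprint["params"][k] for k in keys))
    return [blueprint["template"].format(**dict(zip(keys, c)))
            for c in combos]

variants = expand(blueprint)
print(len(variants))  # 9 dialogues from one blueprint
```

This is why 49 blueprints can yield roughly 1,470 dialogues (about 30 variants each), and why the benchmark can check whether an agent stays consistent when the same underlying fault appears with different parameter values.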
Testing runs in both English and Arabic, addressing the practical requirement for multilingual AI deployment in regional telecom networks. Performance gaps between the two languages were noted, with bilingual settings proving particularly challenging for current models.
The headline finding is a clear capability gap: today’s AI models are reasonably competent at understanding the problem and writing a plausible resolution summary, but they struggle to follow the correct troubleshooting sequence consistently (the operational step that matters most in a live network environment).
Existing general-purpose AI benchmarks such as AgentBench, GAIA, and WebArena were not designed for telecom-specific requirements. They test task completion and tool use broadly, but do not measure consistency of resolution paths, alignment with structured troubleshooting flows, or time to resolution under operational constraints.
The research represents one of the first domain-specific benchmarking frameworks built explicitly for evaluating AI agents in telecom network operations.
The authors acknowledge current limitations in the framework itself: TelcoAgent-Bench does not yet model fully closed-loop reasoning, where an agent interprets tool outputs, makes configuration changes, and re-evaluates network behaviour before reaching a resolution. That capability will be the focus of future research.
The research team includes: Brahim Mefgouda, Lina Bariah, Farbod Tavakkoli, Enrique Molero, Louis Powell, and Merouane Debbah.
ZOOM OUT – TelcoAgent-Bench is the latest in a series of collaborations between GSMA and Khalifa University's Digital Future Institute this year. In March, the two organisations were central to the launch of the Open Telco AI initiative at MWC Barcelona (a fully global industry programme bringing together AT&T, AMD, and others to build open AI foundations for telecoms). That launch included the release of the third edition of the Open Telco AI Leaderboard, which ranks large language models on telecom-specific tasks. Khalifa University leads the Network Management and Configuration Group within the Open Telco AI programme.
[Written and edited with the assistance of AI]
LINKS
TelcoAgent-Bench code (GitHub)
DOWNLOAD
Read more about Khalifa University’s work on telecom AI:
GSMA whitepaper sets out 6G role for agentic AI (Middle East AI News)
Open Telco AI Leaderboard Release 3 launched (Middle East AI News)
Khalifa University unveils breakthrough RF AI model (Middle East AI News)
UAE University, KU release first open 6G AI benchmark (Middle East AI News)
GSMA, Khalifa University to update TelecomGPT (Middle East AI News)