Falcon-H1 LLM joins NVIDIA's inference microservice
Falcon-H1 now available as optimised inference microservice for AI factories
#UAE #LLMs - Falcon-H1, the new large language model (LLM) developed by the Technology Innovation Institute (TII), will now be made available via NVIDIA's NIM microservices platform, enabling enterprise-scale AI across cloud and on-premise environments. The hybrid Transformer-Mamba architecture supports 256,000-token context windows, whilst outperforming models with twice its parameter count across mathematics, reasoning and multilingual tasks. The institute's flagship model joins NVIDIA's production-ready inference system following 55 million global downloads of the Falcon series.
SO WHAT? - Falcon-H1's availability via the NVIDIA Inference Microservice (NIM) makes it easier for enterprise clients and AI data centres to deploy the high-performing language model, opening up new global opportunities. By combining pre-optimised models, inference engines and industry-standard APIs, NIM simplifies the process of deploying and scaling AI applications, so organisations can deploy faster, more reliably and with predictable inference performance.
Here are some key points about Falcon-H1 on NVIDIA NIM:
Technology Innovation Institute (TII), the applied research arm of Abu Dhabi's Advanced Technology Research Council (ATRC), has announced that its flagship language model Falcon-H1 will soon be available via NVIDIA's NIM microservices platform.
NVIDIA NIM integration can reduce deployment time from weeks to minutes through automated hardware optimisation and enterprise-grade service level agreements
Announced by TII last month, Falcon-H1 utilises a breakthrough hybrid architecture combining Transformers with state space models, enabling a 10x improvement in long-context reasoning whilst reducing memory consumption and inference costs
The model family spans six variants from 500 million to 34 billion parameters, with each variant reportedly outperforming competitor models twice its size across industry benchmarks
Falcon-H1 supports 18 languages natively including Arabic, with multilingual tokenizer capability extending to over 100 languages for global applications
The 34 billion parameter model leads multiple industry benchmarks whilst smaller variants excel in mathematical reasoning and coding tasks
Integration with NVIDIA NeMo microservices provides full lifecycle tooling from data curation to post-deployment tuning for regulated environments
Production deployment supports retrieval-augmented generation workflows, agentic systems and domain-specific assistants through standard Docker and Hugging Face tools.
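For readers who want a feel for what deployment looks like in practice, here is a minimal sketch of querying a NIM-hosted model. NIM services expose an OpenAI-compatible REST API; the local port and the Falcon-H1 model identifier below are illustrative assumptions rather than documented values for this specific release.

```python
# Sketch: querying a locally deployed Falcon-H1 NIM container via its
# OpenAI-compatible chat completions route. The URL and model id are
# assumptions for illustration, not confirmed values.
import json
import urllib.request

NIM_URL = "http://localhost:8000/v1/chat/completions"  # assumed default NIM port

payload = {
    "model": "tiiuae/falcon-h1-34b-instruct",  # hypothetical model identifier
    "messages": [
        {"role": "user", "content": "Summarise the Falcon-H1 architecture."}
    ],
    "max_tokens": 256,
}

request = urllib.request.Request(
    NIM_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

try:
    with urllib.request.urlopen(request, timeout=10) as response:
        reply = json.load(response)
        print(reply["choices"][0]["message"]["content"])
except OSError as exc:
    # No NIM container running locally; the request shape is still valid.
    print(f"endpoint unavailable: {exc}")
```

Because the API follows the OpenAI schema, the same request works unchanged with standard OpenAI client libraries pointed at the NIM endpoint, which is what makes swapping models in and out of existing applications straightforward.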
ZOOM OUT - Technology Innovation Institute launched Falcon-H1 in May, introducing a series of models built on a breakthrough hybrid architecture that outperform Meta's LLaMA and Alibaba's Qwen models in the 30-70 billion parameter range. Last month saw the release of six model variants (34B, 7B, 3B, 1.5B, 1.5B-deep and 500 million parameters), plus Falcon's first Arabic-language model (although not part of the H1 series). As part of its efforts to expand global access to Falcon, TII also recently announced the availability of Falcon models via the Amazon Bedrock Marketplace. TII's Falcon family of language models has so far achieved 55 million downloads since the first model launched in 2023.
[Written and edited with the assistance of AI]
Read more about Falcon:
TII's Falcon AI models to join Bedrock Marketplace (Middle East AI News)
Falcon 3 LLM series gets first Arabic model (Middle East AI News)
TII releases Falcon-Edge 1.58bit language models (Middle East AI News)
TII launches most powerful SLMs under 13B parameters (Middle East AI News)
TII launches Falcon's first SSLM (Middle East AI News)