Inception, the artificial intelligence (AI) applied research unit of UAE technology group G42, has announced the open-source release of Jais, described as the world’s highest-quality Arabic large language model (LLM). Jais is a 13-billion-parameter model trained on a newly developed 395-billion-token Arabic and English dataset.
With a name inspired by the UAE’s highest peak, Jais, says Inception, will bring the advantages of generative AI across the Arabic-speaking world. The model is the result of a collaboration between Inception, Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) and Cerebras, a team of pioneering computer architects, system engineers, software engineers, and machine learning (ML) researchers.
It was trained on Condor Galaxy, the recently announced multi-exaFLOP AI supercomputer built by G42 and Cerebras, a company that says it is revolutionizing compute for deep learning.
Jais is a model home-grown in the UAE’s capital, Abu Dhabi, offering more than 400 million Arabic speakers the opportunity to harness the potential of generative AI.
By open-sourcing Jais, Inception aims to engage the scientific, academic, and developer communities to accelerate the growth of a vibrant Arabic language AI ecosystem. This, it says, can serve as a model for other languages currently underrepresented in mainstream AI.
Perhaps more importantly, we are told that Jais outperforms existing Arabic models by a sizable margin. It is also said to be competitive with English models of similar size despite being trained on significantly less English data. If so, the result suggests that the model’s English component learned from the Arabic data and vice versa, potentially opening a new era in LLM development and training.
The UK’s Financial Times newspaper says Jais uses Modern Standard Arabic, which is understood across the Middle East, as well as the region’s diverse spoken dialects, by drawing on media, social media and code.
It has also been designed to have a more accurate understanding of the culture and context of the region, in contrast to most US-centric models.
However, is it really “better than anything out there in Arabic” as Professor Timothy Baldwin, acting provost of MBZUAI, tells the FT? That question will be answered over time, but it’s a reasonable guess that Jais, along with other LLMs, will “accelerate the growth of a vibrant Arabic language AI ecosystem”, as G42 puts it.