IBM releases new Granite foundation models under ‘permissive’ Apache license

Furthering its drive to build a distinctive position in enterprise artificial intelligence, IBM Corp. today is rolling out a series of new language models and tools to ensure their responsible use.

The company is also unveiling a new generation of its watsonx Code Assistant for application development and modernization. All of these new capabilities are being bundled together in a multimodel platform for use by the company’s 160,000 consultants.

The new Granite 3.0 8B and 2B models come in “Instruct” and “Guardian” variants used for training and risk/harm detection, respectively. Both will be available under an Apache 2.0 license, which Rob Thomas (pictured), IBM’s senior vice president of software and chief commercial officer, called “the most permissive license for enterprises and partners to create value on top.” The open-source license allows models to be deployed for as little as $100 per server, with intellectual property indemnification aimed at giving enterprise customers confidence in merging their data with the IBM models.

“We’ve gone from a world of ‘plus AI,’ where clients were running their business and adding AI on top of it, to a notion of AI first, which is companies building their business model based on AI,” Thomas said. IBM intends to lead in the use of AI for information technology automation through organic development and its acquisitions and pending acquisitions of infrastructure-focused firms like Turbonomic Inc., Apptio Inc. and HashiCorp Inc.

“The book of business that we have built on generative AI is now $2 billion-plus across technology and consulting,” Thomas said. “I’m not sure we’ve ever had a business that has scaled at this pace.”

The Instruct versions of Granite, which are used for training, come in 8 billion- and 2 billion-parameter versions. They were trained on more than 12 trillion tokens of data spanning 12 natural languages and 116 programming languages, making them capable of coding, documentation and translation.

By year’s end, IBM said, it plans to extend the foundation models to a 128,000-token context length with multimodality. That refers to enhancing a model’s ability to process significantly longer input sequences and handle multiple data types simultaneously. Context length is the number of tokens — such as words, symbols or other units of input data — that the AI model can process and retain. Typical models have context lengths of between 1,000 and 8,000 tokens.
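
To make the idea concrete, a prompt can be checked against a model’s context window before it is sent. The sketch below uses the Hugging Face transformers tokenizer API; the model ID and the 4,096-token window are assumptions for illustration, not confirmed specifications of the Granite 3.0 release.

```python
from transformers import AutoTokenizer

# Assumptions for illustration: both the Hub model ID and the window size
# are hypothetical, not confirmed specifications of Granite 3.0.
MODEL_ID = "ibm-granite/granite-3.0-8b-instruct"
CONTEXT_LENGTH = 4096

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

def fits_in_context(prompt: str, reserved_for_output: int = 512) -> bool:
    """Return True if the prompt leaves room for a reply within the window."""
    n_tokens = len(tokenizer(prompt)["input_ids"])
    return n_tokens + reserved_for_output <= CONTEXT_LENGTH

print(fits_in_context("Summarize the attached contract in three bullets."))
```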

Enterprise workhorses

IBM said the new Granite models are designed as enterprise “workhorses” for tasks such as retrieval-augmented generation or RAG, classification, summarization, agent training, entity extraction and tool use. They can be trained with enterprise data to deliver the task-specific performance of much larger models at up to 60 times lower cost. Internal benchmarks showed the Granite 8B model achieving better performance than comparable models from Google LLC and Mistral AI SAS and equivalent performance to comparable models from Meta Platforms Inc.

An accompanying technical report and responsible use guide provide extensive documentation of the datasets used to train the models, details of the filtering, cleansing and curation steps that were applied, and comparative benchmark data.

An updated release of the pretrained Granite Time Series models IBM released earlier this year is trained on three times more data and provides greater modeling flexibility, with support for external variables and rolling forecasts.

The Granite Guardian 3.0 models are intended to provide safety protections by checking user prompts and model responses for a variety of risks. “You can concatenate both on the input before you make the inference query and the output to prevent the core model from jailbreaks and to prevent violence, profanity, et cetera,” said Dario Gil, senior vice president and director of research at IBM. “We’ve done everything possible to make it as safe as possible.”

Jailbreaks are malicious attempts to bypass the restrictions or safety measures imposed on an AI system to make it behave in unintended or potentially harmful ways. Guardian also performs RAG-specific checks such as context relevance, answer relevance and “groundedness,” which refers to the extent to which the model is connected to and informed by real-world data, facts or context.
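
Gil’s description suggests a simple wrap-around pattern: screen the prompt before the inference call and screen the response before it is returned. The sketch below illustrates that flow with stub helpers; the names are hypothetical and do not reflect IBM’s actual Guardian API.

```python
# Hypothetical sketch of the wrap-around pattern Gil describes. The helpers
# are illustrative stubs, not IBM's actual Guardian API.
def classify_risk(text: str) -> bool:
    """Stub for a Granite Guardian check; flags on crude keywords here."""
    flagged = ("ignore all previous instructions", "violence", "profanity")
    return any(phrase in text.lower() for phrase in flagged)

def generate(prompt: str) -> str:
    """Stub for an inference call to a core Granite Instruct model."""
    return f"Model answer to: {prompt}"

def guarded_query(prompt: str) -> str:
    if classify_risk(prompt):        # input check before the inference query
        return "Request blocked by input guardrail."
    response = generate(prompt)
    if classify_risk(response):      # output check before reaching the user
        return "Response withheld by output guardrail."
    return response

print(guarded_query("Summarize our Q3 sales figures."))
```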

AI at the edge

A set of smaller models called Granite Accelerators and Mixture of Experts are intended for low-latency and CPU-only applications. MoE is a type of machine learning architecture that combines multiple specialized models and dynamically selects and activates only a subset of them to enhance efficiency.
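
As a rough illustration of that dynamic selection, the sketch below implements a toy top-k gated mixture-of-experts layer in PyTorch. It is a generic textbook pattern, not IBM’s Granite MoE architecture, whose internals the company has not detailed here.

```python
import torch
import torch.nn as nn

class ToySparseMoE(nn.Module):
    """Toy mixture-of-experts layer: each token activates only k experts."""

    def __init__(self, dim: int = 64, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList([nn.Linear(dim, dim) for _ in range(n_experts)])
        self.gate = nn.Linear(dim, n_experts)  # scores each expert per token
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, dim)
        weights, idx = self.gate(x).topk(self.k, dim=-1)  # pick top-k experts
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e        # tokens routed to expert e
                if mask.any():                  # only that subset is computed
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

layer = ToySparseMoE()
print(layer(torch.randn(5, 64)).shape)  # torch.Size([5, 64])
```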

“Accelerator allows you to implement speculative decoding so you can achieve twice the throughput of the core model with no loss of quality,” Gil said. The MoE model is trained on 10 trillion tokens but activates only 800 million parameters during inferencing, for efficiency in edge use cases.
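
Speculative decoding, which Gil credits for the doubled throughput, generally works by letting a small, fast draft model propose several tokens that the larger target model then verifies. The sketch below shows the control flow only, with deterministic stub models; it is a generic outline of the technique, not IBM’s implementation, and real systems verify all draft tokens in one batched forward pass rather than sequentially as shown.

```python
import random

# Generic outline of speculative decoding with stub models, not IBM's code.
class StubModel:
    """Deterministic stand-in for a language model."""

    def next_token(self, ids: list[int]) -> int:
        random.seed(sum(ids) % 997)   # fake "distribution" keyed on the context
        return random.randrange(100)

def speculative_decode(draft: StubModel, target: StubModel,
                       ids: list[int], n_draft: int = 4,
                       max_new: int = 12) -> list[int]:
    start = len(ids)
    while len(ids) - start < max_new:
        # 1. The cheap draft model proposes a short run of tokens.
        guesses: list[int] = []
        for _ in range(n_draft):
            guesses.append(draft.next_token(ids + guesses))
        # 2. The target model checks the run and keeps the agreeing prefix.
        for g in guesses:
            if target.next_token(ids) == g:
                ids.append(g)                        # accepted draft token
            else:
                ids.append(target.next_token(ids))   # target's own token instead
                break
    return ids

print(speculative_decode(StubModel(), StubModel(), [1, 2, 3]))
```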

The Instruct and Guardian variants of the Granite 8B and 2B models are available immediately for commercial use on IBM’s watsonx platform. A selection of Granite 3.0 models will also be available on partner platforms such as Nvidia Corp.’s NIM stack and Google’s Vertex AI. The entire suite of Granite 3.0 models and the updated Time Series models are available for download on Hugging Face Inc.’s open-source platform and Red Hat Enterprise Linux.

The new Granite 3.0-based watsonx Code Assistant supports the C, C++, Go, Java and Python languages with new application modernization capabilities for enterprise Java applications. IBM said the assistant has yielded 90% faster code documentation for certain tasks within its software development business. The code capabilities are accessible through a Visual Studio Code extension called IBM Granite.Code.

More, better agents

New tools for developers include agentic frameworks, integrations with existing environments and low-code automations for common use cases such as RAG and agents.
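
Of those use cases, RAG follows the most mechanical recipe: retrieve the documents most relevant to a query and prepend them to the prompt so the model answers from supplied context. Here is a minimal sketch of that flow with stub components; the helper names are illustrative rather than any specific IBM tooling.

```python
# Minimal RAG flow with stub components; names are illustrative only.
DOCS = [
    "Granite 3.0 ships in 8B and 2B parameter sizes.",
    "Guardian models screen prompts and responses for risk.",
]

def retrieve(query: str, k: int = 1) -> list[str]:
    """Stub retriever: rank documents by crude word overlap with the query."""
    q_words = set(query.lower().split())
    return sorted(DOCS, key=lambda d: len(q_words & set(d.lower().split())),
                  reverse=True)[:k]

def generate(prompt: str) -> str:
    """Stub standing in for a call to an instruct model."""
    return f"[model answer grounded in a prompt of {len(prompt)} chars]"

def rag_answer(query: str) -> str:
    context = "\n".join(retrieve(query))   # ground the model in retrieved text
    return generate(f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")

print(rag_answer("What sizes does Granite 3.0 come in?"))
```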

With agentic AI, or systems that are capable of autonomous behavior or decision-making, set to become the next big wave in AI development, IBM also said it’s equipping its consulting division with a multimodal agentic platform. The new Consulting Advantage for Cloud Transformation and Management and Consulting Advantage for Business Operations consulting lines will include domain-specific AI agents, applications and methods trained on IBM intellectual property and best practices that consultants can apply to their clients’ cloud and AI projects.

About 80,000 IBM consultants are currently using Consulting Advantage, with most deploying only one or two agents at a time, said Mohamad Ali, senior vice president and head of IBM Consulting. As usage grows, however, IBM Consulting will need to support over 1.5 million agents, making Granite’s economics “absolutely essential because we will continue to scale this platform and we needed to be very cost-efficient,” he said.

Photo: SiliconANGLE

Source: siliconangle.com
