
What Is an LLM and How Does It Relate to AI Chatbots? Here's What to Know

When you ask an AI chatbot like ChatGPT, Claude, Copilot or Gemini to do something, it may seem like you're interacting with a person. They can give you a response -- an email note, an essay, a summary of a search request -- that's articulate, grammatical and convincing.

But you're not dealing with a person. These chatbots don't actually understand the meaning of words the way we do. Instead, they're the interface we use to interact with large language models, or LLMs. These underlying technologies are trained to recognize how words are used and which words frequently appear together, so they can predict future words, sentences or paragraphs.

The makers of generative AI tools are constantly refining their LLMs' understanding of words to make better predictions. It's all part of an ongoing game of one-upmanship kicked off by OpenAI's introduction of ChatGPT in late 2022, followed quickly in early 2023 by the arrival of Microsoft's AI-enhanced Bing search and Google's Bard (now Gemini).

We're now several generations into the evolution of LLMs. OpenAI introduced GPT-4o in May, GPT-4o Mini in July and OpenAI o1 in September. Google has variations including Gemini 1.5 Pro and 1.5 Flash. Meta is now at Llama 3, while Anthropic is up to Claude 3.5. 

If you're wondering what LLMs have to do with AI, this explainer is for you.

What is a language model?

You can think of a language model as a soothsayer for words.

"A language model is something that tries to predict what language looks like that humans produce," said Mark Riedl, professor in the Georgia Tech School of Interactive Computing and associate director of the Georgia Tech Machine Learning Center. "What makes something a language model is whether it can predict future words given previous words."

This is the basis of autocomplete functionality when you're texting, as well as of AI chatbots.
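As a toy illustration of that next-word idea (a sketch, not how production autocomplete actually works), here's a tiny Python predictor that simply counts which word tends to follow which:

```python
# A toy next-word predictor: count which word follows which in a small
# sample text, then suggest the most common follower. Real language
# models learn these statistics with neural networks at vastly larger
# scale, but the goal -- predict the next word -- is the same.
from collections import Counter, defaultdict

text = "the cat sat on the mat and the cat slept"
words = text.split()

followers = defaultdict(Counter)
for prev, nxt in zip(words, words[1:]):
    followers[prev][nxt] += 1

def autocomplete(word):
    # Most frequent word seen after `word` in the sample text.
    return followers[word].most_common(1)[0][0]

print(autocomplete("the"))  # -> "cat" ("the cat" appears twice, "the mat" once)
```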

What is a large language model?

A large language model is trained on vast amounts of text from a wide array of sources. These models' size is measured in what are known as "parameters."

What's a parameter?

Well, LLMs use neural networks, which are machine learning models that take an input and perform mathematical calculations to produce an output. The variables in these calculations are the model's parameters. A large language model can have 1 billion parameters or more.

"We know that they're large when they produce a full paragraph of coherent fluid text," Riedl said.

Is there such a thing as a small language model?

Yes. Tech companies like Microsoft are rolling out smaller models that are designed to operate "on device" and require far fewer computing resources than an LLM, while still helping users tap into the power of generative AI.

What's under the hood of a large language model?

When Anthropic mapped the "mind" of its Claude 3.0 Sonnet large language model, it found that each internal state ("what the model is 'thinking' before writing its response") is built by combining features, or patterns of neuron activations. (The artificial neurons in neural networks mimic the behavior of the neurons in our brains.)

By extracting these neuron activations from Claude 3.0 Sonnet, Anthropic was able to see a map of the model's internal states as it generates answers. The AI startup found patterns of neuron activations corresponding to cities, people, atomic elements, scientific fields and programming syntax, as well as to more abstract concepts like bugs in computer code, gender bias at work and conversations about keeping secrets.

In the end, Anthropic said, "the internal organization of concepts in the AI model corresponds, at least somewhat, to our human notions of similarity."

How do large language models learn?

LLMs learn via a core AI process called deep learning.

"It's a lot like when you teach a child -- you show a lot of examples," said Jason Alan Snyder, global CTO of ad agency Momentum Worldwide.

In other words, you feed the LLM a library of content (what's known as training data) such as books, articles, code and social media posts to help it understand how words are used in different contexts, and even the more subtle nuances of language. The model digests far more text than a person could ever read in a lifetime -- something on the order of trillions of tokens.

Tokens help AI models break down and process text. You can think of an AI model as a reader who needs help. The model breaks a sentence into smaller pieces, or tokens -- each roughly four characters of English, or about three-quarters of a word -- so it can understand each piece and then the overall meaning.
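If you want to see tokens firsthand, OpenAI publishes an open-source tokenizer library called tiktoken; this sketch assumes it's installed (pip install tiktoken) and uses one of its real encodings:

```python
# Split a sentence into tokens using OpenAI's open-source tiktoken
# library (pip install tiktoken). "cl100k_base" is one real encoding;
# other models use other tokenizers with different vocabularies.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
text = "I went sailing on the deep blue sea."
tokens = enc.encode(text)

# Roughly four characters (about three-quarters of a word) per token.
print(f"{len(text)} characters -> {len(tokens)} tokens")
for t in tokens:
    print(t, enc.decode_single_token_bytes(t))
```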

From there, the LLM can analyze how words connect and determine which words often appear together.

"It's like building this giant map of word relationships," Snyder said. "And then it starts to be able to do this really fun, cool thing, and it predicts what the next word is … and it compares the prediction to the actual word in the data and adjusts the internal map based on its accuracy."

This prediction and adjustment happens billions of times, so the LLM is constantly refining its understanding of language and getting better at identifying patterns and predicting future words. It can even learn concepts and facts from the data to answer questions, generate creative text formats and translate languages. But LLMs don't understand the meaning of words the way we do -- all they know are the statistical relationships.
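Snyder's predict-compare-adjust loop can be sketched in miniature. A real LLM predicts with a neural network and adjusts billions of weights by gradient descent; this toy merely updates word counts, but the loop has the same shape:

```python
# Toy version of the predict-compare-adjust training loop. A real LLM
# predicts with a neural network and adjusts weights by gradient
# descent; here we "adjust" by updating follower counts.
from collections import Counter, defaultdict

corpus = "i went sailing on the deep blue sea and we sailed the deep blue sea".split()
followers = defaultdict(Counter)
correct = 0

for prev, actual in zip(corpus, corpus[1:]):
    # Predict: the most common follower seen so far (None if unseen).
    guess = followers[prev].most_common(1)[0][0] if followers[prev] else None
    # Compare the prediction to the word that actually came next.
    correct += guess == actual
    # Adjust the internal "map" based on the actual next word.
    followers[prev][actual] += 1

print(f"{correct} of {len(corpus) - 1} predictions correct on the first pass")
```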

LLMs also learn to improve their responses through reinforcement learning from human feedback.

"You get a judgment or a preference from humans on which response was better given the input that it was given," said Maarten Sap, assistant professor at the Language Technologies Institute at Carnegie Mellon University. "And then you can teach the model to improve its responses."

What do large language models do?

Given a series of input words, an LLM will predict the next word in the sequence.

For example, consider the phrase, "I went sailing on the deep blue..."

Most people would probably guess "sea" because sailing, deep and blue are all words we associate with the sea. In other words, each word sets up context for what should come next.

"These large language models, because they have a lot of parameters, they can store a lot of patterns," Riedl said. "They are very good at being able to pick out these clues and make really, really good guesses at what comes next."

What do large language models do really well?

LLMs are very good at figuring out the connection between words and producing text that sounds natural.

"They take an input, which can often be a set of instructions, like, 'Do this for me' or 'Tell me about this' or 'Summarize this' and are able to extract those patterns out of the input and produce a long string of fluid response," Riedl said.

Where do large language models struggle?

But they have several weaknesses.

First, they're not good at telling the truth. In fact, they sometimes just make stuff up that sounds true, like when ChatGPT cited six fake court cases in a legal brief or when Bard mistakenly credited the James Webb Space Telescope with taking the first pictures of a planet outside of our solar system. Those are known as hallucinations.

"They are extremely unreliable in the sense that they confabulate and make up things a lot," Sap said. "They're not trained or designed by any means to spit out anything truthful."

They also struggle with queries that are fundamentally different from anything they've encountered before. That's because they're focused on finding and responding to patterns.

A good example is a math problem with a unique set of numbers.

"It may not be able to do that calculation correctly because it's not really solving math," Riedl said. "It is trying to relate your math question to previous examples of math questions that it has seen before."

And while they excel at predicting words, they're not good at predicting the future, which includes planning and decision making.  

"The idea of doing planning in the way that humans do it with … thinking about the different contingencies and alternatives and making choices, this seems to be a really hard roadblock for our current large language models right now," Riedl said.

Finally, they struggle with current events because their training data typically only goes up to a certain point in time (often called the knowledge cutoff), and anything that happens after that isn't part of their knowledge base. And because they can't distinguish between what is factually true and what is merely likely, they can confidently provide incorrect information about current events.

They also don't interact with the world the way we do.

"This makes it difficult for them to grasp the nuances and complexities of current events that often require an understanding of context, social dynamics and real-world consequences," Snyder said.

How will large language models evolve?

We're already seeing generative AI companies like OpenAI, Google and Adobe debut multimodal models, which are trained not just on text but also on images, video and audio. 

And we're seeing retrieval capabilities evolve beyond what the models were trained on, including connecting LLMs to search engines like Google so they can run web searches and feed the results back into the model. This means they could better understand queries and provide more timely responses.
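In code, that retrieval pattern looks roughly like the sketch below, where web_search and llm_generate are hypothetical stubs standing in for a real search API and a real model API:

```python
# Sketch of retrieval-augmented generation: run a web search, then hand
# the results to the model inside the prompt. The two helpers below are
# hypothetical stubs, not any particular vendor's API.

def web_search(query: str, top_k: int = 3) -> list[str]:
    # Hypothetical stub; a real version would call a search API.
    return [f"(result {i + 1} for: {query})" for i in range(top_k)]

def llm_generate(prompt: str) -> str:
    # Hypothetical stub; a real version would call a model API.
    return f"(model response to a {len(prompt)}-character prompt)"

def answer_with_retrieval(question: str) -> str:
    snippets = web_search(question)
    context = "\n".join(f"- {s}" for s in snippets)
    prompt = ("Answer using only the sources below, and say so if they "
              f"are insufficient.\n\nSources:\n{context}\n\n"
              f"Question: {question}")
    return llm_generate(prompt)

print(answer_with_retrieval("Who won yesterday's match?"))
```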

"This helps our linkage models stay current and up-to-date because they can actually look at new information on the internet and bring that in," Riedl said.  

That was part of the thinking behind AI-powered Bing, though Microsoft flipped the relationship around: instead of tapping a search engine to enhance an LLM's responses, it looked to AI to improve its own search engine, in part by better understanding the true meaning behind consumer queries and better ranking the results for those queries.

But there are catches. Web search could make hallucinations worse without adequate fact-checking mechanisms in place. And LLMs would need to learn how to assess the reliability of web sources before citing them. Google learned that the hard way with the error-prone debut of its AI Overviews search results earlier this year. The search company subsequently refined its AI Overviews results to reduce misleading or potentially dangerous summaries.

Meanwhile, models including Google's Lumiere and OpenAI's Sora are learning to generate images, video and audio. Google and Adobe have also previewed tools that can generate virtual games and music to show consumers where the technology is headed.

We'll also likely see LLMs get better at not just translating to and from English but understanding and conversing in a wider range of languages.

Source: cnet.com
