
Understanding LangChain LLM Output Parser

The Large Language Model, or LLM, has revolutionized how people work. By generating answers from a text prompt, an LLM can do many things, such as answering questions, summarizing text, planning events, and more.

However, there are times when the output from an LLM is not up to our standard. For example, the generated text could be completely wrong and need further direction. This is where the LLM Output Parser can help.

By standardizing the output with a LangChain Output Parser, we gain some control over the final result. So, how does it work? Let’s get into it.

Preparation

In this article, we will rely on the LangChain packages, so we need to install them in our environment. To do that, you can use the following command.

pip install langchain langchain_core langchain_community langchain_openai python-dotenv

We will also use the OpenAI GPT model for text generation, so make sure you have API access. You can get an API key from the OpenAI platform.

I will work in Visual Studio Code, but you can use any IDE you prefer. Create a file called .env within your project folder and put the OpenAI API key inside. It should look like this.

OPENAI_API_KEY = sk-XXXXXXXXXX

Once everything is ready, we will move on to the central part of the article.

Output Parser

LangChain provides many types of output parsers we can use to standardize our LLM output. We will try several of them to understand output parsers better.

First, we will try the Pydantic Parser. It’s an output parser we can use to control and validate the output of the generated text. Let’s understand it better with an example. Create a Python script in your IDE and then copy the code below into your script.


from typing import List

from dotenv import load_dotenv
from langchain.output_parsers import PydanticOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field, validator
from langchain_openai import ChatOpenAI

load_dotenv()

class MovieReview(BaseModel):
    title: str = Field(description="The movie title")
    year: int = Field(description="The year the movie was released")
    genre: List[str] = Field(description="The main genres of the movie")
    rating: float = Field(description="Rating out of 10")
    summary: str = Field(description="Brief summary of the movie plot")
    review: str = Field(description="Critical review of the movie")

    @validator("year")
    def valid_year(cls, val):
        # Reject implausible release years
        if val < 1900 or val > 2025:
            raise ValueError("Must be a valid movie year")
        return val

    @validator("rating")
    def valid_rating(cls, val):
        # Keep the rating within the 0-10 scale
        if val < 0 or val > 10:
            raise ValueError("Rating must be between 0 and 10")
        return val

# The parser derives format instructions from the MovieReview schema
parser = PydanticOutputParser(pydantic_object=MovieReview)

prompt = PromptTemplate(
    template="Generate a movie review for the following movie:\n{movie_title}\n\n{format_instructions}",
    input_variables=["movie_title"],
    partial_variables={"format_instructions": parser.get_format_instructions()}
)

model = ChatOpenAI(temperature=0)

chain = prompt | model | parser

movie_title = "The Matrix"
review = chain.invoke({"movie_title": movie_title})
print(review)

In the code above, we first import the packages and load the OpenAI key with load_dotenv. After that, we create a class called MovieReview that contains all the fields we want in the output: title, year, genre, rating, summary, and review. For each field, we define a description of the output we want.

We then create validators for the year and rating fields to ensure the results fall within valid bounds. You can also add more validation mechanisms if required, as sketched below.
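For instance, a hypothetical valid_genre validator could be added inside the MovieReview class to require at least one genre entry. This is a minimal sketch of the pattern, not part of the original code:

    @validator("genre")
    def valid_genre(cls, val):
        # Hypothetical extra check: require at least one genre entry
        if len(val) == 0:
            raise ValueError("At least one genre is required")
        return val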

Then we create a prompt template that accepts our query input along with instructions for the format the output should follow.

The last thing we do is build the chain and pass in our query to get the result. Note that the chain variable above composes the prompt, model, and parser using “|”, the pipe operator from the LangChain Expression Language (LCEL).
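For illustration, here is roughly what that pipe composes, written as explicit calls. This is a minimal sketch reusing the prompt, model, and parser objects defined above:

# Roughly equivalent to chain.invoke({"movie_title": "The Matrix"})
prompt_value = prompt.invoke({"movie_title": "The Matrix"})  # fill in the template
message = model.invoke(prompt_value)                         # call the LLM
review = parser.parse(message.content)                       # validate into a MovieReview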

Overall, the result will be similar to the output below.

Output:

title='The Matrix' year=1999 genre=['Action', 'Sci-Fi'] rating=9.0 summary='A computer hacker learns about the true nature of reality and his role in the war against its controllers.' review="The Matrix is a groundbreaking film that revolutionized the action genre with its innovative special effects and thought-provoking storyline. Keanu Reeves delivers a memorable performance as Neo, the chosen one who must navigate the simulated reality of the Matrix to save humanity. The film's blend of martial arts, philosophy, and dystopian themes make it a must-watch for any movie enthusiast."

As you can see, the output follows the format we want and the result passes our validation methods.
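Because the parsed result is a MovieReview instance rather than plain text, each field can be read as a normal Python attribute. For example, with the review object from the code above:

print(review.title)   # The Matrix
print(review.year)    # 1999
print(review.rating)  # 9.0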

The Pydantic Parser is the standard Output Parser we can use. We can use other Output Parsers if we already have a specific format in mind. For example, we can use the CSV Parser if we want the result as comma-separated items.


from dotenv import load_dotenv
from langchain.output_parsers import CommaSeparatedListOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

load_dotenv()

# Parser that splits the LLM response into a Python list
output_parser = CommaSeparatedListOutputParser()
format_instructions = output_parser.get_format_instructions()

prompt = PromptTemplate(
    template="List six {subject}.\n{format_instructions}",
    input_variables=["subject"],
    partial_variables={"format_instructions": format_instructions},
)

model = ChatOpenAI(temperature=0)
chain = prompt | model | output_parser

print(chain.invoke({"subject": "Programming Language"}))

Output:

['Java', 'Python', 'C++', 'JavaScript', 'Ruby', 'Swift']

The result is a Python list of the comma-separated values. You can expand the template however you like, as long as the result remains comma-separated.
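Under the hood, this parser simply splits the model’s reply on commas and strips the surrounding whitespace. A minimal standalone sketch:

from langchain.output_parsers import CommaSeparatedListOutputParser

parser = CommaSeparatedListOutputParser()
# Splitting a raw comma-separated string into a Python list
print(parser.parse("Java, Python, C++"))  # ['Java', 'Python', 'C++']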

It’s also possible to get the output in datetime format. By changing the parser and the prompt, we can get the result in the form we want.


from dotenv import load_dotenv
from langchain.output_parsers import DatetimeOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

load_dotenv()

# Parser that converts the LLM response into a Python datetime object
output_parser = DatetimeOutputParser()
format_instructions = output_parser.get_format_instructions()

prompt = PromptTemplate(
    template="""Answer the user's question:

    {question}

    {format_instructions}""",
    input_variables=["question"],
    partial_variables={"format_instructions": format_instructions},
)

model = ChatOpenAI(temperature=0)
chain = prompt | model | output_parser

print(chain.invoke({"question": "When is the Python Programming Language invented?"}))

Output:

1991-02-20 00:00:00

You can see that the result is parsed into a Python datetime object.
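If you need the timestamp in a different shape, DatetimeOutputParser also accepts a custom strftime format. The date-only format below is an assumption for illustration, not from the original example:

from langchain.output_parsers import DatetimeOutputParser

# Hypothetical date-only format instead of the default full timestamp
date_parser = DatetimeOutputParser(format="%Y-%m-%d")
print(date_parser.get_format_instructions())
print(date_parser.parse("1991-02-20"))  # 1991-02-20 00:00:00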

That’s all about the LangChain LLM Output Parsers. You can visit the LangChain documentation to find the Output Parser you require, or use the Pydantic Parser to structure the output yourself.

Conclusion

In this article, we learned about the LangChain Output Parser, which standardizes the generated text from an LLM. We can use the Pydantic Parser to structure the LLM output and get exactly the result we want. LangChain offers many other Output Parsers that could suit your situation, such as the CSV parser and the Datetime parser.

Source: machinelearningmastery.com
