In just half a year, OpenAI’s ChatGPT has seamlessly integrated into our daily lives, transcending traditional tech boundaries. From students seeking guidance to writers honing their craft, individuals of all ages and professions have embraced its precision, speed, and remarkably human-like conversations. These chat models, now equipped with Langchain PDF rendering capabilities, are poised to revolutionize various industries, extending far beyond the realm of technology.
The emergence of open-source tools like AutoGPTs, BabyAGI, and Langchain, with its innovative Langchain PDF feature, marks a significant milestone in leveraging the capabilities of large language models. These tools empower users to automate programming tasks through simple prompts, establish connections between language models and data sources, and expedite the development of AI applications. Langchain, in particular, stands out as a ChatGPT-enabled Q&A tool for PDFs, providing a comprehensive solution for creating AI applications with ease.
In this article, we will explore how to chat with PDF using LangChain. Our LangChain tutorial PDF provides step-by-step guidance for leveraging LangChain’s capabilities to interact with PDF documents effectively.
The Langchain streamlines the process of achieving these objectives, guiding users through each stage systematically. With support for multiple services, including embedding models, chat models, and vector databases, Langchain facilitates the creation of chatbots tailored for PDF interactions. This seamless workflow extends to integrating with Streamlit, handling multiple PDFs, and utilizing RAG for semantic search capabilities.
This article was published as a part of the Data Science Blogathon.
Langchain is an open-source tool, ideal for enhancing chat models like GPT-4 or GPT-3.5. It connects external data seamlessly, making models more agentic and data-aware. With Langchain, you can introduce fresh data to models like never before. The platform offers multiple chains, simplifying interactions with language models. In addition to Langchain, tools like Models for creating vector embeddings play a crucial role. When dealing with Langchain, the capability to render images of a PDF file is also noteworthy. Now, let’s delve into the significance of text embeddings.
Text embeddings are the heart and soul of Large Language Operations. Technically, we can work with language models with natural language but storing and retrieving natural language is highly inefficient. For example, in this project, we will need to perform high-speed search operations over large chunks of data. It is impossible to perform such operations on natural language data.
To make it more efficient, we need to transform text data into vector forms. There are dedicated ML models for creating embeddings from texts. The texts are converted into multidimensional vectors. Once embedded, we can group, sort, search, and more over these data. We can calculate the distance between two sentences to know how closely they are related. And the best part of it is these operations are not just limited to keywords like the traditional database searches but rather capture the semantic closeness of two sentences. This makes it a lot more powerful, thanks to Machine Learning.
Langchain has wrappers for all major vector databases like Chroma, Redis, Pinecone, Alpine db, and more. And same is true for LLMs, along with OpeanAI models, it also supports Cohere’s models, GPT4ALL- an open-source alternative for GPT models. For embeddings, it provides wrappers for OpeanAI, Cohere, and HuggingFace embeddings. You can also use your custom embedding models as well.
So, in short, Langchain is a meta-tool that abstracts away a lot of complications of interacting with underlying technologies, which makes it easier for anyone to build AI applications quickly.
In this article, we will use the OpeanAI embeddings model for creating embeddings. If you want to deploy an AI app for end users, consider using any Opensource models, such as Huggingface models or Google’s Universal sentence encoder.
To store vectors, we will use Chroma DB, an open-source vector store database. Feel free to explore other databases like Alpine, Pinecone, and Redis. Langchain has wrappers for all of these vector stores.
To create a Langchain chain, we will use ConversationalRetrievalChain(), ideal for conversation with chat models with history (to keep the context of the conversation). Do check out their official documentation regarding different LLM chains.
There are quite a few libraries we will use. So, install them beforehand. To create a seamless, clutter-free development environment, use virtual environments or Docker.
gradio = "^3.27.0"
openai = "^0.27.4"
langchain = "^0.0.148"
chromadb = "^0.3.21"
tiktoken = "^0.3.3"
pypdf = "^3.8.1"
pymupdf = "^1.22.2"
Now, import these libraries
import gradio as gr
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import PyPDFLoader
import os
import fitz
from PIL import Image
The interface of the application will have two major functionalities, one is a chat interface, and the other renders the relevant page of the PDF as an image. Apart from this, a text box for accepting OpenAI API keys from end users. I would highly recommend going through the article for building a GPT chatbot with Gradio from scratch. The article discusses the fundamental aspects of Gradio. We will borrow a lot of things from this article.
Gradio Blocks class allows us to build a web app. The Row and Columns classes allow for aligning multiple components on the web app. We will use them to customize the web interface.
with gr.Blocks() as demo:
# Create a Gradio block
with gr.Column():
with gr.Row():
with gr.Column(scale=0.8):
api_key = gr.Textbox(
placeholder='Enter OpenAI API key',
show_label=False,
interactive=True
).style(container=False)
with gr.Column(scale=0.2):
change_api_key = gr.Button('Change Key')
with gr.Row():
chatbot = gr.Chatbot(value=[], elem_id='chatbot').style(height=650)
show_img = gr.Image(label='Upload PDF', tool='select').style(height=680)
with gr.Row():
with gr.Column(scale=0.70):
txt = gr.Textbox(
show_label=False,
placeholder="Enter text and press enter"
).style(container=False)
with gr.Column(scale=0.15):
submit_btn = gr.Button('Submit')
with gr.Column(scale=0.15):
btn = gr.UploadButton("📁 Upload a PDF", file_types=[".pdf"]).style()
The interface is simple with a few components.
It has:
Here is a snapshot of the web UI.
The frontend part of our application is complete. Let’s hop on to the backend.
First, let’s outline the processes we will be dealing with.
These are the overview of our application. Let’s start building it.
When a specific action on the web UI is performed, these events are triggered. So, the events make the web app interactive and dynamic. Gradio allows us to define events with Python codes.
Gradio Events use component variables that we defined earlier to communicate with the backend. We will define a few Events that we need for our application. These are
with gr.Blocks() as demo:
# Create a Gradio block
with gr.Column():
with gr.Row():
with gr.Column(scale=0.8):
api_key = gr.Textbox(
placeholder='Enter OpenAI API key',
show_label=False,
interactive=True
).style(container=False)
with gr.Column(scale=0.2):
change_api_key = gr.Button('Change Key')
with gr.Row():
chatbot = gr.Chatbot(value=[], elem_id='chatbot').style(height=650)
show_img = gr.Image(label='Upload PDF', tool='select').style(height=680)
with gr.Row():
with gr.Column(scale=0.70):
txt = gr.Textbox(
show_label=False,
placeholder="Enter text and press enter"
).style(container=False)
with gr.Column(scale=0.15):
submit_btn = gr.Button('Submit')
with gr.Column(scale=0.15):
btn = gr.UploadButton("📁 Upload a PDF", file_types=[".pdf"]).style()
# Set up event handlers
# Event handler for submitting the OpenAI API key
api_key.submit(fn=set_apikey, inputs=[api_key], outputs=[api_key])
# Event handler for changing the API key
change_api_key.click(fn=enable_api_box, outputs=[api_key])
# Event handler for uploading a PDF
btn.upload(fn=render_first, inputs=[btn], outputs=[show_img])
# Event handler for submitting text and generating response
submit_btn.click(
fn=add_text,
inputs=[chatbot, txt],
outputs=[chatbot],
queue=False
).success(
fn=generate_response,
inputs=[chatbot, txt, btn],
outputs=[chatbot, txt]
).success(
fn=render_file,
inputs=[btn],
outputs=[show_img]
)
So far we have not defined our functions called inside above event handlers. Next, we will define all these functions to make a functional web app.
Handling the API keys of a user is important as the entire thing runs on the BYOK(Bring Your Own Key) principle. Whenever a user submits a key, the textbox must become immutable with a prompt suggesting the key is set. And when the “Change Key” event is triggered the box must be able to take inputs.
To do this, define two global variables.
enable_box = gr.Textbox.update(value=None,placeholder= 'Upload your OpenAI API key',
interactive=True)
disable_box = gr.Textbox.update(value = 'OpenAI API key is Set',interactive=False)
Define functions
def set_apikey(api_key):
os.environ['OPENAI_API_KEY'] = api_key
return disable_box
def enable_api_box():
return enable_box
The set_apikey function takes a string input and returns the disable_box variable, which makes the textbox immutable after execution. In the Gradio Events section, we defined the api_key Submit Event, which calls the set_apikey function. We set the API key as an environment variable using the OS library.
Clicking the Change API key button returns the enable_box variable, which enables the mutability of the textbox again.
This is the most important step. This step involves extracting texts and creating embeddings and storing them in vector stores. Thanks to Langchain, which provides wrappers for multiple services making things easier. So, let’s define the function.
def process_file(file):
# raise an error if API key is not provided
if 'OPENAI_API_KEY' not in os.environ:
raise gr.Error('Upload your OpenAI API key')
# Load the PDF file using PyPDFLoader
loader = PyPDFLoader(file.name)
documents = loader.load()
# Initialize OpenAIEmbeddings for text embeddings
embeddings = OpenAIEmbeddings()
# Create a ConversationalRetrievalChain with ChatOpenAI language model
# and PDF search retriever
pdfsearch = Chroma.from_documents(documents, embeddings,)
chain = ConversationalRetrievalChain.from_llm(ChatOpenAI(temperature=0.3),
retriever=
pdfsearch.as_retriever(search_kwargs={"k": 1}),
return_source_documents=True,)
return chain
Once the chain is created, we will call the chain and send our queries. Send a chat history along with the queries to keep the context of conversations and stream responses to the chat interface. Let’s define the function.
def generate_response(history, query, btn):
global COUNT, N, chat_history
# Check if a PDF file is uploaded
if not btn:
raise gr.Error(message='Upload a PDF')
# Initialize the conversation chain only once
if COUNT == 0:
chain = process_file(btn)
COUNT += 1
# Generate a response using the conversation chain
result = chain({"question": query, 'chat_history':chat_history}, return_only_outputs=True)
# Update the chat history with the query and its corresponding answer
chat_history += [(query, result["answer"])]
# Retrieve the page number from the source document
N = list(result['source_documents'][0])[1][1]['page']
# Append each character of the answer to the last message in the history
for char in result['answer']:
history[-1][-1] += char
# Yield the updated history and an empty string
yield history, ''
The final step is to render the image of the PDF file with the most relevant answer. We can use the PyMuPdf and PIL libraries to render the images of the document.
def render_file(file):
global N
# Open the PDF document using fitz
doc = fitz.open(file.name)
# Get the specific page to render
page = doc[N]
# Render the page as a PNG image with a resolution of 300 DPI
pix = page.get_pixmap(matrix=fitz.Matrix(300/72, 300/72))
# Create an Image object from the rendered pixel data
image = Image.frombytes('RGB', [pix.width, pix.height], pix.samples)
# Return the rendered image
return image
This is everything we need to do for a functional web app for chatting with any PDF.
#import csv
import gradio as gr
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import PyPDFLoader
import os
import fitz
from PIL import Image
# Global variables
COUNT, N = 0, 0
chat_history = []
chain = ''
enable_box = gr.Textbox.update(value=None,
placeholder='Upload your OpenAI API key', interactive=True)
disable_box = gr.Textbox.update(value='OpenAI API key is Set', interactive=False)
# Function to set the OpenAI API key
def set_apikey(api_key):
os.environ['OPENAI_API_KEY'] = api_key
return disable_box
# Function to enable the API key input box
def enable_api_box():
return enable_box
# Function to add text to the chat history
def add_text(history, text):
if not text:
raise gr.Error('Enter text')
history = history + [(text, '')]
return history
# Function to process the PDF file and create a conversation chain
def process_file(file):
if 'OPENAI_API_KEY' not in os.environ:
raise gr.Error('Upload your OpenAI API key')
loader = PyPDFLoader(file.name)
documents = loader.load()
embeddings = OpenAIEmbeddings()
pdfsearch = Chroma.from_documents(documents, embeddings)
chain = ConversationalRetrievalChain.from_llm(ChatOpenAI(temperature=0.3),
retriever=pdfsearch.as_retriever(search_kwargs={"k": 1}),
return_source_documents=True)
return chain
# Function to generate a response based on the chat history and query
def generate_response(history, query, btn):
global COUNT, N, chat_history, chain
if not btn:
raise gr.Error(message='Upload a PDF')
if COUNT == 0:
chain = process_file(btn)
COUNT += 1
result = chain({"question": query, 'chat_history': chat_history}, return_only_outputs=True)
chat_history += [(query, result["answer"])]
N = list(result['source_documents'][0])[1][1]['page']
for char in result['answer']:
history[-1][-1] += char
yield history, ''
# Function to render a specific page of a PDF file as an image
def render_file(file):
global N
doc = fitz.open(file.name)
page = doc[N]
# Render the page as a PNG image with a resolution of 300 DPI
pix = page.get_pixmap(matrix=fitz.Matrix(300/72, 300/72))
image = Image.frombytes('RGB', [pix.width, pix.height], pix.samples)
return image
# Gradio application setup
with gr.Blocks() as demo:
# Create a Gradio block
with gr.Column():
with gr.Row():
with gr.Column(scale=0.8):
api_key = gr.Textbox(
placeholder='Enter OpenAI API key',
show_label=False,
interactive=True
).style(container=False)
with gr.Column(scale=0.2):
change_api_key = gr.Button('Change Key')
with gr.Row():
chatbot = gr.Chatbot(value=[], elem_id='chatbot').style(height=650)
show_img = gr.Image(label='Upload PDF', tool='select').style(height=680)
with gr.Row():
with gr.Column(scale=0.70):
txt = gr.Textbox(
show_label=False,
placeholder="Enter text and press enter"
).style(container=False)
with gr.Column(scale=0.15):
submit_btn = gr.Button('Submit')
with gr.Column(scale=0.15):
btn = gr.UploadButton("📁 Upload a PDF", file_types=[".pdf"]).style()
# Set up event handlers
# Event handler for submitting the OpenAI API key
api_key.submit(fn=set_apikey, inputs=[api_key], outputs=[api_key])
# Event handler for changing the API key
change_api_key.click(fn=enable_api_box, outputs=[api_key])
# Event handler for uploading a PDF
btn.upload(fn=render_first, inputs=[btn], outputs=[show_img])
# Event handler for submitting text and generating response
submit_btn.click(
fn=add_text,
inputs=[chatbot, txt],
outputs=[chatbot],
queue=False
).success(
fn=generate_response,
inputs=[chatbot, txt, btn],
outputs=[chatbot, txt]
).success(
fn=render_file,
inputs=[btn],
outputs=[show_img]
)
demo.queue()
if __name__ == "__main__":
demo.launch()
Now that we have configured everything, let’s launch our application.
You can launch the application in debug mode with the following command
gradio app.py
Otherwise, you can also simply run the application with the Python command. Below is a snapshot of the end product. GitHub repository of the codes.
The current application works great. But there are a few things you can do to make it better.
Use the tools across multiple fields from Education to Law to Academia or any field you can imagine that requires the person to go through huge texts. Some of the practical use cases of ChatGPT for PDFs are
A chatGPT-enabled PDF Q&A tool can gather information faster from heaps of PDF text. It is like a search engine for text data. Not just PDFs, but we can also extend this tool to anything with text data with a little code manipulation.
LangChain is a tool that helps you work with language models. To load a PDF file using LangChain, you need to follow these steps:
pip install PyPDF2
from langchain.document_loaders import PyPDF2Loader
loader = PyPDF2Loader("path/to/your/file.pdf")
documents = loader.load()
for doc in documents:
print(doc.page_content) # This will print the text from each page
In this tutorial, we’ve explored how to build a chatbot interface for interacting with PDF files using ChatGPT, leveraging the capabilities of Langchain PDF. By integrating natural language processing techniques and parameter tuning, we’ve created a seamless user experience for querying and retrieving information from PDF documents.
Hope you enjoy exploring how to chat with PDF using LangChain! This LangChain tutorial PDF will guide you through the process, making your document interactions seamless and efficient.
To get started with building similar applications, consider exploring prompt engineering techniques, fine-tuning model parameters, and installing necessary libraries like transformers via pip install. With these tools and techniques at your disposal, you can unlock the potential of natural language processing and build innovative solutions for various domains.
A. LangChain is a decentralized platform that aims to bridge language barriers by combining blockchain technology and language services. It facilitates secure and transparent interactions between language service providers and clients, creating a global ecosystem for language-related services.
A. LangChain is used to connect language service providers with clients, enabling seamless access to translation, interpretation, localization, and other language-related services. It streamlines the process, increases efficiency, and promotes trust and fairness in the language industry.
A. LangChain incorporates different chain types to cater to various requirements. These chain types include translation chains, interpretation chains, localization chains, and more. Each chain type focuses on specific language services and provides a dedicated environment for their execution and management.
A. In LangChain, agents act as intermediaries between service providers and clients. They facilitate the coordination, negotiation, and execution of language service contracts. Agents can be individuals or organizations, and they play a vital role in maintaining the integrity and efficiency of the LangChain ecosystem.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.
Lorem ipsum dolor sit amet, consectetur adipiscing elit,