Build a Smart PDF AI Agent with n8n
Learn how to build a powerful AI agent that automatically reads, classifies, summarizes, and routes PDF documents like invoices and resumes from a Google Drive folder.
For any business that handles a high volume of PDFs, such as invoices, contracts, or resumes, managing these documents is often a manual, time-consuming process. This tutorial shows you how to build a smart AI agent from scratch using n8n. The agent watches a Google Drive folder, processes each new PDF, classifies and summarizes its content, and emails the summary to the right person.
System Overview
The workflow automates the entire PDF handling process:
- Trigger: The workflow starts when a new PDF file is uploaded to a specific Google Drive folder.
- Download & Extract: The agent downloads the new file and extracts all the text from the PDF.
- AI Analysis: A Large Language Model (LLM) reads the text to:
  - Classify the document type (e.g., "Invoice," "Resume").
  - Generate a concise summary of its contents.
  - Determine a confidence score for its classification.
- Conditional Routing: Based on the confidence score, the workflow decides whether to proceed.
- Email Notification: If confident, the agent drafts and sends an email containing the summary and the original PDF as an attachment.
- Labeling: Finally, it adds a label to the email in Gmail for easy organization.
Part 1: Setting Up the Google Drive Trigger
First, we need to tell our workflow to watch for new files.
- Start a new workflow in n8n.
- Add a Google Drive Trigger node.
- Authentication: Connect your Google account. You will need to create credentials and grant n8n permission to access your Google Drive.
- Configuration:
  - Mode: On Changes Involving Folder (Poll). You can set the poll time (e.g., every 1 minute).
  - Folder: Select the specific Google Drive folder you want the agent to monitor.
  - Watch For: Choose File Created.
- Fetch a test event to ensure the trigger is working correctly. You should see data for a file in your folder.
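Conceptually, a polling trigger remembers which files it has already seen and emits only the new ones on each poll. The sketch below illustrates that idea in plain Python. It is not n8n's actual implementation: the `find_new_files` helper, the sample file records, and the MIME-type filter are assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Set


def find_new_files(listing: Iterable[Dict], seen_ids: Set[str]) -> List[Dict]:
    """Return PDF files not seen on earlier polls and remember their IDs."""
    new_files = []
    for f in listing:
        # Assumption: only PDFs are of interest, identified by MIME type.
        is_pdf = f.get("mimeType") == "application/pdf"
        if is_pdf and f["id"] not in seen_ids:
            seen_ids.add(f["id"])
            new_files.append(f)
    return new_files


# Hypothetical folder listings from two consecutive polls.
seen: Set[str] = set()
first_poll = [{"id": "a1", "name": "invoice.pdf", "mimeType": "application/pdf"}]
second_poll = first_poll + [
    {"id": "b2", "name": "resume.pdf", "mimeType": "application/pdf"},
    {"id": "c3", "name": "notes.txt", "mimeType": "text/plain"},
]

new_on_first = find_new_files(first_poll, seen)
new_on_second = find_new_files(second_poll, seen)
```

On the second poll only the newly added PDF is returned; the file from the first poll and the text file are skipped.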
Part 2: Downloading and Extracting PDF Content
Once a new file is detected, we need to read its contents.
Step 1: Download the File
- Add a Google Drive node (this is an action node, not the trigger).
- Authentication: Select the same credential you used for the trigger.
- Configuration:
  - Resource: File.
  - Operation: Download.
  - File ID: Use an expression to get the ID from the trigger node. Drag the id field from the trigger's output into this field. The expression will look something like {{ $json.id }}.
- Execute the node. You should see the binary data of the file in the output.
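The Google Drive node hides the HTTP details, but it can help to know roughly what a download looks like. The sketch below builds the kind of request the Google Drive API v3 expects for fetching file contents (`alt=media`). The `build_download_request` helper, the file ID, and the access token are placeholders, and the code only constructs the request without sending it.

```python
from typing import Dict
from urllib.parse import quote

DRIVE_FILES_ENDPOINT = "https://www.googleapis.com/drive/v3/files"


def build_download_request(file_id: str, access_token: str) -> Dict:
    """Describe a Drive API v3 request that returns the file's raw bytes."""
    # alt=media asks for the file content instead of its JSON metadata.
    url = f"{DRIVE_FILES_ENDPOINT}/{quote(file_id, safe='')}?alt=media"
    return {
        "method": "GET",
        "url": url,
        "headers": {"Authorization": f"Bearer {access_token}"},
    }


# Placeholder values; in n8n the ID comes from {{ $json.id }}.
request = build_download_request("1AbC_dEf-123", "example-token")
```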
Step 2: Extract Text from the PDF
- Add an Extract from File node.
- This node is simple: it takes the binary data from the previous step as input. n8n handles this automatically.
- Execute the node. The output will be the raw text extracted from your PDF, along with metadata.
Part 3: The AI Brain - Classification and Summary
This is where the magic happens. We'll use an LLM to understand the document.
- Add a Basic LLM Chain node.
- Prompt Source: Select Define Below.
- User Prompt: Use an expression to pass the extracted text from the previous node. The expression should be {{ $json.text }}.
- System Prompt: This is crucial. It tells the AI exactly what to do. Paste the following prompt:

    You are a reliable PDF intelligence extractor. Your task is to read the OCR text from any PDF and perform the following actions:
    1. Identify the document type. The type must be one of the following: "Resume", "Invoice", "Contract", "Report", "Other".
    2. Generate a concise and accurate document summary, 3-5 sentences long, describing the main purpose, entities, and key details.
    3. Return a confidence score from 0.0 to 1.0 representing how certain you are about the detected document type.
    Your final output must be a valid JSON object with the following structure. Do not output anything else.
    {
      "document_type": "...",
      "confidence": 0.0,
      "summary": "..."
    }

- Chat Model: Connect an AI model. For this, the Google Gemini Chat Model is a great choice due to its large context window.
  - Create new credentials by getting an API key from Google AI Studio.
  - Select a model like gemini-1.5-flash-latest.
- Output Parser: To ensure we get clean JSON, configure the output.
  - Toggle on Require Specific Output Format.
  - Output Format: Structured (JSON).
  - Generate from Example: Paste the example JSON from the system prompt to show the model the exact structure you need.
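To make the expected output contract concrete, here is a minimal sketch of the checks a structured output step performs on the model's reply. It is written in plain Python rather than as n8n configuration, and the `parse_llm_output` helper and the sample reply are hypothetical; only the three field names and the allowed document types come from the system prompt.

```python
import json
from typing import Dict

ALLOWED_TYPES = {"Resume", "Invoice", "Contract", "Report", "Other"}


def parse_llm_output(raw: str) -> Dict:
    """Parse the model's reply and check it against the prompt's JSON structure."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    doc_type = data.get("document_type")
    confidence = data.get("confidence")
    summary = data.get("summary")

    if doc_type not in ALLOWED_TYPES:
        raise ValueError(f"unexpected document_type: {doc_type!r}")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("confidence must be a number")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    if not isinstance(summary, str) or not summary.strip():
        raise ValueError("summary must be a non-empty string")

    return {
        "document_type": doc_type,
        "confidence": float(confidence),
        "summary": summary.strip(),
    }


# Hypothetical model reply.
sample_reply = '{"document_type": "Invoice", "confidence": 0.93, "summary": "Invoice for services."}'
result = parse_llm_output(sample_reply)
```

Rejecting malformed replies at this point keeps bad data from reaching the routing and email steps.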
Part 4: Sending the Automated Email
Now, we'll use the AI's output to draft and send an email.
Step 1: Check the Confidence Score
- Add an IF node. This will prevent the workflow from sending emails if the AI is not sure about the document type.
- Set a condition:
  - Value 1: Use an expression to get the confidence score from the LLM node: {{ $json.confidence }}.
  - Operation: Is Greater Than or Equal To.
  - Value 2: Set a threshold, for example, 0.8.
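The condition configured above amounts to a single comparison. As a rough illustration in plain Python (the `should_send_email` name is hypothetical, and treating a missing or non-numeric score as "not confident" is an assumption rather than documented IF node behavior):

```python
from typing import Dict

CONFIDENCE_THRESHOLD = 0.8  # same example threshold as in the IF node


def should_send_email(result: Dict, threshold: float = CONFIDENCE_THRESHOLD) -> bool:
    """Return True when the classification confidence meets the threshold."""
    confidence = result.get("confidence")
    # Assumption: a missing or non-numeric score counts as not confident.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        return False
    return confidence >= threshold


at_threshold = should_send_email({"document_type": "Invoice", "confidence": 0.8})
below_threshold = should_send_email({"document_type": "Invoice", "confidence": 0.79})
missing_score = should_send_email({"document_type": "Invoice"})
```

Because the operator is "greater than or equal to", a score of exactly 0.8 passes, while 0.79 and a missing score do not.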
Step 2: Send the Email
Connect a Gmail node to the true output of the IF node.
- Authentication: Connect your Gmail account.
- Configuration:
  - Resource: Message.
  - Operation: Send.
  - To: Enter the recipient's email address. You could add another IF node here to route emails based on document_type (e.g., resumes to "hr@example.com", invoices to "finance@example.com").
  - Subject: Create a dynamic subject line using the AI's output: New {{ $json.document_type }} Uploaded.
  - Body/HTML: Insert the summary from the AI: {{ $json.summary }}.
  - Attachments: To attach the original PDF, you need to provide its binary data. You can get this by connecting the output of the Download File node (from Part 2) to the Gmail node's attachment input.
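The routing and message assembly described above can be sketched outside n8n with Python's standard `email` library. This is an illustration of the logic only, not what the Gmail node runs internally. The `build_email` helper, the fallback address, and the sample PDF bytes are hypothetical; the two routing addresses are the examples from the To field.

```python
from email.message import EmailMessage
from typing import Dict

# Routing table based on the example addresses above.
RECIPIENTS = {
    "Resume": "hr@example.com",
    "Invoice": "finance@example.com",
}
DEFAULT_RECIPIENT = "admin@example.com"  # hypothetical fallback for other types


def build_email(result: Dict, pdf_bytes: bytes, filename: str) -> EmailMessage:
    """Build a message with the AI summary as body and the PDF attached."""
    msg = EmailMessage()
    msg["To"] = RECIPIENTS.get(result["document_type"], DEFAULT_RECIPIENT)
    msg["Subject"] = f"New {result['document_type']} Uploaded"
    msg.set_content(result["summary"])
    msg.add_attachment(
        pdf_bytes, maintype="application", subtype="pdf", filename=filename
    )
    return msg


# Hypothetical AI result and file contents.
ai_result = {"document_type": "Invoice", "confidence": 0.93, "summary": "Invoice for services."}
message = build_email(ai_result, b"%PDF-1.4 sample", "invoice.pdf")
```

A lookup table like `RECIPIENTS` plays the same role as the additional IF node suggested for the To field: one entry per document type, with a fallback for everything else.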
Step 3: (Optional) Label the Email
- Add another Gmail node.
- Operation: Label > Add.
- Message ID: Use an expression to get the ID of the email you just sent.
- Label: Choose a label from your Gmail account (e.g., "Automated Invoices").
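For reference, labeling a message through the Gmail API is a `modify` call that lists the label IDs to add. The sketch below only constructs such a request and does not send it. The `build_add_label_request` helper, the message ID, and the label ID are placeholders; note that the API works with label IDs rather than display names such as "Automated Invoices".

```python
import json
from typing import Dict, List

GMAIL_API = "https://gmail.googleapis.com/gmail/v1"


def build_add_label_request(message_id: str, label_ids: List[str]) -> Dict:
    """Describe a Gmail API request that adds labels to one message."""
    return {
        "method": "POST",
        # "me" refers to the authenticated user.
        "url": f"{GMAIL_API}/users/me/messages/{message_id}/modify",
        "body": json.dumps({"addLabelIds": label_ids}),
    }


# Placeholder IDs; in n8n the message ID comes from the Send step's output.
label_request = build_add_label_request("18c0ffee", ["Label_42"])
```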
And that's it! Activate your workflow, and your AI agent is ready to start processing PDFs automatically.