Document Classifier
Classifies documents into user-defined categories using AI.
Overview
The Document Classifier node uses AI to analyze a document and assign it to one of your predefined categories. It returns the chosen category and a confidence score. In Output Per Category mode, it can route items to different outputs based on the classification result.
Use it to:
- Sort incoming invoices, contracts, and receipts by document type
- Route documents to different processing pipelines based on content
- Triage support documents by topic or department
- Categorize scanned mail before further extraction
Parameters
| Parameter | Description | Required |
|---|---|---|
| File | File to classify (supports expressions) | Yes |
| Page Range | Pages to analyze (e.g. "1-3") | No |
| Categories | List of category names and optional descriptions | Yes (min 2) |
| Instructions | Additional instructions for the classifier | No |
| Output Mode | Single Output or Output Per Category | Yes |
File
The file to classify. Typically comes from a trigger or file operation node:
{{$item.data.file}}
Supports PDFs, images, and other document types.
Page Range
Limit classification to specific pages. Useful for large documents where the relevant content is on certain pages:
| Example | Description |
|---|---|
| 1 | First page only |
| 1-3 | Pages 1 through 3 |
| 1,3,5 | Specific pages |
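To make the three formats above concrete, here is a minimal Python sketch of how such a page range string could be expanded into page numbers. It is an illustration only, not the node's actual parser: the function name `parse_page_range` is hypothetical, and mixing ranges and single pages in one value (for example "1-2,5") is an assumption that the table above does not confirm.

```python
def parse_page_range(spec: str) -> list[int]:
    """Expand a page range such as "1", "1-3", or "1,3,5" into page numbers.

    Hypothetical helper for illustration; the node's real parsing rules
    are not documented here.
    """
    pages: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            raise ValueError(f"empty segment in page range: {spec!r}")
        if "-" in part:
            # A segment like "1-3" covers both end points.
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start < 1 or end < start:
                raise ValueError(f"invalid range: {part!r}")
            pages.extend(range(start, end + 1))
        else:
            page = int(part)
            if page < 1:
                raise ValueError(f"invalid page number: {part!r}")
            pages.append(page)
    # Drop repeated pages while keeping first-seen order.
    return list(dict.fromkeys(pages))
```

With this sketch, "1-3" expands to pages 1, 2, and 3, and "1,3,5" expands to exactly those three pages.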
Categories
Define at least two categories for classification. Each category has:
| Field | Description | Required |
|---|---|---|
| Name | Category name (used in output and routing) | Yes |
| Description | Helps the AI understand what belongs in this category | No |
Adding descriptions improves classification accuracy:
| Name | Description |
|---|---|
| Invoice | Bills and payment requests with line items and totals |
| Contract | Legal agreements, terms of service, NDAs |
| Receipt | Proof of payment, transaction confirmations |
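The category rules above (at least two categories, a required name, an optional description) can be sketched as a small validation check. This is a hedged illustration rather than the node's own validation: the `Category` class and `validate_categories` function are hypothetical, and the duplicate-name check is an assumption based on names being used for output and routing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Category:
    name: str              # required; used in output and routing
    description: str = ""  # optional; helps the AI pick the right category


def validate_categories(categories: list[Category]) -> list[str]:
    """Return a list of problems; an empty list means the setup looks valid."""
    problems: list[str] = []
    if len(categories) < 2:
        problems.append("at least two categories are required")
    seen: set[str] = set()
    for category in categories:
        name = category.name.strip()
        if not name:
            problems.append("category name is required")
        elif name.lower() in seen:
            # Assumption: duplicate names would make routing ambiguous.
            problems.append(f"duplicate category name: {name}")
        else:
            seen.add(name.lower())
    return problems
```

For example, a list holding Invoice, Contract, and Receipt (with or without descriptions) yields no problems, while a single-category list reports that at least two are required.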
Instructions
Optional text giving the AI additional context for classification. Supports expressions and multiline input.
Examples:
- "Focus on the document header to determine the type"
- "If the document contains both an invoice and a receipt, classify it as an invoice"
- "Documents in Spanish should still be classified using the English category names"
Output Mode
| Mode | Description |
|---|---|
| Single Output | All classified items go to the main output |
| Output Per Category | Each category becomes a separate output, routing items to the matching category |
Settings
| Setting | Description |
|---|---|
| Execution Mode | Once per item (default) or Once |
| Output Mode | How to output results when running once |
| Batch Size | Items to process concurrently (default 5) |
| Stop on Error | Stop workflow on failure |
Output
Each classified item contains:
{
"category": "Invoice",
"confidence": 95
}
Access in expressions:
- Category: {{$item.data.category}}
- Confidence: {{$item.data.confidence}}
Output Per Category Mode
When Output Mode is set to Output Per Category, the node creates one output per category and routes each item to the output that matches its classification result. The AI is constrained to return only one of your defined category names.
For example, with categories Invoice, Contract, and Receipt:
- An item classified as "Invoice" goes to the Invoice output
- An item classified as "Contract" goes to the Contract output
- An item classified as "Receipt" goes to the Receipt output
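The routing described above can be pictured with a short Python sketch. The node performs this routing itself; the code only models the behavior, and the function name `route_by_category` is hypothetical. Items are assumed to be dictionaries shaped like the output shown earlier, with `category` and `confidence` keys.

```python
def route_by_category(items: list[dict], categories: list[str]) -> dict[str, list[dict]]:
    """Group classified items into one list per category name.

    Illustrative model of Output Per Category mode, not the node's code.
    """
    # One output per defined category, even if no item lands in it.
    outputs: dict[str, list[dict]] = {name: [] for name in categories}
    for item in items:
        category = item["category"]
        if category not in outputs:
            # Should not occur if results are limited to the defined names.
            raise ValueError(f"unknown category: {category!r}")
        outputs[category].append(item)
    return outputs
```

An output with no matching items simply stays empty, so the downstream node connected to it has nothing to process for that run.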
Examples
Classify and Route Documents
Process different document types with specialized pipelines:
                                                ┌─ Invoice ──→ [Data Extractor (invoices)]
[Google Drive Trigger] → [Document Classifier] ─┼─ Contract ─→ [Copy File (contracts folder)]
                                                └─ Receipt ──→ [Data Extractor (receipts)]
- Set Output Mode to Output Per Category
- Define categories: Invoice, Contract, Receipt
- Connect each output to the appropriate downstream node
Classify Then Filter by Confidence
Only process high-confidence classifications:
[Google Drive Trigger] → [Document Classifier] → [Filter (confidence > 80)] → [Insert Rows]
- Set Output Mode to Single Output
- Add a Filter node checking {{$item.data.confidence}} Greater Than 80
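The confidence check in this workflow amounts to a strict greater-than comparison on each item's confidence field. The sketch below models that logic in Python for clarity; in the workflow the Filter node does the work. The function name `filter_by_confidence` is hypothetical, and items are assumed to be dictionaries with a `confidence` value from 0 to 100.

```python
def filter_by_confidence(items: list[dict], threshold: float = 80) -> list[dict]:
    """Keep only items whose confidence is strictly greater than the threshold."""
    return [item for item in items if item["confidence"] > threshold]


classified = [
    {"category": "Invoice", "confidence": 95},
    {"category": "Contract", "confidence": 80},
    {"category": "Receipt", "confidence": 42},
]

# Only the Invoice item passes; a confidence of exactly 80 is not greater than 80.
high_confidence = filter_by_confidence(classified)
```

Because the comparison is strict, items at exactly the threshold are dropped. Lower the threshold slightly if such borderline items should pass through.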
Triage Incoming Mail Attachments
[Lido Mailbox Trigger] → [Edit Item (extract attachment)] → [Document Classifier] → [Switch]
- Extract the attachment file from the email
- Classify the attachment
- Use Switch or Output Per Category to route by document type
Classify with Custom Instructions
[OneDrive Trigger] → [Document Classifier] → [Insert Rows]
Set Instructions to guide classification:
These are medical documents. Classify based on the document header.
If a document contains both a lab report and a prescription, classify it as a lab report.
Tips
- Add descriptions to categories for better accuracy
- Use the Page Range parameter for large documents where the first page is sufficient for classification
- The confidence score ranges from 0 to 100
- Output Per Category mode is useful for building branching pipelines without a separate If/Switch node
- Classification is a long-running operation processed on heavy executor pods
- Connect the error output to handle classification failures gracefully