Document Classifier

Classifies documents into user-defined categories using AI.

Overview

The Document Classifier node uses AI to analyze a document and assign it to one of your predefined categories. It returns the chosen category and a confidence score. In Output Per Category mode, it can route items to different outputs based on the classification result.

Use it to:

  • Sort incoming invoices, contracts, and receipts by document type
  • Route documents to different processing pipelines based on content
  • Triage support documents by topic or department
  • Categorize scanned mail before further extraction

Parameters

Parameter     Description                                         Required
File          File to classify (supports expressions)             Yes
Page Range    Pages to analyze (e.g. "1-3")                       No
Categories    List of category names and optional descriptions    Yes (min 2)
Instructions  Additional instructions for the classifier          No
Output Mode   Single Output or Output Per Category                Yes

File

The file to classify. Typically comes from a trigger or file operation node:

{{$item.data.file}}

Supports PDFs, images, and other document types.

Page Range

Limit classification to specific pages. Useful for large documents where the relevant content is on certain pages:

Example  Description
1        First page only
1-3      Pages 1 through 3
1,3,5    Specific pages
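
The examples above can be read as a comma-separated list of single pages and inclusive ranges. As a rough illustration only, the Python sketch below expands such a string into page numbers. `expand_page_range` is a hypothetical helper, not part of the node, and the node's own handling of whitespace, stray commas, or reversed ranges may differ.

```python
def expand_page_range(spec: str) -> list[int]:
    """Expand a page range string such as "1", "1-3", or "1,3,5" into page numbers.

    Hypothetical helper for illustration; the node's actual parser may differ.
    """
    pages: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # assumption: tolerate stray commas such as "1,,3"
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start < 1 or end < start:
                raise ValueError(f"invalid page range: {part!r}")
            pages.extend(range(start, end + 1))  # ranges are inclusive
        else:
            page = int(part)
            if page < 1:
                raise ValueError(f"invalid page number: {part!r}")
            pages.append(page)
    return pages
```

With this sketch, "1-3" expands to pages 1, 2, and 3, and "1,3,5" to pages 1, 3, and 5.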

Categories

Define at least two categories for classification. Each category has:

Field        Description                                            Required
Name         Category name (used in output and routing)             Yes
Description  Helps the AI understand what belongs in this category  No

Adding descriptions improves classification accuracy:

Name      Description
Invoice   Bills and payment requests with line items and totals
Contract  Legal agreements, terms of service, NDAs
Receipt   Proof of payment, transaction confirmations
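
To make the category rules concrete, here is a minimal Python sketch of a category list with the two-category minimum enforced. The `Category` class and `validate_categories` function are hypothetical names chosen for illustration. Categories are configured in the node itself, and the rejection of blank or duplicate names is an assumption rather than documented behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Category:
    name: str              # required: used in output and routing
    description: str = ""  # optional: helps the AI decide what belongs here


def validate_categories(categories: list[Category]) -> None:
    """Raise ValueError if the list breaks the documented or assumed rules."""
    if len(categories) < 2:
        raise ValueError("define at least two categories")
    names = [c.name.strip() for c in categories]
    if any(not name for name in names):
        raise ValueError("every category needs a name")
    if len(set(names)) != len(names):
        # Assumption: duplicate names would make routing ambiguous.
        raise ValueError("category names must be unique")


categories = [
    Category("Invoice", "Bills and payment requests with line items and totals"),
    Category("Contract", "Legal agreements, terms of service, NDAs"),
    Category("Receipt", "Proof of payment, transaction confirmations"),
]
validate_categories(categories)  # passes: three uniquely named categories
```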

Instructions

Optional text giving the AI additional context for classification. Supports expressions and multiline input.

Examples:

  • "Focus on the document header to determine the type"
  • "If the document contains both an invoice and a receipt, classify it as an invoice"
  • "Documents in Spanish should still be classified using the English category names"

Output Mode

Mode                 Description
Single Output        All classified items go to the main output
Output Per Category  Each category gets its own output, and items are routed to the output matching their category

Settings

Setting         Description
Execution Mode  Once per item (default) or Once
Output Mode     How to output results when running once
Batch Size      Items to process concurrently (default 5)
Stop on Error   Stop workflow on failure
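
As a rough illustration of what a concurrency limit such as Batch Size means, the Python sketch below runs a stand-in `classify` function over a list of files with at most five calls in flight at once. Both `classify` and `classify_all` are hypothetical, and the returned values are placeholders. This is not a description of how the node schedules work internally.

```python
from concurrent.futures import ThreadPoolExecutor


def classify(document: str) -> dict:
    """Stand-in for a real classification call (hypothetical, returns placeholder data)."""
    return {"document": document, "category": "Invoice", "confidence": 95}


def classify_all(documents: list[str], batch_size: int = 5) -> list[dict]:
    """Classify documents with at most `batch_size` calls running at the same time."""
    with ThreadPoolExecutor(max_workers=batch_size) as pool:
        # map() yields results in input order, regardless of completion order.
        return list(pool.map(classify, documents))
```

A larger limit finishes a big run sooner but puts more load on the classifier at once. A smaller limit does the opposite.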

Output

Each classified item contains:

{
  "category": "Invoice",
  "confidence": 95
}

Access in expressions:

  • Category: {{$item.data.category}}
  • Confidence: {{$item.data.confidence}}

Output Per Category Mode

When Output Mode is set to Output Per Category, the node creates one output per category. Each item is routed to the output matching its classification result. The AI is constrained to return exactly one of your defined category names.

For example, with categories Invoice, Contract, and Receipt:

  • An item classified as "Invoice" goes to the Invoice output
  • An item classified as "Contract" goes to the Contract output
  • An item classified as "Receipt" goes to the Receipt output
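
The routing described above amounts to grouping items by their `category` field. The Python sketch below shows that idea under stated assumptions: `route_by_category` is a hypothetical function, items are plain dictionaries shaped like the node's output, and the unknown-category check is defensive only, since the node constrains the AI to the defined names.

```python
def route_by_category(items: list[dict], categories: list[str]) -> dict[str, list[dict]]:
    """Group classified items into one bucket per defined category."""
    outputs: dict[str, list[dict]] = {name: [] for name in categories}
    for item in items:
        category = item["category"]
        if category not in outputs:
            # Defensive check for this sketch only.
            raise ValueError(f"unknown category: {category!r}")
        outputs[category].append(item)
    return outputs


items = [
    {"category": "Invoice", "confidence": 95},
    {"category": "Receipt", "confidence": 88},
    {"category": "Invoice", "confidence": 72},
]
outputs = route_by_category(items, ["Invoice", "Contract", "Receipt"])
```

Here the Invoice output receives two items, the Receipt output one, and the Contract output stays empty.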

Examples

Classify and Route Documents

Process different document types with specialized pipelines:

                                                ┌─ Invoice ──→ [Data Extractor (invoices)]
[Google Drive Trigger] → [Document Classifier] ─┼─ Contract ─→ [Copy File (contracts folder)]
                                                └─ Receipt ──→ [Data Extractor (receipts)]
  1. Set Output Mode to Output Per Category
  2. Define categories: Invoice, Contract, Receipt
  3. Connect each output to the appropriate downstream node

Classify Then Filter by Confidence

Only process high-confidence classifications:

[Google Drive Trigger] → [Document Classifier] → [Filter (confidence > 80)] → [Insert Rows]
  1. Set Output Mode to Single Output
  2. Add a Filter node checking {{$item.data.confidence}} Greater Than 80
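
The Filter condition in step 2 is a strict comparison, so an item with a confidence of exactly 80 is dropped. A minimal Python sketch of the same check, where `high_confidence` is a hypothetical function and items are plain dictionaries shaped like the node's output:

```python
def high_confidence(items: list[dict], threshold: float = 80) -> list[dict]:
    """Keep items whose confidence is strictly greater than the threshold."""
    return [item for item in items if item["confidence"] > threshold]


classified = [
    {"category": "Invoice", "confidence": 95},
    {"category": "Receipt", "confidence": 80},
    {"category": "Contract", "confidence": 64},
]
kept = high_confidence(classified)
```

Only the Invoice item passes. Lower the threshold if too many correctly classified documents are being filtered out.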

Triage Incoming Mail Attachments

[Lido Mailbox Trigger] → [Edit Item (extract attachment)] → [Document Classifier] → [Switch]
  1. Extract the attachment file from the email
  2. Classify the attachment
  3. Use Switch or Output Per Category to route by document type

Classify with Custom Instructions

[OneDrive Trigger] → [Document Classifier] → [Insert Rows]

Set Instructions to guide classification:

These are medical documents. Classify based on the document header.
If a document contains both a lab report and a prescription, classify it as a lab report.

Tips

  • Add descriptions to categories for better accuracy
  • Use the Page Range parameter for large documents where the first page is sufficient for classification
  • The confidence score ranges from 0 to 100
  • Output Per Category mode is useful for building branching pipelines without a separate If/Switch node
  • Classification is a long-running operation processed on heavy executor pods
  • Connect the error output to handle classification failures gracefully