OCR PDF

Adds a searchable text layer to scanned PDF documents.

Overview

The OCR PDF node processes scanned PDF documents using Optical Character Recognition (OCR) to add an invisible text layer. This makes the PDF searchable and selectable while preserving the original layout.

Use it to:

- Make scanned documents searchable
- Preprocess scanned PDFs before data extraction
- Add text layers to image-based PDFs

Parameters

| Parameter | Description | Required |
|---|---|---|
| File | PDF file to process (supports expressions) | Yes |
| File Destination | Where to save the OCR-processed PDF | Yes |
| File Name | Output filename, without extension | No |

File

The PDF file to run OCR on. It typically comes from a trigger or a file operation and is referenced with an expression:

```
{{$item.data.file}}
```

Settings

| Setting | Description |
|---|---|
| Execution Mode | `Once per item` (default) or `Once` |
| Output Mode | How results are returned when Execution Mode is `Once` |
| Batch Size | Number of items to process concurrently (default 5) |
| Stop on Error | Stop the workflow if this node fails |

Output

```json
{
  "file": {
    "type": "fileData",
    "name": "document-ocr.pdf",
    "mimeType": "application/pdf",
    "fileInfo": { "type": "..." }
  }
}
```

Access the output in expressions:

- File object: `{{$item.data.file}}`
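If you handle the output item as parsed JSON in your own script (an assumption; this is outside the node itself), a quick shape check catches a missing or non-PDF file before it is passed on. A minimal sketch that relies only on the fields shown in the output above; `ocr_output_file` is a hypothetical helper, not part of Lido.

```python
from typing import Any


def ocr_output_file(item: dict[str, Any]) -> dict[str, Any]:
    """Return the `file` object from an OCR PDF output item after sanity checks.

    Assumes `item` is the output JSON shown above, parsed into a dict.
    """
    file = item.get("file")
    if not isinstance(file, dict):
        raise ValueError("output item has no 'file' object")
    if file.get("type") != "fileData":
        raise ValueError(f"unexpected file type: {file.get('type')!r}")
    if file.get("mimeType") != "application/pdf":
        raise ValueError(f"expected a PDF, got {file.get('mimeType')!r}")
    return file


example = {
    "file": {
        "type": "fileData",
        "name": "document-ocr.pdf",
        "mimeType": "application/pdf",
        "fileInfo": {"type": "..."},
    }
}
print(ocr_output_file(example)["name"])  # document-ocr.pdf
```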

Examples

OCR Before Data Extraction

Preprocess scanned invoices for extraction:

[Google Drive Trigger] → [OCR PDF] → [Data Extractor] → [Insert Rows]

Batch OCR Scanned Documents

Process all scanned PDFs from a folder:

[OneDrive Trigger] → [OCR PDF] → [Copy File (processed folder)]

OCR and Archive

OCR email attachments and archive the searchable copies:

[Lido Mailbox Trigger] → [Split (attachments)] → [OCR PDF] → [Copy File (archive)]
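Emails often carry attachments that are not PDFs (images, signatures), and sending those to OCR PDF would likely fail or need error handling. The sketch below keeps only PDF-like attachments, assuming each attachment is a dict with `name` and `mimeType` keys like the file object in Output (a hypothetical shape). `pdf_attachments` is a hypothetical helper; in a workflow, a filter or condition step would play this role.

```python
from typing import Any


def pdf_attachments(attachments: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep only attachments that look like PDFs.

    Checks the MIME type first and falls back to the file extension,
    because some mail clients label PDFs as application/octet-stream.
    """
    kept = []
    for attachment in attachments:
        mime = (attachment.get("mimeType") or "").lower()
        name = (attachment.get("name") or "").lower()
        if mime == "application/pdf" or name.endswith(".pdf"):
            kept.append(attachment)
    return kept


mail = [
    {"name": "invoice.pdf", "mimeType": "application/pdf"},
    {"name": "logo.png", "mimeType": "image/png"},
    {"name": "SCAN_0042.PDF", "mimeType": "application/octet-stream"},
]
print([a["name"] for a in pdf_attachments(mail)])  # ['invoice.pdf', 'SCAN_0042.PDF']
```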

Tips

- Use OCR PDF before Data Extractor when processing scanned documents.
- Already-searchable PDFs can still be processed; the text layer is added or updated.
- OCR is a long-running operation: processing time depends on document size and page count.
- The original PDF layout and visual content are preserved.
- Connect the error output to handle OCR failures.
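Connecting the error output lets one unreadable scan fail on its own instead of halting the whole run. The sketch below models that routing in plain Python, assuming that with Stop on Error off, failed items go to the error output while the remaining items continue. `route_items` and `flaky_ocr` are hypothetical names, not Lido APIs.

```python
from typing import Any, Callable


def route_items(
    items: list[Any],
    process: Callable[[Any], Any],
    stop_on_error: bool = False,
) -> tuple[list[Any], list[dict[str, Any]]]:
    """Split results into a success output and an error output.

    With stop_on_error=True, the first failure is re-raised and the run halts.
    """
    success: list[Any] = []
    errors: list[dict[str, Any]] = []
    for item in items:
        try:
            success.append(process(item))
        except Exception as exc:
            if stop_on_error:
                raise
            # Keep the failing item and the reason so a later step can retry or alert.
            errors.append({"item": item, "error": str(exc)})
    return success, errors


def flaky_ocr(name: str) -> str:
    # Hypothetical OCR stand-in that fails on unreadable files.
    if "corrupt" in name:
        raise RuntimeError("could not read PDF")
    return f"searchable:{name}"


ok, failed = route_items(["a.pdf", "corrupt.pdf", "b.pdf"], flaky_ocr)
print(ok)      # ['searchable:a.pdf', 'searchable:b.pdf']
print(failed)  # [{'item': 'corrupt.pdf', 'error': 'could not read PDF'}]
```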
