Data Extractor
Extracts structured data from files using AI-powered processing.
Overview
The Data Extractor node uses Lido's AI extraction capabilities to pull structured data from documents. It can process PDFs, images, and other file types to extract tabular data.
Use it to:
- Extract invoice line items
- Parse financial statements
- Convert document tables to structured data
- Process scanned documents
Parameters
| Parameter | Description | Required |
|---|---|---|
| Worksheet Name | Worksheet with extraction configuration | Yes |
| Source Type | Type of source to extract from (File or Email) | Yes |
| File | File to extract data from (when Source Type is File) | Conditional |
| Email | Email to extract data from (when Source Type is Email) | Conditional |
| Populate Worksheet | Write extracted data to the worksheet | No |
| Response Format | Output format (Array or Objects) | Yes |
| Split Rows as Items | Create separate items per row | No |
| Include Headers | Include column headers in array format | No |
| Lido Spreadsheet URL | Override the default spreadsheet | No |
Worksheet Name
Select the worksheet that contains your extraction configuration. This defines the columns and structure of the extracted data.
Source Type
Choose whether to extract data from a File or an Email. Defaults to File.
- File — Extract from a document (PDF, image, spreadsheet, etc.)
- Email — Extract from an email message (including its attachments)
File
The file to extract data from. Visible when Source Type is File. Typically comes from a trigger or file operation node:
{{$item.data.file}}
Email
The email to extract data from. Visible when Source Type is Email. Typically comes from a Lido Mailbox Trigger or Outlook Trigger:
{{$item.data.email}}
Response Format
| Format | Description |
|---|---|
| Array | Returns data as a 2D array of values |
| Objects | Returns data as an array of objects with column names as keys |
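To make the difference between the two formats concrete, the sketch below converts an Array-style result into the Objects shape in Python. It assumes that, with Include Headers enabled, the header row comes first in the 2D array. That ordering and the `rows_to_objects` helper are illustrative assumptions, not documented Lido behavior.

```python
from typing import Any


def rows_to_objects(table: list[list[Any]]) -> list[dict[str, Any]]:
    """Convert a 2D array (header row first) into a list of row objects."""
    if not table:
        return []
    headers, *rows = table
    # Pair each cell with its column name, row by row.
    return [dict(zip(headers, row)) for row in rows]


# Array format, ASSUMING the first row holds the column headers.
array_result = [
    ["Product", "Quantity", "Price"],
    ["Widget A", 10, 25.0],
    ["Widget B", 5, 50.0],
]

objects_result = rows_to_objects(array_result)
print(objects_result[0]["Product"])  # Widget A
```

The Objects shape lets downstream steps refer to fields by column name rather than by position, which is why it tends to be easier to work with.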
Split Rows as Items
When enabled, each extracted row becomes a separate workflow item. When disabled, all rows are returned in a single item.
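As a rough mental model of this option, the Python sketch below turns one item holding several rows into one item per row. It assumes the Objects format and the output shape documented in the Output section. The `split_rows` function is a hypothetical illustration, not Lido's implementation.

```python
from typing import Any


def split_rows(item: dict[str, Any]) -> list[dict[str, Any]]:
    """Return one item per extracted row (conceptual model only)."""
    # Copy each row so the new items do not share state with the original.
    return [dict(row) for row in item.get("data", [])]


# A single unsplit item, shaped like the documented output.
single_item = {
    "data": [
        {"Product": "Widget A", "Quantity": 10, "Price": 25.0},
        {"Product": "Widget B", "Quantity": 5, "Price": 50.0},
    ],
    "columns": ["Product", "Quantity", "Price"],
}

items = split_rows(single_item)
print(len(items))           # 2
print(items[0]["Product"])  # Widget A
```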
Output
The output contains a data array with extracted rows and a columns array listing the column names:
{
  "data": [
    {
      "Product": "Widget A",
      "Quantity": 10,
      "Price": 25.0
    },
    {
      "Product": "Widget B",
      "Quantity": 5,
      "Price": 50.0
    }
  ],
  "columns": ["Product", "Quantity", "Price"]
}
Access extracted data in expressions:
- First row's product: {{$item.data.data[0].Product}}
- All columns: {{$item.data.columns}}
With Split Rows as Items enabled:
Each extracted row becomes a separate item, so the data is directly accessible:
{
  "Product": "Widget A",
  "Quantity": 10,
  "Price": 25.0
}
Access fields directly: {{$item.data.Product}}
Examples
Extract Invoice Items from Files
Process incoming invoices from Google Drive:
[Google Drive Trigger] → [Data Extractor] → [Insert Rows]
- Connect Google Drive Trigger watching for new PDFs
- Set Source Type to File
- Set Worksheet to your invoice extraction config
- Set File: {{$item.data.file}}
- Enable Split Rows as Items
- Connect to Insert Rows to save extracted line items
Extract Data from Emails
Process incoming emails with the Lido Mailbox Trigger:
[Lido Mailbox Trigger] → [Data Extractor] → [Insert Rows]
- Connect Lido Mailbox Trigger to receive incoming emails
- Set Source Type to Email
- Set Email: {{$item.data.email}}
- Set Worksheet to your extraction config
- Enable Split Rows as Items
Batch Document Processing
[Google Drive Trigger] → [Data Extractor] → [Edit Item] → [Insert Rows]
Use Edit Item to add metadata like source file name before saving.
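The effect of that Edit Item step can be pictured with a short Python sketch that tags every extracted row with the file it came from. The `source_file` field name, the `add_source_file` helper, and the sample file name are all hypothetical and chosen only for illustration.

```python
from typing import Any


def add_source_file(rows: list[dict[str, Any]], file_name: str) -> list[dict[str, Any]]:
    """Return copies of the rows with a source_file field added."""
    # "source_file" is an illustrative field name, not a Lido convention.
    return [{**row, "source_file": file_name} for row in rows]


extracted_rows = [
    {"Product": "Widget A", "Quantity": 10, "Price": 25.0},
    {"Product": "Widget B", "Quantity": 5, "Price": 50.0},
]

tagged_rows = add_source_file(extracted_rows, "invoice-001.pdf")
print(tagged_rows[0]["source_file"])  # invoice-001.pdf
```

Keeping the file name on every row means the saved line items can still be traced back to their document after rows from many files are mixed together.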
Extract Without Splitting
Get all data as a single item for aggregation:
- Set Split Rows as Items: disabled
- Use the rows array in downstream nodes
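As an example of the kind of aggregation this enables, the Python sketch below totals an invoice from a single unsplit item. It assumes the Objects format and the shape shown in the Output section, where the rows sit in the data array. The `invoice_total` helper is hypothetical.

```python
from typing import Any


def invoice_total(item: dict[str, Any]) -> float:
    """Sum Quantity * Price over every extracted row in one item."""
    return sum(row["Quantity"] * row["Price"] for row in item["data"])


# One unsplit item containing all extracted rows.
item = {
    "data": [
        {"Product": "Widget A", "Quantity": 10, "Price": 25.0},
        {"Product": "Widget B", "Quantity": 5, "Price": 50.0},
    ],
    "columns": ["Product", "Quantity", "Price"],
}

print(invoice_total(item))  # 500.0
```

A total like this needs every row at once, which is why splitting is left disabled here.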
Tips
- Configure extraction templates in your Lido spreadsheet first
- Test extraction settings using the Data Extractor in the UI before automating
- Use Split Rows as Items when processing rows individually downstream
- The Objects format is easier to work with in most cases
- Large documents may take longer to process
- Connect error output to handle extraction failures