
Data Extractor

Extracts structured data from files using AI-powered processing.

Overview

The Data Extractor node uses Lido's AI extraction capabilities to pull structured data from documents. It can process PDFs, images, and other file types to extract tabular data.

Use it to:

  • Extract invoice line items
  • Parse financial statements
  • Convert document tables to structured data
  • Process scanned documents

Parameters

Parameter | Description | Required
Worksheet Name | Worksheet with the extraction configuration | Yes
File | File to extract data from | Yes
Populate Worksheet | Write the extracted data to the worksheet | No
Response Format | Output format (Array or Objects) | Yes
Split Rows as Items | Create a separate item for each row | No
Include Headers | Include column headers in Array format output | No
Lido Spreadsheet URL | Override the default spreadsheet | No

Worksheet Name

Select the worksheet that contains your extraction configuration. This defines the columns and structure of the extracted data.

File

The file to extract data from. It typically comes from a trigger or file operation node, for example:

{{$item.data.file}}

Response Format

Format | Description
Array | Returns data as a 2D array of values
Objects | Returns data as an array of objects, with column names as keys
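The relationship between the two formats can be sketched in Python. This is a minimal model, not Lido's implementation: it assumes that Array rows list their values in the same order as the `columns` array, and that Include Headers prepends the column names as the first row. The function names are hypothetical.

```python
from typing import Any


def array_to_objects(rows: list[list[Any]], columns: list[str]) -> list[dict[str, Any]]:
    """Pair each row's values with the column names, position by position."""
    return [dict(zip(columns, row)) for row in rows]


def objects_to_array(
    objects: list[dict[str, Any]], columns: list[str], include_headers: bool = False
) -> list[list[Any]]:
    """Flatten objects back to rows, optionally prepending a header row (assumed layout)."""
    rows = [[obj.get(col) for col in columns] for obj in objects]
    return [list(columns)] + rows if include_headers else rows


columns = ["Product", "Quantity", "Price"]
array_format = [["Widget A", 10, 25.00], ["Widget B", 5, 50.00]]

objects_format = array_to_objects(array_format, columns)
print(objects_format[0])  # {'Product': 'Widget A', 'Quantity': 10, 'Price': 25.0}
print(objects_to_array(objects_format, columns, include_headers=True)[0])  # ['Product', 'Quantity', 'Price']
```

The Objects form trades a little size for self-describing rows, which is why downstream steps can look values up by column name instead of by position.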

Split Rows as Items

When enabled, each extracted row becomes a separate workflow item. When disabled, all rows are returned in a single item.

Output

The output contains a data array with extracted rows and a columns array listing the column names:

{
  "data": [
    {
      "Product": "Widget A",
      "Quantity": 10,
      "Price": 25.00
    },
    {
      "Product": "Widget B",
      "Quantity": 5,
      "Price": 50.00
    }
  ],
  "columns": ["Product", "Quantity", "Price"]
}

Access extracted data in expressions:

  • First row's product: {{$item.data.data[0].Product}}
  • All columns: {{$item.data.columns}}
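To make these expression paths concrete, the sketch below models one unsplit workflow item as a Python dict whose `data` key holds the output shown above. The dict layout is an assumption chosen to mirror the expressions, not a Lido API.

```python
# Hypothetical model of one workflow item: $item.data holds the extractor output.
item = {
    "data": {
        "data": [
            {"Product": "Widget A", "Quantity": 10, "Price": 25.00},
            {"Product": "Widget B", "Quantity": 5, "Price": 50.00},
        ],
        "columns": ["Product", "Quantity", "Price"],
    }
}

# Equivalent of {{$item.data.data[0].Product}}
first_product = item["data"]["data"][0]["Product"]
# Equivalent of {{$item.data.columns}}
all_columns = item["data"]["columns"]

print(first_product)  # Widget A
print(all_columns)    # ['Product', 'Quantity', 'Price']
```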

With Split Rows as Items enabled:

Each extracted row becomes a separate item, so the data is directly accessible:

{
  "Product": "Widget A",
  "Quantity": 10,
  "Price": 25.00
}

Access fields directly: {{$item.data.Product}}
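The effect of splitting can be modeled as a small transformation. This Python sketch is an illustration under assumptions, not Lido's code: it assumes each workflow item is a dict whose `data` key is what `{{$item.data}}` resolves to, and the `split_rows` function name is hypothetical.

```python
from typing import Any


def split_rows(item: dict[str, Any]) -> list[dict[str, Any]]:
    """Turn one item whose data is {"data": [...], "columns": [...]} into one item per row."""
    return [{"data": dict(row)} for row in item["data"]["data"]]


unsplit = {
    "data": {
        "data": [
            {"Product": "Widget A", "Quantity": 10, "Price": 25.00},
            {"Product": "Widget B", "Quantity": 5, "Price": 50.00},
        ],
        "columns": ["Product", "Quantity", "Price"],
    }
}

items = split_rows(unsplit)
# Equivalent of evaluating {{$item.data.Product}} on each split item
print([it["data"]["Product"] for it in items])  # ['Widget A', 'Widget B']
```

In this model the `columns` array does not carry over to the split items, matching the per-row output shown above.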

Examples

Extract Invoice Items

Process incoming invoices from Google Drive:

[Google Drive Trigger] → [Data Extractor] → [Insert Rows]
  1. Connect a Google Drive Trigger that watches for new PDFs
  2. Set Worksheet Name to your invoice extraction configuration
  3. Set File: {{$item}}
  4. Enable Split Rows as Items
  5. Connect to Insert Rows to save the extracted line items

Batch Document Processing

[Google Drive Trigger] → [Data Extractor] → [Edit Item] → [Insert Rows]

Use Edit Item to add metadata, such as the source file name, before saving.
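What the Edit Item step contributes can be sketched in Python, assuming Split Rows as Items is enabled so that each item carries one row under `data`. The field name `Source File`, the file name `invoice-001.pdf` and the `add_source_file` function are hypothetical placeholders; in a real workflow the name would come from the trigger item.

```python
from typing import Any


def add_source_file(items: list[dict[str, Any]], file_name: str) -> list[dict[str, Any]]:
    """Return copies of split-row items with a hypothetical "Source File" field added."""
    return [{"data": {**item["data"], "Source File": file_name}} for item in items]


items = [
    {"data": {"Product": "Widget A", "Quantity": 10, "Price": 25.00}},
    {"data": {"Product": "Widget B", "Quantity": 5, "Price": 50.00}},
]
tagged = add_source_file(items, "invoice-001.pdf")  # hypothetical file name
print(tagged[1]["data"])
# {'Product': 'Widget B', 'Quantity': 5, 'Price': 50.0, 'Source File': 'invoice-001.pdf'}
```

Copying each row rather than mutating it keeps the original extracted items unchanged, which makes the step easy to reason about when several branches read the same items.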

Extract Without Splitting

Get all data as a single item for aggregation:

  1. Set Split Rows as Items to disabled
  2. Use the data array ({{$item.data.data}}) in downstream nodes
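With splitting disabled, a downstream step can aggregate over the whole `data` array at once. The Python sketch below totals Quantity times Price across all rows; it assumes the extraction configuration defines numeric `Quantity` and `Price` columns, as in the sample output, and `invoice_total` is a hypothetical helper name.

```python
from typing import Any


def invoice_total(output: dict[str, Any]) -> float:
    """Sum Quantity * Price over every row in the single unsplit output."""
    return sum(row["Quantity"] * row["Price"] for row in output["data"])


output = {
    "data": [
        {"Product": "Widget A", "Quantity": 10, "Price": 25.00},
        {"Product": "Widget B", "Quantity": 5, "Price": 50.00},
    ],
    "columns": ["Product", "Quantity", "Price"],
}
print(invoice_total(output))  # 500.0
```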

Tips

  • Configure extraction templates in your Lido spreadsheet first
  • Test extraction settings using the Data Extractor in the UI before automating
  • Use Split Rows when processing items individually downstream
  • The Objects format is usually easier to work with, since fields are accessed by column name
  • Large documents may take longer to process
  • Connect the node's error output to handle extraction failures