Extract File Data

Extract structured data from documents and images using AI-powered processing. This endpoint accepts files either as base64-encoded JSON or as multipart form data, and lets you configure which fields to extract and how.

POST /extract-file-data

Overview

The Extract File Data endpoint starts an extraction job for your document and returns a job ID that you use to track its progress. You can specify exactly which data fields you need, provide custom instructions, and choose between two upload methods.

Extraction Tips

- Be specific about the data you need - this improves accuracy
- Use descriptive column names that match your document structure
- Include extraction instructions for complex or unusual document layouts
- Test in Lido first - use the Data Extractor in the Lido app to fine-tune your settings, then click the API button to get the exact configuration

Configure in Lido, Then Use the API

The easiest way to get your API configuration right is to build and test it in the Data Extractor in the Lido app first.

Step-by-Step Configuration

1. Open the Data Extractor in your Lido workspace.
2. Upload your test document and configure your extraction settings:
   - Add the column names you want to extract
   - Write any special instructions for the AI
   - Set page ranges if needed
   - Toggle multi-row extraction if your data spans multiple rows
3. Test the extraction to ensure it works correctly.
4. Click the API button in the bottom left corner of the Data Extractor.
5. Copy the generated configuration. This gives you the exact JSON structure to use in your API calls.

This workflow ensures your API integration will work exactly as expected, since you've already tested and validated the extraction settings in the Lido app.
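As a rough illustration of reusing the copied configuration, the sketch below parses it and checks the one field the parameter tables mark as required. It assumes the copied text is a JSON object using the documented parameter names; `load_extraction_config` is a hypothetical helper, not part of any Lido SDK.

```python
import json


def load_extraction_config(raw_json: str) -> dict:
    """Parse a configuration copied from the Lido app and check its basic shape.

    Assumption: the copied text is a JSON object with a "columns" list,
    matching the parameters documented for this endpoint.
    """
    config = json.loads(raw_json)
    if not isinstance(config, dict):
        raise ValueError("configuration must be a JSON object")
    columns = config.get("columns")
    if not isinstance(columns, list) or not columns:
        raise ValueError("'columns' must be a non-empty list of field names")
    return config
```

Checking the shape before sending a request gives a clearer error than a rejected API call would.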

For more detailed guidance on using the PDF Data Extractor, see our help documentation.

Upload Methods

Choose the upload method that best fits your application:

Method 1: JSON with Base64 Encoding

Perfect for web applications and smaller files (up to 50MB).

Method 2: Multipart Form Data

Ideal for larger files (up to 500MB) and server-side integrations.
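One way to pick a method in code is to compare the file size against the two documented limits. The sketch below is only an illustration: it assumes the limits apply to the raw file size and reads "MB" conservatively as 1,000,000 bytes, neither of which the documentation states. `choose_upload_method` is a hypothetical helper name.

```python
# Assumption: "MB" is read as 1,000,000 bytes, the more conservative interpretation.
MB = 1_000_000
BASE64_LIMIT = 50 * MB       # documented limit for JSON base64 uploads
MULTIPART_LIMIT = 500 * MB   # documented limit for multipart uploads


def choose_upload_method(size_bytes: int) -> str:
    """Return "base64" or "multipart" for a file of the given size."""
    if size_bytes < 0:
        raise ValueError("size_bytes must not be negative")
    if size_bytes <= BASE64_LIMIT:
        return "base64"
    if size_bytes <= MULTIPART_LIMIT:
        return "multipart"
    raise ValueError("file exceeds the 500MB multipart limit")
```

In practice the size would come from something like `os.path.getsize(path)`.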

JSON Base64 Upload

Upload files encoded as base64 strings within a JSON payload.

Request Details

Content-Type: application/json
Max File Size: 50MB
Rate Limit: 5 requests per 30 seconds
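To stay under the documented rate limit, a client can throttle itself before each request. The following is a minimal sketch of a sliding-window limiter. How the server actually counts requests (sliding or fixed window) is not documented, so treat this as a conservative assumption; `SlidingWindowLimiter` is a hypothetical class.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Block until another call is allowed under max_calls per period seconds."""

    def __init__(self, max_calls=5, period=30.0, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock      # injectable so the limiter can be tested without waiting
        self._sleep = sleep
        self._stamps = deque()   # timestamps of recent calls, oldest first

    def wait(self):
        while True:
            now = self._clock()
            # Forget calls that have left the window.
            while self._stamps and now - self._stamps[0] >= self.period:
                self._stamps.popleft()
            if len(self._stamps) < self.max_calls:
                self._stamps.append(now)
                return
            # Sleep until the oldest call in the window expires, then re-check.
            self._sleep(self.period - (now - self._stamps[0]))
```

Calling `limiter.wait()` immediately before each request keeps the client at or below five requests in any 30-second span.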

Request Body

Parameter	Type	Required	Description
`file`	object	✅	File information object
`file.type`	string	✅	Must be "base64"
`file.data`	string	✅	Base64 encoded file content
`file.name`	string	❌	Original filename (e.g., "invoice.pdf")
`columns`	array	✅	List of data fields to extract
`instructions`	string	❌	Additional extraction guidance
`multiRow`	boolean	❌	Set to `true` for tabular data (default: `false`)
`pageRange`	string	❌	Page range to process (e.g., "1-3", "2", "1,3,5")
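The table above can be sketched as a small payload builder that includes optional parameters only when they are set. The function name and its keyword arguments are illustrative assumptions; only the JSON keys come from the table.

```python
import base64
from typing import List, Optional


def build_base64_payload(data: bytes, columns: List[str], name: Optional[str] = None,
                         instructions: Optional[str] = None, multi_row: bool = False,
                         page_range: Optional[str] = None) -> dict:
    """Build the JSON body for a base64 upload from raw file bytes."""
    if not columns:
        raise ValueError("columns must contain at least one field name")
    file_obj = {"type": "base64", "data": base64.b64encode(data).decode("ascii")}
    if name:
        file_obj["name"] = name
    payload = {"file": file_obj, "columns": list(columns), "multiRow": multi_row}
    # Optional parameters are omitted entirely when not provided.
    if instructions:
        payload["instructions"] = instructions
    if page_range:
        payload["pageRange"] = page_range
    return payload
```

The returned dictionary can be passed directly as the `json=` argument of `requests.post`.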

Examples

The examples below are given in JavaScript, Python, and cURL, in that order.

const fs = require('fs');

const url = "https://sheets.lido.app/api/v1/extract-file-data";
const headers = {
  Authorization: "Bearer YOUR_API_KEY",
  "Content-Type": "application/json"
};

// Read and encode file
const fileBase64 = fs.readFileSync("invoice.pdf").toString("base64");

const payload = {
  file: {
    type: "base64",
    data: fileBase64,
    name: "invoice.pdf"
  },
  columns: ["Vendor Name", "Invoice Number", "Total Amount", "Due Date"],
  instructions: "Extract the main vendor information and financial details. Ensure amounts include currency symbols.",
  multiRow: false,
  pageRange: "1"
};

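// Top-level await is not available in CommonJS modules, so wrap the request
// in an async function. The global fetch requires Node.js 18 or later.
async function main() {
  try {
    const response = await fetch(url, {
      method: "POST",
      headers,
      body: JSON.stringify(payload)
    });

    // Surface HTTP errors instead of reading jobId from an error body
    if (!response.ok) {
      throw new Error(`Request failed with status ${response.status}`);
    }

    const result = await response.json();
    console.log("Job ID:", result.jobId);
  } catch (error) {
    console.error("Error:", error);
  }
}

main();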

import requests
import base64

url = "https://sheets.lido.app/api/v1/extract-file-data"
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
}

# Read and encode file
with open("invoice.pdf", "rb") as file:
    file_base64 = base64.b64encode(file.read()).decode("utf-8")

payload = {
    "file": {
        "type": "base64",
        "data": file_base64,
        "name": "invoice.pdf"
    },
    "columns": ["Vendor Name", "Invoice Number", "Total Amount", "Due Date"],
    "instructions": "Extract the main vendor information and financial details. Ensure amounts include currency symbols.",
    "multiRow": False,
    "pageRange": "1"
}

try:
    response = requests.post(url, headers=headers, json=payload)
    # Raise on 4xx/5xx responses instead of failing later on a missing key
    response.raise_for_status()
    result = response.json()
    print(f"Job ID: {result['jobId']}")
except Exception as error:
    print(f"Error: {error}")

# Stream the JSON body through stdin: a base64-encoded file passed as a
# command-line argument quickly exceeds the shell's argument-length limit.
# base64 -w 0 is GNU coreutils syntax; on macOS use: base64 -i invoice.pdf
{
  printf '%s' '{"file": {"type": "base64", "name": "invoice.pdf", "data": "'
  base64 -w 0 invoice.pdf
  printf '%s' '"}, "columns": ["Vendor Name", "Invoice Number", "Total Amount", "Due Date"], "instructions": "Extract the main vendor information and financial details. Ensure amounts include currency symbols.", "multiRow": false, "pageRange": "1"}'
} | curl --request POST \
  --url 'https://sheets.lido.app/api/v1/extract-file-data' \
  --header 'Authorization: Bearer YOUR_API_KEY' \
  --header 'Content-Type: application/json' \
  --data @-

Multipart Form Data Upload

Upload files directly using multipart/form-data encoding, ideal for larger files.

Request Details

Content-Type: multipart/form-data
Max File Size: 500MB
Rate Limit: 5 requests per 30 seconds

Form Fields

Field	Type	Required	Description
`file`	file	✅	The document file to process
`config`	string	✅	JSON string with extraction configuration

Configuration Object

The config field should contain a JSON string with these properties:

Parameter	Type	Required	Description
`columns`	array	✅	List of data fields to extract
`instructions`	string	❌	Additional extraction guidance
`multiRow`	boolean	❌	Set to `true` for tabular data (default: `false`)
`pageRange`	string	❌	Page range to process (e.g., "1-3")
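Because `config` must be sent as a JSON string rather than a nested object, it can help to build and serialize it in one place. The sketch below does that and also checks the page range against the formats shown in this documentation ("2", "1-3", "1,3,5"). Whether the API accepts other forms is not documented, so the pattern is an assumption, and `build_config_field` is a hypothetical helper.

```python
import json
import re
from typing import List, Optional

# Assumption: a page range is a comma-separated list of pages ("2") or spans ("1-3").
_PAGE_RANGE = re.compile(r"\d+(-\d+)?(,\d+(-\d+)?)*")


def build_config_field(columns: List[str], instructions: Optional[str] = None,
                       multi_row: bool = False, page_range: Optional[str] = None) -> str:
    """Return the JSON string to send as the multipart "config" form field."""
    if not columns:
        raise ValueError("columns must contain at least one field name")
    config = {"columns": list(columns), "multiRow": multi_row}
    if instructions:
        config["instructions"] = instructions
    if page_range is not None:
        if not _PAGE_RANGE.fullmatch(page_range):
            raise ValueError(f"unrecognized page range: {page_range!r}")
        config["pageRange"] = page_range
    return json.dumps(config)
```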

Examples

The examples below are given in JavaScript, Python, and cURL, in that order.

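const fs = require('fs');

const url = "https://sheets.lido.app/api/v1/extract-file-data";
// Do not set Content-Type here: fetch adds it along with the multipart boundary
const headers = { Authorization: "Bearer YOUR_API_KEY" };

// Node.js 18+ provides fetch, FormData, and Blob as globals. The built-in fetch
// does not understand the form-data npm package, so use the global FormData.
// Note that readFileSync loads the whole file into memory.
const formData = new FormData();
const fileBlob = new Blob([fs.readFileSync("financial-report.pdf")]);
formData.append("file", fileBlob, "financial-report.pdf");
formData.append("config", JSON.stringify({
  columns: ["Account Name", "Debit Amount", "Credit Amount", "Balance"],
  instructions: "Extract all account balances from the financial statement. Preserve original number formatting including commas and decimal places.",
  multiRow: true,
  pageRange: "2-5"
}));

// Top-level await is not available in CommonJS modules, so wrap the request
async function main() {
  try {
    const response = await fetch(url, {
      method: "POST",
      headers,
      body: formData
    });

    if (!response.ok) {
      throw new Error(`Request failed with status ${response.status}`);
    }

    const result = await response.json();
    console.log("Job ID:", result.jobId);
  } catch (error) {
    console.error("Error:", error);
  }
}

main();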

import requests
import json

url = "https://sheets.lido.app/api/v1/extract-file-data"
headers = {"Authorization": "Bearer YOUR_API_KEY"}

config = {
    "columns": ["Account Name", "Debit Amount", "Credit Amount", "Balance"],
    "instructions": "Extract all account balances from the financial statement. Preserve original number formatting including commas and decimal places.",
    "multiRow": True,
    "pageRange": "2-5"
}

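try:
    # Open the file in a context manager so it is closed even if the request fails
    with open("financial-report.pdf", "rb") as pdf:
        files = {
            "file": pdf,
            "config": (None, json.dumps(config), "application/json")
        }
        response = requests.post(url, headers=headers, files=files)
    # Raise on 4xx/5xx responses instead of failing later on a missing key
    response.raise_for_status()
    result = response.json()
    print(f"Job ID: {result['jobId']}")
except Exception as error:
    print(f"Error: {error}")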

curl --request POST \
  --url 'https://sheets.lido.app/api/v1/extract-file-data' \
  --header 'Authorization: Bearer YOUR_API_KEY' \
  --form 'file=@financial-report.pdf' \
  --form 'config={
    "columns": ["Account Name", "Debit Amount", "Credit Amount", "Balance"],
    "instructions": "Extract all account balances from the financial statement. Preserve original number formatting including commas and decimal places.",
    "multiRow": true,
    "pageRange": "2-5"
  }'

Response

Both upload methods return the same response format:

{
  "status": "running",
  "jobId": "c37e0a3b-0bd7-44d0-be7a-3cd1e2a70837"
}

Response Fields

Field	Type	Description
`status`	string	Current job status (always "running" for new jobs)
`jobId`	string	Unique identifier for tracking your extraction job
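A small typed wrapper around the two response fields can make later code easier to follow. This sketch assumes a successful response always carries both fields as documented above. The shape of error responses is not described here, so the helper simply raises if a field is missing. `ExtractionJob` and `parse_submit_response` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractionJob:
    job_id: str
    status: str


def parse_submit_response(body: dict) -> ExtractionJob:
    """Convert the decoded JSON response into an ExtractionJob."""
    try:
        return ExtractionJob(job_id=body["jobId"], status=body["status"])
    except KeyError as missing:
        raise ValueError(f"unexpected response, missing field {missing}") from None
```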

Next Steps

After receiving your job ID:

1. Wait for Processing - Allow 10-30 seconds for document processing
2. Check Status - Use the Job Result endpoint to check progress
3. Retrieve Data - Once complete, get your extracted data from the same endpoint
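The steps above can be sketched as a polling loop. Since this page does not describe the Job Result endpoint, the sketch takes a caller-supplied `fetch_result` function (hypothetical) that returns the decoded JSON for a job ID. It also assumes that any status other than "running" means the job has finished. The delay and timeout values are illustrative defaults.

```python
import time
from typing import Callable


def wait_for_job(job_id: str, fetch_result: Callable[[str], dict],
                 initial_delay: float = 10.0, interval: float = 5.0,
                 timeout: float = 300.0, sleep=time.sleep, clock=time.monotonic) -> dict:
    """Poll fetch_result(job_id) until the job is no longer "running"."""
    deadline = clock() + timeout
    sleep(initial_delay)  # processing typically takes 10-30 seconds
    while True:
        result = fetch_result(job_id)
        # Assumption: any status other than "running" is terminal.
        if result.get("status") != "running":
            return result
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} still running after {timeout} seconds")
        sleep(interval)
```

Passing the fetch function in keeps the loop independent of how the Job Result endpoint is called and makes it easy to test without network access.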

Important

Job results are stored for 24 hours. Make sure to retrieve your data within this timeframe.
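If your application stores job IDs for later retrieval, it may be worth tracking when each result stops being available. The sketch below assumes the 24-hour window starts when the job is submitted. The documentation does not say whether it starts at submission or at completion, so treat this as the more cautious reading. Both helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

RETENTION = timedelta(hours=24)  # documented storage period for job results


def retrieval_deadline(submitted_at: datetime) -> datetime:
    """Latest time at which the result should be fetched (assumed from submission)."""
    return submitted_at + RETENTION


def is_expired(submitted_at: datetime, now: Optional[datetime] = None) -> bool:
    """Return True once the assumed retention window has passed."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= retrieval_deadline(submitted_at)
```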
