
Extract File Data

Extract structured data from documents and images using AI-powered processing. This endpoint supports multiple upload methods and provides flexible configuration options for data extraction.

POST /extract-file-data

Overview

The Extract File Data endpoint processes your documents and returns a job ID for tracking the extraction progress. You can specify exactly what data fields you need, provide custom instructions, and choose from multiple upload methods.

Extraction Tips
  • Be specific about the data you need - this improves accuracy
  • Use descriptive column names that match your document structure
  • Include extraction instructions for complex or unusual document layouts
  • Test in Lido first - Use the Data Extractor in the Lido app to fine-tune your settings, then click the API button to get the exact configuration

Configure in Lido, Then Use the API

The easiest way to get your API configuration perfect is to use the Data Extractor in the Lido app first:


Step-by-Step Configuration

  1. Open the Data Extractor in your Lido workspace
  2. Upload your test document and configure your extraction settings:
    • Add the column names you want to extract
    • Write any special instructions for the AI
    • Set page ranges if needed
    • Toggle multi-row extraction if your data spans multiple rows
  3. Test the extraction to ensure it works correctly
  4. Click the API button in the bottom left corner of the Data Extractor
  5. Copy the generated configuration - this gives you the exact JSON structure to use in your API calls

This workflow ensures your API integration will work exactly as expected, since you've already tested and validated the extraction settings in the Lido app.
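
As a rough illustration of step 5, the sketch below merges a copied configuration into a JSON base64 request body. The helper name `buildBase64Payload` is hypothetical, and the sketch assumes the copied configuration uses the same field names as the request body documented on this page (`columns`, `instructions`, `multiRow`, `pageRange`); check the configuration the API button actually generates before relying on it.

```javascript
// Hypothetical helper: combines a configuration copied from the Lido app
// with a file to form a JSON base64 request body.
// Assumption: the copied configuration uses the request-body field names
// documented for this endpoint (columns, instructions, multiRow, pageRange).
function buildBase64Payload(copiedConfig, fileBuffer, fileName) {
  return {
    ...copiedConfig,
    file: {
      type: "base64",
      data: fileBuffer.toString("base64"),
      name: fileName
    }
  };
}

// Example values only; paste the configuration generated by the API button here.
const copiedConfig = {
  columns: ["Vendor Name", "Invoice Number"],
  multiRow: false
};

const payload = buildBase64Payload(
  copiedConfig,
  Buffer.from("example file bytes"),
  "invoice.pdf"
);
console.log(Object.keys(payload));
```

Keeping the copied configuration in its own object makes it easy to reuse the same tested settings for every file of that document type.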

For more detailed guidance on using the PDF Data Extractor, see our help documentation.

Upload Methods

Choose the upload method that best fits your application:

Method 1: JSON with Base64 Encoding

Perfect for web applications and smaller files (up to 50MB).

Method 2: Multipart Form Data

Ideal for larger files (up to 500MB) and server-side integrations.
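
The choice between the two methods can be automated from the file size. The sketch below is one possible way to do that; `chooseUploadMethod` is a hypothetical helper, and it assumes that 1MB means 1024 × 1024 bytes and that the limits apply to the original (unencoded) file size, neither of which is stated here.

```javascript
// Size limits taken from the method descriptions above.
// Assumption: 1 MB = 1024 * 1024 bytes, measured on the unencoded file.
const MB = 1024 * 1024;
const BASE64_LIMIT_BYTES = 50 * MB;
const MULTIPART_LIMIT_BYTES = 500 * MB;

// Hypothetical helper: picks an upload method for a file of the given size.
function chooseUploadMethod(sizeBytes) {
  if (!Number.isFinite(sizeBytes) || sizeBytes < 0) {
    throw new TypeError("sizeBytes must be a non-negative number");
  }
  if (sizeBytes <= BASE64_LIMIT_BYTES) return "base64";
  if (sizeBytes <= MULTIPART_LIMIT_BYTES) return "multipart";
  throw new RangeError("File exceeds the 500MB multipart limit");
}

console.log(chooseUploadMethod(2 * MB));
console.log(chooseUploadMethod(120 * MB));
```

Base64 encoding grows the payload by roughly a third, so files close to the 50MB limit may be safer to send as multipart.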


JSON Base64 Upload

Upload files encoded as base64 strings within a JSON payload.

Request Details

Content-Type: application/json
Max File Size: 50MB
Rate Limit: 5 requests per 30 seconds
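
To stay under the rate limit listed above, a client can track its own request times. The following is a minimal sketch of a sliding-window limiter; the class name `SlidingWindowLimiter` is hypothetical, and how the server actually counts requests (sliding or fixed window) is not documented here, so treat the window logic as an assumption.

```javascript
// Hypothetical client-side limiter: keeps timestamps of recent requests
// and reports how long to wait before the next one may be sent.
class SlidingWindowLimiter {
  constructor(maxRequests = 5, windowMs = 30000) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
    this.timestamps = [];
  }

  // Returns the number of milliseconds to wait (0 if a request can go now).
  msUntilAllowed(now = Date.now()) {
    // Drop timestamps that have left the window.
    this.timestamps = this.timestamps.filter((t) => now - t < this.windowMs);
    if (this.timestamps.length < this.maxRequests) return 0;
    // The oldest remaining request must age out before another is allowed.
    return this.timestamps[0] + this.windowMs - now;
  }

  // Records a request sent at time `now`.
  record(now = Date.now()) {
    this.timestamps.push(now);
  }
}

const limiter = new SlidingWindowLimiter();
if (limiter.msUntilAllowed() === 0) {
  limiter.record();
  // send the request here
}
```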

Request Body

Parameter | Type | Required | Description
file | object |  | File information object
file.type | string |  | Must be "base64"
file.data | string |  | Base64 encoded file content
file.name | string |  | Original filename (e.g., "invoice.pdf")
columns | array |  | List of data fields to extract
instructions | string |  | Additional extraction guidance
multiRow | boolean |  | Set to true for tabular data (default: false)
pageRange | string |  | Page range to process (e.g., "1-3", "2", "1,3,5")
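
The pageRange examples above ("1-3", "2", "1,3,5") can be checked on the client before a request is sent. The sketch below is a hypothetical validator, `parsePageRange`, that expands a range string into page numbers. It also accepts mixed forms such as "1-3,5"; whether the API accepts those is an assumption, not something stated here.

```javascript
// Hypothetical helper: validates a pageRange string and expands it
// into a sorted list of 1-based page numbers.
function parsePageRange(pageRange) {
  const pages = new Set();
  for (const part of pageRange.split(",")) {
    // Each segment is either a single page ("2") or a range ("1-3").
    const match = /^\s*(\d+)(?:\s*-\s*(\d+))?\s*$/.exec(part);
    if (!match) {
      throw new SyntaxError(`Invalid page range segment: "${part}"`);
    }
    const start = Number(match[1]);
    const end = match[2] === undefined ? start : Number(match[2]);
    if (start < 1 || end < start) {
      throw new RangeError(`Invalid page numbers in segment: "${part}"`);
    }
    for (let page = start; page <= end; page++) pages.add(page);
  }
  return [...pages].sort((a, b) => a - b);
}

console.log(parsePageRange("1-3"));
console.log(parsePageRange("1,3,5"));
```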

Examples

// Requires Node.js 18+ for the built-in fetch
const fs = require("fs");

const url = "https://sheets.lido.app/api/v1/extract-file-data";
const headers = {
  Authorization: "Bearer YOUR_API_KEY",
  "Content-Type": "application/json"
};

// Read the file and encode it as base64
const fileBase64 = fs.readFileSync("invoice.pdf").toString("base64");

const payload = {
  file: {
    type: "base64",
    data: fileBase64,
    name: "invoice.pdf"
  },
  columns: ["Vendor Name", "Invoice Number", "Total Amount", "Due Date"],
  instructions: "Extract the main vendor information and financial details. Ensure amounts include currency symbols.",
  multiRow: false,
  pageRange: "1"
};

// await is not allowed at the top level of a CommonJS module,
// so the request runs inside an async function
async function main() {
  try {
    const response = await fetch(url, {
      method: "POST",
      headers,
      body: JSON.stringify(payload)
    });

    // Surface HTTP errors instead of reading jobId from an error body
    if (!response.ok) {
      throw new Error(`Request failed with HTTP ${response.status}`);
    }

    const result = await response.json();
    console.log("Job ID:", result.jobId);
  } catch (error) {
    console.error("Error:", error);
  }
}

main();

Multipart Form Data Upload

Upload files directly using multipart/form-data encoding, ideal for larger files.

Request Details

Content-Type: multipart/form-data
Max File Size: 500MB
Rate Limit: 5 requests per 30 seconds

Form Fields

Field | Type | Required | Description
file | file |  | The document file to process
config | string |  | JSON string with extraction configuration

Configuration Object

The config field should contain a JSON string with these properties:

Parameter | Type | Required | Description
columns | array |  | List of data fields to extract
instructions | string |  | Additional extraction guidance
multiRow | boolean |  | Set to true for tabular data (default: false)
pageRange | string |  | Page range to process (e.g., "1-3")
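
Because the config field must be a JSON string rather than an object, it can help to build it in one place. The sketch below shows one way to do so; `buildConfigString` is a hypothetical helper, and its checks (a non-empty array of non-blank column names) are assumptions about sensible input, not documented API rules.

```javascript
// Hypothetical helper: builds the JSON string for the multipart "config" field.
// multiRow defaults to false, matching the documented default.
function buildConfigString({ columns, instructions, multiRow = false, pageRange } = {}) {
  const validColumns =
    Array.isArray(columns) &&
    columns.length > 0 &&
    columns.every((c) => typeof c === "string" && c.trim() !== "");
  if (!validColumns) {
    throw new TypeError("columns must be a non-empty array of non-blank strings");
  }

  const config = { columns, multiRow };
  // Optional fields are only included when the caller supplies them.
  if (instructions !== undefined) config.instructions = instructions;
  if (pageRange !== undefined) config.pageRange = pageRange;
  return JSON.stringify(config);
}

const configString = buildConfigString({
  columns: ["Account Name", "Balance"],
  multiRow: true,
  pageRange: "2-5"
});
console.log(configString);
```

The returned string can be passed directly as the value of the config form field.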

Examples

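// Requires Node.js 18+ for the built-in fetch, FormData, and Blob
const fs = require("fs");

const url = "https://sheets.lido.app/api/v1/extract-file-data";
// Do not set Content-Type here: fetch adds multipart/form-data and its boundary itself
const headers = { Authorization: "Bearer YOUR_API_KEY" };

async function main() {
  try {
    // The built-in fetch expects the built-in FormData, not the form-data package.
    // readFile loads the whole file into memory; for very large files,
    // fs.openAsBlob (Node.js 20+) avoids that.
    const fileBytes = await fs.promises.readFile("financial-report.pdf");

    const formData = new FormData();
    formData.append("file", new Blob([fileBytes]), "financial-report.pdf");
    formData.append("config", JSON.stringify({
      columns: ["Account Name", "Debit Amount", "Credit Amount", "Balance"],
      instructions: "Extract all account balances from the financial statement. Preserve original number formatting including commas and decimal places.",
      multiRow: true,
      pageRange: "2-5"
    }));

    const response = await fetch(url, {
      method: "POST",
      headers,
      body: formData
    });

    // Surface HTTP errors instead of reading jobId from an error body
    if (!response.ok) {
      throw new Error(`Request failed with HTTP ${response.status}`);
    }

    const result = await response.json();
    console.log("Job ID:", result.jobId);
  } catch (error) {
    console.error("Error:", error);
  }
}

main();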

Response

Both upload methods return the same response format:

{
  "status": "running",
  "jobId": "c37e0a3b-0bd7-44d0-be7a-3cd1e2a70837"
}

Response Fields

Field | Type | Description
status | string | Current job status (always "running" for new jobs)
jobId | string | Unique identifier for tracking your extraction job

Next Steps

After receiving your job ID:

  1. Wait for Processing - Allow 10-30 seconds for document processing
  2. Check Status - Use the Job Result endpoint to check progress
  3. Retrieve Data - Once complete, get your extracted data from the same endpoint

Important

Job results are stored for 24 hours. Make sure to retrieve your data within this timeframe.
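
The wait-then-check steps above can be sketched as a polling loop. This page does not give the Job Result endpoint's URL or response shape, so the sketch takes a caller-supplied `fetchJobResult` function in place of the real request. The names `waitForJob` and `fetchJobResult`, the delay values, and the assumption that the result carries a `status` field that stays "running" until the job finishes are all illustrative.

```javascript
// Hypothetical helper: resolves after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Polls until the job leaves the "running" state or the attempt budget runs out.
// `fetchJobResult` is a caller-supplied async function (jobId) => result object
// that stands in for a call to the Job Result endpoint.
async function waitForJob(
  jobId,
  fetchJobResult,
  { initialDelayMs = 10000, intervalMs = 5000, maxAttempts = 12 } = {}
) {
  // Give the document time to process before the first check.
  await sleep(initialDelayMs);

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = await fetchJobResult(jobId);
    // Assumption: any status other than "running" means the job has finished.
    if (result.status !== "running") return result;
    if (attempt < maxAttempts) await sleep(intervalMs);
  }
  throw new Error(`Job ${jobId} still running after ${maxAttempts} checks`);
}
```

A caller would pass the jobId from the submit response along with its own function for querying the Job Result endpoint, and should keep the total polling time well inside the 24-hour retention window.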

Best Practices

Column Names

  • Use descriptive, specific names: "Invoice Total" instead of "Amount"
  • Match the terminology used in your documents
  • Be consistent across similar document types

Instructions

  • Provide context about document layout or unusual formatting
  • Specify number formats, date formats, or units when relevant
  • Mention any data validation requirements