Extract Data

POST /v1/excel/jobs

curl --request POST \
  --url https://api.stru.ai/v1/excel/jobs \
  --header 'Authorization: Bearer <api_key>' \
  --header 'Content-Type: application/json' \
  --data '{
  "operation": "extract_data",
  "source_file": "<base64_encoded_xlsx>",
  "extraction_targets": [
    {"type": "named_range", "name": "<name>"}
  ]
}'

Overview

Extract data from existing Excel (.xlsx) files by specifying named ranges, cell addresses, cell ranges, or table names. Use it to read calculation results, pull project data, or feed Excel files into automated workflows.
Asynchronous operation: the request returns immediately with a job ID. Poll the job until it completes to retrieve the extracted data.
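A minimal polling sketch in Python. The job-status endpoint is not documented on this page, so the sketch takes a caller-supplied get_job(job_id) function (for example, one that fetches the job record by ID) instead of hard-coding a URL. The function name wait_for_job, the "failed" status, and the backoff values are illustrative assumptions; only the "completed" status appears in the example response.

```python
import time
from typing import Any, Callable, Dict

# "completed" appears in the example response; "failed" is an ASSUMED
# failure status, included so the loop can stop on errors too.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(
    get_job: Callable[[str], Dict[str, Any]],
    job_id: str,
    interval: float = 1.0,
    timeout: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call get_job(job_id) until the job reaches a terminal status or times out.

    get_job is a hypothetical, caller-supplied function that returns the job
    record as a dict (for example, by calling the API's job-status endpoint).
    """
    deadline = time.monotonic() + timeout
    while True:
        job = get_job(job_id)
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"job {job_id} still {job.get('status')!r} after {timeout}s"
            )
        sleep(interval)
        # Back off gradually (capped at 10 s) so long jobs are not polled too often.
        interval = min(interval * 2, 10.0)
```

Injecting get_job and sleep keeps the loop independent of any particular HTTP client and makes it easy to test without network access.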

Authentication

Authorization
string
required
Bearer token (API key from app.stru.ai)

Request Body

operation
string
required
Must be "extract_data"
"operation": "extract_data"
source_file
string
required
Base64-encoded .xlsx file
"source_file": "UEsDBBQAAAAIAOiG..."
extraction_targets
array
required
Array of extraction targets. Each object selects one named range, cell, range, or table (see Extraction Target Types).
"extraction_targets": [
  {"type": "named_range", "name": "ProjectName"},
  {"type": "cell", "address": "B5"},
  {"type": "range", "address": "A1:D10"},
  {"type": "table", "name": "MaterialsTable"}
]
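If you build requests from Python rather than the shell, the standard library's base64 module produces the single-line string that source_file expects. The helper name encode_xlsx is hypothetical. The signature check relies on .xlsx files being ZIP archives, which is also why the example source_file value begins with UEsDB (the Base64 form of the ZIP header bytes PK\x03\x04).

```python
import base64
from pathlib import Path


def encode_xlsx(path: str) -> str:
    """Return the file's bytes as a single-line Base64 string for source_file."""
    data = Path(path).read_bytes()
    # .xlsx files are ZIP archives, which start with the bytes PK\x03\x04.
    if not data.startswith(b"PK\x03\x04"):
        raise ValueError(f"{path} does not look like an .xlsx (ZIP) file")
    # b64encode emits no line breaks, so the result can go straight into JSON.
    return base64.b64encode(data).decode("ascii")
```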

Extraction Target Types

Named Range

Extract data from a named range. The sheet field is optional; include it in multi-sheet workbooks:
{
  "type": "named_range",
  "name": "BeamSpan",
  "sheet": "Calculations"
}

Cell Address

Extract data from a specific cell. The sheet field is optional:
{
  "type": "cell",
  "address": "B5",
  "sheet": "Results"
}

Cell Range

Extract data from a rectangular range. The sheet field is optional; set include_headers to true to treat the first row as column headers:
{
  "type": "range",
  "address": "A1:D10",
  "sheet": "Data",
  "include_headers": true
}

Table

Extract an Excel table by name:
{
  "type": "table",
  "name": "MaterialsTable"
}
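Because a malformed target only surfaces after the job is submitted, it can help to check targets locally first. This sketch encodes the required and optional fields of the four target types documented above; the function name validate_targets is hypothetical, and rejecting unknown fields is a local strictness choice, not documented API behavior.

```python
from typing import Any, Dict, List

# (required fields, optional fields) per target type, as documented above.
TARGET_FIELDS = {
    "named_range": ({"name"}, {"sheet"}),
    "cell": ({"address"}, {"sheet"}),
    "range": ({"address"}, {"sheet", "include_headers"}),
    "table": ({"name"}, set()),
}


def validate_targets(targets: List[Dict[str, Any]]) -> List[str]:
    """Return a list of problems; an empty list means every target looks well-formed."""
    problems = []
    for i, target in enumerate(targets):
        kind = target.get("type")
        if kind not in TARGET_FIELDS:
            problems.append(f"target {i}: unknown type {kind!r}")
            continue
        required, optional = TARGET_FIELDS[kind]
        present = set(target) - {"type"}
        for field in sorted(required - present):
            problems.append(f"target {i} ({kind}): missing {field!r}")
        # Unknown fields are usually typos (e.g. "adress"); flag them early.
        for field in sorted(present - required - optional):
            problems.append(f"target {i} ({kind}): unexpected {field!r}")
    return problems
```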

Example Request

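# Encode the Excel file as a single-line Base64 string
# (portable across macOS and GNU base64, which wraps lines by default)
FILE_B64=$(base64 < calculation.xlsx | tr -d '\n')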

curl -X POST https://api.stru.ai/v1/excel/jobs \
  -H "Authorization: Bearer sk_live_abc123..." \
  -H "Content-Type: application/json" \
  -d "{
    \"operation\": \"extract_data\",
    \"source_file\": \"$FILE_B64\",
    \"extraction_targets\": [
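      {\"type\": \"named_range\", \"name\": \"ProjectName\"},
      {\"type\": \"named_range\", \"name\": \"TotalCost\"},
      {\"type\": \"named_range\", \"name\": \"EngineerName\"},
      {\"type\": \"range\", \"address\": \"A5:D15\", \"include_headers\": true},
      {\"type\": \"table\", \"name\": \"MaterialsTable\"}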
    ]
  }"
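The same request can be sketched in Python with only the standard library. build_extract_request and submit are hypothetical helper names; the URL, headers, and body fields are the ones in the curl example above. Building the request separately from sending it makes the payload easy to inspect or test.

```python
import json
import urllib.request

API_URL = "https://api.stru.ai/v1/excel/jobs"


def build_extract_request(api_key: str, file_b64: str, targets: list) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    body = {
        "operation": "extract_data",
        "source_file": file_b64,
        "extraction_targets": targets,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit(request: urllib.request.Request) -> dict:
    """Send the request and return the parsed JSON job record (includes job_id)."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Usage: job = submit(build_extract_request(api_key, file_b64, targets)), then poll using job["job_id"].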

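Example Response

A completed job returns the extracted values under results.extracted_data. Named ranges and tables are keyed by name; ranges are keyed as range_<start>_<end> (for example, range_A5_D15). With include_headers set, the first row of a range holds the column headers.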

{
  "job_id": "job_xls_extract_abc123",
  "status": "completed",
  "created_at": "2025-10-17T10:30:00Z",
  "completed_at": "2025-10-17T10:30:03.456Z",
  "results": {
    "extracted_data": {
      "ProjectName": "Building A Foundation",
      "TotalCost": 125000.50,
      "EngineerName": "John Doe, PE",
      "range_A5_D15": [
        ["Material", "Quantity", "Unit Price", "Total"],
        ["Concrete", 150, 175, 26250],
        ["Rebar", 8000, 1.25, 10000],
        ["Formwork", 2500, 8.5, 21250]
      ],
      "MaterialsTable": [
        {"Material": "Concrete", "Quantity": 150, "Unit_Price": 175, "Total": 26250},
        {"Material": "Rebar", "Quantity": 8000, "Unit_Price": 1.25, "Total": 10000}
      ]
    }
  }
}
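A range extracted with include_headers comes back as a list of rows with the header row first, as in range_A5_D15 above. This sketch (the helper name rows_to_records is hypothetical) turns such a range into one dict per row, the same shape a table target returns.

```python
from typing import Any, Dict, List


def rows_to_records(rows: List[List[Any]]) -> List[Dict[str, Any]]:
    """Turn a range returned with include_headers (header row first) into dicts."""
    if not rows:
        return []
    header, *body = rows
    # zip stops at the shorter sequence, so a short row simply omits trailing keys.
    return [dict(zip(header, row)) for row in body]
```

With the example data, rows_to_records(extracted_data["range_A5_D15"]) yields three material records, and summing their Total values gives 57500.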

Use Cases

Read final results from engineering calculations for reporting or validation.
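# extract_data(path, targets) is a placeholder for your own helper: it submits
# an extract_data job, polls until completion, and returns results["extracted_data"].
targets = [
    {"type": "named_range", "name": "MaxMoment"},
    {"type": "named_range", "name": "MaxShear"},
    {"type": "named_range", "name": "Deflection"},
    {"type": "named_range", "name": "UtilizationRatio"}
]

results = extract_data('beam-calc.xlsx', targets)

# Validate results: a utilization ratio above 1.0 means demand exceeds capacity
if results['UtilizationRatio'] > 1.0:
    alert("Design capacity exceeded!")  # alert() stands in for your notification hook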
Extract data from multiple Excel files for aggregation and reporting.
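# project_files: your list of .xlsx paths; standard_targets: one target list
# shared by every file, so each project yields the same keys
all_projects = []

for excel_file in project_files:
    data = extract_data(excel_file, standard_targets)
    all_projects.append(data)

# Create summary report (generate_master_report is your own reporting step)
generate_master_report(all_projects)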
Extract Excel data and store it in a database for search and analysis.
data = extract_data('project.xlsx', targets)

# Store in database
db.insert_project({
    'number': data['ProjectNumber'],
    'client': data['ClientName'],
    'total_cost': data['TotalCost'],
    'materials': data['MaterialsTable']
})
Extract key values from design calculations to verify against requirements.
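# check_targets lists the named ranges used below; limit is the allowable
# deflection, in the same units as the workbook's Deflection value
data = extract_data('structural-calc.xlsx', check_targets)

# Run QC checks (True means the check passes)
checks = {
    'Deflection': data['Deflection'] < limit,
    'Utilization': data['Utilization'] < 1.0,
    'Safety Factor': data['SafetyFactor'] >= 1.5
}

if not all(checks.values()):
    flag_for_review(checks)  # flag_for_review: your own review/notification step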

Best Practices

Use named ranges - Prefer named ranges where possible; they keep working when the sheet layout changes
Handle missing data - Check whether a key exists in the results before accessing it
value = results.get('OptionalField', 'N/A')
Validate data types - Verify that extracted values have the expected types before using them
if isinstance(results['Cost'], (int, float)):
    process_cost(results['Cost'])
Batch extract - Request multiple values in a single job to minimize API calls
For tables with a consistent structure, use "type": "table" extraction. The API returns the data as an array of objects whose keys match the column names.
Cell address changes: If you use cell addresses ("B5") and the spreadsheet layout changes, extraction will fail or return wrong data. Use named ranges for stability.
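The missing-data and type-checking practices above can be folded into one accessor. This is a sketch with a hypothetical name, get_number. It also excludes bool, because in Python bool is a subclass of int; that matters only if the API returns Excel TRUE/FALSE cells as JSON booleans, which is an assumption here.

```python
from typing import Any, Dict, Optional


def get_number(results: Dict[str, Any], key: str) -> Optional[float]:
    """Return results[key] as a float, or None if it is missing or not numeric."""
    value = results.get(key)
    # bool is a subclass of int, so reject it explicitly (ASSUMED: TRUE/FALSE
    # cells may arrive as JSON booleans).
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return None
    return float(value)
```

Returning None rather than raising lets a caller decide per field whether a missing value is fatal or can fall back to a default.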

Error Responses

422
Validation Error
Target not found in file
{
  "error": {
    "code": "EXTRACTION_TARGET_NOT_FOUND",
    "message": "Named range 'ProjectName' not found in file",
    "details": {
      "missing_targets": ["ProjectName", "TotalCost"]
    }
  }
}
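One way to recover from this error is to read details.missing_targets and resubmit without those targets. In Python's urllib, a 422 response raises urllib.error.HTTPError, whose code attribute holds the status and whose read() method returns the body. The helper names below are hypothetical, and matching cell and range targets by their address is an assumption: the example above only shows named-range names in missing_targets.

```python
import json
from typing import Any, Dict, Iterable, List


def missing_targets(status: int, body: str) -> List[str]:
    """Return the missing target names from an EXTRACTION_TARGET_NOT_FOUND error."""
    if status != 422:
        return []
    error = json.loads(body).get("error", {})
    if error.get("code") != "EXTRACTION_TARGET_NOT_FOUND":
        return []
    return list(error.get("details", {}).get("missing_targets", []))


def without_missing(targets: List[Dict[str, Any]], missing: Iterable[str]) -> List[Dict[str, Any]]:
    """Drop targets whose name (or, ASSUMED, address) was reported as missing."""
    missing_set = set(missing)
    return [t for t in targets if t.get("name", t.get("address")) not in missing_set]
```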

Next Steps

After extraction:
  1. Validate extracted data types and values
  2. Store in database or use in calculations
  3. Generate reports or trigger workflows based on extracted values